Audio-visual feature fusion for speaker identification

Almaadeed, Noor; Aggoun, Amar; Amira, Abbes

Date

2012

Abstract

Conventional speaker identification systems have analysed facial and audio features separately. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The approach uses Mel-frequency Cepstral Coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection in video, and eigenface features (EFF) and Gaussian Mixture Models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion combines a PCA-based face classifier and a GMM-based audio classifier through AND voting. Feature-level fusion is investigated by combining MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion.
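The abstract names two fusion strategies without specifying them. The sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes that feature-level fusion appends a single face PCA vector to every MFCC frame of an utterance. It assumes the GMMs use diagonal covariances whose parameters have already been trained (for example by EM). It also assumes that AND voting means an identity is accepted only when the face and audio classifiers name the same speaker. MFCC extraction, Viola-Jones detection, eigenface projection and GMM training are omitted. The names `fuse_features`, `DiagonalGMM`, `identify` and `and_vote` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def fuse_features(mfcc_frames: List[Vector], face_pca: Vector) -> List[List[float]]:
    """Feature-level fusion (assumed scheme): append the same face PCA
    vector to every MFCC frame of the utterance."""
    return [list(frame) + list(face_pca) for frame in mfcc_frames]


class DiagonalGMM:
    """Diagonal-covariance GMM. Parameters are assumed to be already
    trained; training (e.g. EM) is not covered by this sketch."""

    def __init__(self, weights: List[float], means: List[Vector], variances: List[Vector]):
        self.weights = weights
        self.means = means
        self.variances = variances

    @staticmethod
    def _component_log_pdf(x: Vector, mean: Vector, var: Vector) -> float:
        # log N(x; mean, diag(var)), summed over independent dimensions
        return -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var)
        )

    def frame_log_likelihood(self, x: Vector) -> float:
        # log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability
        terms = [
            math.log(w) + self._component_log_pdf(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    def score(self, frames: List[Vector]) -> float:
        # Average per-frame log-likelihood of the whole utterance
        return sum(self.frame_log_likelihood(x) for x in frames) / len(frames)


def identify(frames: List[Vector], models: Dict[str, DiagonalGMM]) -> str:
    # Choose the speaker whose GMM gives the highest average log-likelihood
    return max(models, key=lambda spk: models[spk].score(frames))


def and_vote(face_id: str, audio_id: str) -> Optional[str]:
    """Decision-level fusion by AND voting (assumed reading): accept an
    identity only when both modalities agree, otherwise reject (None)."""
    return face_id if face_id == audio_id else None


if __name__ == "__main__":
    # Toy one-component models over 2-D fused vectors (1 MFCC dim + 1 face dim)
    models = {
        "alice": DiagonalGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
        "bob": DiagonalGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    fused = fuse_features([[0.1], [-0.2]], [0.3])
    print(identify(fused, models))   # alice
    print(and_vote("alice", "bob"))  # None
```

Scoring with the average rather than the summed frame log-likelihood does not change which speaker wins for a given utterance. It does, however, keep scores comparable across utterances of different lengths.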

DOI/handle

http://dx.doi.org/10.1007/978-3-642-34475-6_8
http://hdl.handle.net/10576/57541

Collections

Computer Science & Engineering [2484 items]