Audio-visual feature fusion for speaker identification
Author | Almaadeed, Noor |
Author | Aggoun, Amar |
Author | Amira, Abbes |
Available date | 2024-08-11T05:39:17Z |
Publication Date | 2012 |
Publication Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Resource | Scopus |
ISSN | 0302-9743 |
Abstract | Conventional speaker identification systems analyze facial and audio features separately. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The approach uses Mel-frequency cepstral coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian mixture models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for face and GMM for audio through AND voting. Feature-level fusion is investigated by combining MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion. |
Language | en |
Publisher | Springer |
Subject | audio-visual fusion; Gaussian mixture models; Mel-frequency cepstral coefficients; principal component analysis; speaker identification |
Type | Conference |
Pagination | 56-67 |
Issue Number | PART 1 |
Volume Number | 7663 LNCS |
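
The two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `and_vote` and `fuse_features` are hypothetical, the per-modality classifiers (PCA for face, GMM for audio) are assumed to have already produced a speaker label each, and rejecting the sample when the two labels disagree is an assumption, since the abstract does not say how AND voting resolves disagreement.

```python
from typing import List, Optional, Sequence


def and_vote(face_id: Optional[str], audio_id: Optional[str]) -> Optional[str]:
    """Decision-level fusion by AND voting (hypothetical helper).

    Accepts an identity only when the face and audio classifiers agree.
    Returning None on disagreement is an assumption, not from the abstract.
    """
    if face_id is not None and face_id == audio_id:
        return face_id
    return None


def fuse_features(mfcc: Sequence[float], eigenface: Sequence[float]) -> List[float]:
    """Feature-level fusion (hypothetical helper).

    Concatenates an audio feature vector and a face feature vector into one
    hybrid vector, which a single per-speaker model could then be trained on.
    """
    return [*mfcc, *eigenface]


if __name__ == "__main__":
    # Both modalities agree, so the identity is accepted.
    print(and_vote("speaker_03", "speaker_03"))
    # The modalities disagree, so the sample is rejected.
    print(and_vote("speaker_03", "speaker_11"))
    # Toy vectors; real MFCC and eigenface vectors would be much longer.
    print(fuse_features([0.1, 0.2], [0.7]))
```

In this reading, decision-level fusion keeps the two models independent and combines only their outputs, whereas feature-level fusion combines the inputs before any model is trained.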
This item appears in the following Collection(s)
- Computer Science & Engineering [2402 items]