Audio-visual feature fusion for speaker identification

Almaadeed, Noor; Aggoun, Amar; Amira, Abbes

Author	Almaadeed, Noor
Author	Aggoun, Amar
Author	Amira, Abbes
Available date	2024-08-11T05:39:17Z
Publication Date	2012
Publication Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Resource	Scopus
ISSN	0302-9743
URI	http://dx.doi.org/10.1007/978-3-642-34475-6_8
URI	http://hdl.handle.net/10576/57541
Abstract	Analyses of facial and audio features have been considered separately in conventional speaker identification systems. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The suggested approach makes use of Mel-frequency Cepstral Coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian Mixture Models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for face and GMM for audio through AND voting. Feature-level fusion is investigated by combining both MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that the decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion.
Language	en
Publisher	Springer
Subject	audio-visual fusion; Gaussian mixture models; Mel-frequency cepstral coefficients; principal component analysis; speaker identification
Title	Audio-visual feature fusion for speaker identification
Type	Conference
Pagination	56-67
Issue Number	PART 1
Volume Number	7663 LNCS
dc.accessType	Abstract Only
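
The two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact voting rule or classifier outputs, so the sketch assumes that the audio model yields one log-likelihood per enrolled speaker (higher is better) and that the eigenface matcher yields one distance per enrolled speaker (lower is better). The function names `and_vote` and `fuse_features` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def and_vote(audio_loglik: Dict[str, float],
             face_dist: Dict[str, float]) -> Optional[str]:
    """Decision-level fusion by AND voting (assumed rule).

    Each modality picks its own best speaker; an identity is accepted
    only when both modalities agree. Returns None on disagreement.
    """
    # Audio: speaker whose model gives the highest log-likelihood (assumption).
    audio_id = max(audio_loglik, key=audio_loglik.get)
    # Face: speaker whose eigenface projection is nearest (assumption).
    face_id = min(face_dist, key=face_dist.get)
    return audio_id if audio_id == face_id else None


def fuse_features(mfcc: Sequence[float],
                  eigenface: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate audio and face feature vectors.

    The combined vector would then be used to train one hybrid model
    per speaker; that training step is omitted here.
    """
    return list(mfcc) + list(eigenface)


if __name__ == "__main__":
    # Toy scores for two hypothetical speakers, s1 and s2.
    print(and_vote({"s1": -10.0, "s2": -12.5}, {"s1": 3.1, "s2": 7.4}))
    print(and_vote({"s1": -10.0, "s2": -12.5}, {"s1": 9.0, "s2": 2.0}))
    print(fuse_features([1.0, 2.0], [0.5]))
```

In this sketch, decision-level fusion rejects a sample whenever the two modalities disagree, whereas feature-level fusion defers the decision to a single model trained on the concatenated vector.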


This item appears in the following Collection(s)

Computer Science & Engineering [2426 items]
