Show simple item record

Author: Almaadeed, Noor
Author: Aggoun, Amar
Author: Amira, Abbes
Available date: 2024-08-11T05:39:17Z
Publication Date: 2012
Publication Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Resource: Scopus
ISSN: 0302-9743
URI: http://dx.doi.org/10.1007/978-3-642-34475-6_8
URI: http://hdl.handle.net/10576/57541
Abstract: Analyses of facial and audio features have been considered separately in conventional speaker identification systems. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The proposed approach uses Mel-frequency cepstral coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection in video, eigenface features (EFF) obtained via principal component analysis (PCA), and Gaussian mixture models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for the face and a GMM for the audio, combined through AND voting. Feature-level fusion is investigated by combining the MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy, almost 15% better than feature-level fusion. (An illustrative sketch of this pipeline follows the record fields below.)
Language: en
Publisher: Springer
Subject: audio-visual fusion
Subject: Gaussian mixture models
Subject: Mel-frequency cepstral coefficients
Subject: principal component analysis
Subject: speaker identification
Title: Audio-visual feature fusion for speaker identification
Type: Conference Paper
Pagination: 56-67
Issue Number: PART 1
Volume Number: 7663 LNCS
Access Type: Abstract Only
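
The abstract above outlines the processing pipeline, but the record carries no code, so the following is an illustrative sketch only and not the authors' implementation. It assumes librosa for MFCC extraction, OpenCV for Viola-Jones face detection, and scikit-learn for PCA (eigenfaces) and GMMs; the function names, the 64x64 face crop, 13 MFCCs, and 16 mixture components are hypothetical choices made for the example.

# Illustrative sketch only (assumed libraries: librosa, OpenCV, scikit-learn).
import numpy as np
import librosa
import cv2
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def audio_features(wav_path, n_mfcc=13):
    # MFCC frames (one row per frame) for a speaker's utterance.
    signal, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def face_features(video_frames, pca):
    # Viola-Jones (Haar cascade) face detection, then projection onto
    # eigenfaces; `pca` is a sklearn PCA already fitted on training face crops.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    crops = []
    for frame in video_frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            crops.append(cv2.resize(gray[y:y + h, x:x + w], (64, 64)).flatten())
    return pca.transform(np.array(crops))

def train_speaker_models(mfcc, eigenfaces, n_components=16):
    # Per-speaker models: one GMM per modality (for decision-level fusion)
    # plus a hybrid GMM on concatenated vectors (feature-level fusion).
    gmm_audio = GaussianMixture(n_components).fit(mfcc)
    gmm_face = GaussianMixture(n_components).fit(eigenfaces)
    n = min(len(mfcc), len(eigenfaces))
    gmm_hybrid = GaussianMixture(n_components).fit(
        np.hstack([mfcc[:n], eigenfaces[:n]]))
    return gmm_audio, gmm_face, gmm_hybrid

def identify_decision_level(mfcc, eigenfaces, models):
    # AND voting: accept a speaker only if the audio GMM and the face
    # (eigenface) GMM pick the same identity; `models` maps speaker id to
    # the tuple returned by train_speaker_models.
    audio_vote = max(models, key=lambda s: models[s][0].score(mfcc))
    face_vote = max(models, key=lambda s: models[s][1].score(eigenfaces))
    return audio_vote if audio_vote == face_vote else None

The AND-voting rule mirrors the decision-level fusion described in the abstract, while the hybrid GMM illustrates the feature-level alternative that the reported results compare it against.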


Files in this item


There are no files associated with this item.
