Audio-visual feature fusion for speaker identification

Almaadeed, Noor; Aggoun, Amar; Amira, Abbes

Author	Almaadeed, Noor
Author	Aggoun, Amar
Author	Amira, Abbes
Date available	2024-08-11T05:39:17Z
Date issued	2012
Publication name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Source	Scopus
ISSN	0302-9743
URI	http://dx.doi.org/10.1007/978-3-642-34475-6_8
URI	http://hdl.handle.net/10576/57541
Abstract	Conventional speaker identification systems analyze facial and audio features separately. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The suggested approach uses Mel-frequency cepstral coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian mixture models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for face and GMM for audio through AND voting. Feature-level fusion is investigated by combining MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that the decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion.
Language	en
Publisher	Springer
Subject	audio-visual fusion; Gaussian mixture models; Mel-frequency cepstral coefficients; principal component analysis; speaker identification
Title	Audio-visual feature fusion for speaker identification
Type	Conference
Pages	56-67
Issue number	PART 1
Volume number	7663 LNCS
dc.accessType	Abstract Only
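
The two fusion schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record is abstract-only, so the function names, the score dictionaries, and the exact reading of "AND voting" (accept an identity only when the audio and face classifiers agree) are assumptions made for illustration. The audio scores stand in for per-speaker GMM log-likelihoods, and the face distances stand in for distances in eigenface (PCA) space.

```python
from typing import Dict, List, Optional


def identify_audio(log_likelihoods: Dict[str, float]) -> str:
    """Return the speaker whose (hypothetical) audio GMM scores highest."""
    return max(log_likelihoods, key=log_likelihoods.get)


def identify_face(distances: Dict[str, float]) -> str:
    """Return the speaker whose eigenface projection is nearest the probe."""
    return min(distances, key=distances.get)


def and_vote(audio_scores: Dict[str, float],
             face_distances: Dict[str, float]) -> Optional[str]:
    """Decision-level fusion, assuming AND voting means both must agree.

    Returns the agreed speaker ID, or None when the modalities disagree.
    """
    audio_id = identify_audio(audio_scores)
    face_id = identify_face(face_distances)
    return audio_id if audio_id == face_id else None


def fuse_features(mfcc_frames: List[List[float]],
                  face_vector: List[float]) -> List[List[float]]:
    """Feature-level fusion, assuming plain concatenation.

    Appends the face PCA coefficients to every MFCC frame; the resulting
    vectors would then train one hybrid GMM per speaker (not shown here).
    """
    return [frame + face_vector for frame in mfcc_frames]


# Illustrative values only; these are not results from the paper.
audio = {"s1": -41.2, "s2": -55.0, "s3": -47.9}
face_agree = {"s1": 0.8, "s2": 2.1, "s3": 1.7}
face_disagree = {"s1": 3.0, "s2": 0.5, "s3": 1.7}

agreed = and_vote(audio, face_agree)
rejected = and_vote(audio, face_disagree)
fused = fuse_features([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

Under this reading, AND voting trades some rejected samples for fewer false identifications, whereas concatenation leaves a single model to weigh both modalities.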

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collections

Computer Science & Engineering [2402 items]
