Audio-visual feature fusion for speaker identification
Author | Almaadeed, Noor |
Author | Aggoun, Amar |
Author | Amira, Abbes |
Date available | 2024-08-11T05:39:17Z |
Date issued | 2012 |
Publication name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Source | Scopus |
ISBN | 3029743 |
Abstract | Analyses of facial and audio features have been considered separately in conventional speaker identification systems. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The suggested approach makes use of Mel-frequency Cepstral Coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian Mixture Models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for face and GMM for audio through AND voting. Feature-level fusion is investigated by combining both MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that the decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion. |
Language | en |
Publisher | Springer |
Subject | audio-visual fusion; Gaussian mixture models; Mel-frequency Cepstral coefficients; Principal component Analysis; speaker identification |
Type | Conference |
Pages | 56-67 |
Issue number | PART 1 |
Volume number | 7663 LNCS |
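
The two fusion schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "AND voting" accepts an identity only when the face and audio classifiers agree, that the face classifier is a nearest-neighbour match in PCA (eigenface) space, that the audio classifier picks the speaker GMM with the highest log-likelihood, and that feature-level fusion concatenates one MFCC vector with one PCA vector. All function names and the toy values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def nearest_face_id(probe: Sequence[float],
                    gallery: Dict[str, Sequence[float]]) -> str:
    """Assumed face classifier: closest stored eigenface projection (Euclidean)."""
    return min(gallery, key=lambda spk: math.dist(probe, gallery[spk]))


def best_audio_id(log_likelihoods: Dict[str, float]) -> str:
    """Assumed audio classifier: speaker whose GMM gives the highest log-likelihood."""
    return max(log_likelihoods, key=lambda spk: log_likelihoods[spk])


def and_vote(face_id: str, audio_id: str) -> Optional[str]:
    """Assumed AND voting: accept only if both modalities agree, else reject."""
    return face_id if face_id == audio_id else None


def concat_features(mfcc: Sequence[float], face_pca: Sequence[float]) -> List[float]:
    """Assumed feature-level fusion: one joint vector to feed a hybrid GMM."""
    return list(mfcc) + list(face_pca)


# Toy, made-up inputs for two hypothetical speakers.
gallery = {"s1": (0.0, 0.2), "s2": (1.0, 1.0)}
face_id = nearest_face_id((0.1, 0.2), gallery)
audio_id = best_audio_id({"s1": -98.3, "s2": -120.5})
decision = and_vote(face_id, audio_id)
```

Here both classifiers pick `s1`, so the AND vote accepts that identity; had they disagreed, the sketch would return `None` rather than guess.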
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |
This item appears in the following collections
- Computer Science & Engineering [2402 items]