Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation

Latif, Siddique; Shahid, Abdullah; Qadir, Junaid

Author	Latif, Siddique
Author	Shahid, Abdullah
Author	Qadir, Junaid
Date available	2023-07-13T05:40:51Z
Publication date	2023
Publication name	Applied Acoustics
Source	Scopus
ISSN	0003682X
URI	http://dx.doi.org/10.1016/j.apacoust.2023.109425
URI	http://hdl.handle.net/10576/45566
Abstract	Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still have poor performance due to a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the quality of the generated emotional speech can significantly improve SER performance on multiple datasets, as demonstrated by a higher mean opinion score (MOS) compared to the baseline. The generated samples were also effective at augmenting SER performance. © 2023 Elsevier Ltd
Language	en
Publisher	Elsevier
Subject	Emotional speech synthesis; Speech emotion recognition; Speech synthesis; Tacotron; Text-to-speech; WaveRNN
Title	Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Type	Article
Volume number	210
dc.accessType	Abstract Only

Files in this item

There are no files associated with this item.

This item appears in the following collections

Computer Science & Engineering [2484 items]

Related documents

Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-Organized Operational Layer

Distinct neuropsychological correlates in positive and negative formal thought disorder syndromes: The thought and language disorder scale in endogenous psychoses

Decoding silent speech: a machine learning perspective on data, methods, and frameworks
