Simple item record

Author: Latif, Siddique
Author: Shahid, Abdullah
Author: Qadir, Junaid
Date available: 2023-07-13T05:40:51Z
Publication date: 2023
Publication name: Applied Acoustics
Source: Scopus
ISSN: 0003682X
URI: http://dx.doi.org/10.1016/j.apacoust.2023.109425
URI: http://hdl.handle.net/10576/45566
Abstract: Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still have poor performance due to a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the quality of the generated emotional speech can significantly improve SER performance on multiple datasets, as demonstrated by a higher mean opinion score (MOS) compared to the baseline. The generated samples were also effective at augmenting SER performance. © 2023 Elsevier Ltd
Language: en
Publisher: Elsevier
Subject: Emotional speech synthesis
Speech emotion recognition
Speech synthesis
Tacotron
Text-to-speech
WaveRNN
Title: Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Type: Article
Volume: 210


Files in this item

There are no files associated with this item.
