Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Abstract
Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still perform poorly because emotional speech datasets are scarce. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence generator that produces Mel-spectrograms from text, and a WaveRNN vocoder that generates audio from the Mel-spectrograms. Extensive experiments show that the generated emotional speech achieves a higher mean opinion score (MOS) than the baseline, and that using the generated samples for augmentation significantly improves SER performance on multiple datasets. © 2023 Elsevier Ltd
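The augmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how real and synthetic utterances are combined, so everything here is an assumption made for illustration: the `Utterance` record, the `augment_training_set` helper, the `ratio` parameter, and the file paths are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Utterance:
    path: str        # hypothetical path to an audio clip
    emotion: str     # emotion label, e.g. "angry"
    synthetic: bool  # True if the clip came from a TTS system


def augment_training_set(
    real: List[Utterance],
    synthetic: List[Utterance],
    ratio: float,
    seed: int = 0,
) -> List[Utterance]:
    """Mix real training utterances with a random subset of synthetic ones.

    At most ratio * len(real) synthetic utterances are added (assumed
    mixing rule, not the paper's). All real utterances are kept.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    n_extra = min(len(synthetic), int(ratio * len(real)))
    extra = rng.sample(synthetic, n_extra)
    mixed = list(real) + extra
    rng.shuffle(mixed)
    return mixed


# Toy data: 4 real clips and 10 synthetic clips.
real_clips = [Utterance(f"real_{i}.wav", "happy", False) for i in range(4)]
tts_clips = [Utterance(f"tts_{i}.wav", "happy", True) for i in range(10)]
train_set = augment_training_set(real_clips, tts_clips, ratio=0.5)
```

In a setup like this, only the training split would be augmented. Evaluation would still use real recordings, so that any improvement reflects performance on genuine emotional speech.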
Collections
- Computer Science and Engineering [2211 items]
Related documents
Showing documents related by title, author, creator, and subject.
- Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-Organized Operational Layer
  Soltanian M.; Malik J.; Raitoharju J.; Iosifidis A.; Kiranyaz, Mustafa Serkan; Gabbouj M. (Institute of Electrical and Electronics Engineers Inc., 2021, Conference Paper)
  Automatic classification of speech commands has revolutionized human computer interactions in robotic applications. However, employed recognition models usually follow the methodology of deep learning with complicated ...
- Distinct neuropsychological correlates in positive and negative formal thought disorder syndromes: The thought and language disorder scale in endogenous psychoses
  Nagels A.a Fahrmann; Stratmann M.a; Ghazi S.a; Schales C.a; Frauenheim M.a; Turner L.a; Hornig T.b; Katzev M.b; Muller-Isberner R.c; Grosvald M.d; Krug A.a; Kircher T.a; Kircher, Tilo (S. Karger AG, 2016, Article Review)
  The correlation of formal thought disorder (FTD) symptoms and subsyndromes with neuropsychological dimensions is as yet unclear. Evidence for a dysexecutive syndrome and semantic access impairments has been discussed in ...
- Models of Speech Processing
  Grosvald, Michael; Burton, Martha W.; Small, Steven L. (Taylor & Francis, 2015, Book chapter)
  One of the fundamental questions about language is how listeners map the acoustic signal onto syllables, words, and sentences, resulting in understanding of speech. For normal listeners, this mapping is so effortless ...