Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Date
2023
Abstract
Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still perform poorly because emotional speech datasets are scarce. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the generated emotional speech is of high quality, as demonstrated by a higher mean opinion score (MOS) than the baseline, and that the generated samples are effective at augmenting SER training data, significantly improving SER performance on multiple datasets.
© 2023 Elsevier Ltd
Collections
- Computer Science & Engineering