Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Date
2023
Abstract
Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still perform poorly because of the scarcity of emotional speech datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system comprises encoders for speaker and emotion embeddings, a sequence-to-sequence generator that produces Mel-spectrograms from text, and a WaveRNN vocoder that converts the Mel-spectrograms into audio. Extensive experiments show that the generated emotional speech achieves a higher mean opinion score (MOS) than the baseline, and that augmenting training data with these samples significantly improves SER performance on multiple datasets. © 2023 Elsevier Ltd
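The abstract does not say how synthetic and real samples are mixed during training. As a minimal sketch under that uncertainty, the code below shows one plausible augmentation policy: topping up each emotion class of a real SER training set with utterances from an emotion- and speaker-conditioned TTS model until every class reaches a target count. The class-balancing policy, the `Utterance` record, the `augment_with_synthetic` helper, the `fake_tts` stand-in, and `target_per_emotion` are all hypothetical illustrations, not the paper's method; a real synthesizer would return audio (Mel-spectrogram to WaveRNN waveform) rather than a metadata record.

```python
"""Hypothetical sketch: top up each emotion class of an SER training set
with synthetic utterances from an emotion-conditioned TTS model.
The balancing policy is an assumption, not the paper's method."""
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    text: str
    speaker: str
    emotion: str
    synthetic: bool = False


# Assumed TTS interface: (text, speaker, emotion) -> Utterance.
# A real system would produce a waveform via Mel-spectrogram + vocoder.
Synthesizer = Callable[[str, str, str], Utterance]


def augment_with_synthetic(
    real: List[Utterance],
    prompts: List[str],
    speakers: List[str],
    synthesize: Synthesizer,
    target_per_emotion: int,
) -> List[Utterance]:
    """Return the real data plus enough synthetic utterances that every
    emotion present in `real` has at least `target_per_emotion` samples."""
    counts = Counter(u.emotion for u in real)
    augmented = list(real)
    i = 0  # round-robin index over text prompts and speaker identities
    for emotion in sorted(counts):
        deficit = max(0, target_per_emotion - counts[emotion])
        for _ in range(deficit):
            text = prompts[i % len(prompts)]
            speaker = speakers[i % len(speakers)]
            augmented.append(synthesize(text, speaker, emotion))
            i += 1
    return augmented


def fake_tts(text: str, speaker: str, emotion: str) -> Utterance:
    """Hypothetical stand-in for the TTS model; records only its inputs."""
    return Utterance(text, speaker, emotion, synthetic=True)


if __name__ == "__main__":
    real = [Utterance("hi", "s1", "happy")] * 5 + [Utterance("no", "s2", "angry")] * 2
    data = augment_with_synthetic(real, ["hello there", "see you"], ["s1", "s2"], fake_tts, 4)
    print(Counter((u.emotion, u.synthetic) for u in data))
```

Balancing by class is only one design choice; a fixed real-to-synthetic ratio or synthesizing only under-represented speakers would plug into the same loop by changing how `deficit` is computed.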
Collections
- Computer Science & Engineering [2491 items]
Related items
Showing items related by title, author, creator and subject.
- Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-Organized Operational Layer
  Soltanian M.; Malik J.; Raitoharju J.; Iosifidis A.; Kiranyaz, Mustafa Serkan; Gabbouj M. (Institute of Electrical and Electronics Engineers Inc., 2021, Conference)
  Automatic classification of speech commands has revolutionized human computer interactions in robotic applications. However, employed recognition models usually follow the methodology of deep learning with complicated ...
- Neural signals, machine learning, and the future of inner speech recognition
  Chowdhury, Adiba Tabassum; Hassanein, Ahmed; Al Shibli, Aous N.; Khanafer, Youssuf; AbuHaweeleh, Mohannad Natheef; Pedersen, Shona; Chowdhury, Muhammad E.H. (Frontiers, 2025, Article)
  Inner speech recognition (ISR) is an emerging field with significant potential for applications in brain-computer interfaces (BCIs) and assistive technologies. This review focuses on the critical role of machine learning ...
- Decoding silent speech: a machine learning perspective on data, methods, and frameworks
  Chowdhury, Adiba Tabassum; Newaz, Mehrin; Saha, Purnata; AbuHaweeleh, Mohannad Natheef; Mohsen, Sara; Bushnaq, Diala; Chabbouh, Malek; Aljindi, Raghad; Pedersen, Shona; Chowdhury, Muhammad E. H. (Springer Science and Business Media Deutschland GmbH, 2025, Article Review)
  At the nexus of signal processing and machine learning (ML), silent speech recognition (SSR) has evolved as a game-changing technology that allows for communication without audible voice. This study offers a thorough ...



