Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation

Author	Latif, Siddique
Author	Shahid, Abdullah
Author	Qadir, Junaid
Available date	2023-07-13T05:40:51Z
Publication Date	2023
Publication Name	Applied Acoustics
Resource	Scopus
ISSN	0003-682X
URI	http://dx.doi.org/10.1016/j.apacoust.2023.109425
URI	http://hdl.handle.net/10576/45566
Abstract	Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still perform poorly because of a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the generated emotional speech achieves a higher mean opinion score (MOS) than the baseline, and that using the generated samples for augmentation can significantly improve SER performance on multiple datasets. © 2023 Elsevier Ltd
Language	en
Publisher	Elsevier
Subject	Emotional speech synthesis; Speech emotion recognition; Speech synthesis; Tacotron; Text-to-speech; WaveRNN
Title	Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Type	Article
Volume Number	210
dc.accessType	Abstract Only
