
Author: Latif, Siddique
Author: Shahid, Abdullah
Author: Qadir, Junaid
Available date: 2023-07-13T05:40:51Z
Publication Date: 2023
Publication Name: Applied Acoustics
Resource: Scopus
ISSN: 0003-682X
URI: http://dx.doi.org/10.1016/j.apacoust.2023.109425
URI: http://hdl.handle.net/10576/45566
Abstract: Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still have poor performance due to a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron 2 architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the quality of the generated emotional speech can significantly improve SER performance on multiple datasets, as demonstrated by a higher mean opinion score (MOS) compared to the baseline. The generated samples were also effective at augmenting SER performance. © 2023 Elsevier Ltd
Language: en
Publisher: Elsevier
Subject: Emotional speech synthesis; Speech emotion recognition; Speech synthesis; Tacotron; Text-to-speech; WaveRNN
Title: Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation
Type: Article
Volume Number: 210
dc.accessType: Abstract Only

