Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation

Melhem, Rawad; Jafar, Assef; Al Dakkak, Oumayma

Author	Melhem, Rawad
Author	Jafar, Assef
Author	Al Dakkak, Oumayma
Date available	2023-12-24T09:56:14Z
Publication date	2024
Publication name	The Sixth Youth Research Forum 2024
URI	http://hdl.handle.net/10576/50552
Abstract	Speech separation is important in real-world applications such as human-machine interaction, hearing aids, and automatic meeting transcription. In recent years, deep learning has brought significant progress toward a solution. Much of the attention has gone to supervised learning methods trained on synthetic mixture datasets, even though these are not representative of real-world mixtures. The difficulty of building a realistic dataset has led researchers to unsupervised learning methods, which can handle realistic mixtures directly, but their results remain unconvincing. In this paper, we introduce a method for creating a realistic dataset with ground-truth sources for speech separation. The main challenge in designing such a dataset is that ground truths for the individual speakers' signals are unavailable. To address this, we propose a method for recording two speakers simultaneously while obtaining the ground truth for each. We present a methodology for benchmarking our realistic dataset using a deep learning model based on Bidirectional Gated Recurrent Units (BGRU) and a clustering algorithm. The experiments show that our proposed dataset improved SI-SDR (Scale-Invariant Signal-to-Distortion Ratio) by 1.65 dB and PESQ (Perceptual Evaluation of Speech Quality) by approximately 0.5. We also evaluated the effectiveness of our method at different distances between the microphone and the speakers and found that it improved the stability of the learned model.
Language	ar
Publisher	Qatar University Young Scientists Center - Qatar University
Subject	Single Channel Speech Separation; Deep Learning; Realistic Datasets; Ground Truths
Title	Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
Type	Conference
dc.accessType	Open Access
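
The abstract reports its main result as an SI-SDR gain in dB. For readers unfamiliar with the metric, the following is a minimal sketch of the commonly used SI-SDR definition, not the authors' evaluation code. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB.

    Illustrative sketch of the commonly used definition; the name,
    the eps stabilizer, and the mean removal are assumptions here.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Everything in the estimate that is not the scaled target is distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is what makes the measure scale-invariant. A higher value means the separated signal is closer to the ground-truth source.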

Files in this record

Name: Abstract_.pdf
Size: 105.2Kb
Format: PDF


This record appears in the following collections

Humanities and Social Sciences Theme [44 items]
Qatar University Young Scientists Center Research [216 items]
