Using Context Specific Generative Adversarial Networks for Audio Data Completion
Abstract
Audio quality plays an essential role in applications ranging from music to voice conversations. Audio data can lose quality through causes such as intermittent network connections or storage corruption. Recent approaches have turned to generative adversarial networks (GANs) for audio reconstruction, motivated by their success in visual applications. However, audio datasets often include sounds from different contexts, which increases the complexity of the patterns to be learned and leads to sub-optimal reconstruction quality. We propose a novel audio completion pipeline that clusters audio samples by the similarity of features extracted with a pre-trained convolutional neural network (CNN) and then trains a dedicated GAN for each resulting context. We compare the proposed technique with the traditional approach of training a single general GAN on the task of completing 200 ms missing segments in 1-second audio samples. Experimental results on a public benchmark dataset show that the specialized GANs clearly improve completion quality, as measured by a higher peak signal-to-noise ratio (PSNR) and a lower mean squared error (MSE). Qualitative evaluation supports these results.
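The two quantitative metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sample rate, the position of the missing segment, or the peak value used for PSNR, so the 16 kHz rate, the 400 ms gap start, the peak of 1.0, and the helper names `gap_bounds`, `mse`, and `psnr` below are all assumptions chosen for illustration, not the authors' evaluation code.

```python
import math


def gap_bounds(sample_rate, gap_ms=200, start_ms=400):
    """Return (start, end) sample indices of the missing segment.

    The 200 ms length comes from the abstract; the 400 ms start
    position is an assumption made for this sketch.
    """
    start = sample_rate * start_ms // 1000
    length = sample_rate * gap_ms // 1000
    return start, start + length


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect match.

    peak=1.0 assumes samples normalized to the range [-1, 1].
    """
    err = mse(reference, estimate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    sample_rate = 16000  # assumed; not stated in the abstract
    # One second of a 440 Hz tone stands in for a real audio sample.
    original = [math.sin(2 * math.pi * 440 * n / sample_rate)
                for n in range(sample_rate)]

    start, end = gap_bounds(sample_rate)
    # Zero-filling stands in for a (poor) completion of the gap.
    completed = original[:start] + [0.0] * (end - start) + original[end:]

    # Score only the reconstructed segment.
    print("MSE :", mse(original[start:end], completed[start:end]))
    print("PSNR:", psnr(original[start:end], completed[start:end]))
```

A better completion yields a lower MSE and a higher PSNR over the same gap, which is the direction of improvement the abstract reports for the specialized GANs.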