An Overview of Monaural Speech Denoising and Dereverberation Research
-
Abstract
Speech enhancement refers to the use of audio signal processing techniques and algorithms to improve the intelligibility and quality of distorted speech signals. It has great research value and a wide range of applications, including speech recognition, VoIP, teleconferencing, and hearing aids. Most early work used unsupervised digital signal analysis methods to decompose the speech signal and obtain the characteristics of the clean speech and the noise. With the development of machine learning, supervised methods that aim to learn the relationship between noisy and clean speech signals were proposed. In particular, the introduction of deep learning has greatly improved performance. To help beginners and researchers in related fields understand the current state of research on this topic, this paper presents a comprehensive survey of the development of monaural speech enhancement and systematically summarizes the field in terms of models and methods, datasets, features, evaluation metrics, and related aspects. First, we divide speech enhancement into noise reduction and dereverberation, and then review existing work on traditional and machine-learning-based methods in each of these two directions. We also briefly introduce the main ideas of typical solutions and compare the performance of different methods. Next, commonly used datasets, features, learning objectives, and evaluation metrics are enumerated and illustrated. Finally, four major challenges and their corresponding issues in this area are summarized.
-
-