ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (5): 928-953. doi: 10.7544/issn1000-1239.2020.20190306

• Artificial Intelligence •



  1. (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054)
  • Published: 2020-05-01

An Overview of Monaural Speech Denoising and Dereverberation Research

Lan Tian, Peng Chuan, Li Sen, Ye Wenzheng, Li Meng, Hui Guoqiang, Lü Yilan, Qian Yuxin, Liu Qiao   

  1. (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054)
  • Online: 2020-05-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (U19B2028, 61772117), the Open Fund Project of the National Engineering Laboratory for Big Data Application on Improving Government Governance Capabilities (10-2018039), the Sichuan Hi-Tech Industrialization Program (2018GFW0150), and the Fundamental Research Funds for the Central Universities (ZYGX2019J077).

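Summary: Speech enhancement is a key technology for improving speech quality and intelligibility, with broad application prospects and significant research value in areas such as speech recognition, voice calls, teleconferencing, and hearing assistance. This paper presents a comprehensive survey and in-depth analysis of the state of research on monaural speech enhancement in terms of models and methods, datasets, features, and evaluation metrics. 1) Existing work on traditional and machine-learning-based monaural speech denoising and on speech dereverberation is sorted and classified, the research ideas behind typical methods are briefly introduced, and the experimental results of different methods are compared. 2) The commonly used datasets, common features, learning objectives, and evaluation metrics involved in experiments and result evaluation are collected and introduced. 3) The main problems and challenges that monaural speech enhancement still faces are summarized.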

Keywords: speech enhancement, speech denoising, speech dereverberation, machine learning, deep neural network

Abstract: Speech enhancement refers to the use of audio signal processing techniques and algorithms to improve the intelligibility and quality of distorted speech signals. It has great research value and a wide range of applications, including speech recognition, VoIP, teleconferencing, and hearing aids. Most early work used unsupervised digital signal analysis methods to decompose the speech signal and obtain the characteristics of the clean speech and the noise. With the development of machine learning, supervised methods that aim to learn the relationship between noisy and clean speech signals were proposed. In particular, the introduction of deep learning has greatly improved performance. To help beginners and researchers in related fields understand the current state of research on this topic, this paper presents a comprehensive survey of the development of monaural speech enhancement and systematically summarizes the field in terms of models and methods, datasets, features, and evaluation metrics. First, we divide speech enhancement into noise reduction and dereverberation, and then review the existing traditional and machine-learning-based methods in each of these two directions. We also briefly introduce the main ideas of typical solutions and compare the performance of different methods. Then, the datasets, features, learning objectives, and evaluation metrics commonly used in experiments are enumerated and illustrated. Finally, four major challenges and the corresponding issues in this area are summarized.

Key words: speech enhancement, speech denoising, speech dereverberation, machine learning, deep neural network