• China Outstanding Sci-Tech Journal
  • CCF-recommended Class A Chinese journal
  • High-quality sci-tech journal in computing (Tier T1)

Deep Neural Network Based Monaural Speech Enhancement with Sparse Non-Negative Matrix Factorization

Shi Wenhua, Ni Yongjing, Zhang Xiongwei, Zou Xia, Sun Meng, Min Gang

Shi Wenhua, Ni Yongjing, Zhang Xiongwei, Zou Xia, Sun Meng, Min Gang. Deep Neural Network Based Monaural Speech Enhancement with Sparse Non-Negative Matrix Factorization[J]. Journal of Computer Research and Development, 2018, 55(11): 2430-2438. DOI: 10.7544/issn1000-1239.2018.20170580
CSTR: 32373.14.issn1000-1239.2018.20170580


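Funding: National Natural Science Foundation of China (61402519, 61471394); Natural Science Foundation of Jiangsu Province (BK20140071, BK20140074); Natural Science Foundation of Shaanxi Province (2017JQ6033)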
  • Chinese Library Classification (CLC) number: TP391.4; TN912.3


  • Abstract: Speech enhancement methods based on non-negative matrix factorization (NMF) introduce distortion at low signal-to-noise ratios (SNRs) and in unvoiced segments, which lack the structured spectral patterns that NMF dictionaries capture. To address this, this paper proposes a monaural speech enhancement method that combines sparse non-negative matrix factorization (SNMF) with a deep neural network (DNN). The method exploits the sparsity of speech in the time-frequency (T-F) domain. It also exploits the spectral reconstruction (T-F preservation) capability that DNNs have shown in speech enhancement. First, the magnitude spectrogram of the noisy speech is decomposed by NMF under a sparsity constraint. This yields the coding (activation) matrices for the speech and noise dictionaries, which are pre-trained independently on clean speech and on noise. A Wiener filter built from these components then separates speech from noise and recovers the main structure of the speech. Next, a DNN learns the non-linear mapping from the log-magnitude spectrum of the Wiener-filtered speech to the log-magnitude spectrum of the ideal clean speech. This restores the speech components that the first stage misses. Evaluations were conducted on the IEEE dataset with both stationary and non-stationary noise types. They show that the proposed method effectively suppresses noise while preserving the speech component. It also outperforms the baseline methods in perceptual speech quality and log-spectral distortion.
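The first stage described in the abstract can be sketched in NumPy. That stage applies sparse NMF against fixed, pre-trained speech and noise dictionaries, then applies Wiener filtering. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices here are hypothetical:

- the Euclidean cost with an L1 penalty `lam`
- the multiplicative update rules
- the exponent `p=2` in the Wiener-style gain
- every function name (`train_dictionary`, `sparse_activations`, `snmf_wiener_enhance`, `demo`)

The toy spectrograms also assume that speech and noise magnitudes simply add.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates and the mask


def train_dictionary(V, k, n_iter=300, seed=0):
    """Learn a non-negative dictionary W (F x k) from training magnitudes V (F x T).

    Plain Euclidean NMF with multiplicative updates (an assumed training cost);
    columns of W are L2-normalized so the later L1 penalty acts on a fixed scale.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + 0.1
    H = rng.random((k, T)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W / (np.linalg.norm(W, axis=0, keepdims=True) + EPS)


def sparse_activations(V, W, lam=0.05, n_iter=300, seed=0):
    """Minimize 0.5*||V - W H||_F^2 + lam * sum(H) over H >= 0, with W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + EPS)  # L1 term appears in the denominator
    return H


def snmf_wiener_enhance(V_noisy, W_speech, W_noise, lam=0.05, p=2.0):
    """Sparse NMF with fixed dictionaries, then a Wiener-style gain on V_noisy."""
    W = np.hstack([W_speech, W_noise])
    H = sparse_activations(V_noisy, W, lam=lam)
    k_s = W_speech.shape[1]
    S_model = (W_speech @ H[:k_s]) ** p  # speech part of the model (power if p=2)
    N_model = (W_noise @ H[k_s:]) ** p   # noise part of the model
    mask = S_model / (S_model + N_model + EPS)  # gain in [0, 1)
    return mask * V_noisy, mask


def demo(seed=1):
    """Toy data: speech energy only in bins 0-9, noise only in bins 10-19."""
    rng = np.random.default_rng(seed)
    F, T = 20, 50
    Ws_true = np.zeros((F, 3))
    Ws_true[:10] = rng.random((10, 3)) + 0.1
    Wn_true = np.zeros((F, 2))
    Wn_true[10:] = rng.random((10, 2)) + 0.1
    S = Ws_true @ (rng.random((3, T)) + 0.1)  # synthetic "clean speech" magnitudes
    N = Wn_true @ (rng.random((2, T)) + 0.1)  # synthetic "noise" magnitudes
    W_s = train_dictionary(S, k=3, seed=0)    # pre-trained on clean speech only
    W_n = train_dictionary(N, k=2, seed=1)    # pre-trained on noise only
    S_hat, mask = snmf_wiener_enhance(S + N, W_s, W_n)  # assumes magnitudes add
    return S, S_hat, mask


if __name__ == "__main__":
    S, S_hat, mask = demo()
    err = np.linalg.norm(S_hat - S) / np.linalg.norm(S)
    print(f"relative error of the speech estimate: {err:.2e}")
```

The toy data is deliberately easy: speech and noise occupy disjoint frequency bins, so the gain is close to 1 on speech bins and exactly 0 on noise bins. Real spectrograms overlap, which is where the distortion the abstract mentions arises. In the full method, the log-magnitude of `S_hat` (e.g. `np.log(S_hat + EPS)`) would then feed a DNN trained to map it to the clean log-magnitude spectrum. That second stage is not included in this sketch.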
Publication history
  • Published: 2018-10-31
