    Citation: Huang Xiangdong, Chen Honghong, Gan Lin. Speech Enhancement Method Based on Frequency-Time Dilated Dense Network[J]. Journal of Computer Research and Development, 2023, 60(7): 1628-1638. DOI: 10.7544/issn1000-1239.202220259

    Speech Enhancement Method Based on Frequency-Time Dilated Dense Network

    • Abstract: Speech enhancement under noisy conditions is an important research direction in speech signal processing; it plays a key role in improving the quality of voice and video calls and in boosting the performance of human-computer interaction and speech recognition. To this end, we propose a speech enhancement network based on dilated convolution and dense connections, which improves the network's feature representation ability by learning contextual information along the frequency and time axes of the speech spectrogram. Specifically, the proposed structure integrates dilated convolution into the basic time-processing and frequency-processing units, ensuring a sufficiently large receptive field in both the frequency and time directions for extracting deep speech features. In addition, dense connections are applied to the cascade of these two basic units, which avoids the information loss caused by cascading multiple processing modules and thus improves the efficiency of feature utilization. Experimental results show that, compared with existing speech enhancement models, the proposed network achieves leading overall scores in perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and a series of subjective mean opinion scores. The generalization ability of the proposed network to various noisy conditions is also evaluated in the experiments.
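
The two ideas named in the abstract, dilated convolution for a large receptive field and dense connections for feature reuse, can be illustrated with a minimal pure-Python sketch along a single axis (time or frequency). This is not the authors' network: the kernel size of 3, the dilation rates 1, 2, 4, 8, the shared weights, and the function names `receptive_field`, `dilated_conv1d`, and `dense_dilated_stack` are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution with zero padding (odd kernel assumed)."""
    center = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dense_dilated_stack(x: Sequence[float], weights: Sequence[float],
                        dilations: Sequence[int]) -> List[List[float]]:
    """Toy dense block: every layer reads all earlier feature maps.

    Feature maps are kept in a list (a stand-in for channel concatenation),
    so no earlier output is discarded as the stack gets deeper.
    """
    features = [list(x)]
    for d in dilations:
        new = [0.0] * len(x)
        for ch in features:  # dense connection: reuse every earlier map
            y = dilated_conv1d(ch, weights, d)
            new = [a + b for a, b in zip(new, y)]
        features.append(new)
    return features


# Illustrative (assumed) hyperparameters, not taken from the paper.
dilations = [1, 2, 4, 8]
rf = receptive_field(3, dilations)  # 1 + 2 * (1 + 2 + 4 + 8) = 31

# A unit impulse shows how far one input sample spreads through the stack.
impulse = [0.0] * 41
impulse[20] = 1.0
features = dense_dilated_stack(impulse, [1.0, 1.0, 1.0], dilations)
span = sum(1 for v in features[-1] if v != 0.0)
```

With these assumed settings, four layers cover 31 positions, whereas four undilated layers of the same kernel size would cover only 9; the impulse response of the last feature map spans the same 31 positions, matching the computed receptive field.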

       
