    Citation: Kang Hongbo, Feng Yujia, Tai Wenxin, Lan Tian, Wu Zufeng, Liu Qiao. Monaural Speech Enhancement Based on Cross-Dimensional Collaborative Attention Mechanism[J]. Journal of Computer Research and Development, 2023, 60(7): 1639-1648. DOI: 10.7544/issn1000-1239.202220129

    Monaural Speech Enhancement Based on Cross-Dimensional Collaborative Attention Mechanism

    Abstract: Monaural speech enhancement aims to recover clean speech from complex noise scenes, thereby improving the quality of noise-corrupted speech signals. This problem has been studied for decades. In recent years, convolutional encoder-decoder neural networks have been widely used in speech enhancement tasks. Convolutional models capture the strong temporal correlations of speech and can extract important voiceprint features. However, two challenges remain. First, the skip connection mechanisms widely used in recent state-of-the-art methods introduce noise components when transmitting feature information, which inevitably degrades denoising performance. Second, the widely used standard fixed-shape convolution kernels are inefficient at handling diverse voiceprints because of their limited receptive field. To address these concerns, we propose CADNet, a novel end-to-end encoder-decoder network that incorporates a cross-dimensional collaborative attention mechanism and deformable convolution modules. Specifically, we insert cross-dimensional collaborative attention blocks into the skip connections to strengthen control over the information they transmit. In addition, we introduce a deformable convolution layer after each standard convolution layer to better match the natural characteristics of voiceprints. Experiments on the TIMIT open corpus verify the effectiveness of the proposed architecture in terms of objective speech intelligibility and quality metrics.
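    The abstract does not say how the attention blocks are computed, so the following is only a rough illustration of the general idea of gating a skip connection with per-dimension attention weights. It is not the CADNet design. The function names (`axis_gates`, `gated_skip`), the mean-pooling and sigmoid choices, and the channel x time x frequency layout are all assumptions made for this sketch.

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def axis_gates(x):
    """Compute one gate per channel, per time frame and per frequency bin.

    x is a nested list indexed as x[channel][time][frequency].
    Each gate is the sigmoid of the mean activation over the other two
    axes (an assumed, deliberately simple pooling choice).
    """
    C, T, F = len(x), len(x[0]), len(x[0][0])
    ch = [sigmoid(sum(x[c][t][f] for t in range(T) for f in range(F)) / (T * F))
          for c in range(C)]
    tm = [sigmoid(sum(x[c][t][f] for c in range(C) for f in range(F)) / (C * F))
          for t in range(T)]
    fr = [sigmoid(sum(x[c][t][f] for c in range(C) for t in range(T)) / (C * T))
          for f in range(F)]
    return ch, tm, fr


def gated_skip(encoder_feat, decoder_feat):
    """Gate encoder features, then concatenate them to decoder features.

    Every encoder value is scaled by the product of its channel, time and
    frequency gates, so the skip path passes on an attenuated copy rather
    than the raw features. Concatenation is along the channel axis.
    """
    ch, tm, fr = axis_gates(encoder_feat)
    gated = [
        [[encoder_feat[c][t][f] * ch[c] * tm[t] * fr[f]
          for f in range(len(fr))]
         for t in range(len(tm))]
        for c in range(len(ch))
    ]
    return decoder_feat + gated
```

    In a real network the gates would come from learned layers rather than fixed mean pooling; the sketch only shows where such a block sits relative to a plain skip connection.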

       
