Monaural Speech Enhancement Based on Cross-Dimensional Collaborative Attention Mechanism
Abstract
Monaural speech enhancement aims to recover clean speech from complex noise scenes, thereby improving the quality of noise-corrupted speech signals. This problem has been studied for decades. In recent years, convolutional encoder-decoder neural networks have been widely used in speech enhancement tasks. Convolutional models capture the strong temporal correlations of speech and can extract important voiceprint features. However, two challenges remain. First, the skip connections widely used in recent state-of-the-art methods pass noise components along with the transmitted feature information, which inevitably degrades denoising performance. Second, the standard fixed-shape convolution kernels in common use are inefficient at handling diverse voiceprints because of their limited receptive fields. To address these concerns, we propose CADNet, a novel end-to-end encoder-decoder network that incorporates a cross-dimensional collaborative attention mechanism and deformable convolution modules. Specifically, we insert cross-dimensional collaborative attention blocks into the skip connections to strengthen control over the transmitted speech information. In addition, we place a deformable convolution layer after each standard convolution layer to better match the natural characteristics of voiceprints. Experiments conducted on the open TIMIT corpus verify the effectiveness of the proposed architecture in terms of objective intelligibility and quality metrics.