• China Top-Quality Science and Technology Journal
  • CCF Class A recommended Chinese journal
  • Class T1 high-quality science and technology journal in computing
Huang Xiangdong, Chen Honghong, Gan Lin. Speech Enhancement Method Based on Frequency-Time Dilated Dense Network[J]. Journal of Computer Research and Development, 2023, 60(7): 1628-1638. DOI: 10.7544/issn1000-1239.202220259

Speech Enhancement Method Based on Frequency-Time Dilated Dense Network

Funds: This work was supported by the National Natural Science Foundation of China (62107029).
More Information
  • Author Bio:

    Huang Xiangdong: born in 1979. PhD, professor, PhD supervisor. His main research interests include super-resolution spectral analysis under sparse undersampling, signal processing, and speech enhancement

    Chen Honghong: born in 1998. Master. Her main research interests include speech enhancement and signal processing

    Gan Lin: born in 1985. PhD candidate, assistant professor. Her main research interest is auditory cognition

  • Received Date: March 30, 2022
  • Revised Date: July 07, 2022
  • Available Online: March 27, 2023
  • Abstract: Speech enhancement in noisy environments is an important research direction in speech signal processing, playing a key role in improving the quality of voice and video calls and in boosting the performance of human-computer interaction and speech recognition. We propose a network based on dilated convolution and dense connections, which effectively improves the feature expression ability of the network by learning the contextual information of the speech spectrogram along both the frequency and time directions. Specifically, the proposed structure integrates dilated convolution into the basic units for frequency and time processing, ensuring a receptive field large enough in both directions to extract deep speech features; at the same time, dense connections are applied to the cascade of these two basic units, avoiding the information loss caused by cascading multiple processing modules and thereby improving the efficiency of feature utilization. Experimental results show that the proposed speech enhancement network achieves high scores in PESQ, STOI, and a series of subjective mean opinion scores, showing overall superiority over existing speech enhancement networks. Its generalization ability across a variety of noisy conditions is also evaluated in these experiments.
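  The abstract's claim that stacked dilated convolutions secure a "large enough receptive field" along each direction can be made concrete with a short sketch. The kernel size of 3 and the exponentially increasing dilation rates below are illustrative assumptions, not parameters taken from the paper:

  ```python
  def receptive_field(kernel_size, dilations):
      """Receptive field of a stack of dilated 1-D convolutions.

      Each layer with kernel size k and dilation d widens the receptive
      field by (k - 1) * d positions (time frames or frequency bins).
      """
      return 1 + sum((kernel_size - 1) * d for d in dilations)

  # With dilations doubling per layer, the receptive field grows
  # exponentially with depth while the parameter count grows linearly:
  for depth in range(1, 6):
      dilations = [2 ** i for i in range(depth)]  # 1, 2, 4, ...
      print(depth, receptive_field(3, dilations))
  # prints: 1 3 / 2 7 / 3 15 / 4 31 / 5 63
  ```

  This is why a few dilated layers can cover long time contexts or wide frequency bands that ordinary convolutions of the same depth cannot.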

