ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (11): 2430-2438. doi: 10.7544/issn1000-1239.2018.20170580


Deep Neural Network Based Monaural Speech Enhancement with Sparse Non-Negative Matrix Factorization

Shi Wenhua1,2, Ni Yongjing3,4, Zhang Xiongwei1, Zou Xia1, Sun Meng1, Min Gang5   

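1 (College of Command Information Systems, Army Engineering University, Nanjing 210007); 2 (Flight Training Base, Aviation University of Air Force, Fuxin, Liaoning 123100); 3 (School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004); 4 (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018); 5 (College of Information and Communication, National University of Defense Technology, Xi'an 710106) (whshi0919@163.com)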
Online: 2018-11-01

Abstract: In this paper, a monaural speech enhancement method combining a deep neural network (DNN) with sparse non-negative matrix factorization (SNMF) is proposed. The method exploits the sparsity of speech in the time-frequency (T-F) domain and the ability of DNNs to preserve spectral structure in speech enhancement. It aims to reduce the distortion that conventional non-negative matrix factorization (NMF) introduces at low signal-to-noise ratios (SNRs) and for unvoiced components, which lack clear spectral structure. First, the magnitude spectrogram of the noisy speech is decomposed by NMF with a sparsity constraint to obtain the coding (activation) matrices corresponding to the speech and noise dictionaries, which are pre-trained independently. Wiener filtering is then applied to obtain the separated speech and noise. Finally, a DNN is employed to model the non-linear function that maps the log magnitude spectrum of the Wiener-filtered speech to that of the target clean speech. Evaluations are conducted on the IEEE dataset, with both stationary and non-stationary noise types selected to demonstrate the effectiveness of the proposed method. The experimental results show that the proposed method effectively suppresses noise while preserving the speech components of the corrupted signal, and that it outperforms the baseline methods in terms of perceptual quality and log-spectral distortion.
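The SNMF decomposition and Wiener-filtering steps described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a squared-Euclidean reconstruction cost with an L1 sparsity penalty on the activations (the paper's exact divergence and penalty weight are not given here), fixed pre-trained dictionaries `W_speech` and `W_noise`, and a power-spectrum Wiener mask. The function names `snmf_activations`, `separate`, and `dnn_input_features` and the parameter `lam` are hypothetical. The DNN mapping stage itself is not implemented; `dnn_input_features` only produces the log-magnitude input such a network would consume.

```python
import numpy as np


def snmf_activations(V, W, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H for a FIXED dictionary W.

    Assumed cost: 0.5 * ||V - W H||_F^2 + lam * sum(H).
    Only H is updated; W stands in for the pre-trained dictionaries.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V  # constant because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update keeps H non-negative; lam in the
        # denominator shrinks small activations toward zero (sparsity).
        H *= WtV / (WtW @ H + lam + eps)
    return H


def separate(V, W_speech, W_noise, lam=0.1, n_iter=200, eps=1e-12):
    """Split a noisy magnitude spectrogram V (freq x frames) into speech and noise."""
    W = np.hstack([W_speech, W_noise])
    # Unit-norm columns so the L1 penalty treats every atom alike (assumption).
    W = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    H = snmf_activations(V, W, lam=lam, n_iter=n_iter, eps=eps)
    k = W_speech.shape[1]
    S_hat = W[:, :k] @ H[:k]  # speech part of the reconstruction
    N_hat = W[:, k:] @ H[k:]  # noise part of the reconstruction
    # Wiener-style soft mask built from the estimated power spectra.
    mask = S_hat**2 / (S_hat**2 + N_hat**2 + eps)
    return mask * V, (1.0 - mask) * V


def dnn_input_features(speech_mag, eps=1e-8):
    """Log-magnitude spectrum of the separated speech (DNN input, not trained here)."""
    return np.log(speech_mag + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F, T, K = 64, 50, 8
    # Random placeholders for dictionaries trained on clean speech and noise.
    Ws = rng.random((F, K))
    Wn = rng.random((F, K))
    V = Ws @ rng.random((K, T)) + Wn @ rng.random((K, T))
    speech, noise = separate(V, Ws, Wn)
    feats = dnn_input_features(speech)
    print(speech.shape, feats.shape)  # prints (64, 50) (64, 50)
```

Because the speech mask and its complement sum to one, the separated speech and noise magnitudes add back to the noisy spectrogram; the dictionaries and activations only decide how each T-F bin is split. A DNN trained on pairs of such log-magnitude features and clean-speech targets would then refine the Wiener-filtered estimate.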

Key words: deep neural network (DNN), dictionary learning, non-negative matrix factorization, speech enhancement, sparse constraints
