Abstract:
Language identification (LID) accuracy is often significantly reduced when the duration of the test data and the training data are mismatched. This paper proposes a method to compensate language features using a denoising autoencoder (DAE). Use of denoising autoencoder-based language feature compensation can map language features from variable length utterances into a fixed length representation. Therefore the problem of length mismatch and unbalanced phoneme distribution can be mitigated. The algorithm first converts the speech signal to low level acoustic features by framing and transforming, and then estimates its i-vector and phonetic vector. These two vectors are then concatenated and fed into the DAE-based language feature compensation processing unit. The compensated i-vector from the output of the DAE, and the original i-vector, are presented to the back-end classifier to obtain two score vectors. These two score vectors are finally fused at a score level to obtain a final result. Tests on NIST-LRE07 demonstrate that this feature compensation method improves identification performance over various test speech durations. Compared with traditional LID systems, the performance for 30 s test utterances improves by 3.16%, while the performance for 10 s test utterances improves by 2.90%. Compared with the end-to-end LID system, the performance on 3 s test utterances is increased by 3.21%.