Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks

Wu Zhiyong and Cai Lianhong

Wu Zhiyong and Cai Lianhong. Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks[J]. Journal of Computer Research and Development, 2006, 43(3): 470-475.

Abstract

This paper studies the use of dynamic Bayesian networks (DBNs) for text-prompted audio-visual bimodal speaker identification. The task is to determine a speaker's identity from a temporal sequence of audio and visual observations, obtained from the acoustic speech and the shape of the mouth, respectively. Following the hierarchical structure of audio-visual bimodal modeling, a new DBN is constructed to describe the natural asynchrony between audio and visual states as well as their conditional dependency over time. Experimental results show that DBNs are a powerful and flexible methodology for representing and modeling audio-visual correlations, and that the proposed DBN improves on the accuracy of audio-only speaker identification at all acoustic signal-to-noise ratio (SNR) levels from 0 to 30 dB.
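The abstract does not specify the topology of the proposed DBN, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a coupled hidden Markov model, one of the simplest DBNs in which the audio and visual state chains may drift apart while each chain's next state depends on both previous states. It also assumes discrete (quantized) observations and frame-aligned streams. All names (`chmm_log_likelihood`, `make_toy_model`, `identify_speaker`) and all probabilities are hypothetical toy values.

```python
import math
from itertools import product


def chmm_log_likelihood(model, audio_obs, visual_obs):
    """Log-likelihood of paired observations under a toy coupled HMM."""
    assert len(audio_obs) == len(visual_obs), "assumes frame-aligned streams"
    n_a, n_v = len(model["pi_a"]), len(model["pi_v"])
    # The joint hidden state is an (audio_state, visual_state) pair.
    states = list(product(range(n_a), range(n_v)))
    alpha, log_lik = {}, 0.0
    for t, (o_a, o_v) in enumerate(zip(audio_obs, visual_obs)):
        new_alpha = {}
        for a, v in states:
            if t == 0:
                prior = model["pi_a"][a] * model["pi_v"][v]
            else:
                # Each chain's next state is conditioned on BOTH previous states.
                prior = sum(
                    alpha[pa, pv]
                    * model["trans_a"][pa][pv][a]
                    * model["trans_v"][pa][pv][v]
                    for pa, pv in states
                )
            emit = model["emit_a"][a][o_a] * model["emit_v"][v][o_v]
            new_alpha[a, v] = prior * emit
        # Scaled forward recursion: accumulate log of per-frame normalizers.
        scale = sum(new_alpha.values())
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = {s: p / scale for s, p in new_alpha.items()}
    return log_lik


def make_toy_model(emit):
    """Hypothetical 2x2-state model with uniform priors and transitions."""
    uniform = [0.5, 0.5]
    trans = [[uniform, uniform], [uniform, uniform]]  # trans[pa][pv][next]
    return {
        "pi_a": uniform, "pi_v": uniform,
        "trans_a": trans, "trans_v": trans,
        "emit_a": emit, "emit_v": emit,
    }


def identify_speaker(models, audio_obs, visual_obs):
    """Pick the enrolled speaker whose model best explains the observations."""
    return max(
        models,
        key=lambda name: chmm_log_likelihood(models[name], audio_obs, visual_obs),
    )


# Toy enrollment: speaker "A" mostly emits symbol 0, speaker "B" symbol 1.
models = {
    "A": make_toy_model([[0.9, 0.1], [0.8, 0.2]]),
    "B": make_toy_model([[0.1, 0.9], [0.2, 0.8]]),
}
```

Calling `identify_speaker(models, [0, 0, 0, 0], [0, 0, 0, 0])` selects the speaker model with the highest joint likelihood. A real system would replace the discrete emissions with continuous acoustic and lip-shape feature densities and would learn the parameters from enrollment data.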
