    Wu Zhiyong and Cai Lianhong. Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks[J]. Journal of Computer Research and Development, 2006, 43(3): 470-475.

    Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks

    This paper studies the use of dynamic Bayesian networks (DBNs) for text-prompted audio-visual bimodal speaker identification. The task is to determine the identity of a speaker from a temporal sequence of audio and visual observations, obtained from the acoustic speech signal and the shape of the mouth, respectively. Based on the hierarchical structure of audio-visual bimodal modeling, a new DBN is constructed to describe the natural asynchrony between audio and visual states as well as their conditional dependency over time. The experimental results show that DBNs are a powerful and flexible methodology for representing and modeling audio-visual correlations, and that the proposed DBN improves on the accuracy of audio-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 0 to 30 dB.
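The identification task described in the abstract can be illustrated with a small sketch. This is not the authors' DBN: it is a simplified two-chain (coupled) model, assuming scalar audio and visual features, Gaussian emissions, left-to-right chains, and a hypothetical `max_lag` parameter that bounds how far the audio and visual states may drift apart. All function names, model fields, and toy values are assumptions made for illustration. Each enrolled speaker has one model, and the identified speaker is the one whose model scores the observed sequence highest.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def gauss_logpdf(x, mean, var):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def coupled_score(audio, visual, model, max_lag=1):
    """Forward-pass score of a two-stream sequence under one speaker model.

    The hidden state is a pair (a, v) of audio and visual state indices.
    Pairs with |a - v| > max_lag are pruned, which models bounded asynchrony.
    Transition weights are not renormalized after pruning, so the result is
    a comparison score rather than an exact log-likelihood.
    """
    n = model["n_states"]
    states = [(a, v) for a, v in product(range(n), repeat=2)
              if abs(a - v) <= max_lag]

    def emit(t, a, v):
        return (gauss_logpdf(audio[t], *model["audio"][a])
                + gauss_logpdf(visual[t], *model["visual"][v]))

    log_stay = math.log(model["stay"])
    log_move = math.log(1.0 - model["stay"])

    # Both chains are assumed to start in their first state.
    alpha = {s: (emit(0, *s) if s == (0, 0) else -math.inf) for s in states}

    for t in range(1, len(audio)):
        new_alpha = {}
        for (a, v) in states:
            terms = []
            # Each chain either stays (0) or advances by one state (1).
            for da, dv in product((0, 1), repeat=2):
                prev = (a - da, v - dv)
                if prev in alpha:
                    lp = ((log_move if da else log_stay)
                          + (log_move if dv else log_stay))
                    terms.append(alpha[prev] + lp)
            new_alpha[(a, v)] = logsumexp(terms) + emit(t, a, v)
        alpha = new_alpha

    return logsumexp(list(alpha.values()))


def identify(audio, visual, speaker_models, max_lag=1):
    """Return the best-scoring speaker name and the per-speaker scores."""
    scores = {name: coupled_score(audio, visual, m, max_lag)
              for name, m in speaker_models.items()}
    return max(scores, key=scores.get), scores


# Toy models (hypothetical values): each state is (mean, variance).
models = {
    "spk_A": {"n_states": 2, "stay": 0.6,
              "audio": [(0.0, 1.0), (1.0, 1.0)],
              "visual": [(0.0, 1.0), (1.0, 1.0)]},
    "spk_B": {"n_states": 2, "stay": 0.6,
              "audio": [(3.0, 1.0), (4.0, 1.0)],
              "visual": [(3.0, 1.0), (4.0, 1.0)]},
}
audio = [0.1, 0.2, 0.9, 1.1]
visual = [0.0, 0.1, 0.3, 1.0]
best, scores = identify(audio, visual, models)
```

Setting `max_lag=0` forces the two streams to share one state index, which reduces the sketch to a synchronous two-stream model; larger values allow the visual state to lead or trail the audio state.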
