ISSN 1000-1239 CN 11-1777/TP

• Paper •

Automatic Estimation of Visual Speech Parameters

Wang Zhiming1, Cai Lianhong2, and Ai Haizhou2   

  1 (Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083)
  2 (Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
  • Online:2005-07-15

Abstract: Visual speech parameter estimation plays an important role in the study of visual speech. In this paper, 24 speech-correlated parameters are selected from the MPEG-4 facial animation parameters (FAP) to describe visual speech. By combining a statistical learning method with a rule-based method, precise tracking results are obtained for the mouth contour and facial feature points, based on the facial color probability distribution and a priori knowledge of shape and edges. High-frequency noise in reference-point tracking is eliminated by a low-pass filter, and the head pose is estimated from the four most salient reference points to remove the overall movement of the face. Finally, precise visual speech parameters are computed from the movements of these facial feature points; these parameters have already been used in several related applications.
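The low-pass filtering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simple centered moving-average filter (a hypothetical choice of low-pass filter and window size) applied to the tracked coordinate sequence of a single feature point.

```python
# Hypothetical sketch of the noise-removal step: suppress high-frequency
# jitter in a tracked feature-point coordinate sequence with a centered
# moving-average low-pass filter. Window size is illustrative only.

def lowpass_filter(trajectory, window=5):
    """Smooth a 1-D coordinate sequence with a centered moving average.

    Near the sequence boundaries the window is truncated, so the output
    has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(trajectory)):
        lo = max(0, i - half)
        hi = min(len(trajectory), i + half + 1)
        smoothed.append(sum(trajectory[lo:hi]) / (hi - lo))
    return smoothed

# Example: a noisy x-coordinate track of one mouth-contour point
# across seven video frames (synthetic illustrative data).
noisy = [10.0, 10.4, 9.8, 10.1, 10.5, 9.9, 10.2]
smooth = lowpass_filter(noisy, window=3)
```

In practice a frequency-domain filter with an explicit cutoff (e.g. a Butterworth design) would give finer control over which temporal frequencies are treated as noise; the moving average above is only the simplest instance of the idea.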

Key words: visual speech, facial animation parameter (FAP), Gaussian mixture model (GMM), deformable template