Visual speech parameter estimation plays an important role in the study of visual speech. In this paper, 24 speech-correlated parameters are selected from the MPEG-4 facial animation parameters (FAPs) to describe visual speech. By combining statistical learning with rule-based methods, accurate tracking of the mouth contour and facial feature points is achieved, based on the facial color probability distribution and prior knowledge of shape and edges. High-frequency noise in the tracked reference points is removed with a low-pass filter, and the dominant head pose is estimated from the four most salient reference points in order to factor out the global motion of the face. Finally, precise visual speech parameters are computed from the movements of these facial feature points; these parameters have already been used in several related applications.
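To illustrate the noise-removal step described above, the sketch below applies a simple symmetric moving-average low-pass filter to a tracked reference-point coordinate. This is a minimal illustration, not the paper's implementation: the window size, the synthetic jitter, and the function name `low_pass` are all assumptions made for the example.

```python
# Illustrative sketch (assumed, not the authors' filter): suppressing
# high-frequency jitter in a tracked reference-point trajectory with a
# symmetric moving-average low-pass filter.

def low_pass(track, window=5):
    """Smooth a 1-D sequence of coordinates with a moving average.

    The window is truncated near the ends of the sequence so the
    output has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        smoothed.append(sum(track[lo:hi]) / (hi - lo))
    return smoothed

# Synthetic example: a slowly drifting x-coordinate with alternating
# one-pixel tracker jitter (both assumptions for illustration).
noisy_x = [100 + 0.5 * t + (1 if t % 2 else -1) for t in range(20)]
smooth_x = low_pass(noisy_x)
```

In practice a frequency-domain design (e.g. a Butterworth filter) could serve the same purpose; the moving average is used here only to keep the sketch self-contained.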