Abstract:
Visual information is important for understanding speech. Not only hearing-impaired people but also people with normal hearing make use of the visual information that accompanies speech, especially when the acoustic signal is degraded by a noisy environment. Just as text-to-speech (TTS) synthesis enables computers to speak like humans, text-to-visual speech (TTVS) synthesis through computer face animation can bring the bimodality of speech into human-computer interfaces and make them more user-friendly. This paper reviews the state of the art in text-to-visual speech synthesis research. Two classes of approaches have been developed for visual speech synthesis: the parameter control approach and the data-driven approach. For the parameter control approach, three key problems are discussed: face model construction, the definition of animation control parameters, and the dynamic properties of the control parameters. For the data-driven approach, three main methods are introduced: video slice concatenation, key frame morphing, and face component combination. Finally, the advantages and disadvantages of each approach are discussed.