Abstract:
In recent years, rapid advances in deep learning and multimodal information fusion have significantly propelled the development of 2D virtual human driving technologies. Existing methods can generate highly realistic facial expressions and body movements for a specified character from diverse inputs such as video, audio, facial cues, and poses, greatly facilitating applications in virtual entertainment, online education, and intelligent interaction. This survey systematically reviews the current state and evolution of deep learning-based 2D virtual human driving methods. It outlines the underlying principles and the fundamental models commonly employed in this domain, then categorizes driving techniques by both the target driving component (e.g., face, body) and the corresponding architectural framework, with a focused comparison and detailed discussion of prominent methods from recent years. The state of research on holistic driving systems and their development prospects are also presented. Furthermore, the survey examines the critical technical challenges currently facing the field, including generation realism, dataset quality, and the real-time performance of models. The aim of this survey is to provide researchers in related fields with a comprehensive technological overview and perspective, thereby contributing to the further advancement of this domain.