Research Progress of Video Question Answering Technologies

Bao Cuizhu; Ding Kai; Dong Jianfeng; Yang Xun; Xie Mande; Wang Xun

doi:10.7544/issn1000-1239.202220294

Bao Cuizhu, Ding Kai, Dong Jianfeng, Yang Xun, Xie Mande, Wang Xun. Research Progress of Video Question Answering Technologies[J]. Journal of Computer Research and Development, 2024, 61(3): 639-673. DOI: 10.7544/issn1000-1239.202220294

Abstract

Video question answering (VideoQA), which automatically answers natural language questions according to the content of videos, is a relatively new research direction in the vision-and-language field and has attracted extensive attention in recent years. Solving the VideoQA task is of great significance for human-computer interaction, intelligent education, intelligent transportation, scenario analysis, video retrieval, and other fields. VideoQA is challenging because a model must understand the semantic information of both the video and the question in order to generate the answer. In this work, we analyze the differences between VideoQA and image question answering (ImageQA) and summarize four challenges that VideoQA faces relative to ImageQA. We then classify existing VideoQA models according to the research methods they adopt to address these challenges. Following this classification, we introduce the background from which each class of methods emerged, focusing on how the models are implemented and how different models relate to one another. Next, we summarize the benchmark datasets commonly used in VideoQA, present in detail the performance of current mainstream algorithms on some of these datasets, and compare and analyze the results. Finally, we discuss future challenges and research trends in the field, with the aim of providing ideas for further research.
