ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue 9: 1946-1958. doi: 10.7544/issn1000-1239.2018.20180168

Special Issue: 2018 Special Topic on Excellent Young Scholars


Research on Visual Question Answering Techniques

Yu Jun, Wang Liang, Yu Zhou   

  1. (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018) (Key Laboratory of Complex System Modeling and Simulation (Hangzhou Dianzi University), Ministry of Education, Hangzhou 310018)
  • Online: 2018-09-01

Abstract: With the significant advances of deep learning in computer vision and natural language processing, existing methods can accurately understand the semantics of visual content and natural language, which has enabled research on cross-media data representation and interaction. In recent years, visual question answering (VQA) has become an active research topic in cross-media representation and interaction. The goal of VQA is to learn a model that understands the visual content referred to by a natural language question and answers the question automatically. This paper summarizes recent research progress on VQA in terms of concepts, models, and datasets, and discusses the shortcomings of current work. Finally, possible future directions of VQA are discussed with respect to methodology, applications, and platforms.

Key words: visual question answering (VQA), visual reasoning, video question answering, deep learning, knowledge network
