
Research on Visual Question Answering Techniques

Yu Jun, Wang Liang, Yu Zhou

Citation: Yu Jun, Wang Liang, Yu Zhou. Research on Visual Question Answering Techniques[J]. Journal of Computer Research and Development, 2018, 55(9): 1946-1958. DOI: 10.7544/issn1000-1239.2018.20180168. CSTR: 32373.14.issn1000-1239.2018.20180168


Funding: This work was supported by the National Natural Science Foundation of China for Excellent Young Scientists (61622205).
  • Chinese Library Classification (CLC) number: TP391


  • Abstract: With the significant advances of deep learning in computer vision and natural language processing, existing methods can accurately understand the semantics of visual content and natural language, which has made research on cross-media data representation and interaction possible. In recent years, visual question answering (VQA) has become a research hot spot in cross-media representation and interaction. VQA aims to enable a computer to understand the content of an image and then automatically answer a query about it posed in natural language. This paper reviews recent research progress on VQA from the perspectives of concepts, models, and datasets, and discusses the shortcomings of existing work. Finally, it outlines possible future research directions for VQA in terms of methodology, applications, and platforms.
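The abstract frames VQA as answering a natural-language question about an image. One common way to make that task concrete, assumed here purely for illustration and not presented as this paper's method, is to encode the image and the question as fixed-length vectors, fuse them, and classify over a fixed set of candidate answers. In the minimal sketch below, the feature vectors and classifier weights are hypothetical hand-written values standing in for trained image and question encoders, fusion is a simple element-wise (Hadamard) product, and the function names `fuse`, `answer_scores`, and `answer_question` are illustrative only.

```python
from typing import List, Sequence

Vector = List[float]


def fuse(image_feat: Sequence[float], question_feat: Sequence[float]) -> Vector:
    """Combine the two modalities with an element-wise (Hadamard) product.

    This is one simple fusion choice assumed for illustration; many other
    fusion schemes exist.
    """
    if len(image_feat) != len(question_feat):
        raise ValueError("image and question features must have the same dimension")
    return [v * q for v, q in zip(image_feat, question_feat)]


def answer_scores(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Vector:
    """Linear classifier: one score per candidate answer (row of `weights`)."""
    return [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]


def answer_question(
    image_feat: Sequence[float],
    question_feat: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    answers: Sequence[str],
) -> str:
    """Return the candidate answer with the highest score."""
    if len(weights) != len(answers) or len(bias) != len(answers):
        raise ValueError("need one weight row and one bias per candidate answer")
    scores = answer_scores(fuse(image_feat, question_feat), weights, bias)
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best]


if __name__ == "__main__":
    answers = ["yes", "no", "two"]   # hypothetical answer vocabulary
    image_feat = [1.0, 0.0, 2.0]     # hypothetical image features
    question_feat = [0.5, 1.0, 1.0]  # hypothetical question encoding
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    bias = [0.0, 0.0, 0.0]
    print(answer_question(image_feat, question_feat, weights, bias, answers))  # prints "two"
```

Here the fused vector is `[0.5, 0.0, 2.0]`, so the third candidate scores highest. Treating answering as classification over a fixed answer set keeps the sketch small; in a real system the encoders and classifier weights would be learned from a VQA dataset rather than written by hand.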
Publication history
  • Published: 2018-08-31
