ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development), 2016, Vol. 53, Issue (1): 113-122. doi: 10.7544/issn1000-1239.2016.20150689

Special topic: 2016 Excellent Young Scientists (优青) special topic

• Graphics and Images •



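  1. (Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
  • Published: 2016-01-01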

Survey and Prospect of Intelligent Interaction-Oriented Image Recognition Techniques

Jiang Shuqiang, Min Weiqing, Wang Shuhui   

  1. (Key Laboratory of Intelligent Information Processing (Institute of Computing Technology, Chinese Academy of Sciences), Chinese Academy of Sciences, Beijing 100190)
  • Online: 2016-01-01

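Abstract (Chinese version): Vision plays a very important role in interaction between people and in interaction between people and the natural world. Giving terminal devices intelligent visual recognition and interaction capabilities is one of the core challenges and long-term goals of artificial intelligence and computer technology. In recent years, visual recognition technology has developed rapidly: new techniques keep emerging, new research problems keep being posed, and applications for intelligent interaction show new trends that continually reshape the established understanding of the field. This paper surveys image recognition techniques from three perspectives: visual recognition, visual description, and visual question answering. It introduces deep-learning-based image recognition and scene classification in detail, analyzes and discusses the latest techniques in visual description and question answering, introduces visual recognition and interaction applications for mobile terminals and robots, and finally analyzes future research trends in the field.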

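Keywords (Chinese version): image recognition, intelligent visual recognition, intelligent interaction, visual description, visual question answering, deep learning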

Abstract: Vision plays an important role in both human-human interaction and human-nature interaction. Equipping terminal devices with intelligent visual recognition and interaction capabilities is one of the core challenges, and one of the long-term goals, of artificial intelligence and computer technology. Visual recognition techniques have developed rapidly in recent years, and new techniques and research problems continue to emerge. Correspondingly, applications of intelligent interaction show new characteristics that are changing the established understanding of visual recognition and interaction. We survey image recognition techniques, covering recent advances in visual recognition, visual description, and visual question answering (VQA). Specifically, we first focus on deep learning approaches to image recognition and scene classification. Next, we analyze and discuss the latest techniques in visual description and VQA. We then introduce visual recognition and interaction applications on mobile devices and robots. Finally, we discuss future research directions in this field.

Key words: image recognition, intelligent visual recognition, intelligent interaction, visual description, visual question answering (VQA), deep learning