ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2019, Vol. 56 ›› Issue (1): 183-208. doi: 10.7544/issn1000-1239.2019.20180770

• Survey •

多媒体内容理解的研究现状与展望

彭宇新,綦金玮,黄鑫   

  1. (Institute of Computer Science and Technology, Peking University, Beijing 100871) (pengyuxin@pku.edu.cn)
  • Published: 2019-01-01
  • Funding: 
    National Natural Science Foundation of China (61771025, 61532005)

Current Research Status and Prospects on Multimedia Content Understanding

Peng Yuxin, Qi Jinwei, Huang Xin   

  1. (Institute of Computer Science and Technology, Peking University, Beijing 100871)
  • Online: 2019-01-01

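Abstract: With the rapid development of multimedia and network technologies, massive amounts of multimedia data such as images, video, text and audio are emerging rapidly. These data of different media types are multi-source and heterogeneous in form, yet semantically interrelated. Research in cognitive science shows that the physiological structure of the human brain determines that its perception and cognition of the outside world fuse information across multiple senses. How to perform semantic analysis and correlation modeling on data of different media types, so as to achieve multimedia content understanding, has become a key problem in research and applications and has received wide attention from academia and industry. This paper selects five recent hot research directions in multimedia content understanding: fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, and visual question answering. It describes their basic concepts, representative methods and research status, discusses the important challenges facing multimedia content understanding, and presents future development trends. It aims to help readers gain a comprehensive view of the research status of multimedia content understanding, to attract more researchers to related work and provide them with technical references, and to promote the further development of the field.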

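Key words: multimedia content understanding, fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, visual question answering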

Abstract: With the rapid development of multimedia and Internet technologies, massive amounts of multimedia data, such as images, video, text and audio, are emerging rapidly. These data of different media types come from multiple sources and are heterogeneous in form but semantically correlated. Research in cognitive science indicates that the structure of the human brain determines that perception and cognition of the environment fuse information from multiple senses. How to perform semantic analysis and correlation modeling across different media types to achieve multimedia content understanding has therefore become a key problem in both research and applications, and has attracted wide attention from academia and industry. This paper reviews five recent hot research topics in multimedia content understanding: fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, and visual question answering. For each topic, it presents the basic concepts, representative methods and research status. It then discusses the major challenges facing multimedia content understanding and outlines future development trends. The goal is to help readers gain a comprehensive understanding of the research status of multimedia content understanding, attract more researchers to the relevant topics, and provide technical references that promote the further development of this area.

Key words: multimedia content understanding, fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, visual question answering

CLC number: