Abstract:
With the rapid development of multimedia and Internet technologies, large amounts of multimedia data, such as images, video, text and audio, have been emerging rapidly. Data of different media types from multiple sources are heterogeneous in form but related in semantics. Research in cognitive science indicates that humans perceive and understand their environment by fusing information across different sensory organs, a process determined by the organizational structure of the human brain. Performing semantic analysis and correlation modeling across different media types, in order to achieve comprehensive multimedia content understanding, has therefore become a key challenge that has drawn wide interest from both academia and industry. This paper reviews the basic concepts, representative methods and research status of five recent, prominent research topics in multimedia content understanding: fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, and visual question answering. It further presents the major challenges of multimedia content understanding and outlines future development trends. The goal of this paper is to help readers gain a comprehensive understanding of the research status of multimedia content understanding, draw more attention from researchers to the relevant research topics, and provide technical insights to promote further development of this area.