
Research Progress on Image Captioning

Li Zhixin, Wei Haiyang, Zhang Canlong, Ma Huifang, Shi Zhongzhi

Citation: Li Zhixin, Wei Haiyang, Zhang Canlong, Ma Huifang, Shi Zhongzhi. Research Progress on Image Captioning[J]. Journal of Computer Research and Development, 2021, 58(9): 1951-1974. DOI: 10.7544/issn1000-1239.2021.20200281. CSTR: 32373.14.issn1000-1239.2021.20200281


  • CLC number: TP391


Funds: This work was supported by the National Natural Science Foundation of China (61966004, 61663004, 61866004, 61762078) and the Guangxi Natural Science Foundation (2019GXNSFDA245018, 2018GXNSFDA281009, 2017GXNSFAA198365).
  • Abstract: Image captioning combines two research fields, computer vision and natural language processing. It requires both thorough semantic understanding of images and complex natural language expression, which makes it a key task for further research on visual intelligence consistent with human perception. This paper reviews research progress on image captioning. First, it summarizes and analyzes five key technologies in current deep learning based image captioning methods: overall architecture, learning strategy, feature mapping, language model, and attention mechanism. Then, following the field's development, it divides existing methods into four categories: template based methods, retrieval based methods, encoder-decoder architecture based methods, and compositional architecture based methods. For each category it describes the basic concepts, representative methods, and research status, with a focus on the encoder-decoder based methods and their innovations, such as multimodal space, visual space, semantic space, attention mechanism, and model optimization. Next, from an experimental perspective, it presents the commonly used datasets and evaluation measures and compares the performance of some typical methods on two benchmark datasets. Finally, it outlines future development trends of image captioning, framed around improving the accuracy, completeness, novelty, and diversity of generated captions.
Publication history
  • Published: 2021-08-31
