    Li Xu, Zhu Rui, Chen Xiaolei, Wu Jinxuan, Zheng Yi, Lai Chenghang, Liang Yuxuan, Li Bin, Xue Xiangyang. A Survey of Hallucinations in Large Vision-Language Models: Causes, Evaluations and Mitigations[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440444

    A Survey of Hallucinations in Large Vision-Language Models: Causes, Evaluations and Mitigations

    LVLMs (Large Vision-Language Models) represent a significant advancement at the intersection of natural language processing and computer vision. By integrating pre-trained visual encoders, vision-language adapters, and large language models, LVLMs can understand both visual and textual information and generate responses in natural language, making them suitable for a range of downstream vision-language tasks such as image captioning and visual question answering. However, these models commonly exhibit hallucinations, generating content that is inconsistent with the actual image. Such hallucinations severely limit the application of LVLMs in high-stakes domains such as medical image diagnosis and autonomous driving. This survey systematically organizes and analyzes the causes, evaluations, and mitigation strategies of hallucinations in order to guide research in the field and enhance the safety and reliability of LVLMs in practical applications. It begins with the basic concepts of LVLMs and the definition and classification of hallucinations within them. It then explores the causes of hallucinations from four perspectives: training data, training tasks, visual encoding, and text generation, and discusses the interactions among these factors. Following this, it reviews mainstream benchmarks for assessing LVLM hallucinations in terms of task settings, data construction, and assessment metrics. It also examines hallucination mitigation techniques across five aspects: training data, visual perception, training strategies, model inference, and post-hoc correction. Finally, the survey outlines directions for future research on the cause analysis, evaluation, and mitigation of hallucinations in LVLMs.
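    To make the notion of an object-hallucination assessment metric concrete, the sketch below computes a CHAIR-style hallucination rate: the fraction of objects mentioned in generated captions that do not appear in the image's ground-truth annotations. This is an illustrative approximation, not the exact procedure of any benchmark discussed in the survey; the object vocabulary, synonym map, and example captions/annotations are assumptions.

    ```python
    # Minimal sketch of a CHAIR-style object-hallucination metric (illustrative only;
    # the vocabulary, synonym map, and example data below are assumptions, not the
    # definitions used by any specific benchmark surveyed in the paper).

    # Map surface words to canonical object names.
    SYNONYMS = {
        "dog": "dog", "puppy": "dog",
        "cat": "cat", "kitten": "cat",
        "bicycle": "bicycle", "bike": "bicycle",
        "person": "person", "man": "person", "woman": "person",
    }

    def mentioned_objects(caption: str) -> set[str]:
        """Extract canonical object names mentioned in a generated caption."""
        words = caption.lower().replace(",", " ").replace(".", " ").split()
        return {SYNONYMS[w] for w in words if w in SYNONYMS}

    def hallucination_rate(captions: list[str], ground_truth: list[set[str]]) -> float:
        """Fraction of mentioned object instances absent from the image annotations
        (higher means more hallucination)."""
        hallucinated, mentioned = 0, 0
        for caption, gt_objects in zip(captions, ground_truth):
            for obj in mentioned_objects(caption):
                mentioned += 1
                if obj not in gt_objects:
                    hallucinated += 1
        return hallucinated / mentioned if mentioned else 0.0

    if __name__ == "__main__":
        # Hypothetical model outputs and annotations for two images.
        captions = ["A man walks a dog past a bike.", "A kitten sleeps on a sofa."]
        ground_truth = [{"person", "dog"}, {"cat"}]
        print(f"Hallucination rate: {hallucination_rate(captions, ground_truth):.2f}")
    ```

    A metric of this kind only captures object-existence hallucinations; the attribute, relation, and factual hallucinations distinguished in the survey require richer annotations or model-based judges.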
