    Li Xu, Zhu Rui, Chen Xiaolei, Wu Jinxuan, Zheng Yi, Lai Chenghang, Liang Yuxuan, Li Bin, Xue Xiangyang. A Survey of Hallucinations in Large Vision-Language Models: Causes, Evaluations and Mitigations[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440444

    A Survey of Hallucinations in Large Vision-Language Models: Causes, Evaluations and Mitigations

    LVLMs (Large Vision-Language Models) represent a significant advancement at the intersection of natural language processing and computer vision. By integrating pre-trained visual encoders, vision-language adapters, and large language models, LVLMs can understand both visual and textual information and generate responses in natural language, making them suitable for a range of downstream vision-language tasks such as image captioning and visual question answering. However, these models commonly exhibit hallucinations, generating content that is inconsistent with the actual image. Such hallucinations severely limit the application of LVLMs in high-stakes domains such as medical image diagnosis and autonomous driving. This survey systematically organizes and analyzes the causes, evaluations, and mitigation strategies of hallucinations in order to guide research in the field and enhance the safety and reliability of LVLMs in practical applications. It begins with the basic concepts of LVLMs and the definition and classification of hallucinations within them. It then explores the causes of hallucinations from four perspectives: training data, training tasks, visual encoding, and text generation, and discusses the interactions among these factors. Following this, it reviews mainstream benchmarks for assessing LVLM hallucinations in terms of task settings, data construction, and assessment metrics. It also examines hallucination mitigation techniques across five aspects: training data, visual perception, training strategies, model inference, and post-hoc correction. Finally, the survey outlines directions for future research on the cause analysis, evaluation, and mitigation of hallucinations in LVLMs.
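    To make the notion of an object-hallucination assessment metric concrete, the sketch below computes a CHAIR-style hallucination rate: the fraction of objects mentioned in generated captions that do not appear in the image's ground-truth annotations. This is an illustrative approximation, not the exact procedure of any benchmark discussed in the survey; the object vocabulary, synonym map, and example captions/annotations are assumptions.

    ```python
    # Minimal sketch of a CHAIR-style object-hallucination metric (illustrative only;
    # the vocabulary, synonym map, and example data below are assumptions, not the
    # definitions used by any specific benchmark surveyed in the paper).

    # Map surface words to canonical object names.
    SYNONYMS = {
        "dog": "dog", "puppy": "dog",
        "cat": "cat", "kitten": "cat",
        "bicycle": "bicycle", "bike": "bicycle",
        "person": "person", "man": "person", "woman": "person",
    }

    def mentioned_objects(caption: str) -> set[str]:
        """Extract canonical object names mentioned in a generated caption."""
        words = caption.lower().replace(",", " ").replace(".", " ").split()
        return {SYNONYMS[w] for w in words if w in SYNONYMS}

    def hallucination_rate(captions: list[str], ground_truth: list[set[str]]) -> float:
        """Fraction of mentioned object instances absent from the image annotations
        (higher means more hallucination)."""
        hallucinated, mentioned = 0, 0
        for caption, gt_objects in zip(captions, ground_truth):
            for obj in mentioned_objects(caption):
                mentioned += 1
                if obj not in gt_objects:
                    hallucinated += 1
        return hallucinated / mentioned if mentioned else 0.0

    if __name__ == "__main__":
        # Hypothetical model outputs and annotations for two images.
        captions = ["A man walks a dog past a bike.", "A kitten sleeps on a sofa."]
        ground_truth = [{"person", "dog"}, {"cat"}]
        print(f"Hallucination rate: {hallucination_rate(captions, ground_truth):.2f}")
    ```

    A metric of this kind only captures object-existence hallucinations; the attribute, relation, and factual hallucinations distinguished in the survey require richer annotations or model-based judges.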
