Abstract:
In recent years, large language models (LLMs) have made significant strides in natural language processing (NLP), demonstrating impressive capabilities in both language understanding and generation. Despite these advancements, however, LLMs still face numerous challenges in practical applications. One issue that has garnered extensive attention from both academia and industry is the problem of hallucinations. Effectively detecting hallucinations in LLMs is critical to ensuring their reliable, secure, and trustworthy application in downstream tasks such as text generation. We provide a comprehensive review of methods for detecting hallucinations in large language models. Firstly, we introduce the concept of large language models and clarify the definition and classification of hallucinations; we then systematically examine the characteristics of LLMs across their entire lifecycle, from construction to deployment, and delve into the mechanisms and causes of hallucinations. Secondly, based on practical application requirements and factors such as model transparency in different task scenarios, we categorize hallucination detection methods into two types, those for white-box models and those for black-box models, and provide a focused review and in-depth comparison of them. Thirdly, we analyze and summarize the current mainstream benchmarks for hallucination detection, laying a foundation for future research in this area. Finally, we identify potential research directions and new challenges in detecting hallucinations in large language models.