Abstract:
In recent years, large language models (LLMs) have made significant strides in natural language processing (NLP), demonstrating impressive capabilities in both language understanding and generation. Despite these advancements, however, LLMs still face numerous challenges in practical applications. One issue that has garnered extensive attention from both academia and industry is the problem of hallucinations. Effectively detecting hallucinations in LLMs is critical to ensuring their reliable, secure, and trustworthy application in downstream tasks such as text generation. We provide a comprehensive review of methods for detecting hallucinations in large language models. Firstly, we introduce the concept of large language models and clarify the definition and classification of hallucinations; we then systematically examine the characteristics of LLMs across their entire lifecycle, from construction to deployment, and delve into the mechanisms and causes of hallucinations. Secondly, based on practical application requirements and factors such as model transparency in different task scenarios, we categorize hallucination detection methods into two types, those for white-box models and those for black-box models, and provide a focused review and in-depth comparison of them. Thirdly, we analyze and summarize the current mainstream benchmarks for hallucination detection, laying a foundation for future research in this area. Finally, we identify potential research directions and new challenges in detecting hallucinations in large language models.