Large Language Models: Principles, Implementation, and Progress

舒文韬; 李睿潇; 孙天祥; 黄萱菁; 邱锡鹏

doi:10.7544/issn1000-1239.202330303

Abstract: In recent years, the emergence and development of large language models (LLMs) have had a transformative impact on natural language processing and on artificial intelligence more broadly. As the number of model parameters and the amount of training data increase, the perplexity of language models decreases in a predictable manner, and performance on a wide range of natural language processing tasks continues to improve. Scaling up the parameters and data of language models has therefore become a promising way to improve system intelligence. In this survey, we first review the basic definition of LLMs and give a criterion for what makes a language model "large" from the perspectives of model performance and computing requirements. We then review the development and representative work of LLMs along three dimensions (data, algorithm, and model architecture), showing how scaling in each dimension has driven progress at different stages. Next, we discuss the emergent abilities of LLMs and possible interpretations behind them, highlighting three key emergent abilities: chain-of-thought prompting, in-context learning, and instruction following, together with related research and applications. Finally, we outline potential future directions and technical challenges for LLMs.
