Abstract:
In recent years, the emergence and development of large language models (LLMs) have revolutionized the field of natural language processing and even artificial intelligence. With the increasing number of model parameters and training data, the perplexity of language models decreases in a predictable manner, which implies the improvement of performance on various natural language processing tasks. Therefore, scaling up language models has been a promising way to improve the system intelligence. In this survey, we first review the definition and scope of LLMs and provide a scale standard to distinguish “large” language models from the perspectives of performance and computing. Then, we review the development and representative work of LLMs in three dimensions: data, algorithm, and model architecture, showing how up-scaling in these dimensions drives the development of LLMs at different stages. Next, we discuss the emergent abilities of LLMs and possible interpretations behind them. We highlight three key emergent abilities, i.e., chain-of-thought prompting, in-context learning, and instruction-following, introducing their related advances and applications. Finally, we outline some potential directions and challenges of LLMs.