Abstract:
Software systems play an indispensable role across industries, handling large-scale, high-density data. However, the numerous defects within these systems have long troubled developers and constantly threaten the security of the data they manage. Automated Program Repair (APR) aims to help developers fix code defects automatically during the software development process, thereby reducing the cost of developing and maintaining software systems and enhancing the confidentiality, availability, and integrity of the data within them. With the advancement of Large Language Model (LLM) technology, many powerful code LLMs have emerged. These models have demonstrated strong repair capabilities in the APR field and address the shortcomings of traditional approaches in code comprehension and patch generation, further raising the capability of program repair tools. We thoroughly survey recent high-quality papers on APR and summarize the latest developments in the field. We then systematically categorize LLM-based APR techniques into two types: cloze style and neural machine translation (NMT) style. We also conduct an in-depth comparison from various perspectives, including the models used, model size, types of defects repaired, programming languages involved, and the pros and cons of each repair approach. Additionally, we discuss widely adopted APR datasets and evaluation metrics, and outline existing empirical studies. Finally, we summarize current challenges in the APR field along with future research directions.