
    Survey of Large-Language-Model-Based Automated Program Repair

    • Abstract: Software systems play an indispensable role across industries and carry large-scale, high-density data, yet the defects in these systems have long troubled developers and constantly threaten the security of data elements. Automated program repair (APR) aims to help developers automatically fix defects in code during software development, reducing development and maintenance costs and improving the confidentiality, availability, and integrity of data elements in software systems. With the development of large language model (LLM) technology, many powerful code LLMs have emerged. These models have demonstrated strong repair capability in APR, compensating for the weaknesses of traditional approaches in code comprehension and patch generation and further raising the level of program repair tools. In this survey, we comprehensively review high-quality APR papers from recent years and summarize the latest developments in the field. We systematically categorize LLM-based APR techniques into two types, cloze style and neural machine translation style, and compare them in terms of model type, model size, types of defects repaired, programming languages repaired, and the advantages and disadvantages of each repair approach. We also review and analyze APR datasets and the metrics used to evaluate repair capability, and discuss existing empirical studies in depth. Finally, we analyze the current challenges in the APR field and future research directions.

       
