

    A Multi-agent Collaboration for Completing and Optimizing Bug Reports


       

      Abstract: Bug reports serve as a critical foundation for developers to identify and resolve software bugs, with their quality directly influencing the efficiency of software maintenance. While prior research has demonstrated that high-quality bug reports can significantly reduce repair time, information incompleteness remains a prevalent issue in open-source projects. Although current approaches leveraging machine learning and large language models (LLMs) for automatic report completion improve content completeness, they exhibit notable limitations: traditional retrieval-based methods often concatenate fragments from similar reports, leading to semantic discontinuities and logical inconsistencies; while LLM-generated content tends to be fluent, it may introduce factual hallucinations. Inspired by the human expert practice of “phased processing and multi-role collaboration”, this paper proposes a novel multi-agent collaborative framework for bug report completion and optimization. Our approach ensures high-quality output through three key design principles: 1) Decomposing the completion task into three distinct phases—bug analysis, report completion, and quality assessment—each managed by a dedicated agent, thereby reducing the cognitive burden on individual models; 2) Employing structured prompt templates to precisely guide LLMs in assuming specialized roles as domain expert agents (e.g., analyst, completer, reviewer), clearly defining responsibilities at each stage and enhancing output accuracy; 3) Incorporating a dynamic feedback mechanism that enables iterative cross-validation and collaborative refinement among agents, effectively mitigating semantic drift and ensuring both logical coherence and factual consistency in the final output. Extensive experiments on four public datasets demonstrate that our method outperforms baseline approaches, achieving improvements of 10.41%, 7.52%, 13.55%, and 16.64% on BLEU, Sentence-BERT, ROUGE-L, and METEOR scores, respectively. Furthermore, manual evaluation confirms that the completed reports exhibit superior completeness, clarity, and practical utility compared to existing methods, offering robust support for bug management in open-source communities.
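    The phased pipeline and feedback loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent callables (`analyst`, `completer`, `reviewer`), their signatures, and the round budget are all hypothetical stand-ins for the prompted LLM roles.

    ```python
    from dataclasses import dataclass
    from typing import Callable, Tuple

    @dataclass
    class Draft:
        report: str
        feedback: str = ""
        approved: bool = False

    def complete_report(
        raw_report: str,
        analyst: Callable[[str], str],
        completer: Callable[[str, str, str], str],
        reviewer: Callable[[str, str], Tuple[bool, str]],
        max_rounds: int = 3,
    ) -> Draft:
        """Phase 1 analyzes the bug once; phases 2 and 3 then iterate
        until the reviewer approves or the round budget runs out."""
        analysis = analyst(raw_report)          # phase 1: bug analysis
        draft = Draft(report=raw_report)
        for _ in range(max_rounds):
            # phase 2: report completion, conditioned on reviewer feedback
            draft.report = completer(raw_report, analysis, draft.feedback)
            # phase 3: quality assessment returning (verdict, feedback)
            draft.approved, draft.feedback = reviewer(draft.report, analysis)
            if draft.approved:
                break
        return draft

    # Toy stand-in agents; a real system would prompt an LLM per role.
    analyst = lambda r: "crash on empty input"
    def completer(raw, analysis, feedback):
        body = f"{raw} Steps to reproduce: {analysis}."
        # Only the second round, guided by feedback, adds the environment.
        return body + " Environment: v1.2 on Linux." if feedback else body
    def reviewer(report, analysis):
        if "Environment:" in report:
            return True, ""
        return False, "add environment details"

    result = complete_report("App crashes.", analyst, completer, reviewer)
    ```

    With these toy agents, the first round produces an incomplete draft, the reviewer's feedback drives a second completion pass, and the loop terminates with an approved report — mirroring the cross-validation cycle the abstract describes.
    
    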

       

