    Chen Haoling, Yu Huiqun, Fan Guisheng, Li Mingchen, Huang Zijie. Class Summarization Generation Technology Based on Hierarchical Representation and Context Enhancement[J]. Journal of Computer Research and Development, 2024, 61(2): 307-323. DOI: 10.7544/issn1000-1239.202330730

    Class Summarization Generation Technology Based on Hierarchical Representation and Context Enhancement

    Abstract: A code summary is a natural-language description of source code, and high-quality code summaries help developers understand programs more efficiently. In recent years, research on automatic code summarization has focused on generating summaries for method-level code snippets. However, in an object-oriented language such as Java, the class is the basic unit of a project. To address this gap, we propose HRCE (hierarchical representation and context enhancement), a class summarization method, and construct a class summarization dataset containing 358992 <Java class, context, summary> data pairs. HRCE first applies a code simplification strategy that removes non-critical code from a class to shorten the code length. It then models the levels of the class hierarchy (class signature, attributes, and methods) separately to obtain the semantic information and hierarchical structure information of the class. In addition, HRCE extracts the signature and summary of the parent class from the project to describe the context that the class depends on. Experiments show that the resulting class summarization model can represent the semantics and hierarchical structure of the code and can draw on information from both inside and outside the target class. HRCE outperforms all baseline models on evaluation metrics such as BLEU, METEOR, and ROUGE-L.
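The data layout described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ClassSample`, `ParentContext`, `simplify_method`, and `to_model_input`, the sample Java class, and the simplification rule itself are assumptions made for illustration. The abstract does not say which code counts as non-critical or how each level is encoded, so the rule below (drop blank lines and `//` comment lines, then truncate) is only a stand-in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParentContext:
    """Information from outside the target class: the parent's signature and summary."""
    signature: str
    summary: str


@dataclass
class ClassSample:
    """One <Java class, context, summary> entry, with the class split into levels."""
    signature: str                                        # class signature
    attributes: List[str] = field(default_factory=list)   # field declarations
    methods: List[str] = field(default_factory=list)      # method source strings
    context: Optional[ParentContext] = None               # parent-class context
    summary: str = ""                                     # reference summary


def simplify_method(method_source: str, max_lines: int = 3) -> str:
    """Assumed simplification rule (hypothetical, not from the paper):
    drop blank lines and `//` comment lines, keep the first `max_lines` lines."""
    kept = [
        line.strip()
        for line in method_source.splitlines()
        if line.strip() and not line.strip().startswith("//")
    ]
    return " ".join(kept[:max_lines])


def to_model_input(sample: ClassSample) -> dict:
    """Keep each level in its own field so it could be encoded separately."""
    context = ""
    if sample.context is not None:
        context = f"{sample.context.signature} : {sample.context.summary}"
    return {
        "signature": sample.signature,
        "attributes": list(sample.attributes),
        "methods": [simplify_method(m) for m in sample.methods],
        "context": context,
    }


# Made-up example entry, for illustration only.
sample = ClassSample(
    signature="public class Circle extends Shape",
    attributes=["private double radius;"],
    methods=[
        "public double area() {\n"
        "    // pi * r^2\n"
        "\n"
        "    return Math.PI * radius * radius;\n"
        "}"
    ],
    context=ParentContext(
        signature="public abstract class Shape",
        summary="Base class for geometric shapes.",
    ),
    summary="A circle defined by its radius.",
)
model_input = to_model_input(sample)
```

Keeping the signature, attributes, methods, and parent context in separate fields mirrors the abstract's split between information inside the class and information outside it. A real system would replace the toy simplification rule with the paper's own strategy.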

       
