Abstract:
Code summaries are natural language descriptions of source code, and high-quality summaries help developers understand programs more efficiently. In recent years, research on code summarization has focused on generating summaries for method-level code snippets. However, in object-oriented languages such as Java, the class is the basic programming unit. To address this mismatch, we propose HRCE, a class summarization method based on hierarchical representation and context enhancement, and we construct a class summarization dataset containing 358,992 <Java class, content, summary> triples. HRCE first applies a code simplification strategy that removes non-critical code from a class to shorten its length. It then models the class hierarchy, representing the class signature, attributes, and methods separately, to capture both the semantic information and the hierarchical structure of the class. In addition, HRCE uses the parent class's signature and summary to describe the context on which the class depends within the project. Experiments show that a generative model for class summarization based on hierarchical representation and context enhancement can characterize both the semantics and the hierarchical structure of the code, drawing on information from inside and outside the target class. As a result, HRCE outperforms all baseline models on evaluation metrics including BLEU, METEOR, and ROUGE-L.
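The abstract does not specify HRCE's simplification rules or its exact input format. The following minimal Python sketch illustrates one way to split a class into the hierarchical segments described above (class signature, attributes, methods) plus a parent-class context segment. It assumes, purely for illustration, that "non-critical code" means method bodies and that only the first `max_methods` methods are kept. The names `ClassInfo`, `MethodInfo`, `simplify`, `build_hierarchical_input`, and `max_methods` are hypothetical and are not HRCE's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MethodInfo:
    signature: str  # e.g. "public void push(Object o)"
    body: str       # method body text (hypothetical field)


@dataclass
class ClassInfo:
    signature: str            # e.g. "public class ArrayStack extends AbstractStack"
    attributes: List[str]     # field declarations
    methods: List[MethodInfo]
    parent_signature: Optional[str] = None  # context from the parent class
    parent_summary: Optional[str] = None


def simplify(cls: ClassInfo, max_methods: int = 20) -> ClassInfo:
    """Hypothetical simplification: drop method bodies and keep at most
    max_methods method signatures, shortening the class representation."""
    kept = [MethodInfo(m.signature, "") for m in cls.methods[:max_methods]]
    return ClassInfo(cls.signature, list(cls.attributes), kept,
                     cls.parent_signature, cls.parent_summary)


def build_hierarchical_input(cls: ClassInfo) -> Dict[str, List[str]]:
    """Group the simplified class into one segment per hierarchy level,
    plus a context segment built from the parent's signature and summary."""
    simple = simplify(cls)
    context: List[str] = []
    if simple.parent_signature:
        context.append(simple.parent_signature)
    if simple.parent_summary:
        context.append(simple.parent_summary)
    return {
        "class_signature": [simple.signature],
        "attributes": simple.attributes,
        "methods": [m.signature for m in simple.methods],
        "context": context,
    }


# Toy example (illustrative data, not from the HRCE dataset).
example = ClassInfo(
    signature="public class ArrayStack extends AbstractStack",
    attributes=["private Object[] items", "private int top"],
    methods=[MethodInfo("public void push(Object o)", "{ items[++top] = o; }"),
             MethodInfo("public Object pop()", "{ return items[top--]; }")],
    parent_signature="public abstract class AbstractStack",
    parent_summary="Base class for LIFO stack implementations.",
)
segments = build_hierarchical_input(example)
print(segments["methods"])  # ['public void push(Object o)', 'public Object pop()']
```

Keeping the segments separate, rather than concatenating them into one token sequence, is what allows a hierarchical encoder to treat each level differently; how HRCE actually encodes and fuses these segments is not described in the abstract.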