Chen Haoling, Yu Huiqun, Fan Guisheng, Li Mingchen, Huang Zijie. Class Summarization Generation Technology Based on Hierarchical Representation and Context Enhancement[J]. Journal of Computer Research and Development, 2024, 61(2): 307-323. DOI: 10.7544/issn1000-1239.202330730

Class Summarization Generation Technology Based on Hierarchical Representation and Context Enhancement

Funds: This work was supported by the National Natural Science Foundation of China (62372174, 62276097), the Research Program of the National Engineering Laboratory for Big Data Distribution and Exchange Technologies, and the Shanghai Municipal Special Fund Program for Promoting High-Quality Development (2021-GYHLW-01007).
  • Author Bio:

    Chen Haoling: born in 1999. Master candidate. Student member of CCF. Her main research interests include automatic code summarization and program comprehension

    Yu Huiqun: born in 1967. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include software engineering, trusted computing, cloud computing, and formal methods

    Fan Guisheng: born in 1980. PhD, associate research fellow, PhD supervisor. Member of CCF. His main research interests include software engineering, service computing, and software architecture analysis techniques

    Li Mingchen: born in 1998. PhD candidate. His main research interests include automatic code summarization and program comprehension

    Huang Zijie: born in 1994. PhD candidate. Student member of CCF. His main research interests include code smells, software quality assurance, program comprehension, and empirical software engineering

  • Received Date: September 10, 2023
  • Revised Date: December 04, 2023
  • Available Online: December 20, 2023
  • Abstract: Code summarization produces a natural-language description of source code, and high-quality code summaries help developers understand programs more efficiently. In recent years, research on code summarization has focused on generating summaries for method-level code snippets. However, in an object-oriented language such as Java, the class is the basic programming unit, and a class is typically much longer and structurally richer than a single method, so method-level techniques do not apply to classes directly. To address these problems, we propose a class summarization generation method based on hierarchical representation and context enhancement, called HRCE, and construct a class summarization dataset containing 358,992 <Java class, content, summary> triples. HRCE first applies a code simplification strategy that removes the non-critical code of a class to shorten its length. It then models the class hierarchy, encoding the class signature, attributes, and methods separately, to capture both the semantic information and the hierarchical structure of the class. In addition, HRCE selects the parent class’s signature and summary to describe the context that the class depends on within the project. Experiments show that a class summarization model based on hierarchical representation and context enhancement can characterize the semantics and hierarchical structure of code and draw on information from both inside and outside the target class. As a result, HRCE outperforms all baseline models on evaluation metrics such as BLEU, METEOR, and ROUGE-L.
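
    To make the hierarchical representation concrete, the sketch below parses a Java class and separates the three levels the abstract names: the class signature (including the parent class that supplies context), the attributes, and the method signatures with bodies dropped as a stand-in for the code simplification step. It is a minimal illustration built on the open-source javalang parser; the helper extract_hierarchy and the toy Counter class are assumptions for this example, not the authors' implementation.

    # Minimal sketch of hierarchical class decomposition (assumed example,
    # not the authors' implementation). Requires: pip install javalang
    import javalang

    JAVA_CLASS = """
    public class Counter extends BaseCounter {
        private int count;
        public void increment() { count = count + 1; }
        public int value() { return count; }
    }
    """

    def extract_hierarchy(source):
        """Split a Java class into signature, attributes, and method signatures."""
        tree = javalang.parse.parse(source)
        cls = tree.types[0]  # first top-level type declaration

        # Level 1: class signature, including the parent class that HRCE
        # uses as a context anchor within the project.
        signature = cls.name
        if cls.extends is not None:
            signature += " extends " + cls.extends.name

        # Level 2: attributes, kept as (name, declared type) pairs.
        attributes = [(decl.name, field.type.name)
                      for field in cls.fields
                      for decl in field.declarators]

        # Level 3: method signatures only; bodies are discarded, mimicking
        # the removal of non-critical code to shorten the input sequence.
        methods = [(m.name, [p.type.name for p in m.parameters])
                   for m in cls.methods]
        return signature, attributes, methods

    print(extract_hierarchy(JAVA_CLASS))
    # -> ('Counter extends BaseCounter', [('count', 'int')],
    #     [('increment', []), ('value', [])])

    Encoding each of these levels separately, rather than flattening the whole class into a single token sequence, is what allows a model to retain structural information after the simplification step.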

  • [1]
    Xia Xin, Bao Lingfeng, Lo D, et al. Measuring program comprehension: A large-scale field study with professionals[J]. IEEE Transactions on Software Engineering, 2017, 44(10): 951−976
    [2]
    Haiduc S, Aponte J, Moreno L, et al. On the use of automated text summarization techniques for summarizing source code[C] //Proc of the 17th Working Conf on Reverse Engineering. Piscataway, NJ: IEEE, 2010: 35−44
    [3]
    Haiduc S, Aponte J, Marcus A, et al. Supporting program comprehension with source code summarization[C] //Proc of the 32nd ACM/IEEE Int Conf on Software Engineering. New York: ACM, 2010: 223−226
    [4]
    Eddy B P, Robinson J A, Kraft N A, et al. Evaluating source code summarization techniques: Replication and expansion[C] //Proc of the 21st Int Conf on Program Comprehension. Piscataway, NJ: IEEE, 2013: 13−22
    [5]
    McBurney P W, Liu C, McMillan C, et al. Improving topic model source code summarization[C] //Proc of the 22nd Int Conf on Program Comprehension. New York: ACM, 2014: 291−294
    [6]
    Movshovitz-Attias D, Cohen W. Natural language models for predicting programming comments[C] //Proc of the 51st Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2013: 35−40
    [7]
    Wong E, Yang Jinqiu, Tan Lin. Autocomment: Mining question and answer sites for automatic comment generation[C] //Proc of the 28th IEEE/ACM Int Conf on Automated Software Engineering. Piscataway, NJ: IEEE, 2013: 562−567
    [8]
    Sridhara G, Hill E, Muppaneni D, et al. Towards automatically generating summary comments for Java methods[C] //Proc of the 25th IEEE/ACM Int Conf on Automated Software Engineering. New York: ACM, 2010: 43−52
    [9]
    Sridhara G, Pollock L, Vijay-Shanker K. Generating parameter comments and integrating with method summaries[C] //Proc of the 19th Int Conf on Program Comprehension. Piscataway, NJ: IEEE, 2011: 71−80
    [10]
    Sridhara G, Pollock L, Vijay-Shanker K. Automatically detecting and describing high level actions within methods[C] //Proc of the 33rd Int Conf on Software Engineering. New York: ACM, 2011: 101−110
    [11]
    Abid N J, Dragan N, Collard M L, et al. Using stereotypes in the automatic generation of natural language summaries for C++ methods[C] //Proc of Int Conf on Software Maintenance and Evolution. Piscataway, NJ: IEEE, 2015: 561−565
    [12]
    Song Xiaotao, Sun Hailong, Wang Xu, et al. A survey of automatic generation of source code comments: Algorithms and techniques[J]. IEEE Access, 2019, 7: 111411−111428 doi: 10.1109/ACCESS.2019.2931579
    [13]
    Moreno L, Aponte J, Sridhara G, et al. Automatic generation of natural language summaries for Java classes[C] //Proc of the 21st Int Conf on Program Comprehension. Piscataway, NJ: IEEE, 2013: 23−32
    [14]
    Li Mingchen, Yu Huiqun, Fan Guisheng, et al. ClassSum: A deep learning model for class-level code summarization[J]. Neural Computing and Applications, 2023, 35(4): 3373−3393 doi: 10.1007/s00521-022-07877-z
    [15]
    Li Zheng, Wu Yonghao, Peng Bin, et al. SeTransformer: A Transformer-based code semantic parser for code comment generation[J]. IEEE Transactions on Reliability, 2022, 72(1): 258−273
    [16]
    Iyer S, Konstas I, Cheung A, et al. Summarizing source code using a neural attention model[C] //Proc of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2016: 2073−2083
    [17]
    Allamanis M, Peng H, Sutton C. A convolutional attention network for extreme summarization of source code[C] //Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2016: 2091−2100
    [18]
    Hu Xing, Li Ge, Xia Xin, et al. Deep code comment generation[C] //Proc of the 26th Conf on Program Comprehension. New York: ACM, 2018: 200−210
    [19]
    Leclair A, Jiang S, McMillan C. A neural model for generating natural language summaries of program subroutines[C] //Proc of the 41st Int Conf on Software Engineering. New York: ACM, 2019: 795−806
    [20]
    Hu Xing, Li Ge, Xia Xin, et al. Deep code comment generation with hybrid lexical and syntactical information[J]. Empirical Software Engineering, 2019, 25(3): 2179−2217
    [21]
    Zhou Ziyi, Yu Huiqun, Fan Guisheng. Effective approaches to combining lexical and syntactical information for code summarization[J]. Software: Practice and Experience, 2020, 50(12): 2313−2336 doi: 10.1002/spe.2893
    [22]
    Ahmad W U, Chakraborty S, Ray B, et al. A transformer-based approach for source code summarization[J]. arXiv preprint, arXiv:2005.00653, 2020
    [23]
    Alon U, Brody S, Levy O, et al. Code2Seq: Generating sequences from structured representations of code[C/OL] //Proc of the 7th Int Conf on Learning Representations. Amherst, MA: OpenReview.net, 2019 [2023-11-30]. https://openreview.net/forum?id=H1gKYo09tX
    [24]
    Zhou Yu, Shen Juanjuan, Zhang Xiaoqing, et al. Automatic source code summarization with graph attention networks[J]. Journal of Systems and Software, 2022, 188: 111257 doi: 10.1016/j.jss.2022.111257
    [25]
    Zhang Shikun, Xie Rui, Ye Wei, et al. Keyword-based source code summarization[J]. Journal of Computer Research and Development, 2020, 57(9): 1987−2000 (in Chinese) doi: 10.7544/issn1000-1239.2020.20190179
    [26]
    Zhou Ziyi, Yu Huiqun, Fan Guisheng, et al. Summarizing source code with hierarchical code representation[J]. Information and Software Technology, 2022, 143: 106761 doi: 10.1016/j.infsof.2021.106761
    [27]
    Wang Wenhua, Zhang Yuqun, Sui Yulei, et al. Reinforcement-learning-guided source code summarization using hierarchical attention[J]. IEEE Transactions on Software Engineering, 2020, 48(1): 102−119
    [28]
    Lin Chen, Ouyang Zhichao, Zhuang Junqing, et al. Improving code summarization with block-wise abstract syntax tree splitting[C] //Proc of the 29th Int Conf on Program Comprehension. Los Alamitos, CA: IEEE Computer Society, 2021: 184−195
    [29]
    Shi Ensheng, Wang Youlin, Du Lun, et al. CAST: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees[C] //Proc of the 2021 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2021: 4053−4062
    [30]
    Mcburney P W, McMillan C. Automatic source code summarization of context for Java methods[J]. IEEE Transactions on Software Engineering, 2016, 42(2): 103−119 doi: 10.1109/TSE.2015.2465386
    [31]
    Hill E, Pollock L, Vijay-Shanker K. Automatically capturing source code context of NL-queries for software maintenance and reuse[C] //Proc of the 31st Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2009: 232−242
    [32]
    Yu Xiaohan, Huang Quzhe, Wang Zheng, et al. Towards context-aware code comment generation[C] //Proc of Findings of the Association for Computational Linguistics: EMNLP. Stroudsburg, PA: ACL, 2020: 3938−3947
    [33]
    Wang Yanlin, Shi Ensheng, Du Lun, et al. Cocosum: Contextual code summarization with multi-relational graph neural network[J]. arXiv preprint, arXiv:2107.01933, 2021
    [34]
    Haque S, Leclair A, Wu Lingfei, et al. Improved automatic summarization of subroutines via attention to file context[C] //Proc of the 17th Int Conf on Mining Software Repositories. New York: ACM, 2020: 300−310
    [35]
    Bansal A, Haque S, McMillan C. Project-level encoding for neural source code summarization of subroutines[C] //Proc of the 29th Int Conf on Program Comprehension. Piscataway, NJ: IEEE, 2021: 253−264
    [36]
    Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint, arXiv:1810.04805, 2018
    [37]
    Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9
    [38]
    Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877−1901
    [39]
    Ouyang Long, Wu J, Jiang Xu, et al. Training language models to follow instructions with human feedback[J]. Advances in Neural Information Processing Systems, 2022, 35: 27730−27744
    [40]
    Feng Zhangyin, Guo Daya, Tang Duyu, et al. CodeBERT: A pre-trained model for programming and natural languages[J]. arXiv preprint, arXiv:2002.08155, 2020
    [41]
    Wang Yue, Wang Weishi, Joty S, et al. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation[J]. arXiv preprint, arXiv:2109.00859, 2021
    [42]
    Guo Daya, Ren Shuo, Lu Shuai, et al. GraphCodeBERT: Pre-training code representations with data flow[J]. arXiv preprint, arXiv:2009.08366, 2020
    [43]
    OpenAI. GPT-4 technical report[J]. arXiv preprint, arXiv:2303.08774, 2023
    [44]
    Chen M, Tworek J, Jun H, et al. Evaluating large language models trained on code[J]. arXiv preprint, arXiv:2107.03374, 2021
    [45]
    Roziere B, Gehring J, Gloeckle F, et al. Code Llama: Open foundation models for code[J]. arXiv preprint, arXiv:2308.12950, 2023
    [46]
    Luo Ziyang, Xu Can, Zhao Pu, et al. WizardCoder: Empowering code large language models with Evol-Instruct[J]. arXiv preprint, arXiv:2306.08568, 2023
    [47]
    Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[J]. Advances in Neural Information Processing Systems, 2014, 27: 3104−3112
    [48]
    Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735−1780 doi: 10.1162/neco.1997.9.8.1735
    [49]
    Ge Fan, Kuang Li. Keywords guided method name generation[C] //Proc of the 29th Int Conf on Program Comprehension. Piscataway, NJ: IEEE, 2021: 196−206
    [50]
    Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C] //Proc of the Annual Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2017: 5998−6008
    [51]
    Papineni K, Roukos S, Ward T, et al. BLEU: A method for automatic evaluation of machine translation[C] //Proc of the 40th Annual Meeting of the Association for Computational Linguistics. New York: ACM, 2002: 311−318
    [52]
    Denkowski M, Lavie A. METEOR universal: Language specific translation evaluation for any target language[C] //Proc of the 9th Workshop on Statistical Machine Translation. Stroudsburg, PA: ACL, 2014: 376−380
    [53]
    Lin C Y. ROUGE: A package for automatic evaluation of summaries[C] //Proc of the Workshop on Text Summarization Branches Out. Stroudsburg, PA: ACL, 2004: 74−81
    [54]
    Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint, arXiv:1412.6980, 2014
    [55]
    Wu Yonghui, Schuster M, Chen Zhifeng, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation[J]. arXiv preprint, arXiv:1609.08144, 2016
    [56]
    Ji Ziwei, Lee N, Frieske R, et al. Survey of hallucination in natural language generation[J]. ACM Computing Surveys, 2023, 55(12): 1−38
