Advances in Program Comprehension Based on Deep Learning

Liu Fang; Li Ge; Hu Xing; Jin Zhi

doi:10.7544/issn1000-1239.2019.20190185

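(School of Electronics Engineering and Computer Science, Peking University, Beijing 100871) (Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871) (liufang816@pku.edu.cn)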

Funding: National Basic Research Program of China (973 Program) (2015CB352201); National Natural Science Foundation of China (61620106007, 61751210)

CLC number: TP311
Publication date: 2019-07-31

Program Comprehension Based on Deep Learning

(School of Electronics Engineering and Computer Science, Peking University, Beijing 100871) (Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871)

Abstract

Program comprehension is the process of obtaining relevant information from programs by analyzing, abstracting, and reasoning about them. It plays an important role in software development, maintenance, and migration, and has therefore received extensive attention from both academia and industry. Traditional program comprehension relies heavily on the experience of developers. However, as the scale and complexity of software continue to grow, relying solely on developers' prior knowledge to extract program features is time-consuming and laborious, and it rarely uncovers all of the features hidden in programs. Deep learning is a data-driven, end-to-end approach: it builds deep neural networks from existing data to mine the features hidden in that data, and it has been applied successfully in many fields. Applying deep learning to program comprehension makes it possible to learn the features contained in program data automatically, guided by the specific task and by large amounts of data, which helps to fully exploit the knowledge implicit in programs and improves the efficiency of program comprehension. This paper surveys research from recent years on program comprehension based on deep learning. We first analyze the properties of programs and then introduce the mainstream program comprehension models, including sequence-based, structure-based, and execution-trace-based models. We then present applications of deep learning-based program comprehension in program analysis, focusing mainly on code completion, code summarization (comment generation), and code search. Finally, we analyze and summarize the challenges facing program comprehension research.

Keywords:
- program comprehension
- program analysis
- software engineering
- deep learning
- data mining
