A Semantic Similarity Integration Method for Software Feature Location Problem

He Yun (何云); Li Tong (李彤); Wang Wei (王炜); Li Xiang (李响); Lan Wei (兰微)

doi:10.7544/issn1000-1239.2019.20180103

Abstract

A feature is an executable functional entity of a software system that is defined by its requirements. The process of identifying the mapping between software features and source code is called feature location. Information retrieval (IR) based feature location methods are easy to use and have low overhead, so they are widely used in software maintenance, code search, and related fields. All IR-based feature location methods are built on semantic similarity calculation, which currently has two main problems: 1) the large amount of noise in source code data interferes with similarity calculation; 2) the limitations of individual indexing methods make similarity results inaccurate. To address these two problems, this paper proposes a semantic similarity integration method for the software feature location problem. The method introduces part-of-speech (POS) filtering into preprocessing. This filtering removes noise data from the source code and improves the accuracy of similarity calculation. The method then integrates different indexing methods to calculate similarity, guided by the structural characteristics of the source code data itself. Experiments were run on public benchmark datasets. Compared with existing methods, POS filtering and similarity integration improve mean reciprocal rank (MRR) by 30.88% and 10.28%, respectively, which verifies the effectiveness of the proposed method.
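The abstract does not state which POS tags are kept or which indexing methods are combined. It also does not say how their scores are merged. The Python below is a minimal sketch that fills those gaps with stated assumptions:

* Tokens have already been tagged with Penn Treebank tags by some external tagger.
* Only nouns and verbs are kept.
* Two simple stand-in similarity measures are integrated: TF-IDF cosine in a vector space model, and Jaccard token overlap.
* The integration is a weighted sum of min-max normalized scores.

All names (pos_filter, integrated_ranking, KEPT_TAG_PREFIXES) and the weight alpha are hypothetical. This is an illustration of the general idea, not the paper's implementation.

```python
import math
from collections import Counter

# Assumption (hypothetical): keep nouns (NN*) and verbs (VB*) from Penn Treebank tags.
KEPT_TAG_PREFIXES = ("NN", "VB")


def pos_filter(tagged_tokens):
    """Keep lowercased words whose POS tag starts with an assumed kept prefix."""
    return [word.lower() for word, tag in tagged_tokens
            if tag.startswith(KEPT_TAG_PREFIXES)]


def idf_table(corpus):
    """Return a smoothed IDF function computed over the code-unit corpus."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict term -> weight."""
    return {term: count * idf(term) for term, count in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Token-set overlap, used here as a second stand-in similarity measure."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the two measures are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def integrated_ranking(query, corpus, alpha=0.5):
    """Rank corpus indices by alpha * VSM + (1 - alpha) * Jaccard (hypothetical rule)."""
    idf = idf_table(corpus)
    q = tfidf(query, idf)
    vsm = min_max([cosine(q, tfidf(doc, idf)) for doc in corpus])
    jac = min_max([jaccard(query, doc) for doc in corpus])
    combined = [alpha * a + (1 - alpha) * b for a, b in zip(vsm, jac)]
    return sorted(range(len(corpus)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical pre-tagged code units and a pre-tagged feature query.
    units = [
        [("open", "VB"), ("the", "DT"), ("file", "NN"), ("dialog", "NN")],
        [("save", "VB"), ("current", "JJ"), ("document", "NN")],
        [("print", "VB"), ("document", "NN"), ("page", "NN")],
    ]
    query = [("save", "VB"), ("the", "DT"), ("document", "NN")]
    corpus = [pos_filter(u) for u in units]
    print(integrated_ranking(pos_filter(query), corpus))  # → [1, 2, 0]
```

Each measure is normalized before mixing so that the measure with the larger raw scale does not dominate the combined score. Both measures rank the unit containing both "save" and "document" first, and the unit sharing only "document" second.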

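Both reported gains are measured in mean reciprocal rank (MRR). For a query set Q, MRR = (1/|Q|) * sum over i of 1/rank_i. Here rank_i is the position of the first relevant code unit in the result list for query i. The Python below is a minimal sketch of this standard definition. Scoring a query whose relevant units never appear as 0 is an assumed convention. The function names and example data are hypothetical.

```python
def reciprocal_rank(ranking, relevant):
    """1 / (1-based position of the first relevant item), or 0.0 if none is retrieved."""
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over queries; rankings[i] pairs with relevant_sets[i]."""
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets))
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical results for three feature queries.
    rankings = [["m1", "m2"], ["m4", "m5", "m6"], ["m7", "m8"]]
    gold = [{"m1"}, {"m6"}, {"m9"}]
    print(mean_reciprocal_rank(rankings, gold))  # (1 + 1/3 + 0) / 3 = 4/9
```

In this example the three queries contribute 1, 1/3 and 0 respectively, which gives an MRR of 4/9 (about 0.444).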