ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (2): 394-409. doi: 10.7544/issn1000-1239.2019.20180103


A Semantic Similarity Integration Method for Software Feature Location Problem

He Yun (1), Li Tong (1, 2), Wang Wei (1, 2), Li Xiang (1), Lan Wei (1)

1 (College of Software, Yunnan University, Kunming 650091); 2 (Key Laboratory for Software Engineering of Yunnan Province (Yunnan University), Kunming 650091)
Online: 2019-02-01

Abstract: A feature is an executable functional entity defined in a software system. The process of identifying the mapping between software features and source code is called feature location. Information retrieval (IR)-based feature location methods are widely used in software maintenance, code search, and other fields because of their high usability and low overhead. All IR-based feature location methods rely on semantic similarity calculation, which faces two main problems: 1) the source code corpus contains a large amount of noise data, which interferes with the similarity results; 2) the limitations of each individual index method make the similarity results inaccurate. To solve these problems, a semantic similarity integration method for the software feature location problem is proposed. The method introduces Part-of-Speech (POS) filtering into preprocessing, which effectively filters noise data out of the source code corpus and improves the accuracy of similarity calculation. Then, different index methods are integrated to calculate similarities based on the structural characteristics of the source code. Experiments are performed on open benchmark datasets. Compared with existing methods, POS filtering improves mean reciprocal rank (MRR) by an average of 30.88%, and similarity integration improves it by an average of 10.28%. The experimental results verify the effectiveness of the proposed methods.
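To make the pipeline described in the abstract concrete, here is a minimal sketch of its three stages: POS filtering of identifier words, similarity integration across two index methods, and MRR evaluation. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the index methods, so TF-IDF cosine similarity and Jaccard overlap are used here purely as stand-ins. The `TOY_POS` lexicon is a hypothetical substitute for a real POS tagger that keeps only nouns and verbs. The weight `alpha=0.5` and the three-method `corpus` are likewise hypothetical. For simplicity, scores are computed over each method's whole text rather than per structural part (name, comments, body).

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for a real POS tagger.
# Assumption: only nouns and verbs carry feature semantics; any other tag,
# and any word missing from the lexicon, is treated as noise and dropped.
TOY_POS = {
    "get": "VERB", "open": "VERB", "print": "VERB", "save": "VERB",
    "document": "NOUN", "file": "NOUN", "name": "NOUN", "page": "NOUN",
    "printer": "NOUN", "the": "DET", "to": "PREP", "tmp": "OTHER", "i": "OTHER",
}
KEEP_TAGS = {"NOUN", "VERB"}


def tokenize(text):
    """Split camelCase and snake_case identifiers into lowercase words."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def pos_filter(tokens):
    """Keep only tokens the toy lexicon tags as a noun or verb."""
    return [t for t in tokens if TOY_POS.get(t) in KEEP_TAGS]


def tfidf_vectors(docs):
    """Stand-in index method 1: smoothed TF-IDF weights (vector space model)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((n + 1) / (c + 1)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs], idf


def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Stand-in index method 2: set overlap of the filtered terms."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def rank(query, corpus, alpha=0.5):
    """Rank methods by a weighted sum of the two similarity scores."""
    names = list(corpus)
    docs = [pos_filter(tokenize(corpus[n])) for n in names]
    vecs, idf = tfidf_vectors(docs)
    q = pos_filter(tokenize(query))
    qvec = {t: c * idf.get(t, 0.0) for t, c in Counter(q).items()}
    score = {n: alpha * cosine(qvec, v) + (1 - alpha) * jaccard(q, d)
             for n, d, v in zip(names, docs, vecs)}
    return sorted(names, key=score.get, reverse=True)


def mrr(rankings, gold):
    """Mean reciprocal rank of the first relevant method for each query."""
    total = 0.0
    for ranked, relevant in zip(rankings, gold):
        total += next((1.0 / (i + 1) for i, m in enumerate(ranked)
                       if m in relevant), 0.0)
    return total / len(rankings) if rankings else 0.0


# HYPOTHETICAL corpus: method name -> source text.
corpus = {
    "Printer.printPage": "void printPage(Page the_page) { open printer; print the page to printer }",
    "Editor.saveFile": "void saveFile(String name) { open file; save document to file name }",
    "Util.tmp": "int tmp(int i) { return i; }",
}
queries = ["print the page", "save file name"]
gold = [{"Printer.printPage"}, {"Editor.saveFile"}]
rankings = [rank(q, corpus) for q in queries]
print(rankings[0][0], rankings[1][0], mrr(rankings, gold))
```

On this toy corpus the script should print `Printer.printPage Editor.saveFile 1.0`. Note that `Util.tmp` is reduced to an empty document by the filter, because its words (`int`, `tmp`, `i`, `return`) carry no noun or verb content. This mirrors, in miniature, the abstract's point that noise terms are removed before any similarity score is computed.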

Key words: feature location, information retrieval, semantic similarity, POS filtering, index method, integration
