Abstract:
Feature is an executable function entity that’s defined in software system. The process of identifying the mapping relationship between the software features and source code is called feature location. Information retrieval feature location method is widely used in software maintenance, code search and other fields because of its high usability and low overhead. All the information retrieval feature location methods are based on semantic similarity calculation. However, there are two main problems: 1) There is a lot of noise data in the source code corpus. The noise data will interfere with the result of similarity calculation. 2) Different index methods’ limitation will lead to the similarity calculation results being inaccurate. To solve these problems, a semantic similarity integration method for software feature location problem is proposed. This method introduces the Part-of-Speech filtering in the preprocessing process, effectively filtering the source code noise data, and improving the accuracy of similarity calculation. Then, different index methods are integrated to calculate similarities based on the source code’s structured characteristics. Experiments are performed on the open data benchmarks. Compared with the existing methods, the POS filtering improves by an average of 30.88% on the mean reciprocal rank performance, while similarity integration improves an average of 10.28%. The experimental result verifies the effectiveness of the proposed methods.