Abstract:
Software updates and evolves continuously over time, software repositories accumulate massive data. How to effectively collect, organize, and make use of these data has become a key problem in software engineering. Mining Software Repositories (MSR) aim to mine useful knowledge contained in complex and diversified data to improve the quality and productivity of software. Although some studies have elaborately summarized the background, history, and prospects about MSR, existing studies do not present systematically the most influential author, institution, and country as well as the major research topics and their transitions over time. Therefore, this study combines the existing classical publication analysis frameworks and algorithms to analyze the relationships among publications related to MSR, and presents some important domain knowledge for researchers in detail. To effectively tackle this task, we construct a framework named MSR Publication Analysis Framework (MSR-PAF). MSR-PAF consists of three components which can be used to create a dataset for the study, conduct a bibliography analysis, and implement a collaboration pattern analysis, respectively. The results of the bibliography analysis show that the most productive author, institution, and country are Ahmed E. Hassan, University of Victoria, and USA, respectively. The most frequent keyword is software maintenance and the most influential author is Abram Hindle. In addition, the results of the collaboration pattern analysis show that Abram Hindle is the most active author, and open source project and software maintenance are the most popular research topics.