ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (1): 178-188. doi: 10.7544/issn1000-1239.2021.20190836

• Software Technology •

An ECC Multi-Label Code Smell Detection Method Based on Ranking Loss

Wang Jina, Chen Junhua, Gao Jianhua

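  1. (Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234) (wjn_wy1108@163.com)
  • Published: 2021-01-01
  • Funding: 
    National Natural Science Foundation of China (61672355)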

ECC Multi-Label Code Smell Detection Method Based on Ranking Loss

Wang Jina, Chen Junhua, Gao Jianhua   

  1. (Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234)
  • Online: 2021-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61672355).

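Abstract: A code smell is a software characteristic that arises from poor code or design problems, and it seriously affects the reliability and maintainability of software systems. Within a software system, one code element may be affected by several code smells at the same time, which noticeably lowers software quality. Multi-label classification fits this situation: putting highly co-occurring code smells into the same label group allows the correlation between code smells to be taken into account more fully. Existing multi-label code smell detection methods, however, do not consider the effect of the order in which multiple code smells in the same code element are detected. In response, a multi-label code smell detection method based on ranking loss and the ensemble of classifier chains (ECC) is proposed. The method chooses random forest as the base classifier and iterates ECC multiple times; taking the minimization of ranking loss as its objective, it selects a comparatively good set of label sequences, thereby optimizing the detection order of code smells and simulating the mechanism by which they arise. It detects whether a code element simultaneously contains any of three groups of code smells: long method-long parameter list, complex class-message chain, or message chain-blob. The experiments use 9 evaluation metrics, and the results show that the proposed method outperforms existing multi-label code smell detection methods, reaching an average F1 of 97.16%.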

Keywords: code smell, random forest, ranking loss, ensemble of classifier chains, multi-label classification
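
The abstract names ranking loss as the optimization target but does not state its formula. As a point of reference, the definition commonly used in the multi-label literature is given below; that the paper uses exactly this form is an assumption, and all symbols are introduced here for illustration rather than taken from the source.

```latex
\mathrm{RankingLoss}(f) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \frac{1}{|Y_i|\,|\bar{Y}_i|}
  \left| \left\{ (y_a, y_b) \in Y_i \times \bar{Y}_i \;:\;
  f(x_i, y_a) \le f(x_i, y_b) \right\} \right|
```

Here n is the number of code elements, Y_i is the set of code smells actually present in element x_i, \bar{Y}_i is the set of smells absent from it, and f(x_i, y) is the score the classifier assigns to smell y. A lower value is better: a loss of 0 means every present smell is scored above every absent one. In a two-label group such as long method-long parameter list, only elements with exactly one of the two smells yield a pair to compare.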

Abstract: A code smell is a software characteristic caused by poor code or design, and it seriously harms the reliability and maintainability of software systems. A single code element may be affected by several code smells at once, which significantly degrades software quality. Multi-label classification suits this case: placing code smells with high co-occurrence in the same label group makes it possible to account for the correlation between them. However, existing multi-label code smell detection methods do not consider how the order in which multiple smells are detected in the same code element affects the result. To address this, an ensemble of classifier chains (ECC) multi-label code smell detection method based on ranking loss is proposed. The method uses random forest as the base classifier and runs ECC over multiple iterations; with minimizing ranking loss as the objective, it selects a better set of label sequences, which optimizes the code smell detection order and simulates the mechanism by which code smells are generated. It detects whether a code element simultaneously exhibits one of three code smell pairs: long method-long parameter list, complex class-message chain, or message chain-blob. Nine evaluation metrics are used, and the experimental results show that the proposed method outperforms existing multi-label code smell detection methods, with an average F1 of 97.16%.

Key words: code smell, random forest, ranking loss, ensemble of classifier chains (ECC), multi-label classification
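
The procedure summarized in the abstract (random forest base classifiers, repeated ECC runs, keeping the label-sequence set with the lowest ranking loss) can be illustrated with a minimal sketch. This is one plausible reading built on scikit-learn, not the authors' implementation: the helper names `fit_ecc`, `ecc_scores`, and `select_ecc`, the hold-out validation split, the hyperparameters, and the synthetic two-label data standing in for a smell pair are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import label_ranking_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain


def fit_ecc(X, Y, n_chains=3, seed=0):
    """Fit an ensemble of classifier chains, each with a random label order."""
    rng = np.random.RandomState(seed)
    chains = []
    for _ in range(n_chains):
        # A random permutation of the label indices is this chain's order.
        order = rng.permutation(Y.shape[1]).tolist()
        base = RandomForestClassifier(
            n_estimators=20, random_state=rng.randint(10**6)
        )
        chains.append(ClassifierChain(base, order=order).fit(X, Y))
    return chains


def ecc_scores(chains, X):
    """Average the per-label probabilities predicted by all chains."""
    return np.mean([chain.predict_proba(X) for chain in chains], axis=0)


def select_ecc(X_tr, Y_tr, X_val, Y_val, n_iter=3, n_chains=3):
    """Repeat ECC n_iter times; keep the ensemble with the lowest ranking loss.

    Assumption: the loss is measured on a hold-out validation split.
    """
    best_loss, best_chains = None, None
    for it in range(n_iter):
        chains = fit_ecc(X_tr, Y_tr, n_chains=n_chains, seed=it)
        loss = label_ranking_loss(Y_val, ecc_scores(chains, X_val))
        if best_loss is None or loss < best_loss:
            best_loss, best_chains = loss, chains
    return best_loss, best_chains


if __name__ == "__main__":
    # Synthetic stand-in for one smell pair (two binary labels per element).
    X, Y = make_multilabel_classification(
        n_samples=300, n_features=10, n_classes=2, n_labels=1, random_state=0
    )
    X_tr, X_val, Y_tr, Y_val = train_test_split(
        X, Y, test_size=0.3, random_state=0
    )
    loss, chains = select_ecc(X_tr, Y_tr, X_val, Y_val)
    print("lowest ranking loss:", loss)
    print("selected label orders:", [list(c.order_) for c in chains])
```

With only two labels per group there are just two possible chain orders, so the selection step mainly decides how those two orders are mixed within the ensemble; real code metrics and smell labels would replace the synthetic data.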

CLC number: