Abstract:
A code smell is a symptom of poor code or design that seriously affects the reliability and maintainability of a software system. A single code element may be affected by several code smells at the same time, which significantly reduces software quality. Multi-label classification suits this case: placing code smells with high co-occurrence in one label group allows the correlation between them to be taken into account. However, existing multi-label code smell detection methods do not consider how the order in which code smells are detected within the same code element influences the result. To address this, an ensemble of classifier chains (ECC) multi-label code smell detection method based on ranking loss is proposed. The method minimizes ranking loss to choose an optimal set of label sequences, which addresses the detection order problem and simulates the mechanism by which code smells arise. It uses random forest as the base classifier and applies multiple iterations of ECC to detect whether a code element simultaneously exhibits long method and long parameter list, complex class and message chain, or message chain and blob. Nine evaluation metrics are used, and the experimental results show that the proposed method outperforms the existing multi-label code smell detection method, with an average F1 of 97.16%.
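
As an illustration of the selection criterion, the following minimal Python sketch computes ranking loss and keeps the label sequences whose chains achieve the lowest loss on held-out data. It is not the implementation evaluated in this work. The function names `ranking_loss` and `select_orders`, the toy labels, and the scores are hypothetical. Ties are counted as ranking errors by assumption, and the scores stand in for the outputs of classifier chains trained with each label order.

```python
from typing import Dict, List, Sequence, Tuple

Order = Tuple[int, ...]


def ranking_loss(y_true: Sequence[Sequence[int]],
                 y_score: Sequence[Sequence[float]]) -> float:
    """Mean fraction of (relevant, irrelevant) label pairs ranked wrongly.

    A pair counts as wrong when the irrelevant label scores at least as
    high as the relevant one (ties are errors; an assumption of this sketch).
    """
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t == 1]
        irrelevant = [j for j, t in enumerate(truth) if t == 0]
        if not relevant or not irrelevant:
            continue  # no pair to compare, so this sample contributes 0
        wrong = sum(1 for r in relevant for i in irrelevant
                    if scores[i] >= scores[r])
        total += wrong / (len(relevant) * len(irrelevant))
    return total / len(y_true)


def select_orders(y_true: Sequence[Sequence[int]],
                  scores_by_order: Dict[Order, Sequence[Sequence[float]]],
                  k: int) -> List[Order]:
    """Return the k label orders with the lowest ranking loss."""
    ranked = sorted(scores_by_order,
                    key=lambda order: ranking_loss(y_true, scores_by_order[order]))
    return ranked[:k]


# Hypothetical label group: index 0 = long method, index 1 = long parameter list.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]

# Hypothetical validation scores from chains trained with each label order.
scores_by_order = {
    (0, 1): [[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.2, 0.1]],
    (1, 0): [[0.4, 0.6], [0.3, 0.7], [0.8, 0.9], [0.1, 0.3]],
}

loss_01 = ranking_loss(y_true, scores_by_order[(0, 1)])
loss_10 = ranking_loss(y_true, scores_by_order[(1, 0)])
best = select_orders(y_true, scores_by_order, k=1)
print(loss_01, loss_10, best)
```

In a full pipeline, each entry of `scores_by_order` would come from a classifier chain with a random forest base classifier, and the selected orders would form the ensemble.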