Non-Contiguous Code Refactoring: A Hybrid Approach of Static Analysis and Large Language Model
Abstract
The widespread adoption of large language models (LLMs) in software engineering has made automated code refactoring, which leverages their powerful code comprehension and generation capabilities, a crucial direction for enhancing software quality and development efficiency. However, when refactoring non-contiguous code clones, which arise from statement interleaving, reordering, and similar transformations, LLMs face three core challenges: dispersed semantic context, difficulty in capturing critical dependencies, and susceptibility to "hallucination" errors. To address these challenges, we propose a novel method for non-contiguous code clone refactoring that integrates static analysis with an LLM. Our method first identifies non-contiguous clones efficiently and accurately by combining program slicing with an algebraic classifier. Next, a context-aware refactoring opportunity identification algorithm determines the optimal refactoring targets for the LLM. Finally, a chain-of-thought few-shot prompting strategy guides the LLM to generate high-quality "extract method" refactoring suggestions, and a verification mechanism inspired by metamorphic relations validates the semantic and structural consistency of the generated results. Experiments on the open-source datasets Google Code Jam and BigCloneBench demonstrate that our refactoring method reduces cloned code by 66% to 71% in real-world projects such as JUnit. Furthermore, our detection method achieves an F1-score 2% to 18% higher than existing mainstream tools. On the Community Corpus-A refactoring opportunity identification benchmark, it reaches an F1-score of 0.415, surpassing the state-of-the-art tool GEMS by 7.5% and thereby enhancing software quality.