    Ji Youqing, Zhang Yingzhou, Su Yupeng, Wang Gang, Zhang Wenzhi, Xie Jinyan. Non-Contiguous Code Refactoring: A Hybrid Approach Integrating Static Analysis and Large Language Models[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550688

    Non-Contiguous Code Refactoring: A Hybrid Approach Integrating Static Analysis and Large Language Models

    • The widespread adoption of Large Language Models (LLMs) in software engineering has made automated code refactoring, which leverages their strong code comprehension and generation capabilities, a key direction for improving software quality and development efficiency. However, when refactoring non-contiguous code clones (clones arising from statement interleaving, reordering, and similar transformations), LLMs face three core challenges: dispersed semantic context, difficulty in capturing critical dependencies, and susceptibility to “hallucination” errors. To address these challenges, we propose a non-contiguous code clone refactoring method that integrates static analysis with LLMs. Our method first identifies non-contiguous clones efficiently and accurately by combining program slicing with an algebraic classifier. Next, a context-aware refactoring opportunity identification algorithm determines the optimal refactoring targets for the LLM. Finally, a Chain-of-Thought few-shot prompting strategy guides the LLM to generate high-quality “extract method” refactoring suggestions, and a verification mechanism inspired by metamorphic relations validates the semantic and structural consistency of the generated results. Experiments on the open-source datasets Google Code Jam and BigCloneBench demonstrate that our refactoring method reduced cloned code by 66% to 71% in real-world projects such as JUnit. Furthermore, our detection method achieved an F1-score 2% to 18% higher than existing mainstream tools. On the Community Corpus-A refactoring opportunity identification benchmark, it reached an F1-score of 0.415, surpassing the state-of-the-art tool GEMS by 7.5%. These results indicate that the approach helps enhance software quality.
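    The abstract names program slicing as the first step in locating non-contiguous clones. As a rough illustration only (this is not the authors' implementation, and it omits the algebraic classifier entirely), the Python sketch below computes a data-dependence backward slice over straight-line code. Each statement is modeled by the variables it defines and uses; the `Stmt`, `stmt`, `backward_slice` and `slice_text` names and the two toy fragments (including the `read()`, `open_log()` and `write()` calls inside their source strings) are hypothetical. Slicing from the final `print(c)` in a fragment whose computation is interleaved with unrelated logging recovers the same statements as the contiguous version, which is the property that makes interleaved clones detectable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One straight-line statement: source text, variables written, variables read."""
    text: str
    defs: frozenset
    uses: frozenset


def stmt(text, defs=(), uses=()):
    """Hypothetical convenience constructor for the toy examples below."""
    return Stmt(text, frozenset(defs), frozenset(uses))


def backward_slice(stmts, criterion):
    """Indices of statements that stmts[criterion] depends on via data flow.

    Assumes straight-line code (no branches, loops, aliasing or calls), so only
    def-use chains are followed; control dependence is ignored in this sketch.
    """
    in_slice = {criterion}
    needed = set(stmts[criterion].uses)
    # Walk backwards: a statement joins the slice if it defines a pending read.
    for i in range(criterion - 1, -1, -1):
        s = stmts[i]
        if s.defs & needed:
            in_slice.add(i)
            needed -= s.defs  # this definition satisfies those pending reads
            needed |= s.uses  # ...but introduces reads of its own
    return sorted(in_slice)


def slice_text(stmts, criterion):
    """Source text of the sliced statements, in program order."""
    return [stmts[i].text for i in backward_slice(stmts, criterion)]


# Fragment A (hypothetical): the computation written contiguously.
frag_a = [
    stmt("a = read()", defs=("a",)),
    stmt("b = a * 2", defs=("b",), uses=("a",)),
    stmt("c = b + 1", defs=("c",), uses=("b",)),
    stmt("print(c)", uses=("c",)),
]

# Fragment B (hypothetical): the same computation interleaved with logging.
frag_b = [
    stmt("log = open_log()", defs=("log",)),
    stmt("a = read()", defs=("a",)),
    stmt("n = 0", defs=("n",)),
    stmt("b = a * 2", defs=("b",), uses=("a",)),
    stmt("n = n + 1", defs=("n",), uses=("n",)),
    stmt("c = b + 1", defs=("c",), uses=("b",)),
    stmt("write(log, n)", uses=("log", "n")),
    stmt("print(c)", uses=("c",)),
]

print(backward_slice(frag_b, 7))                       # [1, 3, 5, 7]
print(slice_text(frag_a, 3) == slice_text(frag_b, 7))  # True
```

    Practical slicers also track control dependence and handle loops, aliasing and method calls; matching clones whose statements are reordered would additionally require comparing slices under an order-insensitive normal form rather than as ordered lists.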