    Cui Xing, Wu Jingzheng, Luo Tianyue, Ling Xiang, Wang Xu. An LLM-Based Framework for README Generation via Code-Aware Representation and Dual-Stage Optimization[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550698

    An LLM-Based Framework for README Generation via Code-Aware Representation and Dual-Stage Optimization

    With the rapid growth of the open-source software (OSS) ecosystem, adopting OSS has become a mainstream development practice, and README files are a critical resource for understanding and reusing it. Recent research explores automatic README generation and completion, but existing approaches have three limitations: they apply poorly across programming languages, they neglect code structure, and they are prone to hallucination and subjective wording. To address these challenges, this paper proposes RMancer, a dual-stage README generation framework that integrates large language models (LLMs) with code structure modeling. In the first stage, RMancer introduces a prompt-guided structured information extraction method, enhanced with static analysis to construct high-quality training data, so that the model can accurately capture file-level functional descriptions, dependency relations, and program entry points. In the second stage, RMancer applies a topology-based sorting strategy derived from the call graph to reconstruct execution logic and build structured input contexts. It further adopts a multi-task supervision mechanism that jointly learns document structure and content generation, improving logical consistency and objectivity. A post-generation standardization strategy is also incorporated to ensure the formatting and factuality of the generated README files. Evaluations on 16,692 OSS projects show that RMancer consistently outperforms state-of-the-art methods in both information extraction and README generation. It improves the F1-score by 2.34% on average across key fields and gains 1.37% on average across BLEU, METEOR, and ROUGE-L. RMancer also leads on the AlignScore and G-Eval metrics, with better objectivity and redundancy control, confirming the effectiveness of its structure-aware and multi-task optimization strategies.
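
    The topology-based sorting step in the second stage can be illustrated with a minimal sketch. This is not RMancer's implementation: the abstract does not specify the ordering direction, the graph granularity, or how cycles are handled. The sketch assumes a file-level call graph, assumes that callees should precede their callers, and uses hypothetical names (`order_files`, `build_context`, and the sample file names and summaries).

```python
from graphlib import TopologicalSorter


def order_files(call_graph: dict[str, set[str]]) -> list[str]:
    """Order files so that each file appears after the files it calls.

    `call_graph` maps a file to the set of files it calls (an assumed,
    file-level simplification of a call graph).
    """
    # TopologicalSorter treats each mapping value as the node's
    # predecessors, so callees are emitted before their callers.
    # A cyclic graph raises graphlib.CycleError; cycles are not handled here.
    return list(TopologicalSorter(call_graph).static_order())


def build_context(call_graph: dict[str, set[str]], summaries: dict[str, str]) -> str:
    """Concatenate per-file summaries in topological order (hypothetical format)."""
    ordered = order_files(call_graph)
    return "\n".join(f"[{path}] {summaries.get(path, '')}" for path in ordered)


# Hypothetical project: main.py is the entry point.
call_graph = {
    "main.py": {"cli.py", "core.py"},
    "cli.py": {"core.py"},
    "core.py": {"utils.py"},
    "utils.py": set(),
}
summaries = {
    "main.py": "Program entry point.",
    "cli.py": "Command-line argument parsing.",
    "core.py": "Core processing logic.",
    "utils.py": "Shared helper functions.",
}

print(build_context(call_graph, summaries))
```

    In this sketch the resulting context lists the lowest-level helpers first and the entry point last; reversing the list would give an entry-point-first order if that suits the downstream prompt better.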