An LLM-Based Framework for README Generation via Code-Aware Representation and Dual-Stage Optimization

Cui Xing; Wu Jingzheng; Luo Tianyue; Ling Xiang; Wang Xu

doi:10.7544/issn1000-1239.202550698

Citation: Cui Xing, Wu Jingzheng, Luo Tianyue, Ling Xiang, Wang Xu. An LLM-Based Framework for README Generation via Code-Aware Representation and Dual-Stage Optimization[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550698

Abstract

With the rapid growth of the open-source software (OSS) ecosystem, OSS adoption has become a mainstream development practice, and README files serve as a critical resource for understanding and reusing OSS. Although recent research explores automatic README generation and completion, existing approaches have limited cross-language applicability, neglect code structure, and are prone to hallucination and subjective content. To address these challenges, we propose RMancer, a dual-stage README generation framework that integrates large language models (LLMs) with code structure modeling. In the first stage, RMancer introduces a prompt-guided method for structured information extraction, enhanced with static analysis to construct high-quality training data, so that the model accurately captures file-level functional descriptions, dependency relations, and program entry points. In the second stage, RMancer applies a topology-based sorting strategy derived from the call graph to reconstruct execution logic and build structured input contexts. It further adopts a multi-task supervision mechanism that jointly learns document structure and content generation, improving logical consistency and objectivity. A post-generation standardization strategy is also incorporated to ensure the formatting and factuality of the generated README files. Evaluations on 16,692 OSS projects show that RMancer consistently outperforms state-of-the-art methods in both information extraction and README generation. It achieves an average F1-score improvement of 2.34% across key fields and an average gain of 1.37% on BLEU, METEOR, and ROUGE-L. RMancer also leads on the AlignScore and G-Eval metrics, with superior objectivity and redundancy control, confirming the effectiveness of its structure-aware and multi-task optimization strategies.
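To make the topology-based sorting idea concrete, the following is a minimal sketch of ordering source files from a file-level call graph with Kahn's algorithm, so that callers precede callees. It is an illustration under stated assumptions, not RMancer's implementation: the function name `topological_file_order`, the `(caller_file, callee_file)` edge format, the caller-first direction, and the handling of cycles are all hypothetical choices that the abstract does not specify.

```python
from collections import defaultdict, deque


def topological_file_order(call_edges, files):
    """Return files ordered so that callers appear before their callees.

    call_edges: iterable of (caller_file, callee_file) pairs (assumed format).
    files: iterable of all file names in the project.
    Files that sit on a call cycle cannot be ordered topologically; this
    sketch simply appends them at the end in sorted order (an assumption).
    """
    successors = defaultdict(set)
    indegree = {f: 0 for f in files}

    for caller, callee in call_edges:
        indegree.setdefault(caller, 0)
        indegree.setdefault(callee, 0)
        # Skip self-calls and duplicate edges so in-degrees stay accurate.
        if caller == callee or callee in successors[caller]:
            continue
        successors[caller].add(callee)
        indegree[callee] += 1

    # Start from files nobody calls (likely entry points); sort for determinism.
    ready = deque(sorted(f for f, d in indegree.items() if d == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in sorted(successors[current]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    placed = set(order)
    leftover = sorted(f for f in indegree if f not in placed)
    return order + leftover


if __name__ == "__main__":
    # Hypothetical project: main.py calls cli.py and utils.py, and so on.
    edges = [
        ("main.py", "cli.py"),
        ("cli.py", "core.py"),
        ("core.py", "utils.py"),
        ("main.py", "utils.py"),
    ]
    print(topological_file_order(edges, ["main.py", "cli.py", "core.py", "utils.py"]))
```

An ordering of this kind could then be used to concatenate per-file summaries into a structured input context that follows the execution flow from the entry point outward.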
