    He Manman, Liu Longzheng, Ding Kaifeng, Ge Mingliang, Zhang Yuqi, Ren Aoyu, Guo Weilong, Wang Yaojun. Construction and Application of a Fine-Tuned Evo2-Based Model for Genomic Sequence Generation in Major Crops[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202660003

    Construction and Application of a Fine-Tuned Evo2-Based Model for Genomic Sequence Generation in Major Crops

    This study investigates fine-tuning and performance assessment of the Evo2 genomic language model for conditional sequence generation, with two aims: to construct models that generate genomic functional-element sequences for major staple crops, and to evaluate how well those models generalize across species. Six representative crop genomes were selected: potato, peanut, the indica rice cultivars MH63 and ZS97, japonica rice, and wheat. A unified conditional generation and evaluation framework was established for six core genomic functional elements: coding sequences (CDS), exons, introns, mRNA, 5' untranslated regions (5' UTR), and 3' untranslated regions (3' UTR). Within this framework, Evo2 models were fine-tuned on crop-specific genomic data, and the pretrained Evo2 model without fine-tuning was retained as a zero-shot baseline. Experimental results show that fine-tuning consistently improves generation performance across all functional elements and crop species, with an average sequence-similarity gain of approximately 0.49% over the zero-shot baseline. Cross-species evaluation further shows that the fine-tuned models generalize well to unseen crop species, with performance degradation below 1% for all six functional elements. Coding-related elements such as CDS and mRNA show the most stable generation performance across species, whereas 5' UTR is more sensitive to species differences. Additional bioinformatics analyses indicate that the generated sequences are biologically plausible to some degree in terms of ORF continuity, but remain limited in codon preference, in regulatory motifs within UTRs, and in splice-site rules within introns. Overall, the results support the effectiveness and cross-species applicability of Evo2-based fine-tuned genomic language models for functional-element sequence generation in major crops, providing a potential technical basis for genome-assisted breeding and crop improvement.
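    The abstract names ORF continuity as one plausibility check but does not say how it was computed. As a rough illustration only, the Python sketch below scores a generated sequence by the fraction of its length covered by the longest forward-strand open reading frame (ATG to the first in-frame stop, standard genetic code). The function name `longest_orf_fraction` and the scoring rule are assumptions made here for illustration; they are not the authors' method.

```python
# Hypothetical illustration: the paper's actual ORF-continuity metric is not
# described in the abstract. This sketch assumes the standard stop codons and
# scans only the three forward-strand reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_fraction(seq: str) -> float:
    """Return the fraction of `seq` covered by its longest ATG..stop ORF."""
    seq = seq.upper()
    if not seq:
        return 0.0
    best = 0
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # ORF length includes both the start and the stop codon.
                best = max(best, i + 3 - start)
                start = None
    return best / len(seq)


if __name__ == "__main__":
    # An intact toy CDS versus one interrupted by a premature stop codon.
    print(longest_orf_fraction("ATGAAATAA"))
    print(longest_orf_fraction("ATGTAAAAA"))
```

    Under this assumed score, a generated CDS with a premature stop codon scores lower than one that reads through to a terminal stop, which is the kind of difference an ORF-continuity analysis is meant to expose.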