
    LEMON: LLM-Driven Ensemble Reduction and Verilog Generation for Efficient Hardware Implementation

    • Abstract: Ensemble learning has been widely adopted to improve the generalization performance of artificial intelligence models. However, the additional memory overhead and computational cost of ensemble models limit their deployment in resource-constrained scenarios. To address this challenge, this paper proposes LEMON, a large language model (LLM)-driven framework for ensemble reduction and Verilog HDL generation, designed to enable efficient hardware implementation. By leveraging the strengths of LLMs in text understanding, generation, and complex problem solving, LEMON significantly reduces the memory footprint and computational demands of ensemble models. Extensive evaluations on 20 public datasets show that, while maintaining a compact model size, LEMON matches or exceeds the prediction accuracy of the original ensemble methods on the majority of test sets. Notably, compared to state-of-the-art techniques, LEMON achieves speedups exceeding 9× and 338× in ensemble reduction, demonstrating strong scalability across diverse scenarios. Furthermore, LEMON integrates an LLM-based automated hardware deployment pipeline that significantly simplifies the transition from software models to efficient FPGA implementations. Compared to conventional random forest implementations, LEMON reduces power consumption by over 64.7% and improves hardware resource utilization by more than 92.6%, making it better suited to edge and embedded applications.

       
