    Lai Yuanming, Li Yalong, Hu Hanzhi, Xie Mengyao, Wang Zhe, Wu Chenggang. SIMD-RVV Dynamic Binary Translation Optimization: Redundant Configuration Elimination and Hybrid Translation-Driven Cross-Architecture Programming Model Adaptation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550135

    SIMD-RVV Dynamic Binary Translation Optimization: Redundant Configuration Elimination and Hybrid Translation-Driven Cross-Architecture Programming Model Adaptation

    RISC-V, renowned for its open-source nature and modular design, has achieved remarkable success in embedded systems and is progressively expanding into the high-performance computing (HPC) domain. While RISC-V hardware tailored for HPC, such as the Sophon SG2042 multi-core processor, has demonstrated performance comparable to its X86/ARM counterparts, its underdeveloped software ecosystem remains a critical barrier to broader adoption. To address this challenge, we developed RVBT, a process-level dynamic binary translator for RISC-V, designed to bridge the software gap by efficiently porting the mature X86 ecosystem to RISC-V platforms, thereby accelerating RISC-V’s integration into HPC applications. Focusing on the pervasive use of SIMD instructions in HPC programs, this study tackles the inefficiencies arising from fundamental differences between the programming models of X86 SIMD and the RISC-V Vector (RVV) extension. Specifically, X86 SIMD hardcodes data types in its opcodes, whereas RVV configures them dynamically through the vtype and mask registers, so direct translation produces redundant configuration operations. To overcome this, we propose the following optimizations: 1) Redundancy elimination via data type locality. By leveraging the locality of data types in adjacent SIMD operations, we statically analyze and remove redundant configurations of vtype (achieving 100% dynamic elimination rates for csrr and vsetvl, and 56.31% for vsetvli) and redundant mask settings (a 74.66% elimination rate in floating-point benchmarks). 2) Hybrid translation with on-demand synchronization. We decouple scalar and vectorized floating-point operations, translating X86 SIMD scalar double-precision instructions to RISC-V’s floating-point extensions and reserving RVV for vectorized operations. Data synchronization between scalar and vector registers is optimized through def-use analysis, achieving a 67.35% reduction in dynamic synchronization operations in floating-point benchmarks.
    Experimental results on SPEC CPU 2006 demonstrate significant improvements: the optimized RVBT achieves 47.39% and 40.06% of native execution efficiency for integer and floating-point benchmarks, respectively, representing speedups of 1.21× and 8.31× over the unoptimized version. RVBT vastly outperforms QEMU (18.84% and 4.81% for integer and floating-point benchmarks, respectively), with floating-point efficiency 8.33 times that of QEMU, highlighting its potential for deployment in certain HPC scenarios. Crucially, these optimizations are architecture-agnostic: the methodology of exploiting data type locality, hybrid instruction translation, and adaptive synchronization applies equally to translating ARM SIMD (e.g., NEON) to RVV, offering a universal framework for cross-ISA binary compatibility. This work provides a pivotal technical foundation for breaking the software ecosystem deadlock and advancing RISC-V’s role in HPC.
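
    The idea behind redundancy elimination via data type locality can be illustrated with a minimal sketch. This is not RVBT's implementation, which the abstract does not detail. It assumes a single straight-line block, models only the element width (SEW) part of vtype, and uses abbreviated pseudo-assembly strings such as "vsetvli e32" rather than full RVV syntax. The names VecOp, naive_translate, and translate_with_elimination are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VecOp:
    """One X86 SIMD instruction and the element width (in bits) its opcode implies."""
    name: str
    sew: int


def naive_translate(ops: List[VecOp]) -> List[str]:
    """Direct translation: reconfigure vtype before every vector operation."""
    out: List[str] = []
    for op in ops:
        out.append(f"vsetvli e{op.sew}")  # abbreviated, hypothetical operand form
        out.append(op.name)
    return out


def translate_with_elimination(ops: List[VecOp]) -> List[str]:
    """Emit a vtype configuration only when the required element width changes.

    Assumption: the configuration is unknown at block entry, and nothing else
    in the block modifies vtype between the listed operations.
    """
    out: List[str] = []
    current_sew: Optional[int] = None  # unknown on entry, so the first op always configures
    for op in ops:
        if op.sew != current_sew:
            out.append(f"vsetvli e{op.sew}")
            current_sew = op.sew
        out.append(op.name)
    return out


def count_config(code: List[str]) -> int:
    """Count the configuration instructions in a translated block."""
    return sum(1 for line in code if line.startswith("vsetvli"))


if __name__ == "__main__":
    block = [
        VecOp("addps", 32), VecOp("mulps", 32),   # packed single precision
        VecOp("addpd", 64), VecOp("subpd", 64),   # packed double precision
        VecOp("paddd", 32),                       # packed 32-bit integers
    ]
    print(count_config(naive_translate(block)))
    print(count_config(translate_with_elimination(block)))
```

    Adjacent operations on the same data type share one configuration, which is the locality the abstract refers to. A real translator would also have to account for control flow between blocks, vector length, and mask state, all of which this sketch ignores.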