    Citation: Wang Jun, Pang Jianmin, Fu Liguo, Yue Feng, Shan Zheng, Zhang Jiahao. A Dynamic and Static Combined Register Mapping Method in Binary Translation[J]. Journal of Computer Research and Development, 2019, 56(4): 708-718. DOI: 10.7544/issn1000-1239.2019.20170905

    A Dynamic and Static Combined Register Mapping Method in Binary Translation

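    • Abstract (translated from the Chinese): When mapping registers, the binary translator QEMU (quick emulator) does not account for differences in register demand between basic blocks or between loop bodies. This causes unnecessary register spills and, in turn, redundant memory accesses. To address the problem, the paper introduces static mapping of global registers and dynamic allocation of local registers, and proposes an efficient priority-based register mapping optimization algorithm that combines the two. The algorithm first maps global registers statically, based on usage statistics for the source platform's registers and on the lifetime of each variable. It then uses the mapping between the intermediate representation and the source-platform registers to count how often the intermediate instructions in a basic block require each register, and sorts these counts to set the allocation priority. Finally, it allocates registers dynamically in priority order, which reduces the number of register spills, lowers the expansion rate and memory access count of the generated native code, and improves the performance of the target program. Tests on NBENCH, typical recursive programs, and SPEC2006 show that the algorithm effectively reduces memory accesses in the native code and improves performance, with average gains over the unoptimized version of 8.67%, 825%, and 8.10%, respectively.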

       

      Abstract: Register mapping in binary translation ignores the differences in register requirements among basic blocks and loop blocks, which causes unnecessary register spills and redundant memory accesses. To reduce this overhead, an efficient priority-based register mapping optimization algorithm is proposed that combines static allocation of global registers with dynamic allocation of local registers. First, global registers are mapped statically, according to the statistical usage of the different registers on the source platform and the lifetimes of variables, which reduces the spill cost and maintenance overhead of global registers. Then, the number of register requests made by the intermediate instructions is obtained from the intermediate representation and used to determine the priority of register allocation. Finally, registers are allocated dynamically in priority order, which reduces the number of register spills, the expansion rate of the generated native code, and the number of memory accesses, and thus improves the performance of the target program. Test results on NBENCH, representative recursive programs, and SPEC2006 show that the algorithm effectively reduces the memory accesses of the native code and improves program performance by an average of 8.56%, 8.14%, and 8.01%, respectively.
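
      The priority step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `allocate_by_priority`, the tuple-based IR format, and the register names are all hypothetical, and the sketch assumes that priority is simply the per-block reference count, with the statically mapped registers passed in as `pinned`.

```python
from collections import Counter


def allocate_by_priority(block_ops, host_regs, pinned=None):
    """Map the most-demanded guest registers of one basic block to host registers.

    block_ops: list of tuples, each naming the guest registers one IR op
               references (hypothetical format, for illustration only).
    host_regs: host registers that may be used by the translator.
    pinned:    guest -> host pairs fixed by the static (global) mapping.
    Returns (mapping, spilled), where spilled lists guest registers left in memory.
    """
    pinned = dict(pinned or {})
    # Count how often each non-pinned guest register is needed in this block.
    demand = Counter(r for op in block_ops for r in op if r not in pinned)
    # Higher demand means higher priority; ties are broken by name for determinism.
    ranked = sorted(demand, key=lambda r: (-demand[r], r))
    # Host registers already taken by the static mapping are not available.
    taken = set(pinned.values())
    free = [h for h in host_regs if h not in taken]
    mapping = dict(pinned)
    mapping.update(zip(ranked, free))
    # Guest registers beyond the number of free host registers stay in memory.
    spilled = ranked[len(free):]
    return mapping, spilled


block = [("eax", "ebx"), ("eax", "ecx"), ("eax", "ebx"), ("esp", "edx")]
mapping, spilled = allocate_by_priority(block, ["r0", "r1", "r2"], pinned={"esp": "r0"})
print(mapping, spilled)
```

      In this sketch the registers used most often inside the block receive host registers first, so the spills fall on the registers that are referenced least, which is the effect the abstract attributes to priority ordering.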

       
