ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (4): 708-718. doi: 10.7544/issn1000-1239.2019.20170905

• System Architecture •

An Optimization Method for Register Allocation Combining Dynamic and Static Approaches in Binary Translation

Wang Jun, Pang Jianmin, Fu Liguo, Yue Feng, Shan Zheng, Zhang Jiahao

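  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing (Strategic Support Force Information Engineering University), Zhengzhou 450002) (wj_xd@foxmail.com)
  • Published: 2019-04-01
  • Supported by: 
    National Natural Science Foundation of China (61520106005, 61761136014); National Key Research and Development Program of China (2017YFB1010000)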

A Dynamic and Static Combined Register Mapping Method in Binary Translation

Wang Jun, Pang Jianmin, Fu Liguo, Yue Feng, Shan Zheng, Zhang Jiahao   

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing (Strategic Support Force Information Engineering University), Zhengzhou 450002)
  • Online: 2019-04-01

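Abstract (translated from the Chinese): When mapping registers, the binary translator QEMU (quick emulator) does not consider how register requirements differ between basic blocks and between loop bodies, which causes unnecessary register spills and therefore redundant memory access overhead. To address this problem, the ideas of static mapping of global registers and dynamic allocation of local registers are introduced, and an efficient priority-based register mapping optimization algorithm that combines the two is proposed. The algorithm first maps global registers statically, based on the usage statistics of the different registers on the source platform and the lifetime of each variable. It then uses the mapping between the intermediate representation and the source-platform registers to obtain the number of times the intermediate instructions of a basic block request each register, and sorts these counts to determine the register allocation priority. Finally, it allocates registers dynamically in priority order, which reduces the number of register spills, lowers the expansion rate and memory access count of the generated native code, and improves the performance of the target program. Tests on NBENCH, typical recursive programs, and SPEC2006 show that the algorithm effectively reduces the memory accesses of the native code and improves program performance, with average improvements over the unoptimized version of 8.67%, 8.25%, and 8.10%, respectively.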

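Keywords (translated from the Chinese): binary translation, register allocation, QEMU translator, feedback-based static binary translator FD-SQEMU, TCG intermediate representation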

Abstract: Register mapping in binary translation ignores how register requirements differ among basic blocks and loop bodies, which causes unnecessary register spills and redundant memory accesses. To reduce this overhead, an efficient priority-based register mapping optimization algorithm is proposed that combines static allocation of global registers with dynamic allocation of local registers. First, global registers are mapped statically, according to the usage statistics of the different registers on the source platform and the lifetimes of variables, which reduces the spill cost and maintenance overhead of global registers. Then, based on the intermediate representation, the number of times the intermediate instructions request registers is obtained, and this count determines the register allocation priority. Finally, registers are allocated dynamically in priority order, which reduces the number of register spills, the expansion rate of the generated native code, and the number of memory accesses, and thereby improves the performance of the target program. Test results on NBENCH, representative recursive programs, and SPEC2006 show that the algorithm effectively reduces the memory accesses of the native code and improves program performance by 8.56%, 8.14%, and 8.01% on average, respectively.

Key words: binary translation, register allocation, quick emulator (QEMU), feedback static QEMU (FD-SQEMU), TCG intermediate code
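
The two stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use QEMU or TCG data structures. The function names (`pick_global_map`, `allocate_block`), the representation of a basic block as tuples of guest register names, and the host register names are all assumptions made for illustration. Lifetime analysis is omitted: the static stage here ranks guest registers by usage count only.

```python
from collections import Counter


def pick_global_map(usage_counts, global_host_regs):
    """Static stage (illustrative): pin the most frequently used guest
    registers to the host registers reserved for global mapping."""
    # Rank guest registers by descending usage; break ties by name.
    hot = sorted(usage_counts, key=lambda r: (-usage_counts[r], r))
    # zip stops once the reserved host registers run out.
    return dict(zip(hot, global_host_regs))


def allocate_block(block_ops, global_map, local_host_regs):
    """Dynamic stage (illustrative): within one basic block, count how often
    each guest register that is not globally mapped is requested, then hand
    out the remaining host registers in descending order of demand."""
    demand = Counter(
        reg for op in block_ops for reg in op if reg not in global_map
    )
    ranked = sorted(demand, key=lambda r: (-demand[r], r))
    mapping = dict(global_map)
    mapping.update(zip(ranked, local_host_regs))
    # Registers that did not receive a host register must live in memory.
    spilled = ranked[len(local_host_regs):]
    return mapping, spilled


if __name__ == "__main__":
    # Hypothetical profile and block; all names are for illustration only.
    usage = {"esp": 900, "ebp": 700, "eax": 500, "ecx": 100}
    global_map = pick_global_map(usage, ["r14", "r15"])
    ops = [("eax", "ecx"), ("eax", "edx"), ("eax", "ecx"), ("esp", "eax")]
    mapping, spilled = allocate_block(ops, global_map, ["rbx", "r12"])
    print(mapping)
    print(spilled)
```

In this toy block, `eax` is requested most often and so receives a host register first, while the least requested register is the one left in memory. This ordering is the intuition behind allocating by priority rather than in order of first use.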

CLC number: