Code Cache Optimization Schemes Based on Fine-Grained State Label

Niu Gen (牛根); Zhang Fuxin (张福新)

doi:10.7544/issn1000-1239.202330856

Abstract

Software code caches are widely used in dynamic binary translators to manage the code blocks generated by translation. The translation, flushing, and memory footprint of code blocks are key metrics for a software code cache. So far, little research has addressed code caches in system-level dynamic binary translators. Existing system-level dynamic binary translators all use a state label scheme to emulate instruction semantics correctly and efficiently, but this scheme introduces additional problems for software code cache management. Through an in-depth analysis of the state label scheme, two classes of problems that it causes for code cache management are identified: conflicts and redundancy. To address these two problems, code cache optimization schemes based on fine-grained state labels are proposed, comprising a multi-state code cache scheme and a weak state label scheme. Both schemes are implemented in LATX-SYS and evaluated on the Loongson LoongArch platform by booting the Ubuntu/x86 16.04 and Windows XP/x86 guest operating systems. The results show that the numbers of code block flushes and translations are reduced by 43% and 18%, respectively, and that the code block similarity ratio drops from 59.63% to 5.06%, so both translation overhead and memory footprint are reduced. Overall, system boot time is reduced by 20%. Finally, further tests of the weak state label scheme on SPEC CPU2000 show that the number of code blocks decreases by 13% on average, at a performance overhead of only 2% to 3%.
