ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展)

一种基于持久化栈的返回地址预测器

谭弘泽,王剑   

  1. (处理器芯片全国重点实验室(中国科学院计算技术研究所) 北京 100190)(中国科学院计算技术研究所 北京 100190)(中国科学院大学 北京 100049) (tanhongze20b@ict.ac.cn)
  • 出版日期: 2022-08-24

A Return Address Predictor Based on Persistent Stack

Tan Hongze, Wang Jian   

  1. (State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)(University of Chinese Academy of Sciences, Beijing 100049)
  • Online: 2022-08-24

摘要: 分支预测允许处理器并行执行分支之后的指令,由于其高准确率具有性能和功耗方面的双重好处,是一项重要的处理器优化技术.根据分而治之的策略,返回地址栈将过程返回类分支单独分出并予以预测.其中,返回地址栈利用过程调用和返回的后入先出规则,可通过猜测执行中调用栈的模拟准确预测返回地址.不幸的是,由于实际处理器猜测执行带来的错误路径污染,该结构需要通过恢复机制来保障所存储数据的准确性.尤其在对面积资源敏感的嵌入式领域,设计者需要在准确率和恢复机制的开销间进行细致的权衡.针对返回地址栈存储中的冗余,通过溢出检测结合传统栈、持久化栈和后备预测3种预测方式,提出一种基于持久化栈的返回地址预测器——混合返回地址栈,避免错误路径污染和对返回地址的冗余存储,从而有效降低返回误预测率.与此同时,设计解耦传统栈和持久化栈,进一步降低其面积需求.根据SPEC CPU 2000基准测试以及设计编译器的评估结果,混合返回地址栈可利用仅1.1×10⁴ μm²的设计面积将过程返回误预测降至2.4×10⁻³ MPKI,其误预测相比现有返回地址栈可降低96%.

关键词: 返回地址预测, 猜测执行, 污染恢复, 持久化, 后备预测

Abstract: Branch prediction is an essential optimization for both the performance and power of modern processors: it allows instructions after a branch to be executed speculatively in parallel. Unlike general branches, procedure returns can be predicted separately with a return-address stack (RAS). By speculatively emulating the call stack according to the last-in-first-out rule for procedure calls and returns, the RAS predicts return addresses accurately. Unfortunately, because speculative execution in real processors causes wrong-path corruption, the RAS needs a repair mechanism to keep its stored contents accurate. Especially for embedded processors, which are area-sensitive, designers must carefully trade off accuracy against the overhead of the repair mechanism. To address the redundancy in RAS storage, we introduce Hybrid RAS, a return-address predictor based on a persistent stack. By combining three prediction modes (the classical stack, the persistent stack, and backup prediction) through overflow detection, our proposal avoids both wrong-path corruption and redundant storage of return addresses, and thereby effectively reduces the return misprediction rate. In addition, the classical stack is decoupled from the persistent stack to further reduce the area requirement. According to evaluations with the SPEC CPU 2000 benchmarks and Design Compiler, Hybrid RAS achieves 2.4×10⁻³ MPKI for procedure returns with a design area of only 1.1×10⁴ μm², reducing mispredictions by over 96% compared with the state-of-the-art return-address stack.

Key words: return address prediction, speculative execution, corruption recovery, persistence, backup prediction
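
The wrong-path corruption described in the abstract can be illustrated with a minimal sketch. It is not the Hybrid RAS design itself: the class names `CircularRAS`, `PersistentRAS`, and `Node`, the helper `run_wrong_path`, the addresses, and the pointer-only repair scheme are all assumptions chosen for illustration. The sketch compares a circular-buffer stack, whose entries can be overwritten on a wrong path, with a persistent (immutable, linked) stack, where restoring a saved top pointer recovers the whole stack.

```python
from dataclasses import dataclass
from typing import Optional


class CircularRAS:
    """Hypothetical classical RAS: circular buffer, repaired by restoring only the top pointer."""

    def __init__(self, size: int = 4) -> None:
        self.entries = [0] * size
        self.top = 0  # index of the next free slot

    def push(self, return_addr: int) -> None:
        self.entries[self.top % len(self.entries)] = return_addr
        self.top += 1

    def pop(self) -> int:
        self.top -= 1
        return self.entries[self.top % len(self.entries)]

    def checkpoint(self) -> int:
        return self.top

    def restore(self, saved_top: int) -> None:
        self.top = saved_top  # entry contents are NOT restored


@dataclass(frozen=True)
class Node:
    """Immutable stack entry; a push never overwrites an older entry."""
    return_addr: int
    parent: Optional["Node"]


class PersistentRAS:
    """Hypothetical persistent stack: each push allocates a new node linked to the old top."""

    def __init__(self) -> None:
        self.top: Optional[Node] = None

    def push(self, return_addr: int) -> None:
        self.top = Node(return_addr, self.top)

    def pop(self) -> Optional[int]:
        if self.top is None:
            return None  # empty stack: no prediction available
        addr = self.top.return_addr
        self.top = self.top.parent
        return addr

    def checkpoint(self) -> Optional[Node]:
        return self.top

    def restore(self, saved_top: Optional[Node]) -> None:
        self.top = saved_top  # old nodes are intact, so the whole stack is recovered


def run_wrong_path(ras) -> Optional[int]:
    """Call on the correct path, then a wrong-path return + call, then repair and predict."""
    ras.push(0x100)            # correct path: call with return address 0x100
    saved = ras.checkpoint()   # a later branch is predicted; the stack state is saved
    ras.pop()                  # wrong path: a return pops the entry
    ras.push(0xBAD)            # wrong path: a call pushes a bogus return address
    ras.restore(saved)         # branch misprediction detected: repair the stack
    return ras.pop()           # correct path: predict the real return


print(hex(run_wrong_path(CircularRAS())))    # corrupted prediction
print(hex(run_wrong_path(PersistentRAS())))  # correct prediction
```

In this sketch the circular buffer predicts the bogus address after repair because the wrong-path call overwrote the slot, while the persistent stack still predicts `0x100`. A hardware predictor would additionally need bounded storage and reclamation of old entries, which the sketch leaves out.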
