A Return Address Predictor Based on Persistent Stack

Tan Hongze (谭弘泽); Wang Jian (王剑)

doi:10.7544/issn1000-1239.202111274

Abstract

Branch prediction lets a processor speculatively execute the instructions that follow a branch in parallel. Because of its high accuracy, it benefits both performance and power, which makes it an essential processor optimization. Following a divide-and-conquer strategy, the return-address stack (RAS) separates procedure returns from general branches and predicts them on their own. The RAS exploits the last-in-first-out rule of procedure calls and returns: by emulating the call stack during speculative execution, it predicts return addresses accurately. However, speculative execution in real processors corrupts the RAS with wrong-path updates, so the structure needs a repair mechanism to keep its stored data accurate. In the area-sensitive embedded domain in particular, designers must carefully trade off accuracy against the overhead of the repair mechanism. To address the redundancy in RAS storage, we propose the hybrid return-address stack (HRAS), a return-address predictor based on a persistent stack. HRAS uses overflow detection to combine three prediction methods (the classical stack, the persistent stack, and backup prediction), which avoids both wrong-path corruption and redundant storage of return addresses and thereby effectively lowers the return misprediction rate. In addition, the design decouples the classical stack from the persistent stack to further reduce area. According to evaluation with the SPEC CPU 2000 benchmarks and Design Compiler, HRAS reduces return mispredictions per kilo instructions (MPKI) to 2.4×10⁻³ with a design area of only 1.1×10⁴ μm², cutting mispredictions by over 96% compared with the state-of-the-art RAS.
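The wrong-path corruption mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classical circular RAS whose only repair mechanism is a checkpoint of the top-of-stack pointer; the class name, method names, and addresses are hypothetical, and this is not the HRAS design proposed in the paper.

```python
class ReturnAddressStack:
    """Classical fixed-size circular return-address stack (illustrative only)."""

    def __init__(self, size=4):
        self.entries = [0] * size
        self.top = 0  # index of the next free slot

    def push(self, return_addr):
        # Predicted call: record the address of the instruction after the call.
        self.entries[self.top % len(self.entries)] = return_addr
        self.top += 1

    def pop(self):
        # Predicted return: the most recently pushed address is the prediction.
        self.top -= 1
        return self.entries[self.top % len(self.entries)]

    def checkpoint(self):
        # Assumed repair scheme: save only the pointer, not the contents.
        return self.top

    def restore(self, saved_top):
        self.top = saved_top


ras = ReturnAddressStack()
ras.push(0x100)                    # call A, return address 0x100
ras.push(0x200)                    # call B, return address 0x200
saved = ras.checkpoint()           # a branch is predicted here

# Wrong path: a speculative return followed by a speculative call.
wrong_path_prediction = ras.pop()  # pops 0x200
ras.push(0x999)                    # overwrites the slot that held 0x200

# The branch turns out to be mispredicted; only the pointer is repaired.
ras.restore(saved)

corrupted = ras.pop()              # should be 0x200, but the slot was overwritten
intact = ras.pop()                 # the deeper entry is untouched
print(hex(corrupted), hex(intact))
```

Restoring the pointer alone leaves the overwritten entry in place, so the first return on the correct path is mispredicted. Stronger repair schemes must save more state, which is the accuracy versus overhead trade-off the abstract refers to.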
