Long-distance Function Call Instruction Prefetching Optimization for Server Applications

陈立; 高军; 赵天磊; 刘峤

doi:10.7544/issn1000-1239.202440783

Abstract

The long instruction fetch latency caused by L1 instruction cache misses is one of the major bottlenecks limiting further performance gains in modern processors, especially for server applications with large instruction footprints. Instruction prefetching is a key technique for addressing this problem: it hides the high access latency by bringing the instruction cache blocks that will soon be needed into the L1 instruction cache ahead of time. In recent years, researchers have proposed many instruction prefetching architectures to alleviate the problem, but because of poor temporal and spatial instruction locality, long-distance function calls still cause a large number of instruction misses. This paper designs a new instruction prefetching mechanism that prefetches the target instructions of function calls with high coverage and high accuracy at low hardware overhead. Experiments show that, with this optimization, the miss rate of function call target instructions is about 45% lower than that of the current state-of-the-art instruction prefetcher, and IPC (instructions per cycle) is about 11.9% higher than the baseline and about 2.9% higher than the current state-of-the-art instruction prefetcher with similar overhead.
