Abstract:
The long instruction fetch delay caused by L1 instruction cache misses is one of the most important bottlenecks limiting the performance of modern processors, especially for server applications, which have very large instruction footprints. Instruction prefetching is a key technique for addressing this problem: it hides the high access latency by bringing the instruction cache blocks that are about to be used into the L1 instruction cache in advance. In recent years, researchers have proposed many instruction prefetching architectures, but because of their poor temporal and spatial locality, long-distance function calls still cause a large number of instruction misses. This paper presents a new instruction prefetching mechanism that prefetches the target instructions of function calls with high coverage and high accuracy at low hardware overhead. Experiments show that, with the proposed optimization, the miss rate of function call target instructions is about 45% lower than that of the current state-of-the-art instruction prefetcher, and IPC (instructions per cycle) is about 11.9% higher than the baseline and about 2.9% higher than the current state-of-the-art instruction prefetcher with similar overhead.