Abstract:
Software pipelining is an important instruction scheduling technique that improves loop performance by overlapping the execution of several successive iterations. As the gap between processor and memory speeds widens, memory access instructions, especially those that cause cache misses, become the bottleneck that limits performance. Because the latency of these instructions is not fixed, it is important to predict and hide it. Unlike previous approaches, this work introduces a cache profiling technique that collects runtime information to predict memory access latency, so that loops can be scheduled accordingly. Increasing the memory access latency assumed in a software-pipelined loop may also increase the initiation interval, in which case performance may not improve. The CSMS and FLMS algorithms try to change the memory access latency without increasing the initiation interval. In this work, both algorithms are improved to adjust the memory access latency according to cache profiling information, which makes them more accurate than earlier methods. Experimental results show that the new method improves performance effectively: SPEC2000 performance increases by 1% on average and by as much as 11% in some cases.
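The interaction between assumed load latency and the initiation interval can be illustrated with the standard lower bound used in modulo scheduling, MII = max(ResMII, RecMII). The following is a minimal sketch, not the CSMS or FLMS algorithm itself; the function names, the toy loop, and the latency values are all hypothetical and chosen only for illustration.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    # Resource bound: operations per iteration divided by available units,
    # taken over every resource class.
    return max(ceil(op_counts[u] / unit_counts[u]) for u in op_counts)

def rec_mii(cycles, latency):
    # Recurrence bound: for each dependence cycle, total latency divided by
    # total iteration distance. `cycles` is a list of (ops, distance) pairs.
    return max(
        (ceil(sum(latency[op] for op in ops) / dist) for ops, dist in cycles),
        default=0,
    )

def mii(cycles, latency, op_counts, unit_counts):
    # Lower bound on the initiation interval.
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles, latency))

# Hypothetical loop: two memory operations and one ALU operation per
# iteration, on a machine with one memory unit and one ALU.
OP_COUNTS = {"mem": 2, "alu": 1}
UNIT_COUNTS = {"mem": 1, "alu": 1}

# Case 1: the load lies on a loop-carried dependence cycle (distance 1).
ON_CYCLE = [(["load", "add"], 1)]
# Case 2: only the add is on a cycle; the load is off the recurrence.
OFF_CYCLE = [(["add"], 1)]

HIT = {"load": 2, "add": 1}    # assumed cache-hit latency
MISS = {"load": 10, "add": 1}  # assumed latency when a miss is predicted

if __name__ == "__main__":
    for name, cycles in (("on cycle", ON_CYCLE), ("off cycle", OFF_CYCLE)):
        print(name, mii(cycles, HIT, OP_COUNTS, UNIT_COUNTS),
              mii(cycles, MISS, OP_COUNTS, UNIT_COUNTS))
```

In this toy setting, raising the load latency increases the bound only when the load sits on a recurrence. A load off the recurrence can be given a longer latency without raising this lower bound, although the schedule may then need more stages and registers.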