
    A Trace-Driven Simulation Method for Memory System Simulators of Multithreaded Programs

    A Trace-Driven Simulation of Memory System in Multithread Applications

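    • Abstract: With the arrival of the big-data computing era, chip multiprocessors play a major role in raising server throughput for multithreaded programs, while the access latency of their memory systems increasingly affects system performance. Trace-driven simulation runs faster than execution-driven simulation and is therefore widely adopted by memory-system researchers. However, when simulating concurrent threads, trace-driven simulation causes both macro-level (global) and micro-level (local) memory-access misplacement, which does not occur when a multithreaded program actually runs. Theoretical analysis and calculation show that memory-access misplacement introduces a clear bias into trace-driven simulation results. To address this problem, a method is proposed that prevents macro-level and micro-level memory-access misplacement in trace-driven simulation and precisely replays the multithreaded program behavior recorded in the collection phase. Experimental data show that, after macro-level trace misplacement is avoided, several simulation metrics of multithreaded programs change by up to 10.22%; for some memory-intensive multithreaded programs, avoiding micro-level trace misplacement changes the arithmetic-mean IPC by more than 50%. The method provides a more accurate trace-driven approach for studying the memory-system behavior of interacting threads.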

       

      Abstract: Chip multiprocessors (CMPs) have become important for multithreaded applications because of their high throughput in big-data computing, but growing memory latency, the memory wall, increasingly limits system performance. Two simulation methods, trace-driven and execution-driven, are available to researchers who study and evaluate the memory system. Trace-driven simulation is often chosen for speed: it omits data processing and therefore runs faster than its execution-driven counterpart. However, the lack of data processing induces both global and local trace misplacements, which never occur when multithreaded applications run on a real machine. Analytical modeling shows that these misplacements cause marked variations in performance metrics. The causes are that, in trace-driven simulation, 1) locks do not prevent threads from entering a critical section non-exclusively; 2) barriers do not synchronize threads as needed; and 3) the dependences among memory operations are violated. To improve the accuracy of memory-system simulation for multithreaded applications, a methodology is designed to eliminate both global and local trace misplacement in trace-driven simulation. In experiments, eliminating global trace misplacement of memory operations induces up to a 10.22% reduction in various IPC metrics, while eliminating local trace misplacement of memory operations induces at least a 50% reduction in the arithmetic mean of IPC metrics. The proposed methodology ensures that a multithreaded application's behavior is invariant under trace-driven simulation.
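The first cause listed in the abstract, locks that no longer enforce mutual exclusion during replay, can be illustrated with a toy replayer. The sketch below is not the paper's methodology: the event format (`"lock"`, `"access"`, `"unlock"` tuples), the round-robin issue order, and the function name `replay` are all assumptions made for illustration. It only shows how ignoring lock events lets two threads interleave accesses inside the same critical section, while honoring them restores the exclusive order.

```python
from collections import deque


def replay(traces, honor_locks):
    """Round-robin replay of per-thread traces (hypothetical event format).

    Returns (order, violations): the order in which memory accesses are
    issued, and how often a thread entered a critical section that another
    thread was still inside.
    """
    queues = [deque(t) for t in traces]
    inside = [set() for _ in traces]  # locks each thread currently holds
    order, violations = [], 0
    while any(queues):
        progressed = False
        for tid, q in enumerate(queues):
            if not q:
                continue
            op, arg = q[0]
            if op == "lock":
                held_by_other = any(
                    arg in s for i, s in enumerate(inside) if i != tid
                )
                if held_by_other:
                    if honor_locks:
                        continue  # stall this thread until the lock is free
                    violations += 1  # naive replay: non-exclusive entry
                inside[tid].add(arg)
            elif op == "unlock":
                inside[tid].discard(arg)
            else:  # "access": record which thread touched which address
                order.append((tid, arg))
            q.popleft()
            progressed = True
        if not progressed:
            raise RuntimeError("no thread can make progress (deadlocked trace)")
    return order, violations


# Two threads contend for the same lock "m" around accesses to address 0x10.
traces = [
    [("lock", "m"), ("access", 0x10), ("access", 0x14), ("unlock", "m")],
    [("lock", "m"), ("access", 0x10), ("unlock", "m")],
]
naive = replay(traces, honor_locks=False)
exclusive = replay(traces, honor_locks=True)
```

In this toy setup the naive replay interleaves the two threads' accesses and reports one non-exclusive entry, whereas the lock-aware replay issues all of thread 0's accesses before thread 1's. A real simulator would also have to handle barriers and memory-operation dependences, the other two causes named in the abstract.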

       
