ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (6): 1266-1277. doi: 10.7544/issn1000-1239.2015.20150160

Special topic: 2015 Architecture for Application-Domain Requirements

• System Architecture •

一种多线程程序内存系统模拟器Trace驱动仿真方法 (A Trace-Driven Simulation Method for Memory System Simulators of Multithreaded Programs)

Zhu Pengfei (朱鹏飞)1,3, Lu Tianyue (卢天越)2,3, Chen Mingyu (陈明宇)2

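  1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190); 2(Center for Advanced Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 3(University of Chinese Academy of Sciences, Beijing 100049) (zhupengfei@ict.ac.cn)
  • Published: 2015-06-01
  • Funding: National Natural Science Foundation of China (61272132, 61221062)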

A Trace-Driven Simulation of Memory System in Multithread Applications

Zhu Pengfei1,3, Lu Tianyue2,3, Chen Mingyu2   

  1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190); 2(Center for Advanced Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 3(University of Chinese Academy of Sciences, Beijing 100049)
  • Online: 2015-06-01

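Abstract (translated from the Chinese): With the arrival of the big-data computing era, chip multiprocessors play a major role in raising the throughput of servers that run multithreaded programs, while the access latency of their memory systems increasingly affects system performance. Trace-driven simulation runs faster than execution-driven simulation and is therefore widely adopted by memory system researchers. When simulating concurrent threads, however, trace-driven simulation causes both global (macroscopic) and local (microscopic) memory-access misplacement, which does not occur when a multithreaded program actually runs. Theoretical analysis and calculation show that this misplacement introduces clear deviations into trace-driven simulation results. To address the problem, a method is proposed that prevents global and local memory-access misplacement in trace-driven simulation and precisely replays the multithreaded program behavior captured during trace collection. Experimental data show that, once global trace misplacement is avoided, several simulation metrics of multithreaded programs change by up to 10.22%; for some memory-intensive multithreaded programs, avoiding local trace misplacement changes the arithmetic-mean IPC by more than 50%. The method offers a more accurate trace-driven approach for studying the memory system behavior of interacting threads.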

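Keywords (translated from the Chinese): trace-driven simulation, accuracy, memory system, multithreaded programs, trace collection and replay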

Abstract: Chip multiprocessors (CMPs) have become important for multithreaded applications because of their high throughput in big-data computing, but growing memory latency (the memory wall) increasingly limits system performance. Two simulation methods, trace-driven and execution-driven, are available to researchers who study and evaluate memory systems. Researchers often choose trace-driven simulation for its speed: it omits data processing and is therefore faster than its execution-driven counterpart. Omitting data processing, however, induces both global and local trace misplacements, which never occur when multithreaded applications run on a real machine. Analytical modeling shows that trace misplacement causes marked variations in performance metrics. The causes, in trace-driven simulation, are: 1) locks do not prevent multiple threads from entering a critical section at the same time; 2) barriers do not synchronize threads as needed; 3) dependences among memory operations are violated. To improve the accuracy of memory system simulation for multithreaded applications, a methodology is designed to eliminate both global and local trace misplacement in trace-driven simulation. In the experiments, eliminating global misplacement of memory operations induces up to a 10.22% reduction in various IPC metrics, while eliminating local misplacement of memory operations induces at least a 50% reduction in the arithmetic mean of IPC metrics. The proposed methodology keeps a multithreaded application's behavior invariant in trace-driven simulation.
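The first two causes listed in the abstract (locks and barriers that a replayed trace does not enforce) can be illustrated with a toy replay loop. The following is a minimal sketch and not the authors' implementation: the record type `Rec`, the function `replay`, the round-robin scheduling, and the first-come lock order are all assumptions made for illustration. With `honor_sync=False` the loop skips over lock and barrier records, as a naive trace-driven replay would; with `honor_sync=True` it stalls a thread at a held lock or an incomplete barrier.

```python
from collections import namedtuple

# Hypothetical trace record. op is "mem", "lock", "unlock", or "barrier";
# arg is an address (mem), a lock id (lock/unlock), or None (barrier).
Rec = namedtuple("Rec", ["op", "arg"])


def replay(traces, honor_sync=True):
    """Replay per-thread traces round-robin, one record per thread per pass.

    Returns the global order of memory accesses as (thread_id, address) pairs.
    """
    n = len(traces)
    pc = [0] * n        # index of the next record for each thread
    lock_owner = {}     # lock id -> thread currently holding it
    waiting = set()     # threads stalled at the current barrier
    order = []
    while any(pc[t] < len(traces[t]) for t in range(n)):
        progressed = False
        for t in range(n):
            if pc[t] >= len(traces[t]):
                continue
            rec = traces[t][pc[t]]
            if honor_sync and rec.op == "lock":
                if rec.arg in lock_owner:
                    continue                  # lock is held: stall this thread
                lock_owner[rec.arg] = t
            elif honor_sync and rec.op == "unlock":
                lock_owner.pop(rec.arg, None)
            elif honor_sync and rec.op == "barrier":
                waiting.add(t)
                if len(waiting) < n:
                    continue                  # stall until every thread arrives
                for u in waiting:
                    if u != t:
                        pc[u] += 1            # release the stalled threads
                waiting.clear()
            elif rec.op == "mem":
                order.append((t, rec.arg))
            pc[t] += 1
            progressed = True
        if not progressed:
            raise RuntimeError("replay is stuck: unmatched lock or barrier")
    return order


# Two threads run the same critical section guarded by lock "L".
critical = [Rec("lock", "L"), Rec("mem", 0xA0), Rec("mem", 0xA4), Rec("unlock", "L")]
traces = [list(critical), list(critical)]
naive = replay(traces, honor_sync=False)   # accesses of both threads interleave
synced = replay(traces, honor_sync=True)   # each critical section stays contiguous
```

In this sketch the naive replay interleaves the two threads' accesses inside the critical section, while the synchronization-aware replay keeps each thread's critical-section accesses together. It does not model the third cause (dependences among memory operations) or any timing.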

Key words: trace-driven simulation, accuracy, memory system, multithread applications, trace collection and replay

CLC number: