    朱素霞, 陈德运, 季振洲, 孙广路, 张浩. 面向监听一致性协议的并发内存竞争记录算法[J]. 计算机研究与发展, 2016, 53(6): 1238-1248. DOI: 10.7544/issn1000-1239.2016.20150100
    Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao. A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence[J]. Journal of Computer Research and Development, 2016, 53(6): 1238-1248. DOI: 10.7544/issn1000-1239.2016.20150100

    面向监听一致性协议的并发内存竞争记录算法

    A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

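    • Abstract: Memory race recording is a key technique for addressing the execution nondeterminism of multi-core programs. However, existing point-to-point memory race recording mechanisms incur high hardware overhead and are difficult to apply to practical chip multiprocessor systems. Aiming to reduce the hardware overhead of the point-to-point approach, this paper designs a point-to-point memory race recording algorithm based on a concurrent recording strategy for chip multiprocessor (CMP) systems that use a snoop-based coherence protocol. The algorithm extends the point-to-point memory race relationship between pairs of threads to all threads, and uses a distributed recording method to record, for each thread, a memory race log composed of one side of each memory race relationship. During replay, it uses a simplified producer-consumer model, which guarantees deterministic replay and effectively reduces hardware consumption and bandwidth overhead. Simulation results on an 8-core processor system show that the concurrent point-to-point memory race recording algorithm adds about 171 B of hardware resources per processor core, records about 2.3 B of log per thousand memory operation instructions, and adds less than 1.5% bandwidth overhead in both the recording and replay phases.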

       

      Abstract: Memory race record and replay is an important technique for resolving the nondeterminism of multi-core programs. Because of their high hardware overhead, existing memory race recorders based on the point-to-point logging approach are difficult to apply to practical chip multiprocessors. To reduce this overhead, a novel memory race recording algorithm that uses a concurrent logging strategy is proposed for chip multiprocessors with a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm records the current execution points of all threads concurrently. In the recording phase it extends the point-to-point memory race relationship between two threads to all threads, which significantly reduces hardware overhead. It also uses a distributed logging mechanism to record memory races, which effectively reduces bandwidth overhead without increasing the size of the memory race log. During replay, the algorithm uses a simplified producer-consumer model and introduces a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. Simulation results on an 8-core chip multiprocessor (CMP) system show that this concurrent recording algorithm adds about 171 B of hardware per processor core, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% interconnection bandwidth overhead in both the recording and replay phases.
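
      The replay idea in the abstract (a producer-consumer model with one counting semaphore per core) can be illustrated with a small software sketch. This is not the paper's hardware design: the log layout (`post_log`, `wait_log`), the function names `replay` and `demo`, and the three-core toy program are all assumptions made for illustration. The sketch assumes that each logged execution point tells a producer core which consumer to signal, and tells a consumer core how many signals to collect before it may continue.

```python
import threading

def replay(programs, post_log, wait_log):
    """Toy deterministic replay (hypothetical log format, not the paper's).

    programs[c] : list of zero-argument callables, one per instruction of core c
    post_log[c] : {execution point: [consumer core ids to signal]}  (producer side)
    wait_log[c] : {execution point: number of signals to wait for}  (consumer side)
    An execution point i means "instructions 0..i-1 of this core have completed".
    """
    # One counting semaphore per core, as the abstract describes.
    sems = [threading.Semaphore(0) for _ in programs]

    def run(core):
        prog = programs[core]
        for i in range(len(prog) + 1):          # len(prog) is the end-of-program point
            for consumer in post_log[core].get(i, []):
                sems[consumer].release()        # producer: "I reached point i"
            for _ in range(wait_log[core].get(i, 0)):
                sems[core].acquire()            # consumer: block until producers arrive
            if i < len(prog):
                prog[i]()

    threads = [threading.Thread(target=run, args=(c,)) for c in range(len(programs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def demo():
    """Core 1's read of x raced with core 0's write during the (imagined) recording."""
    shared = {"x": 0}
    observed = []
    programs = [
        [lambda: None, lambda: shared.__setitem__("x", 1)],  # core 0: writes x at instr 1
        [lambda: observed.append(shared["x"])],               # core 1: reads x at instr 0
        [lambda: None],                                       # core 2: unrelated work
    ]
    # Assumed log: when the race was detected, the points of ALL cores were recorded.
    post_log = [{2: [1]}, {}, {0: [1]}]   # core 0 at point 2 and core 2 at point 0 signal core 1
    wait_log = [{}, {0: 2}, {}]           # core 1 waits for both signals before instr 0
    replay(programs, post_log, wait_log)
    return observed[0]
```

      In this sketch core 1 cannot execute its read until core 0 has passed the write and core 2 has reached its recorded point, so `demo()` observes the written value on every run regardless of how the operating system schedules the threads.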

       
