    Citation: Zhu Suxia, Ji Zhenzhou, Li Dong, Zhang Hao. A Cyclic Memory Race Recording Algorithm Implemented with Hardware Signatures[J]. Journal of Computer Research and Development, 2014, 51(5): 1149-1157.

    A Cyclic Memory Race Recording Algorithm Implemented with Hardware Signatures

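    • Abstract: The execution of multicore programs is nondeterministic, and memory race recording is the key technique for replaying such programs deterministically. To address the large logs and limited replay speed of existing memory race recording mechanisms, a new cyclic point-to-point memory race recording algorithm is proposed. The algorithm represents each memory conflict as a current dependency and detects conflicts with hardware signatures, so the existing cache structure needs no modification. It introduces a conflict direction detection mechanism that reduces consecutive current dependencies in the same direction, and it records cyclic dependencies in the memory race log. The memory races recorded in the log between any two threads therefore form a cycle, which greatly reduces redundancy, and an incremental counter optimizes the cyclic dependencies to shrink the memory race log further. Simulation results show that the algorithm effectively reduces the memory race log while adding few hardware resources. The memory race log also scales well.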

       

      Abstract: Shared-memory multithreaded programs running on chip multiprocessors tend to be nondeterministic. Two-phase deterministic record-replay is an effective way to address this problem, and memory race recording is the key technology for replaying multithreaded programs deterministically. It is therefore important to develop an efficient memory race recording scheme with both a low log growth rate and a fast replay speed. A cyclic memory race recording algorithm based on the point-to-point logging approach, named CyclicMR, is proposed. CyclicMR represents each memory race with a new current dependency, detects memory races with small hardware signatures rather than with the cache, reduces consecutive memory races in the same direction through a conflict direction detection mechanism, and records a novel cyclic dependency that captures much more transitivity. In this algorithm, all memory races recorded between two threads are loop-shaped, which significantly reduces redundancy among the recorded races. The cyclic dependency is further optimized with an incremental instruction counter, which substantially reduces the size of the memory race log. CyclicMR is compared with earlier mainstream approaches on an 8-core chip multiprocessor system. The results show that CyclicMR achieves a small log growth rate, low hardware overhead, and low bandwidth overhead. Its memory race log also scales well.
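The abstract says that memory races are detected with small hardware signatures rather than with the cache. The Python sketch below illustrates the general idea of signature-based conflict detection, assuming each signature is a Bloom-filter-style bit vector. It is not CyclicMR's actual design: the names `Signature` and `ThreadState`, the 256-bit size, the two hash functions, and the choice of BLAKE2b are all hypothetical choices made for illustration.

```python
import hashlib


class Signature:
    """Fixed-size Bloom-filter-style address set (illustrative assumption)."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit vector stored in a Python int

    def _positions(self, addr):
        # Derive `hashes` bit positions from the address. The hash function
        # is arbitrary here; real hardware would use cheap bit-select hashes.
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.field |= 1 << pos

    def may_contain(self, addr):
        # False positives are possible; false negatives are not.
        return all((self.field >> pos) & 1 for pos in self._positions(addr))


class ThreadState:
    """Per-thread read and write signatures (hypothetical structure)."""

    def __init__(self, tid):
        self.tid = tid
        self.reads = Signature()
        self.writes = Signature()

    def record_access(self, addr, is_write):
        (self.writes if is_write else self.reads).insert(addr)

    def conflicts_with(self, addr, is_write):
        """Report whether a remote access to addr may race with past local accesses."""
        if self.writes.may_contain(addr):
            return True  # remote read or write after a local write
        return is_write and self.reads.may_contain(addr)  # remote write after a local read


t0 = ThreadState(0)
t0.record_access(0x1000, is_write=True)
print(t0.conflicts_with(0x1000, is_write=False))
```

Because a signature can report false positives, a scheme built this way may log a dependency that did not really occur. That is conservative for replay correctness, at the cost of some extra log entries.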

       
