A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

Zhu Suxia (朱素霞); Chen Deyun (陈德运); Ji Zhenzhou (季振洲); Sun Guanglu (孙广路); Zhang Hao (张浩)

doi:10.7544/issn1000-1239.2016.20150100

E-mail: zhusuxia@hrbust.edu.cn

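Funding: National Natural Science Foundation of China, Young Scientists Fund (61502123); National Natural Science Foundation of China (61173024); National Basic Research Program of China (973 Program) (2011CB302501); Heilongjiang Provincial Youth Science Foundation (QC2015084); China Postdoctoral Science Foundation (2015M571429)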

Chinese Library Classification (CLC) number: TP303
Publication history
- Published: 2016-05-31

A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

¹(Postdoctoral Research Station, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080)
²(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080)
³(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
⁴(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)

Abstract

Memory race recording is a key technique for coping with the nondeterminism of multi-core program execution. However, existing point-to-point memory race recorders incur high hardware overhead, which makes them difficult to deploy in practical chip multiprocessor (CMP) systems. To reduce this overhead, this paper proposes a point-to-point memory race recording algorithm based on a concurrent logging strategy for CMPs that use a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm records the current execution points of all threads concurrently, extending the point-to-point race relationship between two threads to all threads and thereby reducing hardware overhead significantly. Races are logged in a distributed manner: each thread keeps its own memory race log, which holds only one side of each race relationship, so bandwidth overhead is reduced without enlarging the log. During replay, the algorithm uses a simplified producer-consumer model with one counting semaphore per processor core to ensure deterministic replay, which improves replay speed and reduces coherence bandwidth overhead. Simulation results on an 8-core CMP system show that the algorithm adds about 171 B of hardware per processor core, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% interconnect bandwidth overhead in both the recording and replay phases.

Keywords:
- chip multiprocessor (CMP)
- multi-core program
- deterministic replay
- memory race recording
- memory conflict detection
- snoop-based coherence protocol
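
The record-and-replay idea summarized in the abstract can be illustrated with a small software toy. The sketch below is not the paper's hardware mechanism: the function names (`run_schedule`, `record`, `replay`), the trace format, the choice to cut all threads at every detected conflict, and the barrier-like use of one counting semaphore per thread are all assumptions made for illustration. It records, per thread, only that thread's execution points (operation counts), and replays with real Python threads under random timing jitter.

```python
import random
import threading
import time

def run_schedule(programs, schedule):
    """Reference run: execute per-thread programs in one fixed interleaving."""
    memory, pcs, trace = {}, [0] * len(programs), []
    observed = [[] for _ in programs]        # values returned by each thread's reads
    for tid in schedule:
        kind, addr, value = programs[tid][pcs[tid]]
        if kind == "w":
            memory[addr] = value
        else:
            observed[tid].append(memory.get(addr, 0))
        trace.append((tid, kind, addr))
        pcs[tid] += 1
    return memory, observed, trace

def record(trace, n_threads):
    """On each detected conflict, every thread logs its current op count."""
    logs = [[] for _ in range(n_threads)]
    done = [0] * n_threads                   # ops completed so far per thread
    readers, writers = {}, {}                # addr -> tids that touched it this epoch
    for tid, kind, addr in trace:
        other_w = writers.get(addr, set()) - {tid}
        other_r = readers.get(addr, set()) - {tid}
        if other_w or (kind == "w" and other_r):
            for t in range(n_threads):       # assumed policy: cut all threads at once
                logs[t].append(done[t])
            readers, writers = {}, {}        # a new conflict-free epoch starts here
        (writers if kind == "w" else readers).setdefault(addr, set()).add(tid)
        done[tid] += 1
    return logs

def replay(programs, logs, jitter=0.001):
    """Replay with real threads; one counting semaphore per thread orders epochs."""
    n = len(programs)
    memory, observed = {}, [[] for _ in range(n)]
    sems = [threading.Semaphore(0) for _ in range(n)]

    def run(tid):
        cuts = logs[tid] + [len(programs[tid])]
        pc = 0
        for epoch, cut in enumerate(cuts):
            while pc < cut:
                kind, addr, value = programs[tid][pc]
                time.sleep(random.uniform(0, jitter))   # perturb timing on purpose
                if kind == "w":
                    memory[addr] = value
                else:
                    observed[tid].append(memory.get(addr, 0))
                pc += 1
            if epoch < len(cuts) - 1:
                for other in range(n):       # producer side: signal every other thread
                    if other != tid:
                        sems[other].release()
                for _ in range(n - 1):       # consumer side: wait for every other thread
                    sems[tid].acquire()

    threads = [threading.Thread(target=run, args=(t,)) for t in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return memory, observed

# Hypothetical three-thread workload: (kind, address, value written or None).
programs = [
    [("w", "x", 1), ("r", "y", None), ("w", "x", 2)],
    [("r", "x", None), ("w", "y", 5), ("r", "x", None)],
    [("w", "z", 7), ("r", "x", None), ("r", "y", None)],
]
schedule = [0, 1, 2, 1, 0, 2, 0, 1, 2]
ref_memory, ref_observed, trace = run_schedule(programs, schedule)
logs = record(trace, len(programs))
replayed_memory, replayed_observed = replay(programs, logs)
```

In this toy, conflicting accesses always fall into different epochs, so accesses inside one epoch may run in any order without changing the result, and the replayed memory and read values match the reference run. Each log entry is a bare operation count with no partner thread identifier, which is one way to read "one side of the race relationship"; whether the paper encodes its log this way is not stated in the abstract.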