ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (6): 1238-1248. doi: 10.7544/issn1000-1239.2016.20150100

A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

Zhu Suxia (1,2), Chen Deyun (2), Ji Zhenzhou (3), Sun Guanglu (2), Zhang Hao (4)

  1. Postdoctoral Research Station, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080
  2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080
  3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
  4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • Online: 2016-06-01

Abstract: Memory race record-replay is an important technique for resolving the nondeterminism of multi-core programs. Because of their high hardware overhead, existing memory race recorders based on the point-to-point logging approach are difficult to apply to practical modern chip multiprocessors. To reduce the hardware overhead of point-to-point logging, this paper proposes a novel memory race recording algorithm that uses a concurrent logging strategy and targets chip multiprocessors with a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm records the current execution points of all threads concurrently. In the recording phase, it thereby extends the point-to-point memory race relationship between two threads to all threads, which reduces hardware overhead significantly. It also uses a distributed logging mechanism to record memory races, which effectively reduces bandwidth overhead without increasing the size of the memory race log. During replay, the algorithm uses a simplified producer-consumer model and introduces a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. Simulation results on an 8-core chip multiprocessor (CMP) system show that this concurrent recording algorithm, based on the point-to-point logging approach, adds about 171 B of hardware per processor, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% additional interconnection bandwidth overhead.
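The recording idea described in the abstract can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware design evaluated in the paper: the class `ConcurrentRecorder`, its method names, the per-address writer/reader table used to detect conflicts (a stand-in for snooped coherence traffic), and the use of retired-instruction counts as "execution points" are all hypothetical choices made for illustration.

```python
class ConcurrentRecorder:
    """Toy software model of concurrent memory race recording.

    Assumption: an "execution point" is the number of memory
    instructions a core has completed so far.
    """

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # Execution point of each core.
        self.counts = [0] * num_cores
        # One local log per core (distributed logging).
        self.logs = [[] for _ in range(num_cores)]
        # addr -> (last writer or None, set of readers since that write).
        # Hypothetical stand-in for what snooping would reveal.
        self.state = {}

    def _record_all(self):
        # Concurrent logging: every core appends its own current
        # execution point to its own local log.
        for core in range(self.num_cores):
            self.logs[core].append(self.counts[core])

    def access(self, core, addr, is_write):
        """Model one memory access; return True if it raised a conflict."""
        writer, readers = self.state.get(addr, (None, set()))
        if is_write:
            # Write after a write or read by a different core.
            conflict = (writer is not None and writer != core) or any(
                r != core for r in readers
            )
        else:
            # First read by this core after another core's write.
            conflict = (
                writer is not None and writer != core and core not in readers
            )
        if conflict:
            self._record_all()
        self.counts[core] += 1
        if is_write:
            self.state[addr] = (core, set())
        else:
            readers.add(core)
            self.state[addr] = (writer, readers)
        return conflict


rec = ConcurrentRecorder(num_cores=3)
rec.access(0, 0x10, True)    # core 0 writes 0x10
rec.access(1, 0x20, False)   # unrelated read, no conflict
rec.access(1, 0x10, False)   # read-after-write across cores: conflict
rec.access(1, 0x10, False)   # repeated read by the same core: no conflict
rec.access(2, 0x10, True)    # write after other cores' accesses: conflict
print(rec.logs)
```

In this model each conflict costs one entry in every core's local log, and no entry needs to name a source or destination thread; this is one way to picture how extending a two-thread race relationship to all threads could simplify the per-core recording state. The replay side (producer-consumer model with per-core counting semaphores) is not modeled here.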

Key words: chip multiprocessor (CMP), multi-core program, deterministic replay, memory race recording, memory conflict detection, snoop-based coherence protocol
