ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2017, Vol. 54 ›› Issue (4): 821-831. doi: 10.7544/issn1000-1239.2017.20151060

• System Architecture •

基于逐步细化快照序列的多核并行程序调试

王博弘,刘轶,张国振,钱德沛   

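  1. (Sino-German Joint Software Institute, Beihang University, Beijing 100191) (turtlemax2009@gmail.com)
  • Published: 2017-04-01
  • Funding: 
    Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA01A302)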

Debugging Multi-Core Parallel Programs by Gradually Refined Snapshot Sequences

Wang Bohong, Liu Yi, Zhang Guozhen, Qian Depei   

  1. (Sino-German Joint Software Institute, Beihang University, Beijing 100191)
  • Online: 2017-04-01

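Abstract (translated from Chinese): Debugging multi-core parallel programs is widely recognized as difficult, and the difficulty stems mainly from the non-determinism of program execution. Replay debugging can eliminate this non-determinism, but none of the existing replay debugging solutions can be applied on commercial software and hardware platforms, and the performance loss they incur grows superlinearly with the degree of concurrency. This paper proposes SDT (snapshot debug tool), a new debugging method for parallel programs based on execution snapshots. Building on offline breakpoint setting, execution snapshot capture, and execution snapshot refinement, the method defines a debugging process that guides the user from coarse to fine toward the error, and it is implemented on general-purpose software and hardware platforms. Experimental results show that with 8 concurrent threads, debugging with SDT incurs an average time overhead of 5188%; when the number of threads grows by a factor of 4, the extra time consumed by SDT at most doubles, which indicates good scalability. The volume of snapshot data to record is a major challenge for SDT's performance; the experiments show that incremental snapshot recording effectively reduces the amount of data to record and the time spent recording snapshots, improving SDT's overall performance.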

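Key words (translated from Chinese): replay debugging, execution snapshot, determinism, multi-core parallel program debugging, multithreading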

Abstract: Debugging multi-core parallel programs is a well-known hard problem, chiefly because parallel execution introduces many sources of non-determinism. Replay debugging is a promising way to eliminate non-determinism. However, state-of-the-art replay debugging solutions are not suitable for commercial software and hardware platforms, and their overhead can become unacceptable as the degree of concurrency grows. We propose a practical and novel replay debugging scheme named SDT (snapshot debug tool). The key innovation of SDT is to use offline breakpoints and an abstracted replay execution instead of a typical physical replay execution. SDT can be applied on commercial operating systems and hardware, and it provides a gradually refined debugging method. According to the experimental results, SDT introduces 5188% extra execution time on average when using 8 threads. When the thread count increases from 1x to 4x, the overhead of SDT debugging at most doubles (from 1x to 2x), which shows that SDT scales well. Recording a large amount of data is a major challenge for SDT. The incremental snapshot capture used in our experiments is shown to effectively reduce both the amount of data to be recorded and the recording time, thereby improving the overall performance of SDT.

Key words: replay debugging, program snapshot, determinism, multi-core parallel program debugging, multithreading
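
The abstract credits incremental snapshot recording with reducing the volume of recorded data but does not describe how SDT implements it. The sketch below is therefore only an illustration of the general idea, not SDT's actual mechanism: program state is modeled as a plain dictionary of variable names to values, and each capture stores only the entries that changed since the previous capture. The names `IncrementalRecorder`, `diff_state`, `capture`, and `restore` are hypothetical and do not come from the paper.

```python
import copy
from typing import Any, Dict, List

# Sentinel marking a variable that disappeared since the previous snapshot.
_DELETED = object()


def diff_state(prev: Dict[str, Any], curr: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `curr` that are new or differ from `prev`."""
    delta = {k: v for k, v in curr.items() if k not in prev or prev[k] != v}
    delta.update({k: _DELETED for k in prev if k not in curr})
    return delta


class IncrementalRecorder:
    """Hypothetical recorder: the first capture is a full snapshot,
    every later capture stores only the changes (a delta)."""

    def __init__(self) -> None:
        self.deltas: List[Dict[str, Any]] = []
        self._last: Dict[str, Any] = {}

    def capture(self, state: Dict[str, Any]) -> None:
        # Deep-copy so later in-place mutation by the program cannot
        # alter what was recorded.
        snap = copy.deepcopy(state)
        self.deltas.append(diff_state(self._last, snap))
        self._last = snap

    def restore(self, index: int) -> Dict[str, Any]:
        """Rebuild the full state at snapshot `index` by replaying deltas."""
        state: Dict[str, Any] = {}
        for delta in self.deltas[: index + 1]:
            for key, value in delta.items():
                if value is _DELETED:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def recorded_entries(self) -> int:
        """Total number of variable entries actually stored."""
        return sum(len(d) for d in self.deltas)


# Usage: three captures at (assumed) breakpoints of a toy program.
rec = IncrementalRecorder()
rec.capture({"x": 1, "y": [0, 0], "z": 5})
rec.capture({"x": 1, "y": [0, 1], "z": 5})   # only y changed
rec.capture({"x": 2, "y": [0, 1]})           # x changed, z removed
```

In this toy run the recorder stores 6 entries where three full snapshots would store 8; the saving grows when most of the state is unchanged between captures. The trade-off is that restoring a late snapshot requires replaying every earlier delta.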
