    Debugging Multi-Core Parallel Programs by Gradually Refined Snapshot Sequences

    • Abstract: Debugging multi-core parallel programs is a well-known hard problem, and the difficulty stems mainly from nondeterministic program execution. Replay debugging can eliminate this nondeterminism, but existing replay debugging solutions cannot be applied to commercial software and hardware platforms, and their performance overhead grows superlinearly as the degree of concurrency increases. We propose SDT (snapshot debug tool), a new snapshot-based method for debugging parallel programs. Built on offline breakpoint setting, snapshot capture, and snapshot refinement, SDT replaces conventional physical replay execution with an abstracted replay and defines a debugging process that guides the user from coarse to fine toward the error. SDT is implemented on general-purpose commercial operating systems and hardware. Experimental results show that with 8 threads, debugging with SDT introduces an average time overhead of 5188%; when the thread count increases fourfold, the extra time introduced by SDT at most doubles, which indicates good scalability. The volume of snapshot data to be recorded is a major challenge for SDT's performance. The experiments show that incremental snapshot recording effectively reduces both the amount of data to record and the time spent recording snapshots, improving the overall performance of SDT.
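The abstract does not describe SDT's snapshot format or recording mechanism, so the following is only a minimal sketch of the general idea behind incremental snapshot recording, not SDT's implementation. It assumes, purely for illustration, that a snapshot is a mapping from variable names to values; the names `diff_snapshot`, `apply_delta`, `record_incremental`, and `replay` are hypothetical. Each snapshot after the first is stored as the set of changes relative to its predecessor, so unchanged state is never written again.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a snapshot is a flat mapping of variable names to values.
Snapshot = Dict[str, Any]
# A delta is (changed-or-added entries, names of removed entries).
Delta = Tuple[Snapshot, List[str]]


def diff_snapshot(prev: Snapshot, curr: Snapshot) -> Delta:
    """Return only what changed between two consecutive snapshots."""
    changed = {k: v for k, v in curr.items() if k not in prev or prev[k] != v}
    removed = [k for k in prev if k not in curr]
    return changed, removed


def apply_delta(prev: Snapshot, delta: Delta) -> Snapshot:
    """Rebuild the next full snapshot from the previous one and a delta."""
    changed, removed = delta
    result = dict(prev)
    for name in removed:
        result.pop(name, None)
    result.update(changed)
    return result


def record_incremental(snapshots: List[Snapshot]) -> List[Delta]:
    """Store the first snapshot as a delta from empty state, then only changes."""
    log: List[Delta] = []
    prev: Snapshot = {}
    for snap in snapshots:
        log.append(diff_snapshot(prev, snap))
        prev = snap
    return log


def replay(log: List[Delta]) -> List[Snapshot]:
    """Reconstruct every full snapshot by applying the deltas in order."""
    state: Snapshot = {}
    restored: List[Snapshot] = []
    for delta in log:
        state = apply_delta(state, delta)
        restored.append(state)
    return restored
```

In this sketch the recorded volume scales with how much state changes between breakpoints rather than with total program state, which is the effect the abstract attributes to incremental recording.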
