高级检索

    基于错误传播分析的SDC脆弱指令识别方法

    An Approach for Identifying SDC-Causing Instructions by Fault Propagation Analysis

    • 摘要: 单粒子软错误是高辐照空间环境下影响计算可靠性的主要因素.随着芯片晶体管数的快速增长,单粒子软错误的威胁日益严重.结果错误(silent data corruption, SDC)是单粒子软错误造成的一种故障类型.由于SDC是隐蔽传播的,SDC的检测是单粒子软错误防护的难点.寻找SDC脆弱指令是目前检测SDC的重要途径.现有方法需要进行巨量的错误注入,时间代价巨大.首先根据数据关联图建立了指令的数据依赖关系,研究了函数间和函数内部错误传播过程;进而推导出判定SDC脆弱指令的充分条件,提出了SDC脆弱指令识别方法,该方法在错误注入中依据充分条件推测潜在的SDC脆弱指令.实验表明,在保证较高准确率和覆盖率的前提下,时间代价显著减少.

       

      Abstract: Single event upset (SEU) is caused by external radiation in outer space and it has a great influence on computing reliability of space devices. As process technology scales, space devices become more susceptible to SEU. SEU could result in silent data corruption (SDC), which means wrong outcomes of a program without any crash detected. SDC may lead to serious failure and hence cannot be ignored. As SDC-causing fault always propagates silently, it is very difficult to detect SDC. To develop SDC detectors, SDC-causing instructions of a program should be identified as the first step. However, this step usually needs a huge number of fault injections, which is extremely time-consuming and not achievable for most applications. In this paper, we build data dependence graph (DDG) to capture the dependencies among the values of instructions. Then the inter-function and intra-function propagation that leads to SDC is analyzed and the sufficient condition of SDC-causing instructions is demonstrated. Further, we propose a novel method of identifying SDC-causing instructions. Taking advantage of the trace files of injection, our method can detect underlying SDC-causing instructions without any expensive computations. Validation efforts show that our method yields high accuracy and coverage rate with a great reduction of injection cost.

       

    /

    返回文章
    返回