ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (9): 1943-1952. doi: 10.7544/issn1000-1239.2016.20148367

• System Architecture •

An Approach for Identifying SDC-Causing Instructions by Fault Propagation Analysis

Ma Junchi1,2, Wang Yun1,2, Cai Zhenbo3, Zhang Qingxiang3, Wang Ying3, Hu Cheng1,2

  1(School of Computer Science & Engineering, Southeast University, Nanjing 211189); 2(Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189); 3(Institute of Spacecraft System Engineering, China Academy of Space Technology, Beijing 100094) (jcma@seu.edu.cn)
  • Published: 2016-09-01

Abstract: Single event upset (SEU), caused by radiation in outer space, is a major threat to the computing reliability of space devices. As process technology scales and the number of transistors per chip grows rapidly, space devices become increasingly susceptible to SEU. One consequence of SEU is silent data corruption (SDC): the program produces wrong output without any crash or error being detected. SDC may lead to serious failures and therefore cannot be ignored. Because SDC-causing faults propagate silently, SDC is very difficult to detect, which makes SDC detection a key difficulty in protecting against soft errors. Identifying the SDC-causing instructions of a program is therefore the first step toward developing SDC detectors. However, existing approaches need a huge number of fault injections, which is extremely time-consuming and impractical for most applications. In this paper, we build a data dependence graph (DDG) to capture the dependences among the values produced by instructions. We then analyze the inter-function and intra-function fault propagation that leads to SDC and derive a sufficient condition for an instruction to be SDC-causing. Based on this condition, we propose a novel method for identifying SDC-causing instructions: using the trace files produced during fault injection, the method infers potential SDC-causing instructions without expensive additional computation. Experiments show that the method achieves high accuracy and coverage while substantially reducing fault-injection cost.

Key words: single event upset (SEU), soft error, SDC-causing instruction, fault injection, fault propagation
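The first step named in the abstract, capturing dependences among instruction values with a data dependence graph (DDG), can be illustrated with a minimal sketch. It assumes a hypothetical toy instruction format (`Instr` with one destination and a tuple of source operands) over straight-line code, and the helper names `build_ddg` and `reaches_output` are illustrative only. It checks whether a corrupted value at an instruction can propagate along DDG edges to an output instruction. Such reachability is only a necessary condition for a fault to cause SDC; it is not the sufficient condition derived in the paper, which the abstract does not state.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: dest = op(srcs). Not the paper's IR."""
    id: int
    dest: str
    srcs: tuple[str, ...]


def build_ddg(instrs):
    """Add an edge u -> v when instruction v reads the value most recently
    defined by instruction u (straight-line code, registers only)."""
    last_def = {}
    succ = defaultdict(set)
    for ins in instrs:
        for s in ins.srcs:
            if s in last_def:
                succ[last_def[s]].add(ins.id)
        last_def[ins.dest] = ins.id
    return succ


def reaches_output(succ, start, outputs):
    """Breadth-first forward slice: can a corrupted value at `start`
    propagate through data dependences to any output instruction?"""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u in outputs:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


# Usage on a tiny hypothetical program.
prog = [
    Instr(0, "a", ()),          # a = load
    Instr(1, "b", ()),          # b = load
    Instr(2, "c", ("a", "b")),  # c = a + b
    Instr(3, "d", ("b",)),      # d = b * 2 (dead: never used)
    Instr(4, "out", ("c",)),    # out = store c (program output)
]
ddg = build_ddg(prog)
candidates = [i.id for i in prog if reaches_output(ddg, i.id, {4})]
print(candidates)  # [0, 1, 2, 4]
```

Instruction 3 is excluded because its value never reaches the output, so a fault there cannot produce SDC through data flow alone. A DDG built from real binaries or compiler IR would also need memory and control dependences, and the inter-function propagation discussed in the paper, which this sketch deliberately omits.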
