ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (5): 1109-1120. doi: 10.7544/issn1000-1239.2017.20151017

• System Architecture •



  • Published: 2017-05-01

Addressing Transient and Intermittent Link Faults in NoC with Fault-Tolerant Method

Ouyang Yiming1, Sun Chenglong1, Li Jianhua1, Liang Huaguo2, Huang Zhengfeng2, Du Gaoming2   

  1. School of Computer and Information, Hefei University of Technology, Hefei 230009; 2. School of Electronic Science and Applied Physics, Hefei University of Technology, Hefei 230009
  • Online: 2017-05-01

Abstract: Links are the critical paths between routers in a network-on-chip (NoC), so link faults can severely degrade network performance. To address this, we propose a highly reliable fault-tolerant method for transient and intermittent link faults. The method detects data errors in the network in real time and uses the detection results to classify a fault as transient or intermittent, and then tolerates it accordingly. It alleviates network congestion and reduces latency while ensuring correct data transmission, which helps keep the system highly reliable. When a transient link fault causes a data error that cannot be corrected, the data backed up in a retransmission buffer is transmitted again. When an intermittent link fault causes a data error that cannot be corrected, the packet transmission is truncated; a pseudo head flit or a pseudo tail flit is then added to the truncated data so that it can be re-routed or so that the resources it occupies can be released. Experimental results show that, under different fault conditions, the method achieves considerably lower average packet latency and higher throughput than the schemes it is compared against, improving network reliability while preserving system performance.
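The two recovery paths described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a fault is classified, so the retry threshold `RETRY_LIMIT`, the `link_ok` callback that stands in for the error detector, and the rule that the delivered part of a truncated packet receives the pseudo tail flit while the remaining part receives the pseudo head flit are all assumptions made for illustration. The names `Flit`, `send_flit`, and `split_truncated_packet` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

RETRY_LIMIT = 2  # assumed threshold; the abstract gives no value


@dataclass(frozen=True)
class Flit:
    kind: str          # "head", "body", or "tail"
    packet_id: int
    payload: int
    pseudo: bool = False


def send_flit(flit: Flit,
              link_ok: Callable[[Flit, int], bool],
              retry_limit: int = RETRY_LIMIT) -> Tuple[str, int]:
    """Send one flit, retransmitting the buffered copy on uncorrectable errors.

    link_ok(flit, attempt) models the error detector: True means the flit
    arrived intact (or was corrected) on that attempt.
    Returns (status, attempts) with status "ok", "transient", or "intermittent".
    """
    backup = flit  # copy kept in the retransmission buffer
    for attempt in range(retry_limit + 1):
        if link_ok(backup, attempt):
            # A success after at least one retry is treated as a transient fault.
            return ("ok" if attempt == 0 else "transient", attempt + 1)
    # Assumption: errors that persist across all retries are intermittent.
    return ("intermittent", retry_limit + 1)


def split_truncated_packet(flits: List[Flit],
                           delivered: int) -> Tuple[List[Flit], List[Flit]]:
    """Split a packet that was cut after `delivered` flits crossed the link.

    The delivered part gets a pseudo tail flit so downstream resources can be
    released; the remaining part gets a pseudo head flit so it can be re-routed.
    """
    pid = flits[0].packet_id
    downstream = list(flits[:delivered])
    upstream = list(flits[delivered:])
    if downstream and downstream[-1].kind != "tail":
        downstream.append(Flit("tail", pid, 0, pseudo=True))
    if upstream and upstream[0].kind != "head":
        upstream.insert(0, Flit("head", pid, 0, pseudo=True))
    return downstream, upstream
```

For example, calling `send_flit` with a `link_ok` that fails only on the first attempt reports a transient fault after two attempts, while a `link_ok` that always fails reports an intermittent fault, at which point `split_truncated_packet` would be applied to the packet in flight.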

Key words: network-on-chip (NoC), transient fault, intermittent fault, fault tolerance, retransmission, reliability