Journal of Computer Research and Development, 2017, Vol. 54, Issue (5): 1109-1120. doi: 10.7544/issn1000-1239.2017.20151017

Addressing Transient and Intermittent Link Faults in NoC with Fault-Tolerant Method

Ouyang Yiming1, Sun Chenglong1, Li Jianhua1, Liang Huaguo2, Huang Zhengfeng2, Du Gaoming2   

  1 School of Computer and Information, Hefei University of Technology, Hefei 230009; 2 School of Electronic Science and Applied Physics, Hefei University of Technology, Hefei 230009
  • Online: 2017-05-01

Abstract: Links are the critical paths between routers in a network-on-chip (NoC), so faults in a link seriously degrade network performance. For this reason, we propose a highly reliable fault-tolerant method that addresses transient and intermittent link faults. The method detects data errors in the network in real time and then determines whether the fault is transient or intermittent, thereby achieving fault tolerance. As a result, it alleviates network congestion, decreases data delay, and ensures correct data transmission, effectively guaranteeing high system reliability. When a transient fault occurs in a link, the faulty link produces a data error that cannot be corrected properly. The proposed method therefore sets up a retransmission buffer, from which the backup data is retransmitted. When an intermittent fault occurs, the packet transmission is truncated. To solve this problem, the proposed method adds a pseudo head flit and a pseudo tail flit to the truncated data, then re-routes the data and releases the occupied resources. Experimental results show that, under different fault conditions, the method outperforms the compared schemes, with a significant reduction in average packet latency and a clear improvement in throughput. In short, the scheme effectively improves network reliability while preserving network performance.

Key words: network-on-chip (NoC), transient fault, intermittent fault, fault-tolerant, retransmission, reliable
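
The two recovery paths described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a fault is classified, so the sketch assumes a fault is intermittent once the same flit fails `INTERMITTENT_THRESHOLD` consecutive times, and transient otherwise. The names `Flit`, `send_packet`, `link_ok`, and the threshold value are all hypothetical, and the placement of the pseudo tail flit (closing the part already delivered) and pseudo head flit (opening the part to be re-routed) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed value: the abstract gives no classification threshold.
INTERMITTENT_THRESHOLD = 3


@dataclass
class Flit:
    kind: str     # "head", "body", "tail", "pseudo_head" or "pseudo_tail"
    payload: int  # for head flits this stands in for routing information


def send_packet(
    flits: List[Flit],
    link_ok: Callable[[int, int], bool],
) -> Tuple[List[Flit], List[Flit], str]:
    """Send one packet over one link.

    link_ok(flit_index, attempt) is a stand-in for the error detector and
    returns True when that transmission attempt arrives without error.
    Returns (delivered, to_reroute, fault_type).
    """
    delivered: List[Flit] = []
    fault_type = "none"
    for i, flit in enumerate(flits):
        attempt = 0
        while not link_ok(i, attempt):
            attempt += 1
            if attempt >= INTERMITTENT_THRESHOLD:
                # Errors persist: treat the fault as intermittent and
                # truncate the packet at this flit.
                if i == 0:
                    # Nothing was delivered, so the whole packet is re-routed.
                    return [], list(flits), "intermittent"
                # Close the delivered part so downstream resources are freed.
                delivered.append(Flit("pseudo_tail", 0))
                # Open the remaining part with a copy of the routing info.
                pseudo_head = Flit("pseudo_head", flits[0].payload)
                return delivered, [pseudo_head] + flits[i:], "intermittent"
            # Otherwise assume a transient fault and retransmit the backup
            # copy held in the retransmission buffer (here: loop again).
            fault_type = "transient"
        delivered.append(flit)
    return delivered, [], fault_type
```

With a detector that fails only once on a single flit, the whole packet is delivered after one retransmission and the fault is reported as transient. With a detector that keeps failing from some flit onward, the function returns a delivered prefix ending in a pseudo tail flit and a remainder starting with a pseudo head flit, which a router model could then hand to an alternative route.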

