    Addressing Transient and Intermittent Link Faults in NoC with Fault-Tolerant Method

    • Abstract: In a network-on-chip (NoC), links are the critical paths connecting routers, so link faults seriously degrade network performance. To address this, we propose a highly reliable fault-tolerant method for transient and intermittent link faults. The method detects data errors in the network in real time, uses the detection results to classify a fault as transient or intermittent, and applies the corresponding recovery. It relieves network congestion and reduces latency while ensuring that data is transmitted correctly, which safeguards system reliability. When a transient link fault corrupts data and the error cannot be corrected, the data is retransmitted from the backup copy held in a retransmission buffer. When an intermittent link fault corrupts data and the error cannot be corrected, the packet transmission is truncated, and a pseudo head flit or a pseudo tail flit is added to the truncated data so that it can be re-routed or the occupied resources can be released. Experimental results show that, under different fault conditions, the method substantially reduces average packet latency and improves throughput relative to the schemes it is compared against, improving network reliability while preserving system performance.
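The recovery flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the abstract does not say how a transient fault is told apart from an intermittent one, so the sketch assumes a retry threshold (`max_retries`), and the names `Flit`, `send_over_link`, and `transfer_ok` are hypothetical. The sketch also reads the truncation step as follows: the part already delivered gets a pseudo tail flit so that downstream resources can be released, and the undelivered remainder gets a pseudo head flit so that it can be re-routed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Flit:
    kind: str            # "head", "body" or "tail"
    packet_id: int
    payload: int
    pseudo: bool = False  # True for flits added during truncation


def send_over_link(
    flits: List[Flit],
    transfer_ok: Callable[[Flit, int], bool],
    max_retries: int = 2,  # assumed threshold, not taken from the paper
) -> Tuple[List[Flit], List[Flit], str]:
    """Send the flits of one packet over a single link.

    transfer_ok(flit, attempt) is a hypothetical stand-in for the error
    detector: it returns False when an uncorrectable error is detected.
    Returns (delivered flits, remainder to re-route, fault type).
    """
    delivered: List[Flit] = []
    fault = "none"
    for index, flit in enumerate(flits):
        backup = flit  # copy kept in the retransmission buffer
        attempt = 0
        while not transfer_ok(backup, attempt):
            attempt += 1
            if attempt > max_retries:
                # Errors persist: treat the fault as intermittent and
                # truncate the packet at this flit.
                if delivered:
                    # Close the delivered part so resources are released.
                    delivered.append(Flit("tail", flit.packet_id, 0, True))
                remainder = list(flits[index:])
                if remainder[0].kind != "head":
                    # Give the remainder a head so it can be re-routed.
                    remainder.insert(0, Flit("head", flit.packet_id, 0, True))
                return delivered, remainder, "intermittent"
            fault = "transient"  # recovered so far by retransmission
        delivered.append(backup)
    return delivered, [], fault
```

With a four-flit packet, an error that disappears on the first retry leaves the packet intact and is reported as transient, while an error that persists past `max_retries` splits the packet into a delivered part ending in a pseudo tail flit and a remainder starting with a pseudo head flit.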

       
