Error-Correcting Techniques for High-Performance Processors

Wang Zhen, Jiang Jianhui, and Yuan Chunxin

Abstract: The downscaling of CMOS feature size yields faster transistors and lower supply voltages. This trend improves the overall performance of integrated circuits, but it also makes complex circuits such as microprocessors harder to keep reliable. As chip density and reliability requirements continue to rise, the fault-tolerant design of high-performance processors is becoming increasingly important. Much work has been done on error detection and correction in processors, and several novel fault-tolerant microprocessor architectures have been proposed recently, such as the simultaneously and redundantly threaded processor with recovery. This paper gives a comprehensive survey of conventional and recent error-correction techniques for high-performance processors. A novel taxonomy is presented that divides these techniques into four categories: clock-level error recovery, instruction-level error recovery, thread-level error recovery, and reconfiguration. The survey covers microarchitecture schemes, prototype systems, and industrial products, and compares their fault-tolerance strategies and scheduling algorithms in detail. The results show that modern processors characterized by chip multiprocessing and/or simultaneous multithreading, in addition to reusing traditional fault-tolerance techniques, mostly build their fault-tolerance mechanisms and scheduling schemes on inherent hardware resources that were replicated to improve performance.
