    Citation: Wang Zhen, Jiang Jianhui, and Yuan Chunxin. Error-Correcting Techniques for High-Performance Processors[J]. Journal of Computer Research and Development, 2008, 45(2): 358-366.

    Error-Correcting Techniques for High-Performance Processors

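    • Summary: As chip density keeps increasing and reliability requirements keep rising, the fault-tolerant design of high-performance processors is attracting growing attention. This paper analyzes and compares recent error-correction techniques for high-performance processors, classifying them into four categories: clock-level error recovery, instruction-level error recovery, thread-level error recovery, and reconfiguration. The work surveyed covers research schemes, prototypes, and products. The results show that high-performance processors characterized by chip multiprocessing and/or simultaneous multithreading, besides retaining conventional fault-tolerance techniques, mostly build their fault-tolerance mechanisms and scheduling schemes on inherent hardware resources that were replicated in order to improve performance.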

       

      Abstract: The downscaling of CMOS feature size results in faster transistors and lower supply voltages. This trend contributes to the overall performance improvement of integrated circuits, but it also poses greater challenges to the reliability of complex circuits such as microprocessors. Accordingly, the fault-tolerant design of high-performance processors is becoming increasingly important. Much work has already been done on error detection and correction in processors, and several novel fault-tolerant microprocessor architectures have been proposed recently, such as the simultaneously and redundantly threaded processor with recovery. This paper gives a comprehensive survey of conventional and up-to-date error-correction techniques for high-performance processors. It presents a novel taxonomy that categorizes fault-tolerance techniques for processors into clock-level error recovery, instruction-level error recovery, thread-level error recovery, and reconfiguration. Many microarchitecture schemes, prototype systems, and industrial products are analyzed, and their detailed fault-tolerance strategies and scheduling algorithms are compared. The survey shows that, for modern processors characterized by chip multiprocessing and/or simultaneous multithreading, reliability is mostly improved by fault-tolerance techniques built on inherent replicated hardware resources that were originally designed to improve performance.

       
