
Lightweight Error Recovery Techniques of Many-Core Processor in High Performance Computing

Zheng Fang, Shen Li, Li Hongliang, Xie Xianghui

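Citation: Zheng Fang, Shen Li, Li Hongliang, Xie Xianghui. Lightweight Error Recovery Techniques of Many-Core Processor in High Performance Computing[J]. Journal of Computer Research and Development, 2015, 52(6): 1316-1328. DOI: 10.7544/issn1000-1239.2015.20150119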
CSTR: 32373.14.issn1000-1239.2015.20150119


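Funding: National High Technology Research and Development Program of China (863 Program) (2014AA01A301); National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software (2013ZX0102-8001-001-001)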
  • CLC number: TP302


  • Abstract: With advances in semiconductor technology, many-core processors that integrate a large number of cores on a single chip have become widely used in high-performance computing. Compared with multi-core processors, many-core processors offer higher computing density and energy efficiency, but they also face increasingly severe reliability challenges. Efficient fault-tolerance mechanisms are therefore needed that preserve application performance without incurring large chip power and area overheads. Based on a prototype of DFMC (deeply fused and heterogeneous many-core), a home-grown many-core processor, we propose and implement two lightweight error recovery techniques for many-core processors: independent recovery and coordinated recovery. The choice between them depends on whether the applications running on the cores are correlated with one another. Coordinated recovery is managed by a centralized unit and connected through a coordinated recovery bus. When an error occurs, it quickly rolls back all cores associated with the error to a consistent, correct state. In both techniques, state saving and restoration are performed by custom instructions, and the information needed for recovery is kept inside the compute cores to minimize the impact on application performance. Experimental results show that these techniques increase chip area by only 1.257% and can recover from about 80% of transient errors in the processor. They also have little impact on application performance, chip timing, and power consumption, and so effectively improve the fault tolerance of the many-core processor.
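The abstract describes the two recovery modes only at a high level. The Python sketch below is a hypothetical software model, not the DFMC hardware. `Core.save` and `Core.restore` stand in for the paper's custom save and restore instructions. `RecoveryController` stands in for the centralized recovery unit. All names, and the modeling of core state as a register dictionary, are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical core model: architectural state is a dict of registers."""
    core_id: int
    regs: dict = field(default_factory=dict)
    checkpoint: dict = field(default_factory=dict)

    def save(self) -> None:
        # Stand-in for a custom "save state" instruction; the copy stays in-core.
        self.checkpoint = dict(self.regs)

    def restore(self) -> None:
        # Stand-in for a custom "restore state" instruction.
        self.regs = dict(self.checkpoint)


class RecoveryController:
    """Hypothetical centralized unit that knows which cores run correlated work."""

    def __init__(self, cores: list[Core], groups: list[set[int]]):
        self.cores = {c.core_id: c for c in cores}
        # Map each correlated core to its whole group; independent cores are absent.
        self.group_of: dict[int, frozenset[int]] = {}
        for group in groups:
            for cid in group:
                self.group_of[cid] = frozenset(group)

    def members(self, core_id: int) -> list[int]:
        # An independent core forms a group of one.
        return sorted(self.group_of.get(core_id, {core_id}))

    def checkpoint(self, core_id: int) -> None:
        # Correlated cores save together so their checkpoints form one recovery line.
        for cid in self.members(core_id):
            self.cores[cid].save()

    def on_error(self, core_id: int) -> list[int]:
        # Roll back the faulty core plus every core correlated with it.
        rolled_back = self.members(core_id)
        for cid in rolled_back:
            self.cores[cid].restore()
        return rolled_back


cores = [Core(i, {"pc": 0}) for i in range(4)]
ctrl = RecoveryController(cores, groups=[{0, 1}])  # cores 0 and 1 are correlated
for cid in (0, 2, 3):
    ctrl.checkpoint(cid)  # checkpoint(0) also saves core 1
for c in cores:
    c.regs["pc"] = 100    # every core advances past its checkpoint
print(ctrl.on_error(1))   # [0, 1]  coordinated recovery
print(ctrl.on_error(3))   # [3]     independent recovery
print([c.regs["pc"] for c in cores])  # [0, 0, 100, 0]
```

In this model, grouping is the key design choice. A core with no correlation group rolls back alone, which models independent recovery. An error in any member of a group rolls back every member to the same recovery line, which models coordinated recovery.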

Publication history
  • Published: 2015-05-31
