
Shenwei-26010: A High-Performance Many-Core Processor

Hu Xiangdong, Ke Ximing, Yin Fei, Zhang Xin, Ma Yongfei, Yan Shiyun, Ma Chao

Citation: Hu Xiangdong, Ke Ximing, Yin Fei, Zhang Xin, Ma Yongfei, Yan Shiyun, Ma Chao. Shenwei-26010: A High-Performance Many-Core Processor[J]. Journal of Computer Research and Development, 2021, 58(6): 1155-1165. DOI: 10.7544/issn1000-1239.2021.20201041


  • CLC number: TP338

Shenwei-26010: A High-Performance Many-Core Processor

Funds: This work was supported by the National Science and Technology Major Project of China for Core Electronic Devices, High-end Generic Chips, and Basic Software (“Hegaoji”) (2013ZX01028-001-001).
  • Abstract: Building on the multi-core processor Shenwei 1600, the high-performance many-core processor Shenwei 26010 uses system-on-chip (SoC) technology to integrate 4 computing-control cores and 256 computing cores on a single chip. It implements the independently designed 64-bit Shenwei RISC (reduced instruction set computer) instruction set and supports 256-bit SIMD (single instruction, multiple data) integer and floating-point vector acceleration. Its peak double-precision floating-point performance reaches 3.168 TFLOPS. The Shenwei 26010 is fabricated in a 28 nm process; the die area exceeds 500 mm², and all 260 cores run stably at 1.5 GHz. The processor combines multiple low-power design techniques at the architecture, microarchitecture, and circuit levels, achieving a peak energy efficiency of 10.559 GFLOPS/W. Both its operating frequency and its energy efficiency exceed those of comparable international processors of the same period. Through technical innovations in high-frequency design, stability and reliability design, and yield design, the Shenwei 26010 overcomes the high-frequency target, the power wall, and the stability, reliability, and yield problems encountered in reaching its high-performance goals. It has been deployed at large scale in the domestically built 100 PFLOPS supercomputer “Sunway TaihuLight”, where it meets the computing needs of scientific and engineering applications.
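The headline figures in the abstract can be cross-checked with a short calculation. The sketch below is only a plausibility check, not the authors' derivation. The core counts, the 256-bit SIMD width, the 1.5 GHz frequency, and the 10.559 GFLOPS/W figure come from the abstract. Everything else is an assumption introduced here: that each pipeline retires one fused multiply-add per cycle on a full vector, and that each computing-control core has two such pipelines while each computing core has one. The function and constant names are hypothetical.

```python
# Plausibility check of the abstract's peak-performance and efficiency figures.
# From the abstract: 4 + 256 cores, 256-bit SIMD, 1.5 GHz, 10.559 GFLOPS/W.
# ASSUMED (not stated in the abstract): one fused multiply-add (2 FLOPs per
# lane) per pipeline per cycle; 2 vector pipelines per computing-control core
# and 1 per computing core.

FREQ_GHZ = 1.5
SIMD_BITS = 256
DOUBLE_BITS = 64
N_COMPUTE_CORES = 256
N_CONTROL_CORES = 4
FLOPS_PER_FMA = 2            # assumption: multiply + add counted as 2 FLOPs
COMPUTE_PIPELINES = 1        # assumption
CONTROL_PIPELINES = 2        # assumption
EFFICIENCY_GFLOPS_PER_W = 10.559


def flops_per_cycle(pipelines: int) -> int:
    """Double-precision FLOPs per cycle for a core with `pipelines` vector units."""
    lanes = SIMD_BITS // DOUBLE_BITS          # 4 doubles per 256-bit vector
    return pipelines * lanes * FLOPS_PER_FMA


def peak_gflops() -> float:
    """Chip-level double-precision peak in GFLOPS under the stated assumptions."""
    compute = N_COMPUTE_CORES * flops_per_cycle(COMPUTE_PIPELINES) * FREQ_GHZ
    control = N_CONTROL_CORES * flops_per_cycle(CONTROL_PIPELINES) * FREQ_GHZ
    return compute + control


def implied_power_watts() -> float:
    """Power implied by dividing the peak by the quoted peak energy efficiency."""
    return peak_gflops() / EFFICIENCY_GFLOPS_PER_W


if __name__ == "__main__":
    print(f"peak: {peak_gflops() / 1000:.3f} TFLOPS")
    print(f"implied power at peak efficiency: {implied_power_watts():.1f} W")
```

Under these assumptions the total comes to 3168 GFLOPS, which agrees with the 3.168 TFLOPS quoted in the abstract, and the quoted efficiency then implies a power of roughly 300 W. The abstract states neither the per-core throughput nor the chip power, so both should be read as inferences.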

Publication history
  • Published:  2021-05-31
