Yingtian-Lake: A Wafer-Scale General-Purpose Heterogeneous Multi-chiplet Petascale Computer

董文阔; 殷春锁; 张志锰; 王鹏超; 沙江; 王梦雅; 朱旻琦; 刘宏伟; 刘宇航; 郝沁汾

doi:10.7544/issn1000-1239.202550163

Abstract

Wafer-scale computers integrate multiple chiplets through advanced packaging, overcoming the area limit of a conventional chip to scale computing power. Existing designs, however, are domain-specific and struggle to meet general-purpose computing requirements. Targeting the workload characteristics of high-performance computing and intelligent computing, we propose Yingtian-Lake, a general-purpose wafer-scale system architecture. First, a decoupled design separates the computing modules from the interconnect substrate (interposer) and, together with standardized I/O interfaces, supports multiple types of computing modules. Second, a reconfigurable on-wafer network uses dynamic topology reconfiguration to adapt to different traffic patterns. Third, a topology-agnostic fault-tolerance control mechanism maintains service continuity when computing units fail. Experimental results show that the reconfigurable on-wafer network switches topology with a latency on the order of seconds. A prototype system of 16 computing modules, taped out and validated in the TSMC 28 nm process, achieves about a 1.45× throughput improvement on high-performance linear algebra workloads and about a 1.78× latency improvement on deep learning inference, and a single wafer can deliver petaflops-level performance. These results support the feasibility of general-purpose wafer-scale systems and provide a scalable hardware foundation for next-generation heterogeneous computing platforms.
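
The abstract does not describe how the fault-tolerance mechanism works internally. As a rough illustration of what routing around failed computing units can look like when it does not depend on any particular topology, the sketch below runs a breadth-first search over an arbitrary adjacency map and skips failed modules. Everything in it is an assumption made for illustration: the function names `mesh_links` and `route`, the 4 x 4 mesh used for the 16 modules, and the choice of breadth-first search. It is not the algorithm used in Yingtian-Lake.

```python
from collections import deque


def mesh_links(rows, cols):
    """Adjacency map for a rows x cols 2D mesh (an assumed example topology)."""
    adj = {r * cols + c: set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:  # link to the right-hand neighbour
                adj[node].add(node + 1)
                adj[node + 1].add(node)
            if r + 1 < rows:  # link to the neighbour below
                adj[node].add(node + cols)
                adj[node + cols].add(node)
    return adj


def route(adj, src, dst, failed=frozenset()):
    """Return a minimum-hop path from src to dst that avoids failed modules.

    Works on any adjacency map, so it makes no assumption about the
    topology. Returns None when no fault-free path exists.
    """
    if src in failed or dst in failed:
        return None
    parent = {src: None}  # doubles as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk parents back to src
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for deterministic paths
            if nxt not in failed and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    links = mesh_links(4, 4)  # 16 modules, numbered row by row
    print(route(links, 0, 3))                 # no failures
    print(route(links, 0, 3, failed={1, 2}))  # detour around two failed modules
```

Because the search only consults the adjacency map, the same function applies unchanged after the network is reconfigured into a different topology; only the map passed in changes.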
