Performance Modeling and Optimization for Large-Scale Heterogeneous Consistency Integrated Computing System

李仁刚; 唐轶男; 郭振华; 王丽; 宗瓒; 杨广文

doi:10.7544/issn1000-1239.202550120

Abstract

As large-scale artificial intelligence (AI) applications spread, demand for AI computing power from both industry and academia keeps growing. Heterogeneous consistency integrated computing systems, which combine heterogeneous computing with cache consistency technology, are becoming an important option for building future intelligent computing centers. However, because heterogeneous computing and consistency interconnect technologies are still immature, existing work struggles to model the performance of such systems, so researchers cannot cheaply evaluate construction plans, predict computing performance, or assess system optimization methods. We propose HCSim, a performance modeling tool for heterogeneous consistency integrated computing systems. HCSim addresses shortcomings of existing modeling and simulation work, such as the difficulty of modeling the topology of these systems and the inaccurate modeling of workloads in consistency systems. It gives researchers a flexible, low-cost, and efficient tool for modeling and evaluating interconnect topologies and AI computing tasks. Using HCSim, we model a heterogeneous consistency integrated computing system interconnecting on the order of a thousand accelerators and simulate data-parallel distributed training of the LLAMA2-13B large language model (LLM) on it, examining how heterogeneous computing power distribution, bandwidth, latency, and task scale affect system performance and task execution efficiency. We further design optimization strategies for the communication problems of such systems and validate their effectiveness with HCSim. The simulation results show that HCSim not only meets the performance modeling needs of heterogeneous consistency integrated computing systems but can also be used to evaluate and verify optimization strategies for them.
