ISSN 1000-1239 CN 11-1777/TP

• System Architecture •

### MPD: A CC-NUMA System Whose Nodes Have Multiple Parallel Cache Coherency Domains

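1. (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085) (chenjch@inspur.com)
• Published: 2017-04-01
• Funding:
National High Technology Research and Development Program of China (863 Program) (2013AA011701)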

### MPD: A CC-NUMA System with Clump Having Multiple Parallel Cache Coherency Domains

Chen Jicheng, Zhao Yaqian, Li Yihan, Wang Endong, Shi Hongzhi, Tang Shibin

1. (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085)
• Online: 2017-04-01

Abstract: Large-scale CC-NUMA systems usually employ a two-tier architecture to reduce cache coherence overhead and improve system performance. In a two-tier system, several processors and a coherence chip form an intra-clump cache coherency domain, and the coherence chips are interconnected by a system interconnection network to form an inter-clump cache coherency domain. Because every processor occupies at least one processor ID in its cache coherency domain, and each processor can distinguish only a limited number of processor IDs, a CC-NUMA system can scale only by adding clumps, not by enlarging each clump. This leads to an excessive number of clumps and a complicated topology in a multi-processor system, which increases the bandwidth demand and latency of cross-clump memory accesses. To solve this problem, we propose MPD, a new method for constructing multi-processor systems in which a clump has multiple parallel cache coherency domains. This method removes the limit on clump scale imposed by the limited number of processor IDs that a processor can distinguish within a domain. Compared with a traditional CC-NUMA system, an MPD system not only significantly reduces topological complexity but also effectively improves system performance. Theoretical analysis and simulation results show that, compared with a 32-way CC-NUMA system, an MPD system built from the same processors achieves a 75% reduction in the number of nodes, more than 40% savings in coherence directory storage, a 27.9% average reduction in access latency, and about a 14.4% improvement in system performance.
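The scaling argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function `clump_count` and the values of 4 processors per domain and 4 parallel domains per clump are hypothetical, chosen only to show how allowing several domains per clump reduces the number of clumps in a 32-way system. These illustrative values happen to give a 75% reduction; the abstract does not state the configuration the authors actually used.

```python
import math


def clump_count(total_processors, processors_per_domain, domains_per_clump=1):
    """Return how many clumps are needed to host all processors.

    processors_per_domain models the processor-ID limit inside one
    intra-clump cache coherency domain (a hypothetical value here).
    domains_per_clump is 1 for a traditional two-tier CC-NUMA system
    and greater than 1 for an MPD-style clump.
    """
    processors_per_clump = processors_per_domain * domains_per_clump
    return math.ceil(total_processors / processors_per_clump)


# Assumed, illustrative parameters (not from the paper).
TOTAL = 32
PER_DOMAIN = 4

baseline_clumps = clump_count(TOTAL, PER_DOMAIN)                   # one domain per clump
mpd_clumps = clump_count(TOTAL, PER_DOMAIN, domains_per_clump=4)   # four parallel domains
reduction = 1 - mpd_clumps / baseline_clumps

print(f"traditional: {baseline_clumps} clumps, MPD: {mpd_clumps} clumps, "
      f"reduction: {reduction:.0%}")
```

Under these assumptions the clump count depends only on the product of the two limits, which is why adding parallel domains to a clump has the same effect on topology size as relaxing the per-domain processor-ID limit.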