Abstract:
Large-scale CC-NUMA systems usually employ a two-tier architecture to reduce the overhead of cache coherence and improve system performance. In a two-tier system, several processors and a coherence chip form an intra-clump cache coherency domain, and the coherence chips are interconnected by a system interconnection network to form an inter-clump cache coherency domain. Because every processor occupies at least one processor ID in its cache coherency domain, and the number of processor IDs that a processor can distinguish is limited, a CC-NUMA system can scale only by increasing the number of clumps, not by increasing the size of each clump. This leads to an excessive number of clumps and a complicated topology in a multi-processor system, which in turn increases the bandwidth cost and latency of cross-clump memory accesses. To solve this problem, we propose a new method for constructing multi-processor systems, called MPD, in which a clump contains multiple parallel cache coherency domains. This method removes the limit on clump size imposed by the limited number of processors that a processor can support within one domain. Compared with a traditional CC-NUMA system, an MPD system not only significantly reduces the topological complexity of the system but also effectively improves its performance. Theoretical analysis and simulation results show that, compared with a 32-way CC-NUMA system, an MPD system built from the same processors achieves a 75% reduction in the number of nodes, more than 40% savings in coherence directory storage, a 27.9% average reduction in access latency, and about a 14.4% improvement in system performance.
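The scaling argument above can be made concrete with a small arithmetic sketch. It is not the paper's model: the function names and the parameter values (4 processors per coherency domain, 4 parallel domains per clump) are assumptions chosen for illustration, and the abstract does not state them. The sketch only shows how placing several parallel domains in one clump lowers the number of clumps needed for a fixed processor count.

```python
from math import ceil


def clumps_needed(total_processors, procs_per_domain, domains_per_clump=1):
    """Clumps required to host total_processors.

    procs_per_domain  -- hypothetical cap on processors in one coherency
                         domain, standing in for the processor ID limit
    domains_per_clump -- 1 models a traditional two-tier CC-NUMA clump;
                         a value above 1 models a clump with several
                         parallel coherency domains
    """
    if min(total_processors, procs_per_domain, domains_per_clump) < 1:
        raise ValueError("all arguments must be positive integers")
    clump_capacity = procs_per_domain * domains_per_clump
    return ceil(total_processors / clump_capacity)


def node_reduction(baseline_clumps, proposed_clumps):
    """Fractional reduction in clump (node) count relative to the baseline."""
    return 1 - proposed_clumps / baseline_clumps


# Illustrative values only; they are assumptions, not taken from the paper.
baseline = clumps_needed(32, procs_per_domain=4, domains_per_clump=1)
parallel = clumps_needed(32, procs_per_domain=4, domains_per_clump=4)
print(baseline, parallel, node_reduction(baseline, parallel))
```

Under these assumed values the clump count for a 32-way system drops from 8 to 2. Other choices of domain size and domain count give different ratios, so the sketch should be read as an illustration of the mechanism rather than a derivation of the reported results.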