    Citation: Chen Jicheng, Zhao Yaqian, Li Yihan, Wang Endong, Shi Hongzhi, Tang Shibin. MPD: A CC-NUMA System with Clump Having Multiple Parallel Cache Coherency Domains[J]. Journal of Computer Research and Development, 2017, 54(4): 775-786. DOI: 10.7544/issn1000-1239.2017.20160142

    MPD: A CC-NUMA System with Clump Having Multiple Parallel Cache Coherency Domains

    Abstract: Large-scale cache-coherent non-uniform memory access (CC-NUMA) systems usually adopt a two-tier coherence-domain architecture to reduce the overhead of maintaining cache coherence and to improve system performance. In a two-tier system, several processors and a coherence chip are interconnected to form an intra-clump cache coherency domain, and the coherence chips of multiple clumps (nodes) are connected through a system interconnection network to form an inter-clump cache coherency domain. Every processor occupies at least one processor ID in its coherency domain, and both the number of IDs a processor can distinguish and its direct-connect capability are limited, so the size of a single clump is bounded and the system can scale only by adding clumps. The result is a large number of clumps and a complex interconnect topology, which sharply increases the bandwidth demand and latency of cross-clump memory accesses and limits how well performance scales. To address this, we propose MPD, a method of constructing multi-processor systems in which each clump contains multiple parallel cache coherency domains. MPD removes the limit that processor direct-connect capability and the number of distinguishable processor IDs place on clump size, greatly reduces the number of clumps, and turns some inter-clump accesses into intra-clump accesses. Theoretical analysis and simulation results show that, in a 32-way system built from the same processors, an MPD system with four parallel cache coherency domains per clump achieves a 75% reduction in the number of clumps, more than 40% savings in coherence directory storage, about a 27.9% reduction in average access latency, and about a 14.4% improvement in overall system performance, compared with a traditional CC-NUMA system.
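    The 75% figure for the clump count can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that each parallel domain in an MPD clump holds as many processors as the single domain of a traditional clump, and the function names and the per-domain processor counts (2 and 4) are hypothetical. Under that assumption, four parallel domains per clump quarter the number of clumps regardless of the per-domain size.

```python
import math


def clump_count(total_processors: int, processors_per_domain: int,
                domains_per_clump: int = 1) -> int:
    """Clumps needed when each clump hosts `domains_per_clump` parallel domains.

    Assumption (not from the paper): every domain holds the same number of
    processors, whether the clump has one domain or several.
    """
    processors_per_clump = processors_per_domain * domains_per_clump
    return math.ceil(total_processors / processors_per_clump)


def clump_reduction(total_processors: int, processors_per_domain: int,
                    domains_per_clump: int) -> float:
    """Fractional reduction in clump count versus one domain per clump."""
    baseline = clump_count(total_processors, processors_per_domain, 1)
    mpd = clump_count(total_processors, processors_per_domain, domains_per_clump)
    return 1 - mpd / baseline


if __name__ == "__main__":
    # 32-way system, four parallel domains per clump, hypothetical domain sizes.
    for per_domain in (2, 4):
        baseline = clump_count(32, per_domain)
        mpd = clump_count(32, per_domain, 4)
        reduction = clump_reduction(32, per_domain, 4)
        print(f"{per_domain} processors/domain: {baseline} -> {mpd} clumps "
              f"({reduction:.0%} fewer)")
```

    The directory storage, latency, and performance figures depend on the protocol and workload details reported in the paper and cannot be reproduced from this arithmetic.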

       
