    Citation: Chen Bingzhang (陈炳彰), Liu Wei (刘伟), Yu Xiaoyu (于萧钰). C-AMAT Measurement Method Based on Cache Access Mode and Its Application in Graph Computing[J]. Journal of Computer Research and Development (计算机研究与发展), 2024, 61(4): 824-839. DOI: 10.7544/issn1000-1239.202220818

    C-AMAT Measurement Method Based on Cache Access Mode and Its Application in Graph Computing

    Abstract: Graph applications are an important branch of big data. Although graph analysis offers a clear performance advantage over traditional relational databases when explicitly representing relationships between entities, the many random accesses in graph processing produce irregular memory access patterns that destroy temporal and spatial locality, placing heavy performance pressure on the off-chip memory system. Correctly measuring the memory-system performance of graph applications is therefore crucial for optimizing architectures for them efficiently. Concurrent average memory access time (C-AMAT), an extension of average memory access time (AMAT), accounts for both the locality and the concurrency of memory accesses, so it can evaluate the memory-system performance of graph applications on modern processors more accurately. However, the C-AMAT model ignores the fact that the lower-level caches of a processor are accessed serially, which makes its results inaccurate. In addition, some of the parameters it requires, such as the pure miss cycle, are hard to obtain, which makes C-AMAT difficult to apply in practice. To match the C-AMAT computing model to the memory access modes of modern computers, we propose parallel C-AMAT (PC-AMAT) and serial C-AMAT (SC-AMAT), which extend and characterize the C-AMAT model at a finer granularity for the parallel and serial cache access modes, respectively. On this basis, we design and implement an algorithm for extracting pure miss cycles, which avoids the large hardware overhead of measuring them directly. Experimental results show that, in both single-core and multi-core modes, PC-AMAT and SC-AMAT correlate more strongly with instructions per cycle (IPC) than C-AMAT does. Finally, we use PC-AMAT and SC-AMAT to measure and analyze the memory performance of graph applications and, based on that analysis, propose memory access optimization strategies for graph applications.
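The abstract names AMAT, C-AMAT, and the pure miss cycle without giving their formulas. The sketch below is a minimal illustration, assuming the commonly cited definitions from the original AMAT and C-AMAT literature: AMAT = H + MR × AMP, and C-AMAT = H / C_H + pMR × pAMP / C_M, where a pure miss cycle is a cycle with at least one outstanding miss and no overlapping hit activity. It is not the PC-AMAT or SC-AMAT model and not the extraction algorithm proposed in the paper. The `Access` record, the function names, and the toy trace are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Access:
    """One memory access on a hypothetical cycle timeline (half-open intervals)."""
    hit_start: int                  # first cycle of the cache lookup (hit phase)
    hit_end: int                    # one past the last lookup cycle
    miss_end: Optional[int] = None  # one past the last miss-penalty cycle; None for a hit


def amat(hit_time: float, miss_rate: float, avg_miss_penalty: float) -> float:
    """Classic AMAT = H + MR * AMP; overlap between accesses is not modeled."""
    return hit_time + miss_rate * avg_miss_penalty


def c_amat(hit_time: float, hit_conc: float, pure_miss_rate: float,
           pure_amp: float, miss_conc: float) -> float:
    """Assumed closed form: C-AMAT = H / C_H + pMR * pAMP / C_M."""
    return hit_time / hit_conc + pure_miss_rate * pure_amp / miss_conc


def hit_cycles(accesses: List[Access]) -> Set[int]:
    """Cycles in which at least one access is in its lookup (hit) phase."""
    cycles: Set[int] = set()
    for a in accesses:
        cycles.update(range(a.hit_start, a.hit_end))
    return cycles


def pure_miss_cycles(accesses: List[Access]) -> Set[int]:
    """Miss-penalty cycles that no hit-phase activity overlaps."""
    miss: Set[int] = set()
    for a in accesses:
        if a.miss_end is not None:
            # The miss penalty is assumed to start right after the lookup.
            miss.update(range(a.hit_end, a.miss_end))
    return miss - hit_cycles(accesses)


def c_amat_from_trace(accesses: List[Access]) -> float:
    """Memory-active cycles (hit cycles plus pure miss cycles) per access."""
    active = hit_cycles(accesses) | pure_miss_cycles(accesses)
    return len(active) / len(accesses)


# Toy trace with made-up numbers: B misses, and C's lookup hides part of B's penalty.
trace = [
    Access(0, 2),      # A: hit, lookup in cycles 0-1
    Access(1, 3, 7),   # B: miss, lookup in cycles 1-2, penalty in cycles 3-6
    Access(4, 6),      # C: hit, lookup in cycles 4-5 overlaps B's penalty
]
```

On this toy trace only two of B's four penalty cycles are pure miss cycles, so the trace-based value agrees with the closed form for H = 2, C_H = 6/5, pMR = 1/3, pAMP = 2, C_M = 1. AMAT with MR = 1/3 and AMP = 4 comes out higher because it also charges the two hidden penalty cycles.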

       
