C-AMAT Measurement Method Based on Cache Access Mode and Its Application in Graph Computing

Chen Bingzhang; Liu Wei; Yu Xiaoyu

Journal of Computer Research and Development, 2024, 61(4): 824-839. DOI: 10.7544/issn1000-1239.202220818. CSTR: 32373.14.issn1000-1239.202220818

Citation: Chen Bingzhang, Liu Wei, Yu Xiaoyu. C-AMAT Measurement Method Based on Cache Access Mode and Its Application in Graph Computing[J]. Journal of Computer Research and Development, 2024, 61(4): 824-839. DOI: 10.7544/issn1000-1239.202220818

Chen Bingzhang¹
Liu Wei¹,²
Yu Xiaoyu¹

1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430073
2. Hubei Provincial Key Laboratory of Transportation Internet of Things Technology (Wuhan University of Technology), Wuhan 430073

Funds: This work was supported by the National Natural Science Foundation of China (62272356) and the Open Project of the State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202015).

Author Bio:
Chen Bingzhang: born in 1998. Master's candidate. His main research interests include graph computing, memory and I/O systems, and performance evaluation and optimization of storage systems.

Liu Wei: born in 1978. PhD, associate professor. His main research interests include graph computing, green computing and memory computing, cloud computing and edge computing, and intelligent manufacturing and the industrial Internet.

Yu Xiaoyu: born in 1999. Master's candidate. His main research interests include graph computing and memory system performance optimization.

Received Date: September 22, 2022
Revised Date: May 18, 2023
Available Online: December 14, 2023

Abstract

Graph applications are an important branch of big data. Although graph analysis has clear performance advantages over traditional relational databases in representing relationships between entities, the irregular memory access pattern caused by the large number of random accesses in graph processing destroys the temporal and spatial locality of memory accesses and puts heavy performance pressure on the off-chip memory system. Correctly measuring the memory-system performance of graph applications is therefore crucial for efficient architectural optimization of graph applications. As an extension of average memory access time (AMAT), concurrent average memory access time (C-AMAT) takes into account both the locality and the concurrency of memory accesses, and can evaluate and analyze the memory-system performance of modern processors more accurately. However, the C-AMAT model ignores the fact that the lower-level caches of a processor are accessed serially, which makes its calculation inaccurate. In addition, the parameters required for the calculation are difficult to obtain, owing to the “pure miss cycle” among other reasons, which makes C-AMAT hard to apply in practice. To match the C-AMAT computing model with the memory access modes of modern computers, we propose parallel C-AMAT (PC-AMAT) and serial C-AMAT (SC-AMAT) based on C-AMAT. PC-AMAT and SC-AMAT extend and characterize the C-AMAT computing model for the parallel and serial cache access modes, respectively. On this basis, we design and implement a “pure miss cycle” extraction algorithm that avoids the large hardware overhead of direct measurement. Experimental results show that, in both single-core and multi-core modes, PC-AMAT and SC-AMAT correlate more strongly with instructions per cycle (IPC) than C-AMAT does. Finally, PC-AMAT and SC-AMAT are used to measure and analyze the memory performance of graph applications, and optimization strategies for graph application memory access are proposed on that basis.
Keywords: graph application, graph analysis, AMAT, C-AMAT, pure miss cycle, cache
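
For readers unfamiliar with the metrics named in the abstract, the following minimal Python sketch may help. It assumes the standard textbook form AMAT = H + MR × AMP and the C-AMAT form published by Sun and Wang, C-AMAT = H / C_H + pMR × pAMP / C_M, and it treats a "pure miss cycle" as a cycle with at least one outstanding miss and no overlapping hit activity. The function names, the per-cycle input format, and the sample numbers are hypothetical. This is not the PC-AMAT or SC-AMAT model, nor the extraction algorithm the authors propose, none of which are specified on this page.

```python
from typing import Sequence


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Classic AMAT = H + MR * AMP, in cycles per access."""
    return hit_time + miss_rate * miss_penalty


def c_amat(hit_time: float, hit_concurrency: float, pure_miss_rate: float,
           pure_miss_penalty: float, pure_miss_concurrency: float) -> float:
    """C-AMAT = H / C_H + pMR * pAMP / C_M, in cycles per access."""
    hit_term = hit_time / hit_concurrency
    miss_term = pure_miss_rate * pure_miss_penalty / pure_miss_concurrency
    return hit_term + miss_term


def count_pure_miss_cycles(hit_active: Sequence[int],
                           miss_outstanding: Sequence[int]) -> int:
    """Count cycles that have outstanding misses but no hit activity.

    Both inputs are hypothetical per-cycle counters (one entry per cycle):
    the number of accesses in their hit phase and the number of misses
    still being serviced in that cycle.
    """
    if len(hit_active) != len(miss_outstanding):
        raise ValueError("per-cycle traces must have the same length")
    return sum(1 for h, m in zip(hit_active, miss_outstanding)
               if m > 0 and h == 0)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    print(amat(2, 0.125, 80))
    print(c_amat(2, 2, 0.0625, 64, 4))
    print(count_pure_miss_cycles([1, 2, 0, 0, 1, 0], [0, 1, 1, 2, 1, 0]))
```

With these sample inputs, hit concurrency and miss overlap make the C-AMAT value smaller than the AMAT value, which illustrates why the abstract describes C-AMAT as accounting for concurrency as well as locality.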
