Optimizing Graph Computing with an Adaptive Cache Structure

贾朝阳; 陈湜; 刘京雨; 沈立

doi:10.7544/issn1000-1239.202550219


Abstract: With continued advances in computer hardware and software, graph computing is now applied in many fields, including social network analysis, recommendation systems, bioinformatics, and fraud detection. Its memory access behavior is highly variable. When graph computing runs on a GPU, different kernels exhibit different spatial locality when accessing property arrays, and a single kernel exhibits different spatial locality across different property arrays. Cache optimization strategies in current GPU architectures fail to exploit the performance potential of graph computing. State-of-the-art cache optimization strategies cannot apply distinct management policies to data with different reusability, which is a major reason for the poor performance of graph computing. To address this problem, the AB-Cache architecture, designed specifically for GPU platforms, is proposed. AB-Cache takes an adaptive approach, using two cache structures to separately optimize memory access requests with different spatial locality characteristics. A matching online automatic classification mechanism quickly classifies memory access requests by their spatial locality at very low overhead. A comprehensive evaluation across multiple graph computing applications and a wide range of graph datasets shows that AB-Cache achieves a 1.14x speedup over the baseline, demonstrating its effectiveness and practicality for optimizing graph computing performance.
