Memory Partitioning Optimization of CGRA Based on Access Pattern Morphing

潘德财; 牟迪; 尚家兴; 刘大江

doi:10.7544/issn1000-1239.202440079

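Abstract: Because it combines high flexibility with high energy efficiency, the coarse-grained reconfigurable array (CGRA) is a promising domain-specific accelerator architecture. To exploit the access parallelism of multi-bank memory, memory partitioning is commonly introduced into CGRAs. However, existing memory partitioning work on CGRAs either reaches the optimal partitioning solution at the cost of expensive addressing overhead, or reduces area and power overhead at the cost of consuming more memory banks. To address this, we propose a CGRA-oriented memory partitioning method based on access pattern morphing. By co-optimizing memory partitioning and operator scheduling for applications that contain multi-dimensional arrays, the method forms partition-friendly access patterns that can be partitioned with all-one hyperplanes, which improves the partitioning result and reduces the overhead of computing memory access addresses. Building on this all-one hyperplane partitioning strategy, we also propose an energy-efficient CGRA architecture with simplified address generation units. Experimental results show that, compared with the state-of-the-art method, the proposed approach improves energy efficiency by 1.25×.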
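The all-one hyperplane idea can be illustrated with the linear (hyperplane-based) bank mapping used in the memory-partitioning literature, bank(x) = (α · x) mod N, specialized to α = (1, ..., 1) and assuming a block size of 1. The sketch below is a minimal illustration under those assumptions, not the authors' implementation; the helpers `bank_all_ones`, `conflict_free`, and `min_banks_all_ones` and the example access patterns are hypothetical. It checks whether the array elements read in the same cycle land in distinct banks and searches for the smallest bank count N that makes them conflict-free.

```python
from typing import Optional, Sequence, Tuple

Offset = Tuple[int, ...]  # a constant access offset, one entry per array dimension


def bank_all_ones(index: Sequence[int], n_banks: int) -> int:
    # All-one hyperplane alpha = (1, ..., 1), block size 1 (assumed):
    # bank = (x_0 + x_1 + ... + x_{d-1}) mod N, i.e. additions plus one modulo,
    # with no per-dimension multiplication by hyperplane coefficients.
    return sum(index) % n_banks


def conflict_free(offsets: Sequence[Offset], n_banks: int) -> bool:
    # bank(base + o) = (sum(base) + sum(o)) mod N, so the loop's base index shifts
    # every access by the same amount; checking the offsets alone is enough.
    banks = [bank_all_ones(o, n_banks) for o in offsets]
    return len(set(banks)) == len(banks)


def min_banks_all_ones(offsets: Sequence[Offset], max_banks: int = 64) -> Optional[int]:
    # Smallest N (at least one bank per simultaneous access) that makes the
    # pattern conflict-free, or None if no N up to max_banks works.
    for n in range(max(1, len(offsets)), max_banks + 1):
        if conflict_free(offsets, n):
            return n
    return None


ROW = [(0, 0), (0, 1), (0, 2)]                           # hypothetical row access
FIVE_POINT = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # hypothetical 5-point stencil

print(min_banks_all_ones(ROW))         # 3
print(min_banks_all_ones(FIVE_POINT))  # None
```

In the hypothetical 5-point stencil, (-1, 0) and (0, -1) have the same coordinate sum, so no bank count separates them within a single cycle under α = (1, 1). This is the kind of conflict that changing when each access is issued (morphing the access pattern) is meant to remove, while keeping the cheap all-one address arithmetic.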

Abstract: With run-time configurable hardware, the coarse-grained reconfigurable array (CGRA) is a promising platform that offers both programming flexibility and energy efficiency for data-intensive applications. To exploit the access parallelism of multi-bank memory, memory partitioning is commonly applied to CGRAs. However, existing memory partitioning work on CGRAs either achieves the optimal partitioning solution at the cost of expensive addressing overhead, or achieves area- and energy-efficient hardware at the cost of consuming more memory banks. To this end, we propose an efficient memory partitioning approach for loop pipelining on CGRAs via access pattern morphing. By co-optimizing memory partitioning and scheduling for multi-dimensional arrays, we form a partition-friendly access pattern in the data domain that can be partitioned with a minimized number of all-one partitioning hyperplanes, yielding both an optimized partition factor and reduced addressing overhead. To solve the partitioning problem, we first propose a backtracking-based scheduling algorithm that finds a partition-friendly pattern with a minimized initiation interval (II). Then, based on the partitioning result, we propose an energy- and area-efficient CGRA architecture that simplifies the address generators in the load-store units. Experimental results show that, compared with the state-of-the-art method, our approach achieves a 1.25× improvement in energy efficiency while keeping compilation time moderate.
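To illustrate the scheduling side of the co-optimization, the sketch below runs a simple backtracking search that assigns each memory access of one loop iteration to a slot (its issue cycle modulo the initiation interval II) so that accesses sharing a slot hit distinct banks under the all-one mapping, and increases II until such an assignment exists. It is a hypothetical toy, not the paper's backtracking-based scheduler: the names `assign_slots` and `min_ii_schedule` and the stencil pattern are illustrative, and it ignores data dependences, operator latencies, load-store unit counts, and PE placement, all of which a real CGRA scheduler must respect. In this reduced form the minimum II simply equals the largest number of accesses that share one bank.

```python
from typing import List, Optional, Sequence, Set, Tuple

Offset = Tuple[int, ...]


def bank_all_ones(index: Sequence[int], n_banks: int) -> int:
    # All-one hyperplane mapping (block size 1, assumed): bank = sum of coordinates mod N.
    return sum(index) % n_banks


def assign_slots(offsets: Sequence[Offset], n_banks: int, ii: int) -> Optional[List[int]]:
    # Backtracking: pick a slot (issue cycle mod II) for each access so that no
    # two accesses in the same slot target the same bank.
    used: List[Set[int]] = [set() for _ in range(ii)]
    slots: List[int] = [-1] * len(offsets)

    def place(k: int) -> bool:
        if k == len(offsets):
            return True
        bank = bank_all_ones(offsets[k], n_banks)
        for s in range(ii):
            if bank not in used[s]:
                used[s].add(bank)
                slots[k] = s
                if place(k + 1):
                    return True
                used[s].remove(bank)  # undo this choice and try the next slot
        return False

    return slots if place(0) else None


def min_ii_schedule(
    offsets: Sequence[Offset], n_banks: int, max_ii: int = 8
) -> Optional[Tuple[int, List[int]]]:
    # Increase II until a conflict-free slot assignment exists.
    for ii in range(1, max_ii + 1):
        slots = assign_slots(offsets, n_banks, ii)
        if slots is not None:
            return ii, slots
    return None


FIVE_POINT = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # hypothetical 5-point stencil

print(min_ii_schedule(FIVE_POINT, 3))  # (2, [0, 0, 0, 1, 1])
print(min_ii_schedule(FIVE_POINT, 5))  # (2, [0, 0, 0, 1, 1])
```

With N = 3 the five accesses split into two slots, giving II = 2. Raising N to 5 does not lower II, because (-1, 0) and (0, -1) always share a bank under α = (1, 1); in this toy, only spreading the accesses over more cycles resolves that conflict, which is why scheduling and partitioning are optimized together rather than separately.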