Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey

张军; 谢竟成; 沈凡凡; 谭海; 汪吕蒙; 何炎祥

doi:10.7544/issn1000-1239.2020.20200113


Abstract

As process technology advances and architectures mature, the parallel computing capability of general purpose graphics processing units (GPGPU) has improved substantially, and GPGPUs are increasingly used in high-performance, high-throughput general-purpose computing. By supporting thousands of concurrent threads, a GPGPU can hide the long latency of memory accesses and thereby achieve high parallel computing capability. However, when an application has irregular computation and memory access patterns, the efficiency of the memory subsystem suffers considerably. Contention for the on-chip cache is especially severe: the cache cannot supply the data that compute operations need in time, so the parallel computing capability of the GPGPU cannot be fully exploited. Alleviating on-chip cache contention and optimizing the performance of the cache subsystem is one of the main approaches to improving GPGPU performance, and it is an active research topic. Current research on optimizing the GPGPU cache subsystem focuses on five aspects: thread level parallelism (TLP) throttling, memory access reordering, data flux enhancement, last level cache (LLC) optimization, and new GPGPU cache architectures based on non-volatile memory (NVM). This paper analyzes and discusses the main optimization methods for the GPGPU cache subsystem from these five aspects, and closes by identifying problems that future work on GPGPU cache subsystem optimization needs to explore further, which is of significance to research on GPGPU cache subsystem performance optimization.
