ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (6): 1191-1207. doi: 10.7544/issn1000-1239.2020.20200113

Special topic: 2020 Special Topic on Frontier Technologies in Computer Architecture

• System Architecture •

通用图形处理器缓存子系统性能优化方法综述

张军1,2,谢竟成2,沈凡凡5,谭海3,汪吕蒙4,何炎祥4   

  1. 1(东华理工大学江西省放射性地学大数据技术工程实验室 南昌 330013);2(东华理工大学信息工程学院 南昌 330013);3(东华理工大学创新创业学院 南昌 330013);4(武汉大学计算机学院 武汉 430072);5(南京审计大学 南京 211815) (zhangjun_whu@whu.edu.cn)
  • 出版日期: 2020-06-01
  • 基金资助: 
    国家自然科学基金项目(61662002,61972293,61902189);江西省放射性地学大数据技术工程实验室项目(JELRGBDT201905);江苏省基础研究计划(自然科学基金)项目(BK20180821)

Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey

Zhang Jun (1,2), Xie Jingcheng (2), Shen Fanfan (5), Tan Hai (3), Wang Lümeng (4), He Yanxiang (4)

  1. Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology, Eastern China University of Technology, Nanchang 330013
  2. College of Information Engineering, Eastern China University of Technology, Nanchang 330013
  3. School of Innovation and Entrepreneurship, Eastern China University of Technology, Nanchang 330013
  4. School of Computer Science, Wuhan University, Wuhan 430072
  5. Nanjing Audit University, Nanjing 211815
  • Online: 2020-06-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61662002, 61972293, 61902189), the Project of Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (JELRGBDT201905), and the Natural Science Foundation of Jiangsu Province (BK20180821).

摘要: 随着工艺和制程技术的不断发展以及体系架构的日趋完善,通用图形处理器(general purpose graphics processing units, GPGPU)的并行计算能力得到了很大的提升,其在高性能、高吞吐量等通用计算应用场景的使用越来越广泛.GPGPU通过支持大量线程的并发执行,可以较好地隐藏长延时访存操作,从而获得高并行计算能力.然而,GPGPU在处理计算和访存不规则的应用时,其存储子系统的效率受到很大影响,尤其是片上缓存的争用情况尤为突出,难以及时提供计算操作所需的数据,使得GPGPU的高并行计算能力不能得到充分发挥.解决片上缓存的争用问题、优化缓存子系统的性能,是优化GPGPU性能的主要解决方案之一,也是目前研究GPGPU性能优化的主要热点之一.目前,针对GPGPU缓存子系统的性能优化研究主要集中在线程级并行度(thread level parallelism, TLP)调节、访存顺序调节、数据通量增强、最后一级缓存(last level cache, LLC)优化和基于非易失性存储(non-volatile memory, NVM)的GPGPU缓存新架构设计等5个方面.也从这5个方面重点分析讨论了目前主要的GPGPU缓存子系统性能优化方法,并在最后指出了未来GPGPU缓存子系统优化需要进一步探讨的问题,对GPGPU缓存子系统性能优化的研究有重要意义.

关键词: 通用图形处理器, 缓存子系统, 性能优化, 延迟隐藏, 缓存争用

Abstract: With advances in process technology and continued improvements in architecture, the parallel computing performance of general purpose graphics processing units (GPGPU) has increased greatly, and GPGPUs are now widely used in high-performance and high-throughput general-purpose computing. A GPGPU achieves high parallel computing performance because it hides long-latency memory accesses by supporting thousands of concurrent threads. For applications with irregular computation and memory access patterns, however, the efficiency of the memory subsystem degrades considerably. Contention for the on-chip cache becomes especially severe, the cache cannot supply data to compute operations in time, and the GPGPU cannot reach its peak parallel performance. Alleviating this contention and optimizing the cache subsystem is therefore one of the main approaches to improving GPGPU performance and an active research topic. Current work on optimizing the GPGPU cache subsystem focuses on five aspects: thread level parallelism (TLP) throttling, memory access reordering, data flux enhancement, last level cache (LLC) optimization, and new cache architectures based on non-volatile memory (NVM). This paper analyzes the main optimization methods from these five aspects and concludes by identifying open problems for future work, providing a reference for further research on the GPGPU cache subsystem.

Key words: general purpose graphics processing units (GPGPU), cache subsystem, performance optimization, latency hiding, cache contention
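
The link between cache contention and TLP throttling described in the abstract can be illustrated with a toy model. The sketch below is not taken from the surveyed work: the function `hit_rate`, its parameters, and the fully associative LRU cache are all illustrative assumptions. Each simulated warp sweeps a private working set, warps are interleaved round-robin, and the hit rate is measured for different numbers of active warps.

```python
from collections import OrderedDict


def hit_rate(active_warps, lines_per_warp=8, cache_lines=64, rounds=50):
    """Hit rate of a shared, fully associative LRU cache (toy model).

    Hypothetical setup: each warp repeatedly sweeps its own working set of
    `lines_per_warp` cache lines, and warps issue one access per turn in
    round-robin order. All parameter values are illustrative assumptions.
    """
    cache = OrderedDict()  # keys are (warp, offset); order tracks recency
    hits = 0
    accesses = 0
    for step in range(rounds * lines_per_warp):
        offset = step % lines_per_warp
        for warp in range(active_warps):
            line = (warp, offset)
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    for warps in (4, 8, 16):
        print(f"{warps:2d} active warps -> hit rate {hit_rate(warps):.2f}")
```

In this model the hit rate stays high while the combined working set of the active warps fits in the cache and collapses once it no longer fits, which is the thrashing behavior that TLP throttling tries to avoid by limiting the number of concurrently active warps. Real GPGPU caches are set associative and real access patterns are less regular, so the model only conveys the qualitative effect.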
