ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue (6): 1191-1207. doi: 10.7544/issn1000-1239.2020.20200113

Special Issue: Frontier Technologies in Computer Architecture (2020)


Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey

Zhang Jun1,2, Xie Jingcheng2, Shen Fanfan5, Tan Hai3, Wang Lümeng4, He Yanxiang4   

  1(Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology, Eastern China University of Technology, Nanchang 330013); 2(College of Information Engineering, Eastern China University of Technology, Nanchang 330013); 3(School of Innovation and Entrepreneurship, Eastern China University of Technology, Nanchang 330013); 4(School of Computer Science, Wuhan University, Wuhan 430072); 5(Nanjing Audit University, Nanjing 211815)
  • Online:2020-06-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61662002, 61972293, 61902189), the Project of Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (JELRGBDT201905), and the Natural Science Foundation of Jiangsu Province (BK20180821).

Abstract: With advances in process technology and architecture, the parallel computing performance of general purpose graphics processing units (GPGPUs) has improved substantially, and GPGPUs are now widely used in high-performance and high-throughput computing. A GPGPU achieves high parallel computing performance because it supports thousands of concurrent threads, which hides the long latency of memory accesses. However, irregular computation and memory access patterns in some applications degrade the performance of the memory subsystem. In particular, contention for the on-chip cache can become severe, which prevents the GPGPU from reaching its maximum performance. Alleviating this contention and optimizing the performance of the on-chip cache have therefore become one of the main approaches to optimizing GPGPUs. At present, studies on the performance optimization of the on-chip cache focus on five aspects: thread-level parallelism (TLP) throttling, memory access reordering, data flux enhancement, last-level cache (LLC) optimization, and new architecture design based on non-volatile memory (NVM). This paper surveys the research on on-chip cache performance optimization from these five aspects. Finally, some promising directions for future research on on-chip cache optimization are discussed. The contents of this paper are significant for research on the cache subsystem in GPGPUs.

Key words: general purpose graphics processing units (GPGPU), cache subsystem, performance optimization, latency hiding, cache contention
