• China Excellent Sci-Tech Journal
  • CCF-Recommended Class A Chinese Journal
  • High-Quality Sci-Tech Journal in Computing, Class T1

Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey

Zhang Jun, Xie Jingcheng, Shen Fanfan, Tan Hai, Wang Lümeng, He Yanxiang

Citation: Zhang Jun, Xie Jingcheng, Shen Fanfan, Tan Hai, Wang Lümeng, He Yanxiang. Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey[J]. Journal of Computer Research and Development, 2020, 57(6): 1191-1207. DOI: 10.7544/issn1000-1239.2020.20200113. CSTR: 32373.14.issn1000-1239.2020.20200113


  • Chinese Library Classification (CLC) number: TP303.1


Funds: This work was supported by the National Natural Science Foundation of China (61662002, 61972293, 61902189), the Project of Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (JELRGBDT201905), and the Natural Science Foundation of Jiangsu Province (BK20180821).
  • Abstract: With continuing advances in fabrication process technology and increasingly mature architectures, the parallel computing capability of general purpose graphics processing units (GPGPUs) has improved greatly, and GPGPUs are used more and more widely in general-purpose computing scenarios that demand high performance and high throughput. A GPGPU obtains its high parallel computing capability by executing a large number of threads concurrently, which lets it hide long-latency memory accesses well. However, when a GPGPU runs applications with irregular computation and memory access patterns, the efficiency of its memory subsystem suffers considerably. Contention for the on-chip cache is especially severe: the cache cannot supply the data that computations need in time, so the GPGPU's parallel computing capability cannot be fully exploited. Resolving on-chip cache contention and optimizing the performance of the cache subsystem is one of the main approaches to improving GPGPU performance, and it is currently one of the main research hotspots in GPGPU performance optimization. Current research on GPGPU cache subsystem optimization concentrates on five areas: thread level parallelism (TLP) throttling, memory access reordering, data flux enhancement, last level cache (LLC) optimization, and new GPGPU cache architectures based on non-volatile memory (NVM). This paper analyzes and discusses the main existing GPGPU cache subsystem optimization methods in each of these five areas, and concludes by identifying open problems in GPGPU cache subsystem optimization that require further study, providing a reference for research on GPGPU cache subsystem performance optimization.
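The tension behind TLP throttling can be illustrated with a toy analytic model: more concurrent warps hide more memory latency, but they also enlarge the combined working set that competes for the on-chip cache, which raises the miss rate and the pressure on memory bandwidth. The Python sketch below is a hypothetical illustration only. The functions `miss_rate`, `utilization`, and `best_warp_count`, the crude capacity-based miss model, the bandwidth limit, and all default parameter values are assumptions chosen for the example; they are not models or measurements from the survey.

```python
"""Toy model of the GPGPU latency-hiding vs. cache-contention trade-off.

All formulas and parameter values are illustrative assumptions.
"""


def miss_rate(n_warps: int, lines_per_warp: int, cache_lines: int) -> float:
    """Crude capacity model (assumption): if the warps' combined working set
    fits in the cache, every access hits; otherwise only a fraction
    cache_lines / working_set of the accesses hit."""
    working_set = n_warps * lines_per_warp
    return 1.0 - min(1.0, cache_lines / working_set)


def utilization(n_warps: int, *, compute_cycles: float = 2.0,
                hit_latency: float = 30.0, miss_latency: float = 400.0,
                lines_per_warp: int = 16, cache_lines: int = 256,
                misses_per_cycle: float = 0.05) -> float:
    """Fraction of cycles the core can issue work with n_warps active.

    Each warp alternates compute_cycles of computation with one memory access.
    Two limits apply: (1) latency hiding, where n warps overlap one another's
    stalls; (2) memory bandwidth, where at most misses_per_cycle misses can be
    served per cycle.
    """
    m = miss_rate(n_warps, lines_per_warp, cache_lines)
    avg_latency = (1.0 - m) * hit_latency + m * miss_latency
    latency_bound = n_warps * compute_cycles / (compute_cycles + avg_latency)
    # Each warp iteration (compute_cycles of work) causes m misses on average,
    # so iterations per cycle are capped at misses_per_cycle / m.
    if m == 0.0:
        bandwidth_bound = float("inf")
    else:
        bandwidth_bound = compute_cycles * misses_per_cycle / m
    return min(1.0, latency_bound, bandwidth_bound)


def best_warp_count(max_warps: int = 48, **params) -> int:
    """Static TLP throttling by exhaustive search: pick the active-warp count
    with the highest modeled utilization, preferring fewer warps on ties
    because they put less pressure on the cache."""
    return max(range(1, max_warps + 1),
               key=lambda n: (utilization(n, **params), -n))


if __name__ == "__main__":
    for n in (8, 16, 24, 48):
        print(n, round(utilization(n), 3))
    print("best warp count:", best_warp_count())
```

With these made-up defaults, modeled utilization grows with the warp count until the combined working set just fits in the cache (16 warps), then drops once misses start to saturate memory bandwidth, so the search settles on 16 active warps rather than the maximum of 48.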

Publication history
  • Published: 2020-05-31
