Workload Analysis for Typical GPU Programs Using CUPTI Interface

Zheng Zhen; Zhai Jidong; Li Yan; Chen Wenguang

doi:10.7544/issn1000-1239.2016.20148354

(Department of Computer Science and Technology, Tsinghua University, Beijing 100084) (z-zheng14@mails.tsinghua.edu.cn)

Funding: National Natural Science Foundation of China (61103021); National High Technology Research and Development Program of China (863 Program) (2012AA010901)

Chinese Library Classification number: TP338.4

Publication date: 2016-05-31

Abstract: GPU-accelerated high performance computers have become an important trend in high performance computing. However, developing efficient parallel programs on current GPU devices remains difficult because of their complex memory and thread hierarchies. To address this problem, we first summarize five categories of key metrics that reflect program performance, chosen according to the GPU hardware and software architecture. Second, we design and implement a performance analysis tool set based on the low-level CUPTI interface provided by NVIDIA; it collects the key metrics automatically without modifying source code and analyzes the performance behavior of GPU programs with very little impact on their execution. Third, using this tool set, we analyze 17 programs from Rodinia, a well-known GPU benchmark suite, and one real application. By examining the values of the key metrics, we locate the performance bottlenecks of each program and map them back to the source code. These results can guide the optimization of both CUDA programs and GPU architecture. The results show that, for these programs, most bottlenecks come from inefficient memory access, including poor global memory and shared memory access patterns, and from low concurrency. We summarize the common causes of typical performance bottlenecks and give some high-level suggestions for developing efficient GPU programs.

Keywords: graphics processing unit (GPU); workload analysis; Rodinia; performance counter; performance metric
