ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (1): 105-115. doi: 10.7544/issn1000-1239.2015.20131195

• System Architecture •

Multi-Index Self-Optimizing Energy Consumption Control Model for GPU Clusters

王海峰1,陈庆奎2   

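  1. 1(Information School, Linyi University, Linyi, Shandong 276002); 2(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093) (wanghaifeng@lyu.edu.cn)
  • Published: 2015-01-01
  • Supported by: 
    Funding: National Natural Science Foundation of China (60970012) | Natural Science Foundation of Shandong Province, Joint Special Project (ZR2013FL005) | Shandong Province Independent Innovation and Achievement Transformation Special Project (2014ZZCX02702)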

Multi-Indices Self-Approximate Optimal Power Consumption Control Model of GPU Clusters

Wang Haifeng1, Chen Qingkui2   

  1. 1(Information School, Linyi University, Linyi, Shandong 276002); 2(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093)
  • Online: 2015-01-01

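Abstract: In large-scale real-time stream data processing, graphics processing unit (GPU) clusters are an important class of parallel computing system, with demanding requirements on three indices: computing speed, energy consumption, and reliability. These indices constrain one another, however, and the optimal balance point must be found dynamically during real-time computation, which makes real-time optimization of multiple performance indices in GPU clusters a challenging problem. To take computing speed, energy consumption, and reliability into account together, the maximum entropy function method is used to convert the multiple indices into a single composite performance index, and a highly adaptive control model is then built on model predictive control theory. The model dynamically adjusts the power states of the cluster's nodes as the computing load changes, cutting redundant computing energy consumption while maintaining computing speed and reliability. In comparison experiments against a baseline control model that does not consider reliability, the proposed model shows better control stability and robustness and is well suited to energy-saving management of GPU clusters.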

Key words: energy consumption optimization, reliability, GPU clusters, model prediction, maximum entropy function
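The abstract does not state the formula behind the composite index. As a minimal sketch, assuming the common log-sum-exp form of the maximum entropy (aggregate) function, F_p(f) = (1/p) ln Σ exp(p·f_i), three indices could be folded into one value as shown below. The normalized cost values, the smoothing parameter `p`, and every name in the code are hypothetical and are not taken from the paper.

```python
import math


def max_entropy_aggregate(values, p=50.0):
    """Smooth upper approximation of max(values): (1/p) * ln(sum(exp(p * v)))."""
    m = max(values)
    # Shift by the maximum before exponentiating to avoid overflow for large p.
    return m + math.log(sum(math.exp(p * (v - m)) for v in values)) / p


# Hypothetical normalized costs in [0, 1], where lower is better (assumption).
indices = {"speed": 0.30, "power": 0.55, "reliability": 0.20}
composite = max_entropy_aggregate(list(indices.values()), p=50.0)
```

Under this assumed form the composite index stays differentiable, never falls below the largest of the individual costs, and exceeds it by at most ln(m)/p for m indices, so a larger `p` tracks the worst index more tightly.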

Abstract: GPU clusters have become important high-performance parallel computing systems for large-scale stream data processing. In practice, such computing demands high computing speed, low power consumption, and high reliability, so GPU clusters have three important and mutually constraining performance indices: computing speed, power consumption, and reliability. During real-time computing, the system must dynamically search for the optimal trade-off point among them, which makes multi-index optimization in GPU cluster power consumption control a challenging problem. To consider the three indices simultaneously, a comprehensive index that combines them is generated with the maximum entropy function. An adaptive control model is then built on model predictive control theory; it dynamically scales the power consumption state of the cluster nodes as the workload varies. This control model caps redundant energy consumption and holds the power consumption of the GPU cluster under a specified set point while guaranteeing computing speed and reliability. Compared with a control scheme that does not consider reliability, experimental results show that the proposed scheme has better control stability and robustness and is well suited to GPU cluster power management for real-time, large-scale stream data processing.

Key words: power consumption optimization, reliability, GPU clusters, model prediction, maximum entropy function
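The abstract names model predictive control but gives no plant model, cost function, or horizon. The following is a generic receding-horizon sketch under stated assumptions, not the authors' controller: cluster power is assumed to change linearly by `gain` watts per control move, the moves are a hypothetical discrete set (for example, switching one node's power state down or up), and the cost is squared deviation from the set point plus a small penalty on control effort.

```python
from itertools import product


def mpc_step(power, set_point, gain=10.0, horizon=3,
             moves=(-1, 0, 1), effort_weight=0.5):
    """Return the first move of the sequence with the lowest predicted cost."""
    best_cost, best_seq = float("inf"), None
    for seq in product(moves, repeat=horizon):
        predicted, cost = power, 0.0
        for du in seq:
            # Assumed linear model: each move shifts power by `gain` watts.
            predicted += gain * du
            cost += (predicted - set_point) ** 2 + effort_weight * du ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]


def simulate(power=140.0, set_point=100.0, gain=10.0, steps=8):
    """Apply only the first optimized move each step, then re-plan."""
    trace = [power]
    for _ in range(steps):
        power += gain * mpc_step(power, set_point, gain=gain)
        trace.append(power)
    return trace
```

Calling `simulate()` steps the hypothetical cluster from 140 W down to the 100 W set point and holds it there. Re-planning at every step is what lets a controller of this kind react when the workload, and hence the measured power, drifts away from the prediction.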

CLC Number: