ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (2): 377-390. doi: 10.7544/issn1000-1239.2015.20140126

Special topic: Big Data Management (2015)

Section: Software Technology

An Energy Efficient Algorithm for Big Data Processing in Heterogeneous Cluster

Ding Youwei, Qin Xiaolin, Liu Liang, Wang Taochun

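  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016) (dingyouwei@nuaa.edu.cn)
  • Publication date: 2015-02-01
  • Funding:
    National Natural Science Foundation of China (61373015, 61300052, 41301407, 61402014, 61402225); Specialized Research Fund for the Doctoral Program of Higher Education of China (20103218110017); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); Fundamental Research Funds for the Central Universities (NP2013307, NZ2013306)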

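Abstract: It has been reported that the cost of the energy a cluster consumes now exceeds the cost of purchasing its hardware, and big data processing requires large-scale clusters running for long periods. Energy-efficient big data processing is therefore an urgent problem for data owners and users, and a major challenge for energy use and the environment. Existing work usually either powers down some of the nodes to reduce energy consumption or designs new data storage strategies that make energy-efficient processing possible. Our analysis shows, however, that considerable energy is still wasted even when the minimum number of nodes is used, and that on an already-deployed cluster a new storage strategy requires large-scale data migration, which itself consumes extra energy. For I/O-intensive big data processing tasks on heterogeneous clusters, we propose MinBalance, a new energy-efficient algorithm that splits the problem into two steps: node selection and workload balancing. In the node selection step, four different greedy policies take the heterogeneity of the nodes fully into account and try to select the most suitable nodes to process the task. In the workload balancing step, the load on the selected nodes is balanced to reduce the energy each node wastes while waiting for the others. The method is general and is not affected by the data storage strategy. Experiments show that on large data sets MinBalance reduces energy consumption by more than 60% compared with the traditional approach of powering down part of the nodes.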
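The two steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not spell out the four greedy policies, so the selection rule here is a hypothetical one of the same kind, greedy weighted set cover that keeps adding the node covering the most still-uncovered data blocks per watt until every block has a replica on a selected node. The balancing step then assigns each block to the selected replica holder that would finish it earliest, handling blocks with the fewest replicas first. The `StorageNode` fields, the block layout, the uniform block size and all power and throughput figures are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNode:
    name: str
    blocks: frozenset    # ids of data blocks stored locally (assumed layout)
    throughput: float    # MB/s the node can read and process (assumed)
    active_power: float  # W drawn while processing (assumed)


def select_nodes(nodes, all_blocks):
    """Hypothetical greedy policy (weighted set cover): add the node covering the
    most still-uncovered blocks per watt until every block has a local replica."""
    uncovered = set(all_blocks)
    candidates = list(nodes)
    chosen = []
    while uncovered:
        best = max(candidates,
                   key=lambda n: len(n.blocks & uncovered) / n.active_power,
                   default=None)
        if best is None or not best.blocks & uncovered:
            raise ValueError(f"blocks without any replica: {sorted(uncovered)}")
        chosen.append(best)
        candidates.remove(best)
        uncovered -= best.blocks
    return chosen


def assign_blocks(chosen, all_blocks, block_mb):
    """Give each block to the selected replica holder that would finish it
    earliest, placing blocks with the fewest local replicas first."""
    finish = {n.name: 0.0 for n in chosen}
    plan = {n.name: [] for n in chosen}
    order = sorted(all_blocks,
                   key=lambda b: (sum(b in n.blocks for n in chosen), b))
    for b in order:
        holders = [n for n in chosen if b in n.blocks]  # non-empty after select_nodes
        target = min(holders, key=lambda n: finish[n.name] + block_mb / n.throughput)
        finish[target.name] += block_mb / target.throughput
        plan[target.name].append(b)
    return plan, finish


if __name__ == "__main__":
    nodes = [
        StorageNode("A", frozenset({0, 1, 2, 3}), 100.0, 200.0),
        StorageNode("B", frozenset({2, 3, 4, 5}), 50.0, 100.0),
        StorageNode("C", frozenset({0, 1, 4, 5}), 100.0, 250.0),
    ]
    chosen = select_nodes(nodes, range(6))
    print([n.name for n in chosen])                # ['B', 'A']
    print(assign_blocks(chosen, range(6), 100.0))  # ({'B': [4, 5], 'A': [0, 1, 2, 3]}, {'B': 4.0, 'A': 4.0})
```

With the example layout, B is chosen first (4 blocks per 100 W) and A is added to cover blocks 0 and 1, while C can stay powered down. Because single-replica blocks are placed first, the shared blocks 2 and 3 go to A and both nodes finish after 4 s; assigning blocks in plain id order would leave A idle from 3 s while B runs until 6 s, which is the kind of waiting the balancing step is meant to avoid.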

Keywords: big data, energy efficiency, heterogeneity, cloud computing, workload balancing
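Why balancing the selected nodes saves energy can be seen with a simple energy model. In this hedged Python sketch, which assumes a model not given in the abstract, each node draws a hypothetical `active_power` while processing and `idle_power` while waiting, every selected node stays powered on until the slowest one finishes, and data can be split freely among nodes in proportion to their I/O `throughput` (data locality is ignored). A proportional split makes all nodes finish together, so none of them pays idle power while waiting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    throughput: float    # MB/s the node can read and process (assumed)
    active_power: float  # W drawn while processing (assumed)
    idle_power: float    # W drawn while powered on but waiting (assumed)


def energy(nodes, shares_mb):
    """Joules used when node i processes shares_mb[i] MB and every node
    stays powered on until the slowest node has finished."""
    times = [mb / n.throughput for n, mb in zip(nodes, shares_mb)]
    makespan = max(times)
    return sum(n.active_power * t + n.idle_power * (makespan - t)
               for n, t in zip(nodes, times))


def even_split(nodes, total_mb):
    """Naive split: every node gets the same amount of data."""
    return [total_mb / len(nodes)] * len(nodes)


def balanced_split(nodes, total_mb):
    """Split in proportion to throughput so all nodes finish at the same time."""
    total_rate = sum(n.throughput for n in nodes)
    return [total_mb * n.throughput / total_rate for n in nodes]


if __name__ == "__main__":
    nodes = [Node("fast", 100.0, 200.0, 100.0), Node("slow", 50.0, 150.0, 80.0)]
    print(energy(nodes, even_split(nodes, 3000.0)))      # 9000.0
    print(energy(nodes, balanced_split(nodes, 3000.0)))  # 7000.0
```

With the two example nodes, an even split keeps the slow node busy for 30 s while the fast node idles for its last 15 s (9000 J in total); the proportional split finishes both in 20 s (7000 J). The proportional split minimizes finishing time for a fixed set of nodes and removes waiting, but it is not claimed to be the energy-optimal split under every power profile.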
