ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (11): 2416-2426. doi: 10.7544/issn1000-1239.2014.20130749

• Network Technology •


王强, 李雄飞, 王婧   

  1. (Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012)
  • Published: 2014-11-01

A Data Placement and Task Scheduling Algorithm in Cloud Computing

Wang Qiang, Li Xiongfei, Wang Jing   

  1. (Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012)
  • Online: 2014-11-01

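Abstract (translated from Chinese): Cloud computing over massive data typically suffers from long data transmission times. Most existing data placement and task scheduling algorithms keep a static number of replicas and use an imprecise transmission criterion. To address both shortcomings, this paper proposes a data placement and task scheduling algorithm that adjusts the number of replicas dynamically and uses time as the measure of data transmission. The algorithm adjusts the number of replicas according to data access frequency and storage size. This reduces the storage space wasted on rarely accessed replicas and also reduces the number of cross-node transfers needed for frequently accessed replicas. Because network bandwidth differs between nodes, the algorithm uses data transmission time as the transmission criterion, which makes the criterion more precise. Experimental results show that the algorithm effectively reduces data transmission time except when both the task set and the number of network nodes are small; when both are large, it reduces transmission time by nearly 50%.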

Keywords (translated from Chinese): cloud computing, data placement, task scheduling, data transmission, data replica

Abstract: Cloud computing is widely used to process massive data, but such tasks suffer from the high time cost of data transmission. Data placement and task scheduling algorithms place data and assign tasks to nodes for a specific purpose, such as decreasing data transmission time, balancing node load, or increasing the throughput of the cloud computing system. The shortcoming of most current algorithms is that the number of data replicas is fixed and the transmission criterion is not accurate enough. In this paper, we propose a new algorithm, called dynamic iterate for time (DIT), to decrease data transmission time. DIT dynamically changes the number of data replicas according to data access frequency and the remaining memory, which reduces the memory wasted on rarely accessed replicas as well as the number of transfers required for frequently accessed replicas. Moreover, because network bandwidths differ among nodes, DIT measures data transmission by its time cost, which makes the transmission criterion more accurate. The experimental results show that DIT significantly reduces data transmission time compared with data cluster (DC) and data dependence (DD), except when both the task set and the number of nodes are small. When the task set and the number of nodes are large, DIT reduces transmission time by nearly 50%.

Key words: cloud computing, data placement, task scheduling, data transmission, data replica
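
The abstract argues that transfer time is a more accurate criterion than data volume when bandwidths differ between nodes. The following minimal sketch illustrates only that point and is not the DIT algorithm itself: the node names, dataset sizes, link bandwidths, and helper functions are all hypothetical, and it assumes transfers happen one after another over symmetric links.

```python
# Illustrative sketch only: every name and number here is an assumption,
# not taken from the paper.

def link_bandwidth(bandwidth, src, dst):
    """Look up the (assumed symmetric) bandwidth in MB/s between two nodes."""
    return bandwidth[frozenset((src, dst))]


def cost_by_volume(node, needed, location, sizes):
    """Megabytes that must be moved to run the task on `node`."""
    return sum(sizes[d] for d in needed if location[d] != node)


def cost_by_time(node, needed, location, sizes, bandwidth):
    """Seconds to fetch all non-local data, assuming sequential transfers."""
    return sum(
        sizes[d] / link_bandwidth(bandwidth, location[d], node)
        for d in needed
        if location[d] != node
    )


nodes = ["A", "B", "C"]
sizes = {"d1": 100, "d2": 100, "d3": 150}        # MB
location = {"d1": "A", "d2": "B", "d3": "C"}     # node storing each dataset
bandwidth = {                                     # MB/s per link
    frozenset(("A", "B")): 100,
    frozenset(("A", "C")): 10,
    frozenset(("B", "C")): 5,
}
needed = ["d1", "d2", "d3"]                       # data required by one task

best_by_volume = min(
    nodes, key=lambda n: cost_by_volume(n, needed, location, sizes)
)
best_by_time = min(
    nodes, key=lambda n: cost_by_time(n, needed, location, sizes, bandwidth)
)

print("volume criterion picks", best_by_volume)
print("time criterion picks", best_by_time)
```

With these made-up numbers, node C needs the least data moved (200 MB), yet it sits behind the two slowest links, so the time criterion prefers node A instead. That disagreement is the kind of case where measuring by volume would misjudge the real transfer cost.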