Abstract:
It is well known that cloud computing can be used to process massive data; however, such tasks often suffer from the high time cost of data transmission. Data placement and task scheduling algorithms place data and schedule tasks onto nodes for a specific purpose, such as decreasing data transmission time, balancing node load, or increasing the throughput of the cloud computing system. At present, however, these algorithms have two shortcomings: the number of data replicas is fixed, and the transmission criterion is not sufficiently accurate. In this paper, we propose a new algorithm, called dynamic iterate for time (DIT), to decrease data transmission time. It dynamically changes the number of data replicas according to the data access frequency and the remaining memory, which reduces both the memory wasted on replicas of infrequently accessed data and the number of transmissions of frequently accessed data. Moreover, DIT evaluates data transmission by time cost, which increases the accuracy of the transmission criterion by accounting for differences among network bandwidths. The experimental results show that DIT significantly reduces data transmission time compared with data cluster (DC) and data dependence (DD), except in the special case where both the scale of the task set and the number of nodes are small. Notably, a 50% speedup can be achieved when the scale of the task set and the number of nodes are large.
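The two ideas summarized above, a time-based transmission criterion and an access-driven replica count, can be illustrated with a minimal sketch. This is not the DIT algorithm itself: the names `Dataset`, `transfer_time`, `cheapest_source`, and `adjust_replicas`, as well as the thresholds `hot_threshold` and `cold_threshold`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_mb: float
    access_count: int
    replicas: int = 1


def transfer_time(size_mb, bandwidth_mb_per_s):
    """Time cost in seconds of moving size_mb over a link of the given bandwidth."""
    return size_mb / bandwidth_mb_per_s


def cheapest_source(size_mb, bandwidths):
    """Return the node whose link gives the lowest transfer time.

    bandwidths maps a node name to its bandwidth (MB/s) toward the target node.
    """
    return min(bandwidths, key=lambda node: transfer_time(size_mb, bandwidths[node]))


def adjust_replicas(ds, free_memory_mb, hot_threshold=10, cold_threshold=2):
    """Assumed rule (illustrative only): add a replica for frequently accessed
    data when memory allows, drop one for rarely accessed data."""
    if ds.access_count >= hot_threshold and free_memory_mb >= ds.size_mb:
        return ds.replicas + 1
    if ds.access_count <= cold_threshold and ds.replicas > 1:
        return ds.replicas - 1
    return ds.replicas


if __name__ == "__main__":
    # A smaller file on a slow link can cost more time than a larger file on a fast link.
    print(transfer_time(100, 10))    # 100 MB over 10 MB/s
    print(transfer_time(400, 100))   # 400 MB over 100 MB/s
    print(cheapest_source(100, {"n1": 10, "n2": 50}))
    print(adjust_replicas(Dataset("hot", 100, access_count=20), free_memory_mb=500))
```

Under these assumptions, a criterion based only on data size would rank the 400 MB transfer as more expensive, whereas a time-based criterion ranks the 100 MB transfer over the slow link as the costlier one.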