ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (1): 71-92.doi: 10.7544/issn1000-1239.2018.20160812

Previous Articles     Next Articles

A Task Migration Strategy in Big Data Stream Computing with Storm

Lu Liang1, Yu Jiong1, Bian Chen1, Liu Yuechao1, Liao Bin2, Li Huijuan3   

  1. 1(School of Information Science and Engineering, Xinjiang University, Urumqi 830046);2(School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012);3(Wulumuqi Electric Power Supply Company, State Grid Corporation of China, Urumqi 830011)
  • Online:2018-01-01

Abstract: As one of the most representative platforms in stream computing, Apache Storm has become the first choice for the scenarios of real-time big data processing due to its advantages in open source, simplicity and excellent performance. A round-robin scheduling strategy is used as the Storm default scheduler, without considering the differences of performance and workload among distinct work nodes, and the different overhead of inter-node, inter-process and inter-executor communication under heterogeneous environment, which cannot fully exploit the high performance of Storm cluster in itself. In order to minimize the communication overhead on the premise of all kinds of resource constraints, a task migration strategy for heterogeneous Storm cluster (TMSH-Storm) is proposed on the basis of resource-constrained model, optimal communication overhead model and task migration model, which comprises two algorithms: source node selection algorithm and task migration algorithm. Source node selection algorithm adds work nodes which exceed the threshold to a set of source nodes according to the workload and priority of CPU, memory and network bandwidth in each work node; Task migration algorithm takes into account various factors such as the migration overhead, communication overhead, resource constraint as well as load of each node and each task, migrating the tasks that from source nodes to proper destination nodes successively and asynchronously. Experimental results show that the proposed strategy can reduce latency and overhead of inter-node communication, moreover, the implementation cost is lower compared with the existing research.

Key words: big data, stream computing, Storm, communication overhead, task migration

CLC Number: