ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2018, Vol. 55 ›› Issue (1): 71-92.doi: 10.7544/issn1000-1239.2018.20160812

• 软件技术 • 上一篇    下一篇

大数据流式计算框架Storm的任务迁移策略

鲁亮1,于炯1,卞琛1,刘月超1,廖彬2,李慧娟3   

  1. 1(新疆大学信息科学与工程学院 乌鲁木齐 830046);2(新疆财经大学统计与信息学院 乌鲁木齐 830012);3(国网乌鲁木齐供电公司 乌鲁木齐 830011) (luliang19891108@gmail.com)
  • 出版日期: 2018-01-01
  • 基金资助: 
    国家自然科学基金项目(61462079,61262088,61562086,61363083,61562078);新疆维吾尔自治区自然科学基金项目(2017D01A20);新疆维吾尔自治区高校科研计划基金项目(XJEDU2016S106)

A Task Migration Strategy in Big Data Stream Computing with Storm

Lu Liang1, Yu Jiong1, Bian Chen1, Liu Yuechao1, Liao Bin2, Li Huijuan3   

  1. 1(School of Information Science and Engineering, Xinjiang University, Urumqi 830046);2(School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012);3(Wulumuqi Electric Power Supply Company, State Grid Corporation of China, Urumqi 830011)
  • Online: 2018-01-01

摘要: Storm作为流式计算模式下最具代表性的平台之一,其默认轮询的调度机制未考虑到异构环境下不同工作节点的自身性能和负载差异,以及工作节点之间的网络传输开销和节点内部的进程与线程通信开销,无法充分发挥集群的性能.为了在各类资源约束的前提下最小化通信开销,在建立并论证Storm资源约束模型、最优通信开销模型和任务迁移模型的基础上,提出一种异构Storm环境下的任务迁移策略(task migration strategy for heterogeneous Storm cluster, TMSH-Storm),包括源节点选择算法和任务迁移算法.其中,源节点选择算法根据集群中各工作节点CPU、内存和网络带宽的负载情况以及各类资源的优先级顺序,将超出阈值的节点加入源节点集;任务迁移算法综合迁移开销、通信开销、节点资源约束以及节点和任务负载等因素,依次将源节点中的待迁移任务异步迁移至目的节点上.实验表明:相对于现有研究而言,TMSH-Storm能有效降低延迟和节点间通信开销,且执行开销较小.

关键词: 大数据, 流式计算, Storm, 通信开销, 任务迁移

Abstract: As one of the most representative platforms in stream computing, Apache Storm has become the first choice for the scenarios of real-time big data processing due to its advantages in open source, simplicity and excellent performance. A round-robin scheduling strategy is used as the Storm default scheduler, without considering the differences of performance and workload among distinct work nodes, and the different overhead of inter-node, inter-process and inter-executor communication under heterogeneous environment, which cannot fully exploit the high performance of Storm cluster in itself. In order to minimize the communication overhead on the premise of all kinds of resource constraints, a task migration strategy for heterogeneous Storm cluster (TMSH-Storm) is proposed on the basis of resource-constrained model, optimal communication overhead model and task migration model, which comprises two algorithms: source node selection algorithm and task migration algorithm. Source node selection algorithm adds work nodes which exceed the threshold to a set of source nodes according to the workload and priority of CPU, memory and network bandwidth in each work node; Task migration algorithm takes into account various factors such as the migration overhead, communication overhead, resource constraint as well as load of each node and each task, migrating the tasks that from source nodes to proper destination nodes successively and asynchronously. Experimental results show that the proposed strategy can reduce latency and overhead of inter-node communication, moreover, the implementation cost is lower compared with the existing research.

Key words: big data, stream computing, Storm, communication overhead, task migration

中图分类号: