Abstract:
As one of the most representative platforms in stream computing, Apache Storm has become the first choice for the scenarios of real-time big data processing due to its advantages in open source, simplicity and excellent performance. A round-robin scheduling strategy is used as the Storm default scheduler, without considering the differences of performance and workload among distinct work nodes, and the different overhead of inter-node, inter-process and inter-executor communication under heterogeneous environment, which cannot fully exploit the high performance of Storm cluster in itself. In order to minimize the communication overhead on the premise of all kinds of resource constraints, a task migration strategy for heterogeneous Storm cluster (TMSH-Storm) is proposed on the basis of resource-constrained model, optimal communication overhead model and task migration model, which comprises two algorithms: source node selection algorithm and task migration algorithm. Source node selection algorithm adds work nodes which exceed the threshold to a set of source nodes according to the workload and priority of CPU, memory and network bandwidth in each work node; Task migration algorithm takes into account various factors such as the migration overhead, communication overhead, resource constraint as well as load of each node and each task, migrating the tasks that from source nodes to proper destination nodes successively and asynchronously. Experimental results show that the proposed strategy can reduce latency and overhead of inter-node communication, moreover, the implementation cost is lower compared with the existing research.