Structures and State-of-the-Art Research of Cluster Scheduling in Big Data Background

Hao Chunliang; Shen Jie; Zhang Heng; Wu Yanjun; Wang Qing; Li Mingshu

doi:10.7544/issn1000-1239.2018.20170051

Abstract

Cluster scheduling has long been one of the most actively studied problems in cluster computing. Its central question is how data processing jobs can obtain the resources they need quickly and accurately, given a fixed amount of cluster resources, so that predefined execution goals are met. With the rapid development of big data computing over the past decade, cluster environments have changed continuously, and the scenarios and goals of cluster scheduling have grown considerably more complex. In particular, the performance bottlenecks of traditional centralized scheduling have been amplified in modern big data clusters, which has led researchers to propose many alternative scheduling structures in recent years. Because each structure embodies a distinct set of advantages and limitations, no one-size-fits-all solution has yet emerged that overcomes all scheduling challenges in big data environments simultaneously. Starting from the main research problems of cluster scheduling in the big data context, this paper surveys four scheduling structures: centralized, two-level, distributed, and hybrid. For each structure it analyzes the motivation, applicable scenarios, strengths and weaknesses, representative work, and research progress. Finally, it attempts to extrapolate current trends and to highlight the challenges that future work on each structure will need to address.
