ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2018, Vol. 55, Issue (1): 53-70. doi: 10.7544/issn1000-1239.2018.20170051

• Survey •

Structures and State-of-the-Art Research of Cluster Scheduling in the Big Data Context

Hao Chunliang1,2, Shen Jie3, Zhang Heng1,2, Wu Yanjun1, Wang Qing1, Li Mingshu1

  1(National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(Department of Computing, Imperial College, London SW7 2AZ) (chunliang@nfs.iscas.ac.cn)
  • Published: 2018-01-01
  • Online: 2018-01-01
  • Funding:
    Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010600)

Abstract: Cluster scheduling has long been one of the most actively studied problems in cluster computing. Its central question is how data processing jobs can obtain the resources they need quickly and accurately, given a fixed amount of cluster resources, so that predefined execution goals are met. With the rapid development of big data applications over the past decade, cluster environments have changed continuously, and the context and goals of cluster scheduling have grown considerably more complex. As big data workloads amplify the performance bottlenecks of traditional centralized scheduling, researchers have proposed many alternative scheduling structures in recent years, including two-level, distributed, and hybrid scheduling. Because each structure carries its own set of advantages and limitations, no single one-size-fits-all solution has yet emerged that overcomes all scheduling challenges in big data environments. Starting from the main research problems of cluster scheduling in the big data context, this survey covers the four scheduling structures (centralized, two-level, distributed, and hybrid) and analyzes the motivation, strengths and weaknesses, and suitable application scenarios of each. Seminal works for each structure are analyzed in depth to show the current state of research. Finally, we extrapolate current trends in cluster scheduling and highlight the challenges to be tackled in future work.

Keywords: cluster scheduling, resource abstraction, cluster computing, big data, data analytic job

CLC number: