ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (1): 53-70.doi: 10.7544/issn1000-1239.2018.20170051

Previous Articles     Next Articles

Structures and State-of-Art Research of Cluster Scheduling in Big Data Background

Hao Chunliang1,2, Shen Jie3, Zhang Heng1,2, Wu Yanjun1, Wang Qing1, Li Mingshu1   

  1. 1(National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049);3(Department of Computing, Imperial College, London SW72AZ)
  • Online:2018-01-01

Abstract: Cluster scheduling is one of the most investigated topics in big data environment. The main problem it aims to solve is to efficiently fulfill the requirements of data analytic workload using finite amount of cluster resources. Along with the rapid development in big data applications within the past decade, the context and goals of cluster scheduling also rose significantly in complexity. As the drawbacks of traditional centralized scheduling methods have becoming increasingly apparent in modern clusters, many alternative scheduling structures, including two-level scheduling, distributed scheduling, and hybrid scheduling, have been proposed in recent years. Unfortunately, as each of these methods embodies a distinct set of advantages and limitations, there is yet to appear a simple one-fits-all answer that can overcome all scheduling challenges simultaneously in big data environment. Therefore, this work aims at providing a comprehensive survey on various families of mainstream scheduling methods, focusing on their motivation, strengths and weaknesses, and suitability to different application scenarios. Seminal works of each scheduling structure are analyzed in-depth in this paper to bring insights on the current state of development. Last but not least, we try to extrapolate the current trend in cluster scheduling and highlight the challenges to be tackled in future works.

Key words: cluster scheduling, resource abstraction, cluster computing, big data, data analytic job

CLC Number: