ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (6): 1263-1270.doi: 10.7544/issn1000-1239.2016.20148351

Previous Articles     Next Articles

Accelerator Support in YARN Cluster

Li Qin, Zhu Yanchao, Liu Yi, Qian Depei   

  1. (School of Computer Science and Engineering, Beihang University, Beijing 100191) (Beijing Key Laboratory of Network Technology (Beihang University), Beijing 100191)
  • Online:2016-06-01

Abstract: Accelerators, such as GPU and Intel MIC, are widely used in scientific computing and image processing, and have strong potentials in big data processing and HPC based on cloud platform. YARN is a new generation of Hadoop distributed computing framework. Its adoption of computing resources is only limited to CPU, lacking of support for accelerators. This paper adds the support to nodes with accelerators to YARN to solve the problem. By analyzing the problem of supporting heterogeneous node, there are three identified difficulties which should be solved to introduce hybrid/heterogeneous to YARN. The first one is how to manage and schedule the added accelerator resources in the cluster; the second one is how to collect the status of accelerators to the master node for management; the third one is how to address the contention issue among multiple accelerator tasks concurrently running on the same node. In order to solve the above problems, the following design tasks have been carried out. Resource encapsulation which bundles neighbor nodes into one resource encapsulation is designed to solve the first problem. Management functions which collect the real-time accelerators status from working nodes are designed on the master node to solve the second problem. Accelerator task pipeline which splits accelerator tasks into three parts and executes them in parallel is designed on the nodes with accelerators to solve the third problem. Our scheme is verified with a cluster consisting of 4 nodes with GPU, and the workload testing the cluster includes LU, QR and Cholesky decomposition from the third party benchmark MAGMA, and the program performes feature extraction and clustering upon 50000 images. The results prove the effectiveness of the scheme presented.

Key words: distributed system, yet another resource negotiator (YARN), accelerator, heterogeneous nodes, graphics processing unit (GPU), node binding, task scheduling

CLC Number: