ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (4): 787-803. doi: 10.7544/issn1000-1239.2017.20160049

Partial Data Shuffled First Strategy for In-Memory Computing Framework

Bian Chen1, Yu Jiong1, Xiu Weirong1, Qian Yurong1, Ying Changtian1, Liao Bin2   

1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046
2. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012

Online: 2017-04-01

Abstract: In-memory computing frameworks have greatly improved the computing efficiency of clusters, but the poor performance of the Shuffle operation cannot be ignored. At a wide-dependency node, an in-memory computing framework imposes a compulsory synchronization: most executors must delay their computing tasks until the slowest worker delivers its results. This synchronization wastes computing resources, extends job completion time, and reduces execution efficiency, and the problem is even worse in heterogeneous clusters. In this paper, we establish a resource requirement model, a job execution efficiency model, and a task allocation and scheduling model, and we define allocation efficiency entropy (AEE) and worker contribution degree (WCD). On this basis, we state the optimization objective of the algorithm. To solve the optimization problem, we design a partial data shuffled first (PDSF) algorithm that combines several approaches, including priority scheduling of efficient executors, a strategy for minimizing executor wait time, and moderately inclined task allocation. PDSF breaks through the restriction of the parallel computing model, exploits the high performance of efficient executors to shorten the synchronization, and establishes an adaptive task scheduling scheme to improve job execution efficiency. We further analyze the relevant properties of the algorithm and prove that PDSF is Pareto optimal. Experimental results demonstrate that the algorithm improves the computational efficiency of the in-memory computing framework and the utilization of cluster resources.
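
The synchronization cost described in the abstract can be illustrated with a small sketch. It is not the PDSF algorithm, whose models and definitions are not given on this page; it only assumes that a stage completes when its slowest executor finishes, and compares an even task split with a split weighted by executor speed. The function names and the speed values are hypothetical.

```python
from typing import List


def stage_time(tasks: List[int], speeds: List[float]) -> float:
    """Time until the slowest executor finishes, i.e. when the barrier releases."""
    return max(t / s for t, s in zip(tasks, speeds))


def even_split(total: int, n: int) -> List[int]:
    """Split tasks as evenly as possible, ignoring executor speed."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def speed_weighted_split(total: int, speeds: List[float]) -> List[int]:
    """Give each executor a share of tasks proportional to its speed."""
    total_speed = sum(speeds)
    shares = [int(total * s / total_speed) for s in speeds]
    # Hand any tasks lost to rounding down to the fastest executors first.
    leftover = total - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical heterogeneous cluster: executor speeds in tasks per second.
speeds = [4.0, 2.0, 1.0, 1.0]
total = 80

even = even_split(total, len(speeds))
weighted = speed_weighted_split(total, speeds)

print(even, stage_time(even, speeds))          # [20, 20, 20, 20] 20.0
print(weighted, stage_time(weighted, speeds))  # [40, 20, 10, 10] 10.0
```

Under these assumed speeds, the even split leaves the fastest executor idle for 15 of the 20 seconds, while the weighted split lets every executor reach the barrier at the same time.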

Key words: in-memory computing, task allocation, job scheduling, allocation efficiency entropy (AEE), worker contribution degree (WCD), heterogeneous environment
