ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (6): 1271-1280. doi: 10.7544/issn1000-1239.2016.20148445


Dynamic Task Scheduling Model and Fault-Tolerant via Queuing Theory

He Wangquan1, Wei Di1, Quan Jianxiao1, Wu Wei1, Qi Fengbin2   

  1. Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083; 2. National Research Center of Parallel Computer Engineering & Technology, Beijing 100080
  • Online:2016-06-01

Abstract: Designing efficient dynamic task scheduling and fault-tolerance mechanisms is a crucial problem in high-performance computing. Most existing methods, however, scale poorly on large systems. In this paper, we propose a scalable dynamic task scheduling model based on N-level queuing theory, which substantially reduces the programming burden by providing programmers with a concise parallel programming framework. On one hand, we use Poisson process theory to analyze the average wait time of tasks and then choose the number of task layers according to a threshold. On the other hand, we reduce fault-tolerance overhead with a region-aware lightweight degradation model. Experimental results with Micro Benchmark on the Bluelight system with 32768 cores show that our method achieves good scalability when tasks take 3.4 s on average, and that its overhead is just 7.2% of that of the traditional model. Running on 16384 cores, the pharmacological application DOCK achieves a 34.3% performance improvement with our scheduling. Moreover, the DOCK results show that our fault-tolerance model achieves performance improvements of 3.75% to 5.13% over the traditional mechanism.
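
The abstract does not give the queuing formulas, so the following is only an illustrative sketch of how a wait-time threshold could drive the choice of scheduling levels. It assumes, purely for illustration, that each bottom-level scheduler behaves as an M/M/1 queue (Poisson arrivals, exponential service), that each worker requests one new task per mean task duration, and that workers are spread evenly across schedulers. The function names, the service rate, and the threshold value are hypothetical and are not taken from the paper; only the 32768 cores and the 3.4 s mean task time come from the abstract.

```python
import math


def mm1_mean_wait(arrival_rate, service_rate):
    """Mean time a request waits in queue (excluding service) for an M/M/1 queue.

    Standard result: Wq = lambda / (mu * (mu - lambda)) for lambda < mu.
    """
    if arrival_rate >= service_rate:
        return math.inf  # unstable queue: requests pile up without bound
    return arrival_rate / (service_rate * (service_rate - arrival_rate))


def min_fan_out(workers, levels):
    """Smallest integer fan-out f such that f**levels >= workers."""
    fan_out = 1
    while fan_out ** levels < workers:
        fan_out += 1
    return fan_out


def choose_levels(workers, mean_task_time, service_rate, wait_threshold, max_levels=8):
    """Return (levels, fan_out, wait) for the fewest levels meeting the threshold.

    Assumption (not from the paper): only the bottom-level scheduler is modeled,
    and it receives fan_out / mean_task_time task requests per second.
    """
    for levels in range(1, max_levels + 1):
        fan_out = min_fan_out(workers, levels)
        arrival_rate = fan_out / mean_task_time
        wait = mm1_mean_wait(arrival_rate, service_rate)
        if wait <= wait_threshold:
            return levels, fan_out, wait
    raise ValueError("no level count up to max_levels meets the wait threshold")


if __name__ == "__main__":
    # 32768 workers and 3.4 s tasks are from the abstract; the scheduler
    # service rate (1000 requests/s) and the 1 ms threshold are assumed values.
    levels, fan_out, wait = choose_levels(32768, 3.4, 1000.0, 0.001)
    print(f"levels={levels}, fan_out={fan_out}, mean wait={wait:.2e} s")
```

Under these assumptions, a single scheduler serving all workers is overloaded, and adding a level shrinks the per-scheduler arrival rate until the mean wait drops below the threshold. The paper's actual N-level model may differ in both the queue discipline and how upper levels are treated.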

Key words: queuing theory, dynamic task scheduling, programming framework, fault-tolerant, light-weight degradation
