ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2021, Vol. 58 ›› Issue (12): 2798-2810.doi: 10.7544/issn1000-1239.2021.20200568

• 网络技术 • 上一篇    下一篇



  1. (国防科技大学计算机学院 长沙 410073) (国防科技大学并行与分布式处理国防科技重点实验室 长沙 410073) (
  • 出版日期: 2021-12-01
  • 基金资助: 

A DAG-Based Network Traffic Scheduler

Shi Yang, Wen Mei, Fei Jiawei, Zhang Chunyuan   

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073) (National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073)
  • Online: 2021-12-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB1000400) and the National Natural Science Foundation of China (61502509, 61402504).

摘要: 在如今的数据中心中,各种分布式任务往往会对各种不同的资源进行竞争,特别是网络资源.如果没有有效的网络调度,那么这种竞争就会降低整个数据中心的运行效率.以往的网络资源调度研究由于忽视了任务里计算与网络需求之间的具体关系,对于任务性能的提升十分有限.因此,旨在探索如何通过网络调度来缩短数据中任务的完成时间(job completion time, JCT),从而提升数据中心的整体效率.通过对基于有向无环图(directed acyclic graph, DAG)的分布式任务的深度分析,发现可以在降低它们的网络占用的同时,却不影响它们的JCT.根据这个发现,提出了一个利用计算图来加速任务执行的网络调度器JIT.为了实现JIT,首先将调度问题建模成为一个整数线性规划问题(integer linear programming, ILP),然后证明了这个ILP可以通过一个等价的线性规划模型(linear programming, LP)来快速求解.此外,通过一些合理的简化,将求解时间降低到了1s.与其他调度器的比较实验结果说明了JIT可以取得1.55倍的整体加速效果,从而有效提升数据中心的工作效率.

关键词: 数据中心网络, 分布式任务, 网络调度, 并行计算, 任务完成时间, 有向无环图

Abstract: Nowadays, it is common that distributed jobs within a datacenter compete for different resources, especially the network. Due to this competition, these jobs’ performance is decreased and datacenters run at low efficiency. Most previous work on network scheduling lacks the knowledge of detailed requirements of jobs, hence the scheduling benefit is limited. In this paper, we try to develop a new scheduling algorithm which aims at reducing the job completion time (JCT). To achieve this goal, we take advantage of the directed acyclic graph (DAG) to build a novel network scheduler. The proposed scheduler formulates the problem as an integer linear programming (ILP) model, and proves it can be solved through an equivalent linear programming (LP) model quickly. Finally, experimental results demonstrate that our scheduler can return the solution in a few seconds and accelerate jobs significantly.

Key words: datacenter networking, distributed jobs, network scheduling, parallel computing, job completion time (JCT), directed acyclic graph (DAG)