
    Performance-Awareness Based Dynamic Batch Size SGD for Distributed Deep Learning Framework

    • Abstract: By increasing model depth and the number of training samples, deep neural networks can achieve better performance on many machine learning tasks, but these necessary measures also increase the cost of training. To cope with this cost, accelerating the training of deep neural networks in a distributed computing environment has become the most common approach. Stochastic gradient descent (SGD) is one of the most widely used training algorithms for deep neural networks, but parallelized SGD is prone to the stale-gradient problem, which degrades the overall convergence of the algorithm. Most existing solutions target high performance computing (HPC) environments in which node performance is similar; few studies have considered cluster environments in which node performance differs substantially. To address this problem, this paper proposes a performance-aware dynamic batch size SGD algorithm (DBS-SGD). By analyzing the computing capability of each node, the algorithm dynamically allocates each node's minibatch so that the per-iteration update time is roughly the same across nodes, which in turn lowers the average gradient staleness. The proposed algorithm effectively mitigates the stale-gradient problem of the asynchronous update strategy. Using the common image classification benchmarks Mnist and cifar10 as training datasets, the algorithm is compared with asynchronous SGD (ASGD) and the n-soft algorithm. Experimental results show that, without losing speed-up, the loss on the Mnist dataset is reduced by 60%, and on the cifar dataset the accuracy is improved by about 10% and the loss is reduced by 10%; the algorithm outperforms ASGD and n-soft, and its convergence curve approaches that of the synchronous strategy.
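    The abstract does not give the exact allocation rule; the following minimal Python sketch only illustrates the general idea under one assumption, namely that each worker's share of the global minibatch is made proportional to its measured throughput so that per-iteration times roughly equalize. The function and variable names are illustrative, not taken from the paper.

        # Minimal sketch of performance-aware minibatch allocation (illustrative, not the paper's exact rule).
        # Assumption: each worker's share of the global minibatch is proportional to its measured
        # throughput (samples/second), so that all workers finish an iteration in roughly the same time.
        def allocate_batch_sizes(throughputs, global_batch):
            """throughputs: per-worker samples/second; global_batch: total minibatch size to split."""
            total = sum(throughputs)
            sizes = [int(global_batch * t / total) for t in throughputs]
            # Give any rounding remainder to the fastest worker.
            fastest = max(range(len(throughputs)), key=lambda i: throughputs[i])
            sizes[fastest] += global_batch - sum(sizes)
            return sizes

        # Example: three heterogeneous workers sharing a global minibatch of 128.
        print(allocate_batch_sizes([1200.0, 800.0, 400.0], 128))  # -> [65, 42, 21]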

       

      Abstract: With increasing network depth and dataset size, deep neural networks are now widely used for many artificial intelligence applications, including computer vision, speech recognition, and natural language processing, where they deliver state-of-the-art accuracy. However, greater depth and larger datasets also increase the cost of training. Stochastic gradient descent (SGD) is therefore commonly accelerated by training deep neural networks in a distributed computing environment. Nevertheless, parallel SGD easily suffers from the stale-gradient problem, which affects overall convergence. Most existing solutions are suited to high performance computing (HPC) environments where the performance of each node is similar; few studies have considered cluster environments where node performance differs substantially. This paper proposes a variant of asynchronous SGD (ASGD) in which the batch size is modulated according to the runtime performance of each node. Experimental verification on the commonly used image classification benchmarks Mnist and cifar10 demonstrates the effectiveness of the approach. Compared with ASGD and n-soft, the loss on Mnist is reduced by 60% and the accuracy on cifar10 is increased by about 10%, without reducing the speed-up.
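      To make the stale-gradient problem mentioned above concrete, a standard formulation of the asynchronous update (a textbook form, not taken from this paper) is:

          w_{t+1} = w_t - \eta \, \nabla f_{B_i}\!\left(w_{t-\tau_i}\right),

      where $\eta$ is the learning rate, $B_i$ is the minibatch processed by worker $i$, and the staleness $\tau_i$ counts how many global updates were applied while worker $i$ was computing its gradient. The slower a worker is relative to its minibatch size, the larger its $\tau_i$; equalizing per-iteration times across workers therefore keeps the average staleness low, which is the effect DBS-SGD targets.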

       
