
Journal of Computer Research and Development, 2019, Vol. 56, Issue (11): 2396-2409. doi: 10.7544/issn1000-1239.2019.20180880


Performance-Awareness Based Dynamic Batch Size SGD for Distributed Deep Learning Framework

Ji Zeyu, Zhang Xingjun, Fu Zhe, Gao Bosong, Li Jingbo   

  1. School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049
  • Online: 2019-11-12

Abstract: Deep neural networks, whose accuracy improves as network depth and dataset size grow, are now widely used in artificial intelligence applications such as computer vision, speech recognition, and natural language processing, where they deliver state-of-the-art accuracy. However, deeper networks and larger datasets also increase the cost of training. Stochastic gradient descent (SGD) has been used to accelerate the training of deep neural networks in distributed computing environments. Nevertheless, parallel SGD easily suffers from stale gradients, which degrade overall convergence. Most existing solutions are suited to high performance computing (HPC) environments in which all nodes have similar performance; few studies address clusters whose nodes differ widely in performance. This paper proposes a variant of the asynchronous SGD (ASGD) algorithm in which each node's batch size is modulated according to its runtime performance. Experimental verification on the commonly used image classification benchmarks MNIST and CIFAR-10 demonstrates the effectiveness of the approach: compared with ASGD and n-soft, the loss on MNIST is reduced by 60% and the accuracy on CIFAR-10 is increased by about 10%, without reducing the speed-up.
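The mechanism described in the abstract, an asynchronous parameter-server SGD in which each worker's batch size is adjusted from its measured runtime performance, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (ParameterServer, PerformanceAwareWorker), the least-squares placeholder objective, and the proportional adjustment rule toward a target step time are assumptions made only for illustration.

# Minimal illustrative sketch (assumptions, not the paper's algorithm) of
# performance-aware dynamic batch sizing for asynchronous parameter-server SGD.
import time
import numpy as np

class ParameterServer:
    """Holds the global model and applies pushed gradients asynchronously."""
    def __init__(self, dim, lr=0.01):
        self.weights = np.zeros(dim)
        self.lr = lr

    def push(self, grad):
        # Apply the gradient immediately; no barrier across workers (ASGD).
        self.weights -= self.lr * grad

    def pull(self):
        return self.weights.copy()

class PerformanceAwareWorker:
    """Worker that rescales its local batch size from its measured step time."""
    def __init__(self, server, base_batch=64, target_step_time=0.5):
        self.server = server
        self.batch_size = base_batch
        self.target_step_time = target_step_time  # desired wall time per step (tunable)

    def compute_gradient(self, weights, batch):
        # Placeholder gradient of a least-squares objective on this batch.
        x, y = batch
        return x.T @ (x @ weights - y) / len(y)

    def adjust_batch_size(self, step_time):
        # Fast nodes get larger batches and slow nodes smaller ones, so step
        # times across heterogeneous nodes stay comparable and staleness shrinks.
        ratio = self.target_step_time / max(step_time, 1e-6)
        self.batch_size = max(8, min(1024, int(self.batch_size * ratio)))

    def step(self, sample_batch):
        start = time.time()
        weights = self.server.pull()
        grad = self.compute_gradient(weights, sample_batch(self.batch_size))
        self.server.push(grad)
        self.adjust_batch_size(time.time() - start)

if __name__ == "__main__":
    # Toy single-worker run; a real ASGD deployment would run many workers
    # in parallel against the same parameter server.
    rng = np.random.default_rng(0)
    dim = 10
    true_w = rng.normal(size=dim)

    def sample_batch(n):
        x = rng.normal(size=(n, dim))
        return x, x @ true_w

    server = ParameterServer(dim)
    worker = PerformanceAwareWorker(server)
    for _ in range(200):
        worker.step(sample_batch)

In this sketch the adjustment is purely proportional to a fixed target step time; in a heterogeneous cluster the target would more plausibly be derived from a reference or fastest node so that all workers deliver gradients at a similar rate.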

Key words: parameter server, synchronous stochastic gradient descent (SSGD), stale gradient, performance awareness, data parallelism
