Coding-Based Performance Improvement of Distributed Machine Learning in Large-Scale Clusters

Wang Yan; Li Nianshuang; Wang Xiling; Zhong Fengyan

doi:10.7544/issn1000-1239.2020.20190286

CSTR: 32373.14.issn1000-1239.2020.20190286

Citation: Wang Yan, Li Nianshuang, Wang Xiling, Zhong Fengyan. Coding-Based Performance Improvement of Distributed Machine Learning in Large-Scale Clusters[J]. Journal of Computer Research and Development, 2020, 57(3): 542-561. DOI: 10.7544/issn1000-1239.2020.20190286



(School of Software, East China Jiaotong University, Nanchang 330013)

Funds: This work was supported by the National Natural Science Foundation of China (61402172) and the Natural Science Foundation of Jiangxi Province of China (20192BAB217006).


Published Date: February 29, 2020


Abstract


As models and data sets grow, running large-scale machine learning algorithms on distributed clusters has become common practice. This approach divides the machine learning algorithm and its training data into several tasks, each of which runs on a different worker node; the master node then combines the results of all tasks to obtain the result of the whole algorithm. When a distributed cluster contains a large number of nodes, some worker nodes, called stragglers, inevitably run slower than the others because of resource contention and other factors, so the tasks running on them take significantly longer than those on other nodes. Compared with running replica tasks on multiple nodes, coded computing uses computation and storage redundancy more efficiently to alleviate the effect of stragglers and communication bottlenecks in large-scale machine learning clusters. This paper reviews research progress on using coding technology to solve the straggler problem and improve the performance of large-scale machine learning clusters. First, we introduce the background of coding technology and large-scale machine learning clusters. Second, we divide the related research into categories according to application scenario: matrix multiplication, gradient computing, data shuffling, and other applications. Finally, we summarize the difficulties of applying coding technology in large-scale machine learning clusters and discuss future research trends.
- coding technology
- machine learning
- distributed computing
- straggler tolerance
- performance improvement
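
The contrast between replication and coded computing drawn in the abstract can be illustrated with a small example. The following is a minimal sketch, not a method taken from this paper: it assumes a matrix-vector product A·x, three workers, and a simple parity code in which the third worker holds A1 + A2. The function names (`matvec`, `encode`, `decode`) and the worker numbering are hypothetical and chosen only for illustration. The master can recover A·x from any two of the three workers, so one straggler can be ignored.

```python
from itertools import combinations

def matvec(block, x):
    """Multiply a row block (list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in block]

def encode(A):
    """Split A into two row blocks A1, A2 and add a parity block A1 + A2.

    Assumes A has an even number of rows. Worker 0 gets A1, worker 1 gets A2,
    worker 2 gets the parity block.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {0: A1, 1: A2, 2: parity}

def decode(results):
    """Recover A @ x from the results of any two of the three workers."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: A2 x = (A1 + A2) x - A1 x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 straggled: A1 x = (A1 + A2) x - A2 x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
tasks = encode(A)
expected = matvec(A, x)

# Whichever single worker is slow, the other two results are enough.
for finished in combinations(tasks, 2):
    results = {w: matvec(tasks[w], x) for w in finished}
    assert decode(results) == expected
```

In this sketch each of the three workers processes half of A. Tolerating one straggler by replication with the same task size would require running each half on two nodes, that is, four workers in total.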


