Citation: Liu Chunhong, Li Weili, Jiao Jie, Wang Jingxiong, Zhang Junna. An Interpretable Cloud Platform Task Termination State Prediction Method[J]. Journal of Computer Research and Development, 2024, 61(3): 716-727. DOI: 10.7544/issn1000-1239.202220796
A highly interpretable model for predicting the termination state of cloud platform tasks is built by combining feature selection with a model interpretation method. The model visualizes the mapping between the static and dynamic attributes of tasks and jobs and their termination states, and from this identifies the mechanism linking load characteristics to task termination states. The study uses the workload monitoring log published by Google, augmented with dynamic task information from the cloud platform. SHapley Additive exPlanations (SHAP) is used to quantify how strongly each static and dynamic attribute influences the termination state, and the predictions of the termination state model are explained by combining SHAP-based variable importance with an XGBoost model. Visualization shows how load characteristics affect the model's prediction of each termination state. Feature importance is measured by the mean absolute SHAP value, which gives a global view of feature importance for each termination state. Based on these results, the 20 variables with the greatest influence on the prediction model are selected as the feature set. The effect of changes in feature values on the different termination states is also visualized. The visualizations show that, while a task is running, the value of each feature influences the task's termination state, and that different values of the same feature influence it differently. Applying feature selection together with a model interpretation method to the construction of the prediction model helps produce a model that has high classification performance and is easy to understand. Exploring the mapping between load characteristics and task termination states can in turn help optimize the scheduling mechanism of the cloud platform.
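The ranking step described in the abstract (global importance as the mean absolute SHAP value, followed by selection of the top-ranked variables) can be illustrated with a minimal sketch. It assumes the SHAP values for one termination state have already been computed by some explainer and are available as a samples-by-features table. The feature names, the toy numbers, and the helper functions `mean_abs_shap` and `top_k_features` are hypothetical and are not taken from the paper, which selects 20 variables rather than the 2 used here.

```python
from typing import Dict, List, Sequence


def mean_abs_shap(
    shap_rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> Dict[str, float]:
    """Global importance per feature: mean of |SHAP value| over all samples."""
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length must match the number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n_samples = len(shap_rows)
    return {name: total / n_samples for name, total in zip(feature_names, totals)}


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the largest mean |SHAP|, most important first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical feature names and toy SHAP values (one row per task sample).
FEATURES = ["priority", "cpu_request", "mean_cpu_usage", "scheduling_class"]
SHAP_ROWS = [
    [0.5, -0.125, 0.25, 0.0],
    [-0.5, 0.125, -0.75, 0.0],
    [0.5, -0.125, 0.5, 0.5],
    [-0.5, 0.125, -1.0, -0.5],
]

importance = mean_abs_shap(SHAP_ROWS, FEATURES)
selected = top_k_features(importance, k=2)  # the paper keeps 20; 2 suits the toy data
print(importance)
print(selected)
```

Because the absolute value is taken before averaging, a feature that pushes predictions strongly in both directions still ranks as important, whereas a plain mean would let the positive and negative contributions cancel.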