    An Interpretable Cloud Platform Task Termination State Prediction Method

    Abstract: This work combines feature selection with model interpretability methods to build a highly interpretable model for predicting the termination state of tasks on a cloud platform. The model visualizes the mapping between the static and dynamic attributes of tasks/jobs and their termination states, and thereby reveals the mechanism that links workload characteristics to task termination states. The workload monitoring logs published by Google are used, supplemented with dynamic information about tasks on the cloud platform. SHapley Additive exPlanations (SHAP) is used to quantify how strongly static and dynamic attributes influence the termination state. Variable importance, SHAP values, and an XGBoost model are combined to explain the output of the prediction model, and visualization shows how workload characteristics affect the model's prediction of each termination state. Feature importance is measured as the mean absolute SHAP value, which yields a global visualization of feature importance for each termination state. Based on these results, the 20 variables with the greatest influence on the prediction model are selected, and this ranking serves as the basis for feature selection. The effect of changes in feature values on the different termination states is also visualized. The visualizations show that, while a task is running, the value of each feature influences the task's termination state, and different values of a feature influence it in different ways. Applying feature selection together with model interpretability methods in the construction of the prediction model helps produce a model that has high classification performance and is easy to understand. Exploring the mapping mechanism between workload characteristics and task termination states can in turn help optimize the scheduling mechanism of the cloud platform.
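The ranking step described in the abstract (score each feature by its mean absolute SHAP value, then keep the top-ranked features) can be illustrated with a small sketch. This is not the authors' pipeline: the abstract indicates SHAP values computed for an XGBoost model trained on Google's workload logs, whereas the sketch below assumes a made-up linear scoring function, three hypothetical feature names, an all-zero baseline, and exact Shapley values obtained by enumerating feature coalitions in plain Python. It keeps the top 2 features rather than the 20 reported in the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_importance(shap_rows):
    """Global importance of each feature: mean of |SHAP value| over all samples."""
    n_samples = len(shap_rows)
    n_features = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n_samples for j in range(n_features)]


def top_k_features(names, importance, k):
    """Names of the k most important features, most important first."""
    ranked = sorted(zip(names, importance), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy setup: feature names, model, samples and baseline are all made up.
FEATURES = ["cpu_request", "memory_request", "priority"]


def toy_model(x):
    # Stand-in for a trained classifier's score; "priority" is deliberately ignored.
    return 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]


SAMPLES = [[1.0, 2.0, 3.0], [3.0, 0.0, 1.0], [0.0, 4.0, 5.0]]
BASELINE = [0.0, 0.0, 0.0]

shap_rows = [shapley_values(toy_model, x, BASELINE) for x in SAMPLES]
importance = mean_abs_importance(shap_rows)
selected = top_k_features(FEATURES, importance, k=2)
print(selected)
```

With a real tree model, exact enumeration would be far too slow; a dedicated SHAP implementation would normally supply the per-sample values, and only the averaging and ranking functions above would carry over unchanged.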

       
