ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (11): 2456-2474. doi: 10.7544/issn1000-1239.2021.20210560

Special topic: 2021 Special Topic on Cryptography and Cyberspace Security Governance

• Information Security •

InterDroid: An Interpretable Android Malware Detection Method for Conceptual Drift

Zhang Bing1,2, Wen Zheng1,2, Wei Xiaoyu3, Ren Jiadong1,2

  1. 1(School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004); 2(Key Laboratory of Software Engineering of Hebei Province (Yanshan University), Qinhuangdao, Hebei 066004); 3(China Wuzhou Engineering Group, Beijing 100053) (jdren@ysu.edu.cn)
  • Published online: 2021-11-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61802332, 61807028, 61772449) and the Doctoral Foundation Program of Yanshan University (BL18012).

Abstract: Android malware detection suffers from three problems: feature definition is highly subjective, the feature selection process is poorly interpretable, and the detection accuracy of trained models is not stable over time. To address them, an interpretable Android malware detection method for concept drift, called InterDroid, is proposed. First, four types of features (permissions, API package names, intents, and Dalvik bytecode) are introduced on the basis of high-quality, manually written Android malware analysis reports. The training algorithm for InterDroid and the algorithms it is compared against are then obtained with the automated machine learning tool TPOT (tree-based pipeline optimization tool), which avoids the tedious model selection and parameter tuning of traditional methods. Next, the model interpretation algorithm SHAP (SHapley Additive exPlanations) is integrated into the traditional feature wrapper method to obtain a feature combination that contributes strongly to the classification results, and this combination is used to train the detection model. Finally, the existence of concept drift in Android malware detection is demonstrated by a double test that combines the Mann-Whitney U (MWU) test with machine learning models, and the joint distribution adaptation (JDA) algorithm is used to improve the detection accuracy of the model on Android malware from later periods. Experiments show that the feature combination selected by InterDroid is stable and interpretable. In addition, the feature-representation transfer module in InterDroid improves its detection accuracy on newly emerging Android malware from 2019 and 2020 by 46% and 44%, respectively.
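The idea of ranking features by their contribution to a classification result, which the abstract attributes to SHAP, can be illustrated with a minimal sketch. It computes exact Shapley values by enumerating feature orderings; it does not use the SHAP library and is not the authors' implementation. The scoring function `toy_score`, its weights, and the three permission names are hypothetical and chosen only for illustration. The wrapper search that would consume this ranking is not shown, since the abstract does not specify it.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is absent from a coalition takes its baseline value.
    All orderings are enumerated, so this is only practical for a
    handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # feature i joins the coalition
            value = model(current)
            phi[i] += value - prev     # marginal contribution of feature i
            prev = value
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def rank_features(model, samples, baseline, names):
    """Rank features by mean absolute Shapley value over the samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    means = [t / len(samples) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical feature names and scoring function (assumptions, not from the paper).
NAMES = ["SEND_SMS", "READ_CONTACTS", "INTERNET"]


def toy_score(features):
    send_sms, read_contacts, internet = features
    return 0.6 * send_sms + 0.1 * internet + 0.3 * send_sms * read_contacts


if __name__ == "__main__":
    apps = [(1, 1, 1), (1, 0, 1), (0, 1, 0)]
    for name, score in rank_features(toy_score, apps, (0, 0, 0), NAMES):
        print(name, round(score, 4))
```

A wrapper method would then evaluate candidate feature subsets in this ranked order with the actual detection model, keeping the subset that performs best on held-out data.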

Key words: Android malware detection, interpretability, concept drift, feature-representation transfer, automated machine learning
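The statistical half of the drift check named in the abstract, the Mann-Whitney U test, can be sketched as follows. This is a minimal standard-library version that uses the normal approximation with tie correction and no continuity correction; it is not the authors' code. The sample values, which stand in for one feature measured on apps from an older and a newer period, are made up for illustration.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p_value). Tied observations receive average ranks and the
    variance is tie-corrected. The approximation is only reasonable when
    each sample has roughly 20 or more observations.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")

    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this tie group
        avg_rank = (i + 1 + j) / 2.0   # mean of the 1-based ranks i+1 .. j
        in_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * in_a
        tie_term += t ** 3 - t
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                  # every observation is identical
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return u, min(p_value, 1.0)


if __name__ == "__main__":
    # Made-up values of one feature for apps from two periods.
    old_period = [i % 5 for i in range(30)]
    new_period = [5 + i % 5 for i in range(30)]
    u_stat, p = mann_whitney_u(old_period, new_period)
    print("U =", u_stat, "p =", p)
    print("distribution shift suggested" if p < 0.05 else "no shift detected")
```

A small p-value for a feature suggests that its distribution differs between the two periods, which is consistent with concept drift but does not by itself show that a classifier degrades; the abstract pairs the test with a model-based check for that reason.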

CLC number: