Abstract:
Android malware detection suffers from three problems: feature definition is highly subjective, the feature selection process is poorly interpretable, and the detection accuracy of trained models is unstable over time. To address them, this paper proposes InterDroid, an interpretable Android malware detection method that accounts for concept drift. First, four feature types for the detection model (permissions, API package names, intents, and Dalvik bytecode) are derived from high-quality, manually written Android malware analysis reports. The training algorithm of InterDroid and the algorithms used for comparison are then obtained with the automated machine learning tool TPOT (Tree-based Pipeline Optimization Tool), which avoids the complicated model selection and parameter tuning of traditional methods. Next, the traditional wrapper method for feature selection is improved by integrating the model interpretation algorithm SHAP (SHapley Additive exPlanations), and the features that contribute most to the classification results are used to train the detection model. Finally, the existence of concept drift in Android malware detection is verified by two tests, the MWU (Mann-Whitney U) test and a machine learning model, and JDA (joint distribution adaptation) is used to improve the accuracy of the detection model on newer Android malware. The experimental results show that the features selected by InterDroid are stable and interpretable, and that the feature-representation transfer module in InterDroid improves the detection accuracy on Android malware from 2019 and 2020 by 46% and 44%, respectively.
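The abstract does not say how the MWU test is applied. As a minimal sketch, assume it compares the distribution of a single feature (for example, whether a given permission is requested) between older and newer malware samples. The function names, the 0.05 threshold, and the sample data below are hypothetical and are not taken from InterDroid. The p-value uses the normal approximation with tie correction and no continuity correction.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        count_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * count_x
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All pooled values are identical, so there is no evidence of a shift.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


def drift_detected(old_values, new_values, alpha=0.05):
    """Flag a feature as drifting when the test rejects equal distributions."""
    _, p_value = mann_whitney_u(old_values, new_values)
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical data: 1 means the permission is requested, 0 means it is not.
    old_samples = [1] * 18 + [0] * 2
    new_samples = [1] * 5 + [0] * 15
    print("drift detected:", drift_detected(old_samples, new_samples))
```

A per-feature test like this only shows that a feature's distribution has shifted between periods. The abstract pairs it with a second check based on a machine learning model, which this sketch does not cover.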