Abstract:
A novel static Android malware detection system Maldetect is proposed in this paper. At first, the Dalvik instructions decompiled from Android DEX files are simplified and abstracted into simpler symbolic sequences. N-Gram is then employed to extract the features from the simplified Dalvik instruction sequences, and the detection and classification model is finally built using machine learning algorithms. By comparing different classification algorithms and N-Gram sequences, 3-Gram sequences with the random forest algorithm is identified as an optimal solution for the malware detection and classification. The performance of our method is compared against the professional anti-virus tools using 4000 malware samples, and the results show that Maldetect is more effective for Android malware detection with high detection accuracy.