高级检索

    一种抗混淆的大规模Android应用相似性检测方法

    An Anti-Obfuscation Method for Detecting Similarity Among Android Applications in Large Scale

    • 摘要: 随着代码混淆、加壳技术的应用,基于行为特征的Android应用相似性检测受到的影响愈加明显.提出了一种抗混淆的大规模Android应用相似性检测方法,通过提取应用内特定文件的内容特征计算应用相似性,该方法不受代码混淆的影响,且能有效抵抗文件混淆带来的干扰.对59万个应用内的文件类型进行统计,选取具有普遍性、代表性和可度量性的图片文件、音频文件和布局文件作为特征文件.针对3种特征文件的特点,提出了不同内容特征提取方法和相似度计算方法,并通过学习对其相似度赋予权重,进一步提高应用相似性检测的准确性.使用正版应用和已知恶意应用作为标准,对59万个应用进行相似性检测实验,结果显示基于文件内容的相似性检测可以准确识别重打包应用和含有已知恶意代码的应用,并且在效率和准确性上均优于现有方案.

       

      Abstract: Code obfuscation exerts a huge impact on similarity detection among Android applications based on behavior characteristics. In order to deal with the situation, we propose a novel way of similarity detection among Android applications based on file content characteristics, which computes the similarity of file content features and can be applied to large-scale scenario in real world. Our method is not subject to code obfuscation or file obfuscation. We choose to utilize the characteristics of image, audio and layout files which are shown in our statistics as the most representative features in Android applications. Meanwhile, different weights are given to these features through machine learning, which further enhances the accuracy of our method. In addition, we implement a prototype system and particularly optimize each step to speed up the calculation, making our system suitable for large-scale scenario and give a good calculation performance. The experiments dataset contains 59 000 applications. And for both legitimate application and malware applications, our system successfully detects those repackaged pirate applications and those with the similar malicious component, which prove the effectiveness of our method. The experiment results demonstrate that similarity detection based on file content characteristics could resist the file obfuscation and give better performance in both accuracy and efficiency.

       

    /

    返回文章
    返回