ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development

• Information Security •

Malware Classification Approach Based on Valid Window and Naive Bayes

Zhu Kenan1, Yin Baolin2, Mao Yaming3, and Hu Yingnan3

  1(CNPC Information Technology Service Center, Beijing 100007) 2(School of Computer Science, Beihang University, Beijing 100191) 3(HSE Information Center, CNPC Research Institute of Safety and Environment Technology, Beijing 102206) (zhukenan@gmail.com)
  • Online: 2014-02-15

Abstract: Malware classification is a core problem in malicious code analysis and intrusion detection. Existing classification approaches suffer from low efficiency and poor accuracy, mainly because the raw behavior-analysis data is large, noisy, and affected by random factors. To address these issues, this paper takes malware behavior sequence reports as raw data and analyzes how random factors and noisy behavior interfere with malware behavior characteristics and operation similarity. It then proposes a parameter valid window model for system calls, which strengthens the ability of behavior sequences to describe similarity and reduces the interference of random factors. On this basis, the paper presents an automatic malware classification approach based on the naive Bayes machine learning model and the parameter valid window. A prototype automatic malware behavior classifier, MalwareFilter, is designed and implemented. The prototype is evaluated on system call sequence reports generated by real malware. The experimental results show that the approach is effective and that the parameter valid window improves both the performance and the accuracy of training and classification.

Key words: malware, behavior classification, naive Bayes, machine learning, intrusion detection, behavior characteristic, operation similarity
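The abstract describes the method only at a high level. As a rough illustration, the Python sketch below turns a behavior report into a bag of system-call tokens and classifies it with a multinomial naive Bayes model using add-one (Laplace) smoothing. The paper's actual parameter valid window model is not defined here, so `apply_valid_window` is a hypothetical stand-in: it keeps only the first `window` characters of each system call parameter, on the assumption that randomized parts (temporary file names and the like) tend to sit at the end of a parameter. The report format (a list of `(api, [params])` tuples), the names `featurize` and `NaiveBayes`, and the sample data are all illustrative assumptions, not details of MalwareFilter.

```python
import math
from collections import Counter, defaultdict

# Assumed report format: one (api_name, [param, ...]) tuple per system call.
Report = list[tuple[str, list[str]]]


def apply_valid_window(api: str, params: list[str], window: int) -> str:
    """Hypothetical stand-in for a parameter valid window: keep only the first
    `window` characters of each parameter so randomized suffixes are dropped."""
    kept = ",".join(p[:window] for p in params)
    return f"{api}({kept})"


def featurize(report: Report, window: int) -> Counter:
    """Bag of windowed system-call tokens for one behavior report."""
    return Counter(apply_valid_window(api, params, window) for api, params in report)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, reports: list[Report], labels: list[str], window: int) -> "NaiveBayes":
        self.window = window
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        for report, label in zip(reports, labels):
            self.token_counts[label].update(featurize(report, window))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.totals = {lbl: sum(c.values()) for lbl, c in self.token_counts.items()}
        return self

    def predict(self, report: Report) -> str:
        feats = featurize(report, self.window)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(class) + sum over tokens of count * log P(token | class)
            score = math.log(n_label / n_docs)
            for tok, k in feats.items():
                p = (self.token_counts[label][tok] + 1) / (self.totals[label] + v)
                score += k * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example reports, for illustration only.
train = [
    [("CreateFile", [r"C:\Temp\x9f2a.tmp"]), ("RegSetValue", [r"HKLM\Run\svc"])],
    [("CreateFile", [r"C:\Temp\k01bb.tmp"]), ("RegSetValue", [r"HKLM\Run\svc"])],
    [("Connect", ["10.0.0.5:6667"]), ("Send", ["PRIVMSG"])],
]
labels = ["dropper", "dropper", "ircbot"]
model = NaiveBayes().fit(train, labels, window=8)

unknown = [("CreateFile", [r"C:\Temp\zz77.tmp"]), ("RegSetValue", [r"HKLM\Run\svc"])]
print(model.predict(unknown))  # prints "dropper"
```

With `window=8`, the two training `CreateFile` calls on differently named temporary files collapse to the same token `CreateFile(C:\Temp\)`, so the unseen file name in the new report still counts as evidence. Without such truncation, every randomized name would become a token never seen in training, which is one way random factors can weaken similarity between reports.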