ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (3): 631-638. doi: 10.7544/issn1000-1239.2020.20180846

• Information Security •

Dynamic Taint Analysis Method Based on Maximal Frequent Subgraph Mining

Guo Fangfang1, Wang Xinyue1, Wang Huiqiang1, Lü Hongwu1, Hu Yibing1, Wu Fang1, Feng Guangsheng1, Zhao Qian2

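  1 (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001); 2 (College of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028) (guofangfang@hrbeu.edu.cn)
  • Published: 2020-03-01
  • Funding: 
    National Natural Science Foundation of China (61502118); Natural Science Foundation of Heilongjiang Province (F2016028); Fundamental Research Funds for the Central Universities (HEUCF180602, HEUCFM180604); National Science and Technology Major Project (2016ZX03001023-005)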

A Dynamic Taint Analysis Method Based on Maximal Frequent Subgraph Mining

Guo Fangfang1, Wang Xinyue1, Wang Huiqiang1, Lü Hongwu1, Hu Yibing1, Wu Fang1, Feng Guangsheng1, Zhao Qian2   

  1 (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001); 2 (College of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028)
  • Online: 2020-03-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61502118), the Natural Science Foundation of Heilongjiang Province of China (F2016028), the Fundamental Research Funds for the Central Universities (HEUCF180602, HEUCFM180604), and the Major National Science and Technology Program (2016ZX03001023-005).

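Abstract: Traditional dynamic taint analysis methods for malicious code recognition commonly suffer from a huge number of behavior dependency graphs and long matching times. This paper proposes a dynamic taint analysis method based on maximal frequent subgraph mining. The method mines, from the behavior dependency graphs of a malicious code family, the maximal frequent subgraphs that represent the family's significant common features. The mined maximal frequent subgraphs are the most prominent features shared by a malicious code family and all of its variants, so a behavior dependency graph under test only needs to be compared and matched against them. This preserves the original malicious code features while reducing the number of behavior dependency graphs, which in turn improves recognition efficiency. Experiments show that, compared with traditional methods, at a minimum support of 0.045 the proposed method reduces the number of behavior dependency graphs by 82%, improves recognition efficiency by 81.7%, and reaches an accuracy of 92.15%.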

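Keywords: malicious code recognition, malicious code family, dynamic taint analysis, behavior dependency graph, maximal frequent subgraph mining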

Abstract: Malicious code recognition methods based on traditional dynamic taint analysis suffer from a huge number of malicious code behavior dependency graphs (MBDGs) and a long matching process. Because the members of a malicious code family share common characteristics, their behavior dependency graphs can be represented by a set of common subgraphs. This paper therefore proposes a dynamic taint analysis method based on maximal frequent subgraph mining. The method mines, from the behavior dependency graphs of a malicious code family, the maximal frequent subgraphs that represent the family's significant common features; these subgraphs capture the most prominent features shared by the variants of that family. A target behavior dependency graph then only needs to be matched against the mined maximal frequent subgraphs. The method reduces the number of behavior dependency graphs and improves recognition efficiency without losing the characteristics of malicious code behavior. Compared with the traditional dynamic taint analysis method for malicious code recognition, at a minimum support of 0.045 the number of behavior dependency graphs decreases by 82%, the recognition efficiency increases by 81.7%, and the accuracy rate reaches 92.15%.

Key words: malicious code recognition, malicious code family, dynamic taint analysis, behavior dependency graph, maximal frequent subgraph mining
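
The mining-then-matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes each behavior dependency graph is reduced to a set of labeled edges with unique node labels, so subgraph containment becomes a subset test, and it ignores connectivity and true subgraph isomorphism. The function names, the call labels, and the support threshold of 0.6 are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]          # (call, dependent call); labels are hypothetical
Graph = FrozenSet[Edge]         # simplified behavior dependency graph


def frequent_edge_sets(graphs: List[Graph], min_support: float) -> Set[Graph]:
    """Level-wise (Apriori-style) search for edge sets contained in at
    least a min_support fraction of the graphs."""
    n = len(graphs)

    def support(edges: Graph) -> float:
        return sum(edges <= g for g in graphs) / n

    all_edges = set().union(*graphs)
    level = {frozenset([e]) for e in all_edges
             if support(frozenset([e])) >= min_support}
    frequent = set(level)
    while level:
        # Join pairs of size-k sets that differ in exactly one edge.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
    return frequent


def maximal_sets(frequent: Iterable[Graph]) -> Set[Graph]:
    """Keep only frequent sets that are not a proper subset of another one."""
    frequent = set(frequent)
    return {s for s in frequent if not any(s < t for t in frequent)}


def matches(sample: Graph, patterns: Iterable[Graph]) -> bool:
    """A sample matches the family if it contains any mined pattern."""
    return any(p <= sample for p in patterns)


# Toy family of three variants (hypothetical data).
write = ("NtCreateFile", "NtWriteFile")
close = ("NtWriteFile", "NtClose")
family = [
    frozenset({write, close, ("NtOpenKey", "NtSetValueKey")}),
    frozenset({write, close, ("NtCreateProcess", "NtClose")}),
    frozenset({write, close}),
]
patterns = maximal_sets(frequent_edge_sets(family, min_support=0.6))
sample = frozenset({write, close, ("NtOpenKey", "NtQueryValueKey")})
print(matches(sample, patterns))  # True
```

In this toy run the three family graphs collapse to a single pattern, so a sample is compared against one pattern instead of three graphs, which is the kind of reduction the abstract reports. A faithful implementation would need a real frequent subgraph miner and a subgraph isomorphism test in place of the subset checks.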

CLC Number: