ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2014, Vol. 51 ›› Issue (11): 2458-2469.doi: 10.7544/issn1000-1239.2014.20130798

• 网络技术 • 上一篇    下一篇

高速网络流频繁项挖掘算法

赵小欢1,2,夏靖波2,付凯2,李明辉3   

  1. 1(中国人民解放军95034部队 广西百色 533616);2(空军工程大学信息与导航学院 西安 710077);3(空军后勤部 北京 100720) (zxhzxh_2012@163.com)
  • 出版日期: 2014-11-01
  • 基金资助: 
    基金项目:国家自然科学基金项目(61201209);陕西省自然科学基金重点项目(2012JZ8005);全军军事学研究生课题(2010JYXXXX-488)

Frequent Items Mining Algorithm over Network Flows at High-Speed Network

Zhao Xiaohuan1,2, Xia Jingbo2, Fu Kai2, Li Minghui3   

  1. 1(No. 95034 Unit of PLA, Baise, Guangxi 533616); 2(School of Information and Navigation, Air Force Engineering University, Xi’an 710077); 3(Air Force Logistics Department, Beijing 100720)
  • Online: 2014-11-01

摘要: 在当前骨干网络链路速率呈几何倍数增长的情况下,实时准确地挖掘出网络流中的频繁项对于网络管理和网络安全具有重要的意义.在SS(space saving)计数算法的启发之下,针对网络流的实际特性,提出了一种剪枝操作受时间和流长双重约束的网络流频繁项挖掘算法(integrated weighted frequent items mining, IWFIM).IWFIM计数算法采用时间和流长组合赋权的方式为每个流项赋权,且算法每次剪枝操作时总是删除权值最小的流项.在IWFIM算法的基础上,依据网络流的重尾分布特性,又提出了一种能够结合散列方法和计数方法优点的网络流频繁项挖掘算法(counting Blooming filter and integrated weighted frequent items mining, CBF_IWFIM).CBF_IWFIM算法首先采用改进的计数型布鲁姆过滤器(counting Blooming filter, CBF)在不保存网络流信息的情况下过滤掉绝大部分的短流,然后采用IWFIM算法实现网络流频繁项挖掘.通过实际网络流量测试表明,CBF_IWFIM和IWFIM算法具有非常高的空间利用率和准确率,2种算法对于网络流频繁项的挖掘效果明显优于SS等3种算法,即使在使用其他算法1/3缓存的极端情况下,CBF_IWFIM和IWFIM 2种算法的频繁项识别效果仍然要优于SS等算法.

关键词: 网络流, 频繁项, 数据挖掘, 剪枝策略, 计数算法, 散列算法, 重尾分布, 计数型布鲁姆过滤器

Abstract: With the bandwidth of backbone network link increasing geometrically, mining the frequent items over network flows promptly and accurately is important for network management and network security. Inspired by SS counting method, an integrated weighted frequent items mining algorithm IWFIM over network flows, whose pruning strategy is subject to the constraints of time and flow length, is proposed according to the property of flows. The weight of each flow item is endowed by time and flow length and the item with the minimum weight is deleted during the operation of pruning for IWFIM. Then, based on IWFIM, another frequent items mining algorithm CBF_IWFIM with the capability of combining the advantages of hashing method and counting method is proposed according to the property of heavy-tailed distribution of flows. The improved counting Blooming filter is used to filter the majority of small flows without saving flows’ information and IWFIM is introduced to identify the frequent items afterwards for CBF_IWFIM. The experiments over real network traffic show that CBF_IWFIM and IWFIM are very space-saving and precise, and they can achieve much more reasonable measurement accuracy than other three frequent items mining algorithms like SS. Even in the situation of consuming one-third of the space cost in other three algorithms, the two algorithms CBF_IWFIM and IWFIM still perform better than other three algorithms like SS.

Key words: network flows, frequent items, data mining, pruning strategy, counting method, Hash method, heavy-tailed distribution, counting Blooming filter

中图分类号: