ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (11): 2458-2469.doi: 10.7544/issn1000-1239.2014.20130798

Previous Articles     Next Articles

Frequent Items Mining Algorithm over Network Flows at High-Speed Network

Zhao Xiaohuan1,2, Xia Jingbo2, Fu Kai2, Li Minghui3   

  1. 1(No. 95034 Unit of PLA, Baise, Guangxi 533616); 2(School of Information and Navigation, Air Force Engineering University, Xi’an 710077); 3(Air Force Logistics Department, Beijing 100720)
  • Online:2014-11-01

Abstract: With the bandwidth of backbone network link increasing geometrically, mining the frequent items over network flows promptly and accurately is important for network management and network security. Inspired by SS counting method, an integrated weighted frequent items mining algorithm IWFIM over network flows, whose pruning strategy is subject to the constraints of time and flow length, is proposed according to the property of flows. The weight of each flow item is endowed by time and flow length and the item with the minimum weight is deleted during the operation of pruning for IWFIM. Then, based on IWFIM, another frequent items mining algorithm CBF_IWFIM with the capability of combining the advantages of hashing method and counting method is proposed according to the property of heavy-tailed distribution of flows. The improved counting Blooming filter is used to filter the majority of small flows without saving flows’ information and IWFIM is introduced to identify the frequent items afterwards for CBF_IWFIM. The experiments over real network traffic show that CBF_IWFIM and IWFIM are very space-saving and precise, and they can achieve much more reasonable measurement accuracy than other three frequent items mining algorithms like SS. Even in the situation of consuming one-third of the space cost in other three algorithms, the two algorithms CBF_IWFIM and IWFIM still perform better than other three algorithms like SS.

Key words: network flows, frequent items, data mining, pruning strategy, counting method, Hash method, heavy-tailed distribution, counting Blooming filter

CLC Number: