高级检索
    赵小欢, 夏靖波, 付凯, 李明辉. 高速网络流频繁项挖掘算法[J]. 计算机研究与发展, 2014, 51(11): 2458-2469. DOI: 10.7544/issn1000-1239.2014.20130798
    引用本文: 赵小欢, 夏靖波, 付凯, 李明辉. 高速网络流频繁项挖掘算法[J]. 计算机研究与发展, 2014, 51(11): 2458-2469. DOI: 10.7544/issn1000-1239.2014.20130798
    Zhao Xiaohuan, Xia Jingbo, Fu Kai, Li Minghui. Frequent Items Mining Algorithm over Network Flows at High-Speed Network[J]. Journal of Computer Research and Development, 2014, 51(11): 2458-2469. DOI: 10.7544/issn1000-1239.2014.20130798
    Citation: Zhao Xiaohuan, Xia Jingbo, Fu Kai, Li Minghui. Frequent Items Mining Algorithm over Network Flows at High-Speed Network[J]. Journal of Computer Research and Development, 2014, 51(11): 2458-2469. DOI: 10.7544/issn1000-1239.2014.20130798

    高速网络流频繁项挖掘算法

    Frequent Items Mining Algorithm over Network Flows at High-Speed Network

    • 摘要: 在当前骨干网络链路速率呈几何倍数增长的情况下,实时准确地挖掘出网络流中的频繁项对于网络管理和网络安全具有重要的意义.在SS(space saving)计数算法的启发之下,针对网络流的实际特性,提出了一种剪枝操作受时间和流长双重约束的网络流频繁项挖掘算法(integrated weighted frequent items mining, IWFIM).IWFIM计数算法采用时间和流长组合赋权的方式为每个流项赋权,且算法每次剪枝操作时总是删除权值最小的流项.在IWFIM算法的基础上,依据网络流的重尾分布特性,又提出了一种能够结合散列方法和计数方法优点的网络流频繁项挖掘算法(counting Blooming filter and integrated weighted frequent items mining, CBF_IWFIM).CBF_IWFIM算法首先采用改进的计数型布鲁姆过滤器(counting Blooming filter, CBF)在不保存网络流信息的情况下过滤掉绝大部分的短流,然后采用IWFIM算法实现网络流频繁项挖掘.通过实际网络流量测试表明,CBF_IWFIM和IWFIM算法具有非常高的空间利用率和准确率,2种算法对于网络流频繁项的挖掘效果明显优于SS等3种算法,即使在使用其他算法1/3缓存的极端情况下,CBF_IWFIM和IWFIM 2种算法的频繁项识别效果仍然要优于SS等算法.

       

      Abstract: With the bandwidth of backbone network link increasing geometrically, mining the frequent items over network flows promptly and accurately is important for network management and network security. Inspired by SS counting method, an integrated weighted frequent items mining algorithm IWFIM over network flows, whose pruning strategy is subject to the constraints of time and flow length, is proposed according to the property of flows. The weight of each flow item is endowed by time and flow length and the item with the minimum weight is deleted during the operation of pruning for IWFIM. Then, based on IWFIM, another frequent items mining algorithm CBF_IWFIM with the capability of combining the advantages of hashing method and counting method is proposed according to the property of heavy-tailed distribution of flows. The improved counting Blooming filter is used to filter the majority of small flows without saving flows’ information and IWFIM is introduced to identify the frequent items afterwards for CBF_IWFIM. The experiments over real network traffic show that CBF_IWFIM and IWFIM are very space-saving and precise, and they can achieve much more reasonable measurement accuracy than other three frequent items mining algorithms like SS. Even in the situation of consuming one-third of the space cost in other three algorithms, the two algorithms CBF_IWFIM and IWFIM still perform better than other three algorithms like SS.

       

    /

    返回文章
    返回