Mining Frequent Patterns in Data Streams (挖掘数据流中的频繁模式)

Liu Xuejun (刘学军); Xu Hongbing (徐宏炳); Dong Yisheng (董逸生); Wang Yongli (王永利); Qian Jiangbo (钱江波)

Abstract

Finding frequent items is one of the most basic problems in data stream mining. Because data streams are unbounded and continuously flowing, traditional frequent-pattern mining algorithms are difficult to apply to them. Building on the FP-growth algorithm and taking the characteristics of data streams into account, this paper proposes FP-DS, a new algorithm for mining frequent patterns in data streams. The algorithm partitions the stream into segments and mines frequent itemsets segment by segment, so users can continuously obtain the current frequent itemsets online. It can effectively mine all frequent itemsets and is particularly well suited to long ones. Introducing an error bound ε prunes a large number of infrequent itemsets and reduces the amount of data stored, while guaranteeing that the error in the support of any itemset over the whole data set does not exceed ε. Analysis and experiments show that the algorithm performs well.
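
The abstract does not give the details of FP-DS, so the following is not the authors' algorithm. It is a minimal Python sketch of a related, well-known technique (Lossy Counting style segment-wise counting of single items) that illustrates the two ideas the abstract names: processing the stream segment by segment, and using an error bound ε to prune infrequent entries while keeping the support error within ε. The function names `lossy_count` and `frequent_items`, the segment width of ⌈1/ε⌉, and the restriction to single items rather than itemsets stored in an FP-tree are all assumptions made for illustration.

```python
import math


def lossy_count(stream, epsilon):
    """Segment-wise approximate counting (Lossy Counting style sketch).

    Returns (counts, n), where counts maps item -> [count, max_undercount]
    and n is the number of items seen. Each stored count underestimates the
    true count by at most epsilon * n.
    """
    width = math.ceil(1 / epsilon)  # assumed segment size: ceil(1 / epsilon)
    counts = {}
    n = 0
    for item in stream:
        n += 1
        segment = math.ceil(n / width)  # 1-based index of the current segment
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to (segment - 1)
            # earlier segments, one occurrence per segment at most.
            counts[item] = [1, segment - 1]
        if n % width == 0:
            # End of a segment: prune entries that cannot be frequent.
            stale = [k for k, (c, d) in counts.items() if c + d <= segment]
            for k in stale:
                del counts[k]
    return counts, n


def frequent_items(counts, n, min_support, epsilon):
    """Report items whose stored count reaches (min_support - epsilon) * n."""
    threshold = (min_support - epsilon) * n
    return {k: c for k, (c, _) in counts.items() if c >= threshold}
```

With ε = 0.125 the stream is cut into segments of 8 items. Items that appear only once per segment are pruned at each segment boundary, so memory stays small, while every item whose true support is at least `min_support` is still reported.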
