ISSN 1000-1239 CN 11-1777/TP


Mining Frequent Patterns in Data Streams

Liu Xuejun1,2, Xu Hongbing1, Dong Yisheng1, Wang Yongli1, and Qian Jiangbo1

  1 (Department of Computer Science and Technology, Southeast University, Nanjing 210096)
  2 (College of Information Science and Engineering, Nanjing University of Technology, Nanjing 210009)
  (lxj-gd@vip.sina.com)
  • Published online: 2005-12-15

Abstract: Finding frequent items is one of the most fundamental problems in data stream mining. Because data streams are unbounded and continuously flowing, traditional frequent-pattern mining algorithms are difficult to apply to them. Building on FP-growth, an effective algorithm for frequent-pattern mining, and tailored to the characteristics of data streams, this paper proposes FP-DS, a new algorithm for mining frequent patterns in data streams. FP-DS partitions the stream into segments and mines frequent itemsets segment by segment, so users can continuously obtain the current frequent itemsets online. The algorithm can effectively mine all frequent itemsets of any length and is particularly well suited to mining long frequent itemsets. By introducing an error bound ε, it prunes a large number of infrequent itemsets and reduces the amount of data that must be stored, while guaranteeing that the support error of any itemset over the entire data set does not exceed ε. Analysis and experiments show that the algorithm performs well.

Key words: data streams, frequent patterns, FP-DS algorithm, stream data mining
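The abstract combines two ideas: process the stream segment by segment, and use an error bound ε to prune infrequent entries while keeping the support error within ε. As a rough illustration of that general mechanism only, here is a minimal sketch of Lossy Counting (Manku and Motwani), a different, well-known segment-based technique. It is not FP-DS: it counts single items rather than itemsets and builds no FP-tree. The class name `LossyCounter`, its methods, and the segment width ⌈1/ε⌉ are assumptions made for this example.

```python
import math


class LossyCounter:
    """Hypothetical sketch of Lossy Counting (Manku & Motwani) for single items.

    Not the FP-DS algorithm: no itemsets, no FP-tree. It only shows how
    segment-wise pruning with an error bound epsilon keeps counts within eps * n.
    """

    def __init__(self, epsilon: float):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # assumed segment width: ceil(1/eps) items
        self.n = 0                           # number of items seen so far
        self.entries: dict = {}              # item -> [count, max_undercount]

    def _segment_id(self) -> int:
        # 1-based id of the segment containing the n-th item (integer ceil division)
        return (self.n + self.width - 1) // self.width

    def add(self, item) -> None:
        self.n += 1
        b = self._segment_id()
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been pruned earlier; it can have been
            # missed at most b - 1 times before this segment.
            self.entries[item] = [1, b - 1]
        if self.n % self.width == 0:
            # End of a segment: drop entries whose upper bound (count + error) <= b.
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > b}

    def frequent(self, support: float) -> dict:
        # Items whose true frequency may reach support * n. No false negatives;
        # reported counts underestimate true counts by at most eps * n.
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}
```

For example, feeding the stream `a, x1, a, x3, ...` (100 items, `a` at every even position) into `LossyCounter(0.1)` keeps only `a` after pruning, and `frequent(0.3)` returns `{"a": 50}`. The design trades exactness for bounded memory: each stored count is at most ε·n below the true count, which matches the kind of support-error guarantee the abstract describes.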