基于磁盘存储1项集计数的增量FP_GROWTH算法

申彦; 朱玉全; 刘春华

doi:10.7544/issn1000-1239.2015.20131436

基于磁盘存储1项集计数的增量FP_GROWTH算法

Incremental FP_GROWTH Algorithm Based on Disk-resident 1-itemsets Counting

摘要

摘要: 随着数据集规模的不断增大,提高频繁项集的挖掘效率成为数据挖掘领域的研究重点.频繁项集的增量更新挖掘算法因其可以利用已挖掘发现的信息提高对新数据集的挖掘效率,成为重要的研究方向.但现有频繁项集增量更新算法大多基于APRIORI算法框架,性能提高有限.最近出现的建立在FP-TREE等树形结构上的增量更新算法又往往存在树形结构调整困难、已发现频繁项集及树形结构保存效率较低等问题,算法性能有待进一步地提高.对此,通过分析增量挖掘过程中的关键信息,提出了一种基于磁盘存储1项集计数的增量FP_GROWTH算法(IU_FPGROWTH_1COUNTING).该算法无需保存临时树形结构及临时挖掘结果，可以在原数据集及支持度均发生变化时，减少FP_GROWTH算法对数据集的扫描，提高频繁项集的挖掘效率.在生成以及真实数据集上进行了验证实验以及性能分析,结果表明IU_FPGROWTH_1COUNTING是一种有效的频繁项集增量更新挖掘算法.

Abstract: As the size of data set increases constantly, improving the mining efficiency of frequent itemsets has become a key research in data mining fields. The incremental update algorithm of frequent itemsets is a main research target, because these algorithms can make use of the discovered information to improve mining efficiency for new data set. However, current incremental update algorithms are almost based on the framework of APRIORI. These algorithms only have limited improvements of efficiency. There are some new incremental update algorithms recently based on FP-TREE and the like data structure. But these algorithms have some other problems such as low efficiency in tree structure adjustment, low efficiency in storage of the tree, discovered frequent itemsets and so on. The performance of algorithms should be improved further. Therefore, a novel incremental FP_GROWTH algorithm based on disk-resident 1-itemsets counting (IU_FPGROWTH_1COUNTING) is presented in this paper after analyzing the key information of incremental update mining process. This algorithm doesnt need to store the temporal tree structure and temporal mining results. When the original data set and the support change at the same time, IU_FPGROWTH_1COUNTING can improve mining efficiency of frequent itemsets by decreasing the data set scan of FP_GROWTH, which is very time-consuming. The verification experiments and performance analysis on synthetic data set and real data set are presented. The experimental results show that IU_FPGROWTH_1COUNTING is an effective incremental update algorithm of frequent itemsets.

HTML全文

参考文献(0)

施引文献

资源附件(0)