ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (3): 569-578.doi: 10.7544/issn1000-1239.2015.20131436

Previous Articles     Next Articles

Incremental FP_GROWTH Algorithm Based on Disk-resident 1-itemsets Counting

Shen Yan1,2, Zhu Yuquan2, Liu Chunhua2   

  1. 1(Department of Information Management and Information System, Jiangsu University, Zhenjiang, Jiangsu 212013); 2(School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013)
  • Online:2015-03-01

Abstract: As the size of data set increases constantly, improving the mining efficiency of frequent itemsets has become a key research in data mining fields. The incremental update algorithm of frequent itemsets is a main research target, because these algorithms can make use of the discovered information to improve mining efficiency for new data set. However, current incremental update algorithms are almost based on the framework of APRIORI. These algorithms only have limited improvements of efficiency. There are some new incremental update algorithms recently based on FP-TREE and the like data structure. But these algorithms have some other problems such as low efficiency in tree structure adjustment, low efficiency in storage of the tree, discovered frequent itemsets and so on. The performance of algorithms should be improved further. Therefore, a novel incremental FP_GROWTH algorithm based on disk-resident 1-itemsets counting (IU_FPGROWTH_1COUNTING) is presented in this paper after analyzing the key information of incremental update mining process. This algorithm doesnt need to store the temporal tree structure and temporal mining results. When the original data set and the support change at the same time, IU_FPGROWTH_1COUNTING can improve mining efficiency of frequent itemsets by decreasing the data set scan of FP_GROWTH, which is very time-consuming. The verification experiments and performance analysis on synthetic data set and real data set are presented. The experimental results show that IU_FPGROWTH_1COUNTING is an effective incremental update algorithm of frequent itemsets.

Key words: association rules, frequent itemsets, massive data, FP_GROWTH algorithm, incremental update

CLC Number: