Abstract:
As the size of data set increases constantly, improving the mining efficiency of frequent itemsets has become a key research in data mining fields. The incremental update algorithm of frequent itemsets is a main research target, because these algorithms can make use of the discovered information to improve mining efficiency for new data set. However, current incremental update algorithms are almost based on the framework of APRIORI. These algorithms only have limited improvements of efficiency. There are some new incremental update algorithms recently based on FP-TREE and the like data structure. But these algorithms have some other problems such as low efficiency in tree structure adjustment, low efficiency in storage of the tree, discovered frequent itemsets and so on. The performance of algorithms should be improved further. Therefore, a novel incremental FP_GROWTH algorithm based on disk-resident 1-itemsets counting (IU_FPGROWTH_1COUNTING) is presented in this paper after analyzing the key information of incremental update mining process. This algorithm doesnt need to store the temporal tree structure and temporal mining results. When the original data set and the support change at the same time, IU_FPGROWTH_1COUNTING can improve mining efficiency of frequent itemsets by decreasing the data set scan of FP_GROWTH, which is very time-consuming. The verification experiments and performance analysis on synthetic data set and real data set are presented. The experimental results show that IU_FPGROWTH_1COUNTING is an effective incremental update algorithm of frequent itemsets.