Abstract:
As the size of the database to be mined is increasing constantly, the size of physical memory available has become a bottleneck when using FP-GROWTH algorithm for association rules mining. So, it is necessary to tackle space scalability by some new algorithms in order to mine association rules in huge database. Nowadays, disk-resident algorithm has become the main target. Therefore, the original data structure of FP-TREE is improved and a novel algorithm called DTRFP-GROWTH (disk table resident FP-TREE growth) is presented. This algorithm uses disk table for storing FP-TREE to decrease memory usage. When the mining works failed for FP-GROWTH using too much memory, DTRFP-GROWTH can continue to mine association rules from huge database by its special skill called disk table resident FP-TREE, which is suitable to occasions of space performance priority. In addition, this algorithm also integrates association rules mining with RDBMS system. It overcomes the problems of some relative solutions based on file system, such as low performance, high difficulty in development, etc. The correctness verification and performance analysis of the above mentioned algorithm are presented. The experiment results on real world data sets also support effectiveness of this algorithm. The study shows that in memory limited occasion, DTRFP-GROWTH is an effective association rules mining algorithm based on disk.