Abstract:
The set of frequent closed itemsets uniquely determines the complete set of frequent itemsets and is usually much smaller. Yet mining frequent closed itemsets remains a time-consuming task. This paper proposes DCI-closed-index, an improved algorithm for mining frequent closed itemsets. First, the "index array" is proposed; using the subsume index, itemsets that always appear together can be discovered. Second, a bitmap-based algorithm for computing the index array is presented. Third, items are sorted in descending order of their frequencies in the subsume index. Fourth, guided by heuristic information from the index array, frequent items that appear together and share the same support are merged into initial generators, which greatly reduces the search space. Finally, a reduced pre-set and a reduced post-set are derived from the index array and proved equivalent to the original pre-set and post-set, so that most redundant set-inclusion operations are avoided. Experimental results show that the proposed algorithm outperforms other state-of-the-art algorithms.
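The bitmap idea behind the subsume index can be illustrated with a minimal sketch. It assumes the common definition of the subsume index of an item i as the set of other frequent items that occur in every transaction containing i. That definition is an assumption, since the abstract does not state it. Under this assumption, with one bitmap per item, membership reduces to a bitwise AND. Items with identical bitmaps always appear together and have the same support, which mirrors the merging step. The names `build_bitmaps`, `subsume_index`, and `merge_equivalent` are hypothetical. This is not the paper's DCI-closed-index implementation.

```python
from typing import Dict, List, Set


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an int bitmap: bit t is set iff transaction t contains the item."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support of an item = number of set bits in its bitmap."""
    return bin(bitmap).count("1")


def subsume_index(bitmaps: Dict[str, int], min_support: int) -> Dict[str, Set[str]]:
    """Hypothetical helper (assumed definition): for each frequent item i, the other
    frequent items j whose transactions include every transaction containing i."""
    frequent = {i: b for i, b in bitmaps.items() if support(b) >= min_support}
    index: Dict[str, Set[str]] = {}
    for i, bi in frequent.items():
        # tidset(i) is a subset of tidset(j) exactly when bi & bj == bi
        index[i] = {j for j, bj in frequent.items() if j != i and bi & bj == bi}
    return index


def merge_equivalent(bitmaps: Dict[str, int], min_support: int) -> List[frozenset]:
    """Group frequent items with identical bitmaps: they always co-occur and share support."""
    groups: Dict[int, Set[str]] = {}
    for item, b in bitmaps.items():
        if support(b) >= min_support:
            groups.setdefault(b, set()).add(item)
    return [frozenset(g) for g in groups.values()]
```

For example, take the four transactions {a,b,c,e}, {a,b,e}, {a,b,d}, {c,d} with a minimum support of 2. The subsume index of e is then {a, b}, and that of a is {b}. Items a and b have identical bitmaps, so `merge_equivalent` groups them together.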