Abstract:
Mining TOP-K high utility pattern from a dataset is an extension of frequent pattern mining, and it aims to mine the patterns whose utilities are higher than a user-specified minimum utility threshold. At present, it has been a topic in data mining. Existing algorithms of mining TOP-K high utility pattern generate candidate itemsets in the mining process and they need multiple scans of a dataset; this hinders their performance of runtime and memory usage, especially when a dataset is large or there are many long transaction itemsets in a dataset. To address this issue, we propose a tree structure called HUP-Tree (high utility pattern tree) to maintain transaction itemsets and their utility values, and we also give an algorithm named TOPKHUP (TOP-K high utility pattern) that mines TOP-K high utility patterns without generating candidates. HUP-Tree ensures efficient retrieval of utility value of each pattern without additional scan of the dataset, so the performance of the algorithm is effectively improved. Seven classical real and synthetic datasets are used in the testing experiments and the results show that the proposed algorithm outperforms state-of-the-art algorithms significantly for both runtime performance and memory usage, and it is more stable along the change of the value K.