ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (2): 445-455.doi: 10.7544/issn1000-1239.2015.20131184

Previous Articles     Next Articles

An Algorithm of Mining TOP-K High Utility Patterns Without Generating Candidates

Wang Le1,2, Feng Lin2, Wang Shui1   

  1. 1(School of Information Engineering, Ningbo Dahongying University, Ningbo, Zhejiang 315175); 2(School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024)
  • Online:2015-02-01

Abstract: Mining TOP-K high utility pattern from a dataset is an extension of frequent pattern mining, and it aims to mine the patterns whose utilities are higher than a user-specified minimum utility threshold. At present, it has been a topic in data mining. Existing algorithms of mining TOP-K high utility pattern generate candidate itemsets in the mining process and they need multiple scans of a dataset; this hinders their performance of runtime and memory usage, especially when a dataset is large or there are many long transaction itemsets in a dataset. To address this issue, we propose a tree structure called HUP-Tree (high utility pattern tree) to maintain transaction itemsets and their utility values, and we also give an algorithm named TOPKHUP (TOP-K high utility pattern) that mines TOP-K high utility patterns without generating candidates. HUP-Tree ensures efficient retrieval of utility value of each pattern without additional scan of the dataset, so the performance of the algorithm is effectively improved. Seven classical real and synthetic datasets are used in the testing experiments and the results show that the proposed algorithm outperforms state-of-the-art algorithms significantly for both runtime performance and memory usage, and it is more stable along the change of the value K.

Key words: high utility pattern, frequent pattern, frequent itemsets, data mining, TOP-K

CLC Number: