Abstract:
Mining high utility itemsets from a data stream is challenging because the incoming data must be processed in real time under constraints on both time and memory. Data stream mining also tends to generate a large number of redundant itemsets. Mining closed itemsets reduces this redundancy while keeping a lossless representation of the complete set of high utility itemsets, and the closed collection can be several orders of magnitude smaller than the complete one. To address this problem, an algorithm named CHUI_DS (sliding-window-model-based closed high utility itemsets mining on data stream) is proposed for mining closed high utility itemsets over a data stream. CHUI_DS uses a new utility-list structure that speeds up batch insertion and deletion. In addition, pruning strategies are applied to improve the closed itemset mining process and to eliminate potential low-utility candidates. Extensive experiments on real and synthetic datasets demonstrate the efficiency and feasibility of the algorithm. It is faster than previously proposed algorithms, which mainly run in batch mode; it works with sliding windows of different sizes; and it scales well with the number of transactions.
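
To make the terms above concrete, the following is a minimal brute-force sketch of what "closed high utility itemsets in a sliding window" means. It is not CHUI_DS: it uses no utility-list and no pruning, and it enumerates every subset of every transaction, so it is only practical for tiny inputs. The function names, the toy stream, and the representation of a transaction as a mapping from item to its utility in that transaction are assumptions made for illustration, as is sliding the window one transaction at a time rather than in batches.

```python
from collections import deque
from itertools import combinations


def closed_high_utility_itemsets(window, min_util):
    """Brute force: map each closed itemset with utility >= min_util to its utility.

    `window` is a list of transactions; each transaction is a dict
    {item: utility of that item in this transaction} (illustrative format).
    """
    # For every itemset occurring in the window, record which transactions contain it.
    tidsets = {}
    for tid, tx in enumerate(window):
        items = sorted(tx)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                tidsets.setdefault(frozenset(combo), []).append(tid)

    result = {}
    for itemset, tids in tidsets.items():
        # Closure: the items shared by every transaction that contains the itemset.
        closure = frozenset.intersection(*(frozenset(window[t]) for t in tids))
        if closure != itemset:
            continue  # a proper superset has the same support, so not closed
        # Utility: sum of the itemset's item utilities over its supporting transactions.
        utility = sum(window[t][i] for t in tids for i in itemset)
        if utility >= min_util:
            result[itemset] = utility
    return result


def mine_stream(stream, window_size, min_util):
    """Yield the closed high utility itemsets of each full sliding window."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield closed_high_utility_itemsets(list(window), min_util)


# Hypothetical toy stream of four transactions.
stream = [
    {"a": 5, "b": 2},
    {"a": 5, "b": 2, "c": 1},
    {"b": 4, "c": 3},
    {"a": 10, "c": 6},
]
results = list(mine_stream(stream, window_size=3, min_util=10))
```

With a window of three transactions and a minimum utility of 10, the first window yields {a, b} with utility 14 and {b, c} with utility 10; after the window slides, it yields {c} with utility 10, {a, c} with utility 22, and {b, c} with utility 10. Recomputing each window from scratch, as done here, is the cost that incremental batch insertion and deletion are meant to avoid.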