ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (11): 2500-2514. doi: 10.7544/issn1000-1239.2021.20200554


Closed High Utility Itemsets Mining over Data Stream Based on Sliding Window Model

Cheng Haodong, Han Meng, Zhang Ni, Li Xiaojuan, Wang Le   

  1. (College of Computer Science and Engineering, North Minzu University, Yinchuan 750021)
  • Online: 2021-11-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (62062004), the Natural Science Foundation of Ningxia Hui Autonomous Region of China (2020AAC03216), and the Graduate Innovation Project of North Minzu University (YCX20077).

Abstract: Mining high utility itemsets from a data stream is challenging because incoming data must be processed in real time under tight time and memory constraints. Data stream mining also tends to generate a large number of redundant itemsets. Mining closed itemsets reduces this redundancy while preserving a lossless compression of the complete set of high utility itemsets, and the closed set can be several orders of magnitude smaller than the complete set. To this end, an algorithm for mining closed high utility itemsets over data streams (sliding-window-model-based closed high utility itemsets mining on data stream, CHUI_DS) is proposed. CHUI_DS uses a new utility-list structure that speeds up batch insertion and deletion. In addition, effective pruning strategies are applied to improve the closed itemset mining process and to eliminate potential low-utility candidates. Extensive experiments on real and synthetic datasets demonstrate the efficiency and feasibility of the algorithm. It runs faster than previously proposed algorithms, which mainly operate in batch mode. Moreover, it works with sliding windows of different sizes and scales well with the number of transactions.
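
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of closed high utility itemsets over a batch-based sliding window. It is not the CHUI_DS algorithm: it uses no utility-list structure and no pruning, and it recomputes everything for each window. The item names, the PROFIT table, the sample batches, and the function names are all hypothetical and chosen only for illustration. An itemset is treated as closed when no proper superset occurs in the same number of transactions, and as high utility when its total utility in the window reaches min_util.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits (external utilities); not taken from the paper.
PROFIT = {"a": 5, "b": 2, "c": 1}

def utility(itemset, tx):
    """Utility of itemset in one transaction (0 if it is not fully contained)."""
    if not all(i in tx for i in itemset):
        return 0
    return sum(tx[i] * PROFIT[i] for i in itemset)

def closed_high_utility(window, min_util):
    """Enumerate every itemset in the window; keep the closed, high utility ones."""
    txs = [tx for batch in window for tx in batch]
    items = sorted({i for tx in txs for i in tx})
    stats = {}  # itemset -> (support, total utility)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            support = sum(1 for tx in txs if all(i in tx for i in combo))
            if support:
                total = sum(utility(combo, tx) for tx in txs)
                stats[frozenset(combo)] = (support, total)
    result = {}
    for x, (sup, util) in stats.items():
        # Closed: no proper superset has the same support.
        closed = not any(x < y and sup == sup_y
                         for y, (sup_y, _) in stats.items())
        if closed and util >= min_util:
            result[x] = util
    return result

def mine_stream(batches, window_size, min_util):
    """Slide a window of window_size batches over the stream."""
    window = deque(maxlen=window_size)  # the oldest batch is dropped automatically
    for batch in batches:
        window.append(batch)
        if len(window) == window_size:
            yield closed_high_utility(window, min_util)

# Hypothetical stream: each transaction maps item -> purchased quantity.
BATCHES = [
    [{"a": 1, "b": 2}, {"a": 2, "b": 1, "c": 3}],
    [{"b": 4, "c": 1}],
    [{"a": 1, "c": 2}],
]

for closed_sets in mine_stream(BATCHES, window_size=2, min_util=9):
    print({tuple(sorted(s)): u for s, u in closed_sets.items()})
```

With these sample values, the first window yields {b}, {a, b}, {b, c}, and {a, b, c}. The itemset {a} has utility 15 but is left out because {a, b} occurs in exactly the same transactions, which is the kind of redundancy that closed itemsets remove. After the oldest batch is dropped, the second window yields only {b, c}.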

Key words: pattern mining, data stream mining, closed high utility itemsets, sliding window, utility list
