
Closed High Utility Itemsets Mining over Data Stream Based on Sliding Window Model

Cheng Haodong, Han Meng, Zhang Ni, Li Xiaojuan, Wang Le

Citation: Cheng Haodong, Han Meng, Zhang Ni, Li Xiaojuan, Wang Le. Closed High Utility Itemsets Mining over Data Stream Based on Sliding Window Model[J]. Journal of Computer Research and Development, 2021, 58(11): 2500-2514. DOI: 10.7544/issn1000-1239.2021.20200554. CSTR: 32373.14.issn1000-1239.2021.20200554

  • CLC number: TP311


Funds: This work was supported by the National Natural Science Foundation of China (62062004), the Natural Science Foundation of Ningxia Hui Autonomous Region of China (2020AAC03216), and the Graduate Innovation Project of North Minzu University (YCX20077).
  • Abstract: Mining high utility itemsets from a data stream is challenging because incoming data must be processed in real time under time and memory constraints. Data stream mining usually produces a large number of redundant itemsets. To reduce their number while still providing a lossless compression of the complete set of high utility itemsets, closed itemsets can be mined instead; this collection can be several orders of magnitude smaller than the complete set of high utility itemsets. To this end, an algorithm for closed high utility itemsets mining over data stream based on sliding window model (CHUI_DS) is proposed. CHUI_DS uses a new utility-list structure that is very effective at speeding up batch insertion and deletion. In addition, pruning strategies are applied to improve the closed itemset mining process and eliminate potential low-utility candidates. Extensive experiments on real and synthetic datasets show the efficiency and feasibility of the algorithm. In terms of speed, it outperforms previously proposed algorithms that mainly run in batch mode. Moreover, it is suitable for sliding windows of different sizes and scales well with the number of transactions.
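The task described in the abstract can be illustrated with a small brute-force sketch. This is not CHUI_DS: it uses no utility-list structure and no pruning, and it recomputes everything for each window. It assumes the definitions commonly used in the literature (the utility of an itemset in a transaction is the sum of quantity times unit profit over its items; an itemset is closed if no proper superset has the same support). The names `closed_high_utility_itemsets`, `mine_stream`, `profit`, `batches`, `min_util`, and `window_size` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def closed_high_utility_itemsets(window, profit, min_util):
    """Brute-force illustration: enumerate every itemset that occurs in the window."""
    stats = {}  # itemset -> [support, utility]
    for batch in window:
        for tx in batch:  # tx maps item -> purchased quantity
            items = sorted(tx)
            for k in range(1, len(items) + 1):
                for combo in combinations(items, k):
                    entry = stats.setdefault(frozenset(combo), [0, 0])
                    entry[0] += 1
                    entry[1] += sum(tx[i] * profit[i] for i in combo)

    result = {}
    for itemset, (support, utility) in stats.items():
        if utility < min_util:
            continue  # not a high utility itemset
        # Assumed definition: closed means no proper superset has the same support.
        has_equal_superset = any(
            itemset < other and s == support for other, (s, _) in stats.items()
        )
        if not has_equal_superset:
            result[itemset] = utility
    return result


def mine_stream(batches, profit, min_util, window_size):
    """Slide a window of `window_size` batches over the stream."""
    window = deque(maxlen=window_size)  # the oldest batch is dropped automatically
    for batch in batches:
        window.append(batch)
        if len(window) == window_size:
            yield closed_high_utility_itemsets(window, profit, min_util)


# Toy data (hypothetical): unit profits and three batches of transactions.
profit = {"a": 5, "b": 2, "c": 1}
batches = [
    [{"a": 1, "b": 2}, {"a": 2, "c": 3}],
    [{"a": 1, "b": 1, "c": 1}],
    [{"b": 4, "c": 2}],
]
results = list(mine_stream(batches, profit, min_util=10, window_size=2))
```

With this toy data the first window yields {a}, {a, b}, and {a, c}. In the second window {b} reaches the utility threshold but is not closed, because {b, c} has the same support, so only {b, c} is reported. The enumeration is exponential in transaction length, which is the cost that dedicated structures and pruning strategies aim to avoid.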
Publication history
  • Published: 2021-10-31
