ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2017, Vol. 54 ›› Issue (12): 2785-2796.doi: 10.7544/issn1000-1239.2017.20160612

• 信息安全 • 上一篇    下一篇

基于簇和阈值区间的高效关联规则隐藏算法

牛新征1,王崇屹1,叶志佳1,佘堃2   

  1. 1(电子科技大学计算机科学与工程学院 成都 611731); 2(电子科技大学信息与软件工程学院 成都 610054) (xinzhengniu@uestc.edu.cn)
  • 出版日期: 2017-12-01
  • 基金资助: 
    国家自然科学基金项目(61300192);国家科技支撑计划基金项目(2013BAH33F02);中央高校基本科研业务费专项资金项目(ZYGX2014J052);四川省科技支撑计划基金项目(2015GZ0096);成都市科学技术局软科学研究项目(2015-RK00-00046-ZF);四川省公安厅科研项目(2015SCYYCX06);四川省自贡市公安局项目

An Efficient Association Rule Hiding Algorithm Based on Cluster and Threshold Interval

Niu Xinzheng1, Wang Chongyi1, Ye Zhijia1, She Kun2   

  1. 1(School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731); 2(School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054)
  • Online: 2017-12-01

摘要: 关联规则隐藏是隐私保护数据挖掘(privacy-preserving data mining, PPDM)的一种重要方法.针对当前的关联规则隐藏算法直接操作事务数据、I/O开销较大的缺陷,提出一种基于FP-tree快速关联规则隐藏的算法FP-DSRRC.算法首先对FP-tree的结构进行改进,增设事务编号索引并建立双向遍历结构,进而利用改进的FP-tree对事务信息进行快速处理,避免了遍历原始数据集产生的大量I/O时间;然后通过建立和维护事务索引表实现对敏感项的快速查找,并基于分簇策略对关联规则处理,以簇为单位进行敏感规则消除,同时采用规则支持度和置信度阈值区间的思想,减少了关联规则隐藏处理对原始数据集的影响;最后通过实验测试证明:相较于传统关联规则隐藏算法,FP-DSRRC算法在保证生成的数据集质量的同时,减少了50%~70%的算法执行时间,并在大规模真实数据集上有较好的可用性.

关键词: 隐私保护, 关联规则隐藏, 频繁模式树, 敏感规则, 数据清洗

Abstract: Association rules hiding is a very important method of privacy-preserving data mining (PPDM). Because the current association rules hiding algorithm operates the transaction database directly, it leads to a lot of I/O overhead. To solve this problem, we put forward a quick association rules hiding algorithm based on FT-tree, called FP-DSRRC. Firstly, the algorithm improves the structure of FP-tree by adding an index to the transaction number and establishing the bidirectional traverse structure. Then FP-DSRRC uses the improved FP-tree to quickly handle transaction data set, avoiding a large number of I/O overhead caused by traversal the raw transaction data set. Furthermore, FP-DSRRC finds the sensitive items quickly by building and maintaining a transaction index table, and then handles the association rules based on the clustering strategy. We eliminate the sensitive rules by clusters, and reduce the negative influence caused by association rules hiding progress to the original data set by adopting the idea of rule support and confidence degree interval at the same time. Finally, the experiment shows that compared with traditional association rules hiding algorithm, the executive time of FP-DSRRC has been decreased by 50%~70% while guaranteeing the quality of general data, moreover, FP-DSRRC has better availability on a large-scale real data set.

Key words: privacy preservation, association rule hiding, FP-tree, sensitive rule, data sanitization

中图分类号: