基于簇和阈值区间的高效关联规则隐藏算法

牛新征; 王崇屹; 叶志佳; 佘堃

doi:10.7544/issn1000-1239.2017.20160612

基于簇和阈值区间的高效关联规则隐藏算法

An Efficient Association Rule Hiding Algorithm Based on Cluster and Threshold Interval

摘要

摘要: 关联规则隐藏是隐私保护数据挖掘(privacy-preserving data mining, PPDM)的一种重要方法.针对当前的关联规则隐藏算法直接操作事务数据、I/O开销较大的缺陷，提出一种基于FP-tree快速关联规则隐藏的算法FP-DSRRC.算法首先对FP-tree的结构进行改进，增设事务编号索引并建立双向遍历结构，进而利用改进的FP-tree对事务信息进行快速处理，避免了遍历原始数据集产生的大量I/O时间；然后通过建立和维护事务索引表实现对敏感项的快速查找，并基于分簇策略对关联规则处理，以簇为单位进行敏感规则消除，同时采用规则支持度和置信度阈值区间的思想，减少了关联规则隐藏处理对原始数据集的影响；最后通过实验测试证明:相较于传统关联规则隐藏算法，FP-DSRRC算法在保证生成的数据集质量的同时，减少了50%~70%的算法执行时间，并在大规模真实数据集上有较好的可用性.

Abstract: Association rules hiding is a very important method of privacy-preserving data mining (PPDM). Because the current association rules hiding algorithm operates the transaction database directly, it leads to a lot of I/O overhead. To solve this problem, we put forward a quick association rules hiding algorithm based on FT-tree, called FP-DSRRC. Firstly, the algorithm improves the structure of FP-tree by adding an index to the transaction number and establishing the bidirectional traverse structure. Then FP-DSRRC uses the improved FP-tree to quickly handle transaction data set, avoiding a large number of I/O overhead caused by traversal the raw transaction data set. Furthermore, FP-DSRRC finds the sensitive items quickly by building and maintaining a transaction index table, and then handles the association rules based on the clustering strategy. We eliminate the sensitive rules by clusters, and reduce the negative influence caused by association rules hiding progress to the original data set by adopting the idea of rule support and confidence degree interval at the same time. Finally, the experiment shows that compared with traditional association rules hiding algorithm, the executive time of FP-DSRRC has been decreased by 50%~70% while guaranteeing the quality of general data, moreover, FP-DSRRC has better availability on a large-scale real data set.

HTML全文

参考文献(0)

施引文献

资源附件(0)