ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (12): 2785-2796. doi: 10.7544/issn1000-1239.2017.20160612

An Efficient Association Rule Hiding Algorithm Based on Cluster and Threshold Interval

Niu Xinzheng (1), Wang Chongyi (1), Ye Zhijia (1), She Kun (2)

(1) School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
(2) School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054

Online: 2017-12-01

Abstract: Association rule hiding is an important technique in privacy-preserving data mining (PPDM). Existing association rule hiding algorithms operate directly on the transaction database, which incurs heavy I/O overhead. To address this problem, we propose a fast association rule hiding algorithm based on the FP-tree, called FP-DSRRC. First, the algorithm extends the FP-tree structure by adding an index on transaction numbers and by supporting bidirectional traversal. FP-DSRRC then uses the improved FP-tree to process the transaction data set quickly, avoiding the heavy I/O overhead of traversing the raw transaction data. Furthermore, FP-DSRRC locates sensitive items quickly by building and maintaining a transaction index table, and then processes the association rules according to a clustering strategy. Sensitive rules are eliminated cluster by cluster, and the side effects of the hiding process on the original data set are reduced by adopting support and confidence intervals for the rules. Experiments show that, compared with traditional association rule hiding algorithms, FP-DSRRC reduces execution time by 50% to 70% while maintaining overall data quality, and that it is also more usable on a large-scale real data set.
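
The idea of a transaction-indexed, bidirectionally traversable FP-tree can be illustrated with a minimal Python sketch. This is not the FP-DSRRC implementation: the node layout, the `tid_index` table, and all function names below are assumptions made for illustration, and the clustering and sanitization steps are omitted. The sketch only shows how the support and confidence of a rule can be read from the tree and its index after a single pass over the data.

```python
from collections import defaultdict


class Node:
    """FP-tree node that also records which transactions end at it (assumed layout)."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent   # upward link, enables bottom-up traversal
        self.children = {}     # downward links: item -> Node
        self.count = 0         # number of transactions passing through this node
        self.tids = set()      # IDs of transactions whose path ends here


def build_indexed_tree(transactions):
    """Build the tree plus a transaction index table: tid -> last node of its path."""
    freq = defaultdict(int)
    for items in transactions.values():
        for item in set(items):
            freq[item] += 1

    root = Node(None, None)
    tid_index = {}
    for tid, items in transactions.items():
        # Usual FP-tree ordering: descending frequency, ties broken by item name.
        ordered = sorted(set(items), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
        node.tids.add(tid)
        tid_index[tid] = node
    return root, tid_index


def items_of(tid, tid_index):
    """Recover a transaction's items by following parent links up from its last node."""
    items = set()
    node = tid_index[tid]
    while node is not None and node.item is not None:
        items.add(node.item)
        node = node.parent
    return items


def supporting_tids(itemset, tid_index):
    """Transactions containing every item of `itemset`, found without rescanning raw data."""
    wanted = set(itemset)
    return {tid for tid in tid_index if wanted <= items_of(tid, tid_index)}


def support_and_confidence(lhs, rhs, tid_index):
    """Support and confidence of the rule lhs -> rhs."""
    lhs_tids = supporting_tids(lhs, tid_index)
    both = supporting_tids(set(lhs) | set(rhs), tid_index)
    support = len(both) / len(tid_index)
    confidence = len(both) / len(lhs_tids) if lhs_tids else 0.0
    return support, confidence


# Toy data set (hypothetical): transaction ID -> items.
transactions = {
    1: ["a", "b", "c"],
    2: ["a", "b"],
    3: ["a", "c"],
    4: ["b", "c"],
    5: ["a", "b", "c"],
}
root, tid_index = build_indexed_tree(transactions)

# Rule {a, b} -> {c}: transactions 1 and 5 contain a, b and c; 1, 2 and 5 contain a and b.
support, confidence = support_and_confidence({"a", "b"}, {"c"}, tid_index)
```

A hiding step would then modify selected transactions until the support or confidence of each sensitive rule drops below the chosen threshold; how those transactions are selected and clustered is specific to the paper and is not reproduced here.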

Key words: privacy preservation, association rule hiding, FP-tree, sensitive rule, data sanitization
