ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (12): 2789-2801.doi: 10.7544/issn1000-1239.2015.20140516

Previous Articles     Next Articles

Frequent Sequential Pattern Mining under Differential Privacy

Lu Guoqing1,2, Zhang Xiaojian3, Ding Liping1, Li Yanfeng1, Liao Xin1,4   

  1. 1(Institute of Software, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100190); 3(School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002); 4(College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082)
  • Online:2015-12-01

Abstract: Frequent sequential pattern mining is an exploratory problem in the field of data mining. However, directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals privacy. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection by adding noise. Due to the inherent sequentiality and high-dimensionality in sequences, it is challenging to apply differential privacy to frequent sequential pattern mining. To address those problems, this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns. To reduce the impact of dimensionality, Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset. After that, Diff-FSPM relies on a prefix-tree to compress all the frequent patterns, and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns. To efficiently allocate the privacy budget, Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process. Finally, Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts. We theoretically prove that Diff-FSPM satisfies ε-differential privacy, and the experimental results show that it outperforms the existing methods in terms of utility.

Key words: frequent sequential pattern, data mining, differential privacy (DP), privacy protection, prefix tree

CLC Number: