Abstract:
Frequent pattern mining is a popular technique for analyzing transaction datasets. However, because these datasets contain sensitive information (e.g., user behavior records, electronic health records, etc), directly releasing discovered frequent patterns with true support counts will carry significant risk to privacy of individuals. In this paper, we propose an efficient algorithm, called DP-topkP, based on differential privacy model, to accurately find top-k frequent patterns. To avoid the individuals' privacy leakage, in this algorithm, exponential mechanism is used to sample top-k frequent patterns in a candidate set, and Laplace mechanism is utilized to generate the noisy data for perturbing the true support counts of the sampled patterns. However, the noisy support counts returned may be inconsistent with query semantic constraints (e.g., descending order, integer, etc), which will make the utility of the discovered top-k patterns poor. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct the post-processing step. We theoretically prove that the proposed method is (λ,δ)-useful and differentially private. The experimental results demonstrate that this method can maintain better accuracy, utility and scalability.