Abstract:
Differentially private data publishing has attracted considerable research attention in recent years. Existing solutions, however, cannot effectively handle the release of high-dimensional data: they suffer from the curse of dimensionality and from varying attribute domain sizes, both of which lower the utility of the published data. To address these problems, this paper presents PrivHD (differentially private high-dimensional data release), a differentially private method for publishing high-dimensional data based on a junction tree. PrivHD first generates a Markov network using the exponential mechanism, applying a high-pass filter technique to reduce the candidate space during sampling. Based on this network, PrivHD then derives a complete cluster graph through full triangulation and node elimination, and uses the cluster graph together with a maximum spanning tree method to construct a differentially private junction tree. Finally, PrivHD applies a post-processing technique to boost the noisy counts of the marginal tables in each cluster of the junction tree and, from the boosted result, produces the high-dimensional synthetic dataset. PrivHD is compared with existing approaches such as PrivBayes and JTree on several real datasets. The experimental results show that PrivHD outperforms its competitors on k-way queries and SVM classification.
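For readers unfamiliar with the exponential mechanism mentioned above, the following is a minimal sketch of the generic mechanism only, not of PrivHD's actual network construction. The function name `exponential_mechanism`, the candidate attribute pairs, and the example scores are hypothetical, and the high-pass filter step that prunes the candidate space is not shown.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng=random):
    """Sample one candidate with probability proportional to
    exp(epsilon * score(c) / (2 * sensitivity)).

    `score` and `sensitivity` are supplied by the caller; how PrivHD scores
    candidate edges is not specified here (assumption for illustration).
    """
    scores = [score(c) for c in candidates]
    # Subtracting the maximum score leaves the sampling distribution
    # unchanged but avoids overflow in exp().
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity))
        for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical usage: pick one attribute pair as an edge of the network,
# using made-up dependence scores.
example_scores = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.5}
edge = exponential_mechanism(
    list(example_scores),
    example_scores.get,
    epsilon=1.0,
    sensitivity=1.0,
    rng=random.Random(0),
)
```

A smaller `epsilon` flattens the sampling distribution toward uniform, while a larger one concentrates it on the highest-scoring candidates.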