ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (12): 2794-2809.doi: 10.7544/issn1000-1239.2018.20170756

Previous Articles    

Private High-Dimensional Data Publication with Junction Tree

Zhang Xiaojian1, Chen Li2, Jin Kaizhong1, Meng Xiaofeng3   

  1. 1(College of Computer & Information Engineering, He’nan University of Economics and Law, Zhengzhou 450002);2(Institute of Network Information Security, He’nan University of Economics and Law, Zhengzhou 450046);3(School of Information, Renmin University of China, Beijing 100872)
  • Online:2018-12-01

Abstract: The problem of differentially private data publishing has attracted considerable research attention in recent years. The current existing solutions, however, cannot effectively handle the release of high-dimensional data. That is because these methods suffer from curse of dimensionality and various domain sizes, which will lead to the lower utility of publication. To address the problems, this paper presents PrivHD (differentially private high dimensional data release) with junction tree, a differentially private method for publishing high-dimensional data. PrivHD firstly generates a Markov network with exponential mechanism, which employs the high-pass filter technique to reduce the candidate space in the sampling process. After that, based on the network, PrivHD obtains a complete cluster graph in terms of full triangulation and node elimination, and then relies on the cluster graph and maximum spanning tree method to construct a differentially private junction tree. Finally, PrivHD uses the post-processing technique to boost the noisy counts of marginal tables in each cluster in junction tree, and based on the boosted result, PrivHD produces the high-dimensional synthetic dataset. PrivHD is compared with the existing approaches such as PrivBayes, JTree on the different real datasets. The experimental results show that PrivHD is better than its competitors on k-way query and SVM classification.

Key words: high-dimensional data, differential privacy, Markov network, junction tree, marginal distribution

CLC Number: