ISSN 1000-1239 CN 11-1777/TP

• Software Technology •

Private High-Dimensional Data Publication Method Based on Junction Tree

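1(College of Computer & Information Engineering, He’nan University of Economics and Law, Zhengzhou 450002); 2(Institute of Network Information Security, He’nan University of Economics and Law, Zhengzhou 450046); 3(School of Information, Renmin University of China, Beijing 100872) (xjzhang82@ruc.edu.cn)
• Publication date: 2018-12-01
• Funding:
National Natural Science Foundation of China (61502146, 91646203, 91746115, 61772131); Natural Science Foundation of Henan Province (162300410006); Henan Province Science and Technology Research Project (172102310713); Key Scientific Research Project of Higher Education Institutions, Henan Provincial Department of Education (16A520002); Young Top-Notch Talent Support Program of He’nan University of Economics and Law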

Private High-Dimensional Data Publication with Junction Tree

Zhang Xiaojian1, Chen Li2, Jin Kaizhong1, Meng Xiaofeng3

1(College of Computer & Information Engineering, He’nan University of Economics and Law, Zhengzhou 450002); 2(Institute of Network Information Security, He’nan University of Economics and Law, Zhengzhou 450046); 3(School of Information, Renmin University of China, Beijing 100872)
• Online: 2018-12-01

Abstract: Differentially private data publishing has attracted considerable research attention in recent years. Existing solutions, however, cannot effectively handle the release of high-dimensional data: they suffer from the curse of dimensionality and from widely varying attribute domain sizes, both of which lower the utility of the published data. To address these problems, this paper presents PrivHD (differentially private high-dimensional data release), a junction-tree-based differentially private method for publishing high-dimensional data. PrivHD first generates a Markov network using the exponential mechanism, employing a high-pass filter technique to reduce the candidate space during sampling. Based on this network, PrivHD then obtains a complete cluster graph through full triangulation and node elimination, and applies the maximum spanning tree method to the cluster graph to construct a differentially private junction tree. Finally, PrivHD uses a post-processing technique to boost the noisy counts of the marginal tables of each cluster in the junction tree and, from the boosted result, produces a high-dimensional synthetic dataset. PrivHD is compared with existing approaches such as PrivBayes and JTree on several real datasets. The experimental results show that PrivHD outperforms its competitors on k-way queries and SVM classification.
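
Two of the building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `threshold` parameter (a hypothetical stand-in for the high-pass filter, whose details the abstract does not give), and the scoring function are all assumptions made for illustration. The first function samples a candidate with the standard exponential mechanism; the second builds a junction tree as a maximum-weight spanning tree over clusters, using separator size as the edge weight. Triangulation, noise injection, and post-processing are not covered.

```python
import math
import random
from itertools import combinations


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng, threshold=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    # Hypothetical "high-pass" step (an assumption, not the paper's filter):
    # drop candidates scoring below `threshold` before sampling.
    # A data-dependent filter needs its own privacy analysis, omitted here.
    pool = [c for c in candidates if threshold is None or score(c) >= threshold]
    if not pool:
        pool = list(candidates)  # fall back to the full candidate set
    scores = [score(c) for c in pool]
    top = max(scores)  # shift scores so the largest exponent is 0 (numerical stability)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(pool, weights=weights, k=1)[0]


def junction_tree(clusters):
    """Return tree edges (i, j, separator) of a maximum-weight spanning tree
    over the clusters, where an edge's weight is the size of its separator."""
    edges = []
    for i, j in combinations(range(len(clusters)), 2):
        separator = clusters[i] & clusters[j]
        if separator:
            edges.append((len(separator), i, j))
    edges.sort(reverse=True)  # Kruskal: consider the heaviest edges first

    parent = list(range(len(clusters)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j, clusters[i] & clusters[j]))
    return tree


if __name__ == "__main__":
    # Toy clusters, as might result from triangulating a small Markov network.
    clusters = [frozenset("ABC"), frozenset("BCD"), frozenset("DE")]
    for i, j, separator in junction_tree(clusters):
        print(sorted(clusters[i]), "--", sorted(separator), "--", sorted(clusters[j]))
```

In a private pipeline, the score passed to `exponential_mechanism` would be a bounded-sensitivity measure of dependence between attribute pairs, and the marginal table of each cluster would be perturbed before synthetic data is sampled; both choices are left open here.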