ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2022, Vol. 59, Issue (1): 182-196. doi: 10.7544/issn1000-1239.20200701


Differentially Private High-Dimensional Binary Data Publication via Attribute Segmentation

Hong Jinxin (1), Wu Yingjie (1), Cai Jianping (2), Sun Lan (1)

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108
  2. College of Information and Smart Electromechanical Engineering, Xiamen Huaxia University, Xiamen, Fujian 361024
  • Online: 2022-01-01
  • Supported by: 
    This work was supported by the Natural Science Foundation of Fujian Province of China (2017J01754, 2018J01797).

Abstract: As the number of attributes in a data set grows, both the running time and the injected noise of differentially private methods for publishing high-dimensional data grow as well. High-dimensional binary data are especially vulnerable, because the true signal is easily overwhelmed by excessive noise. To address the private publication of high-dimensional binary data, an efficient and low-noise method, PrivSCBN (differentially private spectral clustering Bayesian network), is proposed. First, using the Jaccard distance, the method applies a differentially private spectral clustering algorithm to partition the attribute set, and then segments the original data set accordingly to reduce its dimensionality. Second, drawing on dynamic programming combined with the exponential mechanism, the method applies a fast, differentially private Bayesian network construction algorithm to build a Bayesian network for each segmented subset. Finally, the method exploits the value characteristics of conditional probabilities on binary data when adding noise to the conditional distributions extracted from each Bayesian network, and reduces the noise by controlling the maximum in-degree of the network. Experiments on three real high-dimensional binary data sets verify the efficiency and utility of PrivSCBN.
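The first step of the abstract relies on the Jaccard distance between binary attributes. The sketch below illustrates that distance only. It is not the authors' implementation and it is not differentially private: the private spectral clustering that would consume this matrix is omitted. The function names `jaccard_distance` and `attribute_distance_matrix`, the toy `records`, and the convention that two all-zero columns have distance 0 are assumptions made for illustration.

```python
from typing import List, Sequence


def jaccard_distance(col_a: Sequence[int], col_b: Sequence[int]) -> float:
    """Jaccard distance between two binary attribute columns.

    Each column is viewed as the set of record indices where the attribute is 1.
    """
    both = sum(1 for a, b in zip(col_a, col_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(col_a, col_b) if a == 1 or b == 1)
    if either == 0:
        # Assumed convention: two all-zero columns are treated as identical.
        return 0.0
    return 1.0 - both / either


def attribute_distance_matrix(records: List[List[int]]) -> List[List[float]]:
    """Pairwise Jaccard distances between the attributes (columns) of a binary table."""
    n_attrs = len(records[0])
    # Transpose the table so that each inner list is one attribute column.
    columns = [[row[j] for row in records] for j in range(n_attrs)]
    return [
        [jaccard_distance(columns[i], columns[j]) for j in range(n_attrs)]
        for i in range(n_attrs)
    ]


# Hypothetical toy data: 4 records (rows) over 4 binary attributes (columns).
records = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
dist = attribute_distance_matrix(records)
```

In this toy table, attributes 0 and 1 are both set in two records and at least one of them is set in three, so their distance is 1 - 2/3. Attributes 1 and 2 are never set together, so their distance is 1. A clustering step would tend to place attributes with a small distance in the same segment.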

Key words: differential privacy, high-dimensional binary data publication, Bayesian network, attribute division, dynamic programming, conditional distribution
