Differentially Private High-Dimensional Binary Data Publication via Attribute Segmentation
Abstract
As the number of attributes in a data set grows, differentially private publishing methods for high-dimensional data incur higher time cost and introduce more noise. High-dimensional binary data are particularly easy to overwhelm with excessive noise. To address this, an efficient, low-noise publishing method, PrivSCBN (differentially private spectral clustering Bayesian network), is proposed for the private publication of high-dimensional binary data. First, using Jaccard distance, the method applies a spectral clustering algorithm that satisfies differential privacy to partition the attribute set, and then segments the original data set accordingly, thereby reducing its dimensionality. Second, drawing on dynamic programming combined with the exponential mechanism, the method uses a fast Bayesian network construction algorithm that satisfies differential privacy to build a Bayesian network for each subset produced by the segmentation. Finally, the method exploits the value characteristics of conditional probabilities on binary data to add noise to the conditional distributions extracted from each Bayesian network, and reduces the noise by bounding the maximum in-degree of the network. Experiments on three real high-dimensional binary data sets verify the efficiency and utility of PrivSCBN.
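Two of the building blocks named in the abstract can be illustrated with a minimal sketch. It is not the PrivSCBN implementation: `jaccard_distance` is the ordinary, non-private Jaccard distance between two binary attribute columns, and `exponential_mechanism` is the generic textbook form of that mechanism. The function names, the toy records, and the convention for all-zero columns are assumptions made for illustration only.

```python
import math
import random


def jaccard_distance(col_a, col_b):
    """Non-private Jaccard distance between two binary columns (lists of 0/1)."""
    both = sum(1 for a, b in zip(col_a, col_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(col_a, col_b) if a == 1 or b == 1)
    if either == 0:
        # Assumed convention: two all-zero columns are treated as identical.
        return 0.0
    return 1.0 - both / either


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=random):
    """Generic exponential mechanism: sample a candidate with probability
    proportional to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy binary data set (hypothetical): 4 records, 3 attributes.
records = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
columns = [list(col) for col in zip(*records)]  # one list per attribute

d01 = jaccard_distance(columns[0], columns[1])  # attributes 0 and 1
d02 = jaccard_distance(columns[0], columns[2])  # attributes 0 and 2

# Hypothetical candidate scores; a higher score is sampled more often.
choice = exponential_mechanism(["A", "B"], [0.2, 0.9], epsilon=1.0, sensitivity=1.0)
```

In this toy data, attributes 0 and 1 co-occur more often than attributes 0 and 2, so their Jaccard distance is smaller and a distance-based clustering would tend to group them together. How PrivSCBN makes the clustering itself differentially private, and which scoring function it passes to the exponential mechanism, are not specified in the abstract and are not modeled here.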