Differentially Private High-Dimensional Binary Data Publication via Attribute Segmentation

Abstract: As the number of attributes in a data set grows, differentially private methods for publishing high-dimensional data generally incur higher time costs and inject more noise. High-dimensional binary data in particular is easily overwhelmed by excessive noise. To address the private publication of high-dimensional binary data, this paper proposes PrivSCBN (differentially private spectral clustering Bayesian network), an efficient, low-noise publication method. First, PrivSCBN partitions the attribute set with a differentially private spectral clustering algorithm based on Jaccard distance, then segments the original data set according to that partition, thereby reducing its dimensionality. Second, combining the idea of dynamic programming with the exponential mechanism, it builds a Bayesian network for each segmented subset using a fast, differentially private construction algorithm. Finally, exploiting the values that conditional probabilities take on binary data, it adds noise to the conditional distributions extracted from each Bayesian network, and reduces the magnitude of that noise by bounding the network's maximum in-degree. Experiments on three real high-dimensional binary data sets verify the efficiency and utility of PrivSCBN.
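The abstract's first step measures attribute similarity with Jaccard distance before clustering the attributes. The minimal sketch below treats each binary attribute (column) as the set of records in which it equals 1 and computes the pairwise distance 1 - |A ∩ B| / |A ∪ B|. The function name `jaccard_distance_matrix` and the convention that two never-1 attributes have distance 0 are assumptions for illustration; the differentially private spectral clustering that PrivSCBN runs on these distances is not shown.

```python
from typing import List, Sequence


def jaccard_distance_matrix(data: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise Jaccard distance between the binary attribute columns of `data`.

    `data` is a list of records; each record is a list of 0/1 attribute values.
    """
    n_attrs = len(data[0])
    # Column j as the set of record indices in which attribute j equals 1.
    supports = [{i for i, row in enumerate(data) if row[j] == 1} for j in range(n_attrs)]
    dist = [[0.0] * n_attrs for _ in range(n_attrs)]
    for a in range(n_attrs):
        for b in range(a + 1, n_attrs):
            union = len(supports[a] | supports[b])
            # Assumption: two attributes that are never 1 are treated as identical.
            d = 0.0 if union == 0 else 1.0 - len(supports[a] & supports[b]) / union
            dist[a][b] = dist[b][a] = d
    return dist
```

For the records `[[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]`, attributes 0 and 1 are both 1 in two of the four records where either is 1, so their distance is 0.5.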
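Once the attributes are grouped, the abstract describes segmenting the original data set so that each subset holds only one group's attributes. The sketch below assumes the clusters arrive as lists of column indices (for example, the output of whichever private clustering is used); `segment_by_attributes` is a hypothetical name, not an interface from the paper.

```python
from typing import List, Sequence


def segment_by_attributes(
    data: Sequence[Sequence[int]], clusters: Sequence[Sequence[int]]
) -> List[List[List[int]]]:
    """Split `data` column-wise: sub-data set k keeps only the columns in clusters[k]."""
    n_attrs = len(data[0])
    # The clusters are expected to partition the attribute indices 0..n_attrs-1.
    if sorted(j for cluster in clusters for j in cluster) != list(range(n_attrs)):
        raise ValueError("clusters must partition the attribute indices")
    # Project every record onto each cluster, preserving the cluster's column order.
    return [[[row[j] for j in cluster] for row in data] for cluster in clusters]
```

Each resulting subset then gets its own Bayesian network, so the structure search only ranges over the attributes of one cluster rather than all of them, which is where the dimensionality reduction comes from.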
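The abstract states that each subnetwork is built by a fast algorithm combining dynamic programming with the exponential mechanism. That algorithm is not spelled out here, so the sketch below substitutes the simpler greedy construction used by PrivBayes (Zhang et al.) to show how the exponential mechanism can pick parent sets under a maximum in-degree. It scores candidates by empirical mutual information; `sensitivity` is a caller-supplied bound on the score's sensitivity (an assumption that must be derived for the real score), and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(data, child, parents):
    """Empirical mutual information I(X_child; X_parents) in bits."""
    n = len(data)
    joint = Counter((row[child], tuple(row[j] for j in parents)) for row in data)
    px = Counter(row[child] for row in data)
    pp = Counter(tuple(row[j] for j in parents) for row in data)
    # Sum over observed cells of p(x, pa) * log2(p(x, pa) / (p(x) * p(pa))).
    return sum((c / n) * math.log2(c * n / (px[x] * pp[pa])) for (x, pa), c in joint.items())


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample a candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def build_network(data, max_indegree, epsilon, sensitivity, seed=None):
    """Greedy PrivBayes-style construction; returns a list of (child, parents) pairs."""
    rng = random.Random(seed)
    d = len(data[0])
    first = rng.randrange(d)  # the first attribute is chosen uniformly at random
    network, added = [(first, ())], [first]
    remaining = [a for a in range(d) if a != first]
    step_eps = epsilon / max(1, d - 1)  # split the budget evenly over the d - 1 selections
    while remaining:
        size = min(max_indegree, len(added))
        # Candidates: every remaining attribute paired with every parent set of `size`.
        candidates = [(c, ps) for c in remaining for ps in combinations(added, size)]
        scores = [mutual_information(data, c, ps) for c, ps in candidates]
        child, parents = exponential_mechanism(candidates, scores, step_eps, sensitivity, rng)
        network.append((child, parents))
        added.append(child)
        remaining.remove(child)
    return network
```

Because parents are always drawn from attributes already added, the result is acyclic, and `max_indegree` caps every parent set, which is the same knob the abstract uses to limit noise in the next step.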
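The final step perturbs the conditional distributions extracted from each network. Assuming binary child and parent attributes, the sketch below adds Laplace noise to the 2^(k+1) joint counts of (parents, child) for a node with k parents, clips negative counts to zero, and stores only P(X = 1 | parents), since on binary data P(X = 0 | parents) = 1 - P(X = 1 | parents). This is one plausible reading of the abstract's "value characteristic" remark, not PrivSCBN's exact scheme; the noise `scale` (for example sensitivity divided by the per-table budget) and the function names are assumptions.

```python
import random
from itertools import product


def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. Exp(1) draws is Laplace(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_conditional(data, child, parents, scale, rng):
    """Map each parent configuration to a noisy estimate of P(X_child = 1 | parents)."""
    configs = list(product((0, 1), repeat=len(parents)))
    counts = {(cfg, x): 0 for cfg in configs for x in (0, 1)}  # 2^(k+1) joint cells
    for row in data:
        counts[(tuple(row[j] for j in parents), row[child])] += 1
    table = {}
    for cfg in configs:
        # Perturb both cells of this configuration and clip negative counts to 0.
        c0 = max(0.0, counts[(cfg, 0)] + laplace(scale, rng))
        c1 = max(0.0, counts[(cfg, 1)] + laplace(scale, rng))
        total = c0 + c1
        # One number per configuration suffices; fall back to 0.5 if both cells clip to 0.
        table[cfg] = 0.5 if total == 0 else c1 / total
    return table
```

With maximum in-degree k, each attribute contributes at most 2^(k+1) noisy counts, so a subnetwork over m attributes perturbs at most m · 2^(k+1) cells. Lowering k, and keeping each segmented subset small, therefore bounds how many noisy values are published.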

       
