高级检索

    基于邻域属性熵的隐私保护数据干扰方法

    A Privacy-Preserving Data Perturbation Algorithm Based on Neighborhood Entropy

    • 摘要: 隐私保护微数据发布是数据隐私保护研究的一个热点,数据干扰是隐私保护微数据发布采用的一种有效解决方法.针对隐私保护聚类问题,提出一种隐私保护数据干扰方法NETPA,NETPA干扰方法通过对数据点及邻域点集的分析,借助信息论中熵的理论,提出邻域属性熵和邻域主属性等概念,对原始数据中数据点的邻域主属性值用其k邻域点集内数据点在该属性的均值进行干扰替换,在较好地维持原始数据k邻域关系的情况下达到保护原始数据隐私不泄露的目的.理论分析表明,NETPA干扰方法具有良好地避免隐私泄露的效果,同时可以较好地维持原始数据的聚类模式.实验采用DBSCAN和k-LDCHD聚类算法对干扰前后的数据进行聚类分析比对.实验结果表明,干扰前后数据聚类结果具有较高的相似度,算法是有效可行的.

       

      Abstract: Privacy preserving micro-data publishing is a hot issue in data privacy preserving research. Data perturbation is one of those methods to solve this problem, which does some revision to primitive data values at the cost of little mining accuracy loss. The key is the balance between privacy preserving and mining accuracy, which contradict each other to some extent. Concerning the problem of privacy preserving clustering, a novel privacy preserving data perturbation algorithm NETPA is proposed. The potential relation between data object and it’s neighborhood is analyzed. Referring the idea of entropy in information theory, the definitions of neighborhood entropy of attribute and neighboring main attribute are proposed. The primitive data set can be perturbed by changing each data object’s values of neighboring main attributes with corresponding attribute average value of those data objects in its k nearest neighborhood. Theoretical analysis testifies that this perturbation strategy can maintain the stability of k nearest neighboring relations in primitive data well, meanwhile it can avoid privacy leakage effectively. Experimental analysis is designed by adopting clustering algorithm DBCSAN and k-LDCHD on primitive datasets and perturbed ones by NETPA. Experimental results on both realistic and synthetic datasets prove that NETPA can preserve the privacy of primitive data effectively and maintain the clustering model of primitive data well.

       

    /

    返回文章
    返回