    Xiong Ping, Zhu Tianqing. A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering[J]. Journal of Computer Research and Development, 2012, 49(7): 1545-1552.

    A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering

    • Abstract: Data anonymization is an important means of protecting private information when publishing data. This paper introduces the basic concepts and application models of data anonymization and discusses the requirements that an anonymized dataset should meet. To resist background knowledge attacks, a data anonymization approach based on impurity gain and hierarchical clustering is proposed. The approach uses impurity to measure the randomness of the sensitive attribute, and it controls the merging of clusters with the constraints that the information loss caused by generalization be minimized and the impurity gain be maximized. As a result, the anonymized dataset satisfies both the k-anonymity model and the l-diversity model, while the information loss from generalization is minimized and the sensitive attribute values within each cluster are distributed as evenly as possible. In the experimental section, a method for evaluating anonymization results is presented: it compares the anonymized dataset with the original data and assesses anonymization quality in terms of average information loss and average impurity. The experimental results validate the effectiveness of the approach.
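The merging process described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the formulas, so the sketch assumes Gini impurity as the impurity measure, the normalized value range of numeric quasi-identifiers as the information loss, and a single combined score (information loss minus a weighted impurity gain) for choosing which pair of clusters to merge. The names `impurity`, `info_loss`, `satisfies`, `anonymize`, `evaluate`, and the `weight` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def impurity(values):
    """Gini impurity of sensitive values (assumed measure, not from the paper)."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def sensitive(cluster):
    """Sensitive values of a cluster of (quasi_identifiers, sensitive) records."""
    return [s for _, s in cluster]


def info_loss(cluster, ranges):
    """Assumed loss: mean normalized range of each numeric quasi-identifier."""
    loss = 0.0
    for j, full in enumerate(ranges):
        col = [qi[j] for qi, _ in cluster]
        loss += (max(col) - min(col)) / full if full else 0.0
    return loss / len(ranges)


def satisfies(cluster, k, l):
    """k-anonymity (size >= k) and distinct l-diversity (>= l sensitive values)."""
    return len(cluster) >= k and len(set(sensitive(cluster))) >= l


def anonymize(records, k, l, weight=1.0):
    """Greedy bottom-up merging until every cluster satisfies k and l."""
    dims = len(records[0][0])
    ranges = [max(qi[j] for qi, _ in records) - min(qi[j] for qi, _ in records)
              for j in range(dims)]
    clusters = [[r] for r in records]
    while len(clusters) > 1 and not all(satisfies(c, k, l) for c in clusters):
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            if satisfies(ca, k, l) and satisfies(cb, k, l):
                continue  # both already valid, no need to merge them
            merged = ca + cb
            # Impurity gain: merged impurity minus size-weighted impurity of parts.
            before = (len(ca) * impurity(sensitive(ca))
                      + len(cb) * impurity(sensitive(cb))) / len(merged)
            gain = impurity(sensitive(merged)) - before
            # Assumed trade-off: low information loss, high impurity gain.
            score = info_loss(merged, ranges) - weight * gain
            if best is None or score < best[0]:
                best = (score, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters, ranges


def evaluate(clusters, ranges):
    """Size-weighted average information loss and average impurity."""
    n = sum(len(c) for c in clusters)
    avg_loss = sum(len(c) * info_loss(c, ranges) for c in clusters) / n
    avg_imp = sum(len(c) * impurity(sensitive(c)) for c in clusters) / n
    return avg_loss, avg_imp


# Toy data: (age, zip-code digit) as quasi-identifiers, disease as sensitive value.
records = [((25, 1), "flu"), ((27, 1), "cancer"), ((26, 2), "flu"),
           ((52, 8), "hiv"), ((55, 9), "flu"), ((53, 8), "cancer")]
clusters, ranges = anonymize(records, k=2, l=2)
avg_loss, avg_imp = evaluate(clusters, ranges)
```

If the dataset as a whole cannot satisfy the constraints (for example, fewer than l distinct sensitive values), the loop stops with a single cluster that still violates them; a practical implementation would need to detect and report that case.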

       
