    Privacy-Preserving Data Publication for Clustering

    • Abstract: Privacy-preserving data publication seeks a trade-off between protecting data privacy and maintaining data utility, and it has attracted sustained attention in recent years. Both its motivation and its goal stem from the value of using the data. Clustering, an important step in realizing the deeper value of data, has been widely studied in data mining. However, clustering and data obfuscation pull in opposite directions: clustering relies on the characteristics of individual records to partition data into clusters, whereas obfuscation typically suppresses individual characteristics to avoid leaking individual privacy. This conflict makes it difficult to preserve both privacy and clustering utility in published data. Various distortion and limited-distribution techniques have been applied to this problem. This paper surveys the state of the art in data obfuscation methods for clustering applications. It analyzes how the granularity of the clustering features to be preserved relates to clustering usability and privacy security. It then compares the principles and characteristics of the main obfuscation methods, including anonymization, randomization, data swapping, and synthetic data substitution, from the perspective of maintaining clustering usability. Based on a comprehensive comparative analysis of existing techniques, it highlights open problems and future directions for clustering-oriented privacy-preserving data publication.

       
