k-Anonymous Data Privacy Protection Mechanism Based on Optimal Clustering

张强; 叶阿勇; 叶帼华; 邓慧娜; 陈爱民

doi:10.7544/issn1000-1239.20210117

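Abstract

Clustering-based k-anonymity is the main approach to desensitizing shared data, and it effectively defends against background-knowledge attacks and linkage attacks on private information. However, existing schemes all balance privacy and utility by searching for an optimal k-equivalence set. Viewed globally, a k-equivalence set found this way is not necessarily the optimal equivalence set that satisfies k-anonymity, so the problem of optimizing the utility of the privacy mechanism remains unsolved. To address this problem, we propose a k-anonymity privacy protection mechanism based on optimal clustering. By establishing a functional relationship between data distance and information loss, we transform the optimization problem of the k-anonymity mechanism into an optimal clustering problem on the dataset. We then use a greedy algorithm and a bisection mechanism to find the optimal clustering that satisfies the k-anonymity constraint, thereby optimizing the utility of the k-anonymity model. Finally, we give a theoretical proof of the solution and an experimental analysis. Experimental results show that the mechanism reduces the information loss of cluster-based anonymization as far as possible and is feasible and efficient in terms of running time.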
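The abstract does not give the paper's loss function or clustering procedure. As a hedged illustration of the general idea only (a record's "distance" to a cluster defined as the information loss the cluster would have after the record is added, plus a greedy search for clusters of at least k records), the sketch below implements the well-known greedy k-member clustering heuristic on numeric quasi-identifiers. The normalized-range loss metric, the names `Record`, `cluster_loss`, `greedy_k_member`, and `generalize`, and the toy data are all assumptions for illustration, not the authors' mechanism.

```python
from typing import List, Sequence, Tuple

# Hypothetical record type: a tuple of numeric quasi-identifier values.
Record = Tuple[float, ...]


def attribute_ranges(records: Sequence[Record]) -> List[float]:
    """Per-attribute range (max - min) over the whole dataset."""
    return [max(col) - min(col) for col in zip(*records)]


def cluster_loss(cluster: Sequence[Record], full_ranges: Sequence[float]) -> float:
    """Assumed loss metric: |C| * sum_j range_j(C) / range_j(dataset),
    i.e. the cost of generalizing C to one [min, max] interval per attribute."""
    if not cluster:
        return 0.0
    spread = sum(
        (max(col) - min(col)) / full
        for col, full in zip(zip(*cluster), full_ranges)
        if full > 0
    )
    return len(cluster) * spread


def greedy_k_member(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Greedy k-member clustering: every returned cluster has at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    full = attribute_ranges(records)
    remaining = list(records)
    clusters: List[List[Record]] = []
    prev = remaining[0]
    while len(remaining) >= k:
        # New seed: the remaining record "farthest" (highest pairwise loss) from the previous seed.
        seed = max(remaining, key=lambda r: cluster_loss([prev, r], full))
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Distance = loss after adding r; minimizing it also minimizes the loss increase.
            best = min(remaining, key=lambda r: cluster_loss(cluster + [r], full))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
        prev = seed
    # Fewer than k records left: each joins the cluster whose loss grows least.
    for r in remaining:
        target = min(clusters, key=lambda c: cluster_loss(c + [r], full) - cluster_loss(c, full))
        target.append(r)
    return clusters


def generalize(cluster: Sequence[Record]) -> List[Tuple[float, float]]:
    """Per-attribute [min, max] intervals that replace every record in the cluster."""
    return [(min(col), max(col)) for col in zip(*cluster)]


if __name__ == "__main__":
    data = [(25, 50000), (27, 52000), (26, 51000), (60, 90000),
            (62, 88000), (61, 91000), (40, 70000)]
    for c in greedy_k_member(data, k=3):
        print(len(c), generalize(c))
```

Each cluster is then published as its generalized intervals, so every released row is indistinguishable from at least k-1 others on the quasi-identifiers.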

Abstract: Emerging big data technologies enable many organizations to collect massive amounts of information about individuals. Sharing such a wealth of information presents enormous opportunities for data mining applications, but data privacy has been a major barrier. Clustering-based k-anonymity is the most important technique for preventing privacy disclosure in data sharing, as it can counter background-knowledge attacks and linkage attacks. Existing anonymization methods balance privacy and utility requirements by seeking the optimal k-equivalence set. However, viewed globally, such a k-equivalence set is not necessarily the optimal solution satisfying k-anonymity, so utility optimality is not guaranteed. In this paper, we address this problem with an optimal clustering approach. Following this idea, we propose a greedy clustering-anonymity method that combines a greedy algorithm with a dichotomy (bisection) clustering algorithm. In addition, we formulate the optimal data release problem as minimizing information loss under a given privacy constraint. We also establish a functional relationship between data distance and information loss to capture the privacy/accuracy trade-off in an online way. Finally, we evaluate the mechanism through theoretical analysis and experimental verification. Evaluations on real datasets show that the proposed method can minimize information loss and is efficient in terms of running time.
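The abstract mentions combining a greedy algorithm with a dichotomy (bisection) clustering algorithm but does not describe the bisection step. As one possible reading, sketched here under that assumption, the code below recursively splits the data at the median of its widest normalized attribute, in the spirit of the Mondrian multidimensional partitioning algorithm, and stops before any half would drop below k records. The function names, the normalized-range loss metric, and the median-split rule are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple

# Hypothetical record type: a tuple of numeric quasi-identifier values.
Record = Tuple[float, ...]


def attribute_ranges(records: Sequence[Record]) -> List[float]:
    """Per-attribute range (max - min) over the whole dataset."""
    return [max(col) - min(col) for col in zip(*records)]


def widest_attribute(part: Sequence[Record], full_ranges: Sequence[float]) -> int:
    """Index of the attribute with the largest normalized range inside `part`."""
    def normalized_range(j: int) -> float:
        if full_ranges[j] == 0:
            return 0.0
        col = [r[j] for r in part]
        return (max(col) - min(col)) / full_ranges[j]
    return max(range(len(full_ranges)), key=normalized_range)


def bisect_partition(part: List[Record], k: int,
                     full_ranges: Sequence[float]) -> List[List[Record]]:
    """Split `part` in two at the median of its widest attribute and recurse,
    stopping once a split could leave a half with fewer than k records."""
    if len(part) < 2 * k:
        return [part]
    j = widest_attribute(part, full_ranges)
    ordered = sorted(part, key=lambda r: r[j])
    mid = len(ordered) // 2  # len >= 2k, so both halves keep at least k records
    return (bisect_partition(ordered[:mid], k, full_ranges)
            + bisect_partition(ordered[mid:], k, full_ranges))


def bisection_anonymize(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Partition records into groups of k .. 2k-1 records by recursive bisection."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    return bisect_partition(list(records), k, attribute_ranges(records))


def total_loss(parts: Sequence[Sequence[Record]], full_ranges: Sequence[float]) -> float:
    """Assumed metric: sum over parts P of |P| * sum_j range_j(P) / range_j(dataset)."""
    return sum(
        len(p) * sum((max(col) - min(col)) / full
                     for col, full in zip(zip(*p), full_ranges) if full > 0)
        for p in parts
    )


if __name__ == "__main__":
    data = [(float(a), float(a * 1000 % 7919)) for a in range(20, 60)]
    parts = bisection_anonymize(data, k=4)
    print([len(p) for p in parts], round(total_loss(parts, attribute_ranges(data)), 3))
```

Because any part with at least 2k records is always split, every leaf holds between k and 2k-1 records, which keeps groups small and so limits the loss of generalizing each one. How the paper couples such a split with its greedy step is not stated in the abstract.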
