ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (7): 1625-1635. doi: 10.7544/issn1000-1239.20210117

• Information Security •

k-Anonymous Data Privacy Protection Mechanism Based on Optimal Clustering

Zhang Qiang, Ye Ayong, Ye Guohua, Deng Huina, Chen Aimin

  1. (College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117) (Fujian Provincial Key Laboratory of Network Security and Cryptology (Fujian Normal University), Fuzhou 350117) (1102131507@qq.com)
  • Published online: 2022-07-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61972096, 61771140, 61872088, 61872090) and the University-Industry Cooperation Project of Fujian Province (2022H6025).

Abstract: Emerging big-data technologies enable many organizations to collect massive amounts of information about individuals. Sharing this wealth of information presents enormous opportunities for data mining applications, but data privacy remains a major barrier. Clustering-based k-anonymity is the main technique for desensitizing shared data, and it effectively defends against background-knowledge attacks and linkage attacks. Existing anonymization methods balance privacy and utility by seeking the optimal k-equivalence set. Viewed globally, however, a k-equivalence set is not necessarily the optimal equivalence set satisfying k-anonymity, so the utility-optimization problem of the privacy mechanism remains unsolved. To address this problem, we propose a k-anonymity privacy protection mechanism based on optimal clustering. We first establish the functional relationship between data distance and information loss, which captures the privacy/accuracy trade-off in an online manner and transforms the optimization of the k-anonymity mechanism into an optimal clustering problem on the dataset: release the data with minimum information loss under a given privacy constraint. We then combine a greedy algorithm with a dichotomy clustering mechanism to find the optimal clustering that satisfies the k-anonymity constraint, thereby optimizing the utility of the k-anonymity model. Finally, we evaluate the mechanism through theoretical proof and experimental analysis. Experiments on real datasets show that the proposed method minimizes the information loss of clustering-based anonymization and is feasible and effective in terms of running time.

Key words: privacy preservation, k-anonymity, clustering optimization, information loss, data publishing
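
To make the link between record distance and information loss more concrete, the following is a minimal Python sketch of clustering-based k-anonymization. It is not the paper's algorithm: it uses a plain greedy nearest-neighbour grouping without any dichotomy step, and the range-normalized Manhattan distance, the interval-width loss measure, and all function names are assumptions introduced here for illustration only.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]  # numeric quasi-identifiers only (assumption)


def attribute_ranges(records: Sequence[Record]) -> List[float]:
    """Global range (max - min) of each attribute; 1.0 if it is constant."""
    cols = list(zip(*records))
    return [(max(c) - min(c)) or 1.0 for c in cols]


def distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Range-normalized Manhattan distance between two records."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges))


def cluster_loss(cluster: Sequence[Record], ranges: Sequence[float]) -> float:
    """Loss of generalizing a cluster to per-attribute [min, max] intervals."""
    cols = list(zip(*cluster))
    width = sum((max(c) - min(c)) / r for c, r in zip(cols, ranges))
    return len(cluster) * width


def total_loss(clusters: Sequence[Sequence[Record]],
               ranges: Sequence[float]) -> float:
    """Sum of the generalization loss over all clusters."""
    return sum(cluster_loss(c, ranges) for c in clusters)


def greedy_k_clusters(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Group records so that every cluster holds at least k of them."""
    if k < 1 or len(records) < k:
        raise ValueError("need 1 <= k <= number of records")
    ranges = attribute_ranges(records)
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Greedy step: take the k - 1 records closest to the seed.
        remaining.sort(key=lambda r: distance(seed, r, ranges))
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1:]
    for r in remaining:
        # Fewer than k leftovers: each joins the cluster whose seed is nearest.
        min(clusters, key=lambda c: distance(c[0], r, ranges)).append(r)
    return clusters


if __name__ == "__main__":
    # Hypothetical (age, postcode) records, made up for this example.
    data = [(25, 35000), (27, 35010), (26, 35005), (60, 36000),
            (62, 36020), (61, 36010), (45, 35500)]
    groups = greedy_k_clusters(data, k=3)
    print(groups)
    print(total_loss(groups, attribute_ranges(data)))
```

Because records that are close together generalize to narrow intervals, a smaller within-cluster distance tends to mean a smaller loss under this measure. A grouping built greedily around seeds satisfies the size constraint, but its total loss may not be minimal, which is the gap the optimal-clustering formulation in the abstract is aimed at.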

CLC number: