ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2022, Vol. 59, Issue 10: 2261-2274. doi: 10.7544/issn1000-1239.20220504

Special topic: 2022 Research Topic on Data Security and Intelligent Privacy Protection

• Information Security

效用优化的本地差分隐私集合数据频率估计机制

曹依然¹,朱友文¹,²,贺星宇¹,张跃¹

  1. 南京航空航天大学计算机科学与技术学院 南京 211106
  2. 广西可信软件重点实验室(桂林电子科技大学) 广西桂林 541004
  (caoyiran@nuaa.edu.cn)
  • 出版日期: 2022-10-01
  • 基金资助: 
    国家重点研发计划项目(2021YFB3100400);国家自然科学基金项目(62172216);江苏省自然科学基金项目(BK20211180);广西可信软件重点实验室开放课题(KX202034)

Utility-Optimized Local Differential Privacy Set-Valued Data Frequency Estimation Mechanism

Cao Yiran¹, Zhu Youwen¹,², He Xingyu¹, Zhang Yue¹

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  • Online: 2022-10-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2021YFB3100400), the National Natural Science Foundation of China (62172216), the Natural Science Foundation of Jiangsu Province of China (BK20211180), and the Research Fund of Guangxi Key Laboratory of Trusted Software (KX202034).

摘要: 本地差分隐私具有不需要可信第三方、交互少、运行效率高等优点,近年来受到了广泛关注.然而,现有本地差分隐私集合数据频率估计机制未能考虑数据的隐私敏感度差异,将所有数据同等对待,这会对非敏感数据保护过强,导致估计结果准确度低.针对这一问题,定义了集合数据效用优化本地差分隐私(set-valued data utility-optimized local differential privacy, SULDP)模型,考虑了原始数据域同时包含敏感值和非敏感值的情况,在不减弱对敏感值保护的前提下,允许降低对非敏感值的保护.进一步,提出了符合SULDP模型的5种频率估计机制suGRR,suGRR-Sample,suRAP,suRAP-Sample和suWheel,理论分析证实,相对于现有的本地差分隐私机制,所提方案能够对敏感数据实现完全相同的保护效果,并通过降低非敏感数据的保护效果,实现了频率估计结果的准确度提升.最后,在真实和模拟数据集上评估了新的方案,实验结果证明了所提的5种机制能够有效降低估计误差,提升数据效用,其中suWheel机制表现最优.

关键词: 本地差分隐私, 频率估计, 集合数据, 隐私保护, 效用优化

Abstract: Local differential privacy (LDP) has attracted wide attention in recent years because it requires no trusted third party, involves little interaction, and runs efficiently. However, existing LDP frequency estimation mechanisms for set-valued data ignore differences in the privacy sensitivity of inputs and treat all data equally, which over-protects non-sensitive data and lowers the accuracy of the estimates. To address this problem, the set-valued data utility-optimized local differential privacy (SULDP) model is defined. SULDP covers the case where the original data domain contains both sensitive and non-sensitive values, and it allows the protection of non-sensitive values to be relaxed without weakening the protection of sensitive values. Five frequency estimation mechanisms that satisfy the SULDP model are then proposed: suGRR, suGRR-Sample, suRAP, suRAP-Sample, and suWheel. Theoretical analysis confirms that, compared with existing LDP mechanisms, the proposed schemes give sensitive data exactly the same protection while improving the accuracy of frequency estimation by relaxing the protection of non-sensitive data. Finally, the new schemes are evaluated on real and simulated datasets. The experimental results show that all five mechanisms effectively reduce the estimation error and improve data utility, with suWheel performing best.

Key words: local differential privacy, frequency estimation, set-valued data, privacy protection, utility optimization
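For readers unfamiliar with frequency estimation under local differential privacy, the following is a minimal sketch of the standard generalized randomized response (GRR) baseline for single-valued data. It is not the paper's suGRR or any other SULDP mechanism, which additionally handle set-valued inputs and distinguish sensitive from non-sensitive values. The function names `grr_perturb` and `grr_estimate` and the toy domain are illustrative assumptions; the sketch assumes a domain of at least two values and `epsilon > 0`.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: keep the true value with probability p, otherwise
    report a uniformly random *other* value from the domain."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimate from perturbed reports."""
    n = len(reports)
    d = len(domain)
    e = math.exp(epsilon)
    p = e / (e + d - 1)  # probability of reporting the true value
    q = 1 / (e + d - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


# Toy usage (hypothetical data): 20000 users, four possible values.
rng = random.Random(0)
domain = ["a", "b", "c", "d"]
true_values = ["a"] * 10000 + ["b"] * 6000 + ["c"] * 3000 + ["d"] * 1000
reports = [grr_perturb(v, domain, 1.0, rng) for v in true_values]
estimates = grr_estimate(reports, domain, 1.0)
```

Because every value is perturbed with the same probabilities, non-sensitive values receive the same noise as sensitive ones; this uniform treatment is the source of the accuracy loss that the abstract attributes to existing mechanisms.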

CLC number: