ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (5): 1106-1117. doi: 10.7544/issn1000-1239.2016.20150304

• Software Technology •

An Accurate Histogram Release Method under Differential Privacy

张啸剑1,邵超1,孟小峰2   

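  1. 1(College of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002); 2(School of Information, Renmin University of China, Beijing 100872) (xjzhang82@ruc.edu.cn)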
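  • Published: 2016-05-01
  • Funding: 
    National Natural Science Foundation of China (61502146, 61379050, U1404605, 61202285); National High Technology Research and Development Program of China (863 Program) (2013AA013204); Basic and Frontier Technology Research Project of the Henan Provincial Department of Science and Technology (152300410091); Key Scientific Research Project of Higher Education Institutions of the Henan Provincial Department of Education (16A520002); Major Research Project of Henan University of Economics and Law (201426)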

Accurate Histogram Release under Differential Privacy

Zhang Xiaojian1, Shao Chao1, Meng Xiaofeng2   

  1. 1(College of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450002); 2(School of Information, Renmin University of China, Beijing 100872)
  • Online: 2016-05-01

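Abstract (Chinese original, in translation): Grouping-based differentially private histogram release has received wide attention from researchers. The balance between the approximation error caused by group means and the Laplace error caused by noise directly constrains the accuracy of the released histogram. Existing grouping-based histogram release methods struggle to handle both the approximation error and the Laplace error, so this paper proposes DiffHR (differentially private histogram release), an accurate histogram release method that satisfies differential privacy. Based on the observation that sorting the sequence of bucket counts helps improve release accuracy, an effective sorting method is proposed that uses the Metropolis-Hastings technique from Markov chain Monte Carlo (MCMC) together with the exponential mechanism, and that gradually approaches the correct order by repeatedly swapping two randomly selected buckets. On the histogram obtained after sampling-based sorting, an adaptive greedy clustering method based on a lazy grouping lower bound is proposed; its time complexity is O(n), and it balances the approximation error and the Laplace error effectively. Experiments comparing DiffHR with GS and AHP on real data show that DiffHR achieves higher release accuracy than comparable algorithms.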
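The trade-off between approximation error and Laplace error can be made concrete with a small calculation. The sketch below is not taken from the paper: it assumes that neighboring datasets differ by one record (so one bucket count changes by 1), that each group releases its mean with noise Lap(1/(ε·|g|)), and that error is measured as a sum of squared errors. The function names are hypothetical.

```python
from typing import List, Sequence


def approximation_error(counts: Sequence[float], groups: List[List[int]]) -> float:
    """Sum of squared deviations of every bucket count from its group mean."""
    total = 0.0
    for group in groups:
        values = [counts[i] for i in group]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def expected_laplace_error(groups: List[List[int]], epsilon: float) -> float:
    """Expected squared noise error, summed over all buckets.

    Assumption: a group mean has sensitivity 1/|g|, so its noise is
    Lap(1/(epsilon*|g|)) with variance 2/(epsilon*|g|)**2. Each of the |g|
    buckets inherits that noise, giving 2/(epsilon**2 * |g|) per group.
    """
    return sum(2.0 / (epsilon ** 2 * len(g)) for g in groups)


def expected_total_error(counts: Sequence[float], groups: List[List[int]],
                         epsilon: float) -> float:
    """Approximation error plus expected Laplace error for one grouping."""
    return approximation_error(counts, groups) + expected_laplace_error(groups, epsilon)


counts = [10, 12, 11, 50]
no_grouping = [[0], [1], [2], [3]]      # no approximation error, most noise
similar_merged = [[0, 1, 2], [3]]       # merges only the similar counts
single_group = [[0, 1, 2, 3]]           # least noise, large approximation error

for name, grouping in [("no grouping", no_grouping),
                       ("similar merged", similar_merged),
                       ("single group", single_group)]:
    print(name, round(expected_total_error(counts, grouping, epsilon=1.0), 4))
```

Under these assumptions, merging only the three similar counts gives the lowest total error of the three groupings. This also suggests why sorting helps: it places similar counts next to each other, so contiguous groups can be large without a large approximation error.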

Keywords (Chinese original, in translation): differential privacy, histogram release, grouping, Laplace error, approximation error

Abstract: Grouping-based differentially private histogram release has attracted considerable research attention in recent years. The trade-off between the approximation error caused by replacing bucket counts with group means and the Laplace error caused by the injected Laplace noise limits the accuracy of the released histogram. Most existing grouping-based methods cannot balance the two errors effectively. This paper proposes a differentially private histogram release method called DiffHR (differentially private histogram release). To improve the accuracy of the released histogram, DiffHR combines the Metropolis-Hastings technique from MCMC (Markov chain Monte Carlo) with the exponential mechanism to obtain an efficient sorting method, which repeatedly samples two buckets and exchanges them to approximate the correct order. To balance the Laplace error and the approximation error, DiffHR then partitions the sorted histogram with a utility-driven adaptive clustering method whose time complexity is O(n). DiffHR is compared with existing methods such as GS and AHP on real datasets. The experimental results show that DiffHR achieves higher release accuracy than its competitors.
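The swap-based sorting idea can be illustrated with a generic Metropolis-Hastings sampler whose target distribution is that of the exponential mechanism, proportional to exp(ε·u/(2Δu)). The sketch below is not the paper's algorithm. The score function `order_score`, the function name `mh_sort`, and the parameters are hypothetical choices made for illustration. The sensitivity of 2 follows from the chosen score under the assumption that neighboring datasets change one bucket count by 1: each count appears in at most two adjacent differences. A chain run for finitely many steps only approximates the target distribution.

```python
import math
import random
from typing import List, Optional, Sequence


def order_score(counts: Sequence[float], order: Sequence[int]) -> float:
    """Hypothetical utility: negative sum of absolute adjacent differences.

    It is maximal (equal to -(max - min)) when the counts are in sorted order.
    """
    return -sum(abs(counts[order[i + 1]] - counts[order[i]])
                for i in range(len(order) - 1))


def mh_sort(counts: Sequence[float], epsilon: float, steps: int,
            sensitivity: float = 2.0, seed: Optional[int] = None) -> List[int]:
    """Sample a bucket ordering by repeatedly proposing a swap of two buckets.

    The swap proposal is symmetric, so the acceptance probability is
    min(1, exp(epsilon * (u_new - u_old) / (2 * sensitivity))).
    """
    rng = random.Random(seed)
    order = list(range(len(counts)))
    if len(order) < 2:
        return order
    current = order_score(counts, order)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)      # two distinct positions
        order[i], order[j] = order[j], order[i]      # propose the swap
        proposed = order_score(counts, order)
        delta = proposed - current
        # Accept improvements outright; accept worse orders with small probability.
        if delta >= 0 or rng.random() < math.exp(epsilon * delta / (2.0 * sensitivity)):
            current = proposed
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return order


counts = [30, 5, 22, 8, 15, 1]
order = mh_sort(counts, epsilon=1.0, steps=2000, seed=7)
print([counts[k] for k in order])
```

A larger ε makes the chain accept fewer score-decreasing swaps, so the sampled order tends to be closer to a sorted one, at the cost of a weaker privacy guarantee.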

Key words: differential privacy, histogram release, grouping, Laplace error, approximation error
