ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (4): 847-858.doi: 10.7544/issn1000-1239.2020.20190360

• 信息安全 • 上一篇    下一篇

基于本地差分隐私的空间范围查询方法

张啸剑1,付楠1,孟小峰2   

  1. 1(河南财经政法大学计算机与信息工程学院 郑州 450002);2(中国人民大学信息学院 北京 100872) (xjzhang82@ruc.edu.cn)
  • 出版日期: 2020-04-01
  • 基金资助: 
    国家自然科学基金项目(61502146,61572420,91646203,91746115);河南省自然科学基金项目(162300410006);河南省科技攻关项目(162102310411);河南省教育厅高等学校重点科研项目(16A520002);河南财经政法大学青年拔尖人才资助计划项目

Towards Spatial Range Queries Under Local Differential Privacy

Zhang Xiaojian1, Fu Nan1, Meng Xiaofeng2   

  1. 1(School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450002);2(School of Information, Renmin University of China, Beijing 100872)
  • Online: 2020-04-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61502146, 61572420, 91646203, 91746115), the Natural Science Foundation of Henan Province (162300410006), the Key Technologies Research and Development Program of Henan Province (162102310411), the Research Program of the Higher Education of Henan Educational Committee (16A520002), and the Young Talents Fund of Henan University of Economics and Law.

摘要: 基于本地差分隐私的用户数据收集与分析得到了研究者的广泛关注.用户数据的值域大小、编码机制以及扰动机制直接制约着空间范围查询的精度.针对现有编码机制与扰动机制难以有效响应空间范围查询的不足,提出了一种基于网格分割与四分树索引的空间范围查询响应方法GT-R(grid-based quadtree range query),该方法利用网格对用户数据的值域进行均匀分割,产生大小均等的单元格区域.同时利用四分树结构对所有单元格区域进行索引.每个用户结合服务器共享的四分树副本,对所拥有的数据进行编码.借助于编码后的四分树进行层次随机采样,并利用优化随机应答机制对所采层次中的结点进行本地扰动处理.服务器利用每个用户的报告值重构四分树索引结构,并响应空间范围查询.GT-R与现有的编码机制与扰动机制在真实的大规模空间数据集上实验结果表明,其分割精度以及响应范围查询效果优于同类算法.

关键词: 本地差分隐私, 空间范围查询, 网格划分, 随机应答, 约束推理

Abstract: User data collection and analysis with local differential privacy has attracted considerable attention in recent years. The trade-off among the domain size of user data, encoding method, and perturbation method directly constrains the accuracy of spatial range query. To remedy the deficiency caused by the current encoding and perturbating method, this paper employs grid and quadtree to propose an efficient solution, called GT-R, to answer spatial range query. GT-R uses a uniform grid to decompose the data domain, and generate unit sized regions. Based on these regions, an indexing quadtree is built. And then each user encodes his/her data with the quadtree shared from server, and runs the optimal randomized response on each node of the sampled level in the quadtree and reports the sampled level along with the perturbed value. The server accumulates reports from users to reconstruct a quadtree comprising sum of reports from all users. Besides, to boost the accuracy of range query, the server relies on post-processing skill for consistency on the frequency of each node. GT-R method is compared with existing methods on the large-scale real datasets. The experimental results show that GT-R outperforms its competitors, achieves the accurate results of spatial range query.

Key words: local differential privacy, spatial range query, grid decomposition, randomized response, constrained inference

中图分类号: