ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2017, Vol. 54 ›› Issue (12): 2805-2817.doi: 10.7544/issn1000-1239.2017.20160555

• 信息安全 • 上一篇    下一篇

差分隐私流数据自适应发布算法

吴英杰,张立群,康健,王一蕾   

  1. (福州大学数学与计算机科学学院 福州 350116) (yjwu@fzu.edu.cn)
  • 出版日期: 2017-12-01
  • 基金资助: 
    国家自然科学基金项目(61300026);福建省自然科学基金项目(2014J01230)

An Algorithm for Differential Privacy Streaming Data Adaptive Publication

Wu Yingjie, Zhang Liqun, Kang Jian, Wang Yilei   

  1. (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116)
  • Online: 2017-12-01

摘要: 当前,许多实际应用需要持续地对流数据进行发布,现有关于单条流数据的差分隐私发布研究大多考虑区间的累和发布,而现实应用中往往需要对发布流数据进行任意区间计数查询,同时,用户查询往往存在特定规律,可针对历史查询进行自适应统计与分析,提高发布数据可用性.为此,提出一个基于历史查询的差分隐私流数据自适应发布算法HQ_DPSAP.算法HQ_DPSAP首先结合流数据的特性,利用滑动窗口机制动态构建窗口内流数据对应的差分隐私区间树,而后进一步分析与计算树节点的覆盖概率;接着自底向上计算隐私分配参数,再自顶向下分配隐私预算,并据此对树节点进行异方差加噪;最后根据历史查询规律自适应调整树节点的隐私预算与树结构参数,以实现流数据的自适应发布.实验对算法HQ_DPSAP的可行性及有效性进行比较分析,结果表明:算法HQ_DPSAP可有效支持任意区间计数查询,且具有较低的查询均方误差和较高的算法执行效率.

关键词: 差分隐私, 流数据发布, 异方差加噪, 历史查询, 自适应算法

Abstract: Nowadays, many practical applications need to publish streaming data continuously. Most of existing research works for differential privacy single streaming data publication focus on range accumulation. However, many practical scenarios need to answer arbitrary range counting queries of streaming data. At the same time, there exist specific rules of queries from users, so adaptive analysis and calculation for historical queries should be concerned. To improve the usability of published data, an algorithm HQ_DPASP for differential privacy streaming data adaptive publication based on historical queries is proposed. Combining the characteristics of streaming data, HQ_DPASP firstly uses the sliding window mechanism to construct the differential privacy range tree of the streaming data dynamically. Secondly, by analyzing the coverage probability of tree nodes and calculating the privacy parameters from leaves to root, HQ_DPASP allocates privacy budget from root to leaves and adds non-uniform noise on tree nodes. Finally, the privacy budget of tree nodes and tree's parameters are adjusted adaptively based on the characteristic of historical queries. Experiments are designed for testing the feasibility and effectiveness of HQ_DPSAP. The results show that HQ_DPSAP is effective in answering arbitrary range counting queries on the published streaming data while assuring low mean squared error of queries and high algorithm efficiency.

Key words: differential privacy, streaming data publication, non-uniform noise, historical query, adaptive algorithm

中图分类号: