ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (10): 2195-2205. doi: 10.7544/issn1000-1239.2014.20130824

• Software Technology •

An Effective Differential Privacy Transaction Data Publication Strategy

Ouyang Jia, Yin Jian, Liu Shaopeng, Liu Yubao

  1. (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006) (ouyangjia1@163.com)
  • Published: 2014-10-01
  • Online: 2014-10-01
  • Supported by:
    National Natural Science Foundation of China (61170019); Natural Science Foundation of Tianjin (11JCYBJC00700)

Abstract: In recent years, privacy-preserving data publishing, which aims to release data securely for analysis, has attracted considerable research interest in the database community. However, the sparsity of transaction data makes it difficult to balance individual privacy protection against data utility. Most existing publishing methods for transaction data are based on partition-based anonymity models such as k-anonymity. These models depend on assumptions about the attacker's background knowledge, and the data they publish cannot meet the needs of transaction data analysis tasks. In contrast, differential privacy is a strong privacy model whose guarantees are independent of an adversary's background knowledge and which can still maintain high utility for the published data. Because most existing methods and privacy models cannot provide both data utility and data security, this paper proposes an application-oriented transaction data publish strategy (TDPS) based on differential privacy and compressive sensing. First, a complete Trie itemset tree is constructed for the transaction database. Second, compressive sensing is used to add noise satisfying the differential privacy constraint to the itemset tree, yielding a noisy Trie tree. Finally, frequent itemset mining is performed on the noisy Trie tree. Theoretical analysis and experimental results show that TDPS preserves the privacy of sensitive data well while effectively maintaining data utility and meeting the data quality requirements of transaction data analysis tasks.

Key words: privacy preserving, differential privacy, transaction data, Trie tree, compressive sensing
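
The three steps named in the abstract (build an itemset tree, perturb it, mine the perturbed tree) can be illustrated with a minimal Python sketch. It is not the TDPS algorithm itself: the abstract does not specify the compressive-sensing step, so the sketch substitutes the plain Laplace mechanism applied directly to the itemset counts. All function names and the parameters `universe`, `max_len`, and `max_items` are hypothetical, introduced only for illustration. The sketch assumes the item universe and both bounds are public, and it stores the trie as a dictionary keyed by sorted item tuples (root-to-node paths).

```python
import itertools
import math
import random


def build_itemset_counts(transactions, universe, max_len, max_items):
    """Count the support of every itemset of size <= max_len over `universe`.

    Keys are sorted tuples, i.e. root-to-node paths in a trie over sorted
    items. Every itemset gets an entry (even with count 0) so that the set
    of published nodes does not depend on the data. Each transaction is
    truncated to `max_items` items to bound its influence on the counts.
    """
    items = sorted(universe)
    counts = {
        combo: 0
        for size in range(1, max_len + 1)
        for combo in itertools.combinations(items, size)
    }
    for transaction in transactions:
        kept = sorted(set(transaction) & set(items))[:max_items]
        for size in range(1, max_len + 1):
            for combo in itertools.combinations(kept, size):
                counts[combo] += 1
    return counts


def l1_sensitivity(max_len, max_items):
    """Number of counts that one truncated transaction can change by 1."""
    return sum(math.comb(max_items, size) for size in range(1, max_len + 1))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(counts, epsilon, sensitivity, rng):
    """Assumed stand-in for the paper's compressive-sensing step:
    add Laplace(sensitivity / epsilon) noise to every node count."""
    scale = sensitivity / epsilon
    return {s: c + laplace_noise(scale, rng) for s, c in counts.items()}


def frequent_itemsets(counts, min_support):
    """Return the itemsets whose (possibly noisy) count reaches min_support."""
    return sorted(s for s, c in counts.items() if c >= min_support)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
exact = build_itemset_counts(transactions, ["a", "b", "c"], max_len=2, max_items=3)
noisy = add_noise(exact, epsilon=1.0, sensitivity=l1_sensitivity(2, 3),
                  rng=random.Random(0))
print(frequent_itemsets(exact, min_support=4))
print(frequent_itemsets(noisy, min_support=4))
```

Because one transaction touches many itemset counts, the noise scale in this direct approach grows quickly with `max_len` and `max_items`. This illustrates the sparsity problem the abstract describes and suggests why a more economical perturbation step, such as the compressive-sensing one the paper proposes, is attractive.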

CLC Number: