高级检索
    张啸剑, 徐雅鑫, 付楠, 孟小峰. 基于直方图的隐私键-值数据收集算法[J]. 计算机研究与发展, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
    引用本文: 张啸剑, 徐雅鑫, 付楠, 孟小峰. 基于直方图的隐私键-值数据收集算法[J]. 计算机研究与发展, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
    Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
    Citation: Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319

    基于直方图的隐私键-值数据收集算法

    Towards Private Key-Value Data Collection with Histogram

    • 摘要: 基于本地差分隐私的用户数据收集与分析算法已延伸到了键-值数据类型.然而, 该类数据值域大小与稀疏性以及本地扰动机制直接制约着收集与分析精度.针对现有机制难以有效应对该类数据收集的不足, 提出了一种基于直方图技术的有效收集与分析算法HISKV(histogram-based key-value data collection), 该算法首先结合用户分组策略寻找最优截断长度, 利用最优截断-抽样技术处理值域过大与稀疏性问题, 然后结合截断结果随机抽取单个键-值对进行离散化处理.针对离散化结果, 设计一种高效的本地扰动机制LRR_KV(local random response for key-value data), 该机制结合具体的键分配不同的本地扰动概率.每个用户利用LRR_KV机制扰动离散化的键-值对之后发送给收集者, 收集者结合用户的报告值对每个键的频率及其值所对应的均值进行估计.理论分析了HISKV算法的无偏性、所产生的方差以及最大偏差, 并与现有的键-值收集算法在真实与合成的数据集上进行比较, 实验结果表明HISKV算法优于同类算法.

       

      Abstract: Recently, user data collection and analysis with local differential privacy has extended into key-value data. The trade-off between the size and sparsity of domain and perturbation method directly constrains the accuracy of the collection and analysis of such data. To remedy the deficiency caused by the domain size and perturbating method, this paper employs histogram technology to propose an efficient solution, called HISKV, to collect key-value data. HISKV firstly uses a user-grouping strategy and partial privacy budget to find the optimal length of truncation and enables each user to truncate his/her key-value data set. And then, based on the truncated set, each user samples one key-value pair and uses the discretization and perturbation method to process this pair. To perturb key-value data efficiently, a novel mechanism in HISKV, named LRR_KV is proposed, which allocates different perturbing probability for different keys. In LRR_KV, each user adopts this mechanism to add noise to his/her sampled pair, and sents the report to a collector. Based on the reports from all of the users, the collector estimates the frequency of each key and the mean of the values. To evaluate the utility of HISKV, we firstly conduct theoretical analysis on unbias, variance, and error bound of LRR_KV, and then perform experiments on real and synthetic datasets to compare different methods. The experimental results show that HISKV outperforms its competitors.

       

    /

    返回文章
    返回