ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (3): 624-637.doi: 10.7544/issn1000-1239.2021.20200319

Previous Articles     Next Articles

Towards Private Key-Value Data Collection with Histogram

Zhang Xiaojian1, Xu Yaxin1, Fu Nan1, Meng Xiaofeng2   

  1. 1(School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450002);2(School of Information, Renmin University of China, Beijing 100872)
  • Online:2021-03-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61502146, 91646203, 91746115, 62072156), the Natural Science Foundation of Henan (162300410006), the Key Technologies Research and Development Program of Henan Province (202102310563), and the Young Talents Fund of Henan University of Economics and Law.

Abstract: Recently, user data collection and analysis with local differential privacy has extended into key-value data. The trade-off between the size and sparsity of domain and perturbation method directly constrains the accuracy of the collection and analysis of such data. To remedy the deficiency caused by the domain size and perturbating method, this paper employs histogram technology to propose an efficient solution, called HISKV, to collect key-value data. HISKV firstly uses a user-grouping strategy and partial privacy budget to find the optimal length of truncation and enables each user to truncate his/her key-value data set. And then, based on the truncated set, each user samples one key-value pair and uses the discretization and perturbation method to process this pair. To perturb key-value data efficiently, a novel mechanism in HISKV, named LRR_KV is proposed, which allocates different perturbing probability for different keys. In LRR_KV, each user adopts this mechanism to add noise to his/her sampled pair, and sents the report to a collector. Based on the reports from all of the users, the collector estimates the frequency of each key and the mean of the values. To evaluate the utility of HISKV, we firstly conduct theoretical analysis on unbias, variance, and error bound of LRR_KV, and then perform experiments on real and synthetic datasets to compare different methods. The experimental results show that HISKV outperforms its competitors.

Key words: local differential privacy, randomized response mechanism, key-value data, frequency estimation, mean estimation

CLC Number: