Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
Citation:
Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
Citation:
Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng. Towards Private Key-Value Data Collection with Histogram[J]. Journal of Computer Research and Development, 2021, 58(3): 624-637. DOI: 10.7544/issn1000-1239.2021.20200319
1(School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450002)
2(School of Information, Renmin University of China, Beijing 100872)
Funds: This work was supported by the National Natural Science Foundation of China (61502146, 91646203, 91746115, 62072156), the Natural Science Foundation of Henan (162300410006), the Key Technologies Research and Development Program of Henan Province (202102310563), and the Young Talents Fund of Henan University of Economics and Law.
Recently, user data collection and analysis with local differential privacy has extended into key-value data. The trade-off between the size and sparsity of domain and perturbation method directly constrains the accuracy of the collection and analysis of such data. To remedy the deficiency caused by the domain size and perturbating method, this paper employs histogram technology to propose an efficient solution, called HISKV, to collect key-value data. HISKV firstly uses a user-grouping strategy and partial privacy budget to find the optimal length of truncation and enables each user to truncate his/her key-value data set. And then, based on the truncated set, each user samples one key-value pair and uses the discretization and perturbation method to process this pair. To perturb key-value data efficiently, a novel mechanism in HISKV, named LRR_KV is proposed, which allocates different perturbing probability for different keys. In LRR_KV, each user adopts this mechanism to add noise to his/her sampled pair, and sents the report to a collector. Based on the reports from all of the users, the collector estimates the frequency of each key and the mean of the values. To evaluate the utility of HISKV, we firstly conduct theoretical analysis on unbias, variance, and error bound of LRR_KV, and then perform experiments on real and synthetic datasets to compare different methods. The experimental results show that HISKV outperforms its competitors.