Statistics Collecting Algorithms of Distributed Graph via Local Differential Privacy
-
Graphical Abstract
-
Abstract
Mass distributed graph data, generated by social network, social IoT (Internet of things) and other scenarios, are collected by the service providers and used to provide diverse data oriented services, and they arise serious privacy conserns. In this context, how to achieve secure and effective collecting of distributed graph-structure data with strong correlation has become a bottleneck in large-scale graph-structure data application services. The Node/Edge-LDP (local differential privacy) model of distributed graph-structure data cannot effectively handle the conflict between effectiveness of privacy preserving and the utility of data. To this end, a statistics collecting algorithm named GS-LDP of distributed graph via LDP is proposed. This algorithm can collect three different statistics, i.e., degree distribution, triangle count sequence and clustering coefficient, simultaneously. And it can adapt to different effectiveness and privacy preserving requirements. First, a degree distribution collecting algorithm with Node-LDP is designed. By using the grouping mechanism and symmetric unary coding mechanism, this algorithm can achieve high strength privacy protection. Second, obtaining the threshold value with the algorithm mentioned above, then alleviating the problem of excessive noisy edges in random noise through introducing a prune algorithm, the triangle count sequence collecting algorithms with Node-LDP and Edge-LDP are proposed respectively. Third, based on the aforementioned algorithms, clustering coefficient collecting algorithms with Node-LDP and Edge-LDP are proposed respectively, by introducing the Laplace mechanism. And these algorithms achieve multi-metrics collecting of distributed graph-structure for different privacy and utility requirements. At last, experimental and comparative results show that, the proposed algorithms can strengthen both privacy and data utility, and perform significantly better than existing single or multiple statistics collecting algorithms.
-
-