高级检索

    基于本地差分隐私的分布式图统计采集算法

    Statistics Collecting Algorithms of Distributed Graph via Local Differential Privacy

    • 摘要: 社交网络、社交物联网等应用场景产生的海量分布式图结构数据,被应用服务商采集并以此提供各类以数据为驱动的服务,或将引发严重的隐私风险. 在此背景下,如何针对具备强关联性的分布式图结构数据实现安全高效的采集,成为大规模图结构数据应用服务的瓶颈. 面向分布式图结构数据隐私保护的节点或边本地差分隐私模型无法有效处理隐私保护效果和数据有效性之间的冲突关系. 针对该问题,提出基于本地差分隐私的分布式图统计采集算法,同时实现度分布、三角计数序列和聚类系数3个不同统计指标采集,并适应不同有效性和隐私保护的需求. 首先,采用分组机制及对称一元编码机制,设计具备高强度隐私保护的基于Node-LDP的度分布采集算法;其次,基于所提度分布采集算法获取阈值,引入剪枝算法缓解随机加噪的噪声边过多问题,并分别提出基于Node-LDP和Edge-LDP的三角计数序列采集算法;再次,在前述三角计数序列采集算法基础上引入拉普拉斯机制,从而分别提出基于Node-LDP和Edge-LDP的聚类系数采集算法,进而实现不同保护强度及数据效用需求下的分布式图结构多指标采集;最后,实验和对比结果表明,所提算法能同时提高隐私保护强度和数据效用,比现有单一或多统计指标采集算法更具优势.

       

      Abstract: Mass distributed graph data, generated by social network, social IoT (Internet of things) and other scenarios, are collected by the service providers and used to provide diverse data oriented services, and they arise serious privacy conserns. In this context, how to achieve secure and effective collecting of distributed graph-structure data with strong correlation has become a bottleneck in large-scale graph-structure data application services. The Node/Edge-LDP (local differential privacy) model of distributed graph-structure data cannot effectively handle the conflict between effectiveness of privacy preserving and the utility of data. To this end, a statistics collecting algorithm named GS-LDP of distributed graph via LDP is proposed. This algorithm can collect three different statistics, i.e., degree distribution, triangle count sequence and clustering coefficient, simultaneously. And it can adapt to different effectiveness and privacy preserving requirements. First, a degree distribution collecting algorithm with Node-LDP is designed. By using the grouping mechanism and symmetric unary coding mechanism, this algorithm can achieve high strength privacy protection. Second, obtaining the threshold value with the algorithm mentioned above, then alleviating the problem of excessive noisy edges in random noise through introducing a prune algorithm, the triangle count sequence collecting algorithms with Node-LDP and Edge-LDP are proposed respectively. Third, based on the aforementioned algorithms, clustering coefficient collecting algorithms with Node-LDP and Edge-LDP are proposed respectively, by introducing the Laplace mechanism. And these algorithms achieve multi-metrics collecting of distributed graph-structure for different privacy and utility requirements. At last, experimental and comparative results show that, the proposed algorithms can strengthen both privacy and data utility, and perform significantly better than existing single or multiple statistics collecting algorithms.

       

    /

    返回文章
    返回