带权值的大规模社交网络数据隐私保护方法

黄海平; 张东军; 王凯; 朱毅凯; 王汝传

doi:10.7544/issn1000-1239.2020.20190018

摘要: 各类移动社交网络应用的发展促使了海量网络用户的出现,从而形成了大规模的社交图结构数据.这些图结构数据中包含着大量的用户隐私信息,因此发布之前需要进行隐私保护处理以防数据遭到泄露.同时,用户间错综复杂的社交关系并非均等,个体间关系的强弱可能直接影响到隐私的分布和保护的效率.目前存在相当多的针对无权值的社交网络图数据的隐私保护方法,但这些方法不能直接应用于带权值(社交关系敏感程度不均等)的社交网络图数据中.为解决这一问题,提出一种基于非交互的差分隐私保护模型的带权值的社交网络图扰动方法dp-noisy,可实现对边权值以及图结构的强保护.该方法基于单源最短路径约束模型来添加扰动噪音,根据不同的权值划分出关键边和非关键边,有效减少了需要扰动的边关系.实验结果表明:在大规模数据集中(节点数为30 000),dp-noisy在运行效率上比K-MPNP(K-shortest path privacy)提高了47.3%,比LWSPA(protection algorithm based on Laplace noise for weighted social networks)提高了41.8%,比DER(density-based exploration and reconstruc-tion)提高了52.6%.在相似的数据隐私保护程度下,dp-noisy的数据可用性比lp-noisy提高了10%,显著优于DER的数据可用性,略好于LWSPA.此外,dp-noisy的平均扰动质量比lp-noisy提高了14%,比DER提高了11.3%,比K-MPNP提高了27%; 在达到最优数据效用时(ε=10),dp-noisy的平均扰动质量比LWSPA提高了6%.综上,dp-noisy具有较高的运行效率和数据效用,同时满足抵御图结构攻击的特性,可适用于大规模的社交网络数据分析.

Abstract: The development of various social network applications inspires the emergence of vast amounts of users, which forms a large-scale social graph structure. Amounts of private data are involved in such a graph structure, which should be preserved before being released in order to prevent privacy leakage. And meanwhile, the intricate social relationships between users are not equivalent, and the sensitivity of individual relationships may directly affect the distribution of privacy and the effect of protection. Currently, there are many privacy protection methods for social network graphs without weights, however these methods cannot be directly applied to the scenarios with weighted values (i.e. uneven sensitivity of social relations). To solve this problem, we propose a weighted social network graph perturbation method based on non-interactive differential privacy protection model, named as dp-noisy, which can achieve strong protection of edge weights and graph structure. This method adds disturbance noise based on the single-source shortest path constraint model, classifies the critical edges and non-critical edges according to the weights, and effectively reduces the edge links that need to be disturbed. The experimental results show that in a large-scale dataset (the number of nodes is 30 000), the execution efficiency of the proposed method is improved by 47.3% than that of K-MPNP(K-shortest path privacy), 41.8% than that of LWSPA(protection algorithm based on Laplace noise for weighted social networks) and 52.6% than that of DER(density-based exploration and reconstruction). With similar degree of data privacy protection, the data availability of dp-noisy is 10% higher than that of lp-noisy, superior to that of DER and slightly better than that of LWSPA. The average perturbation quality of dp-noisy is 14% higher than that of lp-noisy, 11.3% higher than that of DER and 27% higher than that of K-MPNP. In the case of the best data availability (ε=10), the average perturbation quality of dp-noisy is 6% higher than that of LWSPA. Compared with the classical methods, our proposal achieves the satisfactory execution efficiency and data availability, which can resist graph structure attack, so it is suitable for large-scale social network applications.

带权值的大规模社交网络数据隐私保护方法

Weighted Large-Scale Social Network Data Privacy Protection Method