ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (11): 2630-2644. doi: 10.7544/issn1000-1239.2016.20150219

• Information Processing •

  • E-mail: kaixian.hu@gmail.com
  • Published: 2016-11-01
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016YFB0800403), the National Basic Research Program of China (973 Program) (2014CB340406, 2013CB329602), the National High Technology Research and Development Program of China (863 Program) (2015AA015803), the Key Program of the National Natural Science Foundation of China (61232010), the General Program of the National Natural Science Foundation of China (61173064), the National Key Technology R&D Program of China (2015BAK20B03), and the Independent Innovation and Achievement Transformation Project of Shandong Province (2014CGZH1103).

A Method for Social Network User Identity Feature Recognition

Hu Kaixian1,2, Liang Ying1, Xu Hongbo1, Bi Xiaodi1,2, Zuo Yao1,2   

  1. Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
  2. University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2016-11-01

Abstract: Social networks are an important part of the modern information society, but the opaque, invisible nature of user identities on them raises a series of social security problems. This paper presents a method for recognizing the identity features of social network users. It recognizes identity features separately from location-based social network (LBSN) data and from social relationships, then fuses the two results to infer a user's real identity. The LBSN-based method computes an approximation weight from the basic match weight and the full match weight of Chinese word segmentation and bigram segmentation, and uses that weight to measure how likely an entity is to be the one the user belongs to, that is, where the user studies or works. An entity name aggregation algorithm then refines the approximation weight results. Based on the observation that friends tend to share similar identity features and common interests, the social-relationship-based method applies majority voting: it tallies the identity features found among a user's social relationships to infer the user's address information, entity information, and interests. Experiments on microblog data used two samples, of 1 000 users and 10 000 users, covering more than 2.5 million social relationships. The results show that the proposed method achieves high precision and recall. Compared with existing methods, it focuses on inferring fine-grained identity features of individual users and has considerable practical value.
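The majority voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the field names (`address`, `entity`, `interest`), the tie-breaking rule, and the sample data are all assumptions made up for illustration, and the paper may weight or filter votes differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def majority_vote(values: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-missing value, or None if all are missing.

    Ties go to the value seen first (an assumption, not a rule from the paper).
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def infer_features(
    friends: List[Dict[str, Optional[str]]],
    fields: Iterable[str] = ("address", "entity", "interest"),
) -> Dict[str, Optional[str]]:
    """Infer each field for a user by voting over the friends' known values."""
    return {f: majority_vote(friend.get(f) for friend in friends) for f in fields}


# Hypothetical friend profiles; None marks a feature the friend does not expose.
friends = [
    {"address": "Beijing", "entity": "University A", "interest": "data mining"},
    {"address": "Beijing", "entity": "University A", "interest": "basketball"},
    {"address": "Shanghai", "entity": None, "interest": "data mining"},
]
inferred = infer_features(friends)
```

Here each feature is voted on independently, so a friend with a missing value for one field still contributes to the others.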

Key words: identity recognition, user identity features, location-based social network (LBSN), social relationships, de-anonymization

CLC Number: