ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2014, Vol. 51 ›› Issue (12): 2797-2807. doi: 10.7544/issn1000-1239.2014.20131209

• Network Technology •

Similarity-Based Community Detection in Social Network of Microblog

Sun Yifan (孙怡帆), Li Sai (李赛)

  1. (Center for Applied Statistics, Renmin University of China, Beijing 100872) (School of Statistics, Renmin University of China, Beijing 100872) (yfsun1984@gmail.com)
  • Published online: 2014-12-01
  • Supported by: 
    the Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities) (14XNLF13)

Abstract: As an emerging form of social media, microblogging has attracted hundreds of millions of users in just four years thanks to its brevity, immediacy, and openness, and the number is still growing rapidly; its social influence is correspondingly broad. Detecting communities in the microblog user relationship network is therefore of both theoretical and practical significance. On the one hand, most microblog users are real people, so uncovering community structure helps reveal patterns of human behavior; on the other hand, the detected communities can be used to classify users into groups, which facilitates targeted advertising. Taking into account the characteristics of microblog networks, namely that they are directed and that follow relationships are established rather arbitrarily, this paper proposes a user similarity measure based on common followees and common followers, defines a modularity function on this similarity, and designs a community detection method that maximizes this modularity in the spirit of the fast greedy algorithm. The method is further extended to microblog networks in which users carry tag information. The method is applied to three real microblog user relationship networks. The results show that it detects community structure in these networks more effectively than Newman's modularity maximization method, the Infomap method, and the Walktrap method.

Key words: microblog, community detection, tags, similarity, modularity
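The abstract names the ingredients of the method (a similarity built from common followees and common followers, a modularity function defined on that similarity, and greedy maximization) but does not give the formulas. The following is only a minimal sketch of that general idea, not the authors' definitions. It assumes the similarity is the equally weighted average of two Jaccard overlaps, assumes the modularity is the standard weighted Newman modularity applied to the similarity matrix, and uses a naive greedy merge rather than an optimized fast greedy implementation. All function names are hypothetical.

```python
from itertools import combinations


def invert(followees):
    """Build follower sets from followee sets by reversing every edge."""
    followers = {u: set() for u in followees}
    for u, targets in followees.items():
        for v in targets:
            followers.setdefault(v, set()).add(u)
    return followers


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(followees):
    """ASSUMPTION: similarity = mean of followee overlap and follower overlap."""
    followers = invert(followees)
    users = sorted(set(followees) | set(followers))
    sim = {}
    for u, v in combinations(users, 2):
        s = 0.5 * (
            jaccard(followees.get(u, set()), followees.get(v, set()))
            + jaccard(followers.get(u, set()), followers.get(v, set()))
        )
        sim[u, v] = sim[v, u] = s
    return users, sim


def modularity(users, sim, community):
    """ASSUMPTION: weighted Newman modularity with similarities as edge weights."""
    strength = {u: sum(sim.get((u, v), 0.0) for v in users) for u in users}
    total = sum(strength.values())  # plays the role of 2m
    if total == 0:
        return 0.0
    q = 0.0
    for u in users:
        for v in users:
            if community[u] == community[v]:
                q += sim.get((u, v), 0.0) - strength[u] * strength[v] / total
    return q / total


def greedy_communities(followees):
    """Start from singletons; repeatedly apply the merge that most increases Q."""
    users, sim = similarity_matrix(followees)
    community = {u: i for i, u in enumerate(users)}
    best_q = modularity(users, sim, community)
    while True:
        labels = sorted(set(community.values()))
        best_merge = None
        for a, b in combinations(labels, 2):
            trial = {u: (a if c == b else c) for u, c in community.items()}
            q = modularity(users, sim, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:  # no merge improves Q any further
            return community, best_q
        community = best_merge


# Toy directed network: a, b, c follow p and q; d, e, f follow r and s.
toy = {
    "a": {"p", "q"}, "b": {"p", "q"}, "c": {"p", "q"},
    "d": {"r", "s"}, "e": {"r", "s"}, "f": {"r", "s"},
}
groups, q_value = greedy_communities(toy)
print(groups, q_value)
```

In this toy network no two ordinary users follow each other, yet users who follow the same accounts are still grouped together, which is the motivation the abstract gives for using a similarity measure instead of the raw follow links. The naive merge loop recomputes the modularity for every candidate pair and is only suitable for very small examples.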

CLC number: