ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (2): 522-532. doi: 10.7544/issn1000-1239.2015.20131273

• Information Processing •

Research on the Growth Patterns of Microblog User Characteristics

Yuan Weiguo1,2, Liu Yun1

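  1(Beijing Municipal Key Laboratory of Communication and Information Systems (Beijing Jiaotong University), Beijing 100044); 2(Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190) (10111029@bjtu.edu.cn)
  • Published: 2015-02-01
  • Funding:
    Supported by the National Natural Science Foundation of China (61172072, 61271308); the Beijing Natural Science Foundation (4112045); the Specialized Research Fund for the Doctoral Program of Higher Education of China (W11C100030); and the Beijing Science and Technology Program (Z121100000312024)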

Growth Law of User Characteristics in Microblog

Yuan Weiguo1,2, Liu Yun1   

  1(Beijing Municipal Key Laboratory of Communication and Information Systems (Beijing Jiaotong University), Beijing 100044); 2(Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
  • Online: 2015-02-01

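Abstract (Chinese version): Using real user data crawled from Sina Weibo, we analyze the growth patterns of three user characteristics: the number of followers, the number of followees, and the number of posts. Overall, all three grow linearly with time, and their growth rates, after rounding to integers, follow a power-law distribution. User characteristics grow mainly in two modes, sustained growth and burst growth; users with burst growth can be further divided, according to the stage at which the burst occurs, into four patterns: early-stage, middle-stage, late-stage, and step-like growth. Using a K-means clustering algorithm based on vector cosine-distance similarity, we cluster the time series of real user characteristics under different rankings and different initial sizes, and count the number of users in each growth pattern. Among users ranked high by growth rate, burst growth dominates, whereas among users ranked high by size, sustained growth dominates. Finally, by analyzing the process of burst growth in follower counts and comparing how the numbers of retweets and comments on a user's posts grow, we propose causes for burst growth in a user's follower count.

Keywords: microblog, growth pattern, cosine similarity, K-means clustering algorithm, time series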

Abstract: Based on real data crawled from Sina Microblog, this paper analyzes the growth of three user characteristics: the numbers of followers, friends, and statuses. All three increase linearly with time overall, and their growth rates, rounded to integers, obey a power-law distribution. These characteristics grow mainly in sustainable and explosive patterns. Users with the explosive growth pattern can be further divided into four categories by the stage at which the growth occurs: early-stage, middle-stage, later-stage, and step-stage growth patterns. The time series of actual user characteristics, grouped by different sorting methods and initial scales, are clustered using a K-means algorithm based on vector cosine similarity, and the number of users in each growth pattern is counted. Users with higher growth rates are mainly in the explosive growth pattern, while users with higher initial numbers tend to be in the sustainable growth pattern. Finally, by analyzing the explosive growth process of the number of followers and comparing the growth of the numbers of retweets and comments on a user's statuses, the paper proposes reasons for the explosive growth of a user's number of followers.

Key words: Microblog, growth patterns, cosine similarity, K-means clustering, time series
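
The clustering step named in the abstract, K-means with cosine similarity over user time series, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each user is represented by the per-interval increments of a cumulative count (for example, followers), that rows are scaled to unit length so only the shape of the growth curve matters, and that centers are seeded by a deterministic farthest-first rule. The function name `cosine_kmeans` and the toy data are hypothetical.

```python
import numpy as np

def cosine_kmeans(series, k, n_iter=100):
    """Cluster rows of `series` by cosine similarity (spherical K-means sketch).

    Assumption: each row is one user's vector of per-interval increments.
    """
    x = np.asarray(series, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)  # unit rows; all-zero rows stay zero

    # Farthest-first seeding (an assumption, chosen for determinism):
    # start from row 0, then repeatedly add the row least similar to its
    # most similar already-chosen center.
    chosen = [0]
    for _ in range(1, k):
        max_sim = (x @ x[chosen].T).max(axis=1)
        chosen.append(int(max_sim.argmin()))
    centers = x[chosen].copy()

    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # For unit vectors, the dot product is the cosine similarity.
        new_labels = (x @ centers.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            mean = members.mean(axis=0)
            length = np.linalg.norm(mean)
            if length > 0:
                centers[j] = mean / length  # re-project onto the unit sphere
    return labels, centers

# Toy cumulative follower counts (hypothetical): two users with steady growth,
# two with an early burst, two with a late burst.
increments = np.array([
    [5, 5, 5, 5, 5, 5, 5, 5],
    [50, 50, 50, 50, 50, 50, 50, 50],
    [90, 70, 1, 1, 1, 1, 1, 1],
    [30, 20, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 80, 95],
    [0, 0, 0, 0, 0, 0, 20, 30],
], dtype=float)
counts = np.hstack([np.full((6, 1), 100.0), 100.0 + np.cumsum(increments, axis=1)])

labels, _ = cosine_kmeans(np.diff(counts, axis=1), k=3)
print(labels)
```

Because rows are normalized before clustering, a user who gains 5 followers per interval and one who gains 50 per interval fall into the same sustained-growth cluster; the count of rows per label then gives the number of users in each growth pattern.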

CLC Number: