ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2015, Vol. 52, Issue 11: 2527-2534. doi: 10.7544/issn1000-1239.2015.20140804

• Artificial Intelligence •

Study on We Media Account Detection in Microblog

Liu Jinbao, Sheng Dakui, Zhang Ming

  1. (Institute of Network Computing and Information Systems, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871) (shengdakui@pku.edu.cn)
  • Published / online: 2015-11-01
  • Funding: National Natural Science Foundation of China (61272343); Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China (20130001110032)

Abstract: As a product of the Web 2.0 era and an emerging form of social media, microblogging plays an increasingly important role in people's daily lives. It serves not only as a bridge for communication and information sharing among users but also as an important way to acquire information. Microblogs are both a social network and an information medium, and within this ecosystem, we media accounts, which have only media attributes and are used to publish information to the public, have been growing rapidly. In this paper, we introduce the problem of we media account detection in microblogs for the first time and explain why it matters for analyzing the microblog ecosystem, modeling user interests, and mining high-quality content. We then propose a supervised detection method that combines three categories of features: account profile, posting behavior, and posting content. Experimental results on Sina Weibo show that: 1) we media accounts differ clearly from ordinary microblog accounts, mainly in the regularity of their posting behavior and in the distribution of the topics they post about; 2) the three proposed feature sets detect we media accounts effectively and complement one another, achieving a prediction accuracy of 96.71%.

Key words: microblog, we media account, classification, support vector machine (SVM), supervised learning
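The abstract and keywords describe a supervised SVM classifier over profile, posting-behavior, and posting-content features, but they do not specify the individual features or the training setup. The Python sketch below is only a rough illustration under stated assumptions and is not the authors' method. The coefficient of variation of the gaps between posts (`interval_cv`) stands in for "posting regularity". The normalized Shannon entropy of per-topic post counts (`topic_entropy`) stands in for "topic distribution". Profile attributes are passed in as already-numeric values. The linear SVM is trained with Pegasos-style subgradient descent on the hinge loss, a swapped-in optimizer with hypothetical defaults `lam=0.1` and `epochs=50`, after z-score standardization. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def interval_cv(post_times):
    """Hypothetical behavior feature: coefficient of variation of the gaps
    between consecutive posts (0.0 = perfectly evenly spaced posting)."""
    ts = sorted(post_times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)


def topic_entropy(topic_counts):
    """Hypothetical content feature: Shannon entropy of the per-topic post
    counts, normalized to [0, 1] by log(number of topics)."""
    total = sum(topic_counts)
    if total == 0 or len(topic_counts) < 2:
        return 0.0
    probs = [c / total for c in topic_counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(topic_counts))


def account_features(profile_values, post_times, topic_counts):
    """Concatenate the three feature groups (profile, behavior, content)."""
    return list(profile_values) + [interval_cv(post_times), topic_entropy(topic_counts)]


def standardize(rows):
    """Z-score each column; constant columns get a scale of 1 to avoid division by zero."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    return scaled, mus, sds


def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    No bias term (features are standardized); labels must be +1 or -1.
    Examples are visited in a fixed order instead of sampled at random."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1.0:  # hinge loss is active: move toward y * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, mus, sds, raw_row):
    """Apply the training-set scaling, then return +1 (we media) or -1 (ordinary)."""
    z = [(v - m) / s for v, m, s in zip(raw_row, mus, sds)]
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) >= 0 else -1
```

Standardizing first lets the sketch omit a bias term. A real system would more likely use an established SVM library and tune the regularization strength by cross-validation.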
