ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (11): 2517-2526. doi: 10.7544/issn1000-1239.2015.20148133

• Artificial Intelligence •

EMTM: A Method for Experts Mining in Micro-Blog with Topic-Level

Zhang Lamei1,2,3,4, Huang Weijing3,4, Chen Wei3,4, Wang Tengjiao3,4, Lei Kai1,2

  1. Shenzhen Key Laboratory for Cloud Computing Technology & Applications (Peking University), Shenzhen, Guangdong 518055
  2. School of Electronics and Computer Engineering, Peking University, Shenzhen, Guangdong 518055
  3. Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871
  4. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
  (citlmzhang@163.com)
  • Published: 2015-11-01
  • Online: 2015-11-01
  • Supported by: National High Technology Research and Development Program of China (863 Program) (2012AA011002); National Natural Science Foundation of China (61300003); Research Fund for the Doctoral Program of Higher Education of China (20130001120001)

Abstract: Micro-blogs have become one of the most popular platforms for accessing and sharing information. Over years of growth, they have attracted many experts with authoritative professional backgrounds, and mining the experts related to a given topic supports further work such as user recommendation and public opinion analysis. In a micro-blog, an expert on a topic is a user who has high influence on that topic because they hold reliable professional knowledge or skills related to it. High influence is therefore a necessary condition for being an expert. Influence is a subjective notion and needs to be quantified objectively; in micro-blogs, the probability of being retweeted is one of the most important indicators of a user's influence, so highly influential users can be found by analyzing retweet data. However, users retweet in two different ways: “topic-sensitive retweet” and “following retweet”. As a result, a user who is influential because they are retweeted with high probability is not necessarily an expert. This paper proposes EMTM (experts mining topic model), a probabilistic generative model based on topic models that mines topic-level experts by distinguishing the two kinds of retweet behavior. Model inference uses Gibbs sampling. Comparative experiments on a real Sina Weibo data set show that EMTM effectively mines topic-level experts.

Key words: experts, topic, micro-blog, retweet behavior, probability model
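The abstract's central point, that a raw retweet measure can mistake a widely followed user for a topic expert, can be illustrated with a toy calculation. The sketch below is not EMTM: it assumes each retweet already carries a known behavior label, whereas in the paper the behavior type is distinguished by the model and inferred with Gibbs sampling. All user names, topics, counts, and the helper `retweet_rates` are hypothetical, and retweets per post is used as a simple stand-in for the probability of being retweeted.

```python
from collections import Counter

# Toy data; every name and number here is made up for illustration.
# Each post is (author, topic).
posts = [
    ("celebrity", "medicine"), ("celebrity", "medicine"),
    ("doctor", "medicine"), ("doctor", "medicine"),
]

# Each retweet is (original_author, topic, behavior).
# Assumption: the behavior label is known. EMTM has to infer it.
retweets = [
    ("celebrity", "medicine", "following"),
    ("celebrity", "medicine", "following"),
    ("celebrity", "medicine", "following"),
    ("celebrity", "medicine", "topic"),
    ("doctor", "medicine", "topic"),
    ("doctor", "medicine", "topic"),
    ("doctor", "medicine", "following"),
]


def retweet_rates(posts, retweets, behaviors):
    """Retweets per post for each (author, topic), counting only the given behaviors."""
    n_posts = Counter(posts)
    n_retweets = Counter((a, t) for a, t, b in retweets if b in behaviors)
    return {key: n_retweets[key] / n for key, n in n_posts.items()}


# Influence measured on all retweets versus topic-sensitive retweets only.
raw = retweet_rates(posts, retweets, {"topic", "following"})
topical = retweet_rates(posts, retweets, {"topic"})

print("top by all retweets:", max(raw, key=raw.get)[0])
print("top by topic-sensitive retweets:", max(topical, key=topical.get)[0])
```

In this toy data the celebrity ranks first when all retweets count, while the doctor ranks first once following retweets are excluded, which is the distinction the abstract says EMTM is designed to capture.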

CLC Number: