ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (2): 306-318. doi: 10.7544/issn1000-1239.2019.20170746

• Artificial Intelligence •

基于信息融合的概率矩阵分解链路预测方法

王智强1,梁吉业1,2,李茹1,2   

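  1. 1(School of Computer & Information Technology, Shanxi University, Taiyuan 030006); 2(Key Laboratory of Computational Intelligence & Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006) (zhiq.wang@163.com)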
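  • Published: 2019-02-01
  • Supported by: 
    National Natural Science Foundation of China (U1435212, 61432011, 61876103); Key Research and Development Program of Shanxi Province (201603D111014); 1331 Engineering Project of Shanxi Province; Youth Science and Technology Foundation of Shanxi Province (201701D221098)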

Probability Matrix Factorization for Link Prediction Based on Information Fusion

Wang Zhiqiang1, Liang Jiye1,2, Li Ru1,2   

  1. 1(School of Computer & Information Technology, Shanxi University, Taiyuan 030006); 2(Key Laboratory of Computational Intelligence & Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online: 2019-02-01

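Abstract: Social-information networks such as Weibo and Twitter are a typical kind of network big data: they contain not only the complex network structure among users but also the large volume of microblog posts and Tweets that those users publish. Most existing link prediction algorithms use only one kind of information, either the network topology or non-topological information, and methods that effectively fuse topological and non-topological information in social-information networks are still lacking. To address this, a link prediction method that fuses topic-similarity information is proposed, starting from the topics of users in a social-information network. First, a topic representation is extracted for each user from the user's text content, and a topic similarity between users is defined. Next, a sparse user topic-similarity network is built from this similarity. The topic-similarity network and the following/followed network among users are then fused within a unified probabilistic matrix factorization framework, and the users' latent-feature representations and the network link parameters are obtained by learning. Finally, within this framework, the link likelihood between users is computed from the latent-feature representations and the link parameters. The proposed model provides a general strategy and learning method for fusing information from multiple networks. Experiments on four Weibo and Twitter datasets that contain both network structure and text show that the proposed fused probabilistic matrix factorization method is more effective than other link prediction methods.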

Keywords: social-information network, link prediction, probabilistic matrix factorization, fusion model, network data analysis

Abstract: As a typical kind of network big data, social-information networks such as Weibo and Twitter contain both a complex network structure among users and a large volume of microblog/Tweet text published by those users. Most existing link prediction methods use only the network topology or only non-topological information, and effective methods that fuse topological and non-topological information in social-information networks are still lacking. A link prediction method is therefore proposed that fuses users' topic similarity, taking the perspective of users' topics. The method proceeds in three steps. First, a topic representation is extracted for each user from the user's text, a topic similarity between users is defined on these representations, and a sparse topic-similarity network is constructed from it. Second, the following/followed network and the topic-similarity network are fused in a unified probabilistic matrix factorization framework, from which the latent-feature representations of the network nodes and the link parameters are learned. Finally, the link probability between nodes is computed from the learned latent-feature representations and link parameters. The proposed approach provides a general strategy and learning method for fusing information from multiple networks. Link prediction experiments on four real-world Weibo and Twitter datasets, each containing both network structure and text, show that the proposed method is more effective than the other link prediction methods compared.

Key words: social-information network, link prediction, probability matrix factorization, fusion model, network data analysis
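
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the model equations, so the cosine similarity, the top-k sparsification, the shared-factor objective, and every name and parameter below (`topic_similarity_network`, `fused_pmf`, `link_scores`, `alpha`, `lam`, `k`) are assumptions made for illustration. The sketch relies only on the standard result that, under Gaussian noise and Gaussian priors, MAP estimation in probabilistic matrix factorization reduces to minimizing a regularized squared error.

```python
import numpy as np


def topic_similarity_network(theta, k=2):
    """Build a sparse, symmetric topic-similarity network.

    theta: (n_users, n_topics) array of nonzero topic vectors, assumed to be
    given (for example by a topic model). Cosine similarity and top-k
    sparsification are assumptions of this sketch.
    """
    unit = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)               # no self-links
    top = np.argsort(-sim, axis=1)[:, :k]    # k most similar users per row
    rows = np.arange(sim.shape[0])[:, None]
    sparse = np.zeros_like(sim)
    sparse[rows, top] = sim[rows, top]
    return np.maximum(sparse, sparse.T)      # keep an edge if either side chose it


def fused_pmf(follow, topic_net, dim=4, alpha=0.5, lam=0.01, lr=0.05,
              epochs=500, seed=0):
    """Jointly factorize two networks with shared user factors U.

    Assumed objective (hypothetical, for illustration only):
        0.5*||follow - U V^T||^2 + 0.5*alpha*||topic_net - U Z^T||^2
        + 0.5*lam*(||U||^2 + ||V||^2 + ||Z||^2)
    minimized by plain batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n = follow.shape[0]
    U = 0.1 * rng.standard_normal((n, dim))
    V = 0.1 * rng.standard_normal((n, dim))
    Z = 0.1 * rng.standard_normal((n, dim))
    losses = []
    for _ in range(epochs):
        err_f = U @ V.T - follow       # residual on the follow network
        err_t = U @ Z.T - topic_net    # residual on the topic network
        losses.append(
            0.5 * np.sum(err_f ** 2)
            + 0.5 * alpha * np.sum(err_t ** 2)
            + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(Z ** 2))
        )
        grad_U = err_f @ V + alpha * (err_t @ Z) + lam * U
        grad_V = err_f.T @ U + lam * V
        grad_Z = alpha * (err_t.T @ U) + lam * Z
        U -= lr * grad_U
        V -= lr * grad_V
        Z -= lr * grad_Z
    return U, V, Z, losses


def link_scores(U, V):
    """Score every ordered user pair; a higher score suggests a likelier link."""
    return U @ V.T
```

In this sketch `alpha` controls how strongly the topic-similarity network influences the shared factors `U`; with `alpha=0` it reduces to factorizing the follow network alone, which gives a simple baseline for checking whether the fused information helps on a given dataset.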
