高级检索

    基于信息融合的概率矩阵分解链路预测方法

    Probability Matrix Factorization for Link Prediction Based on Information Fusion

    • 摘要: 作为一种典型的网络大数据,社交信息网络如微博、Tweeter等,不仅包含用户间复杂的网络结构,而且包含大量用户所发表的微博/Tweet信息.现有链路预测算法大多只利用单方面的网络拓扑信息或非拓扑信息,仍然缺乏有效融合社交信息网络中拓扑与非拓扑信息的链路预测方法.为此,从社交信息网络中用户的主题角度出发,提出一种融合主题相似信息的链路预测方法.首先基于用户文本内容抽取用户的主题表示,并定义用户间的主题相似度;然后基于用户主题相似度,构建了一种用户主题相似稀疏网络;进一步将用户主题相似网络与用户间关注/被关注网络融合在统一的概率矩阵分解框架下,通过学习获得用户的潜在特征表示和网络链路参数;最终在此概率矩阵分解框架下,基于用户的潜在特征表示和链路参数计算得到用户间的链路可能性.所提出的模型提供了一种融合多种网络信息的通用策略和学习方法.实验在包含网络结构与文本信息的4组微博与推特数据集中显示,所提出的融合概率矩阵分解链路方法相比其他链路预测方法更有效.

       

      Abstract: As one kind of typical network big data, social-information networks such as Weibo and Twitter include both the complex network structure among users and rich microblog/Tweet information published by users. It is notable that most of the existing methods only make use of the network topological information or the non-topological information for link prediction, but there is still a lack of effective methods by fusing the topological information or non-topological information in social-information networks. A link prediction method is proposed from the perspective of users’ topic by fusing users’ topic similarity in social-information networks. The method goes in accordance with the following sequence: firstly, a topic similarity between users based on users’ topic representation is defined, followed by which a topic similarity-based sparse network is constructed; secondly, the information of the following/followed network and the topic similarity-based network are fused into a unified framework of probabilistic matrix factorization, based on which the latent-feature representation of the network nodes and the linking relation parameters are obtained; finally, the linking probability between network nodes is calculated based on the obtained latent-feature representation and linking relation parameters. The proposed approach provides a general modeling strategy fusing multi-network information and a learning-based solution. Link prediction experiments are conducted on four real network datasets, i.e. Twitter and Weibo. The experimental results demonstrate that the proposed method is more effective than others.

       

    /

    返回文章
    返回