ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2015, Vol. 52 ›› Issue (2): 423-436.doi: 10.7544/issn1000-1239.2015.20140221

所属专题: 2015大数据管理

• 软件技术 • 上一篇    下一篇

基于低秩和稀疏矩阵分解的多源融合链接预测算法

刘冶1,朱蔚恒2,潘炎3,印鉴1   

  1. 1(中山大学信息科学与技术学院 广州 510006); 2(暨南大学信息科学技术学院 广州 510632); 3(中山大学软件学院 广州 510006) (jourkliu@163.com)
  • 出版日期: 2015-02-01
  • 基金资助: 
    基金项目:国家自然科学基金项目(61033010,61272065,61370021,61472453,U1401256);广东省自然科学基金项目(S2011020001182,S2012010009311);广东省科技计划项目(2011B040200007,2012A010701013)

Multiple Sources Fusion for Link Prediction via Low-Rank and Sparse Matrix Decomposition

Liu Ye1,Zhu Weiheng2,Pan Yan3, Yin Jian1   

  1. 1(School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006); 2(College of Information Science Technology, Jinan University, Guangzhou 510632); 3(School of Software, Sun Yat-sen University, Guangzhou 510006)
  • Online: 2015-02-01

摘要: 近年来,链接预测成为社会网络和其他复杂网络链接挖掘中的热门研究领域.在链接预测问题中,经常会存在用来提高预测效果的附加数据信息源,这些数据可以用于预测网络中的链接是否存在.在所有的数据源中,最主要的数据源在链接预测中起到最重要的作用.因此,设计具备健壮性的算法用于充分利用所有数据源的信息来进行链接预测十分重要,算法还需要平衡主数据源和附加数据源的关系,使得链接预测能够获得更好的效果.同时,传统基于拓扑结构计算的无监督算法大多数通过计算网络中节点间的评分值来解决预测链接存在可能性的问题,这些方法能够获得有效的结果.在链接预测方法中,最关键的一步是构建准确的输入矩阵数据.由于许多真实世界数据集存在噪声,这导致降低了大多数链接预测模型的效果.提出了一种新的链接预测方法,通过多个数据源的融合,兼顾地利用了主数据源的信息和其他附加数据源的信息.接着,主数据源和其他附加数据源被用于构建一个低噪声且更准确的矩阵,而新的矩阵被用于作为传统无监督拓扑链接预测算法的输入.根据在多个真实世界数据上的测试结果,在多源数据集上进行对比实验,提出的基于低秩和稀疏矩阵分解的多源融合链接预测算法相对于基准算法能够获得更好的效果.

关键词: 低秩约束, 矩阵分解, 多源融合, 链接预测, 机器学习

Abstract: In recent years, link prediction is a popular research field of link mining in social network and other complex networks. In the problem of link prediction, there usually exist multiple additional sources of information used to improve the performance of predicting the probability of the links in network. Among all the sources, the major source of all the information sources usually plays the most significant role on predicting. It is important to design a robust algorithm to make full use of all the sources and balance the major source and additional sources to get better link prediction result. Meanwhile, the traditional unsupervised algorithms based on topological calculation are mostly useful methods to calculate the scores for solving link prediction problem. In the approach of link prediction methods, the most important step is to construct a precise input seed matrix. Since many real-world network data may be noisy, which decreases the accuracy of most link prediction methods. In this paper, we propose a novel method with the multiple additional sources which take advantage of the leading information seed source matrix and others. And then, the seed source matrix is combined with other sources to construct a better matrix with lower noise and more precise structure than the seed matrix. The new matrix is used as the input matrix to traditional unsupervised topological algorithm. Experiment results show that the new proposed method can get better performance of the link prediction problem in different kinds of multiple sources real-world datasets.

Key words: low-rank constraint, matrix decomposition, multiple sources fusion, link prediction, machine learning

中图分类号: