ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (3): 623-634. doi: 10.7544/issn1000-1239.2019.20170961

• Artificial Intelligence •

A Link Prediction Model Based on Hierarchical Information Granular Representation for Attributed Graphs

Luo Sheng1,2, Miao Duoqian1,2, Zhang Zhifei1,3, Zhang Yuanjian1,2, Hu Shengdan1,2

  1(Department of Computer Science and Technology, Tongji University, Shanghai 201804); 2(Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai 201804); 3(State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023) (tjluosheng@gmail.com)
  • Published online: 2019-03-01
  • Funding: National Natural Science Foundation of China (61673301, 61502259); Open Research Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2017B22)

Abstract: As network graph data with node attributes accumulate, the relations between node attributes and node linkages become increasingly complex. This complexity poses a series of challenges for link prediction in complex networks. A key difficulty is the inconsistency among data from different sources: the latent linkages induced by node attributes can disagree with the linkages actually observed in the network topology. This inconsistency directly reduces the accuracy and precision of link prediction between node pairs. To handle this inconsistency and fuse the heterogeneous data, we draw on the idea of granular computing and model the original data at multiple levels of granular representation. Based on these representations, we search for the optimal granular structure, which eliminates the inherent inconsistency of the data as far as possible. First, we define the multi-level granular representation of the data and the relations between granular layers. Next, we construct a log-likelihood statistical model of the observed link data and regularize it with constraints derived from the granular relations, combining the characteristics of the data at different granular layers. Finally, we train the model on the multi-source data and use the learned model to predict the link probability between node pairs. Experiments show that granular representation substantially reduces the inconsistency among multi-source data. They also show that the statistical model regularized by these granular relations outperforms state-of-the-art link prediction methods and effectively improves link prediction accuracy on attributed graphs.

Key words: granular representation learning, granular computing, attributed graph, link prediction, data fusion
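The abstract describes the model only at a high level: a log-likelihood model of observed links, regularized by constraints from granular relations and then used to score node pairs. The Python sketch below is not the authors' model. It is a generic illustration under stated assumptions. It uses a latent-space logistic link model p(i~j) = σ(z_i·z_j + b) and a Bernoulli log-likelihood over all node pairs, with unobserved pairs treated as non-links. It also adds a hypothetical granule penalty λ Σ_i ||z_i − c_g(i)||². This penalty pulls each node embedding toward the centroid of a coarser granule that the caller supplies. The names `GranularLinkModel`, `granule_of` and `lam` are illustrative only.

```python
import math
import random
from itertools import combinations


def log_sigmoid(s):
    """Numerically stable log(sigmoid(s))."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class GranularLinkModel:
    """Toy latent-space link model, p(i~j) = sigmoid(z_i . z_j + b).

    Hypothetical regularizer (an assumption, not from the paper): each node
    embedding is pulled toward the mean embedding of its coarse granule.
    """

    def __init__(self, n_nodes, granule_of, dim=2, lam=0.1, seed=0):
        rng = random.Random(seed)
        self.z = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_nodes)]
        self.b = 0.0
        self.granule_of = granule_of  # node index -> granule id (hypothetical input)
        self.lam = lam
        self.pairs = list(combinations(range(n_nodes), 2))  # all pairs with i < j

    def score(self, i, j):
        return sum(a * c for a, c in zip(self.z[i], self.z[j])) + self.b

    def predict(self, i, j):
        """Predicted link probability for the node pair (i, j)."""
        return sigmoid(self.score(i, j))

    def _centroids(self):
        # Mean embedding of every granule.
        sums, counts = {}, {}
        for i, g in enumerate(self.granule_of):
            acc = sums.setdefault(g, [0.0] * len(self.z[i]))
            for k, v in enumerate(self.z[i]):
                acc[k] += v
            counts[g] = counts.get(g, 0) + 1
        return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

    def objective(self, edges):
        """Bernoulli log-likelihood over all pairs minus the granule penalty."""
        ll = 0.0
        for i, j in self.pairs:
            s = self.score(i, j)
            ll += log_sigmoid(s) if (i, j) in edges else log_sigmoid(-s)
        cent = self._centroids()
        pen = sum((v - c) ** 2
                  for i, g in enumerate(self.granule_of)
                  for v, c in zip(self.z[i], cent[g]))
        return ll - self.lam * pen

    def fit(self, edges, lr=0.05, epochs=500):
        """Plain full-batch gradient ascent on objective(edges)."""
        for _ in range(epochs):
            grad = [[0.0] * len(zi) for zi in self.z]
            grad_b = 0.0
            for i, j in self.pairs:
                r = (1.0 if (i, j) in edges else 0.0) - self.predict(i, j)  # residual y - p
                grad_b += r
                for k in range(len(self.z[i])):
                    grad[i][k] += r * self.z[j][k]
                    grad[j][k] += r * self.z[i][k]
            cent = self._centroids()
            for i, g in enumerate(self.granule_of):
                for k in range(len(self.z[i])):
                    grad[i][k] -= 2.0 * self.lam * (self.z[i][k] - cent[g][k])
            for i in range(len(self.z)):
                for k in range(len(self.z[i])):
                    self.z[i][k] += lr * grad[i][k]
            self.b += lr * grad_b
        return self


# Usage: two 3-node cliques, each treated as one (hypothetical) granule.
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}
model = GranularLinkModel(6, granule_of=[0, 0, 0, 1, 1, 1]).fit(edges)
print(model.predict(0, 1), model.predict(0, 3))
```

Each centroid is the mean of its own granule, so the centroid terms in the penalty's derivative sum to zero. Treating the centroids as constants during a gradient step therefore still yields the exact gradient. On the toy graph, pairs inside a granule should end up with a higher predicted probability than pairs that cross granules.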
