ISSN 1000-1239 CN 11-1777/TP

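Journal of Computer Research and Development, 2017, Vol. 54, Issue (8): 1795-1803. doi: 10.7544/issn1000-1239.2017.20170172

Special topic: 2017 Special Issue on Frontier Advances in Artificial Intelligence

Section: Artificial Intelligence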

Link Prediction Method Based on Clustering and Decision Tree

Yang Niya1, Peng Tao1,2, Liu Lu1

  1 (College of Computer Science and Technology, Jilin University, Changchun 130012); 2 (Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012) (yangny15@mails.jlu.edu.cn)
  • Published online: 2017-08-01
  • Funding:
    National Natural Science Foundation of China (60903098); Industrial Technology Research and Development Special Project of the Jilin Provincial Development and Reform Commission (2015Y055); Key Science and Technology Research Project of the Jilin Provincial Department of Science and Technology (20150204040GX)

Abstract: Link prediction is one of the fundamental problems in data mining. Because networks are complex and data are diverse, predicting links between different types of data in heterogeneous networks from the network structure and the available information has become increasingly complicated. For link prediction in bi-typed heterogeneous information networks, this paper proposes CDTLinks, a link prediction method based on clustering and decision trees. Each of the two object types in the network serves as the features of the other, which yields a feature representation for every object, and the two types are then clustered separately. Three heuristic rules are proposed for constructing decision trees on bi-typed heterogeneous networks, and the branches of each tree are selected by information gain. Finally, the clustering results and the decision tree model together determine whether a link exists between any two nodes of different types. In addition, the paper defines potential link nodes and introduces the concept of a number of layers, which reduces the running time while improving accuracy. CDTLinks is validated on the DBLP and AMiner datasets, and the experimental results show that it performs link prediction effectively in bi-typed heterogeneous networks.
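The abstract only outlines CDTLinks. Its three heuristic rules, the potential link nodes and the layer parameter are not specified here. The Python sketch below illustrates the generic pipeline the abstract describes, under several labeled assumptions. The author/venue data are a hypothetical toy network. Each author is represented as a 0/1 vector over venues and each venue as a 0/1 vector over authors. Plain k-means (k = 2) stands in for the unspecified clustering algorithm. A one-level ID3-style information-gain split (a decision stump) replaces the paper's heuristic tree construction. All function and variable names are illustrative and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical toy bi-typed network (not from the paper): author -> venues.
EDGES = {
    "a1": {"KDD", "ICDM"},
    "a2": {"KDD"},
    "a3": {"ICDM", "KDD"},
    "a4": {"CVPR", "ICCV"},
    "a5": {"CVPR"},
}


def mutual_features(edges):
    """Each author is a 0/1 vector over venues; each venue a 0/1 vector over authors."""
    authors = sorted(edges)
    venues = sorted({v for vs in edges.values() for v in vs})
    a_feat = {a: [int(v in edges[a]) for v in venues] for a in authors}
    v_feat = {v: [int(v in edges[a]) for a in authors] for v in venues}
    return a_feat, v_feat


def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, iters=20):
    """k-means with deterministic farthest-first initialisation; returns name -> cluster id."""
    names = list(vectors)
    centroids = [list(vectors[names[0]])]
    while len(centroids) < k:
        far = max(names, key=lambda n: min(dist2(vectors[n], c) for c in centroids))
        centroids.append(list(vectors[far]))
    assign = {}
    for _ in range(iters):
        new = {n: min(range(k), key=lambda c: dist2(vectors[n], centroids[c])) for n in names}
        if new == assign:  # converged: assignments unchanged
            break
        assign = new
        for c in range(k):
            members = [vectors[n] for n in names if assign[n] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 gain of splitting the samples on categorical attribute index `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


a_feat, v_feat = mutual_features(EDGES)
a_cl, v_cl = kmeans(a_feat, 2), kmeans(v_feat, 2)

# One sample per (author, venue) pair; attributes are
# (author cluster, venue cluster, cluster pair); label is True if the edge is observed.
pairs = [(a, v) for a in a_cl for v in v_cl]
rows = [(a_cl[a], v_cl[v], (a_cl[a], v_cl[v])) for a, v in pairs]
labels = [v in EDGES[a] for a, v in pairs]

gains = [information_gain(rows, labels, i) for i in range(3)]
best = max(range(3), key=lambda i: gains[i])

# Each leaf predicts the majority label of the samples sharing its value of the best attribute.
leaves = {}
for row, y in zip(rows, labels):
    leaves.setdefault(row[best], []).append(y)
leaf_label = {key: Counter(ys).most_common(1)[0][0] for key, ys in leaves.items()}


def predict(author, venue):
    row = (a_cl[author], v_cl[venue], (a_cl[author], v_cl[venue]))
    return leaf_label[row[best]]


print(best, predict("a2", "ICDM"), predict("a1", "CVPR"))  # prints: 2 True False
```

In this toy example the cluster-pair attribute wins, with a gain of about 0.61 against about 0.03 and 0.001 for either cluster alone, because links concentrate between matching author and venue communities. The unobserved pair a2–ICDM therefore falls into a link-majority leaf and is predicted as a missing link. A real evaluation would hold out edges rather than score the same pairs that trained the stump.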

Key words: link prediction, clustering, decision tree, heterogeneous information network, heuristic rules

中图分类号: