Abstract:
Link prediction is one of the fundamental problems in data mining. As networks grow more complex and data more diverse, link prediction across different types of data in heterogeneous networks has become increasingly difficult. For link prediction in bi-typed heterogeneous information networks, this paper proposes a method based on clustering and decision trees, called CDTLinks. Objects of one type are treated as features of objects of the other type, and the two types are then clustered separately. Three heuristic rules are proposed for constructing decision trees for bi-typed heterogeneous networks, and the branch of the tree with the highest information gain is selected. Finally, the clustering result and the decision tree model are used together to judge whether a link exists between two nodes. In addition, we define the concept of potential link nodes and introduce the number of layers, which can reduce the running time and improve the accuracy. The proposed CDTLinks method is validated on the DBLP and AMiner datasets. The experimental results show that the CDTLinks model can conduct link prediction effectively in bi-typed heterogeneous networks.
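To make the branch-selection step concrete, the following is a minimal sketch of choosing the split with the highest information gain. It is not the CDTLinks implementation: the three heuristic rules are not reproduced here, and the feature names (`author_cluster`, `venue_cluster`), the toy rows, and the helper functions are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Return the feature whose split yields the highest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: each row is a candidate node pair described by the
# cluster ids of its two endpoints (assumed names, not from the paper).
rows = [
    {"author_cluster": 0, "venue_cluster": 0},
    {"author_cluster": 0, "venue_cluster": 1},
    {"author_cluster": 1, "venue_cluster": 0},
    {"author_cluster": 1, "venue_cluster": 1},
]
labels = [1, 1, 0, 0]  # 1 = link exists, 0 = no link

chosen = best_feature(rows, labels)
```

In this toy example, splitting on `author_cluster` separates linked from unlinked pairs completely, so it is the feature selected; splitting on `venue_cluster` leaves each group evenly mixed and yields no gain.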