Abstract:
Heterogeneous information networks, composed of multiple types of objects and links, are ubiquitous in real life. Effective analysis of large-scale heterogeneous information networks therefore poses an interesting but critical challenge. Learning from labeled and unlabeled data via semi-supervised classification can help extract knowledge about the hidden network structure. However, although semi-supervised learning on homogeneous networks has been studied for decades, classification on heterogeneous networks has been explored only recently. In this paper, we consider the semi-supervised classification problem on heterogeneous information networks with an arbitrary schema consisting of a number of object and link types. By applying graph regularization to preserve consistency over the relation graph of each link type separately, we develop a classifying function that is sufficiently smooth with respect to the intrinsic structure collectively revealed by the labeled and unlabeled points. We propose an iterative framework for heterogeneous information networks in which label information is propagated from labeled nodes to adjacent nodes until a steady state is reached. We then infer the class memberships of unlabeled data from those of labeled data according to their proximity in the network. Experiments on the real DBLP data set show that our approach outperforms classic semi-supervised learning methods.
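As a rough illustration of the iterative label spreading described above, the following is a minimal sketch and not the exact update rule of this paper. It assumes one symmetrically normalized relation matrix per link type, a single hypothetical trade-off weight `lam` shared by all link types, and a hypothetical weight `alpha` on the known labels. The names `normalize` and `propagate` and the toy paper/author/venue network are invented for this example.

```python
import numpy as np

def normalize(R):
    """Scale R[u, v] by 1 / sqrt(rowsum[u] * colsum[v]); empty rows/columns stay zero."""
    row = R.sum(axis=1, keepdims=True)
    col = R.sum(axis=0, keepdims=True)
    scale = np.sqrt(row * col)
    return np.divide(R, scale, out=np.zeros_like(R), where=scale > 0)

def propagate(relations, labels, alpha=0.1, lam=1.0, max_iter=200, tol=1e-6):
    """Spread labels over every relation graph until the scores stop changing.

    relations: {(type_a, type_b): array of shape (n_a, n_b)}, one per link type
    labels:    {type: one-hot array of shape (n, K)}, all-zero rows = unlabeled
    """
    # Store each normalized relation in both directions (assumption: links are undirected).
    S = {}
    for (a, b), R in relations.items():
        Sn = normalize(np.asarray(R, dtype=float))
        S[(a, b)] = Sn
        if a != b:
            S[(b, a)] = Sn.T

    F = {t: np.asarray(Y, dtype=float).copy() for t, Y in labels.items()}
    for _ in range(max_iter):
        new_F = {}
        for t, Y in labels.items():
            acc = alpha * np.asarray(Y, dtype=float)  # pull towards known labels
            weight = alpha
            for (a, b), Sn in S.items():
                if a == t:
                    acc = acc + lam * (Sn @ F[b])     # pull towards neighbours' scores
                    weight += lam
            new_F[t] = acc / weight
        delta = max(np.abs(new_F[t] - F[t]).max() for t in F)
        F = new_F
        if delta < tol:                               # steady state reached
            break
    return {t: f.argmax(axis=1) for t, f in F.items()}

# Toy network: 4 papers, 2 authors, 2 venues, 2 classes.
paper_author = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
paper_venue = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
labels = {
    "paper": np.array([[1, 0], [0, 0], [0, 0], [0, 1]]),  # only papers 0 and 3 are labeled
    "author": np.zeros((2, 2)),
    "venue": np.zeros((2, 2)),
}
pred = propagate(
    {("paper", "author"): paper_author, ("paper", "venue"): paper_venue}, labels
)
```

In this toy network the two labeled papers pass their classes to the authors and venues they link to, and from there to the unlabeled papers, so objects of every type receive a class even though only papers carry labels. The sketch caps the number of iterations rather than assuming convergence for arbitrary weights.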