Graph Regularized Semi-Supervised Learning on Heterogeneous Information Networks

Liu Yufeng (刘钰峰); Li Renfa (李仁发)

doi:10.7544/issn1000-1239.2015.20131147

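Abstract

Abstract (translated from the Chinese): The real world contains a large number of heterogeneous information networks comprising multiple types of objects and links, and mining information and acquiring knowledge from them has become an active research topic. Graph-regularized semi-supervised learning has been widely studied in recent years; however, most existing semi-supervised learning algorithms apply only to homogeneous networks. Based on consistency assumptions over homogeneous and heterogeneous nodes, we propose a regularized classification function for semi-supervised learning on heterogeneous information networks of arbitrary structure and derive its closed-form solution, which is used to predict the classes of unlabeled nodes. We also propose an iterative framework for semi-supervised learning on heterogeneous information networks, in which the information of labeled nodes propagates iteratively to neighboring nodes until a steady state is reached, and we prove that the iterative algorithm converges to the closed-form solution of the regularized classification function. Experiments on the DBLP data set show that the method outperforms classic semi-supervised learning algorithms.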

Abstract: Heterogeneous information networks, composed of multiple types of objects and links, are ubiquitous in real life, and analyzing them effectively at large scale is an interesting and important challenge. Semi-supervised classification, which learns from both labeled and unlabeled data, can help extract knowledge about the hidden network structure. Although semi-supervised learning on homogeneous networks has been studied for decades, classification on heterogeneous networks has been explored only recently. In this paper, we consider the semi-supervised classification problem on heterogeneous information networks with an arbitrary schema consisting of a number of object and link types. By applying graph regularization to preserve consistency separately over the relation graph of each link type, we develop a classifying function that is sufficiently smooth with respect to the intrinsic structure collectively revealed by the labeled and unlabeled points. We propose an iterative framework on heterogeneous information networks in which the information of labeled data spreads iteratively to adjacent nodes until a steady state is reached. We infer the class memberships of unlabeled data from those of labeled data according to their proximities in the network. Experiments on the real DBLP data set show that our approach outperforms classic semi-supervised learning methods.
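The abstract does not state the regularized classification function itself, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a toy network with two node types (papers and authors), normalizes each relation graph separately, and applies the well-known local-and-global-consistency update F ← αSF + (1 − α)Y to the stacked block matrix. All names (`W_pp`, `R_pa`, `mu_pp`, `mu_pa`, `alpha`) and the toy data are hypothetical. The sketch compares the iterative result with the closed-form solution (1 − α)(I − αS)⁻¹Y of this simplified model.

```python
import numpy as np

def normalize_relation(R):
    """Symmetric normalization D_row^{-1/2} R D_col^{-1/2} of one relation graph."""
    row_deg = R.sum(axis=1)
    col_deg = R.sum(axis=0)
    # Isolated nodes get a zero scaling factor instead of a division by zero.
    row_scale = np.where(row_deg > 0, 1.0 / np.sqrt(np.maximum(row_deg, 1e-12)), 0.0)
    col_scale = np.where(col_deg > 0, 1.0 / np.sqrt(np.maximum(col_deg, 1e-12)), 0.0)
    return row_scale[:, None] * R * col_scale[None, :]

def build_propagation_matrix(W_pp, R_pa, mu_pp=0.5, mu_pa=0.5):
    """Stack the separately normalized relation graphs into one block matrix.

    mu_pp + mu_pa <= 1 keeps the spectral radius of S at most 1 (assumed weights).
    """
    n_p, n_a = R_pa.shape
    S = np.zeros((n_p + n_a, n_p + n_a))
    S[:n_p, :n_p] = mu_pp * normalize_relation(W_pp)    # paper-paper links
    S[:n_p, n_p:] = mu_pa * normalize_relation(R_pa)    # paper-author links
    S[n_p:, :n_p] = S[:n_p, n_p:].T                     # author-paper links
    return S

def propagate(S, Y, alpha=0.8, tol=1e-10, max_iter=10_000):
    """Iterate F <- alpha * S F + (1 - alpha) * Y until the update is below tol."""
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1 - alpha) * Y
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F

def closed_form(S, Y, alpha=0.8):
    """Fixed point of the update above: (1 - alpha) * (I - alpha * S)^{-1} Y."""
    n = S.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# Hypothetical toy network: 4 papers, 3 authors, 2 classes.
W_pp = np.array([[0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)   # symmetric paper-paper links
R_pa = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 0, 1],
                 [0, 0, 1]], dtype=float)      # paper-author links

# Node order: papers 0..3, then authors 0..2. Only two papers are labeled.
Y = np.zeros((7, 2))
Y[0, 0] = 1.0   # paper 0 is labeled class 0
Y[3, 1] = 1.0   # paper 3 is labeled class 1

S = build_propagation_matrix(W_pp, R_pa)
F_iter = propagate(S, Y)
F_closed = closed_form(S, Y)
pred = F_iter.argmax(axis=1)   # predicted class per node, papers then authors

print("max |iterative - closed form|:", np.abs(F_iter - F_closed).max())
print("predicted classes:", pred)
```

Because `alpha` is below 1 and the spectral radius of `S` is at most 1, the update is a contraction and the iteration reaches the same fixed point as the linear solve. In this toy model each unlabeled node takes the class of the labeled paper in its own connected component.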
