ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (3): 606-613. doi: 10.7544/issn1000-1239.2015.20131147

• Artificial Intelligence •

Graph Regularized Semi-Supervised Learning on Heterogeneous Information Networks

Liu Yufeng (1), Li Renfa (1, 2)

  (1) College of Information Science and Engineering, Hunan University, Changsha 410082
  (2) Embedded System and Networking Laboratory, Hunan University, Changsha 410082
  (fx_yfliu@163.com)
  • Published (online): 2015-03-01
  • Funding: National Natural Science Foundation of China (61173036); Young Teachers' Growth Program of Hunan University (531107040824)

Abstract: Heterogeneous information networks, composed of multiple types of objects and links, are ubiquitous in real life, and mining knowledge from them has become an active research topic. Effective analysis of large-scale heterogeneous information networks is therefore an important but challenging problem. Semi-supervised classification, which learns from both labeled and unlabeled data, is one way to extract knowledge about the hidden network structure. Graph-regularized semi-supervised learning has been studied extensively in recent years. However, most existing algorithms apply only to homogeneous networks, and classification on heterogeneous networks has only recently begun to be explored. In this paper, we consider semi-supervised classification on heterogeneous information networks with an arbitrary schema consisting of multiple object and link types. Based on a consistency assumption over both same-type (homogeneous) and different-type (heterogeneous) nodes, we apply graph regularization separately to the relation graph of each link type. This yields a regularized classification function that is sufficiently smooth with respect to the intrinsic structure collectively revealed by the labeled and unlabeled nodes. We derive a closed-form solution of this function and use it to predict the classes of unlabeled nodes from their proximity to labeled ones in the network. We further propose an iterative framework in which the information of labeled nodes spreads to adjacent nodes until a steady state is reached. We prove that the iteration converges to the closed-form solution. Experiments on the real-world DBLP data set show that our approach outperforms classic semi-supervised learning methods.
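The abstract claims two properties for the method. The first is a regularized classification function with a closed-form solution. The second is an iterative label-propagation scheme that provably converges to that solution. The sketch below illustrates both, but it rests on assumptions of our own and is not the paper's exact formulation.

- Each link type r contributes a symmetrically normalized relation graph S_r = D_r^{-1/2} W_r D_r^{-1/2}, weighted by a hypothetical λ_r.
- The objective is Σ_r λ_r tr(Fᵀ L_r F) + μ‖F − Y‖², with L_r = P_r − S_r.
- The toy paper/author/venue network, its labels, and all parameter values are invented for illustration.

```python
import numpy as np

def relation_graph(n, edges):
    """Symmetric 0/1 adjacency over all n nodes for ONE link type (hypothetical helper)."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return W

def normalized_relation(W):
    """Return (P, S): S = D^{-1/2} W D^{-1/2}, P = 0/1 diagonal of nodes in this relation.

    P - S is the normalized Laplacian of this relation graph (assumed regularizer).
    """
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return np.diag(mask.astype(float)), S

def closed_form(relations, weights, Y, mu):
    """Minimizer of sum_r lam_r tr(F^T (P_r - S_r) F) + mu ||F - Y||^2."""
    A = mu * np.eye(Y.shape[0])
    for W, lam in zip(relations, weights):
        P, S = normalized_relation(W)
        A += lam * (P - S)
    # Setting the gradient to zero gives (sum_r lam_r L_r + mu I) F = mu Y.
    return np.linalg.solve(A, mu * Y)

def iterate(relations, weights, Y, mu, tol=1e-10, max_iter=10000):
    """Jacobi-style propagation F <- M^{-1} (S F + mu Y) until F stops changing."""
    n = Y.shape[0]
    m = np.full(n, mu)          # diagonal of M = mu I + sum_r lam_r P_r
    S_total = np.zeros((n, n))  # S = sum_r lam_r S_r
    for W, lam in zip(relations, weights):
        P, S = normalized_relation(W)
        m += lam * np.diag(P)
        S_total += lam * S
    F = Y.copy()
    for _ in range(max_iter):
        F_next = (S_total @ F + mu * Y) / m[:, None]
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F

# Hypothetical toy network: papers 0-3, authors 4-6, venues 7-8, two classes.
n = 9
paper_author = relation_graph(n, [(0, 4), (1, 4), (1, 5), (2, 5), (2, 6), (3, 6)])
paper_venue = relation_graph(n, [(0, 7), (1, 7), (2, 8), (3, 8)])
Y = np.zeros((n, 2))
Y[0, 0] = 1.0  # paper 0 labeled class 0
Y[3, 1] = 1.0  # paper 3 labeled class 1

relations, weights, mu = [paper_author, paper_venue], [1.0, 1.0], 1.0
F_closed = closed_form(relations, weights, Y, mu)
F_iter = iterate(relations, weights, Y, mu)
print(np.allclose(F_closed, F_iter))
print(F_closed.argmax(axis=1))  # predicted class per node
```

In this formulation both M − S and M + S are positive definite, so the spectral radius of M⁻¹S is below 1. The update therefore converges to the same F as the direct solve. The sketch uses dense matrices. On a large network the direct solve becomes expensive, while the iterative update needs only sparse matrix products per step.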

Key words: heterogeneous information network, homogeneous information network, semi-supervised learning, regularization framework, clustering
