Regularized Semi-Supervised Multi-Label Learning

Li Yufeng, Huang Shengjun, and Zhou Zhihua

Abstract

Multi-label learning addresses problems in which a single example is associated with multiple class labels simultaneously. Previous multi-label studies usually assume that a large number of labeled training examples is available to obtain good performance. In many real-world applications, however, labeled examples are scarce while unlabeled examples are abundant and readily available. To exploit these unlabeled examples and improve generalization performance, we propose MASS, a novel regularized inductive semi-supervised multi-label learning method. Specifically, in addition to minimizing the empirical risk, MASS employs two regularizers to constrain the final decision function: the first characterizes the classifier's complexity while taking label relatedness into account, and the second requires similar examples to have similar structured multi-label outputs. This leads to a large-scale convex optimization problem. Because the objective function is strongly convex, the efficient alternating optimization algorithm we provide reaches its global optimum at a super-linear convergence rate. Comprehensive experiments on two real-world tasks, webpage categorization and gene functional analysis, with varying numbers of labeled examples demonstrate the effectiveness of MASS.
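The abstract describes MASS only at the level of its objective: an empirical risk on the labeled examples plus a complexity regularizer and a "similar examples, similar outputs" regularizer. To make that structure concrete, here is a minimal NumPy sketch of a simpler, related formulation. It assumes a linear model W (no bias term, for brevity), squared loss, a plain Frobenius-norm complexity penalty, and a graph-Laplacian smoothness penalty built from a dense Gaussian affinity over labeled and unlabeled examples. It minimizes ||X_lab W - Y_lab||_F^2 + lam_complexity ||W||_F^2 + lam_smooth tr((X_all W)^T L (X_all W)). This is not MASS: it ignores label relatedness in the complexity term and uses the closed-form solution of this strongly convex quadratic instead of the alternating optimization described above. All names (`gaussian_laplacian`, `fit_manifold_regularized_multilabel`, `lam_complexity`, `lam_smooth`, `sigma`) are hypothetical.

```python
import numpy as np


def gaussian_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - S over all rows of X.

    Assumption: a dense Gaussian (RBF) affinity S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    which costs O(n^2) memory; a sparse k-NN graph is the usual choice at scale.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # drop self-loops
    D = np.diag(S.sum(axis=1))  # degree matrix
    return D - S


def objective(W, X_lab, Y_lab, X_all, L, lam_complexity, lam_smooth):
    """Squared empirical risk + Frobenius complexity + graph smoothness (illustrative)."""
    risk = np.sum((X_lab @ W - Y_lab) ** 2)
    complexity = np.sum(W ** 2)
    F = X_all @ W  # score vectors for labeled and unlabeled examples
    smoothness = np.trace(F.T @ L @ F)
    return risk + lam_complexity * complexity + lam_smooth * smoothness


def fit_manifold_regularized_multilabel(X_lab, Y_lab, X_unl,
                                        lam_complexity=1.0, lam_smooth=0.1, sigma=1.0):
    """Closed-form minimizer of `objective` for a linear model W of shape (d, q).

    Setting the gradient to zero gives
    (X_lab^T X_lab + lam_complexity I + lam_smooth X_all^T L X_all) W = X_lab^T Y_lab,
    and the left-hand matrix is positive definite whenever lam_complexity > 0.
    """
    X_all = np.vstack([X_lab, X_unl])
    L = gaussian_laplacian(X_all, sigma)
    d = X_all.shape[1]
    A = X_lab.T @ X_lab + lam_complexity * np.eye(d) + lam_smooth * X_all.T @ L @ X_all
    return np.linalg.solve(A, X_lab.T @ Y_lab)


def predict(X, W, threshold=0.0):
    """Return a 0/1 indicator matrix: label j is predicted when its score exceeds threshold."""
    return (X @ W > threshold).astype(int)


# Usage on random data: 10 labeled and 30 unlabeled examples, 4 features, 3 labels in {-1, +1}.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(10, 4))
Y_lab = np.where(rng.normal(size=(10, 3)) > 0, 1.0, -1.0)
X_unl = rng.normal(size=(30, 4))
W = fit_manifold_regularized_multilabel(X_lab, Y_lab, X_unl)
print(predict(X_unl, W).shape)  # (30, 3)
```

The smoothness term equals half the sum over pairs (i, j) of S_ij times the squared distance between the score vectors of examples i and j, so strongly connected examples, labeled or not, are pushed toward similar multi-label outputs. This is the most common graph-based reading of the second regularizer; MASS's own formulation may differ.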