ISSN 1000-1239 CN 11-1777/TP


Semi-Supervised Robust On-Line Clustering Algorithm

Jin Jun and Zhang Daoqiang

  1. (Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016) (junjin@nuaa.edu.cn)
  • Published online: 2008-03-15

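Abstract (Chinese version): This work introduces supervision information into clustering. Building on the previously proposed robust on-line clustering algorithm (ROC), it proposes a semi-supervised robust on-line clustering algorithm (Semi-ROC) that incorporates supervision information given as sample class labels. Semi-ROC outperforms ROC and AddC in both clustering accuracy and robustness. Experimental results on an artificial dataset and on UCI benchmark datasets show that Semi-ROC can effectively use a small amount of supervision information to improve clustering performance and obtain better results. In addition, when noise is added, Semi-ROC is more robust than both of the original on-line clustering algorithms, AddC and ROC.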


Abstract: Semi-supervised learning has recently attracted much attention in the machine learning community. One reason is that many learning tasks have a large supply of unlabeled data but insufficient labeled data, because labeled data are much more expensive to obtain. Semi-supervised learning applies to both clustering and classification; this paper focuses on semi-supervised clustering, in which label-level or instance-level supervision information is used together with unlabeled data to obtain better clustering results. We develop a semi-supervised robust on-line clustering algorithm, Semi-ROC, which introduces supervision information in the form of class labels into the previously proposed robust on-line clustering algorithm (ROC). With this supervision information, Semi-ROC achieves higher clustering accuracy and robustness than ROC and AddC. Experimental results on an artificial dataset and on UCI benchmark datasets show that Semi-ROC can effectively use a small amount of supervision information to enhance clustering performance and significantly improve clustering validity. Moreover, on noisy data, Semi-ROC is more robust than both ROC and AddC.

Key words: on-line clustering, semi-supervised learning, robust, kernel method, machine learning
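The abstract describes injecting class-label supervision into an on-line clustering algorithm but gives no update rules. The following is a rough illustration of that general idea only. It is not Semi-ROC, ROC, or AddC. The sketch assumes a simple "seeding" strategy: one centroid per class is initialized from the labeled samples, and unlabeled samples then arrive one at a time and update only their nearest centroid with an incremental-mean step. The function name `seeded_online_clustering` and all of its parameters are hypothetical. The kernel mapping, the noise-robustness mechanism, and any cluster creation or merging used by the paper's algorithms are not modeled.

```python
import numpy as np


def seeded_online_clustering(labeled_x, labeled_y, stream, n_clusters):
    """Hypothetical label-seeded on-line centroid clustering (illustration only).

    labeled_x:  (m, d) labeled samples
    labeled_y:  (m,) integer class labels in range(n_clusters)
    stream:     iterable of (d,) unlabeled samples, processed one at a time
    Returns the final centroids and the cluster index assigned to each streamed sample.
    """
    labeled_x = np.asarray(labeled_x, dtype=float)
    labeled_y = np.asarray(labeled_y)
    centroids = np.zeros((n_clusters, labeled_x.shape[1]))
    counts = np.zeros(n_clusters)

    # Supervision step (assumed seeding): each centroid starts at the mean of its labeled samples.
    for c in range(n_clusters):
        members = labeled_x[labeled_y == c]
        if len(members) == 0:
            raise ValueError(f"no labeled sample for cluster {c}")
        centroids[c] = members.mean(axis=0)
        counts[c] = len(members)

    assignments = []
    # On-line step: each unlabeled sample moves only its nearest (winning) centroid.
    for x in stream:
        x = np.asarray(x, dtype=float)
        k = int(np.argmin(np.sum((centroids - x) ** 2, axis=1)))
        counts[k] += 1
        # Incremental mean: the winner stays the mean of all samples assigned to it.
        centroids[k] += (x - centroids[k]) / counts[k]
        assignments.append(k)
    return centroids, assignments


# Usage: two labeled samples per class, then three unlabeled samples arrive on-line.
centroids, assignments = seeded_online_clustering(
    [[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1],
    [[0.5, 0.5], [5.5, 5.5], [0, -1]], n_clusters=2)
print(assignments)  # [0, 1, 0]
```

Because of the incremental-mean update, each centroid always equals the mean of its labeled seeds plus the streamed samples assigned to it. This is one simple way a few labels can anchor an on-line clusterer. A robust variant, as the title suggests, would additionally limit the influence of noisy samples on the winning centroid.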