ISSN 1000-1239 CN 11-1777/TP

• Article •

Semi-Supervised Robust On-Line Clustering Algorithm

Jin Jun and Zhang Daoqiang

1. (Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016) (junjin@nuaa.edu.cn)
• Published: 2008-03-15
• Online: 2008-03-15

Abstract: Recently, semi-supervised learning has attracted much attention in the machine learning community. One reason is that in many learning tasks unlabeled data are plentiful while labeled data are scarce, because labels are much more expensive to obtain. Semi-supervised learning is applicable to both clustering and classification; this paper focuses on semi-supervised clustering, in which label-level or instance-level supervision is used together with the unlabeled data to obtain a better clustering result. A semi-supervised robust on-line clustering algorithm called Semi-ROC is developed, which introduces supervision in the form of class labels into the previously proposed robust on-line clustering (ROC) algorithm. With this supervision, the algorithm produces more reliable results than ROC and AddC. Experimental results on an artificial data set and UCI benchmark data sets show that Semi-ROC can effectively use a small amount of supervision to significantly improve clustering validity. In addition, Semi-ROC is more robust to noise than both ROC and AddC.