Abstract:
Recently, semi-supervised learning has attracted much attention in the machine learning community. One reason is that in many learning tasks there is a large supply of unlabeled data but insufficient labeled data, because the latter is much more expensive to obtain than the former. Semi-supervised learning is applicable to both clustering and classification; this paper focuses on semi-supervised clustering. In semi-supervised clustering, some label-level or instance-level supervised information is used along with the unlabeled data in order to obtain a better clustering result. A semi-supervised robust on-line clustering algorithm called Semi-ROC is developed, which introduces supervision information in the form of class labels into the previously proposed robust on-line clustering (ROC) algorithm. With this supervised information, the algorithm obtains a more reliable result than ROC and AddC. Experimental results on an artificial data set and UCI benchmark data sets show that the proposed Semi-ROC can effectively use a small amount of supervision information to enhance clustering performance, and that clustering validity can be improved significantly. In addition, when dealing with noise, Semi-ROC is more robust than both ROC and AddC.