高级检索

    半监督鲁棒联机聚类算法

    Semi-Supervised Robust On-Line Clustering Algorithm

    • 摘要: 将监督信息引入到聚类算法中去,在先前提出的鲁棒联机聚类算法(ROC)的基础上,通过引入以样本类标号形式给出的监督信息,提出了一种半监督的鲁棒联机聚类算法(Semi-ROC).在算法的聚类精度和鲁棒性能上,算法Semi-ROC比ROC和AddC有着更好的性能,在人工数据集和UCI标准数据集上的实验结果表明,Semi-ROC能有效地利用少量的监督信息来提高算法的聚类性能,得到较优的结果.另外,在添加噪声的情况下,算法Semi-ROC比原始的联机聚类算法AddC和ROC都更加鲁棒.

       

      Abstract: Recently, a semi-supervised learning has attracted much attention in machine learning community. One reason is that in many learning tasks, there is a large supply of unlabeled data but insufficient labeled data because the latter is much more expensive to obtain than the former. Typically, semi-supervised learning is applicable to both clustering and classification. This paper focuses its attention on semi-supervised clustering. In semi-supervised clustering, some label level or instance level supervised information is used along with the unlabeled data in order to obtain a better clustering result. A semi-supervised robust on-line clustering algorithm called Semi-ROC is developed, which introduces supervision information in the form of class labels into the previously proposed robust on-line clustering (ROC). After introducing the supervised information, the algorithm can get a more confidential result than the ROC and AddC. The experimental results on the artificial dataset and UCI benchmark data sets show that the proposed Semi-ROC can effectively use little supervision information to enhance the clustering performance, the clustering validity can be improved significantly. Besides, when dealing with noises, Semi-ROC is more robust than both ROC and AddC.

       

    /

    返回文章
    返回