    Citation: Zhang Wei, Miao Duoqian, Gao Can, Yue Xiaodong. A Neighborhood Rough Sets-Based Co-Training Model for Classification[J]. Journal of Computer Research and Development (计算机研究与发展), 2014, 51(8): 1811-1820. DOI: 10.7544/issn1000-1239.2014.20131049

    A Neighborhood Rough Sets-Based Co-Training Model for Classification

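    • Abstract: Pawlak's rough set theory is a supervised learning model and is suited only to discrete data. Many real-world problems, however, involve large amounts of continuous data, with very few labeled samples and far more unlabeled ones. Combining neighborhood rough sets with co-training theory, this paper proposes a neighborhood rough co-training classification model that handles continuous data and makes effective use of unlabeled data to improve classification performance. The model first constructs a neighborhood rough semi-supervised attribute reduction algorithm, uses it to extract two reducts that differ substantially from each other, and builds one base classifier on each; the two classifiers then learn from each other iteratively on the unlabeled data. Comparative experiments on UCI datasets show that the model performs better than other models of the same kind.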
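To make the idea of handling continuous data without discretization concrete, the following is a minimal sketch of a delta-neighborhood and the positive-region dependency commonly used in neighborhood rough sets. It is not the paper's reduction algorithm (which the abstract describes as semi-supervised and based on neighborhood mutual information); the function names, the Euclidean distance, the threshold `delta`, and the toy data are all assumptions made for illustration.

```python
import math

def neighborhood(data, i, features, delta):
    """Indices of samples within Euclidean distance delta of sample i on `features`."""
    xi = data[i]
    return [
        j for j, xj in enumerate(data)
        if math.sqrt(sum((xi[f] - xj[f]) ** 2 for f in features)) <= delta
    ]

def dependency(data, labels, features, delta):
    """Fraction of samples whose whole delta-neighborhood carries their own label."""
    consistent = 0
    for i in range(len(data)):
        if all(labels[j] == labels[i] for j in neighborhood(data, i, features, delta)):
            consistent += 1
    return consistent / len(data)

# Toy data (made up): feature 0 separates the two classes, feature 1 does not.
X = [(0.10, 0.50), (0.15, 0.52), (0.90, 0.51), (0.95, 0.49)]
y = ["a", "a", "b", "b"]
dep_f0 = dependency(X, y, [0], delta=0.2)  # every neighborhood is label-pure
dep_f1 = dependency(X, y, [1], delta=0.2)  # every neighborhood mixes both labels
```

In this toy setting a reduction procedure would prefer feature 0, since it alone keeps every neighborhood consistent with the class labels.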

       

      Abstract: Pawlak's rough set theory, as a supervised learning model, applies only to discrete data. Practical data sets, however, are often continuous and contain few labeled but abundant unlabeled samples, which places them outside the scope of Pawlak's rough set theory. This paper proposes a neighborhood rough sets-based co-training model for classification that handles continuous data and uses both unlabeled and labeled data to achieve better performance than a classifier learned from the few labeled samples alone. First, a heuristic algorithm based on neighborhood mutual information is put forward to compute a reduct of partially labeled continuous data. Two diverse reducts are then generated. The model uses the two reducts to train two base classifiers on the labeled data, and lets the two base classifiers teach each other on the unlabeled data to boost their performance iteratively. Experimental results on selected UCI datasets show that the proposed model achieves higher classification accuracy on partially labeled continuous data than several representative models.
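The "teach each other" step can be sketched as a generic co-training loop. This is only an illustration under stated assumptions, not the authors' implementation: the two reducts are stood in for by two hand-picked feature index lists (`view_a`, `view_b`), the base classifiers are plain 1-nearest-neighbor rules, confidence is approximated by nearest-neighbor distance, and all names and data are hypothetical.

```python
import math

def nearest(train, x, view):
    """Return (label, distance) of the closest labeled sample, using only features in `view`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.sqrt(sum((features[f] - x[f]) ** 2 for f in view))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

def co_train(labeled, unlabeled, view_a, view_b, rounds=5):
    """Each round, each view pseudo-labels its most confident sample for the other view."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view, teacher, student in ((view_a, train_a, train_b),
                                       (view_b, train_b, train_a)):
            if not pool:
                return train_a, train_b
            # Confidence proxy (assumption): smaller nearest-neighbor distance = more confident.
            scored = [(nearest(teacher, x, view), x) for x in pool]
            (label, _), x = min(scored, key=lambda s: s[0][1])
            student.append((x, label))  # the teacher's pseudo-label goes to the other classifier
            pool.remove(x)
    return train_a, train_b

# Toy data (made up): two labeled samples and four unlabeled ones.
labeled = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]
train_a, train_b = co_train(labeled, unlabeled, view_a=[0], view_b=[1])
prediction, _ = nearest(train_a, (0.3, 0.3), [0])
```

The diversity requirement in the abstract matters here: if both views relied on the same features, the two classifiers would make the same mistakes and exchange little new information.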

       
