ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (8): 1811-1820. doi: 10.7544/issn1000-1239.2014.20131049

• Artificial Intelligence •

  • Published: 2014-08-15 (contact: zhangweismile@163.com)
  • Funding:
    Supported by the National Natural Science Foundation of China (61075056, 61273304, 61202170, 61103067) and the Fundamental Research Funds for the Central Universities

A Neighborhood Rough Sets-Based Co-Training Model for Classification

Zhang Wei1,2,3, Miao Duoqian1,3, Gao Can4, Yue Xiaodong5   

  1. 1(School of Electronics and Information, Tongji University, Shanghai 201804);2(School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090);3(Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai 201804);4(Zoomlion Heavy Industry Science and Technology Development Co., Ltd., Changsha 410013);5(School of Computer Engineering and Science, Shanghai University, Shanghai 200444)
  • Online: 2014-08-15


Abstract: Pawlak's rough set theory, as a supervised learning model, is only applicable to discrete data. However, practical data sets are often continuous and contain only a few labeled examples alongside abundant unlabeled ones, which is outside the realm of Pawlak's rough set theory. In this paper, a neighborhood rough sets-based co-training model for classification is proposed, which can handle continuous data and exploit both labeled and unlabeled data to achieve better performance than a classifier learned from the few labeled examples alone. Firstly, a heuristic algorithm based on neighborhood mutual information is put forward to compute the reduct of partially labeled continuous data, from which two diverse reducts are generated. The model then employs the two reducts to train two base classifiers on the labeled data, and makes the base classifiers teach each other on the unlabeled data to boost their performance iteratively. Experimental results on selected UCI data sets show that the proposed model is more effective than some representative models in learning accuracy when dealing with partially labeled continuous data.

Key words: neighborhood rough sets, neighborhood mutual information, semi-supervised reduction, co-training, continuous data
