ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (8): 1811-1820. doi: 10.7544/issn1000-1239.2014.20131049


A Neighborhood Rough Sets-Based Co-Training Model for Classification

Zhang Wei1,2,3, Miao Duoqian1,3, Gao Can4, Yue Xiaodong5   

  1. School of Electronics and Information, Tongji University, Shanghai 201804
  2. School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090
  3. Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai 201804
  4. Zoomlion Heavy Industry Science and Technology Development Co., Ltd., Changsha 410013
  5. School of Computer Engineering and Science, Shanghai University, Shanghai 200444
  • Online: 2014-08-15

Abstract: Pawlak's rough set theory, as a supervised learning model, is applicable only to discrete data. However, practical data sets are often continuous and contain few labeled samples alongside abundant unlabeled ones, which places them outside the scope of Pawlak's rough set theory. This paper proposes a neighborhood rough sets-based co-training model for classification that handles continuous data and uses both labeled and unlabeled data to achieve better performance than a classifier learned from the few labeled data alone. First, a heuristic algorithm based on neighborhood mutual information is put forward to compute a reduct of partially labeled continuous data. Then two diverse reducts are generated. The model employs the two reducts to train two base classifiers on the labeled data, and has the two base classifiers teach each other on the unlabeled data to boost their performance iteratively. Experimental results on selected UCI datasets show that, in terms of learning accuracy, the proposed model is more effective than some representative models at dealing with partially labeled continuous data.

Key words: neighborhood rough sets, neighborhood mutual information, semi-supervised reduction, co-training, continuous data
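
The co-training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two fixed feature subsets (`view_a`, `view_b`) stand in for the two diverse reducts, and the reduct computation based on neighborhood mutual information is not shown. The base classifier is assumed to be a simple majority vote inside a δ-neighborhood. All function names, the parameter `delta`, and the one-sample-per-round teaching rule are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def distance(x, y, features):
    """Euclidean distance between x and y restricted to the given feature indices."""
    return math.sqrt(sum((x[f] - y[f]) ** 2 for f in features))


def predict(x, train, features, delta):
    """Assumed base classifier: majority vote among training samples that lie
    within the delta-neighborhood of x. Returns (label, confidence), or
    (None, 0.0) when the neighborhood contains no training sample."""
    votes = Counter(
        label for sample, label in train
        if distance(x, sample, features) <= delta
    )
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


def co_train(labeled, unlabeled, view_a, view_b, delta, rounds):
    """Each classifier labels its most confident unlabeled sample per round
    and hands it to the other classifier's training set."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        moved = False
        # (teacher's view, teacher's training set, student's training set)
        for view, teacher, student in (
            (view_a, train_a, train_b),
            (view_b, train_b, train_a),
        ):
            best = None
            for i, x in enumerate(pool):
                label, conf = predict(x, teacher, view, delta)
                if label is not None and (best is None or conf > best[2]):
                    best = (i, label, conf)
            if best is not None:
                i, label, _ = best
                student.append((pool.pop(i), label))
                moved = True
        if not moved:
            break  # neither classifier could label anything confidently
    return train_a, train_b
```

Giving each classifier its own training set mirrors the "teach each other" idea: a sample enters a classifier's training data only through the label assigned by the other view.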
