Abstract:
Selective classifiers have been shown to improve both the accuracy and the efficiency of classification by removing irrelevant or redundant attributes from a data set. Although several selective classifiers have been proposed, most handle only complete data, because incomplete data are considerably harder to process. Real-world data sets, however, are often incomplete and, for a variety of reasons, contain many redundant or irrelevant attributes. As with complete data, such attributes can sharply reduce the accuracy of a classifier built on an incomplete data set, so constructing selective classifiers for incomplete data is an important problem. Based on an analysis of the main methods for handling incomplete data in classification, two selective Bayes classifiers for incomplete data, denoted SRBC and CBSRBC, are presented. SRBC is built on the robust Bayes classifier, while CBSRBC combines SRBC with chi-squared statistics. Experiments on twelve benchmark incomplete data sets show that both algorithms greatly reduce the number of attributes and also markedly improve the accuracy and stability of classification. Overall, CBSRBC is more efficient than SRBC and achieves higher classification accuracy, whereas SRBC avoids some of the thresholds that CBSRBC requires.