Abstract:
This paper proposes a novel feature selection method for data classification problems, together with a fast rule extraction scheme. First, the Chi-Merge discretization method is improved by reducing the number of initial intervals, so that continuous attributes can be discretized effectively. After discretization, the contingency tables for the various feature patterns can be calculated quickly, and an inconsistency rate can be derived from each contingency table. The key sequence of features is identified by selecting the feature pattern with the minimum inconsistency rate, and an optimized feature subset is obtained efficiently with a sequential forward search strategy. Finally, the classification rules are extracted in a single pass over the contingency table of the selected feature subset. Experiments show that the proposed classification scheme performs well. Furthermore, the proposed feature selection and rule extraction method can be extended to classification on distributed isomorphic datasets. The resulting distributed classification method is simple and efficient, performs well, and preserves the privacy of the sample data contents.
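The pipeline described above (contingency table, inconsistency rate, sequential forward search, one-pass rule extraction) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the attributes are already discretized, and it assumes the commonly used definition of the inconsistency rate: for each distinct feature pattern, the number of samples outside the majority class, summed over all patterns and divided by the total sample count. The function names, the `tol` parameter, and the stopping rule are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def contingency_table(rows, labels, features):
    """Count class labels for each distinct pattern of the chosen features."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[f] for f in features)
        table[pattern][label] += 1
    return table


def inconsistency_rate(table):
    """Assumed definition: samples outside the majority class of their
    pattern, divided by the total number of samples (must be non-empty)."""
    total = sum(sum(counts.values()) for counts in table.values())
    mismatched = sum(
        sum(counts.values()) - max(counts.values()) for counts in table.values()
    )
    return mismatched / total


def forward_select(rows, labels, n_features, tol=0.0):
    """Greedy sequential forward search: at each step add the feature that
    gives the lowest inconsistency rate; stop when the rate no longer
    improves or drops to `tol` (hypothetical stopping rule)."""
    selected, remaining = [], list(range(n_features))
    best_rate = float("inf")
    while remaining:
        rate, feat = min(
            (inconsistency_rate(contingency_table(rows, labels, selected + [f])), f)
            for f in remaining
        )
        if rate >= best_rate:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_rate = rate
        if rate <= tol:
            break
    return selected, best_rate


def extract_rules(table):
    """One pass over the table: map each pattern to its majority class."""
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in table.items()}


# Toy discretized data (made up for illustration): feature 0 alone
# determines the class.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b", "a", "b"]
selected, rate = forward_select(rows, labels, n_features=3)
rules = extract_rules(contingency_table(rows, labels, selected))
```

Because each rule is read directly from the per-pattern class counts, the sketch needs only the contingency table and not the raw samples at extraction time. This is one plausible reason count-based tables suit the distributed, privacy-preserving setting mentioned above, though the paper's actual protocol is not shown here.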