Chen Tieming, Ma Jixia, Samuel H. Huang, Cai Jiamei. Novel and Efficient Method on Feature Selection and Data Classification[J]. Journal of Computer Research and Development, 2012, 49(4): 735-745.
1(College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023) 2(State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191) 3(System Intelligent Laboratory, University of Cincinnati, Cincinnati, OH, USA 45221)
This paper proposes a novel feature selection method for data classification, together with a fast rule extraction scheme. First, the Chi-Merge discretization method is improved by reducing the number of initial intervals, so that continuous attributes can be discretized effectively. Once the attributes are discretized, the contingency tables for the various feature patterns can be computed quickly, and an inconsistency rate can be derived from each table. The key feature sequence is identified by selecting the minimum inconsistency rate, and an optimized feature subset is then obtained efficiently with a sequential forward search strategy. Finally, classification rules are extracted in a single pass from the contingency table of the selected feature subset. Experiments show that the proposed classification scheme performs well. Furthermore, the proposed feature selection and rule extraction method can be extended to classification on distributed isomorphic datasets. The resulting distributed classification method is simple and efficient, achieves high performance, and preserves the privacy of the sample data contents.
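The pipeline in the abstract (contingency tables, inconsistency rate, forward search, one-pass rule extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the attributes are already discretized, takes the inconsistency rate to be the fraction of samples that fall outside the majority class of their feature pattern, and stops the greedy search when the rate no longer improves. All function names, the stopping rule, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def contingency_table(rows, labels, features):
    """Map each pattern of the chosen feature values to a Counter of class labels."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[f] for f in features)
        table[pattern][label] += 1
    return table

def inconsistency_rate(table):
    """Fraction of samples outside the majority class of their pattern (assumed definition)."""
    total = sum(sum(counts.values()) for counts in table.values())
    mismatched = sum(sum(counts.values()) - max(counts.values())
                     for counts in table.values())
    return mismatched / total

def forward_select(rows, labels, n_features, tol=0.0):
    """Greedy forward search: repeatedly add the feature giving the lowest rate.

    Stopping when the rate stops improving is an assumption of this sketch.
    """
    selected, remaining = [], list(range(n_features))
    best_rate = 1.0
    while remaining:
        rate, feat = min(
            (inconsistency_rate(contingency_table(rows, labels, selected + [f])), f)
            for f in remaining
        )
        if selected and rate >= best_rate:
            break  # no improvement from any remaining feature
        selected.append(feat)
        remaining.remove(feat)
        best_rate = rate
        if rate <= tol:
            break  # the selected subset is already consistent enough
    return selected, best_rate

def extract_rules(table):
    """One pass over the table: each feature pattern predicts its majority class."""
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in table.items()}

# Toy, already-discretized data (hypothetical): feature 0 alone determines the class.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b", "a", "b"]

selected, rate = forward_select(rows, labels, n_features=3)
rules = extract_rules(contingency_table(rows, labels, selected))
print(selected, rate, rules)
```

Because every quantity here is derived from per-pattern class counts, one plausible reading of the distributed extension is that sites with the same schema exchange only such counts rather than raw samples. The abstract does not spell out the protocol, so that reading is an assumption.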