
Journal of Computer Research and Development, 2016, Vol. 53, Issue (8): 1684-1695. doi: 10.7544/issn1000-1239.2016.20160172

Special Issue: 2016 Special Topic on Frontier Techniques of Data Mining


Feature Selection Based on the Measurement of Correlation Information Entropy

Dong Hongbin, Teng Xuyang, Yang Xue

(College of Computer Science and Technology, Harbin Engineering University, Harbin 150001)
Online: 2016-08-01

Abstract: Feature selection aims to select a smaller subset of the original feature set, one that provides approximately equal or better performance in data mining and machine learning. Because it does not transform the physical characteristics of the features, a smaller subset also yields a more interpretable model. Traditional information-theoretic methods tend to measure feature relevance and redundancy separately, ignoring the combined effect of the whole feature subset. In this paper, correlation information entropy, a technique from data fusion, is applied to feature selection to measure the degree of independence and redundancy among features. A correlation matrix is constructed from the mutual information between features and their class labels and between pairs of features. Taking the multivariable correlation among the features in a subset into account, the eigenvalues of the correlation matrix are then computed. On this basis, a feature ranking algorithm and an adaptive, parameterized feature subset selection algorithm are proposed. Experimental results demonstrate the effectiveness and efficiency of the proposed algorithms on classification tasks.
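To make the core measure concrete, the following is a minimal Python sketch of correlation information entropy as it is standardly defined in the data-fusion literature: for an n x n correlation matrix R with eigenvalues lambda_i, H_R = -sum_i (lambda_i/n) log_n (lambda_i/n), which approaches 1 when features are mutually independent and 0 when they are fully correlated. The normalized mutual-information matrix used here (and the toy usage at the end) is an illustrative assumption, not the paper's exact matrix construction, which also incorporates feature-class mutual information; the ranking and adaptive selection algorithms are likewise not reproduced.

import numpy as np
from sklearn.metrics import mutual_info_score

def correlation_information_entropy(R):
    """H_R = -sum_i (lam_i/n) * log_n(lam_i/n) for eigenvalues lam_i of R.
    Close to 1 for mutually independent variables; close to 0 when all
    information is concentrated in one eigenvalue (full correlation)."""
    n = R.shape[0]
    lam = np.clip(np.linalg.eigvalsh(R), 1e-12, None)  # R symmetric; avoid log(0)
    p = lam / n                                        # eigenvalues of R sum to n
    return float(-np.sum(p * np.log(p)) / np.log(n))

def nmi_matrix(X):
    """Assumed illustrative construction: normalized mutual information
    R[i, j] = I(f_i; f_j) / sqrt(H(f_i) * H(f_j)) between discretized
    feature columns, with unit diagonal."""
    n = X.shape[1]
    # I(x; x) = H(x), so the per-feature entropies come from mutual_info_score
    H = np.array([mutual_info_score(X[:, i], X[:, i]) for i in range(n)])
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            denom = np.sqrt(H[i] * H[j])
            R[i, j] = R[j, i] = (mutual_info_score(X[:, i], X[:, j]) / denom
                                 if denom > 0 else 0.0)
    return R

# Toy usage: three discretized features, the third a copy of the first.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 2))
X = np.column_stack([X, X[:, 0]])          # deliberately redundant feature
R = nmi_matrix(X)
print(correlation_information_entropy(R))  # below 1: redundancy lowers H_R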

Key words: feature selection, correlation information entropy, group effect, multivariable correlation, correlation matrix
