ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### A Feature Selection Method for Small-Sample Data

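1. 1(School of Computer and Information Technology, Shanxi University, Taiyuan 030006); 2(Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006) (xuh102@126.com)
• Published: 2018-10-01
• Funding:
National Natural Science Foundation of China (61673249); Shanxi Province Research Fund for Returned Overseas Scholars (2016-004); CERNET Next Generation Internet Technology Innovation Project (NGII20170601)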

### A Feature Selection Method for Small Samples

Xu Hang1, Zhang Kai1, Wang Wenjian1,2

1. 1(School of Computer and Information Technology, Shanxi University, Taiyuan 030006); 2(Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
• Online: 2018-10-01

Abstract: Common machine learning algorithms often perform poorly on small samples, because the feature dimension is frequently larger than the number of samples and irrelevant or redundant features are often present. Reducing the feature dimension through feature selection is an effective way to address this problem. This paper proposes a filter feature selection method based on mutual information for small samples. First, a feature grouping criterion based on mutual information is defined. The criterion considers both the correlation between each feature and the class and the redundancy among different features, and the features are grouped according to it. Then, the features that have maximal correlation with the class in each group are chosen to form a candidate feature subset, while the time complexity of the algorithm is kept low. Next, this grouping-based feature selection method is combined with the Boruta algorithm to determine the optimal feature subset automatically from the candidate feature subset, which greatly reduces the feature dimension. Experimental results on benchmark data sets show that, compared with five classical feature selection algorithms, the feature subset selected by the proposed method achieves better classification performance and running efficiency on three kinds of classifiers.
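The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the grouping criterion itself, so the rule used here (a feature joins a group when its mutual information with the group's representative is at least its mutual information with the class) is an assumption made only for illustration, as are the names `mutual_information` and `group_and_select`. The sketch assumes discrete features and omits the Boruta step.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def group_and_select(features, labels):
    """Greedy grouping sketch (hypothetical rule, not the paper's criterion).

    features: dict mapping feature name -> list of discrete values.
    Returns (groups, candidates); the first member of each group is its
    most class-relevant feature, and candidates collects those members.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    # Visit features from most to least relevant to the class.
    remaining = sorted(relevance, key=relevance.get, reverse=True)
    groups = []
    while remaining:
        rep, rest = remaining[0], remaining[1:]
        group, remaining = [rep], []
        for f in rest:
            redundancy = mutual_information(features[f], features[rep])
            # Assumed rule: f joins rep's group when it shares at least as
            # much information with rep as it does with the class.
            if redundancy >= relevance[f]:
                group.append(f)
            else:
                remaining.append(f)
        groups.append(group)
    candidates = [g[0] for g in groups]
    return groups, candidates


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    features = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],
        "a_copy": [1, 1, 1, 1, 0, 0, 0, 0],  # redundant with "a"
        "b": [0, 0, 1, 1, 0, 0, 1, 1],
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
    }
    print(group_and_select(features, labels))
```

Under this assumed rule, a feature that carries no information about the class is absorbed into the first group and never becomes a candidate. In the method described above, the resulting candidate subset would then be passed to the Boruta algorithm to determine the final subset.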