ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

• Contact: 791771653@qq.com
• Supported by:
National Natural Science Foundation of China (61673249); Key Program of the Joint Funds of the National Natural Science Foundation of China (U1805263); Shanxi Province Research Foundation for Returned Overseas Scholars (2016-004)

### An Adaptive Regression Feature Selection Method for Datasets with Outliers

Guo Yaqing1, Wang Wenjian2, Su Meihong1

1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006
• Online: 2019-08-01

Abstract: Irrelevant and redundant features embedded in data increase the difficulty of learning tasks; feature selection can alleviate this problem effectively and improve both learning efficiency and learner performance. Most existing feature selection approaches are designed for classification problems, while regression problems have received far less attention, and the available methods perform poorly in the presence of outliers. Although some methods gain robustness by weighting the sample loss functions, their weights are set in advance and remain fixed throughout feature selection and learner training, which leads to poor adaptability. This paper proposes a regression feature selection method for datasets with outliers, named adaptive weight LASSO (AWLASSO). First, it updates the sample errors according to the current regression coefficients. Then it sets the weights of all sample loss functions according to the adaptive regularization term: loss functions of samples whose errors exceed the current threshold receive smaller weights, and those of samples whose errors fall below the threshold receive larger weights. The regression coefficients are then re-estimated iteratively under the reweighted loss function. In this way, AWLASSO uses the threshold to control which samples participate in coefficient estimation: only samples with small errors take part, so a better coefficient estimate may be obtained in the end. In addition, the error threshold of AWLASSO is not fixed but increases over iterations (the initial threshold is usually set small so that the initial coefficient estimate is accurate), so samples misjudged as outliers early on get a chance to re-enter the training set. AWLASSO regards samples whose errors exceed the maximum threshold as outliers, since their learning cost is too large, and sets the weights of their loss functions to 0. Hence, the influence of outliers is reduced. Experimental results on artificial and benchmark datasets demonstrate that the proposed AWLASSO achieves better robustness and sparsity than classical methods, especially on datasets with outliers.
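The iterative reweighting scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the binary 0/1 weights, the quantile-based initial threshold (`q0`), the multiplicative threshold growth, and the use of scikit-learn's `Lasso` with `sample_weight` are all simplifying assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def awlasso(X, y, alpha=0.1, q0=0.6, growth=1.3, max_ratio=3.0, n_iter=10):
    """Sketch of the AWLASSO idea: iteratively reweighted LASSO in which
    samples whose absolute residuals exceed a growing error threshold are
    excluded (weight 0) from the next coefficient estimation.

    q0        -- residual quantile used to set the (small) initial threshold
    growth    -- multiplicative growth of the threshold per iteration
    max_ratio -- cap: samples beyond max_ratio * initial threshold are
                 treated as outliers throughout
    """
    model = Lasso(alpha=alpha)
    w = np.ones(len(y))          # start with all samples fully weighted
    eps = None
    for _ in range(n_iter):
        model.fit(X, y, sample_weight=w)
        resid = np.abs(y - model.predict(X))
        if eps is None:
            # Small initial threshold: keep only the q0 fraction of samples
            # with the smallest errors for the next estimate.
            eps = np.quantile(resid, q0)
            eps_max = max_ratio * eps
        # Binary weights: samples within the threshold participate,
        # the rest are dropped for this round (they may re-enter later
        # because the threshold keeps growing).
        w = (resid <= eps).astype(float)
        eps = min(eps * growth, eps_max)   # increasing, capped threshold
    return model
```

Because the threshold starts small and grows, a clean sample that happens to have a large residual under the early (outlier-contaminated) fit can be re-admitted in a later iteration, while gross outliers stay beyond the capped maximum threshold and never regain influence.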