ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (8): 1695-1707.doi: 10.7544/issn1000-1239.2019.20190313

Special Issue: 2019人工智能前沿进展专题

Previous Articles     Next Articles

An Adaptive Regression Feature Selection Method for Datasets with Outliers

Guo Yaqing1, Wang Wenjian2, Su Meihong1   

  1. 1(School of Computer and Information Technology, Shanxi University, Taiyuan 030006);2(Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online:2019-08-01

Abstract: Irrelevant and redundant features embedded in data will raise the difficulty for learning tasks, and feature selection can solve this problem effectively and improve learning efficiency and learner performance. Most of existing feature selection approaches are proposed for classification problems, while there are few studies on regression problems. Eespecially in presence of outliers, the present methods do not perform well. Although some methods can increase their robustness by weighting sample loss functions, the weights are set in advance and fixed throughout feature selection and learner training, which leads to bad adaptability. This paper proposes a regression feature selection method named adaptive weight LASSO (AWLASSO) for outliers. Firstly, it updates sample errors according to regression coefficients. Then the weights for loss functions of all samples are set according to the adaptive regularization term, i.e., the loss functions of samples whose errors are larger than current threshold are set smaller weights and loss functions of samples whose errors are less than threshold are set larger weights. The regression coefficient will be estimated iteratively under weighted loss function whose weights are updated. AWLASSO controls whether samples participate in regression coefficient estimation by the threshold. Only those samples with small errors participate in estimation, so a better regression coefficient estimation may be obtained in the end. In addition, the error threshold of AWLASSO algorithm is not fixed but increasing(To make initial regression coefficient estimation be accurate, initial threshold is often smaller). So some samples which are misjudged as outliers will have chance to be added again in training set. The AWLASSO regards samples whose errors are larger than the maximum threshold as outliers for their learning cost is bigger, and the weights of their loss functions are set to 0. Hence, the influence of outliers will be reduced. Experiment results on artificial data and benchmark datasets demonstrate that the proposed AWLASSO has better robustness and sparsity specially for datasets with outliers in comparison with classical methods.

Key words: feature selection, regression, outliers, robustness, adaptive regularization term

CLC Number: