Abstract:
Feature selection is one of the most important problems in pattern recognition, machine learning, and data mining, as a basic pre-processing step for compressing data. Most existing algorithms were proposed for a particular domain, which limits their extension to other settings. In particular, different applications often operate under different supervision models: supervised, semi-supervised, and unsupervised. A concrete feature selection algorithm is typically designed for a given setting; when the setting changes, an algorithm that once ran efficiently may become inefficient or even useless, so a new algorithm has to be developed. This paper presents a general feature selection method based on the Hilbert-Schmidt Independence Criterion (HSIC), which evaluates the dependence between a feature subset and the target concept. The method exploits intrinsic properties of feature selection under multiple supervision models, namely supervised, semi-supervised, and unsupervised, within a uniform formulation. Furthermore, several existing algorithms can be explained from the viewpoint of kernel-based methods, which yields a deeper understanding of them, and a novel algorithm derived from the method solves the challenging problem of interactive feature selection. The experimental results not only demonstrate the efficiency and stability of the algorithm but also suggest that the method can provide considerable guidance for the design of novel feature selection algorithms.
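For readers unfamiliar with HSIC, the dependence score underlying such methods can be illustrated with the standard biased empirical estimator, $\mathrm{HSIC} = (n-1)^{-2}\,\mathrm{tr}(KHLH)$, where $K$ and $L$ are kernel matrices computed on the candidate feature subset and on the target, and $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix. The minimal NumPy sketch below uses this estimator with Gaussian kernels; the kernel choice, bandwidth, and toy data are illustrative assumptions and not the specific setup of this paper.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (n-1)^2."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: score candidate feature subsets against the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=100)).reshape(-1, 1)
print(hsic(X[:, [0]], y))   # informative feature: relatively high HSIC
print(hsic(X[:, [3]], y))   # irrelevant feature: near zero
```

A subset-search feature selector built on this score would, in the same spirit, keep the subsets whose kernel matrix is most dependent on the target's.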