Liu Yanfang, Li Wenbin, Gao Yang. Adaptive Neighborhood Embedding Based Unsupervised Feature Selection[J]. Journal of Computer Research and Development, 2020, 57(8): 1639-1649. DOI: 10.7544/issn1000-1239.2020.20200219
2(College of Mathematics and Information Engineering, Longyan University, Longyan, Fujian 364012)
Funds: This work was supported by the National Key Research and Development Program of China (2017YFB0702600, 2017YFB0702601), the National Natural Science Foundation of China (61806096), the Education Scientific Research Project of Young Teachers of Fujian Province (JAT170577, JAT190743), and the Science and Technology Project of Longyan City (2019LYF13002).
Unsupervised feature selection algorithms can effectively reduce the dimensionality of high-dimensional unlabeled data, which not only reduces the time and space complexity of data processing but also helps avoid over-fitting of the feature selection model. However, most existing unsupervised feature selection algorithms use the k-nearest neighbor method to capture the local geometric structure of data samples, ignoring the problem of uneven data distribution. To solve this problem, an unsupervised feature selection algorithm based on adaptive neighborhood embedding (ANEFS) is proposed. The algorithm determines the number of neighbors of each sample according to the distribution of the dataset and then constructs the similarity matrix. Meanwhile, an intermediate matrix that maps from the high-dimensional space to the low-dimensional space is introduced, and the Laplacian multiplier method is used to optimize the reconstructed function. Experimental results on six UCI datasets show that the proposed algorithm selects representative feature subsets that achieve higher clustering accuracy and normalized mutual information.
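The abstract reports results in terms of normalized mutual information (NMI) between cluster assignments and ground-truth classes. As a rough illustration of what that metric measures, the following is a minimal sketch, not the authors' evaluation code. The function names are hypothetical, and the geometric-mean normalization used here is an assumption, since the abstract does not say which NMI variant the paper uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two equal-length label sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(true_labels, cluster_labels):
    """NMI with geometric-mean normalization (an assumed variant)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    h_true = entropy(true_labels)
    h_clus = entropy(cluster_labels)
    if h_true == 0.0 or h_clus == 0.0:
        # Degenerate case: at least one labeling puts everything in one group.
        return 1.0 if h_true == h_clus else 0.0
    return mutual_information(true_labels, cluster_labels) / math.sqrt(h_true * h_clus)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # same partition, cluster ids swapped
    print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

Because NMI depends only on how samples are grouped, renaming cluster ids does not change the score, which is why it suits clustering evaluation where cluster labels are arbitrary.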