Adaptive Neighborhood Embedding Based Unsupervised Feature Selection

Liu Yanfang; Li Wenbin; Gao Yang

doi:10.7544/issn1000-1239.2020.20200219

¹(State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023)
²(College of Mathematics and Information Engineering, Longyan University, Longyan, Fujian 364012) (liuyanfang003@163.com)

Funds: This work was supported by the National Key Research and Development Program of China (2017YFB0702600, 2017YFB0702601), the National Natural Science Foundation of China (61806096), the Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (Science and Technology Category) (JAT170577, JAT190743), and the Science and Technology Project of Longyan City (2019LYF13002).

CLC number: TP391
Published: 2020-07-31

Abstract: Unsupervised feature selection algorithms can effectively reduce the dimensionality of high-dimensional unlabeled data, which lowers the time and space complexity of data processing and helps prevent the learned model from overfitting. However, most existing unsupervised feature selection methods use the k-nearest neighbor method to capture the local geometric structure of the data samples and thus ignore the problem of unevenly distributed data. To address this problem, an adaptive neighborhood embedding based unsupervised feature selection (ANEFS) algorithm is proposed. The algorithm determines the number of neighbors of each sample according to the distribution of the dataset itself and then constructs the sample similarity matrix. Meanwhile, an intermediate matrix that maps the high-dimensional space to the low-dimensional space is introduced, and the objective function is optimized with the Lagrange multiplier method. Experimental results on six UCI datasets show that the proposed algorithm selects feature subsets that achieve higher clustering accuracy and normalized mutual information.

Keywords: k-nearest neighbor; adaptive neighborhood; manifold learning; feature selection; unsupervised learning
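The abstract reports results in terms of clustering accuracy and normalized mutual information. As a rough illustration of how these two measures are commonly computed, the sketch below uses plain Python. It is not taken from the paper: the function names are hypothetical, the normalization of mutual information by the arithmetic mean of the two entropies is an assumption (other normalizations are also in use), and the brute-force search over label mappings is only practical for a small number of clusters (an assignment solver such as the Hungarian algorithm would normally replace it).

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of samples matched under the best one-to-one cluster-to-class mapping."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad with None so every cluster can be assigned something, even when
    # there are more clusters than classes (None never matches a true label).
    candidates = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for perm in permutations(candidates, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, cluster_ids):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    count_t = Counter(true_labels)
    count_c = Counter(cluster_ids)
    mi = sum(
        (n_tc / n) * math.log(n * n_tc / (count_t[t] * count_c[c]))
        for (t, c), n_tc in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    # Both partitions trivial (a single group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary names
    print(clustering_accuracy(truth, found))
    print(normalized_mutual_info(truth, found))
```

Both measures ignore the actual names of the cluster ids, which is why a clustering that swaps the two labels but groups the samples identically still scores as a perfect match.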
