Intuitionistic Fuzzy Deep Autoencoder Algorithm for Feature Selection

华启冲; 丁卫平; 陈悦鹏; 周天奕

doi:10.7544/issn1000-1239.202550502


Abstract

With the continued development of big data technology, high-dimensional datasets are increasingly used across many fields. Such datasets are typically high-dimensional and highly redundant, which tends to cause model overfitting and raises computational cost. Feature selection helps extract an informative subset of features, reducing dimensionality and improving model interpretability. Although deep neural networks have shown potential for feature extraction, their performance degrades in the presence of noise and outliers. To address these challenges, a novel feature selection algorithm based on the intuitionistic fuzzy deep autoencoder (IFDAE) is proposed. First, the algorithm fuses membership and non-membership degree information and applies intuitionistic fuzzy weights to handle uncertainty, thereby suppressing the impact of noise and outliers. Then, it learns the highly nonlinear latent representations of a deep autoencoder, uses weight transfer to carry pre-trained knowledge into the target network, and applies a structured sparse norm as a regularizer on the connection weight matrix between the input layer and the first hidden layer. This yields a structurally sparse weight matrix, from which a weight for each feature is derived. Finally, to verify the effectiveness of IFDAE, experiments are conducted on twelve public datasets and one real-world schizophrenia dataset. The results indicate that the proposed algorithm achieves superior reconstruction capability and classification performance.
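To make the feature-weighting step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the structured sparse norm is the row-wise ℓ2,1 norm (the abstract does not name the norm) and that the first-layer weight matrix `W` has one row per input feature. The function names `row_norms`, `l21_norm`, and `select_top_k`, and the toy matrix, are hypothetical.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W; used here as a per-feature score."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """Sum of row norms. Assumed form of the structured sparse regularizer."""
    return sum(row_norms(W))

def select_top_k(W, k):
    """Indices of the k features with the largest row norms, best first."""
    scores = row_norms(W)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Toy first-layer weights: 4 input features (rows), 2 hidden units (columns).
W = [
    [3.0, 4.0],
    [0.0, 0.0],  # an all-zero row: this feature is effectively discarded
    [1.0, 0.0],
    [0.0, 2.0],
]

scores = row_norms(W)
top2 = select_top_k(W, 2)
```

Under this assumption, penalizing the sum of row norms pushes entire rows of `W` toward zero, so a feature whose row vanishes contributes nothing to the first hidden layer, and ranking features by row norm gives a natural selection rule.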
