Federated Evolutionary Feature Selection Algorithm Based on Rough Hypercuboid

陈雪颖; 罗川; 李天瑞; 陈红梅

doi:10.7544/issn1000-1239.202440422


Abstract: Feature selection is an effective dimensionality-reduction technique in machine learning. In the era of big data, data security has become a major concern, and performing feature selection while preserving privacy is a challenging and pressing scientific problem. The rough hypercuboid is a model for approximate computation under uncertainty that combines rough set theory with hypercuboid learning; by introducing supervised information granulation and multiple feature evaluation criteria, it provides an efficient feature selection method for approximate classification of numerical data. Combining the rough hypercuboid model with particle swarm optimization, this paper proposes a novel privacy-preserving, multi-party federated feature selection algorithm. First, a centralized (client/server) federated feature selection architecture for multi-party participation is established. Within this architecture, each client uses the rough hypercuboid model and particle swarm optimization to search for its locally optimal feature subset, while the server applies a global feature subset evaluation strategy designed for multiple participants. Second, a particle initialization strategy tailored to the federated setting improves the algorithm's ability to select features collaboratively across participants. Finally, experiments on 12 UCI benchmark datasets show that, compared with six traditional feature selection algorithms, the feature subsets selected by the proposed algorithm achieve better classification performance on each participant while preserving every participant's data privacy.
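The client/server loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the rough hypercuboid relevance measure, the server-side evaluation strategy, or the particle initialization rule, so all three are stand-ins here. The hypothetical `hypercuboid_score` counts samples that fall inside exactly one class's axis-aligned box, the server is assumed to average client-reported scores, and particles are assumed to be seeded with the previous global best mask. All function names and parameter values are illustrative.

```python
import math
import random


def hypercuboid_score(mask, X, y):
    """Assumed stand-in fitness: fraction of samples that lie inside exactly
    one class box (per-class min/max interval on the selected features)."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    boxes = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        boxes[c] = [(min(r[j] for r in rows), max(r[j] for r in rows))
                    for j in feats]
    unambiguous = 0
    for x in X:
        hits = sum(all(lo <= x[j] <= hi for j, (lo, hi) in zip(feats, box))
                   for box in boxes.values())
        unambiguous += (hits == 1)  # the sample's own class box always hits
    return unambiguous / len(X)


def local_pso(X, y, rng, seeds=(), n_particles=8, n_iters=15):
    """Binary PSO run on one client's private data; returns its best mask."""
    d = len(X[0])
    pos = [list(s) for s in seeds][:n_particles]
    while len(pos) < n_particles:
        pos.append([rng.randint(0, 1) for _ in range(d)])
    vel = [[0.0] * d for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_fit = [hypercuboid_score(p, X, y) for p in pos]
    g = max(range(len(pos)), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i, p in enumerate(pos):
            for j in range(d):
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (pbest[i][j] - p[j])
                             + 1.5 * rng.random() * (gbest[j] - p[j]))
                # Sigmoid transfer: velocity sets the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                p[j] = 1 if rng.random() < prob else 0
            fit = hypercuboid_score(p, X, y)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = p[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = p[:], fit
    return gbest


def federated_select(clients, rounds=3, seed=0):
    """Server loop: only feature masks and scalar scores leave a client."""
    rng = random.Random(seed)
    best_mask, best_score = None, -1.0
    for _ in range(rounds):
        # Assumed initialization rule: seed each swarm with the global best.
        seeds = [best_mask] if best_mask is not None else []
        candidates = [local_pso(X, y, rng, seeds) for X, y in clients]
        for mask in candidates:
            # Assumed global evaluation: mean of the clients' local scores.
            score = sum(hypercuboid_score(mask, X, y)
                        for X, y in clients) / len(clients)
            if score > best_score:
                best_mask, best_score = mask[:], score
    return best_mask, best_score
```

Calling `federated_select` with a list of `(X, y)` pairs, one per participant, returns a 0/1 feature mask and its averaged score. Raw samples are only ever read inside `local_pso` and `hypercuboid_score`, which in a real deployment would run on the client side.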
