    Feature Selection Algorithm Based on the Multi-Colony Fairness Model

    • Abstract: As the world shifts from an information-centered era to a data-driven one, pattern recognition and data mining face growing challenges. The explosive growth in data volume has made feature selection an indispensable step in big-data pattern recognition and related fields. Inspired by resource-competition behavior among animals, this paper recasts feature selection as a resource allocation problem, adds individual resource-competition behavior to that model, and proposes the multi-colony fairness algorithm (MCFA) to judge and handle this behavior so as to obtain a better allocation scheme, that is, a better feature subset. The algorithm integrates random search with heuristic search and combines the filter and wrapper approaches, which reduces the computational cost while achieving higher classification accuracy. The convergence and effectiveness of the algorithm are established theoretically and verified experimentally. On datasets from the UCI Machine Learning Repository, MCFA is compared with four classic feature selection algorithms, sequential forward selection (SFS), sequential backward selection (SBS), sequential floating forward selection (SFFS), and sequential floating backward selection (SBFS), and with three mainstream feature selection algorithms, relevance-redundancy feature selection (RRFS), minimal-redundancy-maximal-relevance (mRMR), and ReliefF. The results show that MCFA selects feature subsets that compare favorably in both subset size and classification accuracy.
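    For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of sequential forward selection (SFS), the simplest of the four classic wrapper methods. It is not MCFA, whose details the abstract does not give. The function names, the `score` callback, and the toy criterion are all hypothetical; in practice `score` would be something like the cross-validated accuracy of a classifier trained on the chosen features.

```python
from typing import Callable, FrozenSet, List, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_size: int,
) -> Tuple[List[int], float]:
    """Greedy SFS: grow the subset one feature at a time.

    `score` is a hypothetical wrapper criterion that maps a set of
    feature indices to a quality value (higher is better).
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_size:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        # Highest score wins; ties go to the lower feature index.
        top_score, top_feature = max(scored, key=lambda pair: (pair[0], -pair[1]))
        # Stop as soon as no extension improves the criterion.
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    # Toy criterion (assumption for illustration only): features 0 and 3
    # are informative, and every selected feature costs 0.1.
    useful = {0: 1.0, 3: 0.5}
    return sum(useful.get(f, 0.0) for f in subset) - 0.1 * len(subset)


if __name__ == "__main__":
    print(sequential_forward_selection(5, toy_score, max_size=5))
```

    With the toy criterion, the search picks feature 0, then feature 3, and stops because any third feature lowers the score. SFS never revisits an earlier choice, which is the limitation the floating variants (SFFS, SBFS) address by allowing backtracking steps.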

       
