ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2015, Vol. 52 ›› Issue (8): 1742-1756. doi: 10.7544/issn1000-1239.2015.20150245

Special topic: 2015 Artificial Intelligence Technologies for Big Data

• Artificial Intelligence •

Feature Selection Algorithm Based on the Multi-Colony Fairness Model

Yang Tan (杨昙), Feng Xiang (冯翔), Yu Huiqun (虞慧群)

  1. (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237) (xfeng@ecust.edu.cn)
  • Published: 2015-08-01
  • Online: 2015-08-01
  • Funding:
    National Natural Science Foundation of China (60905043, 61073107, 61173048, 61272198)

Abstract: As the world shifts from an information-driven society to a data-driven one, fields such as pattern recognition and data mining face growing challenges. The explosive growth of data has made feature selection an indispensable step in big-data pattern recognition. Inspired by the way animals compete for resources, this paper recasts the feature selection model as a resource distribution model, adds individual resource-grabbing behavior to it, and proposes the multi-colony fairness algorithm (MCFA) to evaluate and handle this behavior in order to obtain a better distribution scheme (i.e., a better feature subset). The algorithm fuses random search with heuristic search and combines filter methods with wrapper methods, reducing the amount of computation while achieving higher classification accuracy. The convergence and effectiveness of the algorithm are proved theoretically and verified experimentally. On datasets from the UCI machine learning repository, MCFA is compared with four classic feature selection algorithms, SFS (sequential forward selection), SBS (sequential backward selection), SFFS (sequential floating forward selection), and SBFS (sequential floating backward selection), and with three mainstream feature selection algorithms, RRFS (relevance-redundancy feature selection), mRMR (minimal-redundancy-maximal-relevance), and ReliefF. The comparison shows that MCFA selects feature subsets that do well in both subset size and classification accuracy.

Key words: feature selection, multi-colony fairness algorithm, resource distribution, grabbing-resource behavior, colony competition
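
The abstract mentions combining filter and wrapper methods and names SFS as one of the baselines. The sketch below illustrates only that general idea and is not MCFA itself: a filter step ranks features by absolute Pearson correlation with the label, and a wrapper step runs sequential forward selection over the surviving features. The correlation score, the nearest-centroid classifier, the leave-one-out criterion, the toy data, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def abs_pearson(xs, ys):
    """Absolute Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def filter_rank(X, y, keep):
    """Filter step (assumed criterion): keep the `keep` features most correlated with y."""
    n_features = len(X[0])
    scores = [abs_pearson([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:keep]

def loo_accuracy(X, y, features):
    """Wrapper criterion (assumed): leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean([r[j] for r in rows]) for j in features]
        point = [X[i][j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)

def sfs(X, y, candidates):
    """Sequential forward selection: greedily add the best feature until nothing improves."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        score, feature = max((loo_accuracy(X, y, selected + [f]), f) for f in remaining)
        if score <= best:  # stop when no candidate strictly improves the criterion
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best

# Toy data: feature 0 separates the two classes; features 1 and 2 do not.
y = [i % 2 for i in range(20)]
X = [[10 * y[i] + 0.1 * (i % 5), (i // 2) % 3, (i * 3) % 7] for i in range(20)]

candidates = filter_rank(X, y, keep=2)    # cheap filter pass first
selected, accuracy = sfs(X, y, candidates)  # costlier wrapper pass on the survivors
print(selected, accuracy)
```

Running the filter first shrinks the candidate set, so the wrapper trains and evaluates fewer classifiers. That is the usual motivation for such hybrids and matches the reduced computation the abstract describes, although MCFA's own search procedure is not reproduced here.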

CLC number: