ISSN 1000-1239 CN 11-1777/TP


Research on a Feature Selection Method Applicable to Multiple Supervised Models

王博 黄九鸣 贾焰 杨树强   

  1. (College of Computer, National University of Defense Technology, Changsha 410073) (wangbo.nudt@gmail.com)
  • Published: 2010-09-15

Research on a Common Feature Selection Method for Multiple Supervised Models

Wang Bo, Huang Jiuming, Jia Yan, and Yang Shuqiang   

  1. (College of Computer, National University of Defense Technology, Changsha 410073)
  • Online: 2010-09-15

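Abstract (Chinese version): Feature selection is one of the important problems in pattern recognition, machine learning, data mining and related fields. It has become a research hotspot in recent years, and a large number of feature selection algorithms have emerged. Most existing feature selection algorithms target only one specific domain, so their applicability is limited. Using a kernel method based on the Hilbert-Schmidt independence criterion to measure the degree of correlation between a feature subset and the target concept, this paper proposes a more widely applicable feature selection method, FSM_HSIC. The method unifies the feature selection process under the supervised, semi-supervised and unsupervised models reasonably well, allows the whole process to be described abstractly from the viewpoint of kernel methods, and gives deeper insight into some existing algorithms. Building on this method, a novel algorithm, FSI, is designed for the interactive feature selection problem. Theoretical analysis and extensive experiments on real and simulated data show that, compared with several other feature selection algorithms, the proposed algorithm has good efficiency and stability, and that the FSM_HSIC method offers important guidance for the development of new algorithms.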

Key words (Chinese version): data mining, pattern recognition, feature selection, kernel function method, interactive feature, stability

Abstract: Feature selection is one of the most important problems in pattern recognition, machine learning and data mining, where it serves as a basic pre-processing step for compressing data. Most existing algorithms were proposed for a single, specific domain, which limits their applicability. In particular, different applications often fall under different supervision models: supervised, semi-supervised or unsupervised. A concrete feature selection algorithm is usually designed for one given setting. When the setting changes, an algorithm that previously ran smoothly and efficiently can become inefficient or even useless, so a new algorithm has to be developed. This paper presents a common feature selection method based on the Hilbert-Schmidt Independence Criterion (HSIC), which evaluates the correlation between a feature subset and the target concept. The method exploits intrinsic properties of feature selection under the supervised, semi-supervised and unsupervised models and expresses all three in a uniform format. Furthermore, some existing algorithms can be explained from the viewpoint of kernel-based methods, which gives a deeper understanding of them. A novel algorithm is also derived from this method to address a challenging problem known as interactive feature selection. The experimental results not only demonstrate the efficiency and stability of the algorithm, but also suggest that the method can provide considerable guidance for designing novel feature selection algorithms.

Key words: data mining, pattern recognition, feature selection, kernel-based method, interactive feature, stability
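
As an illustration of the criterion named in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimate, trace(KHLH)/(n-1)^2, used to score individual features against a target in the supervised case. It is not the paper's FSM_HSIC method or its FSI algorithm, neither of which is specified in the abstract. The Gaussian kernel, the bandwidth `sigma`, the per-feature ranking strategy, the helper names (`gaussian_kernel`, `hsic`, `rank_features`) and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of x (assumed kernel choice)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one variable
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC of two n x n Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def rank_features(X, y, sigma=1.0):
    """Score each column of X by its HSIC with y; return indices, best first."""
    L = gaussian_kernel(y, sigma)
    scores = np.array([hsic(gaussian_kernel(X[:, j], sigma), L)
                       for j in range(X.shape[1])])
    return np.argsort(-scores), scores

# Hypothetical synthetic data: only feature 2 carries information about y.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = np.sin(X[:, 2]) + 0.1 * rng.normal(size=n)

order, scores = rank_features(X, y)
print("features ranked by HSIC:", order)
print("scores:", scores)
```

Scoring features one at a time cannot capture interactions between features, which is the harder problem the abstract says FSI addresses. Scoring whole subsets would mean building `K` from several columns at once. How the semi-supervised and unsupervised cases are handled is not described in the abstract, so it is not sketched here.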