Abstract:
Feature selection is an essential step in cancer classification with DNA microarrays, because the number of genes available for predicting classes is large while the number of samples is relatively small. This work addresses the problem of selecting a small subset of genes for classification from broad patterns of gene expression profiles by proposing a two-step feature selection method. The first step uses a new metric, proposed in this paper as the criterion for class separability, to remove genes irrelevant to the classification task; a support vector machine with a radial basis function kernel is then applied to validate the classification performance of the selected genes in distinguishing different tissue types. The second step filters out redundant genes by pair-wise redundancy analysis followed by sensitivity analysis based on the support vector machine classifier. The two steps are applied to gene expression profiles of human acute leukemia and yield a better and more compact gene subset than the baseline method, which demonstrates the feasibility and effectiveness of the proposed method.
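
To make the filtering idea concrete, the following is a minimal Python sketch of the two filter stages only, under several assumptions. The abstract does not define the proposed separability metric, so a simple signal-to-noise score (absolute difference of class means divided by the sum of class standard deviations) is swapped in as a stand-in. Pair-wise redundancy is approximated here by Pearson correlation with a greedy keep-or-drop rule. The support vector machine validation and the sensitivity analysis are omitted. The function names, the parameters `top_k` and `max_corr`, and the toy data are all hypothetical and not taken from the paper.

```python
import math
from statistics import mean, stdev


def separability_score(values, labels):
    """Stand-in relevance score for one gene (NOT the paper's metric):
    |mean_0 - mean_1| / (sd_0 + sd_1) over the two classes."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_genes(expr, labels, top_k, max_corr):
    """expr maps a gene name to its expression values, one per sample."""
    # Stage 1 (relevance): rank genes by the stand-in score, keep the top_k.
    scores = {g: separability_score(v, labels) for g, v in expr.items()}
    ranked = sorted(expr, key=lambda g: scores[g], reverse=True)[:top_k]
    # Stage 2 (pair-wise redundancy, assumed form): walk the ranking and keep
    # a gene only if it is not strongly correlated with any gene already kept.
    kept = []
    for g in ranked:
        if all(abs(pearson(expr[g], expr[h])) < max_corr for h in kept):
            kept.append(g)
    return kept


# Hypothetical toy data: six samples, three per class.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the classes well
    "g2": [2.0, 2.5, 1.7, 6.0, 6.3, 5.7],  # nearly a rescaled copy of g1
    "g3": [1.0, 3.0, 2.0, 1.1, 2.9, 2.0],  # no class signal
    "g4": [5.0, 3.0, 4.0, 1.0, 3.0, 2.0],  # weaker signal, different pattern
}
selected = select_genes(expr, labels, top_k=3, max_corr=0.95)
print(selected)
```

In this toy run, `g3` falls outside the top three by relevance and `g2` is dropped as redundant with `g1`. A fuller version would validate each candidate subset with an RBF-kernel support vector machine, as the abstract describes.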