Abstract:
Tumor diagnosis based on microarray data is expected to develop into a fast and effective molecular-level diagnostic method for clinical use in the near future. However, microarray data are high-dimensional and have few samples, which makes classification a challenging task for traditional approaches. Ensemble classification algorithms, which offer better performance, have therefore attracted growing research attention. This paper proposes a novel ensemble classification algorithm for microarray data based on correlation analysis, which aims to address the low classification accuracy and excessive computation of current ensemble classification algorithms. By computing the correlation between training subsets, the proposed algorithm extracts those that differ most from one another, and thereby effectively improves the diversity among base classifiers. The support vector machine is selected as the base classifier, and experimental results on the leukemia dataset and the colon tumor dataset show the effectiveness and feasibility of the proposed algorithm. The performance of the proposed algorithm under different parameter settings is also tested, and the results are helpful for selecting appropriate parameters.
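
As a rough illustration of the subset selection idea described above, the following Python sketch picks mutually dissimilar training subsets by correlation. It is not the paper's exact procedure, and several details are assumptions made only for illustration: candidate subsets are drawn by bootstrap sampling, each subset is represented by a vector counting how often each training sample was drawn, similarity is the Pearson correlation between those vectors, and selection is greedy. The function names (pearson, bootstrap_counts, select_diverse_subsets) and the sizes used (40 samples, 30 candidates, 5 selected) are hypothetical.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def bootstrap_counts(n_samples, rng):
    """Draw one bootstrap subset; return how often each sample index was drawn."""
    counts = [0] * n_samples
    for _ in range(n_samples):
        counts[rng.randrange(n_samples)] += 1
    return counts


def select_diverse_subsets(candidates, k):
    """Greedily pick k candidates (assumed strategy): each step adds the one whose
    mean correlation with the already selected subsets is lowest."""
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [0]  # assumption: seed the selection with the first candidate
    while len(selected) < k:
        remaining = [i for i in range(len(candidates)) if i not in selected]
        best = min(
            remaining,
            key=lambda i: sum(pearson(candidates[i], candidates[j]) for j in selected)
            / len(selected),
        )
        selected.append(best)
    return selected


# Illustrative sizes only; they are not taken from the paper.
rng = random.Random(0)
candidates = [bootstrap_counts(40, rng) for _ in range(30)]
chosen = select_diverse_subsets(candidates, k=5)
```

In a full ensemble, each selected subset would then be used to train one base classifier (a support vector machine in this paper), and the predictions of the base classifiers would be combined, for example by majority vote. That training step is omitted here to keep the sketch free of external dependencies.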