基于Stacking算法的组合分类器及其应用于中文组块分析

Combined Multiple Classifiers Based on a Stacking Algorithm and Their Application to Chinese Text Chunking

摘要: 与基于Voting方法的组合分类器相比，提出基于Stacking算法的多分类器组合方法，通过构造一个两层的叠加式框架结构，将4种分类器(fnTBL, SNoW, SVM, MBL)进行了组合，并融合各种可能的上下文信息作为各层分类器的输入特征向量，在中文组块识别中取得了较好的效果. 实验结果表明，组合后的分类器无论在准确率还是召回率上都有所提高，在哈尔滨工业大学树库语料的测试下达到了F=93.64的结果.

Abstract: In contrast to combined classifiers based on a voting algorithm, a two-layer classifier-combination framework based on a stacking algorithm is presented for Chinese text chunking. Four diverse classifiers, transformation-based learning (fnTBL), sparse network of winnows (SNoW), support vector machine (SVM), and memory-based learning (MBL), are combined within this framework. The available contextual information is incorporated into both layers as input feature vectors to construct more complete contextual models. The chunking experiments are carried out on the HIT Chinese Treebank Corpus. Experimental results show that the combined classifier improves both precision and recall, achieving an F score of 93.64.
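
The two-layer stacking idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the four base learners are toy lookup tables standing in for fnTBL, SNoW, SVM, and MBL; the IOB-style chunk tags, the `(word, POS, previous POS)` feature tuples, the English toy sentences, and the names `LookupClassifier`, `stack_fit`, and `stack_predict` are all hypothetical. The sketch shows only the general pattern: layer-1 classifiers produce out-of-fold predictions, and a layer-2 classifier is trained on those predictions together with contextual features.

```python
from collections import Counter, defaultdict


class LookupClassifier:
    """Toy stand-in learner: memorizes the most frequent label per feature key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.table = {}
        self.default = None

    def fit(self, X, y):
        counts = defaultdict(Counter)
        for x, label in zip(X, y):
            counts[self.key_fn(x)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        # Back off to the globally most frequent label for unseen keys.
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(self.key_fn(x), self.default) for x in X]


def stack_fit(base_factories, meta, X, y, n_folds=3):
    """Train layer-1 models and a layer-2 (meta) model on out-of-fold predictions."""
    meta_X = [None] * len(X)
    for fold in range(n_folds):
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        train = [i for i in range(len(X)) if i % n_folds != fold]
        models = [make().fit([X[i] for i in train], [y[i] for i in train])
                  for make in base_factories]
        preds = [m.predict([X[i] for i in held_out]) for m in models]
        for j, i in enumerate(held_out):
            # Layer-2 input: original context plus the layer-1 predictions.
            meta_X[i] = (X[i], tuple(p[j] for p in preds))
    meta.fit(meta_X, y)
    # Refit the layer-1 models on all data for use at prediction time.
    base_models = [make().fit(X, y) for make in base_factories]
    return base_models, meta


def stack_predict(base_models, meta, X):
    preds = [m.predict(X) for m in base_models]
    meta_X = [(x, tuple(p[i] for p in preds)) for i, x in enumerate(X)]
    return meta.predict(meta_X)


# Hypothetical toy data: (word, POS, previous POS) with IOB-style chunk tags.
train_X = [
    ("the", "DT", "<s>"), ("cat", "NN", "DT"), ("sat", "VB", "NN"),
    ("on", "IN", "VB"), ("the", "DT", "IN"), ("mat", "NN", "DT"),
    ("a", "DT", "<s>"), ("dog", "NN", "DT"), ("ran", "VB", "NN"),
    ("to", "IN", "VB"), ("a", "DT", "IN"), ("park", "NN", "DT"),
]
train_y = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP"] * 2

# Four base learners that differ only in which features they look at.
base_factories = [
    lambda: LookupClassifier(lambda x: x[0]),          # word only
    lambda: LookupClassifier(lambda x: x[1]),          # POS only
    lambda: LookupClassifier(lambda x: (x[2], x[1])),  # previous POS + POS
    lambda: LookupClassifier(lambda x: (x[0], x[1])),  # word + POS
]
# The meta learner sees the token's POS together with all four base predictions.
meta = LookupClassifier(lambda mx: (mx[0][1], mx[1]))

base_models, meta = stack_fit(base_factories, meta, train_X, train_y)
test_X = [("the", "DT", "<s>"), ("dog", "NN", "DT"), ("sat", "VB", "NN")]
predicted = stack_predict(base_models, meta, test_X)
print(list(zip([x[0] for x in test_X], predicted)))
```

Training the layer-2 classifier on out-of-fold predictions, rather than on predictions the base learners make for their own training data, is the usual way to keep the meta learner from simply trusting overfitted base outputs. A voting combiner, by contrast, would replace the trained meta learner with a fixed majority rule.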