Abstract:
Word sense disambiguation has long been a central concern in natural language processing, and ensembles of classifiers are one of the four current directions in machine learning research. This paper presents a systematic study of classifier ensembles for Chinese word sense disambiguation. Nine combining strategies are evaluated: product, average, max, min, majority voting, rank-based voting, weighted voting, weighted probability, and best single combining. Of these, product, average, and max have not been applied to word sense disambiguation in previous work. Support vector machine, naive Bayes, and decision tree are selected as the three component classifiers. All three classifiers use the same four kinds of features: bag of words, words with position, parts of speech with position, and 2-gram collocations. Experiments are conducted on two datasets: the first consists of 18 ambiguous words selected from the Chinese semantic corpus, and the second is the multilingual Chinese-English lexical sample task at SemEval-2007. The experimental results show that the average, product, and max strategies, which are applied to Chinese word sense disambiguation for the first time in this paper, exceed the accuracy of the best single classifier, the support vector machine, and also outperform the other six combining methods.
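
As an illustration of five of the combining strategies named above (product, average, max, min, and majority voting), the following is a minimal sketch that assumes each component classifier outputs a probability for every candidate sense, listed in the same sense order. The function names, the example probabilities, and the tie-breaking rule are assumptions made for illustration and are not taken from the paper; rank-based voting, weighted voting, and weighted probability are omitted because the abstract does not define them.

```python
from collections import Counter
from math import prod


def argmax(scores):
    # Index of the largest score; ties go to the lowest index (an assumption).
    return max(range(len(scores)), key=lambda k: scores[k])


def combine(posteriors, rule):
    """Pick a sense index from several classifiers' outputs.

    posteriors: one list of sense probabilities per classifier.
    rule: "product", "average", "max", "min", or "majority".
    """
    # Regroup so that each tuple holds all classifiers' scores for one sense.
    per_sense = list(zip(*posteriors))
    if rule == "product":
        return argmax([prod(col) for col in per_sense])
    if rule == "average":
        return argmax([sum(col) / len(col) for col in per_sense])
    if rule == "max":
        return argmax([max(col) for col in per_sense])
    if rule == "min":
        return argmax([min(col) for col in per_sense])
    if rule == "majority":
        # Each classifier votes for its own top-scoring sense.
        votes = Counter(argmax(p) for p in posteriors)
        return argmax([votes[k] for k in range(len(per_sense))])
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical outputs for one ambiguous word with three candidate senses.
example = [
    [0.6, 0.3, 0.1],  # e.g. support vector machine
    [0.2, 0.5, 0.3],  # e.g. naive Bayes
    [0.1, 0.5, 0.4],  # e.g. decision tree
]

for rule in ("product", "average", "max", "min", "majority"):
    print(rule, combine(example, rule))
```

In this made-up example the max rule selects sense 0, because one classifier is very confident in it, while the other four rules select sense 1. The rules can therefore disagree on the same input.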