Abstract:
As a very active branch of natural language processing, open-domain question answering (QA) has attracted increasing attention because a QA system can understand questions posed in natural language and provide its users with compact, exact answers. Question classification (QC), i.e., assigning questions to semantic categories, is essential to a QA system: its main task is to capture the user's intent, and it directly affects the system's ability to select correct answers. In this paper, to investigate automatic question classification, we compare different classification features, including bag-of-words, bigrams, synsets from WordNet, and dependency structures from Minipar. We experiment with support vector machines (SVM) and with ensemble learning approaches such as transformation-based error-driven learning (TBL), voting, and back-propagation artificial neural networks (BP). We present a question classification algorithm and compare it against single-feature SVM, multi-feature SVM classifiers, and the BP and voting ensemble methods. The proposed method, which combines multiple SVM classifiers through a TBL algorithm and represents questions with linguistic knowledge such as WordNet synsets and Minipar dependency structures, proves more accurate on an open question classification corpus. Moreover, using dependency structures yields a 1.8% improvement in accuracy over not using them.
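To make the multi-feature ensemble idea concrete, the following is a minimal sketch, not the authors' implementation: it trains one SVM per surface feature (bag-of-words and bigrams) and combines their predictions by majority voting, assuming scikit-learn is available. The WordNet-synset and Minipar-dependency features and the TBL-based combination described above are omitted here, and the toy questions and labels are purely illustrative.

```python
# Minimal sketch of per-feature SVM classifiers for question classification,
# combined by hard (majority) voting. Assumes scikit-learn; synset and
# dependency features from the paper are not included.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: questions labelled with coarse semantic categories.
questions = [
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "When did World War II end?",
    "How many moons does Mars have?",
]
labels = ["LOCATION", "HUMAN", "NUMERIC", "NUMERIC"]

# One SVM per feature representation: unigram bag-of-words and word bigrams.
bow_svm = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LinearSVC())
bigram_svm = make_pipeline(CountVectorizer(ngram_range=(2, 2)), LinearSVC())

# Combine the per-feature classifiers by majority vote over their predictions.
ensemble = VotingClassifier(
    estimators=[("bow", bow_svm), ("bigram", bigram_svm)],
    voting="hard",
)
ensemble.fit(questions, labels)

print(ensemble.predict(["Where is the Eiffel Tower?"]))
```

In the paper's setting, further per-feature classifiers (e.g., over synsets or dependency structures) would be added to the ensemble, and the simple vote would be replaced by the TBL-based combination.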