Abstract:
Part-of-speech (POS) tagging is a fundamental task in natural language processing and plays an important role in syntactic analysis, semantic analysis, machine translation, and other applications. The maximum entropy model is a statistical model that integrates diverse constraints well, and it has been widely adopted in POS tagging research. This paper proposes a maximum entropy approach to Mongolian POS tagging that incorporates linguistic morphological features. Mongolian has a long and rich history, yet research on Mongolian language processing remains limited. Mongolian is a typical agglutinative language, characterized by rich morphology and a high level of ambiguity. First, based on an analysis of Mongolian script, context feature and internal feature templates are defined, and the corresponding features are extracted from the training corpus. Then, the various morphological features of words are integrated into the maximum entropy model, and the improved iterative scaling (IIS) algorithm is used to estimate the model parameters. Experimental results on closed and open test sets prepared for the Mongolian POS tagging task show that the maximum entropy model with integrated morphological features outperforms the hidden Markov model (HMM) and is well suited to Mongolian script.
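To make the two ingredients named above concrete, the following is a minimal sketch of how context and internal feature templates might feed a maximum entropy tagger. It is an illustration under stated assumptions, not the paper's implementation: the function names `extract_features` and `tag_distribution`, the specific templates (current, previous, and next word, plus word-final substrings of up to three characters), and the placeholder tokens are all hypothetical. Parameter estimation by IIS is not shown; the weights are simply supplied by hand.

```python
import math

def extract_features(words, i, max_suffix=3):
    """Return feature strings for position i (hypothetical templates).

    Context features: the current, previous, and next word.
    Internal features: word-final substrings, a rough stand-in for the
    suffix morphology of an agglutinative language (an assumption here).
    """
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<BOS>"
    next_w = words[i + 1] if i + 1 < len(words) else "<EOS>"
    feats = [f"w0={w}", f"w-1={prev_w}", f"w+1={next_w}"]
    # Only proper suffixes: stop one character short of the whole word.
    for k in range(1, min(max_suffix, len(w) - 1) + 1):
        feats.append(f"suf{k}={w[-k:]}")
    return feats

def tag_distribution(feats, tags, weights):
    """Conditional maximum entropy model: p(t | c) is proportional to
    exp(sum of the weights of the (feature, tag) pairs that fire)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())    # normalization constant Z(c)
    return {t: v / z for t, v in exps.items()}
```

With a trained model, `weights` would map each (feature, tag) pair to the parameter estimated from the corpus, and the tagger would choose the tag with the highest probability for each word, possibly combined with a search over the whole sentence.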