Abstract:
As natural language processing (NLP) technology develops, automatic named entity recognition (NER) has become increasingly important for improving the performance of information extraction systems. The task of NER, which plays a vital role in NLP, is to tag each named entity (NE) in a document with a type drawn from a predefined set of NE types. This paper proposes a hybrid model for Chinese NER that is based on the maximum entropy model and fuses multiple features. It differs from most previous approaches mainly in the following aspects. First, the maximum entropy model is a statistical model that integrates diverse constraints well and is well suited to Chinese NER. Second, local and global features are integrated in the hybrid model to achieve high performance. Third, to reduce the search space and improve processing efficiency, heuristic human knowledge is introduced into the statistical model, which can significantly improve recognition performance. The experimental results on the test set of the NER evaluation task in SIGHAN 2008 show that the hybrid model is an effective way to combine a statistical model with heuristic human knowledge. Experiments on a second, different test set confirm this conclusion and show that the algorithm performs consistently across different test data sources.