    Zhao Yan, Wang Xiaolong, Liu Bingquan, and Guan Yi. Fusion of Clustering Trigger-Pair Features for POS Tagging Based on Maximum Entropy Model[J]. Journal of Computer Research and Development, 2006, 43(2): 268-274.

    Fusion of Clustering Trigger-Pair Features for POS Tagging Based on Maximum Entropy Model

    • Part-of-speech (POS) information is required before more complex analysis can be carried out. Traditional POS taggers are based on the hidden Markov model (HMM), but the HMM cannot incorporate long-distance lexical features that help predict the correct POS. To address this, a “W_A→W_B/T_B” trigger-pair that carries long-distance lexical information is first proposed. Average mutual information (AMI), a better correlation measure than mutual information (MI), is then used to extract trigger-pairs from the training corpus. To cope with the sparseness of the trigger word W_A, words are clustered by semantic similarity computed with the vector space model, yielding clustering trigger-pairs. Finally, the high-quality clustering trigger-pairs are added to the POS tagging system as a new kind of feature within the maximum entropy framework. Experiments show that the new model reduces tagging error by 34% compared with the HMM. The approach can also be applied to Pinyin-to-character conversion and word sense disambiguation.
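
    The contrast between MI and AMI mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard four-cell definition of average mutual information over the events "trigger A present/absent" and "target B present/absent", and pointwise MI for comparison; whether the paper uses exactly these forms is an assumption. The function names and the count arguments (n_ab, n_a, n_b, n) are hypothetical, chosen only for illustration.

```python
import math


def _term(p_xy, p_x, p_y):
    # One cell of the AMI sum: p(x,y) * log2(p(x,y) / (p(x) * p(y))).
    # A cell with zero joint probability contributes nothing.
    if p_xy == 0.0:
        return 0.0
    return p_xy * math.log2(p_xy / (p_x * p_y))


def pointwise_mi(n_ab, n_a, n_b, n):
    # Pointwise MI of the single event "A and B both occur".
    # n_ab: contexts with both A and B; n_a, n_b: contexts with A, with B;
    # n: total number of contexts (all hypothetical counts).
    if n_ab == 0:
        return float("-inf")
    return math.log2((n_ab / n) / ((n_a / n) * (n_b / n)))


def average_mi(n_ab, n_a, n_b, n):
    # AMI sums over all four cells of the 2x2 contingency table,
    # weighting each log-ratio by the joint probability of that cell.
    p_a, p_b = n_a / n, n_b / n
    p_ab = n_ab / n
    p_a_nb = (n_a - n_ab) / n             # A present, B absent
    p_na_b = (n_b - n_ab) / n             # A absent, B present
    p_na_nb = (n - n_a - n_b + n_ab) / n  # both absent
    return (
        _term(p_ab, p_a, p_b)
        + _term(p_a_nb, p_a, 1.0 - p_b)
        + _term(p_na_b, 1.0 - p_a, p_b)
        + _term(p_na_nb, 1.0 - p_a, 1.0 - p_b)
    )


# Hypothetical counts: a rare pair that always co-occurs versus a frequent,
# strongly (but not perfectly) associated pair.
rare = (2, 2, 2, 10_000)
frequent = (1_500, 2_000, 2_000, 10_000)
print(pointwise_mi(*rare), pointwise_mi(*frequent))
print(average_mi(*rare), average_mi(*frequent))
```

    With these made-up counts, pointwise MI ranks the rare pair higher, while AMI ranks the frequent pair higher because each cell is weighted by how often it occurs. This is one common reason for preferring AMI when selecting trigger-pairs from sparse data.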
