    Hu Yi, Lu Ruzhan, Li Xuening, Duan Jianyong, Chen Yuquan. Research on Language Modeling Based Sentiment Classification of Text[J]. Journal of Computer Research and Development, 2007, 44(9): 1469-1475.

    Research on Language Modeling Based Sentiment Classification of Text

    • This paper presents a language modeling approach to the sentiment classification of text. The approach characterizes the semantic orientation of a text as "thumbs up" or "thumbs down", which provides semantic information beyond the topic that a text summary conveys. The motivation is simple: "thumbs up" and "thumbs down" language models are likely to be substantially different, because they reflect different language habits. The method exploits this divergence between the models to classify test documents, and it is deployed in two stages. First, the two sentiment language models are estimated from training data. Second, each test document is classified by comparing the Kullback-Leibler divergence between the language model estimated from that document and each of the two trained sentiment models. Word unigrams and bigrams serve as the model parameters, which are estimated by maximum likelihood estimation combined with smoothing techniques. On a movie review corpus with limited training data, the language modeling approach outperforms two other classifiers, SVMs and Naïve Bayes, and it also proves robust for sentiment classification. Future work may focus on better ways to estimate language models, especially higher-order n-gram models and more powerful smoothing methods.
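    The two-stage procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its smoothing method, so the sketch assumes Jelinek-Mercer interpolation of each sentiment model with an add-one background model. The class `SentimentLM`, the helper `features`, and the weight `lam=0.7` are all hypothetical. Bigrams are handled by adding adjacent word pairs as extra terms, which is only one possible reading of "unigrams and bigrams of words are employed as the model parameters". The test-document model is the unsmoothed maximum likelihood estimate, and each document is assigned to the sentiment whose model has the smaller Kullback-Leibler divergence from it.

```python
import math
from collections import Counter


def features(tokens, use_bigrams=False):
    """Unigram terms, optionally plus adjacent word pairs as extra terms (an assumption)."""
    terms = list(tokens)
    if use_bigrams:
        terms += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return terms


class SentimentLM:
    """Hypothetical sketch: smoothed sentiment language models plus a KL-divergence classifier."""

    def __init__(self, lam=0.7, use_bigrams=False):
        self.lam = lam  # assumed Jelinek-Mercer weight on the class MLE
        self.use_bigrams = use_bigrams

    def fit(self, docs, labels):
        # Stage 1: count terms per sentiment class and over the whole training set.
        self.class_counts = {}
        self.bg = Counter()
        for tokens, label in zip(docs, labels):
            terms = features(tokens, self.use_bigrams)
            self.class_counts.setdefault(label, Counter()).update(terms)
            self.bg.update(terms)
        self.bg_total = sum(self.bg.values())
        self.vocab_size = len(self.bg)
        self.class_totals = {c: sum(cnt.values()) for c, cnt in self.class_counts.items()}
        return self

    def p_background(self, term):
        # Add-one estimate; every unseen term gets the same small nonzero probability.
        return (self.bg[term] + 1) / (self.bg_total + self.vocab_size + 1)

    def p_class(self, term, label):
        # Interpolate the class MLE with the background model so no term gets zero probability.
        mle = self.class_counts[label][term] / self.class_totals[label]
        return self.lam * mle + (1 - self.lam) * self.p_background(term)

    def kl(self, tokens, label):
        # D(d || c) = sum_w p(w|d) * log(p(w|d) / p(w|c)), with p(w|d) the test-document MLE.
        counts = Counter(features(tokens, self.use_bigrams))
        n = sum(counts.values())
        return sum((k / n) * math.log((k / n) / self.p_class(w, label))
                   for w, k in counts.items())

    def predict(self, tokens):
        # Stage 2: pick the sentiment model closest to the test document.
        return min(self.class_counts, key=lambda c: self.kl(tokens, c))


# Toy usage with made-up sentences (not the paper's movie review corpus).
train = [
    "a wonderful moving film".split(),
    "great acting and a wonderful story".split(),
    "a dull boring mess".split(),
    "terrible plot and boring acting".split(),
]
labels = ["up", "up", "down", "down"]
model = SentimentLM().fit(train, labels)
print(model.predict("wonderful story".split()))  # → up
print(model.predict("boring mess".split()))      # → down
```

    Comparing divergences ranks the classes the same way as cross-entropy. The divergence decomposes as D(d || c) = -H(d) - sum_w p(w|d) log p(w|c), and the entropy H(d) of the test document does not depend on the class. Smoothing is what makes the comparison well defined: a class model that assigned zero probability to a word in the test document would make the divergence infinite.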
