Using Maximum Entropy Model for Chinese Text Categorization

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa. Using Maximum Entropy Model for Chinese Text Categorization[J]. Journal of Computer Research and Development, 2005, 42(1): 94-101.

Citation:

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa. Using Maximum Entropy Model for Chinese Text Categorization[J]. Journal of Computer Research and Development, 2005, 42(1): 94-101.

Citation:

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa. Using Maximum Entropy Model for Chinese Text Categorization[J]. Journal of Computer Research and Development, 2005, 42(1): 94-101.

Using Maximum Entropy Model for Chinese Text Categorization

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa

Graphical Abstract

Graphical Abstract

Abstract

Abstract

With the rapid development of World Wide Web, text classification has become the key technology in organizing and processing large amount of document data. Maximum entropy model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and accommodable frame to combine diverse pieces of contextual information to estimate the probability of a certain linguistics phenomena. This approach for many tasks of NLP perform near state-of-the-art level, or outperform other competing probability methods when trained and tested under similar conditions. However, relatively little work has been done on applying maximum entropy model to text categorization problems. In addition, no previous work has focused on using maximum entropy model in classifying Chinese documents. Maximum entropy model is used for text categorization. Its categorization performance is compared and analyzed using different approaches for text feature generation, different number of feature and smoothng technique. Moreover, in experiments it is compared to Bayes, KNN and SVM, and it is shown that its performance is higher than Bayes and comparable with KNN and SVM. It is a promising technique for text categorization.

FullText(HTML)

References (0)

Cited By

Turn off MathJax

Article Contents

Using Maximum Entropy Model for Chinese Text Categorization

Graphical Abstract

Abstract

Catalog

Export File

Citation

Format

Content