    Shang Wenqian, Huang Houkuan, Liu Yuling, Lin Yongmin, Qu Youli, and Dong Hongbin. Research on the Algorithm of Feature Selection Based on Gini Index for Text Categorization[J]. Journal of Computer Research and Development, 2006, 43(10): 1688-1694.

    Research on the Algorithm of Feature Selection Based on Gini Index for Text Categorization

    • With the rapid development of the World Wide Web, large numbers of documents are available on the Internet, and automatic text categorization has become a key technology for organizing and processing large amounts of text data. For most classifiers that use the vector space model (VSM), text preprocessing has become the bottleneck of categorization: the high dimensionality of the feature space is prohibitive for many classifiers. Adopting an appropriate text feature selection algorithm to reduce the dimensionality of the feature space therefore plays a key role. Many text feature selection algorithms exist at present. This paper does not discuss all of them in detail; instead, it presents a new text feature selection method based on the Gini index. An improved Gini index is used for text feature selection, and a measure function is constructed from it. The experimental results show that text feature selection based on the Gini index can further improve categorization performance, and that its computational complexity is low.
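    To make the idea of Gini-based feature selection concrete, here is a minimal sketch. The abstract does not give the paper's improved measure function, so this is not the authors' formula. It assumes the textbook Gini purity of a term W, the sum over classes of P(C_i|W)^2, estimated from document frequencies. The names `gini_term_scores` and `select_features` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def gini_term_scores(docs):
    """Score each term by the purity of its class distribution.

    docs: iterable of (tokens, label) pairs.
    Returns {term: sum_i P(C_i | term)^2}, with probabilities estimated
    from document frequencies. This is the classical Gini purity, used
    here as an assumption; it is not the paper's improved measure.
    """
    # term -> Counter mapping class label to number of documents containing the term
    term_class_df = defaultdict(Counter)
    for tokens, label in docs:
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, by_class in term_class_df.items():
        total = sum(by_class.values())
        scores[term] = sum((n / total) ** 2 for n in by_class.values())
    return scores


def select_features(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = gini_term_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

    A term that occurs in only one class scores 1.0, while a term spread evenly over m classes scores 1/m. Because this plain purity score also gives 1.0 to a rare term seen in a single document, a minimum document frequency is commonly applied before ranking.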
