Shang Wenqian, Huang Houkuan, Liu Yuling, Lin Yongmin, Qu Youli, and Dong Hongbin. Research on the Algorithm of Feature Selection Based on Gini Index for Text Categorization[J]. Journal of Computer Research and Development, 2006, 43(10): 1688-1694.

Research on the Algorithm of Feature Selection Based on Gini Index for Text Categorization

  • Published Date: October 14, 2006
  • With the rapid development of the World Wide Web, large numbers of documents are available on the Internet, and automatic text categorization has become increasingly important for handling such massive data. It is now a key technology for organizing and processing large amounts of text. For most classifiers that use the vector space model (VSM), text preprocessing has become the bottleneck of categorization: the high dimensionality of the feature space is prohibitive for many classifiers, so choosing an appropriate text feature selection algorithm to reduce that dimensionality plays a key role. Many text feature selection algorithms exist. This paper does not discuss all of them in detail; instead, it presents another text feature selection method based on the Gini index. An improved Gini index is used for text feature selection, and a measure function is constructed from it. The experimental results show that text feature selection based on the Gini index can further improve categorization performance and that its computational complexity is low.
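
The abstract does not give the improved Gini-index measure function, so the following is only a minimal sketch of the general idea, assuming a plain Gini-style purity score: for each term w, the sum over classes of P(class | w)^2, estimated from document frequencies. The function names `gini_scores` and `select_top_k` and the toy corpus are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels):
    """Score each term by sum_c P(c | term)^2 (assumed purity measure,
    not the paper's improved Gini index)."""
    # df[term][label] = number of documents of that class containing the term
    df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df[term][label] += 1

    scores = {}
    for term, per_class in df.items():
        total = sum(per_class.values())
        # 1.0 means the term occurs in one class only; lower means it is
        # spread across classes and so less discriminative.
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: tokenized documents and their class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "win"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = gini_scores(docs, labels)
selected = select_top_k(scores, 2)
```

Here "win" appears once in each class and scores 0.5, while terms confined to a single class score 1.0. A raw purity score like this one favors rare terms, so a practical version would likely add a minimum document-frequency cutoff or a frequency weighting; one pass over the corpus is enough to compute it, in keeping with the low computational cost the abstract reports.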
