Liu He, Liu Dayou, Pei Zhili, Gao Ying. A Feature Weighting Scheme for Text Categorization Based on Feature Importance[J]. Journal of Computer Research and Development, 2009, 46(10): 1693-1703.

A Feature Weighting Scheme for Text Categorization Based on Feature Importance

  • Published Date: October 14, 2009
  • Text categorization is a key research area in text mining, and feature weighting is one of its central problems. This paper proposes a feature weighting scheme for text categorization in which feature importance is defined on the basis of real rough set theory. Through this concept, the decision-making information that a feature carries for categorization is introduced into the feature's weight. Experiments are performed on two standard benchmark text datasets, Reuters-21578 Top10 and WebKB. Computing the total within-class scatter and the between-class scatter from the Fisher linear discriminant verifies that the proposed scheme decreases the former and increases the latter on both datasets; that is, it makes samples of the same class more compact and samples of different classes more widely separated. The scheme thereby improves the spatial distribution of samples and simplifies the mapping from samples to classes. Finally, the scheme is evaluated on the two datasets with Naive Bayes, kNN, and SVM classifiers. The experimental results show that it improves the precision, recall, and F1 value of categorization.
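The scatter comparison described in the abstract can be illustrated with a minimal sketch. It assumes the scatters are summarized by the traces of the Fisher within-class and between-class scatter matrices, which the abstract does not state explicitly. The function names, the toy data, and the weight vector are hypothetical and are not the paper's feature-importance weights.

```python
from collections import defaultdict


def scatter_traces(samples, labels):
    """Return (within, between): traces of the Fisher within-class and
    between-class scatter matrices for a list of feature vectors."""
    dim = len(samples[0])
    n = len(samples)
    # Mean of all samples, regardless of class.
    global_mean = [sum(x[j] for x in samples) / n for j in range(dim)]

    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    within = 0.0
    between = 0.0
    for members in by_class.values():
        size = len(members)
        mean = [sum(x[j] for x in members) / size for j in range(dim)]
        # Within-class: squared distances of members to their own class mean.
        within += sum((x[j] - mean[j]) ** 2 for x in members for j in range(dim))
        # Between-class: size-weighted squared distance from class mean to global mean.
        between += size * sum((mean[j] - global_mean[j]) ** 2 for j in range(dim))
    return within, between


def apply_weights(samples, weights):
    """Scale each feature by its weight (hypothetical weights, for illustration)."""
    return [[w * v for w, v in zip(weights, x)] for x in samples]


# Toy data: feature 0 separates the classes, feature 1 is class-independent noise.
samples = [[1.0, 0.0], [1.0, 4.0], [3.0, 0.0], [3.0, 4.0]]
labels = ["a", "a", "b", "b"]

unweighted = scatter_traces(samples, labels)
# Down-weight the noisy feature; these weights are assumed, not learned.
weighted = scatter_traces(apply_weights(samples, [1.0, 0.5]), labels)
```

In this toy case, down-weighting the uninformative feature lowers the within-class trace from 16.0 to 4.0 while the between-class trace stays at 4.0, so the classes become relatively more compact. The abstract reports a stronger effect for the proposed scheme on the real datasets, where between-class scatter also increases.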
