Jing Hongfang, Wang Bin, Yang Yahui, Xu Yan. Category Distribution-Based Feature Selection Framework[J]. Journal of Computer Research and Development, 2009, 46(9): 1586-1593.

Category Distribution-Based Feature Selection Framework

  • Published Date: September 14, 2009
  • Text categorization is an important technique in data mining. The extremely high dimensionality of the feature space makes text categorization complex and expensive, so effective dimension reduction methods are highly desirable. Feature selection is widely used for this purpose, and many feature selection methods have been proposed in recent years. To the best of the authors' knowledge, however, no existing method performs very well on unbalanced datasets. This paper proposes a feature selection framework based on the difference in the category distribution of features, named category distribution-based feature selection (CDFS). The approach uses the distribution information of features to select those with strong discriminative power. At the same time, weights can be flexibly assigned to categories: if larger weights are assigned to rare categories, performance on rare categories can be improved. The framework is therefore suitable for unbalanced data and highly extensible. In addition, OCFS and the feature filter based on category distribution difference can be viewed as special cases of this framework. Several implementations of CDFS are given. Experimental results on the Reuters-21578 corpus and the Fudan corpus (unbalanced datasets) show that the CDFS implementations given in this paper achieve better MacroF1 and MicroF1 than IG, CHI, and OCFS.
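
The abstract does not give the CDFS scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions: each feature is scored by how much its per-category document frequency deviates from the cross-category mean, and each category's contribution is multiplied by a user-supplied weight. The function names `cdfs_scores` and `select_top_k`, the squared-deviation measure, and the `category_weights` dictionary are all hypothetical choices made for illustration, not the authors' definitions.

```python
import numpy as np


def cdfs_scores(X, y, category_weights=None):
    """Score features by weighted category-distribution difference.

    ASSUMPTION: the squared-deviation measure below is illustrative only;
    it is not the formula from the paper.

    X: (n_docs, n_features) term-count or binary matrix.
    y: (n_docs,) category labels.
    category_weights: optional dict mapping label -> weight (default 1.0).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    categories = np.unique(y)

    # Fraction of documents in each category that contain each feature.
    present = (X > 0).astype(float)
    dist = np.vstack([present[y == c].mean(axis=0) for c in categories])

    # Per-category weights, normalized to sum to 1.
    lookup = category_weights or {}
    weights = np.array([lookup.get(c, 1.0) for c in categories], dtype=float)
    weights = weights / weights.sum()

    # Weighted squared deviation from the mean distribution over categories.
    center = dist.mean(axis=0)
    return weights @ (dist - center) ** 2


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    # Toy corpus: feature 0 marks common category "a", feature 1 marks rare "c".
    X = np.array([[1, 0], [1, 0], [0, 0], [0, 0], [0, 1]])
    y = np.array(["a", "a", "b", "b", "c"])
    print(cdfs_scores(X, y))                # uniform category weights
    print(cdfs_scores(X, y, {"c": 2.0}))    # larger weight on the rare category
```

In this toy setup, raising the weight of the rare category raises the score of the feature that characterizes it, which is the behavior the abstract attributes to the weighting scheme. A real implementation would follow the definitions in the full paper.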
