A Feature Selection Method Based on Maximal Marginal Relevance

Liu He; Zhang Xianghong; Liu Dayou; Li Yanjun; Yin Lijun

Liu He, Zhang Xianghong, Liu Dayou, Li Yanjun, Yin Lijun. A Feature Selection Method Based on Maximal Marginal Relevance[J]. Journal of Computer Research and Development, 2012, 49(2): 354-360.

Abstract

With the rapid growth of textual information on the Internet, text categorization has become one of the key research directions in data mining. Text categorization is a supervised learning process, defined as automatically assigning free text to one or more predefined categories. At present, text categorization is necessary for managing textual information and has been applied in many fields. However, text categorization has two characteristics: high dimensionality of the feature space and a high level of feature redundancy. To address these two characteristics, χ² is used to deal with the high dimensionality of the feature space, and information novelty is used to deal with the high level of feature redundancy. Following the definition of maximal marginal relevance, a feature selection method based on maximal marginal relevance is proposed, which reduces redundancy between features during feature selection. Experiments are carried out on two text data sets, namely Reuters-21578 Top10 and OHSCAL. The results indicate that the feature selection method based on maximal marginal relevance is more efficient than χ² and information gain. Moreover, it can improve the performance of three different categorizers, namely naive Bayes, Rocchio and kNN.
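The abstract does not give the scoring formula, so the following is only a minimal sketch of how a maximal-marginal-relevance style selection could combine a χ² relevance score with a redundancy penalty. Several parts are assumptions rather than the authors' method: the trade-off weight `lam`, the use of cosine similarity between term columns as a stand-in for the paper's information novelty measure, the normalization of χ² to [0, 1], and the function names `chi2_scores` and `mmr_select`.

```python
import numpy as np


def chi2_scores(X, y):
    """Chi-square statistic of each term (presence/absence) against the
    class labels, taking the maximum over classes."""
    X = (np.asarray(X) > 0).astype(float)
    y = np.asarray(y)
    n = X.shape[0]
    scores = np.zeros(X.shape[1])
    for label in np.unique(y):
        pos = (y == label).astype(float)
        a = X.T @ pos                # term present, document in class
        b = X.T @ (1.0 - pos)        # term present, document not in class
        c = pos.sum() - a            # term absent, document in class
        d = (1.0 - pos).sum() - b    # term absent, document not in class
        num = n * (a * d - b * c) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        stat = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        scores = np.maximum(scores, stat)
    return scores


def mmr_select(X, y, k, lam=0.5):
    """Greedy MMR-style selection (assumed form):
    score(f) = lam * relevance(f) - (1 - lam) * max similarity to selected.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # Relevance: chi-square, scaled to [0, 1] so it is comparable to cosine.
    rel = chi2_scores(X, y)
    if rel.max() > 0:
        rel = rel / rel.max()

    # Redundancy proxy (assumption): cosine similarity between term columns.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms
    sim = Xn.T @ Xn

    selected = []
    max_sim = np.zeros(n_features)          # similarity to the selected set
    candidates = np.ones(n_features, dtype=bool)
    for _ in range(min(k, n_features)):
        score = lam * rel - (1.0 - lam) * max_sim
        score[~candidates] = -np.inf        # never re-select a feature
        best = int(np.argmax(score))
        selected.append(best)
        candidates[best] = False
        max_sim = np.maximum(max_sim, sim[:, best])
    return selected
```

With `lam=1.0` the sketch reduces to plain χ² ranking; lowering `lam` penalizes terms that largely duplicate one already chosen, which is the redundancy-reduction effect the abstract describes.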
