Abstract:
Feature selection has been applied successfully to text categorization but rarely to text clustering, because effective supervised feature selection methods require class labels, which are unavailable in clustering. This paper proposes a new method, K-Means based feature selection (KFS), which addresses the lack of label information by applying effective supervised feature selection methods to the results of different K-Means clusterings. Experimental results show that ① KFS successfully selects a small subset of the best features and significantly improves clustering performance; and ② compared with other feature selection methods, KFS performs very close to the ideal supervised feature selection methods and much better than any of the unsupervised methods.
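
The core idea, treating K-Means cluster assignments as stand-in class labels for a supervised feature scorer, can be illustrated with a minimal sketch. The abstract does not state which supervised criterion is used or how scores from different clusterings are combined, so the choices below are assumptions made only for illustration: a chi-square statistic on term presence, the maximum taken over clusters, scores summed across runs, and a farthest-first initialization with a random first center. The names `kmeans`, `chi2_scores`, and `kfs_sketch` are hypothetical and do not come from the paper.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's K-Means; returns one cluster label per row of X."""
    X = np.asarray(X, dtype=float)
    # Assumption: random first center, then farthest-first for the rest.
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every document to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def chi2_scores(X, labels, k):
    """Chi-square score of each term against the (pseudo) class labels."""
    present = (np.asarray(X) > 0).astype(float)  # term presence per document
    n = len(present)
    term_df = present.sum(axis=0)                # documents containing each term
    scores = np.zeros(present.shape[1])
    for j in range(k):
        in_c = labels == j
        n_c = in_c.sum()
        if n_c == 0 or n_c == n:
            continue                             # degenerate clustering
        a = present[in_c].sum(axis=0)            # term present, in cluster
        b = term_df - a                          # term present, outside cluster
        c = n_c - a                              # term absent, in cluster
        d = (n - n_c) - b                        # term absent, outside cluster
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        with np.errstate(divide="ignore", invalid="ignore"):
            chi = np.where(denom > 0, n * (a * d - b * c) ** 2 / denom, 0.0)
        scores = np.maximum(scores, chi)         # assumption: max over clusters
    return scores

def kfs_sketch(X, k, n_keep, n_runs=5, seed=0):
    """Sum pseudo-supervised scores over several K-Means runs; keep the top terms."""
    rng = np.random.default_rng(seed)
    total = np.zeros(np.asarray(X).shape[1])
    for _ in range(n_runs):
        labels = kmeans(X, k, rng)               # cluster labels act as class labels
        total += chi2_scores(X, labels, k)
    return np.argsort(total)[::-1][:n_keep]

# Toy document-term matrix: terms 0 and 1 separate two topics,
# term 2 appears in every other document, term 3 appears everywhere.
X = np.zeros((20, 4))
X[:10, 0] = 5
X[10:, 1] = 5
X[::2, 2] = 1
X[:, 3] = 1
selected = kfs_sketch(X, k=2, n_keep=2)
print(sorted(selected.tolist()))
```

On this toy matrix the two topic-specific terms are the ones retained, while the term that occurs in every document and the term spread evenly over both topics receive a score of zero. Any other supervised criterion, such as information gain, could be substituted for `chi2_scores` without changing the overall structure.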