Abstract:
This paper presents an approach to improving text categorization performance by combining a machine learning technique with a domain dictionary. A text representation based on a domain dictionary enhances the expressiveness of text features and reduces feature dimensionality. However, a domain dictionary is limited in size and does not cover every word, so a machine learning technique called the self-partition model is proposed to address this gap. The proposed model automatically maps words to domain features. A text categorization system is then built that uses these learned domain features as text features. Experimental results show that the proposed approach improves text categorization performance and provides high accuracy even when the feature set is small. With 500 features, it yields a 6.58% improvement in F1 over the system based on bag-of-words (BOW).
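
As a rough illustration of the dictionary-based representation described above, the following minimal Python sketch maps words to domain features and counts them. The toy dictionary, the domain labels, and the function name `to_domain_features` are hypothetical and are not taken from the paper. The sketch does not implement the self-partition model; it only collects the words that the dictionary misses, which are the words such a learned mapping would have to handle.

```python
from collections import Counter

# Hypothetical toy domain dictionary: word -> domain feature.
# A real domain dictionary would be far larger; these entries are illustrative only.
DOMAIN_DICTIONARY = {
    "goalkeeper": "sports",
    "striker": "sports",
    "stock": "finance",
    "dividend": "finance",
}


def to_domain_features(tokens, domain_dictionary):
    """Count domain features for a token list.

    Returns a Counter of domain features and the list of words
    that the dictionary does not cover (left unmapped here).
    """
    counts = Counter()
    unknown = []
    for token in tokens:
        word = token.lower()
        domain = domain_dictionary.get(word)
        if domain is None:
            # Out-of-dictionary word: this sketch only records it.
            unknown.append(word)
        else:
            counts[domain] += 1
    return counts, unknown


tokens = "The striker beat the goalkeeper as the stock fell".split()
features, unknown = to_domain_features(tokens, DOMAIN_DICTIONARY)
print(features)
print(unknown)
```

In this toy example, seven distinct words collapse to two domain features, which is the dimensionality reduction the abstract refers to; the unmapped words show why dictionary coverage alone is not sufficient.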