Abstract:
Real-world text documents often belong to multiple classes simultaneously, so classifying them with multi-label learning techniques is an important research direction. Existing multi-label text categorization approaches usually require a large number of documents with correct class labels to achieve good performance. In real applications, however, often only a small number of labeled documents can be obtained as training samples because human and material resources are limited. Because large numbers of unlabeled documents are readily available, exploiting them automatically is the basic motivation of this work. Random walk is a popular technique in semi-supervised learning as well as in transductive learning. In this paper, the authors propose a random walk based transductive multi-label text categorization approach that exploits abundant unlabeled documents to improve classification performance. In the proposed approach, labels are spread from the labeled documents to the unlabeled documents, so a small number of labeled documents and a large number of unlabeled documents are used simultaneously during learning. Experimental results show that the proposed approach outperforms the existing semi-supervised multi-label method CNMF (constrained non-negative matrix factorization).
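
The idea of spreading labels from labeled to unlabeled documents can be illustrated with a generic random-walk label propagation sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a precomputed document similarity matrix, clamps the known labels at every step, and thresholds the resulting scores independently per label. All names (`row_normalize`, `propagate_labels`, `predict`) and the 0.5 threshold are hypothetical choices made for illustration.

```python
def row_normalize(weights):
    """Turn a non-negative similarity matrix into random-walk transition probabilities."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total > 0 else 0.0 for w in row])
    return normalized


def propagate_labels(weights, labeled, n_labels, n_iter=200):
    """Spread label scores from labeled to unlabeled documents.

    weights:  n x n similarity matrix (list of lists), assumed precomputed.
    labeled:  dict mapping a document index to its set of label indices.
    Returns an n x n_labels matrix of scores in [0, 1].
    """
    n = len(weights)
    transition = row_normalize(weights)

    # Labeled documents start with score 1.0 for each of their labels.
    scores = [[0.0] * n_labels for _ in range(n)]
    for doc, labels in labeled.items():
        for label in labels:
            scores[doc][label] = 1.0

    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in labeled:
                # Clamp: known labels are never overwritten.
                updated.append(scores[i][:])
            else:
                # One random-walk step: average neighbours' scores,
                # weighted by transition probability.
                updated.append([
                    sum(transition[i][j] * scores[j][c] for j in range(n))
                    for c in range(n_labels)
                ])
        scores = updated
    return scores


def predict(scores, threshold=0.5):
    """Assign every label whose score reaches the (assumed) threshold."""
    return [{c for c, s in enumerate(row) if s >= threshold} for row in scores]


# Toy example: documents 0 and 1 are labeled, documents 2 and 3 are not.
weights = [
    [0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.9],
    [0.9, 0.0, 0.0, 0.1],
    [0.0, 0.9, 0.1, 0.0],
]
labeled = {0: {0, 1}, 1: {1}}
scores = propagate_labels(weights, labeled, n_labels=2)
predictions = predict(scores)
```

In this toy graph, document 2 is most similar to document 0 and inherits both of its labels, while document 3 is most similar to document 1 and receives only label 1. Scoring each label separately is what lets one document end up with several labels at once.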