An Active Labeling Method for Text Data Based on Nearest Neighbor and Information Entropy

Zhu Yan, Jing Liping, and Yu Jian

Zhu Yan, Jing Liping, and Yu Jian. An Active Labeling Method for Text Data Based on Nearest Neighbor and Information Entropy[J]. Journal of Computer Research and Development, 2012, 49(6): 1306-1312.

Abstract

Labeling text documents on a large scale is time-consuming, so text classification methods that work with only a few labeled data are needed. Semi-supervised text classification has emerged and developed rapidly in response. Unlike traditional classification, it requires only a small set of labeled data and a large set of unlabeled data to train a classifier. In most cases the small labeled set is used to initialize the classification model, so how well that set is chosen affects the performance of the final classifier. To make the distribution of the labeled data more consistent with the distribution of the original data, a sampling method is proposed that excludes the K nearest neighbors of already labeled data from the pool of new candidates for labeling. With this method, data located in different regions have more opportunities to be labeled. Moreover, to obtain more category information from the very few labeled data, the method compares the information entropy of the candidates and chooses the datum with the highest information entropy as the next one to be labeled manually. Experiments on real text data sets suggest that this approach is effective.
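The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how distances or entropies are computed, so the Euclidean distance, the externally supplied class-probability matrix `class_probs`, and the function names `shannon_entropy` and `select_next` are all assumptions made for the example.

```python
import numpy as np


def shannon_entropy(probs):
    """Row-wise Shannon entropy (base 2) of a matrix of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log2(p)).sum(axis=1)


def select_next(X, labeled_idx, class_probs, k=2):
    """Pick the index of the next datum to label manually.

    X            : (n, d) array of document vectors (hypothetical features).
    labeled_idx  : indices of data that are already labeled.
    class_probs  : (n, c) array of estimated class probabilities per datum
                   (assumed to come from some current model).
    k            : number of nearest neighbors to exclude per labeled datum.
    """
    n = X.shape[0]
    excluded = set(labeled_idx)
    for i in labeled_idx:
        # Euclidean distance is an assumption; the source does not name a metric.
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # the labeled datum is not its own neighbor
        excluded.update(np.argsort(dist)[:k].tolist())

    # Candidates are the data outside every labeled datum's neighborhood.
    candidates = [j for j in range(n) if j not in excluded]
    if not candidates:
        raise ValueError("no candidates left; try a smaller k")

    # Among the candidates, choose the one with the highest entropy.
    entropy = shannon_entropy(class_probs[candidates])
    return candidates[int(np.argmax(entropy))]
```

Excluding neighborhoods first and ranking by entropy second mirrors the two goals stated in the abstract: spreading labels across regions, then favoring the most uncertain datum among those that remain.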
