Zhu Yan, Jing Liping, and Yu Jian. An Active Labeling Method for Text Data Based on Nearest Neighbor and Information Entropy[J]. Journal of Computer Research and Development, 2012, 49(6): 1306-1312.

An Active Labeling Method for Text Data Based on Nearest Neighbor and Information Entropy

  • Published Date: June 14, 2012
  • Abstract: Labeling text documents on a large scale is time-consuming, so text classification methods that need only a few labeled documents are in demand. Semi-supervised text classification has emerged and developed rapidly in response. Unlike traditional classification, it trains a classifier from a small set of labeled data together with a large set of unlabeled data. In most cases the small labeled set is used to initialize the classification model, so how well that set is chosen affects the performance of the final classifier. To make the distribution of the labeled data more consistent with the distribution of the original data, a sampling method is proposed that excludes the K nearest neighbors of already labeled data from the candidates for labeling. As a result, data located in different regions have more opportunities to be labeled. Moreover, to obtain more category information from the very few labeled data, the method compares the information entropy of the candidates and chooses the datum with the highest information entropy as the next one to be labeled manually. Experiments on real text data sets suggest that this approach is effective.
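The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how distance or information entropy is computed, so the sketch assumes Euclidean distance between document vectors and Shannon entropy over per-document class probability estimates supplied by the caller. The names `entropy`, `k_nearest`, and `next_to_label` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed metric, not from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(points, center, k, exclude):
    """Indices of the k points closest to points[center], skipping `exclude`."""
    others = [i for i in range(len(points)) if i != center and i not in exclude]
    others.sort(key=lambda i: squared_distance(points[i], points[center]))
    return others[:k]


def next_to_label(points, class_probs, labeled, k):
    """Pick the next datum to label manually.

    Step 1: drop labeled data and their K nearest neighbors from the pool,
            so that new labels come from other regions of the data.
    Step 2: among the remaining candidates, return the index whose class
            probability estimate has the highest entropy (None if no
            candidate is left).
    """
    labeled = set(labeled)
    blocked = set(labeled)
    for i in labeled:
        blocked.update(k_nearest(points, i, k, exclude=labeled))
    candidates = [i for i in range(len(points)) if i not in blocked]
    if not candidates:
        return None
    return max(candidates, key=lambda i: entropy(class_probs[i]))


# Toy data: three points near the origin, two near (5, 5), one near (9, 9).
points = [(0, 0), (0.1, 0), (0.2, 0), (5, 5), (5.1, 5), (9, 9)]
class_probs = [[1.0, 0.0], [0.5, 0.5], [0.6, 0.4],
               [0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]

# Point 0 is labeled; its 2 nearest neighbors (1 and 2) are excluded,
# even though point 1 has the highest entropy overall.
print(next_to_label(points, class_probs, labeled=[0], k=2))  # prints 4
```

In the toy run, point 1 is the most uncertain datum but lies next to the labeled point, so the neighbor exclusion pushes the choice to point 4 in a different region. That is the trade-off the abstract describes between covering the data distribution and gaining category information.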
