    Wang Jianhui, Wang Hongwei, Shen Zhan, Hu Yunfa. A Simple and Efficient Algorithm to Classify a Large Scale of Texts[J]. Journal of Computer Research and Development, 2005, 42(1): 85-93.

    A Simple and Efficient Algorithm to Classify a Large Scale of Texts

    • Most classification methods in current research are based on the VSM (vector space model), and the most widely used of them is kNN (k-nearest neighbors). However, most of these methods are computationally expensive and cannot be used when a large number of specimens must be classified. Moreover, their classifiers must be rebuilt whenever the training corpus is extended, so they scale poorly. This paper proposes two new concepts, MD (mutual dependence) and ER (equivalent radius), and on that basis a new classification method, SECTILE, which can classify a large number of specimens and scales well. SECTILE is then applied to the classification of Chinese documents and compared with the kNN and CCC methods. The results show that SECTILE outperforms both kNN and CCC and can classify a large number of specimens online while maintaining classification precision and recall.
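To make the cost of the kNN baseline concrete, here is a minimal sketch of kNN text classification in the VSM. It is not SECTILE, and it does not implement MD or ER, which the abstract does not define. Several choices are assumptions rather than details from the paper: documents are already tokenized, terms are weighted by raw term frequency, similarity is cosine similarity, and the k nearest neighbours cast similarity-weighted votes. The helper names `tf_vector`, `cosine`, and `knn_classify` are hypothetical. The sketch shows why this approach is expensive: every query is compared against every training document, so the per-document cost grows with the size of the training set.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term-frequency vector for a tokenized document (assumed weighting)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def knn_classify(query_tokens, training, k=3):
    """Hypothetical kNN baseline: training is a list of (tokens, label) pairs.

    The query is scored against EVERY training document, which is the
    O(N)-per-query cost that makes kNN slow on large corpora.
    """
    q = tf_vector(query_tokens)
    scored = sorted(
        ((cosine(q, tf_vector(toks)), label) for toks, label in training),
        key=lambda s: s[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim  # similarity-weighted vote (assumed scheme)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (["stock", "market", "price"], "finance"),
        (["bank", "loan", "interest"], "finance"),
        (["match", "goal", "team"], "sports"),
        (["player", "team", "coach"], "sports"),
    ]
    print(knn_classify(["team", "goal", "win"], training, k=3))
```

Run as a script, the example classifies the query `["team", "goal", "win"]` as `sports`. A production kNN classifier would usually precompute the training vectors and use TF-IDF weighting. Even so, the linear scan over all training documents per query remains, and that is the scalability issue the abstract raises.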