
    A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization

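    • Abstract: One of the difficulties in automatic text categorization is how to select, from a high-dimensional feature space, the features that are effective for classification, so as to suit the categorization algorithm and improve classification precision. To address this problem, after analyzing and comparing the effects of feature selection and weight adjustment on the precision and efficiency of text categorization, we propose TEF-WA, a weight adjustment technique that incorporates an evaluation function. A new weight function is designed that embeds the feature evaluation function in the weight function, so that each feature's contribution in the classifier is adjusted according to its ability to discriminate between categories. Experimental results show that the TEF-WA weight adjustment technique is effective both in improving classification precision and in reducing the time complexity of the algorithm.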

       

      Abstract: Text categorization (TC) is an important research direction in text mining. It aims to assign one or more predefined category labels to a text document, and it provides efficient methods for document management and information retrieval. A major problem in automatic text categorization is how to select the best feature subset from the original high-dimensional feature space so that the categorization algorithm works efficiently and its precision improves. In this paper, feature selection and weight adjustment techniques are discussed and analyzed, and their influence on the precision and efficiency of text classification is pointed out. The TEF-WA (term evaluation function-weight adjustment) technique is then introduced. It uses a new weight function that incorporates a feature evaluation function and adjusts the contribution of each feature term in the classifier according to the term's discriminating strength. To evaluate TEF-WA, experiments are carried out on training document collections of several different sizes, using various term evaluation functions: document frequency, information gain, expected cross entropy, CHI, the weight of evidence for text, and term frequency or document frequency formulas. The experimental results show that the TEF-WA technique is effective in improving classification precision and in reducing computational complexity.
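The abstract does not state the TEF-WA weight function itself, so the following is only a minimal sketch of the general idea of folding a term evaluation score into a term weight. It assumes, purely for illustration, that the adjusted weight is a TF-IDF weight multiplied by the term's chi-square (CHI) score; the function names `chi_square`, `term_scores`, and `adjusted_weights`, the smoothed IDF, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, category):
    """Chi-square statistic of a term for one category (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1          # term present, document in category
        elif has_term:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def term_scores(docs, labels):
    """Evaluation score per term: the maximum chi-square over all categories."""
    vocab = {t for tokens in docs for t in tokens}
    cats = set(labels)
    return {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}


def adjusted_weights(tokens, docs, scores):
    """Assumed form: TF * smoothed IDF * evaluation score (not the paper's formula)."""
    n = len(docs)
    weights = {}
    for term, freq in Counter(tokens).items():
        df = sum(1 for doc in docs if term in doc)
        idf = math.log((n + 1) / (df + 1)) + 1.0
        weights[term] = freq * idf * scores.get(term, 0.0)
    return weights


# Toy corpus: "the" occurs in every document, "game" only in sports documents.
docs = [
    ["ball", "game", "the"],
    ["game", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = term_scores(docs, labels)
weights = adjusted_weights(["game", "the", "the"], docs, scores)
```

In this sketch a term that appears in every document gets an evaluation score of zero and so contributes nothing to the classifier, while a term concentrated in one category keeps a positive weight. This mirrors the abstract's description of adjusting a term's contribution by its discriminating strength without discarding features outright, but the multiplicative form is an assumption.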

       
