    A Fuzzy Clustering Technology Based on Hierarchical Neural Networks for Web Document

    • Abstract: This paper proposes a multilayer vector space model that logically divides the information associated with a document into several relatively independent text segments and assigns index-term weights according to the position of each segment. It then presents a simple and effective fuzzy clustering algorithm based on hierarchical neural networks. Unlike existing clustering methods, the algorithm is implemented as a three-layer neural network made up of two parts: a self-organizing (fuzzy competitive) neural network and a fuzzy clustering network. The self-organizing network first acts as a data pre-processor, extracting a number of subclusters from the Web documents as an initial fuzzy clustering. The fuzzy C-means (FCM) algorithm is then applied to this initial clustering to optimize the number of clusters. Experimental results show that the proposed algorithm clusters well: Web documents related to a given topic are grouped into a single cluster fairly completely and accurately.
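The two ideas summarized in the abstract, position-dependent term weighting and FCM refinement of an initial clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. The segment names and weight values in `SEGMENT_WEIGHTS` are hypothetical, the helper names are invented for illustration, and the `initial_centers` argument merely stands in for the output of the self-organizing network stage, which is not reproduced here. The membership and center updates are the standard FCM formulas.

```python
import math
from collections import Counter

# Hypothetical segment weights; the abstract does not state actual values.
SEGMENT_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def weighted_term_vector(segments, vocabulary):
    """Build a unit-length term vector; each occurrence counts with its segment's weight."""
    totals = Counter()
    for name, text in segments.items():
        weight = SEGMENT_WEIGHTS.get(name, 1.0)
        for term in text.lower().split():
            totals[term] += weight
    vec = [totals[t] for t in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        dists = [max(math.dist(x, c), eps) for c in centers]
        memberships.append(
            [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: mean of the points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
        )
    return centers

def fcm(points, initial_centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)
```

A usage example might build one vector per document with `weighted_term_vector`, pass two of them as `initial_centers`, and read each row of the returned membership matrix as the degree to which a document belongs to each cluster. The fixed iteration count keeps the sketch short; a convergence test on the change in centers would be a natural refinement.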

       
