Abstract:
This paper proposes a multilayer vector space model. The model partitions a document into a number of text paragraphs and assigns each paragraph's text a weight according to the paragraph's position in the document. A simple and effective fuzzy clustering approach is presented. A three-layer hierarchical clustering neural network is developed to cluster Web documents into predefined categories or topics. The fuzzy clustering approach differs from existing clustering-based methods in two respects. First, a fuzzy competitive neural network is used as a data pre-processor to extract a number of subclusters from the Web documents; these subclusters can be viewed as an initial fuzzy clustering. Second, starting from this initial fuzzy clustering, a fuzzy C-means (FCM) clustering algorithm is used to determine the optimal number of fuzzy clusters. The experimental results show that Web documents focusing on the same subject are clustered together fairly completely and accurately.
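For readers unfamiliar with the FCM step named in the abstract, the following is a minimal sketch of the textbook fuzzy C-means iteration, which alternates a weighted center update with a membership update. It is not the paper's implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialisation, the stopping tolerance, and the toy data are all assumptions made for illustration. The sketch takes the number of clusters `c` as given, and does not show the competitive-network pre-processing or the selection of the optimal cluster count.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means (illustrative). Returns (centers, memberships).

    points: list of equal-length numeric vectors (e.g. document term vectors).
    c: number of clusters, assumed to be given here.
    m: fuzzifier, must be > 1; m = 2 is a common default.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from the distances to the new centers.
        largest_change = 0.0
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            zeros = [k for k in range(c) if dist[k] == 0.0]
            if zeros:
                # The point coincides with a center: share membership there.
                new_row = [1.0 / len(zeros) if k in zeros else 0.0
                           for k in range(c)]
            else:
                new_row = [
                    1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                    for k in range(c)
                ]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            largest_change = max(largest_change, change)
            u[i] = new_row

        # Stop once no membership moved by more than the tolerance.
        if largest_change < tol:
            break

    return centers, u


# Toy 2-D vectors standing in for document feature vectors (made-up data).
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, memberships = fuzzy_c_means(docs, c=2)
# Hard label per document: the cluster with the largest membership.
hard_labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1 and gives the degree to which a document belongs to each cluster, which is what distinguishes a fuzzy clustering from a hard partition.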