Abstract:
Based on the characteristics of XML Web pages, an efficient method is presented for computing XML document similarity, as well as the position weight and frequency of keywords in documents. Features are then selected from XML documents using this method, and a multi-class classification algorithm for XML Web pages is proposed based on support vector machines. In this algorithm, a classifier feature kernel (CFK) of common similarity features is built from the sample set of each XML document class. The class label of a test XML document is determined by computing the similarity distance between that document and each CFK. Experimental results demonstrate that the classification algorithm is effective and performs well on multi-class classification of XML documents.
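
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. The depth-based position weight (`depth_decay`), the rule that a CFK keeps keywords shared by the samples of a class (`min_share`), and the use of cosine similarity are all illustrative choices, not the paper's definitions. The sketch also replaces the support vector machine step with a plain nearest-CFK comparison; all function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def weighted_keywords(xml_text, depth_decay=0.5):
    """Sum a position weight per keyword occurrence.

    Assumption: a word at element depth d contributes depth_decay ** d,
    so text nearer the root counts more. Repeated words accumulate,
    which folds keyword frequency into the same score.
    """
    weights = Counter()

    def walk(elem, depth):
        w = depth_decay ** depth
        for word in (elem.text or "").lower().split():
            weights[word] += w
        for child in elem:
            walk(child, depth + 1)
            # Text following a child belongs to the current element.
            for word in (child.tail or "").lower().split():
                weights[word] += w

    walk(ET.fromstring(xml_text), 0)
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_cfk(samples, min_share=1.0):
    """Build one class profile (a stand-in for the CFK).

    Assumption: a keyword is a "common feature" if it occurs in at least
    min_share of the class samples; its weight is the mean over the
    samples that contain it.
    """
    vectors = [weighted_keywords(s) for s in samples]
    cfk = {}
    for word in set().union(*vectors):
        present = [v[word] for v in vectors if word in v]
        if len(present) / len(vectors) >= min_share:
            cfk[word] = sum(present) / len(present)
    return cfk


def classify(xml_text, cfks):
    """Return the label whose class profile is most similar to the document."""
    doc = weighted_keywords(xml_text)
    return max(cfks, key=lambda label: cosine(doc, cfks[label]))


# Toy training data, invented purely for illustration.
train = {
    "sports": [
        "<page><title>football match</title><body>the team won the match</body></page>",
        "<page><title>football league</title><body>match report for the team</body></page>",
    ],
    "finance": [
        "<page><title>stock market</title><body>shares rose on the market</body></page>",
        "<page><title>market news</title><body>stock prices and shares</body></page>",
    ],
}
cfks = {label: build_cfk(docs) for label, docs in train.items()}
test_doc = "<page><title>football</title><body>the match today</body></page>"
print(classify(test_doc, cfks))  # prints "sports"
```

In this toy run the test page shares keywords only with the sports profile, so its similarity to the finance profile is zero. A fuller implementation would feed the selected features to a multi-class support vector machine, as the abstract describes.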