Citation: Guan Renchu, Pei Zhili, Shi Xiaohu, Yang Chen, Liang Yanchun. Weight Affinity Propagation and Its Application to Text Clustering[J]. Journal of Computer Research and Development, 2010, 47(10): 1733-1740.

    Weight Affinity Propagation and Its Application to Text Clustering

    • Affinity propagation (AP) is a recently developed and effective clustering algorithm. Owing to its simplicity, general applicability, and good performance, AP has been applied in many data mining research fields. In AP, the similarity measurement plays a central role. Conventionally, text mining is based on the full vector space model (VSM), and its similarity measurements are typically defined in Euclidean space. Clustering texts in this way is simple and easy to implement; however, as the data scale grows, the vector space becomes high-dimensional and sparse, and the computational complexity grows exponentially. To overcome this difficulty, a non-Euclidean similarity measurement is proposed based on the definitions of the similar feature set (SFS), rejective feature set (RFS), and arbitral feature set (AFS). The new similarity measurement not only removes the Euclidean-space constraint but also captures the structural information of documents. A novel clustering algorithm, named weight affinity propagation (WAP), is then developed by combining the new similarity measurement with AP. The benchmark Reuters-21578 dataset is used to evaluate the proposed algorithm. Experimental results show that the proposed method is superior to classical k-means, traditional SOFM, and affinity propagation with the classic similarity measurement.
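    The abstract does not give the exact form of the SFS/RFS/AFS-weighted similarity or the WAP weighting scheme, so the sketch below is only an illustration of the idea: a hypothetical set-based document similarity (set_based_similarity, with made-up weights w_sfs and w_rfs and no explicit AFS term) fed into the standard affinity propagation message passing of Frey and Dueck. It is a minimal sketch under those assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def set_based_similarity(doc_a, doc_b, w_sfs=1.0, w_rfs=0.5):
        """Illustrative similarity from shared vs. conflicting feature sets (hypothetical weights)."""
        a, b = set(doc_a), set(doc_b)
        sfs = len(a & b)   # terms both documents contain (similar feature set)
        rfs = len(a ^ b)   # terms appearing in only one document (rejective feature set)
        return (w_sfs * sfs - w_rfs * rfs) / max(len(a | b), 1)

    def affinity_propagation(S, damping=0.9, max_iter=200):
        """Standard AP message passing (Frey & Dueck, 2007) on a similarity matrix S."""
        n = S.shape[0]
        R = np.zeros((n, n))  # responsibilities r(i, k)
        A = np.zeros((n, n))  # availabilities a(i, k)
        for _ in range(max_iter):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            AS = A + S
            idx = np.argmax(AS, axis=1)
            first = AS[np.arange(n), idx]
            AS[np.arange(n), idx] = -np.inf
            second = AS.max(axis=1)
            R_new = S - first[:, None]
            R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
            R = damping * R + (1 - damping) * R_new
            # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
            # a(k,k) = sum_{i' != k} max(0, r(i',k))
            Rp = np.maximum(R, 0)
            np.fill_diagonal(Rp, R.diagonal())
            A_new = Rp.sum(axis=0)[None, :] - Rp
            diag = A_new.diagonal().copy()
            A_new = np.minimum(A_new, 0)
            np.fill_diagonal(A_new, diag)
            A = damping * A + (1 - damping) * A_new
        exemplars = np.flatnonzero(np.diag(R + A) > 0)
        if len(exemplars) == 0:
            return exemplars, np.zeros(n, dtype=int)
        labels = np.argmax(S[:, exemplars], axis=1)
        labels[exemplars] = np.arange(len(exemplars))
        return exemplars, labels

    # Toy usage: token sets standing in for documents
    docs = [{"oil", "price", "opec"}, {"oil", "crude", "price"}, {"wheat", "grain", "export"}]
    S = np.array([[set_based_similarity(a, b) for b in docs] for a in docs])
    np.fill_diagonal(S, np.median(S))  # shared preference set to the median similarity, a common AP default
    exemplars, labels = affinity_propagation(S)
    print("exemplars:", exemplars, "labels:", labels)
    ```

    The median-similarity preference and the damping factor above are common defaults for affinity propagation; the paper may choose these parameters, and the document similarity itself, differently.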
