    An AP Clustering Algorithm of Fine-Grain Parallelism Based on Improved Attribute Reduction

    • Abstract: Affinity propagation (AP) clustering treats all data points as potential exemplars (cluster centers). It takes the pairwise similarities between data points as input and forms clusters through iterative message passing. Compared with traditional clustering methods, AP is fast and effective on large data sets. For this reason, attribute reduction is important when AP is applied to such data. Meanwhile, fine-grained parallelism is a basic strategy for achieving high performance in the design of massively parallel systems. This paper proposes IRPAP, a fine-grained parallel AP clustering algorithm based on improved attribute reduction, which introduces the idea of granularity into parallel computing. First, the granularity principle in parallel computing is analyzed. Second, the data set is preprocessed with an improved attribute reduction algorithm that computes and selects the elements of the discernibility matrix in parallel, reducing time and space complexity. Finally, the data set is clustered with a parallel AP algorithm. Throughout IRPAP, the work is divided among multiple threads that run simultaneously. Experimental results show that IRPAP is more efficient than AP for clustering large data sets.
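    The abstract does not give the details of the improved attribute reduction step, so the following is only a minimal sketch of the general idea it names: computing discernibility matrix entries in parallel and then selecting attributes from them. It uses the textbook discernibility matrix and a simple greedy selection, not the paper's improved algorithm. The function names (`discern`, `discernibility_entries`, `greedy_reduct`), the toy decision table, and the `workers` parameter are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def discern(pair, rows, decisions):
    """Return the attribute indices that distinguish one pair of objects."""
    i, j = pair
    if decisions[i] == decisions[j]:
        return frozenset()  # same class: nothing needs to be discerned
    return frozenset(
        k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
    )


def discernibility_entries(rows, decisions, workers=4):
    """Compute the distinct non-empty matrix entries with a thread pool."""
    pairs = list(combinations(range(len(rows)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        entries = pool.map(lambda p: discern(p, rows, decisions), pairs)
        return {e for e in entries if e}


def greedy_reduct(entries):
    """Pick attributes until every entry contains a chosen attribute."""
    remaining = set(entries)
    reduct = []
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Most frequent attribute first; ties go to the smaller index.
        best = min(counts, key=lambda a: (-counts[a], a))
        reduct.append(best)
        remaining = {e for e in remaining if best not in e}
    return sorted(reduct)


# Hypothetical toy decision table: three condition attributes per object.
rows = [(1, 0, 2), (1, 1, 2), (0, 0, 2), (0, 1, 2)]
decisions = [0, 1, 0, 1]

entries = discernibility_entries(rows, decisions)
reduct = greedy_reduct(entries)
print(reduct)
```

    In this toy table only attribute 1 is needed to separate the two classes, so the other two attributes could be dropped before clustering. In CPython the global interpreter lock limits the speedup that threads give to pure-Python work like this, so the thread pool here only illustrates how the pairwise computations can be split into independent fine-grained tasks.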

       
