ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2015, Vol. 52 ›› Issue (11): 2555-2567.doi: 10.7544/issn1000-1239.2015.20148151

• 人工智能 • 上一篇    下一篇

CPU-GPU异构计算环境下的并行T近邻谱聚类算法

张帅,李涛,焦晓帆,王艺峰,杨愚鲁   

  1. (南开大学计算机与控制工程学院计算机科学与信息安全系 天津 300071) (zhangshuai@mail.nankai.edu.cn)
  • 出版日期: 2015-11-01
  • 基金资助: 
    基金项目:国家自然科学青年基金项目(61212005,61201424);天津市科技特派员项目(14JCTPJC00501);高等学校博士学科点专项科研基金项目(20130031120029)

Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment

Zhang Shuai, Li Tao, Jiao Xiaofan, Wang Yifeng, Yang Yulu   

  1. (Department of Computer Science and Information Security, College of Computer and Control Engineering, Nankai University, Tianjin 300071)
  • Online: 2015-11-01

摘要: 谱聚类是数据挖掘领域最常用的聚类算法之一,但对于如何利用多核CPU与资源有限的众核加速器设计并实现一个在异构单节点上能够处理大规模数据集的高效谱聚类算法,目前尚无理想的解决方案.PSCH(parallel spectral clustering for hybrids)算法是专为CPU-GPU异构计算环境设计的并行T近邻(T-nearest-neighbors, TNN)谱聚类算法,通过分块计算相似性矩阵打破了GPU设备内存的限制,所能处理的数据集规模仅受限于CPU主存的容量.PSCH算法中使用CUDA设计实现双缓冲轮转4段流水机制,通过重叠计算与传输在打破存储瓶颈的同时保证了高计算性能.PSCH算法采用隐式重启动Lanczos方法(implicitly restarted Lanczos method, IRIM)在异构硬件上计算稀疏特征矩阵的特征分解,减轻了特征分解步骤的计算瓶颈.PSCH算法在配有一块GTX 480 GPU的单节点上能够对百万以上规模的数据集进行聚类,并对实验中的4个数据集取得了相对于使用16进程的MPI并行谱聚类PSC算法2.0~4.5倍的性能.

关键词: 谱聚类, T近邻, CPU-GPU异构计算, 计算统一设备架构, OpenMP

Abstract: Spectral clustering is one of the most popular clustering algorithms in the data mining field. However, this algorithm suffers from the storage and computational bottlenecks heavily when dealing with large-scale datasets. Current work focuses on improving the spectral clustering on both algorithm and implementation levels. But how to design an efficient spectral clustering algorithm, which can handle million scale datasets on a single node with multicore CPU and manycore accelerators, is still an unsolved problem. A parallel spectral clustering using T-nearest-neighbors (TNN) and its implementation for CPU-GPU heterogeneous computing environment, named parallel spectral clustering for hybrids (PSCH), is proposed in this paper. It breaks the GPU device memory limitation by partitioning the TNN similarity matrix into blocks, so the dataset scale only subjects to the size of the host memory. In PSCH, the 4-stage pipeline mechanism with dual rotating buffers is designed to compute the TNN similarity matrix using CUDA, which keeps all the CPU, GPU, and PCIe bus busy to achieve high performance gains while breaking the device memory limitation. The implicitly restarted Lanczos method (IRIM) on GPU is employed for the eigen-decomposition of the sparse TNN similarity matrix, alleviating the computational bottleneck of the eigensolver. The results show that PSCH is highly-efficient at exploring the GPU memory bandwidth and hybrid CPU-GPU computation power. PSCH is able to cluster million scale datasets on a single node equipped with one GTX 480 GPU and achieve 2.0~4.5 times performance gains compared with the MPI parallel spectral clustering implementation PSC using 16 processes for 4 datasets.

Key words: spectral clustering, T-nearest-neighbors (TNN), CPU-GPU heterogeneous computing, CUDA, OpenMP

中图分类号: