Abstract:
Spectral clustering is one of the most popular clustering algorithms in the data mining field. However, this algorithm suffers from the storage and computational bottlenecks heavily when dealing with large-scale datasets. Current work focuses on improving the spectral clustering on both algorithm and implementation levels. But how to design an efficient spectral clustering algorithm, which can handle million scale datasets on a single node with multicore CPU and manycore accelerators, is still an unsolved problem. A parallel spectral clustering using T-nearest-neighbors (TNN) and its implementation for CPU-GPU heterogeneous computing environment, named parallel spectral clustering for hybrids (PSCH), is proposed in this paper. It breaks the GPU device memory limitation by partitioning the TNN similarity matrix into blocks, so the dataset scale only subjects to the size of the host memory. In PSCH, the 4-stage pipeline mechanism with dual rotating buffers is designed to compute the TNN similarity matrix using CUDA, which keeps all the CPU, GPU, and PCIe bus busy to achieve high performance gains while breaking the device memory limitation. The implicitly restarted Lanczos method (IRIM) on GPU is employed for the eigen-decomposition of the sparse TNN similarity matrix, alleviating the computational bottleneck of the eigensolver. The results show that PSCH is highly-efficient at exploring the GPU memory bandwidth and hybrid CPU-GPU computation power. PSCH is able to cluster million scale datasets on a single node equipped with one GTX 480 GPU and achieve 2.0~4.5 times performance gains compared with the MPI parallel spectral clustering implementation PSC using 16 processes for 4 datasets.