Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment

Zhang Shuai; Li Tao; Jiao Xiaofan; Wang Yifeng; Yang Yulu

doi:10.7544/issn1000-1239.2015.20148151

Zhang Shuai, Li Tao, Jiao Xiaofan, Wang Yifeng, Yang Yulu. Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment[J]. Journal of Computer Research and Development, 2015, 52(11): 2555-2567. DOI: 10.7544/issn1000-1239.2015.20148151

Citation:

Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment

Graphical Abstract

Abstract

Abstract

Spectral clustering is one of the most popular clustering algorithms in the data mining field. However, this algorithm suffers from the storage and computational bottlenecks heavily when dealing with large-scale datasets. Current work focuses on improving the spectral clustering on both algorithm and implementation levels. But how to design an efficient spectral clustering algorithm, which can handle million scale datasets on a single node with multicore CPU and manycore accelerators, is still an unsolved problem. A parallel spectral clustering using T-nearest-neighbors (TNN) and its implementation for CPU-GPU heterogeneous computing environment, named parallel spectral clustering for hybrids (PSCH), is proposed in this paper. It breaks the GPU device memory limitation by partitioning the TNN similarity matrix into blocks, so the dataset scale only subjects to the size of the host memory. In PSCH, the 4-stage pipeline mechanism with dual rotating buffers is designed to compute the TNN similarity matrix using CUDA, which keeps all the CPU, GPU, and PCIe bus busy to achieve high performance gains while breaking the device memory limitation. The implicitly restarted Lanczos method (IRIM) on GPU is employed for the eigen-decomposition of the sparse TNN similarity matrix, alleviating the computational bottleneck of the eigensolver. The results show that PSCH is highly-efficient at exploring the GPU memory bandwidth and hybrid CPU-GPU computation power. PSCH is able to cluster million scale datasets on a single node equipped with one GTX 480 GPU and achieve 2.0~4.5 times performance gains compared with the MPI parallel spectral clustering implementation PSC using 16 processes for 4 datasets.

FullText(HTML)

References (0)

Supplements (0)

Cited By

Turn off MathJax

Article Contents

Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment

Abstract

Catalog

Export File

Citation

Format

Content