Abstract:
The support vector machine (SVM) is a supervised learning method widely used in statistical classification and regression analysis. Interior point method (IPM) based SVM training is notable for its low memory footprint and fast convergence. However, it still faces challenges in training speed and storage space as the size of the training dataset grows. In this paper, a hybrid parallel SVM training mechanism is proposed to alleviate these problems on a CPU-GPU heterogeneous system. Firstly, the compute-intensive operations of the IPM algorithm are implemented with the compute unified device architecture (CUDA); the IPM-based SVM training algorithm is then modified and implemented with the cuBLAS library to further improve training speed. Secondly, the modified IPM-based SVM training algorithm is implemented with a hybrid message passing interface (MPI) and CUDA programming model on a four-node cluster system, reducing both the training time and the per-node memory requirement. Finally, the limitation of GPU device memory is removed by using the page-locked host memory supported by the Fermi architecture, so that datasets larger than the GPU device memory can be trained efficiently. The results show that the hybrid parallel SVM training mechanism achieves a speedup of more than 4 times with the hybrid MPI-CUDA programming model, and overcomes the GPU device memory limitation with the page-locked host memory based data storage strategy for large-scale SVM training.
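To make the page-locked host memory strategy concrete, the following is a minimal CUDA sketch, assuming pinned host memory mapped into the device address space via cudaHostAlloc and cudaHostGetDevicePointer so that a kernel can read data resident in host RAM (zero-copy) rather than in device memory. The kernel, buffer names, and sizes are hypothetical illustrations, not the authors' implementation.

```cuda
// Sketch: stage a training buffer in page-locked (pinned) host memory
// and let a kernel access it directly, without a device-memory copy.
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: reads the mapped host buffer directly (zero-copy).
__global__ void scale_rows(const float *data, float *out, int n, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = alpha * data[i];
}

int main(void)
{
    const int n = 1 << 24;          // illustrative buffer size
    float *h_data, *d_data, *d_out;

    // Must be set before the CUDA context is created: allow mapping of
    // page-locked host allocations into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped allocation: lives in host RAM, not device memory,
    // so its size is not bounded by the GPU's device memory.
    cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i)
        h_data[i] = (float)i;

    // Device-side alias of the pinned host buffer.
    cudaHostGetDevicePointer((void **)&d_data, h_data, 0);
    cudaMalloc((void **)&d_out, n * sizeof(float));

    scale_rows<<<(n + 255) / 256, 256>>>(d_data, d_out, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    cudaFreeHost(h_data);
    return 0;
}
```

Zero-copy access trades device-memory capacity limits for PCIe bandwidth, which is why it suits training sets that exceed what the GPU memory allows.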