Parallel Support Vector Machine Training with Hybrid Programming Model

Li Tao (李涛); Liu Xuechen (刘学臣); Zhang Shuai (张帅); Wang Kai (王恺); Yang Yulu (杨愚鲁)

doi:10.7544/issn1000-1239.2015.20131492

Abstract

Support vector machine (SVM) is a supervised learning method widely used for statistical classification and regression analysis. SVM training based on the interior point method (IPM) has a small memory footprint and converges quickly, but as training datasets grow it still faces challenges in both processing speed and storage space. To address this, a hybrid parallel mechanism is proposed for large-scale SVM training on CPU-GPU heterogeneous systems. First, the compute-intensive parts of the IPM-based SVM training algorithm are parallelized with compute unified device architecture (CUDA), and the algorithm is modified so that it can be implemented with the cuBLAS linear algebra library, which further improves training speed. Second, the CUDA-accelerated algorithm is distributed across a four-node cluster system using message passing interface (MPI); distributed storage increases the dataset size that can be handled and reduces training time. Finally, page-locked host memory, as supported by the Fermi architecture, removes the limit that GPU device memory capacity imposes on dataset size, so datasets larger than the GPU memory allows can be trained efficiently. The results show that the MPI and CUDA hybrid programming model achieves a speedup of more than 4 times, and that the page-locked host memory data storage strategy enables efficient parallel SVM training on large-scale datasets on CPU-GPU heterogeneous systems, improving the computational performance and applicability of SVM in big data processing.
