
    Optimization and Parallelization of the Single-Particle Cryo-EM Software RELION on GPUs


       

      Abstract: Single-particle cryo-electron microscopy (cryo-EM) is one of the most important methods for determining macromolecular structures. RELION (regularized likelihood optimization) is an open-source program for the refinement of macromolecular structures by single-particle analysis of cryo-EM data. Owing to its ease of use and high-quality results, RELION has attracted considerable attention from researchers. However, its computational demands are too great for some large molecular structures to be solved on CPUs alone, which hampers the wider adoption of RELION. In this paper, we characterize the algorithms of RELION and parallelize them on GPUs. First, the mathematical theory, computational patterns, and performance bottlenecks of RELION are analyzed comprehensively. Then, we optimize the program for fine-grained many-core architectures such as GPUs, proposing an efficient multi-level parallel model to exploit the computational power of many-core processors. To achieve high performance, we reorganize the data structures for contiguous GPU memory access. To avoid the limitation of GPU memory capacity, we implement an adaptive parallel framework. The experimental results show that the proposed GPU-based implementation achieves good performance: compared with the CPU implementation, the speedup of the whole application is more than 36x, while the speedup of the compute-intensive algorithms exceeds 75x. Moreover, testing on multiple GPUs shows that the GPU-based implementation has good scalability.
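The data-structure reorganization mentioned in the abstract can be illustrated with a minimal, language-agnostic sketch (the field names below are hypothetical, not RELION's actual structures): converting an array-of-structures layout into a structure-of-arrays layout, so that consecutive threads read consecutive addresses, which is the access pattern GPU memory coalescing rewards.

```python
# Sketch of the AoS -> SoA reorganization that enables contiguous
# (coalesced) GPU memory access. Field names are hypothetical.

def aos_to_soa(particles):
    """Convert a list of per-particle records (array of structures)
    into one dict of flat lists (structure of arrays)."""
    if not particles:
        return {}
    return {key: [p[key] for p in particles] for key in particles[0]}

# Array-of-structures: the fields of one particle are adjacent, so a
# GPU thread i reading particles[i]["weight"] strides over whole records.
particles = [
    {"weight": 0.9, "offset_x": 1.0, "offset_y": -2.0},
    {"weight": 0.5, "offset_x": 0.0, "offset_y": 3.0},
    {"weight": 0.7, "offset_x": -1.0, "offset_y": 0.5},
]

soa = aos_to_soa(particles)
# Structure-of-arrays: all weights are now contiguous, so consecutive
# threads read consecutive addresses in one coalesced transaction.
print(soa["weight"])  # [0.9, 0.5, 0.7]
```

In a real GPU port the flat per-field arrays would be copied to device memory as-is; the Python dict here only stands in for that layout.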

       
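One common way to realize an adaptive framework of the kind the abstract describes is to split the workload into batches sized to the free device memory, so the computation never exceeds GPU capacity. The sketch below uses assumed sizes and a stand-in kernel; it is not RELION's actual scheme.

```python
# Sketch of adaptive batching under a GPU memory limit
# (assumed sizes and a stand-in kernel; not RELION's actual framework).

def batch_size(n_items, bytes_per_item, free_mem_bytes):
    """Largest number of items whose buffers fit in free device memory."""
    per_batch = max(1, free_mem_bytes // bytes_per_item)
    return min(n_items, per_batch)

def process_in_batches(items, bytes_per_item, free_mem_bytes, kernel):
    """Run `kernel` over `items` in memory-sized batches."""
    results = []
    step = batch_size(len(items), bytes_per_item, free_mem_bytes)
    for start in range(0, len(items), step):
        batch = items[start:start + step]
        # In the real setting each iteration is a host-to-device copy,
        # a GPU kernel launch, and a device-to-host copy of the results.
        results.extend(kernel(batch))
    return results

# Example: 10 "images" of 4 MB each with 16 MB of free GPU memory
# -> batches of 4 images, i.e. three launches (4 + 4 + 2).
out = process_in_batches(
    items=list(range(10)),
    bytes_per_item=4 << 20,
    free_mem_bytes=16 << 20,
    kernel=lambda batch: [x * 2 for x in batch],
)
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Sizing batches from the queried free memory (rather than a fixed constant) is what makes the framework adaptive: the same code runs on GPUs with different memory capacities, merely changing the number of kernel launches.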

