ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2018, Vol. 55, Issue (7): 1525-1538. doi: 10.7544/issn1000-1239.2018.20170080

• Information Processing •

CUDA-TP: A GPU-Based Parallel Algorithm for Top-Down Intact Protein Identification

Duan Qiong1, Tian Bo1, Chen Zheng1, Wang Jie1,2, He Zengyou1,2

  1. 1(School of Software, Dalian University of Technology, Dalian, Liaoning 116620); 2(Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province (Dalian University of Technology), Dalian, Liaoning 116620) (wangjie1003@163.com)
  • Published: 2018-07-01
  • Online: 2018-07-01
  • Supported by:
    the National Natural Science Foundation of China (61572094); the Fundamental Research Funds for the Central Universities (DUT14QY07)

Abstract: Identifying proteins and their post-translational modifications (PTMs) is fundamental to proteomics research. Recent advances in mass spectrometry (MS) instrumentation have made it possible to acquire high-resolution top-down (TD) mass spectra of intact proteins. Existing algorithms for identifying intact proteins from TD MS data achieve good protein-spectrum matching precision and PTM site localization, but their running times leave considerable room for improvement. A graphics processing unit (GPU) can parallelize large-scale repetitive computations and thereby speed up serial programs. This paper proposes CUDA-TP, an algorithm built on the compute unified device architecture (CUDA) that computes alignment scores between proteins and TD mass spectra. First, for each mass spectrum, CUDA-TP uses an optimized MS-Filter algorithm to discard database proteins that cannot attain a high score, leaving only a small set of candidate proteins. It then uses an AVL (Adelson-Velskii and Landis) tree to speed up protein-spectrum matching. GPU multithreading is used to compute in parallel the predecessor nodes (previous diagonal points) of all elements in the spectral grid, which is built from the mass spectrum and the protein, and in the final array. The algorithm also uses a target-decoy strategy to control the false discovery rate (FDR) of protein-spectrum matches. Experimental results show that CUDA-TP substantially accelerates intact protein identification: it runs about 10 times faster than MS-TopDown and about 2 times faster than MS-Align+. To our knowledge, this is so far the only work that uses the CUDA architecture to accelerate intact protein identification from top-down spectra. The source code is available at https://github.com/dqiong/CUDA-TP.

Key words: top-down proteomics, protein identification, graphics processing unit (GPU), compute unified device architecture (CUDA), spectral alignment
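
The target-decoy strategy named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the CUDA-TP source code: the function names, the `(score, is_decoy)` representation of a protein-spectrum match, and the estimator FDR = (decoy matches) / (target matches) at a score threshold are all assumptions made for illustration, and the paper may define the estimate differently.

```python
from typing import List, Optional, Tuple

# A match is (score, is_decoy): is_decoy is True when the best-scoring
# protein for a spectrum came from the decoy database (hypothetical layout).
Match = Tuple[float, bool]


def fdr_at_threshold(matches: List[Match], threshold: float) -> float:
    """Estimate FDR among matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target matches accepted: the estimate is 0 only if no decoys passed.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold_within_fdr(matches: List[Match],
                                max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Walk candidate thresholds from strictest to most permissive.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            best = threshold
    return best


# Made-up scores, for illustration only.
example_matches: List[Match] = [
    (95, False), (90, False), (88, True), (80, False),
    (75, False), (70, True), (60, False),
]
cutoff = lowest_threshold_within_fdr(example_matches, max_fdr=0.25)
```

With a cutoff chosen this way, only target matches scoring at or above it would be reported. Counting decoys against targets is one common convention; other estimators normalize differently.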

CLC number: