Abstract:
Identifying proteins and their post-translational modifications (PTMs) is critical to the success of proteomics. Recent advances in mass spectrometry (MS) instrumentation have made it possible to generate high-resolution mass spectra of intact proteins. Existing algorithms for identifying proteins from top-down MS data achieve good protein-spectrum matching precision and good prediction accuracy for PTM locations, but their running times remain far from satisfactory. Graphics processing units (GPUs) can parallelize large-scale repetitive computations and reduce the running time of serial programs. Based on the compute unified device architecture (CUDA), this paper proposes an algorithm called CUDA-TP for computing alignment scores between proteins and mass spectra. First, CUDA-TP uses an optimized MS-Filter algorithm to quickly filter out database proteins that cannot attain a high score for a given mass spectrum, so that only a small number of candidate proteins remain. Then, an AVL tree is introduced to speed up the computation of protein-spectrum matching. GPU multithreading is applied to compute the previous diagonal points of all nodes in the spectra grid created from the mass spectra and proteins, as well as the final array. In addition, the algorithm uses a target-decoy approach to control the false discovery rate (FDR) of protein-spectrum matches. Experimental results demonstrate that CUDA-TP significantly accelerates protein identification: it runs about 10 times faster than MS-TopDown and about 2 times faster than MS-Align+. To our knowledge, no existing method in the literature performs protein identification from top-down spectra using the CUDA architecture. The source code of the algorithm is available at https://github.com/dqiong/CUDA-TP.
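
As an illustration of the target-decoy idea mentioned above, the following minimal sketch estimates the FDR at a score threshold as the ratio of decoy matches to target matches, which is one commonly used estimator. The abstract does not state which estimator CUDA-TP applies, so the formula, the function names `estimate_fdr` and `score_cutoff_for_fdr`, and the `(score, is_decoy)` input layout are all assumptions made for illustration and are not taken from the CUDA-TP source code.

```python
# Illustrative sketch only: not CUDA-TP's actual implementation.
# Each match is a hypothetical (score, is_decoy) pair.

def estimate_fdr(matches, threshold):
    """Estimate FDR among matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target matches: the estimate is undefined; treat it as
        # infinite if any decoy passed, otherwise as zero.
        return float("inf") if decoys else 0.0
    return decoys / targets


def score_cutoff_for_fdr(matches, max_fdr):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    Returns None when no threshold satisfies the limit.
    """
    best = None
    # Walk candidate thresholds from the highest score downwards and
    # remember the last (lowest) one that still meets the FDR limit.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if estimate_fdr(matches, threshold) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Made-up scores, purely for demonstration.
    demo = [(10, False), (9, False), (8, True),
            (7, False), (6, True), (5, True)]
    print(estimate_fdr(demo, 7))
    print(score_cutoff_for_fdr(demo, 0.0))
```

In this sketch, matches scoring at or above the returned cutoff would be reported, and the remainder would be discarded as insufficiently reliable.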