pepReap: A Peptide Identification Algorithm Using Support Vector Machines

Wang Haipeng; Fu Yan; Sun Ruixiang; He Simin; Zeng Rong; Gao Wen

Abstract: Peptide and protein identification by biological mass spectrometry is a key problem in proteomics. This paper presents pepReap, a peptide identification algorithm based on support vector machines (SVM). The algorithm uses a two-layer scoring scheme. Coarse scoring uses the total intensity and number of matched peaks, together with peptide length, to produce a list of candidate peptide sequences. Fine scoring then applies an SVM that combines several additional match features, such as correlations between ions, average ion match error, and peptide sequence information, to evaluate the coarse-scoring results and yield more reliable peptide identifications. During SVM parameter selection, the Matthews correlation coefficient is used to measure classification performance so as to accommodate unbalanced datasets. Experiments on a published dataset of tandem mass spectra show that, compared with the popular commercial software SEQUEST, which uses threshold-based evaluation, pepReap achieves higher identification sensitivity at comparable precision.
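
The choice of the Matthews correlation coefficient (MCC) for unbalanced data can be illustrated with a minimal sketch. It uses the standard textbook definition of MCC computed from confusion-matrix counts. The function names and all counts below are hypothetical and are not taken from the paper or its dataset.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC computed from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when a marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all candidates that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical unbalanced set: 50 correct identifications among 1000 candidates.
# A classifier that rejects every candidate still reaches high accuracy.
reject_all = dict(tp=0, tn=950, fp=0, fn=50)
# A classifier that actually recovers most of the correct identifications.
useful = dict(tp=40, tn=930, fp=20, fn=10)

for name, counts in (("reject_all", reject_all), ("useful", useful)):
    print(name, round(accuracy(**counts), 3), round(matthews_corrcoef(**counts), 3))
```

With these made-up counts, accuracy barely separates the two classifiers, while MCC gives the reject-everything classifier a score of zero. This is the kind of behavior that makes MCC a reasonable selection criterion when incorrect candidates far outnumber correct ones.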
