ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2017, Vol. 54, Issue (1): 1-19. doi: 10.7544/issn1000-1239.2017.20151076

• Review •

A Survey of Algorithms for Predicting Long-Range Residue Interactions in Proteins

张海仓1,2,高玉娟3,邓明华3,4,5,郑伟谋6,卜东波1   

  1. 1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(Centre for Quantitative Biology, Peking University, Beijing 100871); 4(School of Mathematical Sciences, Peking University, Beijing 100871); 5(Center for Statistical Sciences, Peking University, Beijing 100871); 6(Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190) (zhanghaicang@ict.ac.cn)
  • Published: 2017-01-01
  • Supported by: 
    This work was supported by the National Basic Research Program of China (973 Program) (2012CB316502, 2015CB910303), the National Natural Science Foundation of China (11175224, 11121403, 31270834, 61272318, 31171262, 31428012, 31471246), and the Open Project Program of the State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences (Y4KF171CJ1).

A Survey on Algorithms for Protein Contact Prediction

Zhang Haicang1,2, Gao Yujuan3, Deng Minghua3,4,5, Zheng Weimou6, Bu Dongbo1   

  1. 1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(Centre for Quantitative Biology, Peking University, Beijing 100871); 4(School of Mathematical Sciences, Peking University, Beijing 100871); 5(Center for Statistical Sciences, Peking University, Beijing 100871); 6(Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190)
  • Online: 2017-01-01

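Abstract: Proteins are long chains formed by sequentially linking amino acid residues. In its native state, a protein is not a random, unconstrained chain; it spontaneously folds into a specific spatial structure in order to perform its specific biological function. The main factors driving this folding are non-covalent interactions between residues, including hydrophobic interactions, electrostatic interactions, and van der Waals forces. Accurate prediction of long-range interactions between residues therefore aids the prediction of protein spatial structure and, in turn, the understanding of protein biological function. During protein evolution, interacting residue pairs exhibit a "co-evolution" pattern: when one residue mutates, the residue it interacts with must mutate correspondingly in order to preserve the interaction and thereby the overall spatial structure and biological function. Based on this biological observation, researchers have developed a number of statistical models and algorithms to predict interactions between residue pairs. This survey 1) outlines the two basic classes of prediction algorithms for long-range residue interactions, namely unsupervised learning methods and supervised learning methods; 2) uses results from the CASP protein structure prediction competition to compare the performance of these algorithms objectively and to analyze the characteristics and strengths of each; and 3) summarizes future trends from two perspectives, biological observations and statistical models.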

Keywords: residue long-range interaction prediction, protein tertiary structure prediction, graphical model, co-evolution, machine learning

Abstract: Proteins are large molecules consisting of a linear sequence of amino acids. In its natural environment, a protein spontaneously folds into a specific tertiary structure to perform its biological function. The main factors that drive proteins to fold are interactions between residues, including hydrophobic interactions, van der Waals forces, and electrostatic interactions. These interactions usually lead to residue-residue contacts, so predicting such contacts should greatly facilitate the understanding of protein structures and functions. A great variety of techniques have been proposed for residue-residue contact prediction, including machine learning, statistical models, and linear programming. Most of these techniques are based on the biological insight of co-evolution, i.e., during the evolutionary history of proteins, a mutation at one residue usually leads its contacting partner to mutate accordingly. In this review, we summarize the state-of-the-art algorithms in this field, with emphasis on the construction of statistical models based on biological insights. We also present an evaluation of these algorithms using CASP (Critical Assessment of Techniques for Protein Structure Prediction) targets as well as popular benchmark datasets, and describe trends in the field of protein contact prediction.

Key words: protein contact prediction, protein tertiary structure prediction, graphical model, co-evolution, machine learning
