ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (8): 1674-1682. doi: 10.7544/issn1000-1239.2018.20180361

Special topic: 2018 Special Topic on Frontier Advances in Data Mining

• Artificial Intelligence •



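  1. 1(School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023); 2(School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003)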
  • Publication date: 2018-08-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (81771478, 61571233), the Key University Science Research Project of Jiangsu Province (17KJA510003), and the Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY218092).

Predicting Biological Functions of G Protein-Coupled Receptors Based on Fast Multi-Instance Multi-Label Learning

Wu Jiansheng1, Feng Qiaoyu2, Yuan Jingzhou1, Hu Haifeng2, Zhou Jiate1, Gao Hao1

  1. 1(School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023);2(School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003)
  • Online: 2018-08-01

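Abstract (Chinese): G protein-coupled receptors (GPCRs) are the largest family of membrane proteins in humans and important targets of many drugs. Accurate knowledge of the biological functions of GPCRs is key to understanding the biological processes they take part in and the mechanisms by which drugs act on them. Previous studies have shown that protein function prediction can be formulated as a multi-instance multi-label learning (MIML) problem. This paper designs a model for predicting the biological functions of GPCRs based on the fast multi-instance multi-label learning method MIMLfast. The model uses a new hybrid feature that takes into account several kinds of information about GPCR domains: amino acid triplets, amino acid correlation, evolution, secondary structure correlation, signal peptides, and disordered residues. Experimental results show that the model performs well and outperforms the current best multi-instance multi-label learning methods, multi-label learning prediction methods, and CAFA protein function prediction methods.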

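Key words (Chinese): G protein-coupled receptors, biological function prediction, fast multi-instance multi-label learning, domains, hybrid feature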

Abstract: G protein-coupled receptors (GPCRs) constitute the largest family of human membrane proteins and are important targets of many drugs on the market. Accurate annotation of the biological functions of GPCRs is key to understanding the biological processes they are involved in and their drug-action mechanisms. In our previous work, we found that the protein function prediction problem can be formulated as a multi-instance multi-label learning (MIML) task. In this paper, we propose a novel method for predicting the biological functions of GPCRs that uses a fast MIML algorithm called MIMLfast together with a hybrid feature. The hybrid feature combines amino acid triplet information, amino acid correlation information, evolutionary information, secondary structure correlation information, signal peptide information, disordered residue information, and physicochemical properties of GPCR domains. The experimental results show that our method achieves good performance, superior to state-of-the-art multi-instance multi-label learning methods, multi-label learning methods, and CAFA protein function prediction methods.

Key words: G protein-coupled receptors (GPCRs), predicting biological functions, fast multi-instance multi-label learning, domains, hybrid feature
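
The MIML formulation mentioned in the abstract can be illustrated with a minimal sketch: each protein is a bag of instances (here, one instance per domain) and carries a set of function labels. The sketch below is an assumption-laden illustration, not the paper's implementation. The names `triplet_frequencies`, `MimlExample`, and `make_example` are hypothetical, the toy sequences and label names are made up, and plain overlapping 3-mer frequencies stand in for the amino acid triplet component of the hybrid feature, whose exact definition is not given on this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


def triplet_frequencies(domain_seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping amino acid triplet in one domain.

    Assumption: simple 3-mer counting; the paper's triplet feature may differ.
    """
    triplets = [domain_seq[i:i + 3] for i in range(len(domain_seq) - 2)]
    total = len(triplets)
    if total == 0:
        return {}  # sequences shorter than 3 residues yield no triplets
    counts = Counter(triplets)
    return {t: c / total for t, c in counts.items()}


@dataclass
class MimlExample:
    """One protein in MIML form: a bag of instances plus a label set."""
    protein_id: str
    instances: List[Dict[str, float]]  # one sparse feature dict per domain
    labels: Set[str]                   # biological function labels


def make_example(protein_id: str, domain_seqs: List[str],
                 labels: List[str]) -> MimlExample:
    """Build a bag with one instance per domain sequence."""
    instances = [triplet_frequencies(s) for s in domain_seqs]
    return MimlExample(protein_id, instances, set(labels))


# Toy data: two made-up domain sequences and two placeholder labels.
example = make_example("toy_gpcr", ["MKTAYIAK", "GLLAGLLA"],
                       ["function_A", "function_B"])
```

A MIML learner such as MIMLfast would consume many such bags and predict a label set for each unseen bag; the learner itself is not sketched here.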