ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (8): 1753-1765. doi: 10.7544/issn1000-1239.2016.20160196

Special topic: Frontier Technologies in Data Mining, 2016

• Artificial Intelligence •

Protein Function Prediction Using Positive and Negative Examples

Fu Guangyuan1, Yu Guoxian1, Wang Jun1, Guo Maozu2

  1(College of Computer and Information Science, Southwest University, Chongqing 400715); 2(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001) (gxyu@swu.edu.cn)
  • Published: 2016-08-01
  • Online: 2016-08-01
  • Supported by:
    the National Natural Science Foundation of China (61402378, 61571163, 61532014); the Chongqing Basic and Frontier Research Project (cstc2014jcyjA40031, cstc2016jcyjA0351); the Chongqing Graduate Student Research Innovation Project (CYS16070); the Fundamental Research Funds for the Central Universities (2362015XK07, XDJK2016B009, XDJK2016D021)

Abstract: Protein function prediction is one of the key challenges of the post-genomic era. Protein functional annotation databases mainly record positive examples (proteins known to carry out a given function) and rarely record negative examples (proteins known not to carry out a given function). Current computational models focus almost exclusively on positive examples and seldom exploit the scarce but informative negative examples. It is well recognized that both positive and negative examples should be used to obtain a discriminative predictor. Motivated by this, we propose a protein function prediction approach using positive and negative examples (ProPN). ProPN first constructs a directed signed hybrid graph that describes the known positive and negative protein-function associations, the interactions between proteins, and the correlations between functions; it then applies label propagation on this graph to predict protein functions. Experiments on yeast, mouse, and human protein datasets show that ProPN outperforms state-of-the-art algorithms in predicting negative examples for proteins whose functional annotations are partially known, and also achieves higher accuracy than related approaches in predicting functions of proteins whose functional annotations are completely unknown.

Key words: protein function prediction, positive examples, negative examples, signed hybrid graph, label propagation

CLC Number:
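
The abstract names the two ingredients of ProPN (a signed graph over proteins and functions, and label propagation on it) but does not give the update rule. The following is a minimal sketch of generic label propagation with signed seed labels, not the authors' algorithm. The function names, the damping parameter `alpha`, the fixed iteration count, and the toy data are all assumptions made for illustration. Seed labels are +1 for a known positive example, -1 for a known negative example, and 0 for unknown.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate_signed_labels(ppi, func_corr, seeds, alpha=0.5, iterations=50):
    """Generic signed label propagation (illustrative, not ProPN itself).

    ppi:       protein-protein interaction weights (n x n, non-negative)
    func_corr: function-function correlation weights (m x m, non-negative)
    seeds:     n x m matrix with +1 (positive), -1 (negative), 0 (unknown)
    """
    s = row_normalize(ppi)
    c = row_normalize(func_corr)
    scores = [[float(v) for v in row] for row in seeds]
    for _ in range(iterations):
        # Spread scores across interacting proteins and correlated functions.
        spread = matmul(matmul(s, scores), c)
        # Blend the propagated scores with the original signed seeds.
        scores = [
            [alpha * spread[i][j] + (1 - alpha) * seeds[i][j]
             for j in range(len(seeds[0]))]
            for i in range(len(seeds))
        ]
    return scores


# Toy data (hypothetical): three proteins, two functions.
ppi = [[0.0, 0.9, 0.0],
       [0.9, 0.0, 0.1],
       [0.0, 0.1, 0.0]]
func_corr = [[1.0, 0.2],
             [0.2, 1.0]]
seeds = [[1, 0],    # protein 0: known positive for function 0
         [0, 0],    # protein 1: nothing known
         [-1, 0]]   # protein 2: known negative for function 0

scores = propagate_signed_labels(ppi, func_corr, seeds)
for protein, row in enumerate(scores):
    print(protein, [round(v, 3) for v in row])
```

In this toy setting the unannotated protein interacts far more strongly with the positive example than with the negative one, so its score for function 0 ends up positive, while the negatively labeled protein keeps a negative score. A score near zero would indicate that the graph carries little evidence either way.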