ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (5): 1020-1033. doi: 10.7544/issn1000-1239.2019.20180274

• Artificial Intelligence •

基于布尔矩阵分解的蛋白质功能预测框架

刘琳1,唐麟2,唐明靖3,周维4   

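  1(School of Information, Yunnan Normal University, Kunming 650500); 2(Key Laboratory of Educational Informatization for Nationalities (Yunnan Normal University), Ministry of Education, Kunming 650500); 3(President Office, Yunnan Normal University, Kunming 650500); 4(National Pilot School of Software, Yunnan University, Kunming 650091) (liulinrachel@163.com)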
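  • Published: 2019-05-01
  • Supported by: 
    National Natural Science Foundation of China (61862067, 61762089); Doctoral Start-up Project of Yunnan Normal University (2016zb009); Data-Driven Software Engineering Provincial Science and Technology Innovation Team Project of Yunnan University (2017HC012)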

The Framework of Protein Function Prediction Based on Boolean Matrix Decomposition

Liu Lin1, Tang Lin2, Tang Mingjing3, Zhou Wei4   

  1(School of Information, Yunnan Normal University, Kunming 650500); 2(Key Laboratory of Educational Informatization for Nationalities (Yunnan Normal University), Ministry of Education, Kunming 650500); 3(President Office, Yunnan Normal University, Kunming 650500); 4(National Pilot School of Software, Yunnan University, Kunming 650091)
  • Online: 2019-05-01

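Abstract (translated from the Chinese): Proteins are among the most important and most diverse macromolecules in cellular life, so studying protein function is of great significance for decoding the code of life. Previous studies have shown that protein function prediction is essentially a multi-label classification problem, but the huge number of functional labels poses a major challenge to applying multi-label classifiers to it. Given that protein functional labels are numerous and highly correlated, a protein function prediction framework based on Boolean matrix decomposition (PFP-BMD) is proposed. In addition, because current Boolean matrix decomposition algorithms have difficulty satisfying exact decomposition and the column use condition at the same time, an exact Boolean matrix decomposition algorithm based on label clusters is proposed; it uses the label-associated matrix to perform hierarchical extended clustering of labels, and related corollaries prove that it achieves an optimal exact Boolean matrix decomposition. Experimental results show that the proposed algorithm has a clear advantage in computational complexity, and that the prediction framework using it effectively improves the accuracy of protein function prediction, laying a foundation for the efficient application of various multi-label classifiers to protein function prediction.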

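Key words (translated from the Chinese): multi-label classification, protein function prediction, label space dimension reduction, label-associated matrix, Boolean matrix decomposition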

Abstract: Proteins are the most essential and versatile macromolecules in living cells, so research on protein function is of great significance for decoding the secrets of life. Previous studies have shown that protein function prediction is essentially a multi-label classification problem. However, the large number of functional annotation labels poses a major challenge to applying multi-label classifiers to protein function prediction. Exploiting the fact that protein functional labels are numerous and highly correlated, we propose a framework for protein function prediction based on Boolean matrix decomposition (PFP-BMD). In addition, because current Boolean matrix decomposition algorithms struggle to satisfy exact decomposition and the column use condition simultaneously, we propose an exact Boolean matrix decomposition algorithm based on label clusters, which performs hierarchical extended clustering of labels through the label-associated matrix. We further prove, through related corollaries, that the algorithm achieves an optimal exact Boolean matrix decomposition. Experimental results show that the proposed algorithm has a considerable advantage in computational complexity over existing algorithms. Moreover, applying it within PFP-BMD effectively improves the accuracy of protein function prediction, and reducing and then restoring the dimensions of the protein functional label space with this algorithm lays the foundation for more efficient use of various multi-label classifiers.

Key words: multi-label classification, protein function prediction, label space dimension reduction, label-associated matrix, Boolean matrix decomposition
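
For readers unfamiliar with the terms in the abstract, the following Python sketch illustrates what an exact Boolean matrix decomposition under the column use condition looks like on a toy protein-by-label matrix. It is not the label-cluster algorithm proposed in the paper, whose details are not given on this page; it is a naive illustration in which a label column is dropped when it equals the Boolean OR of strictly smaller label columns. The function names `boolean_product` and `column_use_decomposition` and the toy matrix `Y` are hypothetical and chosen only for this example.

```python
import numpy as np


def boolean_product(C, B):
    """Boolean matrix product: result[i, j] = OR over l of (C[i, l] AND B[l, j])."""
    return (C.astype(int) @ B.astype(int)) > 0


def column_use_decomposition(Y):
    """Naive exact decomposition Y = C o B in which every column of C is a column of Y.

    Illustration only (an assumption of this sketch, not the paper's method):
    a column is treated as redundant when it equals the OR of the columns
    that are strict subsets of it.
    """
    Y = np.asarray(Y, dtype=bool)
    m = Y.shape[1]
    # contained[i, j] is True when column i is a subset of column j.
    contained = ~np.any(Y[:, :, None] & ~Y[:, None, :], axis=0)
    # strict[i, j] is True when column i is a strict subset of column j.
    strict = contained & ~contained.T

    basis_idx, seen = [], set()
    for j in range(m):
        covered = Y[:, strict[:, j]].any(axis=1)
        if np.array_equal(covered, Y[:, j]):
            continue  # column j is an OR of smaller columns (or is all zero)
        key = Y[:, j].tobytes()
        if key not in seen:  # keep one copy of duplicated columns
            seen.add(key)
            basis_idx.append(j)

    C = Y[:, basis_idx]  # reduced label matrix (columns taken from Y)
    # B[l, j] is True when basis column l is a subset of column j of Y.
    B = ~np.any(C[:, :, None] & ~Y[:, None, :], axis=0)
    return C, B, basis_idx


# Toy data: 5 proteins (rows) and 5 labels (columns).
# Label 2 is the OR of labels 0 and 1; label 4 duplicates label 0.
Y = np.array([
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
], dtype=bool)

C, B, basis_idx = column_use_decomposition(Y)
restored = boolean_product(C, B)
print("kept label columns:", basis_idx)
print("exact reconstruction:", np.array_equal(restored, Y))
```

In a framework of the kind the abstract describes, a multi-label classifier would presumably be trained on the reduced matrix `C` and its predictions mapped back to the full label space with `boolean_product`. Note that the pairwise subset test here uses memory proportional to the number of proteins times the square of the number of labels, so it is suitable only for small examples.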

CLC Number: