ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (5): 1020-1033. doi: 10.7544/issn1000-1239.2019.20180274


The Framework of Protein Function Prediction Based on Boolean Matrix Decomposition

Liu Lin1, Tang Lin2, Tang Mingjing3, Zhou Wei4   

  1(School of Information, Yunnan Normal University, Kunming 650500); 2(Key Laboratory of Educational Informatization for Nationalities (Yunnan Normal University), Ministry of Education, Kunming 650500); 3(President Office, Yunnan Normal University, Kunming 650500); 4(National Pilot School of Software, Yunnan University, Kunming 650091)
  • Online: 2019-05-01

Abstract: Proteins are the most essential and versatile macromolecules in living cells, so research on protein function is of great significance for decoding the secrets of life. Previous studies have suggested that protein function prediction is essentially a multi-label classification problem. However, the large number of protein functional annotation labels poses a major challenge to the multi-label classifiers applied to protein function prediction. To predict protein function more accurately with multi-label classifiers, we exploit the high correlation among protein functional labels and propose a framework for protein function prediction based on Boolean matrix decomposition (PFP-BMD). Because current Boolean matrix decomposition algorithms can hardly satisfy exact decomposition and the column-in condition simultaneously, we also propose an exact Boolean matrix decomposition algorithm based on label clusters, which performs hierarchical extended clustering of labels using the label-associated matrix. Furthermore, we prove through a series of deductions that the algorithm achieves an optimal Boolean matrix decomposition. Experimental results show that this exact Boolean matrix decomposition algorithm has a considerable advantage over existing algorithms in reducing computational complexity. In addition, applying the proposed algorithm in PFP-BMD effectively improves the accuracy of protein function prediction. More importantly, using the algorithm to reduce and restore the dimensionality of the protein functional label space lays the foundation for more efficient classification by various multi-label classifiers.
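To make the idea of reducing and restoring a Boolean label space concrete, the following is a minimal illustrative sketch in Python. It is not the label-cluster algorithm proposed in the paper, whose details the abstract does not give. It assumes a small hypothetical protein-by-label matrix `Y` and uses a deliberately naive strategy: merge identical label columns into a reduced matrix `C`, and record in `B` which reduced column each original label maps to, so that the Boolean product of `C` and `B` restores `Y` exactly. The function names `boolean_product` and `dedupe_column_decomposition` are introduced here for illustration only.

```python
def boolean_product(C, B):
    """Boolean matrix product: out[i][j] = OR over k of (C[i][k] AND B[k][j])."""
    return [[any(C[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(C))]


def dedupe_column_decomposition(Y):
    """Naive exact decomposition Y = C o B (illustrative assumption, not the paper's method).

    C keeps one copy of each distinct label column of Y (the reduced label space);
    B[k][j] is True when original label j is represented by reduced column k.
    Assumes Y is a non-empty rectangular 0/1 matrix (rows = proteins, columns = labels).
    """
    n_labels = len(Y[0])
    columns = [tuple(bool(row[j]) for row in Y) for j in range(n_labels)]

    # Assign each distinct column pattern an index in first-seen order.
    index_of = {}
    for col in columns:
        index_of.setdefault(col, len(index_of))
    basis = list(index_of)  # dicts preserve insertion order

    C = [[basis[k][i] for k in range(len(basis))] for i in range(len(Y))]
    B = [[index_of[columns[j]] == k for j in range(n_labels)]
         for k in range(len(basis))]
    return C, B


# Hypothetical toy data: 3 proteins, 4 functional labels; labels 0 and 2 always co-occur.
Y = [[1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 0, 1, 1]]

C, B = dedupe_column_decomposition(Y)
restored = boolean_product(C, B)

assert len(C[0]) == 3                                    # 4 labels reduced to 3
assert restored == [[bool(v) for v in row] for row in Y]  # exact restoration
```

In a pipeline of the kind the abstract describes, a multi-label classifier would presumably be trained on the reduced matrix `C` instead of `Y`, and its predictions mapped back to the full label space through the Boolean product with `B`. This sketch exploits only exactly duplicated labels, whereas the paper's algorithm relies on label correlations captured by the label-associated matrix.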

Key words: multi-label classification, protein function prediction, label space dimension reduction, label-associated matrix, Boolean matrix decomposition
