Abstract:
Proteins are the most essential and versatile macromolecules in living cells, so research on protein function is of great significance for decoding the secrets of life. Previous studies have suggested that protein function prediction is essentially a multi-label classification problem. However, the large number of protein functional annotation labels poses a major challenge to the multi-label classifiers applied to protein function prediction. To achieve more accurate prediction of protein function with multi-label classifiers, we exploit the high correlation between protein functional labels and propose a framework for protein function prediction based on Boolean matrix decomposition (PFP-BMD). Because current Boolean matrix decomposition algorithms can hardly satisfy exact decomposition and the column-in condition simultaneously, we also propose an exact Boolean matrix decomposition algorithm based on label clusters, which performs hierarchical extended clustering of labels using the label-associated matrix. Furthermore, we prove through related deductions that the algorithm achieves an optimal Boolean matrix decomposition. Experimental results show that this exact Boolean matrix decomposition algorithm has a considerable advantage over existing algorithms in reducing computational complexity. In addition, applying the proposed algorithm in PFP-BMD effectively improves the accuracy of protein function prediction. More importantly, using the algorithm to reduce and restore the dimensions of the protein functional label space lays the foundation for more efficient classification by various multi-label classifiers.
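
As a toy illustration of reducing and restoring a label space with an exact Boolean decomposition, the sketch below factors a small protein-by-label matrix into a reduced label matrix and a basis matrix, then recovers the original labels with the Boolean product. It is not the label-cluster algorithm proposed here: the matrices `Y`, `C`, `B` and the helper `boolean_product` are hypothetical, and the factors are chosen by hand rather than computed.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(c: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is OR over k of (c[i][k] AND b[k][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in c
    ]


# Hypothetical data: 4 proteins (rows) x 4 functional labels (columns).
# The labels are correlated: label 2 = label 0 OR label 1, and label 3 = label 1.
Y: Matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Reduced label space: only 2 labels per protein (here, columns 0 and 1 of Y).
C: Matrix = [
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
]

# Basis matrix: maps each reduced label back to the original labels it implies.
B: Matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]

# The decomposition is exact when the Boolean product restores Y with no errors.
restored = boolean_product(C, B)
is_exact = restored == Y
```

In a setting like this, a multi-label classifier would only need to predict the two columns of `C`, and the full label vector could be restored afterwards through `B`.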