ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### Protein Function Prediction Based on Collaborative Matrix Factorization of Multi-Network Data

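1. 1(College of Computer and Information Science, Southwest University, Chongqing 400715); 2(School of Computers, Guangdong University of Technology, Guangzhou 510006) (gxyu@swu.edu.cn)
• Published: 2017-12-01
• Funding:
National Natural Science Foundation of China (61402378, 61772143); Natural Science Foundation of Chongqing (cstc2016jcyjA0351)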

### Protein Function Prediction Based on Multiple Networks Collaborative Matrix Factorization

Yu Guoxian1, Wang Keyao1, Fu Guangyuan1, Wang Jun1, Zeng An2

1. 1(College of Computer and Information Science, Southwest University, Chongqing 400715); 2(School of Computers, Guangdong University of Technology, Guangzhou 510006)
• Online: 2017-12-01

Abstract: Accurately and automatically predicting the biological functions of proteins is one of the fundamental tasks in bioinformatics and a key application of artificial intelligence in biological data analysis. The wide application of high-throughput technologies has produced various functional association networks of molecules. Integrating these networks provides a more comprehensive view of the functional mechanisms of proteins and improves the performance of protein function prediction. However, existing network-integration-based solutions cannot be applied to a large number of functional labels, ignore the correlation between labels, or cannot differentially integrate multiple networks. This paper proposes a protein function prediction approach based on multiple networks collaborative matrix factorization (ProCMF). To explore the latent relationships between proteins and between labels, ProCMF first applies nonnegative matrix factorization to factorize the protein-label association matrix into two low-rank matrices. To exploit the correlation between labels and to guide the collaborative factorization with proteomic data, it defines two smoothness terms on these two low-rank matrices. To differentially integrate the networks, ProCMF assigns different weights to them. Finally, ProCMF combines these goals into a unified objective function and introduces an alternating optimization technique to jointly optimize the low-rank matrices and the weights. Experimental results on three model species (yeast, human and mouse) with multiple functional networks show that ProCMF outperforms other related competitive methods. ProCMF can effectively and efficiently handle massive labels and differentially integrate multiple networks.
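
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes graph-Laplacian smoothness terms, a cosine-similarity label correlation matrix, network weights raised to a power `r > 1` to avoid a trivial single-network solution, and standard multiplicative updates for graph-regularized nonnegative matrix factorization. The function name `procmf_sketch` and all parameters (`lam_u`, `lam_v`, `r`, `k`) are hypothetical.

```python
import numpy as np


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix."""
    return np.diag(W.sum(axis=1)) - W


def procmf_sketch(Y, networks, k=5, lam_u=0.1, lam_v=0.1, r=2.0,
                  n_iter=100, seed=0, eps=1e-10):
    """Weighted multi-network NMF sketch (assumed objective, not the paper's).

    Minimizes  ||Y - U V^T||_F^2
             + lam_u * sum_v alpha_v^r * tr(U^T L_v U)
             + lam_v * tr(V^T L_C V)
    subject to U, V >= 0, alpha >= 0, sum(alpha) = 1.
    Y is the nonnegative protein-label matrix (n proteins x m labels);
    networks is a list of symmetric nonnegative n x n protein networks.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.random((n, k)) + eps   # low-rank protein factors
    V = rng.random((m, k)) + eps   # low-rank label factors

    # Assumption: label correlation = cosine similarity of label columns of Y.
    norms = np.linalg.norm(Y, axis=0) + eps
    C = (Y.T @ Y) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)
    D_c = np.diag(C.sum(axis=1))

    laps = [laplacian(W) for W in networks]
    alpha = np.full(len(networks), 1.0 / len(networks))  # start with equal weights
    history = []

    for _ in range(n_iter):
        # Composite protein network under the current weights.
        w = alpha ** r
        W = sum(wi * Wi for wi, Wi in zip(w, networks))
        D = np.diag(W.sum(axis=1))

        # Multiplicative updates keep U and V nonnegative.
        U *= (Y @ V + lam_u * W @ U) / (U @ (V.T @ V) + lam_u * D @ U + eps)
        V *= (Y.T @ U + lam_v * C @ V) / (V @ (U.T @ U) + lam_v * D_c @ V + eps)

        # Closed-form weight update: smoother networks get larger weights.
        s = np.array([np.trace(U.T @ L @ U) for L in laps]) + eps
        alpha = (1.0 / s) ** (1.0 / (r - 1.0))
        alpha /= alpha.sum()

        obj = (np.linalg.norm(Y - U @ V.T) ** 2
               + lam_u * float(np.sum(alpha ** r * s))
               + lam_v * float(np.trace(V.T @ (D_c - C) @ V)))
        history.append(obj)

    return U, V, alpha, history


if __name__ == "__main__":
    # Toy data only: 30 proteins, 8 labels, 3 random symmetric networks.
    rng = np.random.default_rng(1)
    Y = (rng.random((30, 8)) < 0.3).astype(float)
    nets = []
    for _ in range(3):
        A = rng.random((30, 30))
        A = (A + A.T) / 2
        np.fill_diagonal(A, 0.0)
        nets.append(A)
    U, V, alpha, history = procmf_sketch(Y, nets, k=4, n_iter=50)
    print("network weights:", alpha)
    print("predicted scores shape:", (U @ V.T).shape)
```

In this sketch the product `U @ V.T` serves as the predicted protein-label association scores, and `alpha` shows how strongly each network contributes. The exponent `r` is one common way to keep the weights from collapsing onto a single network; the paper may use a different mechanism.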