The Framework of Protein Function Prediction Based on Boolean Matrix Decomposition
-
Abstract: Proteins are the most important and most diverse macromolecules in living cells, so research on protein function is of great significance for decoding the secrets of life. Previous studies have shown that protein function prediction is essentially a multi-label classification problem. However, the very large number of functional annotation labels poses a major challenge for applying multi-label classifiers to this task. Because protein functional labels are both numerous and highly correlated, we propose a framework of protein function prediction based on Boolean matrix decomposition (PFP-BMD). In addition, since current Boolean matrix decomposition algorithms have difficulty satisfying the exact-decomposition condition and the column-use condition at the same time, we propose an exact Boolean matrix decomposition algorithm based on label clusters, which performs hierarchical, extension-based clustering of labels using a label association matrix. We further prove, through a series of corollaries, that this algorithm achieves an optimal exact Boolean matrix decomposition. Experimental results show that the proposed decomposition algorithm offers a considerable advantage in computational complexity over existing algorithms. Moreover, applying the algorithm within PFP-BMD effectively improves the accuracy of protein function prediction. More importantly, using it to reduce and then restore the dimensionality of the protein functional label space lays a foundation for the efficient use of a wide range of multi-label classifiers in protein function prediction.
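The abstract does not spell out the label-cluster algorithm, so the following is not the PFP-BMD method. It is a minimal sketch, on a toy protein-by-label matrix, of what an exact Boolean decomposition under the column-use condition means: Y = C ∘ B (Boolean product), where every column of C is an actual label column of Y. In place of the authors' label clustering it uses a simple rule assumed here for illustration: keep each distinct label column that cannot be rebuilt as the OR of the other columns it contains. One can check that, for this particular condition, those columns form the smallest possible C. The function names (`boolean_product`, `is_subset`, `column_use_exact_bmd`) are hypothetical, and the pairwise check costs O(q²n) for q labels and n proteins, so it only suits small examples.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # 0/1 entries: rows = proteins, columns = function labels


def boolean_product(C: Matrix, B: Matrix, q: int) -> Matrix:
    """Boolean product (C o B)[r][j] = OR_i (C[r][i] AND B[i][j]); q = columns of the result."""
    return [[int(any(row[i] and B[i][j] for i in range(len(B)))) for j in range(q)]
            for row in C]


def is_subset(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if every 1 in label column a is also a 1 in label column b."""
    return all(x <= y for x, y in zip(a, b))


def column_use_exact_bmd(Y: Matrix):
    """Hypothetical sketch (not the paper's algorithm): exact Y = C o B, columns of C taken from Y."""
    n, q = len(Y), len(Y[0])
    cols = [tuple(Y[r][j] for r in range(n)) for j in range(q)]
    distinct = list(dict.fromkeys(cols))  # drop duplicate label columns, keep order
    chosen = []
    for y in distinct:
        # OR together every other distinct column that is contained in y
        cover = [0] * n
        for z in distinct:
            if z != y and is_subset(z, y):
                cover = [c | v for c, v in zip(cover, z)]
        if tuple(cover) != y:  # y cannot be rebuilt from smaller columns, so keep it
            chosen.append(y)
    C = [[col[r] for col in chosen] for r in range(n)]          # n x k reduced label matrix
    B = [[int(is_subset(c, y)) for y in cols] for c in chosen]  # k x q restoring matrix
    return C, B, chosen


Y = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 0, 1, 1]]
C, B, chosen = column_use_exact_bmd(Y)
print(len(chosen), boolean_product(C, B, len(Y[0])) == Y)  # 2 True
print(boolean_product([[0, 1]], B, len(Y[0])))              # [[0, 1, 1, 0]]
```

Here four labels shrink to two reduced labels (the third label is the OR of the first two, and the fourth duplicates the first). In a framework of this kind, a multi-label classifier would be trained on the smaller C, and a predicted reduced row would be mapped back to the full label space by a Boolean product with B, as the last line shows.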