ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2015, Vol. 52 ›› Issue (8): 1784-1793.doi: 10.7544/issn1000-1239.2015.20150180

所属专题: 2015面向大数据的人工智能技术

• 人工智能 • 上一篇    下一篇

一种面向蛋白质复合体检测的图聚类方法

王杰,梁吉业,郑文萍   

  1. (山西大学计算机与信息技术学院 太原 030006)(计算智能与中文信息处理教育部重点实验室(山西大学) 太原 030006)(xhcwj@sina.com)
  • 出版日期: 2015-08-01
  • 基金资助: 
    基金项目:国家自然科学基金项目(61432011,61272004);山西省自然科学基金项目(2011011016-1);教育部高等学校博士学科点专项科研基金项目(200801081017)

A Graph Clustering Method for Detecting Protein Complexes

Wang Jie,Liang Jiye,Zheng Wenping   

  1. (School of Computer & Information Technology, Shanxi University, Taiyuan 030006) (Key Laboratory of Computation Intelligence & Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online: 2015-08-01

摘要: 蛋白质互作用(protein-protein interaction, PPI)网络是广泛存在的一类复杂生物网络,其网络拓扑特征与功能模块分析密切相关.图聚类是对复杂网络进行分析和处理的一种重要计算方法.传统的PPI网络中蛋白质复合体检测算法通常对网络图中的对象进行硬划分,而寻找网络中的重叠簇的软聚类算法已成为当前研究热点之一.现有的软聚类算法较少关注寻找网络中具有重要生物意义的小规模非稠密簇.对此,基于网络中结点邻域给出了边关联强度的度量方法,并在此基础上提出了一种基于流模拟的PPI网络中复合体检测的图聚类(flow-simulation graph clustering, F-GCL)算法,该算法可以在快速发现PPI网络中的重叠簇的同时找到小规模非稠密簇;同时,与MCODE(molecular complex detection),MCL(Markov clustering),RNSC(restricted neighborhood search clustering)和CPM(clique percolation method)算法在6个酿酒酵母PPI网络上进行比较,该算法在F-measure,Accuracy,Separation方面表现了较好的性能.

关键词: 流模拟, 图聚类, 软聚类, 蛋白质互作用网络, 蛋白质复合体

Abstract: Protein-protein interaction (PPI) networks are widely present in complex biological networks. The topological features of PPI networks play an important role in analyzing the functional modules in networks. Some graph clustering methods have been successfully used to complex networks to detect protein complexes in PPI networks. Traditional graph clustering algorithms in PPI analyzing methods primarily focus on hard clustering for a network, while, nowadays soft clustering algorithms to find overlapped clusters have become one of the hotspots of current research. Existing soft clustering algorithms pay less attention on small-scale non-dense clusters, while some small-scale non-dense clusters often have important biological meaning in PPI networks. A measuring method of the association strength of edges is developed based on node neighborhoods in networks, and then a soft clustering algorithm named flow-simulation graph clustering (F-GCL) on the basis of flow simulation is presented to detect complexes in a PPI network. Experiments show that the proposed soft clustering algorithm F-GCL can simultaneously find out overlapping clusters and small-scale non-dense clusters without improving the running time. Compared with MCODE(molecular complex detection), MCL(Markov clustering), RNSC(restricted neighborhood search clustering) and CPM(clique percolation method) algorithms on six Saccharomyces cerevisiae PPI networks, the algorithm F-GCL shows considerable or better performance on three evaluating indicators: F-measure, Accuracy and Separation.

Key words: flow simulation, graph clustering, soft clustering, protein-protein interaction network, protein complex

中图分类号: