ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (8): 1784-1793. doi: 10.7544/issn1000-1239.2015.20150180

Special Issue: 2015 Artificial Intelligence Technologies for Big Data


A Graph Clustering Method for Detecting Protein Complexes

Wang Jie, Liang Jiye, Zheng Wenping

  1. (School of Computer & Information Technology, Shanxi University, Taiyuan 030006) (Key Laboratory of Computation Intelligence & Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  Online: 2015-08-01

Abstract: Protein-protein interaction (PPI) networks are a widespread class of complex biological networks, and their topological features play an important role in analyzing the functional modules they contain. Several graph clustering methods have been successfully applied to complex networks to detect protein complexes in PPI networks. Traditional graph clustering algorithms used in PPI analysis focus primarily on hard clustering of a network, whereas soft clustering algorithms, which find overlapping clusters, have recently become a research hotspot. Existing soft clustering algorithms pay little attention to small-scale, non-dense clusters, even though such clusters often have important biological meaning in PPI networks. This paper develops a method for measuring the association strength of edges based on node neighborhoods, and then presents a soft clustering algorithm based on flow simulation, named flow-simulation graph clustering (F-GCL), to detect complexes in PPI networks. Experiments show that F-GCL can find both overlapping clusters and small-scale non-dense clusters without increasing the running time. Compared with the MCODE (molecular complex detection), MCL (Markov clustering), RNSC (restricted neighborhood search clustering) and CPM (clique percolation method) algorithms on six Saccharomyces cerevisiae PPI networks, F-GCL achieves comparable or better performance on three evaluation indicators: F-measure, Accuracy and Separation.
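The abstract does not state how the neighborhood-based association strength of an edge is computed. As a minimal sketch, assuming one common neighborhood measure, the Jaccard similarity of the two endpoints' closed neighborhoods N[u] = N(u) ∪ {u} (an illustrative stand-in, not necessarily the authors' definition), the Python below scores every edge of an undirected graph. It then drops weak edges to show why such a score helps expose modules. The helper names (`edge_strength`, `all_edge_strengths`, `strong_components`), the 0.5 threshold and the toy graph are hypothetical, and the thresholding step is a simple hard partition used only for illustration; it is not F-GCL's flow simulation.

```python
"""Neighborhood-based edge association strength (illustrative sketch).

Assumption: strength(u, v) = |N[u] & N[v]| / |N[u] | N[v]|, the Jaccard
similarity of closed neighborhoods. This is a stand-in, not F-GCL's formula.
"""
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def edge_strength(adj: Graph, u: int, v: int) -> float:
    """Jaccard similarity of the closed neighborhoods N[u] and N[v]."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def all_edge_strengths(adj: Graph) -> Dict[Tuple[int, int], float]:
    """Score each undirected edge once, keyed by (smaller id, larger id)."""
    return {
        (u, v): edge_strength(adj, u, v)
        for u in adj
        for v in adj[u]
        if u < v
    }


def strong_components(
    adj: Graph, scores: Dict[Tuple[int, int], float], threshold: float
) -> List[Set[int]]:
    """Connected components after dropping edges with strength < threshold.

    Hypothetical helper for illustration only: it yields a hard partition,
    whereas F-GCL is described as a soft (overlapping) clustering method.
    """
    strong: Graph = {v: set() for v in adj}
    for (u, v), s in scores.items():
        if s >= threshold:
            strong[u].add(v)
            strong[v].add(u)
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search over strong edges
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(strong[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Toy graph (hypothetical): two triangles {1,2,3} and {4,5,6} joined by edge (3, 4).
example_graph: Graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

if __name__ == "__main__":
    scores = all_edge_strengths(example_graph)
    for edge, s in sorted(scores.items()):
        print(edge, round(s, 3))
    print(strong_components(example_graph, scores, threshold=0.5))
```

On this toy graph the bridging edge (3, 4) scores 2/6 ≈ 0.33, while every edge inside a triangle scores 0.75 or 1.0, so thresholding at 0.5 separates {1, 2, 3} from {4, 5, 6}. A soft method would instead allow a node to keep membership in more than one such group.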

Key words: flow simulation, graph clustering, soft clustering, protein-protein interaction network, protein complex
