    A Graph Clustering Method for Detecting Protein Complexes

    • Abstract: Protein-protein interaction (PPI) networks are a widespread class of complex biological networks, and their topological features are closely tied to the analysis of functional modules. Graph clustering is an important computational method for analyzing complex networks and has been applied successfully to detecting protein complexes in PPI networks. Traditional complex-detection algorithms for PPI networks usually produce a hard partition of the nodes, whereas soft clustering algorithms that find overlapping clusters have become a focus of current research. Existing soft clustering algorithms pay little attention to small-scale non-dense clusters, even though such clusters often carry important biological meaning. To address this, a measure of edge association strength is defined from node neighborhoods, and on that basis a soft clustering algorithm based on flow simulation, flow-simulation graph clustering (F-GCL), is proposed for detecting complexes in PPI networks. Experiments show that F-GCL quickly finds overlapping clusters and, at the same time, small-scale non-dense clusters, without increasing the running time. Compared with MCODE (molecular complex detection), MCL (Markov clustering), RNSC (restricted neighborhood search clustering), and CPM (clique percolation method) on six Saccharomyces cerevisiae PPI networks, F-GCL shows comparable or better performance on three evaluation metrics: F-measure, Accuracy, and Separation.
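The abstract does not give the formula for the edge association strength, only that it is derived from node neighborhoods. As a rough illustration of what a neighborhood-based edge weight can look like, the sketch below uses the Jaccard overlap of the closed neighborhoods of an edge's two endpoints. This choice, the function name `edge_association_strength`, and the toy graph are all assumptions made for illustration; they are not the measure defined for F-GCL.

```python
def edge_association_strength(adj, u, v):
    """Jaccard overlap of the closed neighborhoods of u and v.

    Illustrative stand-in only: the actual F-GCL measure is not
    specified in the abstract.
    """
    nu = adj[u] | {u}  # closed neighborhood of u (neighbors plus u itself)
    nv = adj[v] | {v}  # closed neighborhood of v
    return len(nu & nv) / len(nu | nv)


# Hypothetical toy network: a triangle a-b-c with a pendant node d on c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

# Weight every undirected edge once, keyed by its sorted endpoint pair.
weights = {
    tuple(sorted((u, v))): edge_association_strength(adj, u, v)
    for u in adj
    for v in adj[u]
}

for edge, w in sorted(weights.items()):
    print(edge, round(w, 2))
```

Under this stand-in measure, edges inside the triangle score higher than the edge to the pendant node, which is the kind of signal a flow-based clustering step could use to keep flow inside a candidate complex.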

       
