高级检索

    双网络中影响力凝聚子图发现算法

    Influential Cohesive Subgraph Discovery Algorithm in Dual Networks

    • 摘要: 双网络由物理图和概念图构成,其中物理图和概念图共享网络结点集合而具有不同边集合. 物理图中边表示结点间实际存在的关系;概念图中边表示结点间的相似程度,通常由计算得出. 最近,从双网络中发现凝聚子图,即物理图中连通且概念图中稠密的子图受到研究者的广泛关注,在研讨会筹备、商品推荐和致病基因发现等真实场景中具有广泛应用. 但现有研究鲜有考虑双网络中凝聚子图的影响力. 为此:1)提出一种基于最小边权重定义的影响力凝聚子图,即影响力k-连通truss(k-ICT)子图模型.k-ICT子图模型能够有效刻画子图在双网络中的重要性且对低影响力边鲁棒. 2)由证明可知,发现影响力最大的k-ICT子图是NP-难的,因此提出一种基于概念图边等价类划分的CT索引结构. 利用索引的概要图,能够根据不同的k值,快速发现包含所有k-ICT子图的候选子图. 3)提出了基于全局枚举删除和局部子图扩展的精确算法Exact- \mathrmG kICT和Exact-LkICT,用于发现top-r具有最大影响力的k-ICT子图. 通过大量在真实数据集上的实验,验证算法的高效性和有效性.

       

      Abstract: Dual networks are composed of physical graph and concept graph. The physical graph and concept graph share the same set of vertices but have different sets of edges. The edges of physical graph represent the real relationship between vertices, while those of concept graph represent the similarity between vertices, usually obtained by computation. Recently, discovering cohesive subgraphs from dual networks, i.e., the subgraphs that connected in physical graph, while cohesive in concept graphs, has attracted extensive attention of researchers. The subgraphs have been widely used in many real scenarios such as seminar preparation, product recommendation and disease causing gene discovery. Yet, few existing studies have considered the influence of cohesive subgraphs in dual networks. To this end, in this paper: 1) An influential cohesive subgraph based on the minimum edge weight, i.e., influential k-connected truss (k-ICT) subgraph model is proposed. The k-ICT subgraph model can effectively characterize the importance of subgraphs in dual networks and is robust to low-influence edges. 2) As it proves finding the k-ICT with the largest influence is NP-hard, a CT index based on equivalence class partition of edges in the concept graph is proposed. By exploiting the summary graph of the index, candidate subgraphs containing all k-ICTs can be quickly found for different k values. 3) Exact- \mathrmG kICT and Exact-LkICT algorithms are proposed, respectively, based on the global enumeration deletion and local subgraph expansion, to identify the top-r k-ICTs with the largest influences. Extensive experiments conducted on real datasets verify the efficiency and effectiveness of our approaches.

       

    /

    返回文章
    返回