ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2016, Vol. 53 ›› Issue (6): 1376-1388.doi: 10.7544/issn1000-1239.2016.20148339

• 信息处理 • 上一篇    下一篇

一种半监督的局部扩展式重叠社区发现方法

陈俊宇,周刚,南煜,曾琦   

  1. (数学工程与先进计算国家重点实验室 郑州 450002) (junychen1107@gmail.com)
  • 出版日期: 2016-06-01
  • 基金资助: 
    数学工程与先进计算国家重点实验室开放基金项目(2013A02)

Semi-Supervised Local Expansion Method for Overlapping Community Detection

Chen Junyu, Zhou Gang, Nan Yu, Zeng Qi   

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002)
  • Online: 2016-06-01

摘要: 重叠社区发现是近年来复杂网络领域的研究热点之一.提出一种半监督的局部扩展式重叠社区发现方法SLEM(semi-supervised local expansion method).该方法借鉴了带约束的半监督聚类的思想,不仅利用网络的拓扑结构信息,还充分地利用网络节点的属性信息.首先将网络节点的属性信息转化为成对约束,并根据成对约束修正网络的拓扑结构,使网络中的社区结构更加明显;然后基于网络节点的度中心性选取种子节点,得到分散的、局部节点度大的种子作为初始社区;再采用贪心策略将初始社区向邻居节点扩展,得到局部连接紧密的社区;最后检测并合并冗余社区,得到高覆盖率的社区发现结果.在模拟网络数据和真实网络数据上与当前有代表性的基于局部扩展的重叠社区发现算法进行了对比实验,结果表明SLEM方法在稀疏程度不同的网络上均能发现较高质量的重叠社区结构.

关键词: 复杂网络, 重叠社区发现, 半监督聚类, 局部扩展, SLEM方法

Abstract: Overlapping community detection has become one of the hottest research issues in the field of network science, as well as attracted the attention of researchers with different backgrounds. A novel semi-supervised local expansion method (SLEM) is proposed for detecting overlapping communities more effectively in real world networks. The proposed method makes use of not only the topology information of the network but also the attribute information of partial vertices. Inspired by the idea of semi-supervised clustering with constraints in the field of machine learning, SLEM starts from utilizing the attribute information of partial vertices to get pairwise constraints which can be used to modify the topology structure of the original network. Afterward, a vertex degree centrality-based seeding method is proposed for selecting seeds as initial communities. Then these seeds expand into local communities by a greedy strategy, after which partial connected close-knit communities are formed. Finally, similarities between different communities are computed on the basis of a community distance measurement, and then near-duplicated communities are combined. Taking more advantage of network information than traditional unsupervised community detection methods, SLEM can produce communities with higher structure quality. Experimental results on both synthetic benchmark networks and real world networks show that SLEM can achieve better effect than the state-of-the-art local expansion methods on the networks of different sparsity degrees.

Key words: complex network, overlapping community detection, semi-supervised clustering, local expansion, semi-supervised local expansion method (SLEM)

中图分类号: