ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2018, Vol. 55 ›› Issue (9): 1959-1971.doi: 10.7544/issn1000-1239.2018.20180277

所属专题: 2018优青专题

• 信息处理 • 上一篇    下一篇

一种基于标签传播的两阶段社区发现算法

郑文萍1,2,3,车晨浩2,钱宇华1,2,3,王杰2   

  1. 1(山西大学大数据科学与产业研究院 太原 030006); 2(山西大学计算机与信息技术学院 太原 030006); 3(计算智能与中文信息处理教育部重点实验室(山西大学) 太原 030006) (wpzheng@sxu.edu.cn)
  • 出版日期: 2018-09-01
  • 基金资助: 
    国家自然科学基金项目(61672332,61572005);国家自然科学基金优秀青年科学基金项目(61322211);山西省回国留学人员科研资助项目(2017-014) This work was supported by the National Natural Science Foundation of China (61672332, 61572005), the National Natural Science Foundation of China for Excellent Young Scientists (61322211), and the Research Project of Shanxi Scholarship Council of China (2017-014).

A Two-Stage Community Detection Algorithm Based on Label Propagation

Zheng Wenping1,2,3, Che Chenhao2, Qian Yuhua1,2,3, Wang Jie2   

  1. 1(Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006); 2(School of Computer and Information Technology, Shanxi University, Taiyuan 030006); 3(Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online: 2018-09-01

摘要: 针对标签传播社区发现算法在节点更新顺序及标签传播过程中存在较大随机性而导致划分结果稳定性差的问题,提出一种基于标签传播的两阶段社区发现算法(a two-stage community detection algorithm based on label propagation, LPA-TS),通过参与系数确定节点更新顺序,并在标签传播过程中依据节点间相似性更新节点标签,得到初始社区划分.将社区看作节点,社区间连边数作为边权重,得到社区关系网络.按照参与系数由低到高的顺序合并社区关系网络中的节点,得到最终社区划分结果.算法LPA-TS减少了传统LPA方法在节点更新和标签传播过程的随机性;在第2阶段,将不符合弱社区定义的初始社区与连边最多的相邻社区合并,再按照社区参与系数由低到高的顺序合并初始社区提升社区发现质量.通过与一些经典算法在8个真实网络及不同参数下LFR benchmark人工网络数据集上的实验比较表明LPA-TS算法表现了良好的稳定性,在NMI、ARI、模块性等方面表现良好.

关键词: 复杂网络, 社区发现, 标签传播, 参与系数, 弱社区

Abstract: Due to the random process in node selection and label propagation, the stability of the results of traditional LPA is poor. A two-stage community detection algorithm is proposed based on label propagation, abbreviated as LPA-TS. In the first step of LPA-TS, the labels of nodes are updated according to their participation coefficients in non-decreasing order, and the node label is determined according to the similarity of nodes in the process of node label updating. Some clusters found by Step1 might not satisfy the weak community condition. If a cluster is not a weak community, in the beginning of Step2, we will merge it with the cluster that has most connections with it. Next, we treat each community as a node, and the number of edges between two communities as their edge weights between corresponding nodes. We compute the participation coefficients of each node of the resulted network, and use similar process to get the final results of the communities. The proposed algorithm LPA-TS reduces the randomness in the process of node selection and label propagation; hence, we might obtain stable communities by LPA-TS. In addition, LPA-TS combines small scale communities with adjacent communities in the second phase of the algorithm to improve the quality of community detection. Compared with other classical community detection algorithms on some real networks and artificial networks, the proposed algorithm shows preferable performance on stability, NMI, ARI and modularity.

Key words: complex network, community detection, label propagation, participation coefficient, weak community

中图分类号: