ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (12): 2611-2619. doi: 10.7544/issn1000-1239.2018.20180575

Special topic: 2018 Fragmented Knowledge Fusion and Application

• Other Applied Technologies •

基于群体智慧的簇连接聚类集成算法

张恒山1,2,高宇坤1,陈彦萍1,2,王忠民1,2   

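  1. 1(School of Computer Science & Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121);2(Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi’an University of Posts and Telecommunications), Xi’an 710121) (hengshzhang@foxmail.com)
  • Published: 2018-12-01
  • Supported by: 
    National Natural Science Foundation of China (61373116); Shaanxi Province Science and Technology Coordination and Innovation Project Fund (2016KTZDGY04-01)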

Clustering Ensemble Algorithm with Cluster Connection Based on Wisdom of Crowds

Zhang Hengshan1,2, Gao Yukun1, Chen Yanping1,2, Wang Zhongmin1,2   

  1. 1(School of Computer Science & Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121);2(Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi’an University of Posts and Telecommunications), Xi’an 710121)
  • Online: 2018-12-01

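Abstract (Chinese version): Aggregating the results of multiple mutually independent clustering algorithms according to the wisdom-of-crowds principle significantly improves the accuracy of the clustering result. The cluster-connection clustering ensemble algorithm based on wisdom of crowds first uses the independence, decentralization, and diversity principles of wisdom-of-crowds theory to guide the generation of the individual clustering results. It then applies a proposed clustering ensemble algorithm based on connected triples to aggregate the individual clustering results in groups, and aggregates the group-level results once more to obtain the final clustering. The advantages of the algorithm include: 1) by grouping the clusters and adjusting their weights, it avoids selecting among the clusters generated by the base clusterings, which helps make full use of the information in the generated clusters; 2) it computes the similarity between data with the connected-triple algorithm, which fully exploits the relations between data points. Experiments on different data sets show that, compared with traditional ensemble clustering algorithms and with ensemble clustering algorithms that combine wisdom of crowds with machine learning, the algorithm further improves the accuracy of the ensemble clustering result.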

Keywords (Chinese version): wisdom of crowds, clustering ensemble, connected triple, clustering ensemble selection, data mining

Abstract: Aggregating many independent clustering results for the same data set according to the wisdom-of-crowds principle can markedly improve the accuracy and stability of clustering. This paper proposes a clustering ensemble algorithm with cluster connection based on wisdom of crowds (CECWOC). First, independent clustering results are produced by different clustering algorithms, guided by the independence, decentralization, and diversity principles of wisdom of crowds. Second, a clustering ensemble algorithm based on connected triples aggregates the produced clusters in groups, and the group-level results are aggregated again to produce the final cluster set. The proposed algorithm has two advantages: 1) the clusters produced by the base clusterings are aggregated in groups and their weights are adjusted, so cluster selection is avoided and the information in the produced clusters is not ignored; 2) similarities between data are computed with the connected-triple algorithm, so relations between data whose direct similarity is zero can still be used. Experimental results on different data sets show that the proposed algorithm obtains more accurate and stable results than other clustering ensemble algorithms, including those based on the wisdom-of-crowds framework.
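
The connected-triple idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the CECWOC algorithm itself (the grouping and weight-adjustment steps are not specified in the abstract); it follows the general weighted connected-triple scheme from the link-based cluster ensemble literature. The Jaccard edge weight, the function names, and the toy partitions are all assumptions made for illustration.

```python
from itertools import combinations


def clusters_from_labels(partitions):
    """Turn each base clustering (a label vector) into sets of point indices."""
    clusters = []
    for labels in partitions:
        by_label = {}
        for idx, lab in enumerate(labels):
            by_label.setdefault(lab, set()).add(idx)
        clusters.extend(by_label.values())
    return clusters


def jaccard(a, b):
    """Direct edge weight between two clusters (assumed choice of weight)."""
    return len(a & b) / len(a | b)


def connected_triple_similarity(clusters):
    """Return (direct weights, normalized connected-triple similarity).

    Two clusters i and j form a connected triple with a third cluster k
    when both are linked to k; the triple contributes min(w_ik, w_jk).
    """
    n = len(clusters)
    w = [[jaccard(clusters[i], clusters[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    wct = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = sum(min(w[i][k], w[j][k])
                    for k in range(n) if k not in (i, j))
        wct[i][j] = wct[j][i] = total
    peak = max(max(row) for row in wct)
    if peak > 0:  # scale so the strongest pair has similarity 1
        wct = [[v / peak for v in row] for row in wct]
    return w, wct


# Hypothetical toy ensemble: two base clusterings of six points.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
]
clusters = clusters_from_labels(partitions)
direct, triple = connected_triple_similarity(clusters)
```

In this toy ensemble the clusters {0, 1} and {2, 3} come from the same base clustering, so they share no points and their direct weight is zero. Both overlap the cluster {0, 1, 2} from the second clustering, so their connected-triple similarity is positive, which is the kind of indirect relation the abstract refers to.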

Key words: wisdom of crowds (WOC), clustering ensemble, connected triple, clustering ensemble selection (CES), data mining

CLC number: