ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (11): 2594-2606. doi: 10.7544/issn1000-1239.2016.20150467

• Artificial Intelligence •

Multiobjective Clustering Algorithm with Fuzzy Centroids for Categorical Data

Zhou Zhiping, Zhu Shuwei, Zhang Daowen

  1. (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122) (zzp@jiangnan.edu.cn)
  • Published: 2016-11-01
  • Online: 2016-11-01
  • Funding: 
    This work was supported by the National Natural Science Foundation of China (61373126), the Natural Science Foundation of Jiangsu Province of China (BK20131107), and the Cooperative Industry-Academy-Research Innovation Foundation of Jiangsu Province of China (BY2013015-33).

Abstract: Most traditional clustering algorithms for categorical data optimize a single criterion, which limits their performance. To address this, a novel multiobjective fuzzy clustering algorithm is proposed that brings both within-cluster and between-cluster information into the optimization. Recently reported multiobjective algorithms are all based on K-modes; the proposed method instead builds on the more accurate fuzzy centroids algorithm. Unlike traditional genetic-algorithm-based hybrid clustering methods, it encodes chromosomes with fuzzy memberships, and it optimizes two conflicting clustering objectives simultaneously to obtain a set of optimal solutions. An early-termination criterion judges whether the algorithm has reached a steady state and stops it, which avoids unnecessary computation. To further improve efficiency, fuzzy centroids are computed from a sampled subset of the data as the cluster representatives, and the membership matrix of all samples is then computed from these centroids to obtain the final clustering result. Experimental results on 10 datasets show that the proposed method outperforms the state-of-the-art multiobjective clustering algorithm in clustering accuracy and stability, and that its computational efficiency is also substantially improved.
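The sampling step described in the abstract (fuzzy centroids from a subset, then memberships for every sample) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fuzzifier `m = 1.5`, the integer coding of categories, and the distance (one minus the centroid weight of the sample's category, summed over attributes, following the usual fuzzy centroids formulation) are assumptions. Plain alternating updates on the sample also stand in for the multiobjective genetic search, which the abstract does not specify in detail.

```python
import numpy as np

def fuzzy_centroids(X, U, n_values, m=1.5):
    """Per-attribute category weights for each cluster.

    X: (n, p) integer category codes; U: (k, n) fuzzy memberships.
    Returns p arrays of shape (k, n_values[l]) whose rows sum to 1.
    """
    Um = U ** m
    centroids = []
    for l, n_val in enumerate(n_values):
        onehot = np.eye(n_val)[X[:, l]]      # (n, n_val) indicator of each category
        W = Um @ onehot                      # membership mass per category
        centroids.append(W / W.sum(axis=1, keepdims=True))
    return centroids

def distances(X, centroids):
    """D[i, j] = sum over attributes of (1 - weight of sample j's category in cluster i)."""
    k = centroids[0].shape[0]
    D = np.zeros((k, X.shape[0]))
    for l, W in enumerate(centroids):
        D += 1.0 - W[:, X[:, l]]
    return D

def memberships(D, m=1.5, eps=1e-12):
    """Fuzzy c-means style membership update; eps guards zero distances."""
    inv = (D + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def cluster_via_sample(X, k, n_values, sample_size, m=1.5, iters=30, seed=0):
    """Fit fuzzy centroids on a random subset, then assign all samples."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=sample_size, replace=False)
    Xs = X[idx]
    U = rng.random((k, sample_size))
    U /= U.sum(axis=0, keepdims=True)        # each column is a valid membership vector
    for _ in range(iters):                   # assumption: stand-in for the genetic search
        C = fuzzy_centroids(Xs, U, n_values, m)
        U = memberships(distances(Xs, C), m)
    U_all = memberships(distances(X, C), m)  # membership matrix of the full dataset
    return U_all.argmax(axis=0), U_all
```

Calling `cluster_via_sample(X, k=2, n_values=[2, 2, 2, 2], sample_size=20)` on a 40-row array of binary-coded attributes returns one hard label per row together with the full membership matrix. The expensive centroid fitting touches only the sampled rows, which is the source of the efficiency gain the abstract refers to.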

Key words: categorical data, clustering, multiobjective optimization, fuzzy centroids, Pareto-optimal solutions

CLC Number: