ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2016, Vol. 53 ›› Issue (8): 1719-1728. doi: 10.7544/issn1000-1239.2016.20160136

Topic collection: 2016 Special Topic on Frontier Technologies of Data Mining

• Artificial Intelligence •



  1. (National Key Laboratory of Science and Technology on Blind Signals Processing, Chengdu 610041)
  • Published: 2016-08-01

Self-Adaptive Clustering Based on Local Density by Descending Search

Xu Zhengguo, Zheng Hui, He Liang, Yao Jiaqi

  1. (National Key Laboratory of Science and Technology on Blind Signals Processing, Chengdu 610041)
  • Online: 2016-08-01

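Abstract (from the Chinese original): Cluster analysis is an important research area in data mining; it separates the natural groupings of samples in a data set of mixed classes without supervision. Many clustering algorithms have been proposed under different prior conditions. However, an unknown number of clusters, mixed cluster shapes, non-uniform sample distributions, and unbalanced sample counts between classes in complex data sets remain major difficulties in current clustering research. To address these problems, a new clustering method is proposed that defines the local density of the sample distribution and uses the ordering of density within a class to search for cluster boundaries; it can cluster data sets of arbitrary distribution shape without knowing the number of clusters. In addition, clustering parameters are adjusted adaptively to handle special cases such as varying sparseness of the data distribution, unbalanced sample counts between classes, and abnormal local densities, which avoids misassignment of samples and interference from noisy data. Experimental results show that the new algorithm applies well to 6 typical test sets and also performs better in comparisons of performance indexes against typical clustering algorithms and a recently published clustering algorithm.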

Keywords (from the Chinese original): data mining, clustering, local density, descending search, self-adaptive

Abstract: Cluster analysis is an important research area in data mining. It aims to determine the class attributes of samples in a mixed data set automatically, without supervision. Over the past decades, many clustering algorithms have been proposed, each relying on different kinds of prior knowledge. However, several hard problems remain unsolved when clustering complex data sets, such as an unknown number of clusters, mixed cluster shapes, unbalanced sample counts between clusters, and varied densities within clusters. These problems are now a central difficulty in clustering research. To address them, a novel clustering method is introduced. Based on a definition of local density and the intuition that density is ordered within a cluster, the method finds natural partitions by adaptively searching for cluster boundaries. During clustering, it also copes with the difficulties listed above while avoiding noise disturbance and false classification. The method is tested on 6 typical and markedly different data sets, and the results show that it is feasible and performs well. In addition, it outperforms other classic clustering methods and a recently presented algorithm on 2 different evaluation indexes.

Key words: data mining, clustering, local density, descending search, self-adaptation
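
The idea summarized in the abstract, a local density estimate combined with a search that proceeds in descending density order, can be illustrated with a minimal sketch. The abstract does not state the paper's density definition, boundary search rule, or parameter adaptation, so everything below is an assumption made for illustration: density is taken as the inverse of the mean distance to the `k` nearest neighbours, points are visited from densest to sparsest, and a point joins the cluster of its nearest already-visited neighbour only if that neighbour lies within `link_scale` times the point's own neighbour spacing. The function names and both parameters are hypothetical. This is a generic density-ordered clustering sketch, not the authors' algorithm, and it makes no attempt at noise handling.

```python
import math


def knn_mean_distance(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    spacing = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        spacing.append(sum(dists[:k]) / k)
    return spacing


def descending_density_clusters(points, k=3, link_scale=3.0):
    """Label points by visiting them in descending local-density order.

    Assumed rule (illustrative only): a point inherits the label of its
    nearest already-visited point when that point is within
    link_scale * (its own k-NN spacing); otherwise it starts a new cluster.
    The number of clusters is therefore not supplied in advance.
    """
    spacing = knn_mean_distance(points, k)
    # Assumed local density: inverse of the k-NN spacing.
    density = [1.0 / (s + 1e-12) for s in spacing]
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)

    labels = [-1] * len(points)
    next_label = 0
    visited = []  # indices already labelled; all have density >= current point
    for i in order:
        if visited:
            j = min(visited, key=lambda v: math.dist(points[i], points[v]))
            # The threshold scales with the point's own spacing, so sparse
            # and dense regions are treated relative to their local scale.
            if math.dist(points[i], points[j]) <= link_scale * spacing[i]:
                labels[i] = labels[j]
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
        visited.append(i)
    return labels


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(descending_density_clusters(blob_a + blob_b))
```

Scaling the linking threshold by each point's own spacing is one simple way to let a single setting cope with clusters of different density; the paper's adaptive adjustment may work quite differently.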