ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### Survey on Density Peak Clustering Algorithm

Chen Yewang1,2,3,4, Shen Lianlian1, Zhong Caiming5, Wang Tian1, Chen Yi2, and Du Jixiang1

1(College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021); 2(Beijing Key Laboratory of Big Data Technology for Food Safety (Beijing Technology and Business University), Beijing 100048); 3(Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006); 4(Fujian Key Laboratory of Big Data Intelligence and Security (Huaqiao University), Xiamen, Fujian 361021); 5(College of Information, Ningbo University, Ningbo, Zhejiang 315211) (ywchen@hqu.edu.cn)
• Published: 2020-02-01
• Online: 2020-02-01
• Supported by:
This work was supported by the National Natural Science Foundation of China (61673186, 71771094, 61876068, 61972010), the Quanzhou City Science & Technology Program of China (2018C114R, 2018C110R), and the Project of Science and Technology Plan of Fujian Province of China (2017H01010065, 2019H01010129).

Abstract: DPeak (density peak) is a simple but effective clustering method. It maps data of arbitrary dimension onto a two-dimensional space and constructs a hierarchical relationship among all data points in the reduced space. This makes it easy to pick out distinguished points (density peaks), each of which has high density and lies far from any region of higher density. Treating these density peaks as cluster centers and using the hierarchical relationship, the algorithm offers two ways to perform the final clustering: a decision diagram that lets users interact with the result, and an automatic method. In this paper, we trace the development and application trends of DPeak in recent years, and summarize and organize its various improvements and variants from the following aspects. First, we introduce the principle of the DPeak algorithm and discuss its position in the taxonomy of clustering algorithms. Comparing DPeak with several other major clustering algorithms, we find that it is highly similar to mean shift, and we therefore suggest that DPeak may be a special variant of mean shift. Second, we discuss several shortcomings of DPeak, such as high time complexity, lack of adaptability, low precision, and inefficiency in high-dimensional spaces, and then present the improved algorithms by category. In addition, we review applications of DPeak in different fields, such as natural language processing, biomedical analysis, and optics. Finally, we discuss directions for future work based on the open problems and challenges of DPeak.
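The abstract describes the procedure only in words, so the following is a minimal sketch of how it might look in practice, not the authors' implementation. It assumes the commonly described formulation of density peak clustering: a cutoff-kernel density `rho`, a distance `delta` to the nearest point of higher density, and peak selection by the largest product `rho * delta`. The function name `dpeak`, the parameters `d_c` and `n_clusters`, and the tie-breaking rule are all illustrative assumptions that do not appear in the abstract.

```python
import math


def dpeak(points, d_c, n_clusters):
    """Minimal density-peak clustering sketch (assumed cutoff-kernel density)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho[i]: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Rank points from densest to sparsest. Ties are broken by index, so an
    # earlier-ranked point with equal density is treated as "higher density".
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # distance to the nearest higher-ranked point
    parent = [-1] * n   # index of that point (-1 for the top-ranked point)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # (rho, delta) is the two-dimensional space of the decision diagram.
    # Automatic selection (assumed here): take the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Walk down the hierarchy: each remaining point inherits the label of
    # its parent, which has already been labeled because it ranks earlier.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Plotting `rho` against `delta` gives the decision diagram mentioned in the abstract; an interactive variant would let the user pick the centers from that plot instead of passing `n_clusters`. The pairwise distance matrix makes this sketch quadratic in the number of points, which corresponds to the high time complexity the abstract lists among DPeak's shortcomings.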