ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2017, Vol. 54 ›› Issue (8): 1763-1771.doi: 10.7544/issn1000-1239.2017.20170345

所属专题: 2017人工智能前沿进展专题

• 人工智能 • 上一篇    下一篇



  1. (南京航空航天大学计算机科学与技术学院 南京 211106) (
  • 出版日期: 2017-08-01
  • 基金资助: 

Convex Clustering Combined with Weakly-Supervised Information

Quan Zhenzhen, Chen Songcan   

  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106)
  • Online: 2017-08-01

摘要: 基于目标函数的聚类是一类重要的聚类分析技术,其中几乎所有算法均是经非凸目标的优化建立,因而难以保证全局最优并对初始值敏感.近年提出的凸聚类通过优化凸目标函数克服了上述不足,同时获得了相对更稳定的解.当现实中存在辅助信息(典型的如必连和或不连约束)可资利用时,通过将其结合到相应目标所得优化模型已证明能有效提高聚类性能,然而,现有通过在目标函数中添加约束惩罚项的常用结合方式往往会破坏其原有凸目标的凸性.鉴于此,提出了一种新的结合此类弱监督辅助信息的凸聚类算法.其实现关键是代替在目标函数中添加约束,而是通过对目标函数中距离度量的改造以保持凸性,由此既保持了原凸聚类的优势同时有效提高了聚类性能.

关键词: 基于目标函数的聚类, 凸聚类, 弱监督信息, 约束, 距离度量, 半监督聚类

Abstract: Objective function-based clustering is a class of important clustering analysis techniques, of which almost all the algorithms are built by optimization of non-convex objective. Thus, these algorithms can hardly get global optimal solution and are sensitive to the provided initialization. Recently, convex clustering has been proposed by optimizing a convex objective function, not only does it overcome the insufficiency illustrated above, but it also obtains a relatively stable solution. It has been proven that clustering performance can be improved effectively by combining useful auxiliary information (typically must-links and/or cannot-links) obtained from reality with the corresponding objective. To the best of our knowledge, all such semi-supervised objective function-based clustering algorithms are based on non-convex objective, semi-supervised convex clustering has not been proposed yet. Thus, we attempt to combine pairwise constraints with convex clustering. However, the existing methods usually make the original convex objectives lose their convexity, which add constraint penalty terms to the objective function. In order to deal with such problem, we introduce a novel semi-supervised convex clustering model by using the weakly-supervised information. In particular, the key idea is to change distance metric instead of adding constraint penalty terms to the objective function. As a result, the proposed method not only maintains the advantages of convex clustering, but also improves the performance of convex clustering.

Key words: objective function-based clustering, convex clustering, weakly-supervised information, constraints, distance metric, semi-supervised clustering