Convex Clustering Combined with Weakly-Supervised Information
Abstract: Objective-function-based clustering is an important class of clustering techniques, but almost all such algorithms optimize a non-convex objective. They therefore cannot guarantee a globally optimal solution and are sensitive to initialization. Convex clustering, proposed in recent years, overcomes these drawbacks by optimizing a convex objective function and yields comparatively stable solutions. When auxiliary information is available in practice (typically must-link and/or cannot-link constraints), incorporating it into the clustering objective has been shown to improve clustering performance effectively. To the best of our knowledge, however, all existing semi-supervised objective-function-based clustering algorithms rest on non-convex objectives, and a semi-supervised convex clustering method has not yet been proposed. We therefore attempt to combine pairwise constraints with convex clustering. The usual way of incorporating such constraints, adding constraint penalty terms to the objective function, tends to destroy the convexity of the original objective. To address this, we propose a new convex clustering algorithm that incorporates this kind of weakly supervised information. The key idea is to modify the distance metric in the objective function rather than add constraint penalty terms, which preserves convexity. As a result, the proposed method retains the advantages of convex clustering while effectively improving clustering performance.
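To make the key idea concrete, the following is a minimal sketch and not the algorithm of the paper: the abstract specifies neither the base convex clustering model nor the exact metric modification. It assumes an exemplar-based mixture objective, in which only the mixture weights `q` are optimized, and a hypothetical adjustment rule that sets must-link distances to zero and cannot-link distances to a large value. Because the constraints only change the fixed distance matrix `D`, the log-likelihood remains concave in `q` (equivalently, its negative remains convex), whereas a penalty term added to the objective could break that property. The function names and the parameters `beta`, `big`, and `iters` are illustrative assumptions.

```python
import numpy as np

def adjust_distances(D, must_link, cannot_link, big=None):
    """Return a copy of D with pairwise constraints folded into the distances.

    Hypothetical rule (an assumption, not taken from the paper):
    must-link pairs get distance 0, cannot-link pairs get a large distance.
    """
    D = D.copy()
    if big is None:
        big = 10.0 * D.max()
    for i, j in must_link:
        D[i, j] = D[j, i] = 0.0
    for i, j in cannot_link:
        D[i, j] = D[j, i] = big
    return D

def exemplar_weights(D, beta=1.0, iters=200):
    """Maximize (1/n) * sum_i log(sum_j q_j * exp(-beta * D_ij)) over the simplex.

    The objective is concave in q for any fixed D, so the multiplicative
    update below does not depend on a cluster-center initialization.
    """
    n = D.shape[0]
    S = np.exp(-beta * D)          # fixed similarities derived from D
    q = np.full(n, 1.0 / n)        # uniform start on the simplex
    for _ in range(iters):
        mix = S @ q                # mix[i] = sum_j q_j * S_ij
        q = q * (S.T @ (1.0 / mix)) / n
    return q, S

def assign(q, S):
    """Assign each point to the exemplar with the largest weighted similarity."""
    return np.argmax(S * q, axis=1)

# Toy data: two well-separated groups on a line.
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = (X[:, None] - X[None, :]) ** 2
D_adj = adjust_distances(D, must_link=[(0, 2)], cannot_link=[(2, 3)])
q, S = exemplar_weights(D_adj)
labels = assign(q, S)              # index of the chosen exemplar per point
```

Here the supervision enters only through `D_adj`; the optimization routine itself is unchanged, which is the design point the abstract emphasizes.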