Sample-Weighted Multi-View Clustering
-
Abstract: In the era of big data, the human capacity to collect, store, transmit, and manage data has grown steadily, and many industries have accumulated large data resources that are often multi-source and heterogeneous. How to cluster such multi-source data effectively, a task known as multi-view clustering, has become one of the focal points of current machine learning research. Existing multi-view clustering algorithms mainly consider, from a “global” perspective, how different views and features contribute to the cluster structure, and they ignore the differences in “local” information among individual samples. This paper therefore proposes a new algorithm, sample-weighted multi-view clustering (SWMVC), which assigns a weight to each view of every sample and learns these sample weights adaptively with the alternating direction method of multipliers (ADMM). SWMVC learns the “local” differences in view weights across sample points. From those learned local differences, it also reveals the “global” differences in how much each view contributes to the cluster structure, which makes the method more flexible. Experiments on multiple datasets show that SWMVC achieves good clustering performance on data with heterogeneous views.
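The abstract does not state SWMVC's objective function or its ADMM update rules. The following is therefore only a minimal sketch of the general idea of per-sample view weights, built on an assumed objective. Each sample i gets a weight w[i][v] on each view v, the weights lie on the simplex, and the clustering cost is the sum of w[i][v]**p times the squared distance to the cluster centroid in that view. The weights are updated with the closed-form Lagrangian solution used by weighted k-means variants, in place of the ADMM solver the paper uses. The exponent `p`, the farthest-first seeding, and all function names (`sample_weighted_mvkmeans`, `global_view_weights`, etc.) are hypothetical illustrations, not the authors' code. Averaging the learned "local" weights over samples gives one "global" score per view, which mirrors the local-to-global idea described above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(views, k):
    """Hypothetical deterministic seeding: start from sample 0, then repeatedly add
    the sample whose distance (summed over all views) to its nearest seed is largest."""
    n = len(views[0])

    def dist(i, j):
        return sum(sq_dist(view[i], view[j]) for view in views)

    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(dist(i, s) for s in seeds)))
    return seeds


def sample_weighted_mvkmeans(views, k, p=2.0, iters=20, eps=1e-12):
    """Toy sample-weighted multi-view k-means under an ASSUMED objective (not SWMVC's):
        min  sum_i sum_v w[i][v]**p * ||x_i^(v) - c_{y_i}^(v)||^2
        s.t. sum_v w[i][v] = 1, w[i][v] >= 0, p > 1.
    views[v][i] is the feature vector of sample i in view v.
    Returns (labels, weights), where weights[i][v] is sample i's weight on view v.
    """
    if p <= 1:
        raise ValueError("p must be > 1")
    n_views, n = len(views), len(views[0])
    seeds = farthest_first_init(views, k)
    # centroids[v][c]: centroid of cluster c in view v
    centroids = [[[float(x) for x in views[v][s]] for s in seeds] for v in range(n_views)]
    weights = [[1.0 / n_views] * n_views for _ in range(n)]  # uniform start
    labels = [0] * n
    for _ in range(iters):
        # 1) Assign each sample to the cluster with the smallest weighted cost.
        for i in range(n):
            costs = [sum(weights[i][v] ** p * sq_dist(views[v][i], centroids[v][c])
                         for v in range(n_views)) for c in range(k)]
            labels[i] = min(range(k), key=costs.__getitem__)
        # 2) Per view, each centroid is the mean of its members weighted by w**p.
        for v in range(n_views):
            dim = len(views[v][0])
            for c in range(k):
                members = [i for i in range(n) if labels[i] == c]
                total = sum(weights[i][v] ** p for i in members)
                if total > 0:  # an empty cluster keeps its previous centroid
                    centroids[v][c] = [
                        sum(weights[i][v] ** p * views[v][i][d] for i in members) / total
                        for d in range(dim)]
        # 3) Closed-form weights from the Lagrangian: w[i][v] proportional to d_iv**(1/(1-p)).
        for i in range(n):
            dists = [sq_dist(views[v][i], centroids[v][labels[i]]) + eps
                     for v in range(n_views)]
            raw = [d ** (1.0 / (1.0 - p)) for d in dists]
            total = sum(raw)
            weights[i] = [r / total for r in raw]
    return labels, weights


def global_view_weights(weights):
    """Average the per-sample ('local') weights into one 'global' weight per view."""
    n = len(weights)
    return [sum(row[v] for row in weights) / n for v in range(len(weights[0]))]


# Toy data: view 0 separates two groups cleanly; view 1 is uninformative noise.
view0 = [[0, 0], [0, 0.2], [0.2, 0], [0.2, 0.2], [0.1, 0.1],
         [10, 10], [10, 10.2], [10.2, 10], [10.2, 10.2], [10.1, 10.1]]
view1 = [[0], [6], [1], [5], [2], [5], [0], [6], [1], [4]]
labels, weights = sample_weighted_mvkmeans([view0, view1], k=2)
print(labels)
print(global_view_weights(weights))
```

On this toy data the informative view should end up with the larger averaged weight. Under this assumed, unregularized objective, a few samples can still lean on the noise view when their noise value happens to sit near the centroid. As `p` approaches 1, the weight update approaches a hard choice of a single view per sample. The paper's ADMM formulation presumably handles these trade-offs differently.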
-
Keywords:
- data mining
- multi-view
- clustering
- K-means
- sample weights