ISSN 1000-1239 CN 11-1777/TP

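Journal of Computer Research and Development, 2019, Vol. 56, Issue (8): 1677-1685. doi: 10.7544/issn1000-1239.2019.20190150

Special topic: 2019 Special Issue on Frontier Advances in Artificial Intelligence

Section: Artificial Intelligence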

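Sample-Weighted Multi-View Clustering

Hong Min1,2, Jia Caiyan1,2, Li Yafang3, Yu Jian1,2

  1 Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044; 2 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; 3 Faculty of Information Technology, Beijing University of Technology, Beijing 100124 (16120372@bjtu.edu.cn)
  • Published: 2019-08-01
  • Funding: National Natural Science Foundation of China (61876016, 61632004); Fundamental Research Funds for the Central Universities (2018JBZ006)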

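Abstract: In the era of big data, the human ability to collect, store, transmit and manage data keeps improving, and many industries have accumulated large data resources that are often multi-source and heterogeneous. How to cluster such multi-source data effectively (a task known as multi-view clustering) has become one of the focal points of current machine learning research. Existing multi-view clustering algorithms mainly consider, from a “global” perspective, how different views and features contribute to the cluster structure, and ignore the differences in “local” information among individual samples. This paper therefore proposes a new sample-weighted multi-view clustering algorithm (SWMVC). SWMVC assigns a weight to each view of every sample and learns these sample weights adaptively with the alternating direction method of multipliers (ADMM). The method can thus learn the “local” differences in view weights across samples, and the learned “local” differences in turn reflect the “global” differences in how much each view contributes to the cluster structure, which gives the method good flexibility. Experiments on multiple datasets show that SWMVC achieves good clustering performance on data with heterogeneous views.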

Key words: data mining, multi-view, clustering, K-means, sample weights
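The abstract does not state the objective function, so the following is only a minimal sketch of the general idea under our own assumptions, not the SWMVC formulation from the paper. It assumes a hypothetical objective J = Σ_i Σ_v w_iv ‖x_i^v − c_{y_i}^v‖² + γ Σ_i ‖w_i‖², where each sample's view weights w_i lie on the probability simplex, and minimizes it by alternating cluster assignment, weighted centroid updates and a per-sample weight step. The weight step is a textbook ADMM splitting of a simplex-constrained quadratic program (penalty `rho`). All names (`swmvc_sketch`, `admm_simplex_weights`, `gamma`, `rho`) are hypothetical, and the “global” view weights are approximated by simply averaging the per-sample weights.

```python
import random

# Hypothetical sketch: the objective, regularizer and update rules are
# assumptions for illustration, not the paper's SWMVC algorithm.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def admm_simplex_weights(d, gamma=1.0, rho=1.0, iters=200):
    """Assumed per-sample weight step, solved with ADMM:
        min_w  sum_v w_v * d_v + gamma * ||w||^2   s.t.  w >= 0, sum(w) = 1
    The splitting w = z keeps the affine constraint on w and nonnegativity on z."""
    V = len(d)
    z = [1.0 / V] * V
    u = [0.0] * V  # scaled dual variable
    for _ in range(iters):
        # w-step: closed-form minimiser of the augmented Lagrangian on sum(w) = 1
        a = [(rho * (z[v] - u[v]) - d[v]) / (2 * gamma + rho) for v in range(V)]
        shift = (sum(a) - 1.0) / V
        w = [a[v] - shift for v in range(V)]
        # z-step: projection of w + u onto the nonnegative orthant
        z = [max(0.0, w[v] + u[v]) for v in range(V)]
        # dual update on the consensus constraint w = z
        u = [u[v] + w[v] - z[v] for v in range(V)]
    s = sum(z)
    return [zv / s for zv in z] if s > 0 else [1.0 / V] * V


def swmvc_sketch(views, k, gamma=1.0, iters=20, init=None, seed=0):
    """views[v][i] is the feature vector of sample i in view v.
    Returns cluster labels, per-sample view weights W[i][v] ("local"),
    and their per-view averages ("global")."""
    V, n = len(views), len(views[0])
    W = [[1.0 / V] * V for _ in range(n)]  # start from equal view weights
    if init is None:
        init = random.Random(seed).sample(range(n), k)
    C = [[list(views[v][i]) for i in init] for v in range(V)]  # C[v][c]: centroid c in view v
    labels = [0] * n
    for _ in range(iters):
        # 1) assignment: each sample picks the cluster with the smallest
        #    view-weighted distance, using its own weights W[i]
        for i in range(n):
            labels[i] = min(range(k), key=lambda c: sum(
                W[i][v] * sq_dist(views[v][i], C[v][c]) for v in range(V)))
        # 2) centroid update: per-view mean of members, weighted by W[i][v]
        for v in range(V):
            dim = len(views[v][0])
            for c in range(k):
                members = [i for i in range(n) if labels[i] == c]
                total = sum(W[i][v] for i in members)
                if total > 0:  # otherwise keep the previous centroid
                    C[v][c] = [sum(W[i][v] * views[v][i][j] for i in members) / total
                               for j in range(dim)]
        # 3) weight update: one small ADMM problem per sample
        for i in range(n):
            d = [sq_dist(views[v][i], C[v][labels[i]]) for v in range(V)]
            W[i] = admm_simplex_weights(d, gamma)
    global_w = [sum(W[i][v] for i in range(n)) / n for v in range(V)]
    return labels, W, global_w


# Toy data: view 0 separates {0, 1, 2} from {3, 4, 5}; view 1 does not.
views_demo = [
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]],
    [[0], [5], [10], [0], [5], [10]],
]
labels, W, global_w = swmvc_sketch(views_demo, k=2, init=[0, 3])
print(labels, [round(g, 2) for g in global_w])
```

On this toy data most samples end up with nearly all of their weight on the informative view 0, while the samples that happen to sit on the view-1 centroid (samples 1 and 4) keep a larger view-1 weight. This is the kind of per-sample (“local”) variation the abstract refers to, and averaging it gives a larger “global” weight for view 0. A quadratic regularizer was chosen here because it lets weights reach exactly zero; an entropy regularizer would give a closed-form softmax update instead and make ADMM unnecessary.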
