ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (6): 1325-1337. doi: 10.7544/issn1000-1239.2019.20180737

• Artificial Intelligence •

MD fuzzy k-modes Clustering Algorithm Based on Categorical Matrix-Object Data


  ¹School of Mathematical Sciences, Shanxi University, Taiyuan 030006; ²School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  • Published: 2019-06-01

An MD fuzzy k-modes Algorithm for Clustering Categorical Matrix-Object Data

Li Shunyong¹, Zhang Miaomiao¹, Cao Fuyuan²

  ¹School of Mathematical Sciences, Shanxi University, Taiyuan 030006; ²School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  • Online: 2019-06-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61573229), the Shanxi Provincial Basic Research Foundation of China (201701D121004), the Shanxi Scholarship Council of China (2017-020), and the Shanxi Provincial Teaching Reform and Innovation Program in Higher Education (J2017002).

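Abstract: Traditional clustering algorithms generally operate on data with single-valued attributes. In many practical applications, however, each object is described by several feature vectors; for example, a customer may buy several products during one shopping trip. An object described by multiple feature vectors is called a matrix object, and a data set made up of matrix objects is called a matrix-object data set. Research on clustering algorithms for matrix-object data is still relatively scarce, and many problems remain open. Building on the clustering process of the fuzzy k-modes algorithm, this paper proposes the matrix-object data fuzzy k-modes (MD fuzzy k-modes) clustering algorithm. Drawing on the concept of fuzzy sets, the algorithm introduces a fuzzy factor β, redefines the dissimilarity measure between matrix objects, and gives a heuristic algorithm for updating the cluster centers. Finally, the effectiveness of MD fuzzy k-modes is verified on five real-world data sets, and the relationship between the fuzzy factor β and the membership degree w is analyzed. In the era of big data, clustering customers' multiple records with MD fuzzy k-modes makes it easier to discover their consumption preferences and thus to make more targeted recommendations.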

Key words: matrix-object data, MD fuzzy k-modes algorithm, dissimilarity measure, cluster center, clustering
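To make the notion of a matrix object concrete, the sketch below represents each object (for example, one customer) as a list of categorical records and compares two objects with a set-to-set dissimilarity built from the simple matching distance. The names `simple_matching` and `matrix_object_dissimilarity`, the example attribute values, and the averaging scheme are all hypothetical illustrations; this is not the dissimilarity measure defined in the paper, whose exact form the abstract does not give.

```python
from typing import List, Sequence

# One categorical record: a value for each attribute, e.g. (product, brand, payment).
Record = Sequence[str]
# A matrix-object: all records that describe one object, e.g. one customer's purchases.
MatrixObject = List[Record]


def simple_matching(a: Record, b: Record) -> int:
    """Count the attributes on which two records take different values."""
    return sum(x != y for x, y in zip(a, b))


def matrix_object_dissimilarity(x: MatrixObject, y: MatrixObject) -> float:
    """Hypothetical dissimilarity between two matrix-objects (not the paper's measure).

    Each record is matched to its closest record in the other object; the matching
    distances are averaged in both directions so the result is symmetric.
    """
    def directed(p: MatrixObject, q: MatrixObject) -> float:
        # Mean over records of p of the distance to the nearest record of q.
        return sum(min(simple_matching(r, s) for s in q) for r in p) / len(p)

    return (directed(x, y) + directed(y, x)) / 2


if __name__ == "__main__":
    customer_a = [("milk", "brandA", "cash"), ("bread", "brandB", "card")]
    customer_b = [("milk", "brandA", "card")]
    print(matrix_object_dissimilarity(customer_a, customer_b))  # 1.25
```

Matching each record to its nearest counterpart makes the result independent of record order and of how many records each object has, which suits shopping-basket style data where one customer's records carry no natural ordering.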

Abstract: Traditional clustering algorithms generally operate on data with single-valued attributes. In many practical applications, however, a data object is described by more than one feature vector; for example, customers may purchase several products in the same shopping trip. An object described by multiple feature vectors is called a matrix object, and such data are called matrix-object data. Research on clustering algorithms for categorical matrix-object data is still scarce, and many issues remain open. In this paper, we propose a new matrix-object data fuzzy k-modes (MD fuzzy k-modes) algorithm that adapts the fuzzy k-modes clustering process to categorical matrix-object data. Based on the concept of fuzzy sets, the proposed algorithm introduces a fuzzy factor β, redefines the dissimilarity measure between two categorical matrix objects, and provides a heuristic algorithm for updating the cluster centers. Finally, the effectiveness of the MD fuzzy k-modes algorithm is verified on five real-world data sets, and the relationship between the fuzzy factor β and the membership degree w is analyzed. In the era of big data, clustering customers' multiple records with the MD fuzzy k-modes algorithm makes it easier to discover their spending habits and preferences and thus to make more targeted recommendations.

Key words: matrix-object data, MD fuzzy k-modes algorithm, dissimilarity measure, cluster centers, clustering
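The membership degree w discussed in the abstract comes from the fuzzy k-modes framework. As a minimal sketch, assuming the standard fuzzy k-modes membership update (as in Huang and Ng's fuzzy k-modes) and a precomputed object-to-center dissimilarity matrix, the update can be written as below. The function name `update_memberships` and the exponent name `alpha` are assumptions; the abstract does not say whether the paper's fuzzy factor β plays this role, and the paper's own heuristic for updating matrix-object cluster centers is not reproduced here.

```python
from typing import List


def update_memberships(dist: List[List[float]], alpha: float = 2.0) -> List[List[float]]:
    """Fuzzy k-modes style membership update (illustrative sketch).

    dist[i][l] is the dissimilarity between object i and cluster center l.
    alpha > 1 is the fuzziness exponent; each returned row sums to 1.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    exponent = 1.0 / (alpha - 1.0)
    memberships: List[List[float]] = []
    for row in dist:
        zero_idx = [l for l, d in enumerate(row) if d == 0]
        if zero_idx:
            # Object coincides with a center: crisp membership in the first such center.
            memberships.append([1.0 if l == zero_idx[0] else 0.0 for l in range(len(row))])
            continue
        # w_il = 1 / sum_h (d_il / d_ih) ** (1 / (alpha - 1))
        memberships.append([1.0 / sum((d_l / d_h) ** exponent for d_h in row) for d_l in row])
    return memberships


if __name__ == "__main__":
    dist = [[1.0, 3.0],   # object 0: closer to center 0
            [0.0, 2.0]]   # object 1: identical to center 0
    print(update_memberships(dist, alpha=2.0))
```

With `alpha=2.0`, an object at distances 1 and 3 from two centers receives memberships 0.75 and 0.25. Larger `alpha` values flatten the memberships toward 1/k, while values close to 1 approach a crisp, k-modes-like assignment.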