    Yang Xinxin, Huang Shaobin. A Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data[J]. Journal of Computer Research and Development, 2015, 52(1): 200-210. DOI: 10.7544/issn1000-1239.2015.20130493

    A Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data

    Abstract: High-order heterogeneous data, in which objects are described by multiple feature spaces drawn from heterogeneous domains, are increasingly common in real-world applications. Because high-order co-clustering algorithms can fuse information from multiple feature spaces to improve clustering quality, they have recently become an active research topic. Most existing high-order co-clustering algorithms, however, are non-hierarchical, whereas high-order heterogeneous data often contain hidden hierarchical cluster structures. To mine these hidden hierarchical patterns more effectively, we propose a high-order hierarchical co-clustering algorithm (HHCC). HHCC uses Goodman-Kruskal τ, an index of association between categorical variables, to measure the association between object variables and feature variables: strongly associated objects are placed in the same object cluster and, simultaneously, strongly associated features are placed in the same feature cluster. HHCC adopts a top-down hierarchical strategy. At each level it uses Goodman-Kruskal τ to evaluate the quality of the object and feature clusterings and optimizes τ with a local search, which determines the number of clusters automatically and yields the clustering result for that level; the levels together form a tree-like cluster structure. Experimental results show that HHCC outperforms four classical homogeneous hierarchical clustering algorithms and five existing non-hierarchical high-order co-clustering algorithms.
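The abstract names the main ingredients (Goodman-Kruskal τ as the quality index, a local search that can open or empty clusters so their number is not fixed in advance, and simultaneous object and feature clustering) without giving details. The Python/NumPy sketch below illustrates those ingredients for a single object-feature matrix, assuming a τ-CoClust-style formulation in which each co-clustering is summarized as an object-cluster by feature-cluster contingency table. It is not the authors' HHCC implementation: the function names (`goodman_kruskal_tau`, `cocluster_table`, `local_search_pass`, `co_cluster`), the greedy "move one object to its best cluster or a new one" rule, and the explicit initial labels are all hypothetical, and the sketch neither combines several feature spaces nor builds the top-down hierarchy.

```python
import numpy as np


def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the row category from the column category.

    tau = (sum_{k,l} p_kl^2 / p_.l - sum_k p_k.^2) / (1 - sum_k p_k.^2),
    the proportional reduction in prediction error once the column is known
    (0 = no association, 1 = perfect association).
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row = p.sum(axis=1)                  # row marginals p_k.
    col = p.sum(axis=0)                  # column marginals p_.l
    base = 1.0 - np.sum(row ** 2)        # expected error of a proportional guess without the column
    if base <= 1e-12:                    # a single populated row: nothing to predict
        return 0.0
    nz = col > 0                         # empty column clusters contribute nothing
    explained = np.sum(p[:, nz] ** 2 / col[nz]) - np.sum(row ** 2)
    return float(explained / base)


def cocluster_table(data, row_labels, col_labels, n_row, n_col):
    """Sum an objects x features matrix into an object-cluster x feature-cluster table."""
    rows = np.eye(n_row)[row_labels]     # one-hot object memberships, (n_objects, n_row)
    cols = np.eye(n_col)[col_labels]     # one-hot feature memberships, (n_features, n_col)
    return rows.T @ data @ cols


def relabel(labels):
    """Renumber cluster labels as 0..K-1, dropping clusters that became empty."""
    return np.unique(labels, return_inverse=True)[1]


def local_search_pass(data, labels, other_labels):
    """Hypothetical greedy pass: move each row of `data` to the cluster (an existing
    one or a new singleton) that most increases tau(row clusters | other clusters)."""
    labels = labels.copy()
    n_other = int(other_labels.max()) + 1
    for i in range(data.shape[0]):
        n_clusters = int(labels.max()) + 1
        original = labels[i]
        best_k = original
        best_tau = goodman_kruskal_tau(
            cocluster_table(data, labels, other_labels, n_clusters + 1, n_other))
        for k in range(n_clusters + 1):  # k == n_clusters opens a new cluster
            if k == original:
                continue
            labels[i] = k
            tau = goodman_kruskal_tau(
                cocluster_table(data, labels, other_labels, n_clusters + 1, n_other))
            if tau > best_tau + 1e-12:   # accept strict improvements only
                best_k, best_tau = k, tau
        labels[i] = best_k
        labels = relabel(labels)
    return labels


def co_cluster(data, row_init, col_init, max_iter=10):
    """Alternate greedy passes over objects and features until labels stop changing."""
    rows, cols = relabel(np.asarray(row_init)), relabel(np.asarray(col_init))
    for _ in range(max_iter):
        new_rows = local_search_pass(data, rows, cols)        # objects: tau(objects | features)
        new_cols = local_search_pass(data.T, cols, new_rows)  # features: tau(features | objects)
        if np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols):
            break
        rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    # Toy block-diagonal data: objects 0-1 use features 0-1, objects 2-3 use features 2-3.
    data = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)
    rows, cols = co_cluster(data, row_init=[0, 1, 0, 1], col_init=[0, 0, 1, 1])
    print("object clusters:", rows, "feature clusters:", cols)
```

On this toy matrix the alternating search starts from a mixed object partition (τ = 0) and ends with objects {0, 1} and {2, 3} in separate clusters, matching the two diagonal blocks. The greedy moves only accept strict improvements, so they can stall when the starting partition carries no association at all (for example, all features in one cluster), which is why the initial labels matter in this sketch. A top-down hierarchy in the spirit of HHCC could plausibly be obtained by re-running such a search on each object cluster's sub-matrix, but how the paper actually splits levels is not described in the abstract.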

       
