
    An Algorithm for Clustering of Outliers Based on Key Attribute Subspace

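    • Summary: Outlier discovery and analysis is an important part of data mining. Existing outlier mining algorithms concentrate on how to detect outlying objects and lack effective methods for explaining and analyzing the outlier sets they produce. By analyzing the sources and characteristics of outliers and drawing on rough set theory, the concept of outlying partition similarity is defined, and COKAS, an algorithm for clustering outliers based on key attribute subspaces, is proposed. The algorithm reveals the subspace characteristics of the outliers, yields extended knowledge about them, and aids understanding of the data set as a whole. Experimental results on two multi-dimensional data sets show that the algorithm has good adaptability and effectiveness.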

       

      Abstract: Discovering and analyzing outlying observations is an important part of data mining. Outliers may contain crucial information, so in some applications detecting them matters more than detecting general patterns; examples include credit card fraud in finance, calling fraud in telecommunications, network intrusion, and disease diagnosis. Existing outlier mining algorithms focus on detecting and identifying outliers, but the study of outliers covers both mining them and analyzing why they are exceptional, and research on explaining and analyzing outliers currently lags behind outlier mining technology. A full analysis of outliers inevitably requires a great deal of knowledge from the application domain. However, some further insight into outliers can be obtained by studying the distribution characteristics of the data set in attribute space. By analyzing the origin and features of outliers and applying rough set theory, a concept of outlying partition similarity is defined, and an algorithm for clustering outliers based on key attribute subspace (COKAS) is proposed. The approach provides extended knowledge about the identified outliers and improves understanding of the whole data set. Experimental results on real multi-dimensional data sets show that the algorithm is scalable and efficient.

       
