ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (8): 1729-1739. doi: 10.7544/issn1000-1239.2016.20160178

Topic collection: 2016 Special Topic on Frontier Technologies in Data Mining

• Artificial Intelligence •

Tensor Representation Based Dynamic Outlier Detection Method in Heterogeneous Network

Liu Lu1, Zuo Wanli1,2, Peng Tao1,2

  1. 1(College of Computer Science and Technology, Jilin University, Changchun 130012); 2(Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012) (liulu12@mails.jlu.edu.cn)
  • Online: 2016-08-01
  • Supported by: 
    National Natural Science Foundation of China (60903098); Industrial Technology Research and Development Project of Jilin Province (JF2012c016-2); Graduate Innovation Fund of Jilin University (2015040)

Abstract: Mining the rich semantic information hidden in heterogeneous information networks is an important task in data mining. Outliers differ markedly from normal data objects in value, data distribution, and generation mechanism. Detecting outliers, analyzing their generation mechanisms, and ultimately eliminating them is of great practical significance. Outlier detection in homogeneous information networks has been studied for a long time, but few studies address dynamic outlier detection in heterogeneous networks, and many issues remain unresolved. Because heterogeneous information networks are dynamic, normal data objects may become outliers over time. This paper proposes a tensor representation based dynamic outlier detection method for heterogeneous networks, called TRBOutlier. It constructs a tensor index tree from the high-order data represented by tensors. While searching the tensor index tree, features are added to a direct item set and an indirect item set, respectively. Meanwhile, a clustering method based on short-text correlation judges whether data objects deviate from their original clusters, so that outliers are detected dynamically. The model preserves the semantic information in heterogeneous networks as far as possible while substantially reducing time and space complexity. Experimental results show that the proposed method detects outliers in heterogeneous information networks dynamically, effectively, and efficiently.

Key words: dynamic outlier detection, heterogeneous information network, tensor representation, tensor index tree, clustering
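
The abstract's idea of flagging objects that leave their original cluster over time can be illustrated with a minimal sketch. This is not the TRBOutlier algorithm: the abstract does not specify the tensor index tree, the direct and indirect item sets, or the short-text correlation measure, so none of them is modeled here. The sketch assumes a stack of per-time-step feature vectors (a simple object × feature × time layout), fixed cluster centroids, and cosine similarity as a stand-in correlation measure. All function names and the toy data are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cluster(vec, centroids):
    """Return the label of the centroid most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def drifting_objects(snapshots, centroids):
    """Flag objects that leave the cluster they belonged to at time step 0.

    snapshots: list of {object_id: feature_vector}, one dict per time step
               (assumed layout, standing in for the tensor's time mode).
    centroids: {cluster_label: feature_vector}, held fixed for simplicity.
    Returns {object_id: first time step at which the object changed cluster}.
    """
    original = {obj: nearest_cluster(vec, centroids)
                for obj, vec in snapshots[0].items()}
    flagged = {}
    for t, snap in enumerate(snapshots[1:], start=1):
        for obj, vec in snap.items():
            # Skip objects already flagged or not present at time step 0.
            if obj in flagged or obj not in original:
                continue
            if nearest_cluster(vec, centroids) != original[obj]:
                flagged[obj] = t
    return flagged


# Hypothetical toy data: two clusters, two objects, three time steps.
centroids = {"db": [1.0, 0.0], "ml": [0.0, 1.0]}
snapshots = [
    {"a1": [0.9, 0.1], "a2": [0.2, 0.8]},
    {"a1": [0.8, 0.2], "a2": [0.3, 0.7]},
    {"a1": [0.1, 0.9], "a2": [0.2, 0.8]},
]
print(drifting_objects(snapshots, centroids))  # {'a1': 2}
```

In this toy run, object `a1` starts in cluster `db` and moves to `ml` at time step 2, so it is the only object flagged. Holding the centroids fixed keeps the sketch short; a real dynamic setting would presumably update the clusters as the network evolves.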

CLC Number: