Abstract:
Mining rich semantic information hidden in heterogeneous information network is an important task in data mining. The value, data distribution and generation mechanism of outliers are all different from that of normal data. It is of great significance of analyzing its generation mechanism or even eliminating outliers. Outlier detection in homogeneous information network has been studied and explored for a long time. However, few of them are aiming at dynamic outlier detection in heterogeneous networks. Many issues need to be settled. Due to the dynamics of the heterogeneous information network, normal data may become outliers over time. This paper proposes a dynamic tensor representation based outlier detection method, called TRBOutlier. It constructs tensor index tree according to the high order data represented by tensor. The features are added to direct item set and indirect item set respectively when searching the tensor index tree. Meanwhile, we describe a clustering method based on the correlation of short texts to judge whether the objects in datasets change their original clusters and then detect outliers dynamically. This model can keep the semantic relationship in heterogeneous networks as much as possible in the case of fully reducing the time and space complexity. The experimental results show that our proposed method can detect outliers dynamically in heterogeneous information network effectively and efficiently.