Abstract:
With the rapid growth of data, data mining has become increasingly important. Outlier detection is an important data mining technique that finds exceptional objects deviating from the rest of the data set. There are two kinds of outliers: global outliers and local outliers. In many scenarios, detecting local outliers is more valuable than detecting global ones. The LOF algorithm is a well-known local outlier detection algorithm that assigns each object an outlier-degree value. However, when this value is calculated, LOF treats all attributes equally. In fact, different attributes have different effects, and the attributes with larger effects are known as outlier attributes. In this paper, we propose a density-based local outlier detection algorithm (DLOF) that derives the outlier attributes of each data object using information entropy. A weighted distance is introduced to calculate the distance between two data objects, in which the outlier attributes are assigned larger weights; this improves outlier detection accuracy. In addition, we present two improvements to the calculation of the local outlier factors, together with their time complexity analysis. Theoretical analysis and experimental results show that DLOF is efficient and effective.
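
As an illustration of the two ingredients named above, the following minimal sketch computes the Shannon entropy of a single attribute and a weighted Euclidean distance between two objects. It is not the DLOF procedure itself: the abstract does not state how entropy is mapped to per-object weights, so the weights are passed in directly, and the function names, the equal-width binning, and the choice of Euclidean form are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=4):
    """Shannon entropy (in bits) of one attribute after equal-width binning.

    The binning scheme is an assumption of this sketch, not taken from the paper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance between objects p and q.

    Attributes with larger weights (hypothetically, the outlier attributes)
    contribute more to the distance.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))
```

With equal weights the function reduces to the ordinary Euclidean distance, which is the equal treatment of attributes attributed to LOF above; raising the weight of one attribute stretches the distance along that attribute only.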