Abstract:
Outlier mining is an important branch of data mining. It has been widely applied in many fields, including industrial and financial applications such as intrusion detection systems (IDS) and credit card fraud detection. Handling massive, high-dimensional data has become a major challenge for outlier mining algorithms. Based on the definitions of density and grid, a fast incremental outlier mining algorithm is proposed. It introduces a seven-tuple information grid to reduce the volume and dimensionality of the data, and uses incremental updates to reduce memory requirements. Dense grids, sparse grids, and neighbor grids are defined so that computation can be carried out conveniently at the grid level. By filtering out the bulk of the data through appropriately chosen representative points, an approximate method is adopted that reduces computation and lowers the complexity of the algorithm. Experiments are performed on different initial and incremental datasets, and the results are reported in terms of detection rate, false alarm rate, precision, and average running time. Tests on real and simulated datasets show that the proposed algorithm maintains the same accuracy as the LOF algorithm while significantly improving execution efficiency.