Abstract:
In many cases, outliers are more important than normal data: they may indicate deviant behavior or the beginning of a new pattern, and they may cause damage to users. Outlier detection has become an important branch of data mining. In this paper, a new generalized method for measuring the difference between two objects with mixed attributes is presented, and the weighted power mean is introduced to data mining. Based on these, a new outlier detection approach based on the nearest neighborhood is proposed. The approach measures the outlier degree of an object by the generalized local outlier factor (GLOF) and detects outliers by the rule of “Bσ”; it requires neither a threshold nor prior knowledge of the number of outliers in the dataset. GLOF generalizes LOF (local outlier factor) and COF (connectivity-based outlier factor). Theoretical analysis reveals some interesting properties of GLOF. The experimental results show that: (1) the proposed definition of the difference between two objects can be applied to datasets with mixed attributes; (2) in some cases, GLOF measures local outliers more accurately than LOF, CBLOF, and RNN do; (3) the rule of “Bσ” is simple and promising in practice.
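The abstract does not state the paper's formulas, so the following is only a minimal sketch of how the pieces could fit together, under loudly stated assumptions. The mixed-attribute difference is a hypothetical Gower-style choice (range-normalised numeric differences, 0/1 mismatch for categorical values) combined with a weighted power mean. The score is standard LOF, not GLOF, which is not defined here. The “Bσ” rule is assumed to mean flagging objects whose score exceeds mean + B·σ. All function names and the values of k, p, and B are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Union

Value = Union[float, str]
Record = Sequence[Value]


def weighted_power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted power mean (sum(w_i * x_i**p) / sum(w_i)) ** (1/p); this sketch requires p > 0."""
    if p <= 0:
        raise ValueError("this sketch only handles p > 0")
    total = sum(weights)
    return (sum(w * x ** p for x, w in zip(values, weights)) / total) ** (1.0 / p)


def numeric_ranges(records: Sequence[Record]) -> List[float]:
    """Range (max - min) of each numeric attribute; categorical attributes get an unused 0.0."""
    ranges = []
    for j in range(len(records[0])):
        if isinstance(records[0][j], str):
            ranges.append(0.0)  # categorical: range is never used
        else:
            col = [float(r[j]) for r in records]
            ranges.append(max(col) - min(col) or 1.0)  # avoid division by zero
    return ranges


def mixed_difference(a: Record, b: Record, ranges: Sequence[float],
                     weights: Sequence[float], p: float) -> float:
    """Hypothetical mixed-attribute difference: per-attribute differences in [0, 1],
    combined with a weighted power mean (an assumption, not the paper's definition)."""
    diffs = []
    for x, y, rng in zip(a, b, ranges):
        if isinstance(x, str):
            diffs.append(0.0 if x == y else 1.0)  # categorical mismatch
        else:
            diffs.append(abs(float(x) - float(y)) / rng)  # range-normalised numeric gap
    return weighted_power_mean(diffs, weights, p)


def lof_scores(records: Sequence[Record], k: int,
               weights: Optional[Sequence[float]] = None, p: float = 1.0) -> List[float]:
    """Standard LOF computed on top of mixed_difference (GLOF itself is not reproduced)."""
    n = len(records)
    weights = weights or [1.0] * len(records[0])
    ranges = numeric_ranges(records)
    dist = [[mixed_difference(records[i], records[j], ranges, weights, p) for j in range(n)]
            for i in range(n)]
    # k-distance and k-distance neighbourhood (ties at the k-distance are included)
    kdist, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kdist.append(others[k - 1])
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kdist[i]])
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))  # guard against duplicates
    # LOF: average ratio of the neighbours' density to the object's own density
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def bsigma_outliers(scores: Sequence[float], b: float) -> List[int]:
    """Assumed reading of the "B-sigma" rule: flag scores above mean + b * population std."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mean + b * std]


if __name__ == "__main__":
    data = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (3.0, "a"), (20.0, "b")]
    scores = lof_scores(data, k=2)
    print([round(s, 2) for s in scores])   # [1.0, 1.0, 1.0, 1.0, 25.0]
    print(bsigma_outliers(scores, b=1.5))  # [4]
```

With only n = 5 objects, the largest z-score any single value can reach is (n − 1)/√n ≈ 1.79, so B = 1.5 is used in this toy run; B = 2 would flag nothing on such a small set, which suggests the choice of B interacts with dataset size.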