Abstract:
With applications in credit card fraud detection and many other fields, the classification of concept-drifting data streams is attracting growing attention. Most existing algorithms assume that the true labels of test instances become available immediately after classification, and use those labels to detect concept drift and adjust the current model. This assumption is impractical in real-world settings, because manually labeling instances that arrive continuously at high speed requires considerable time and effort. To address this problem, this paper proposes KnnM-IB, a concept drift detection method based on the KNNModel algorithm and the incremental Bayes algorithm. The proposed method retains the strengths of the KNNModel algorithm when classifying instances covered by the model clusters, and uses the incremental Bayes algorithm to handle confused instances and to update the model. KnnM-IB detects concept drift on data streams from changes in the window size together with a small number of labeled instances, the most informative ones, selected by active learning. When the underlying concept of the data stream is stable, semi-supervised learning is also used to increase the number of labeled instances available for updating the model. Experimental results show that, compared with traditional classification algorithms, the proposed method not only adapts to concept drift but also achieves comparable or better classification accuracy.
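
To make two of the ingredients concrete, the following is a minimal illustrative sketch, not the KnnM-IB implementation. It shows a naive Bayes classifier updated one labeled instance at a time, and a selection of the "most informative" instances in a window. The abstract does not specify the feature types, the smoothing, or the informativeness criterion. The sketch therefore assumes categorical features, Laplace smoothing, and a smallest-posterior-margin criterion. The names `IncrementalNB`, `margin`, and `most_informative` are hypothetical.

```python
import math
from collections import defaultdict


class IncrementalNB:
    """Categorical naive Bayes whose counts are updated one instance at a time.

    Assumption: categorical features and Laplace smoothing (alpha); the
    abstract does not state which Bayes variant the paper uses.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)    # class -> count
        self.feature_counts = defaultdict(int)  # (class, index, value) -> count
        self.values = defaultdict(set)          # index -> distinct values seen

    def update(self, x, y):
        """Absorb one labeled instance (x is a tuple of feature values)."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1
            self.values[i].add(v)

    def posterior(self, x):
        """Return {class: P(class | x)} under the naive Bayes assumption."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for i, v in enumerate(x):
                # Number of distinct values of feature i, counting v itself.
                k = len(self.values.get(i, set()) | {v})
                count = self.feature_counts.get((c, i, v), 0)
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            log_scores[c] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}


def margin(model, x):
    """Gap between the two largest posteriors; a small gap means high uncertainty."""
    probs = sorted(model.posterior(x).values(), reverse=True)
    return probs[0] - probs[1] if len(probs) > 1 else probs[0]


def most_informative(model, window, budget):
    """Pick the `budget` instances in the window with the smallest margin.

    Assumption: margin-based uncertainty stands in for the paper's
    (unspecified here) active-learning criterion.
    """
    return sorted(window, key=lambda x: margin(model, x))[:budget]
```

In a stream setting, only the instances returned by `most_informative` would be sent for manual labeling and then passed to `update`, which keeps the labeling cost per window bounded by `budget`.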