ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (8): 1781-1791. doi: 10.7544/issn1000-1239.2016.20160223

Special Issue: 2016 Special Topic on Frontier Technologies in Data Mining

Online Transfer Learning for Mining Recurring Concept in Data Stream Classification

Wen Yimin1,2,3, Tang Shiqi1, Feng Chao1, Gao Kai4

  1(School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004); 2(Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004); 3(Guangxi Experiment Center of Information Science (Guilin University of Electronic Technology), Guilin, Guangxi 541004); 4(School of Information Science & Engineering, Hebei University of Science and Technology, Shijiazhuang 050018)
  • Online: 2016-08-01

Abstract: In the age of big data, data stream classification is applied in many fields, such as spam filtering, market prediction, and weather forecasting, in which recurring concepts are an important characteristic. To reduce the influence of negative transfer and to shorten the lag in detecting concept drift, RC-OTL, an algorithm based on an online transfer learning strategy, is proposed for mining recurring concepts in data streams. When a concept drift is detected, RC-OTL selects one current base classifier to store and then computes the domain similarities between the current training samples and the stored classifiers. It uses these similarities to select the most appropriate source classifier, which is combined with a new classifier to learn the upcoming samples, thereby transferring knowledge from the source domain to the target domain. In addition, RC-OTL can select an appropriate classifier for classification when the current classification accuracy falls below a given threshold, before concept drift is detected. A preliminary theoretical analysis explains why RC-OTL can effectively reduce negative transfer, and the experimental results further show that RC-OTL can effectively improve the cumulative accuracy of data stream classification and adapt faster to samples of a new concept after concept drift occurs.
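The store, select, and combine steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Perceptron`, `window_accuracy`, `RecurringConceptSketch`, `on_drift`, and `beta` are hypothetical. The abstract does not define the domain similarity measure, so the sketch assumes a simple proxy: the accuracy of each stored classifier on recent samples. Drift detection is assumed to be handled by an external detector that calls `on_drift`, and the multiplicative weight update for the combined classifier is likewise an assumption.

```python
class Perceptron:
    """Tiny online linear classifier for labels in {-1, +1} (illustrative only)."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:  # mistake-driven update
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y


def window_accuracy(clf, samples):
    """Assumed similarity proxy: accuracy of a stored classifier on recent samples."""
    if not samples:
        return 0.0
    return sum(clf.predict(x) == y for x, y in samples) / len(samples)


class RecurringConceptSketch:
    """Hypothetical sketch of the store / select / combine loop."""

    def __init__(self, dim, beta=0.5):
        self.dim = dim
        self.beta = beta          # assumed penalty factor for a wrong member
        self.pool = []            # frozen classifiers from earlier concepts
        self.source = None        # classifier selected from the pool, if any
        self.target = Perceptron(dim)
        self.w_src, self.w_tgt = 0.0, 1.0

    def predict(self, x):
        if self.source is None:
            return self.target.predict(x)
        vote = (self.w_src * self.source.predict(x)
                + self.w_tgt * self.target.predict(x))
        return 1 if vote >= 0 else -1

    def learn(self, x, y):
        y_hat = self.predict(x)
        if self.source is not None:
            # Shrink the weight of each member that misclassified this sample,
            # so an unhelpful source classifier quickly loses influence.
            if self.source.predict(x) != y:
                self.w_src *= self.beta
            if self.target.predict(x) != y:
                self.w_tgt *= self.beta
            total = self.w_src + self.w_tgt
            self.w_src, self.w_tgt = self.w_src / total, self.w_tgt / total
        self.target.update(x, y)  # only the new classifier keeps learning
        return y_hat

    def on_drift(self, recent):
        """Called by an external drift detector; `recent` holds new-concept samples."""
        self.pool.append(self.target)  # store the current classifier
        self.source = max(self.pool, key=lambda c: window_accuracy(c, recent))
        self.target = Perceptron(self.dim)
        self.w_src, self.w_tgt = 0.5, 0.5
```

In this sketch the stored classifiers are never updated after they enter the pool, so a concept that recurs can be matched against the classifier that was trained on it earlier. The weights `w_src` and `w_tgt` limit the damage when the selected source classifier turns out to be a poor match, which is one simple way to picture how negative transfer might be reduced.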

Key words: concept drift, transfer learning, recurring concept, online learning, negative transfer

CLC Number: