ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue 12: 2673-2682. doi: 10.7544/issn1000-1239.2020.20190691


A Classification Approach Based on Divergence for Network Traffic in Presence of Concept Drift

Cheng Guang (1,2,3), Qian Dexin (1,2,3), Guo Jianwei (4), Shi Haibin (1,2,3), Hua (1,2,3), Zhao Yuyu (1,2,3)

  1. School of Cyber Science and Engineering, Southeast University, Nanjing 211189
  2. Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189
  3. Jiangsu Provincial Key Laboratory of Computer Network Technology (Southeast University), Nanjing 211189
  4. Xi’an Research Institute, Huawei Technologies Co., Ltd., Xi’an 710075
  • Online: 2020-12-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB1800602, 2017YFB0801703), the Ministry of Education-China Mobile Research Fund Project (MCM20180506), the National Natural Science Foundation of China (61602114), and the CERNET Innovation Project (NGIICS20190101, NGII20170406).

Abstract: Because network traffic is highly dynamic, bursty, and irreversible, its statistical characteristics and distribution may change over time, which causes concept drift for flow-based machine learning methods. Concept drift makes a classification model trained on the original data set perform worse on new samples, so classification accuracy decreases. To address this problem, this paper proposes ECDD (ensemble classification based on divergence detection), a divergence-based approach to classifying network traffic in the presence of concept drift. The method uses a double-layer window mechanism to track concept drift. From the perspective of information entropy, it uses the Jensen-Shannon divergence to measure the difference in data distribution between the old and new windows, so that concept drift can be detected effectively. Drawing on the idea of incremental ensemble learning, the method trains a new classifier on the drifted traffic while retaining previously trained classifiers, and uses the classifier weights to replace the original classifier whose performance has degraded, so that the ensemble classifier is updated effectively. For common network application traffic, this paper constructs a concept drift data set according to the feature distributions of different applications. Experimental comparisons with common concept drift detection methods show that the method can effectively detect concept drift and update the classifier, yielding better classification performance.
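
The abstract names two mechanisms: a Jensen-Shannon (JS) divergence test between an old and a new window, and a weight-based replacement of degraded ensemble members. It does not specify ECDD's binning, window sizes, divergence threshold, or weighting scheme. The Python sketch below therefore illustrates both ideas only under stated assumptions: a single old/new window pair instead of the full double-layer mechanism, one numeric flow feature discretized into equal-width bins, and member weights taken to be accuracy on the newest window. The function names and the values bins=10 and threshold=0.1 are hypothetical. This is not the authors' implementation.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram of one feature over [lo, hi].

    Assumes `values` is non-empty.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi (and rounding noise) stays in range.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute 0 by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def detect_drift(old_window, new_window, bins=10, threshold=0.1):
    """Flag drift when JS(old, new) exceeds a hypothetical threshold."""
    # A shared range gives both histograms identical bin edges.
    lo = min(min(old_window), min(new_window))
    hi = max(max(old_window), max(new_window))
    if hi == lo:  # all values identical: no distribution change to measure
        return False, 0.0
    p = histogram(old_window, bins, lo, hi)
    q = histogram(new_window, bins, lo, hi)
    d = js_divergence(p, q)
    return d > threshold, d


def replace_weakest(ensemble, weights, new_clf, max_size):
    """Add new_clf; if the pool is full, evict the lowest-weight member.

    `weights` is parallel to `ensemble` (assumed here to be each member's
    accuracy on the newest window).
    """
    if len(ensemble) < max_size:
        return ensemble + [new_clf]
    worst = min(range(len(ensemble)), key=lambda i: weights[i])
    return ensemble[:worst] + [new_clf] + ensemble[worst + 1:]


if __name__ == "__main__":
    drift, d = detect_drift([1.0] * 100, [9.0] * 100)
    print(drift, d)  # True 1.0
```

Base-2 logarithms keep the JS divergence in [0, 1], which makes a fixed threshold easier to interpret. With natural logarithms the upper bound would be ln 2.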

Key words: concept drift, machine learning, Jensen-Shannon divergence, incremental ensemble learning, traffic classification
