Abstract:
State-of-the-art intrusion detection schemes for unknown attacks employ machine learning techniques to identify anomalous features in network traffic data. However, because of insufficient training data, the difficulty of selecting features quantitatively, and the dynamic nature of unknown attacks, existing schemes cannot detect unknown attacks effectively. To address this issue, an intrusion detection scheme based on semi-supervised learning and information gain ratio is proposed. In the training phase, a semi-supervised learning algorithm is used to build a large-scale training set from a small amount of labelled data, which overcomes the shortage of training data. In the detection phase, the information gain ratio is used to quantify the impact of different features, and weighted voting over the decision trees infers the final output label, so that unknown attacks are identified adaptively and quantitatively. This approach not only retains as much feature information as possible, but also adjusts the weight of each decision tree adaptively in response to dynamic attacks. Extensive experiments indicate that the proposed scheme can quantitatively analyze the important network traffic features of unknown attacks and detect them using a small amount of labelled data, with an accuracy of at least 91% and a false negative rate of at most 5%, a clear advantage over existing schemes.
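
The two detection-phase ingredients named in the abstract, information gain ratio and weighted voting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, features are assumed to be discrete, and the per-tree weights are passed in directly because the abstract does not state how they are derived from the gain ratio.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    # Group the labels by the feature value they co-occur with.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information is the entropy of the feature itself.
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight across the trees.

    Assumption: one prediction and one non-negative weight per tree.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

For example, a feature that perfectly separates the labels, such as `gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0])`, yields the maximum ratio, whereas a feature that is independent of the labels yields zero. With `weighted_vote`, a single tree with a large weight can outvote several trees with small weights, which is the behaviour that adaptive reweighting relies on.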