Citation: Cheng Yudong, Zhou Fang. Semi-Supervised Learning-Based Method for Unknown Anomaly Detection[J]. Journal of Computer Research and Development, 2024, 61(7): 1670-1680. DOI: 10.7544/issn1000-1239.202330627
Anomaly detection aims to identify data that deviate from expected behavior patterns. Semi-supervised anomaly detection methods can improve detection accuracy by using a small amount of labeled data as prior knowledge, but the labeled anomalies (i.e., seen anomalies) are unlikely to cover all types of anomalies. In real-world scenarios, novel types of anomalies (i.e., unseen anomalies) often emerge; because their characteristics may differ from those of the seen anomalies, they are difficult to detect with existing semi-supervised anomaly detection methods. To address this issue, we propose a semi-supervised unknown anomaly detection (SSUAD) method that identifies seen and unseen anomalies simultaneously. The method uses a closed-set classifier to classify seen anomalies and normal instances, and an unknown anomaly detector to detect unseen anomalies. Moreover, because anomalies and normal instances are extremely imbalanced in anomaly detection scenarios, we design a data augmentation strategy to increase the number of anomaly samples. Experiments are conducted on the UNSW-NB15 and KDDCUP99 datasets and on a real-world dataset, SQB. The results show that, compared with existing anomaly detection methods, SSUAD achieves significant improvements in AUC-ROC and AUC-PR, verifying the effectiveness and soundness of the proposed method.