Citation: Fu Hao, Long Chun, Gong Liangyi, Wei Jinxia, Huang Pan, Lin Yanzhong, Sun Degang. Malicious Domain Detection Technology Based on Semantic Graph Learning[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440375
Malicious domain name detection is a critical component of network intrusion detection systems, enabling network attacks to be identified rapidly from domain name requests. Machine learning methods overcome the limitations of blacklist mechanisms and improve detection accuracy. However, the high variability of domain name structures and the complexity of real-world environments lead to low detection efficiency and poor robustness in practical applications. To address these issues, a malicious domain name detection technology based on domain name semantic graph learning is proposed, which uses semantic graph association analysis for efficient detection. Specifically, 12 months of domain request data from the China Science and Technology Network are first collected, comprising 3.33 billion access records, including more than 6.5 million malicious domain name entries across 284 attack types. Semantic analysis reveals significant differentiation between domain categories, yet considerable feature overlap in certain regions degrades classifier performance. To tackle this, a domain association graph model based on character-level semantic similarity is proposed. By integrating the features of neighboring domains, the model enhances semantic representations in the overlapping regions, thereby improving detection performance. The method filters noise characters through structural similarity analysis, constructs a dynamic domain semantic graph using an online aggregation algorithm, and trains a multi-head attention-based message-passing graph model with node-degree-weighted samples. Finally, a multi-layer neural network classifier is employed for malicious domain detection. Experimental results demonstrate that the proposed method achieves an average precision of 96% and a recall of 97% on a dataset covering different types of malicious domain names. Furthermore, the model exhibits strong online adaptability, achieving a high detection rate and robustness.
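The graph-based idea in the abstract, linking domains by character-level similarity and then enriching each domain's representation with its neighbors' features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the noise filter, or the attention mechanism, so the character-bigram Jaccard similarity, the `threshold` value, the crude TLD stripping, and the single softmax-weighted aggregation step (standing in for multi-head attention message passing) are all assumptions, and every function name is hypothetical.

```python
import math


def char_ngrams(domain, n=2):
    """Set of character n-grams of a domain, with the last label (TLD) dropped.

    Dropping the TLD is only a crude stand-in for the noise-character
    filtering mentioned in the abstract (assumption).
    """
    label = domain.lower().rsplit(".", 1)[0]
    return {label[i:i + n] for i in range(len(label) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(domains, threshold=0.3):
    """Connect two domains when their bigram similarity reaches the threshold.

    Returns an adjacency dict: domain -> {neighbor: similarity}.
    The threshold of 0.3 is an arbitrary illustrative value.
    """
    grams = {d: char_ngrams(d) for d in domains}
    edges = {d: {} for d in domains}
    for i, u in enumerate(domains):
        for v in domains[i + 1:]:
            s = jaccard(grams[u], grams[v])
            if s >= threshold:
                edges[u][v] = s
                edges[v][u] = s
    return edges


def aggregate(features, edges):
    """One round of similarity-weighted neighbor aggregation.

    Each node mixes its own feature vector (self-score 1.0) with those of
    its neighbors, using softmax-normalized similarity scores as weights.
    A learned multi-head attention layer would replace these fixed scores.
    """
    out = {}
    for u, nbrs in edges.items():
        scores = {u: 1.0, **nbrs}
        z = sum(math.exp(s) for s in scores.values())
        agg = [0.0] * len(features[u])
        for v, s in scores.items():
            w = math.exp(s) / z
            for k, x in enumerate(features[v]):
                agg[k] += w * x
        out[u] = agg
    return out


# Usage with made-up domains and made-up 2-dimensional features.
domains = ["paypal-login.com", "paypal-logon.net", "example.org"]
graph = build_graph(domains)
features = {
    "paypal-login.com": [1.0, 0.0],
    "paypal-logon.net": [0.0, 1.0],
    "example.org": [0.5, 0.5],
}
enhanced = aggregate(features, graph)
```

In this toy input the two look-alike domains become neighbors and their representations are pulled toward each other, while the unrelated domain has no neighbors and keeps its original features. The node-degree-weighted sampling and the online graph update described in the abstract are not covered by this sketch.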