Malicious Domain Detection Technology Based on Semantic Graph Learning

付豪; 龙春; 宫良一; 魏金侠; 黄潘; 林延中; 孙德刚

doi:10.7544/issn1000-1239.202440375

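Abstract (translated from the Chinese original): Malicious domain name detection is an important component of network intrusion detection systems, because it can quickly reveal network attacks through domain name requests. Machine-learning-based detection overcomes the shortcomings of blacklist mechanisms and improves the accuracy of identifying malicious domains. However, because domain name construction varies widely and domain names in real environments are complex and changeable, such methods suffer from low detection efficiency and poor robustness in practice. To address this, a malicious domain name detection technique based on domain name semantic graph learning is proposed, which uses semantic graph association analysis to achieve efficient detection. Specifically, 12 months of domain name request data from the China Science and Technology Network (中国科技网) were first collected, totaling 3.33 billion access records, including more than 6.5 million malicious domain name records covering 284 attack types. An analysis of the semantic features of different domain categories shows that the categories are clearly distinguishable semantically, but their feature distributions overlap over a substantial range, and domain data in the overlapping range degrade classifier performance. Therefore, a domain association graph model based on character semantic similarity is proposed, which strengthens the semantic features of domains in the overlapping range by fusing the features of neighboring domains, thereby improving detection performance. First, structural similarity among domain names is analyzed to filter out noise characters with a high degree of matching, eliminating the detection interference caused by the inherent structure of domain names. Second, semantic similarity features of domain name characters are extracted to construct a domain semantic graph model; a dynamic domain semantic graph is then built with an online aggregation algorithm, and, using a sample set drawn from an experience pool by node-degree-weighted sampling, a multi-head attention message-passing graph model based on sample semantic weights is trained. Finally, a multi-layer neural network classifier performs the malicious domain name detection. Experimental results show that the proposed technique achieves an average precision of 0.96 and a recall of 0.97 on datasets of different types of malicious domain names, and that the model can evolve on its own online, giving it a high recognition rate and robustness.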

Abstract: Malicious domain name detection is a critical component of network intrusion detection systems, enabling the rapid identification of network attacks through domain name requests. Machine learning methods overcome the limitations of blacklist mechanisms and improve detection accuracy. However, the high variability of domain name structures and the complexity of real-world environments lead to low detection efficiency and poor robustness in practical applications. To address these issues, a malicious domain name detection technology based on domain name semantic graph learning is proposed, leveraging semantic graph association analysis for efficient detection. Specifically, 12 months of domain request data from the China Science and Technology Network are first collected, encompassing 3.33 billion access records, including more than 6.5 million malicious domain name entries across 284 attack types. Semantic analysis reveals clear differentiation between domain categories, yet considerable overlap in their feature distributions degrades classifier performance. To tackle this, a domain association graph model based on character-level semantic similarity is proposed. By integrating the features of neighboring domains, the model enhances the semantic representations of domains in the overlapping regions, thereby improving detection performance. The method first filters noise characters through structural similarity analysis, then constructs a dynamic domain semantic graph using an online aggregation algorithm, and trains a multi-head attention-based message-passing graph model on samples drawn by node-degree-weighted sampling. Finally, a multi-layer neural network classifier is employed for malicious domain detection. Experimental results demonstrate that the proposed method achieves an average precision of 0.96 and a recall of 0.97 on datasets of different types of malicious domain names. Furthermore, the model can evolve online and exhibits a high detection rate and robustness.
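
To make the graph idea in the abstract concrete, the following minimal Python sketch links domains whose character content is similar, averages each domain's features with those of its neighbors, and draws a degree-weighted sample. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, so character-bigram Jaccard similarity is swapped in, plain mean aggregation stands in for the multi-head attention model, and the function names, the `threshold` value, and the `degree + 1` weighting are all hypothetical.

```python
import random
from collections import defaultdict


def char_ngrams(domain, n=2):
    """Return the set of character n-grams of a domain (assumed similarity basis)."""
    label = domain.lower().strip(".")
    return {label[i:i + n] for i in range(len(label) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(domains, threshold=0.3):
    """Connect every pair of domains whose n-gram similarity reaches the threshold."""
    grams = {d: char_ngrams(d) for d in domains}
    graph = defaultdict(set)
    for i, u in enumerate(domains):
        graph[u]  # make sure isolated domains still appear as nodes
        for v in domains[i + 1:]:
            if jaccard(grams[u], grams[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def aggregate(graph, features):
    """Replace each node's feature vector by the mean over itself and its neighbors."""
    fused = {}
    for node, neighbors in graph.items():
        group = [features[node]] + [features[v] for v in neighbors]
        dim = len(features[node])
        fused[node] = [sum(vec[k] for vec in group) / len(group) for k in range(dim)]
    return fused


def degree_weighted_sample(graph, k, rng):
    """Draw k nodes with replacement, with probability proportional to degree + 1."""
    nodes = sorted(graph)
    weights = [len(graph[v]) + 1 for v in nodes]
    return rng.choices(nodes, weights=weights, k=k)


if __name__ == "__main__":
    domains = ["paypal-login.com", "paypal-logon.com", "example.org"]
    graph = build_graph(domains)
    features = {
        "paypal-login.com": [1.0, 0.0],
        "paypal-logon.com": [0.0, 1.0],
        "example.org": [0.5, 0.5],
    }
    print(dict(graph))
    print(aggregate(graph, features))
    print(degree_weighted_sample(graph, k=5, rng=random.Random(0)))
```

In this toy setting the two look-alike domains become neighbors and share features, while the unrelated domain stays isolated and keeps its own vector. A learned attention weight per neighbor, as the abstract describes, would replace the uniform mean.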

Malicious Domain Detection Technology Based on Semantic Graph Learning