Abstract:
Malicious domain name detection is a critical component of network intrusion detection systems, enabling network attacks to be identified rapidly from domain name requests. Machine learning methods overcome the limitations of blacklist-based mechanisms and improve detection accuracy. However, the high variability of domain name structures and the complexity of real-world environments lead to low detection efficiency and poor robustness in practical applications. To address these issues, a malicious domain name detection technique based on domain name semantic graph learning is proposed, which uses semantic graph association analysis to achieve efficient detection. Specifically, 12 months of domain request data are first collected from the China Science and Technology Network, comprising 3.33 billion access records that include more than 6.5 million malicious domain name entries spanning 284 attack types. Semantic analysis reveals clear separation between domain categories, but substantial feature overlap in certain regions degrades classifier performance. To tackle this, a domain association graph model based on character-level semantic similarity is proposed. By aggregating the features of neighboring domains, the model enriches semantic representations in the overlapping regions and thereby improves detection performance. The method filters noise characters through structural similarity analysis, constructs a dynamic domain semantic graph with an online aggregation algorithm, and trains a multi-head attention-based message-passing graph model on node-degree-weighted samples. Finally, a multi-layer neural network classifier performs malicious domain detection. Experimental results show that the proposed method achieves an average precision of 96% and a recall of 97% across the different types of malicious domain names in the dataset.
Furthermore, the model exhibits strong online adaptability, achieving a high detection rate and good robustness.
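As a rough illustration of the graph-construction idea summarized above, the following Python sketch inserts domains one at a time (mimicking online construction) and links each new domain to every stored domain whose character-bigram Jaccard similarity reaches a threshold. Everything here is an assumption for illustration, not the paper's method: the `normalize` noise filter, Jaccard over bigrams as the similarity measure, the 0.5 threshold, mean pooling in place of multi-head attention message passing, and the `1 + degree` sample weighting are all hypothetical stand-ins.

```python
import re


def normalize(domain: str) -> str:
    """Hypothetical noise filter: lowercase, drop the last label (TLD),
    and keep only letters and digits. Not the paper's structural method."""
    labels = domain.lower().strip(".").split(".")
    body = "".join(labels[:-1]) if len(labels) > 1 else labels[0]
    return re.sub(r"[^a-z0-9]", "", body)


def bigrams(text: str) -> frozenset[str]:
    """Set of character bigrams in a normalized domain body."""
    return frozenset(text[i:i + 2] for i in range(len(text) - 1))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two bigram sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class OnlineDomainGraph:
    """Hypothetical online similarity graph: each new domain is linked to
    every stored domain whose bigram Jaccard similarity >= threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.grams: dict[str, frozenset[str]] = {}
        self.adj: dict[str, set[str]] = {}

    def add(self, domain: str) -> set[str]:
        """Insert one domain (linear scan over stored nodes); return its neighbours."""
        if domain in self.adj:
            return self.adj[domain]
        g = bigrams(normalize(domain))
        neighbours: set[str] = set()
        for other, other_g in self.grams.items():
            if jaccard(g, other_g) >= self.threshold:
                neighbours.add(other)
                self.adj[other].add(domain)  # keep the graph undirected
        self.grams[domain] = g
        self.adj[domain] = neighbours
        return neighbours

    def aggregate(self, domain: str, feats: dict[str, list[float]]) -> list[float]:
        """Mean of a node's feature vector and its neighbours' vectors
        (a plain stand-in for attention-weighted message passing)."""
        group = [domain, *sorted(self.adj[domain])]
        dim = len(feats[domain])
        return [sum(feats[d][i] for d in group) / len(group) for i in range(dim)]

    def degree_weights(self) -> dict[str, float]:
        """Hypothetical sample weights proportional to 1 + degree, rescaled to mean 1."""
        raw = {d: 1.0 + len(n) for d, n in self.adj.items()}
        mean = sum(raw.values()) / len(raw)
        return {d: w / mean for d, w in raw.items()}


if __name__ == "__main__":
    graph = OnlineDomainGraph(threshold=0.5)
    for d in ["paypal-login.com", "paypa1-login.net", "example.org"]:
        graph.add(d)
    print(graph.adj)
    print(graph.degree_weights())
```

Running the module links the two look-alike `paypal` domains and leaves `example.org` isolated. The linear scan makes each insertion O(n) in the number of stored domains, which would not scale to billions of records; an approximate index such as MinHash with locality-sensitive hashing is one common workaround, although the abstract does not say what the authors use.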