New gTLD Resolution Behavior Analysis and Malicious Domain Detection Method

Yang Donghui (杨东辉); Zeng Bin (曾彬); Li Zhenyu (李振宇)

doi:10.7544/issn1000-1239.202220846

Abstract: Since ICANN began delegating new generic top-level domains (new gTLDs) in 2013, more than a thousand new gTLDs have been added to the domain name system (DNS). Previous work has shown that although new gTLD domains give registrants more flexibility, they are also frequently used for malicious purposes, partly because of their low registration costs; identifying malicious new gTLD domains is therefore important. However, because new gTLD domains have distinctive characteristics (e.g., domain length), existing malicious domain identification methods achieve low accuracy when applied to them. To address this issue, we first use massive domain name resolution data to characterize the resolution behavior of new gTLD domains from five aspects: the number of second-level domains (SLDs) per new gTLD, query volume, query failure rate, content replication, and hosting infrastructure sharing. We then analyze the resolution behavior of malicious new gTLD domains and find distinctive characteristics in content hosting infrastructure concentration, the number of fully qualified domain names (FQDNs) per SLD, the number of queries, the distribution of end users' network footprints, and the distribution of SLD lengths. Finally, based on these features, we design a random-forest-based method for identifying malicious new gTLD domains. Experimental results show that the proposed method achieves 94% accuracy, outperforming existing malicious domain identification methods.
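The five per-SLD features named in the abstract could be computed from a resolution log roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Several pieces are hypothetical: the log schema (`Query`), the helper names, and the use of IPv4 /24 prefixes as a proxy for network location. The exact feature definitions are also assumptions. "Hosting concentration" is taken here as the share of answers that fall in the most common /24. "Network footprint" is approximated by the number of distinct client /24 prefixes.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
import ipaddress


@dataclass
class Query:
    """One resolution-log record (hypothetical schema, IPv4 only)."""
    client_ip: str            # address of the querying user/resolver
    fqdn: str                 # queried fully qualified domain name
    rcode: str                # DNS response code, e.g. "NOERROR", "NXDOMAIN"
    answer_ip: Optional[str]  # first A record returned, or None


# Assumed feature order for the matrix handed to a classifier.
FEATURE_NAMES = ["sld_length", "fqdns_per_sld", "query_count",
                 "client_prefixes", "hosting_concentration"]


def sld_of(fqdn: str) -> str:
    """'a.b.shop.xyz' -> 'shop.xyz'. Assumes a single-label new gTLD and
    ignores public-suffix rules."""
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def prefix24(ip: str) -> str:
    """Map an IPv4 address to its /24 network, a coarse location proxy."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def extract_features(log: List[Query]) -> Dict[str, Dict[str, float]]:
    """Group queries by SLD and compute the hypothetical per-SLD features."""
    by_sld: Dict[str, List[Query]] = defaultdict(list)
    for q in log:
        by_sld[sld_of(q.fqdn)].append(q)

    features: Dict[str, Dict[str, float]] = {}
    for sld, qs in by_sld.items():
        # Hosting concentration: share of answers in the most common /24.
        answered = [prefix24(q.answer_ip) for q in qs if q.answer_ip]
        if answered:
            top_count = Counter(answered).most_common(1)[0][1]
            concentration = top_count / len(answered)
        else:
            concentration = 0.0
        features[sld] = {
            "sld_length": len(sld.split(".")[0]),
            "fqdns_per_sld": len({q.fqdn.rstrip(".").lower() for q in qs}),
            "query_count": len(qs),
            "client_prefixes": len({prefix24(q.client_ip) for q in qs}),
            "hosting_concentration": concentration,
        }
    return features


def to_matrix(features: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Return SLDs in sorted order plus one feature row per SLD."""
    slds = sorted(features)
    return slds, [[features[s][k] for k in FEATURE_NAMES] for s in slds]


if __name__ == "__main__":
    log = [
        Query("10.0.0.1", "www.shop.xyz", "NOERROR", "203.0.113.5"),
        Query("10.0.0.2", "a1.shop.xyz", "NOERROR", "203.0.113.9"),
        Query("10.0.1.7", "b2.shop.xyz", "NXDOMAIN", None),
        Query("192.0.2.4", "blog.top", "NOERROR", "198.51.100.1"),
    ]
    slds, X = to_matrix(extract_features(log))
    for sld, row in zip(slds, X):
        print(sld, row)
```

The rows from `to_matrix`, paired with malicious/benign labels, could then be used to fit an off-the-shelf random forest, for example scikit-learn's `RandomForestClassifier`. Grouping by SLD rather than by FQDN keeps the unit of classification at the level where the abstract reports its per-SLD observations.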