高级检索
    杜鹏, 丁世飞. 基于混合词向量深度学习模型的DGA域名检测方法[J]. 计算机研究与发展, 2020, 57(2): 433-446. DOI: 10.7544/issn1000-1239.2020.20190160
    引用本文: 杜鹏, 丁世飞. 基于混合词向量深度学习模型的DGA域名检测方法[J]. 计算机研究与发展, 2020, 57(2): 433-446. DOI: 10.7544/issn1000-1239.2020.20190160
    Du Peng, Ding Shifei. A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding[J]. Journal of Computer Research and Development, 2020, 57(2): 433-446. DOI: 10.7544/issn1000-1239.2020.20190160
    Citation: Du Peng, Ding Shifei. A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding[J]. Journal of Computer Research and Development, 2020, 57(2): 433-446. DOI: 10.7544/issn1000-1239.2020.20190160

    基于混合词向量深度学习模型的DGA域名检测方法

    A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding

    • 摘要: 域名生成算法(domain generation algorithm, DGA)是域名检测中防范僵尸网络攻击的重要手段之一,对于生成威胁情报、阻断僵尸网络命令与控制流量、保障网络安全有重要的实际意义.近年来,DGA域名检测技术从依靠手工提取特征发展到自动提取特征的基于深度学习模型的方法,在DGA域名检测任务中取得了较大的进展.但对于不同僵尸网络家族的DGA域名的多分类任务,由于家族种类多,且各家族域名数据存在不平衡性,因此许多已有的深度学习模型在DGA域名的多分类任务上仍有提高空间.针对以上挑战,设计了基于字符和双字母组级别的混合词向量,以提高域名字符串的信息利用度,并设计了基于混合词向量方法的深度学习模型.最后设计了包含多种对比模型的实验,对混合词向量的有效性进行验证.实验结果表明基于混合词向量的深度学习模型在DGA域名检测与分类任务中相比只基于字符级词向量的模型有更好的分类性能,特别是在小样本的DGA域名类别上的分类性能更优,证明了该模型的有效性.

       

      Abstract: DGA domain name detection plays a key role in preventing botnet attacks. It is practically significant in generating threat intelligence, blocking botnet command and control traffic, and maintaining cyber security. In recent years, DGA domain name detection algorithms have made great progress, from the methods using manually-crafted features to the automatically extracting features generated by deep learning methods. Multiple studies have indicated that deep learning methods perform better in DGA detection. However, DGA families are various and domain name data is imbalanced in the multi-class classification of different DGA families. Many existing deep learning models can still be improved. To solve the above problems, a mixed word embedding method is designed, based on character level embedding and bigram level embedding, to improve the information utilization of domain names. The paper also designs a deep learning model using the mixed word embedding method. At the end of the paper, an experiment with multiple comparison models is conducted to test the model. The experimental results show that the model based on the mixed word embedding achieves better performance in DGA domain name detection and multi-class classification tasks compared with the models based on character level embedding, especially in the small DGA families with few samples. The results show the proposed approach is effective.

       

    /

    返回文章
    返回