A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding

Du Peng; Ding Shifei

doi:10.7544/issn1000-1239.2020.20190160

Citation: Du Peng, Ding Shifei. A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding[J]. Journal of Computer Research and Development, 2020, 57(2): 433-446. DOI: 10.7544/issn1000-1239.2020.20190160

(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116) (Engineering Research Center of Mine Digitization (China University of Mining and Technology), Ministry of Education, Xuzhou, Jiangsu 221116)

Funds: This work was supported by the National Natural Science Foundation of China (61672522, 61976216, 61379101), the Graduate Innovation Fund of Jiangsu Province (KYCX19_2196), and the Postgraduate Research & Practice Innovation Program of China University of Mining and Technology (KYCX19_2196).

Published Date: January 31, 2020

Abstract

DGA (domain generation algorithm) domain name detection plays a key role in preventing botnet attacks. It is of practical significance for generating threat intelligence, blocking botnet command-and-control traffic, and maintaining cyber security. In recent years, DGA domain name detection algorithms have made great progress, moving from methods based on manually crafted features to deep learning methods that extract features automatically. Multiple studies have indicated that deep learning methods perform better in DGA detection. However, there are many DGA families, and in the multi-class classification of DGA families the domain name data is imbalanced across families, so many existing deep learning models still leave room for improvement. To address these problems, this paper designs a mixed word embedding method that combines character-level embedding and bigram-level embedding to make better use of the information contained in domain names, and builds a deep learning model on top of this embedding. Finally, the model is evaluated against multiple comparison models. The experimental results show that the model based on mixed word embedding outperforms models based on character-level embedding in both DGA domain name detection and multi-class classification, especially for small DGA families with few samples, which demonstrates the effectiveness of the proposed approach.
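The abstract does not state how the character-level and bigram-level embeddings are combined, how inputs are padded, or what embedding size is used. The Python sketch below illustrates the general idea under those unknowns. It tokenizes a domain label into characters and into overlapping character bigrams, maps each view to integer ids, looks the ids up in two separate embedding tables, and concatenates the two embedded sequences along the sequence axis. Every function name, the toy labels, the dimension 8, the maximum length 16 and the concatenation strategy are hypothetical choices made for illustration. None of them is taken from the authors' implementation. In a trained model the tables would be learned parameters feeding the downstream deep network.

```python
import random

def char_tokens(label):
    # Character-level tokens: "abc" -> ["a", "b", "c"]
    return list(label)

def bigram_tokens(label):
    # Overlapping character bigrams: "abc" -> ["ab", "bc"]
    return [label[i:i + 2] for i in range(len(label) - 1)]

def build_vocab(token_lists):
    # Assign each distinct token an id starting at 1; id 0 is reserved for padding/unknown.
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode(tokens, vocab, max_len):
    # Map tokens to ids (unknown -> 0), truncate to max_len, right-pad with 0.
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

def init_table(vocab_size, dim, seed):
    # Randomly initialised embedding table; row 0 (padding/unknown) is all zeros.
    # In a real model these rows would be trainable parameters.
    rng = random.Random(seed)
    table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size + 1)]
    table[0] = [0.0] * dim
    return table

def mixed_embedding(label, char_vocab, bigram_vocab, char_table, bigram_table, max_len):
    # Hypothetical mixing step: embed the label at character level and at bigram level,
    # then concatenate the two sequences along the sequence axis
    # (this requires both tables to have the same embedding dimension).
    char_vecs = [char_table[i] for i in encode(char_tokens(label), char_vocab, max_len)]
    bigram_vecs = [bigram_table[i] for i in encode(bigram_tokens(label), bigram_vocab, max_len)]
    return char_vecs + bigram_vecs

# Usage on a toy training set of second-level labels (illustrative data only).
train = ["google", "baidu", "xjkqzpwvtr"]
char_vocab = build_vocab(char_tokens(d) for d in train)
bigram_vocab = build_vocab(bigram_tokens(d) for d in train)
char_table = init_table(len(char_vocab), dim=8, seed=0)
bigram_table = init_table(len(bigram_vocab), dim=8, seed=1)
x = mixed_embedding("google", char_vocab, bigram_vocab, char_table, bigram_table, max_len=16)
print(len(x), len(x[0]))  # 32 8
```

Concatenating along the sequence axis keeps every token vector at the same width, so a single convolutional or recurrent layer can read both views of the domain name. Concatenating along the feature axis, after aligning the n characters with the n-1 bigrams, is an equally plausible alternative, and the abstract does not say which one the paper uses.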
