ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue 6: 1263-1274. doi: 10.7544/issn1000-1239.2019.20180481

• Information Security •

A Malicious Domain Detection Method Based on Co-Occurrence Relationships Among DNS Queries

彭成维1,2,云晓春1,2,3,张永铮2,3,李书豪2,3   

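  1. 1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093) (pengchengwei@iie.ac.cn)
  • Published: 2019-06-01
  • Funding: 
    National Key Research and Development Program of China (2016YFB0801502); National Natural Science Foundation of China (U1736218)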

Detecting Malicious Domains Using Co-Occurrence Relation Between DNS Query

Peng Chengwei1,2, Yun Xiaochun1,2,3, Zhang Yongzheng2,3, Li Shuhao2,3   

  1. 1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049);3(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093)
  • Online: 2019-06-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0801502) and the National Natural Science Foundation of China (U1736218).

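Abstract: Malicious domains play an important role in illicit network attacks, and detecting them can effectively reduce the economic losses those attacks cause. This paper proposes CoDetector, a malicious domain detection model that mines the latent spatio-temporal co-occurrence relationships among DNS queries. The study finds that DNS queries accompany one another rather than occurring independently; domains that co-occur are therefore closely related and tend to be either all benign or all malicious. CoDetector works in three steps: 1) it coarsely clusters the domain data according to the chronological order of DNS queries, placing domains that co-occur in the same cluster; 2) it uses embedding learning to build a mapping function that projects each domain to a low-dimensional feature vector while preserving the co-occurrence relationships; 3) it trains a malicious domain classifier on labeled data and uses it to detect further unknown malicious domains. Experimental results show that CoDetector detects malicious domains effectively, with 91.64% precision and 96.04% recall.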

Keywords: DNS queries, query co-occurrence, malicious domains, time-series segmentation, vector representation, domain classification

Abstract: Malicious domains play a vital role in illicit online activities, and detecting them effectively can significantly reduce the damage caused by attacks. In this paper, we propose CoDetector, a novel technique for detecting malicious domains based on the co-occurrence relationships of domains in DNS (domain name system) queries. We observe that DNS queries are not isolated but co-occur with one another. We base the design on the intuition that domains that tend to co-occur in DNS traffic are strongly associated and are likely to share the same property (i.e., malicious or benign). We therefore first perform coarse-grained clustering of DNS traffic based on the chronological order of DNS queries, so that domains co-occurring with each other fall into the same cluster. Then, we design a mapping function that automatically projects every domain into a low-dimensional feature vector while preserving the co-occurrence relationships: domains that co-occur are mapped to similar vectors, while domains that do not co-occur are mapped to distant vectors. Finally, based on the learned feature representations, we train a classifier on a labeled dataset and apply it to detect unknown malicious domains. We evaluate CoDetector using real-world DNS traffic collected from an enterprise network over two months. The experimental results show that CoDetector can effectively detect malicious domains (91.64% precision and 96.04% recall).
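
The first step of the pipeline described in the abstract, grouping chronologically adjacent queries and recording which domains appear together, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the gap-based cutting rule, the `max_gap` threshold, the function names `cut_by_gap` and `cooccurrence_counts`, and the sample domains are all hypothetical. The resulting pair counts are the kind of signal an embedding model could then be trained on.

```python
from collections import Counter
from itertools import combinations


def cut_by_gap(queries, max_gap=5.0):
    """Split (timestamp, domain) queries from one host into segments.

    A new segment starts whenever the gap to the previous query exceeds
    max_gap seconds. The threshold is an assumed value for illustration,
    not one taken from the paper.
    """
    segments, current, last_ts = [], [], None
    for ts, domain in sorted(queries):
        if last_ts is not None and ts - last_ts > max_gap:
            segments.append(current)
            current = []
        current.append(domain)
        last_ts = ts
    if current:
        segments.append(current)
    return segments


def cooccurrence_counts(segments):
    """Count how many segments each unordered domain pair shares."""
    counts = Counter()
    for seg in segments:
        for a, b in combinations(sorted(set(seg)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical query log for a single client: (seconds, queried domain).
queries = [
    (0.0, "news.example"), (0.4, "cdn.example"), (0.9, "ads.example"),
    (60.0, "c2.example"), (60.3, "dga1.example"),
    (120.0, "news.example"), (120.2, "cdn.example"),
]
segments = cut_by_gap(queries)
counts = cooccurrence_counts(segments)
```

In this toy log, `news.example` and `cdn.example` share two segments, while the two suspicious-looking domains only ever appear with each other, which is the intuition the abstract relies on.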

Key words: DNS queries, co-occurrence, malicious domains, DNS cut, tensor representation, domain classification

CLC number: