ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (6): 1263-1274.doi: 10.7544/issn1000-1239.2019.20180481

Previous Articles     Next Articles

Detecting Malicious Domains Using Co-Occurrence Relation Between DNS Query

Peng Chengwei1,2, Yun Xiaochun1,2,3, Zhang Yongzheng2,3, Li Shuhao2,3   

  1. 1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049);3(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093)
  • Online:2019-06-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0801502) and the National Natural Science Foundation of China (U1736218).

Abstract: Malicious domains play a vital role in illicit online activities. Effectively detecting the malicious domains can significantly decrease the damage of evil attacks. In this paper, we propose CoDetector, a novel technique to detect malicious domains based on the co-occurrence relationships of domains in DNS (domain name system) queries. We observe that DNS queries are not isolated, whereas co-occur with each other. We base it design on the intuition that domains that tend to co-occur in DNS traffic are strongly associated and are likely to be in the same property (i.e., malicious or benign). Therefore, we first perform coarse-grained clustering of DNS traffic based on the chronological order of DNS queries. The domains co-occurring with each other will be clustered. Then, we design a mapping function that automatically projects every domain into a low-dimensional feature vector while maintaining their co-occurrence relationships. Domains that co-occur with each others are mapped to similar vectors while domains that not co-occur are mapped to distant vectors. Finally, based on the learned feature representations, we train a classifier over a labeled dataset and further apply it to detect unknown malicious domains. We evaluate CoDetector using real-world DNS traffic collected from an enterprise network over two months. The experimental results show that CoDetector can effectively detect malicious domains (91.64% precision and 96.04% recall).

Key words: DNS queries, co-occurrence, malicious domains, DNS cut, tensor representation, domain classification

CLC Number: