Citation: Yang Donghui, Zeng Bin, Li Zhenyu. New gTLD Resolution Behavior Analysis and Malicious Domain Detection Method[J]. Journal of Computer Research and Development, 2024, 61(4): 1038-1048. DOI: 10.7544/issn1000-1239.202220846
Since ICANN began delegating new generic top-level domains (new gTLDs) in 2013, more than a thousand new gTLDs have been added to the domain name system (DNS). Previous work has shown that while new gTLD domains give registrants more flexibility, their low registration costs also make them a common vehicle for malicious behavior, so identifying malicious new gTLD domains is important. However, new gTLD domains have distinctive characteristics (e.g., domain length), and existing malicious domain identification methods achieve low accuracy when applied to them. To address this issue, we first characterize the resolution behavior of new gTLD domains using massive domain name resolution data, from five aspects: the number of associated SLDs per new gTLD, query volume, query failure rate, content replication, and hosting infrastructure sharing. We then analyze the resolution behavior of malicious new gTLD domains and find that they behave distinctively in terms of content hosting infrastructure concentration, the number of FQDNs per SLD, the number of queries, the distribution of end users’ network footprints, and the distribution of SLD lengths. Finally, based on these features, we design a random forest-based method for identifying malicious new gTLD domains. Experimental results show that the proposed method achieves 94% accuracy, outperforming existing malicious domain identification methods.
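As a rough illustration of the kind of per-domain features named above, the following minimal sketch aggregates toy resolution records into one feature vector per SLD. It is not the paper's implementation: the record layout `(fqdn, client_prefix, server_ip)`, the helper names `sld_of` and `extract_features`, and the exact feature definitions are all assumptions made for this example. In particular, the count of distinct hosting IPs is only a crude stand-in for hosting infrastructure concentration, and the SLD is taken to be the last two labels, which assumes a single-label TLD.

```python
from collections import defaultdict


def sld_of(fqdn):
    """Return the last two labels of a name (assumes a single-label TLD)."""
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def extract_features(records):
    """Aggregate per-SLD features from (fqdn, client_prefix, server_ip) records.

    Each record stands for one observed query. The layout is hypothetical.
    """
    fqdns = defaultdict(set)     # distinct FQDNs seen under each SLD
    clients = defaultdict(set)   # distinct client prefixes (network footprint)
    servers = defaultdict(set)   # distinct hosting IPs returned
    queries = defaultdict(int)   # total number of queries

    for fqdn, client_prefix, server_ip in records:
        sld = sld_of(fqdn)
        fqdns[sld].add(fqdn.rstrip(".").lower())
        clients[sld].add(client_prefix)
        servers[sld].add(server_ip)
        queries[sld] += 1

    features = {}
    for sld in queries:
        features[sld] = {
            "sld_length": len(sld.split(".")[0]),  # length of the second-level label
            "fqdn_count": len(fqdns[sld]),
            "query_count": queries[sld],
            "client_prefix_count": len(clients[sld]),
            "hosting_ip_count": len(servers[sld]),
        }
    return features


# Toy input using documentation address ranges; not real measurement data.
records = [
    ("www.shop.xyz", "10.0.0.0/24", "192.0.2.1"),
    ("mail.shop.xyz", "10.0.1.0/24", "192.0.2.1"),
    ("www.shop.xyz", "10.0.0.0/24", "192.0.2.1"),
    ("a1b2c3.top", "10.0.0.0/24", "198.51.100.7"),
]
features = extract_features(records)
```

Vectors of this shape, paired with malicious or benign labels, could then be passed to any random forest implementation (for instance scikit-learn's `RandomForestClassifier`); that training step is left out here to keep the sketch dependency-free.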