ADFuzz: Using Anomaly Detection to Filter Rare Paths for Efficient Fuzzing

Li Hangyu; Fang Haoran; Qu Yanwen; Guo Fan

doi:10.7544/issn1000-1239.202111238

School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022

Funds: This work was supported by the National Natural Science Foundation of China (61562040) and the Science and Technology Project of Jiangxi Provincial Education Department (GJJ200313).

Author Bio:
Li Hangyu: born in 1995. Master's candidate. His main research interests include fuzzing, program analysis, and anomaly detection.

Fang Haoran: born in 1996. Master's candidate. His main research interests include compilers and program analysis.

Qu Yanwen: born in 1983. PhD. His main research interests include machine learning, cryptography, and financial technology.

Guo Fan: born in 1977. PhD. His main research interests include information security and program analysis.

Corresponding author:
Guo Fan (fguo@jxnu.edu.cn)

CLC number: TP311

Publication history
- Received: 2021-12-14
- Revised: 2022-10-23
- Published online: 2023-05-22
- Issue date: 2023-07-31

Abstract:
Coverage-guided fuzzing is currently the most effective technique for automatically discovering vulnerabilities in programs. Most fuzzing tools apply a full-tracing strategy to every newly generated test case. Over time, however, the generated test cases concentrate on the program's high-frequency paths, so the test cases that produce new coverage become far fewer than the total number generated, and full tracing spends a large amount of time and runtime overhead to no effect. We therefore propose ADFuzz, a fuzzing tool based on an anomaly detection model. ADFuzz filters for rare paths to reduce the number of executions on frequent paths, which speeds up fuzzing, and it continuously guides mutation towards rare paths to expand program coverage. ADFuzz, AFL, and Untracer were each run for 24 h on 12 real-world programs under the same configuration. Compared with AFL, ADFuzz is 23.8% faster on average and increases coverage by 11.78% on average and by up to 25.8%. Compared with Untracer, ADFuzz is only slightly slower on average but finds substantially more crashes and reaches higher coverage.
- vulnerability mining /
- fuzzing /
- anomaly detection /
- generative adversarial network /
- path frequency
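
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not ADFuzz's implementation: the keywords point to a GAN-based anomaly detector, whereas the stand-in below uses a much simpler centroid-distance score, and it assumes the detector sees the raw bytes of a test case, which the abstract does not specify. All names (`byte_histogram`, `RarePathFilter`, `should_trace`, `threshold`) are hypothetical.

```python
from math import sqrt


def byte_histogram(data: bytes, bins: int = 16) -> list[float]:
    """Normalized histogram of byte values, used as a toy feature vector."""
    counts = [0] * bins
    for b in data:
        counts[b * bins // 256] += 1
    total = max(len(data), 1)
    return [c / total for c in counts]


class RarePathFilter:
    """Toy anomaly detector (assumption, not the paper's model).

    Inputs that resemble those already known to hit frequent paths are
    treated as normal and skipped; inputs far from that profile are
    treated as anomalies and passed on to the expensive tracing step.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.centroid: list[float] = []

    def fit(self, frequent_inputs: list[bytes]) -> None:
        if not frequent_inputs:
            raise ValueError("need at least one input to fit")
        vectors = [byte_histogram(x) for x in frequent_inputs]
        self.centroid = [sum(col) / len(vectors) for col in zip(*vectors)]

    def score(self, data: bytes) -> float:
        """Euclidean distance between the input's histogram and the centroid."""
        v = byte_histogram(data)
        return sqrt(sum((a - b) ** 2 for a, b in zip(v, self.centroid)))

    def should_trace(self, data: bytes) -> bool:
        return self.score(data) > self.threshold


# Usage: fit on inputs assumed to exercise frequent paths, then filter mutants.
flt = RarePathFilter(threshold=0.5)
flt.fit([b"aaaa", b"aaab", b"abaa"])
candidates = [b"aaba", b"\xff\xfe\xfd\xfc"]
to_trace = [c for c in candidates if flt.should_trace(c)]
print(to_trace)  # only the input that looks unlike the frequent-path profile
```

The design point the sketch tries to capture is that the score is cheap relative to a traced execution, so skipping inputs scored as normal saves time, at the risk of missing coverage whenever the detector misjudges an input.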

Figure 1. Fuzzing framework

Figure 2. Untracer framework

Figure 3. Cross paths

Figure 4. ADFuzz framework

Figure 5. 24 h path coverage

Figure 6. Number of crashes

Figure 7. Filtration ratio of seeds

Figure 8. Total number of seeds

Table 1 Improvement of Average Coverage Rate Compared with AFL (%)

Tool	Average coverage improvement
Zeror^[27]	+10.14^[27]
ADFuzz (this paper)	+11.78
CSI-Fuzz^[26]	+7.78^[26]
Untracer^[21]	−10.7^[27]

Table 2 Number of Crashes

Program	AFL	ADFuzz	Untracer
flvmeta	106	108	86
imaginfo	2	24	6
mp42acc	489	598	247
infotocap	354	419	201
binutils	95	96	80
poppler	0	2	0
audiofile	62	65	45
Total	1108	1312	665
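
As a quick consistency check, the totals in Table 2 can be recomputed from the per-program rows, together with ADFuzz's relative gain in crash count. The sketch below only copies the table values; the variable names are illustrative, and the relative gains are derived here rather than quoted from the paper.

```python
# Crash counts copied from Table 2. Tuple order: (AFL, ADFuzz, Untracer).
TABLE2_CRASHES = {
    "flvmeta": (106, 108, 86),
    "imaginfo": (2, 24, 6),
    "mp42acc": (489, 598, 247),
    "infotocap": (354, 419, 201),
    "binutils": (95, 96, 80),
    "poppler": (0, 2, 0),
    "audiofile": (62, 65, 45),
}

# Sum each column across programs.
afl_total, adfuzz_total, untracer_total = (
    sum(col) for col in zip(*TABLE2_CRASHES.values())
)

# Relative gain of ADFuzz over each baseline, in percent.
gain_over_afl = 100 * (adfuzz_total - afl_total) / afl_total
gain_over_untracer = 100 * (adfuzz_total - untracer_total) / untracer_total

print(afl_total, adfuzz_total, untracer_total)  # 1108 1312 665
print(f"{gain_over_afl:.1f}% more crashes than AFL")
print(f"{gain_over_untracer:.1f}% more crashes than Untracer")
```

The recomputed totals match the last row of the table.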

Table 3 Average Running Time per Test Case (μs)

Program	AFL	ADFuzz	Untracer
cjson	242	193	162
libjpeg	923	670	535
libarchive	634	440	366
libksba	490	280	311
binutils	605	436	319
poppler	5350	4039	4513
tcpdump	369	271	271
audiofile	1471	1287	1292
flvmeta	312	282	266
imaginfo	829	742	649
mp42acc	569	426	478
infotocap	2358	1715	1843
Note: bold numbers indicate the best result.
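
The 23.8% average speedup over AFL quoted in the abstract appears consistent with the mean per-program reduction in running time in Table 3. Whether the authors computed the figure this way is an assumption; the sketch below weights each program equally and uses only the table values.

```python
# Average running time per test case in microseconds, copied from Table 3.
# Tuple order: (AFL, ADFuzz, Untracer).
TABLE3_US = {
    "cjson": (242, 193, 162),
    "libjpeg": (923, 670, 535),
    "libarchive": (634, 440, 366),
    "libksba": (490, 280, 311),
    "binutils": (605, 436, 319),
    "poppler": (5350, 4039, 4513),
    "tcpdump": (369, 271, 271),
    "audiofile": (1471, 1287, 1292),
    "flvmeta": (312, 282, 266),
    "imaginfo": (829, 742, 649),
    "mp42acc": (569, 426, 478),
    "infotocap": (2358, 1715, 1843),
}


def mean_time_reduction(tool_index: int) -> float:
    """Mean over programs of (AFL - tool) / AFL, each program weighted equally."""
    reductions = [
        (row[0] - row[tool_index]) / row[0] for row in TABLE3_US.values()
    ]
    return sum(reductions) / len(reductions)


adfuzz_pct = round(100 * mean_time_reduction(1), 1)
untracer_pct = round(100 * mean_time_reduction(2), 1)
print(f"ADFuzz vs AFL:   {adfuzz_pct}% less time per test case")  # 23.8
print(f"Untracer vs AFL: {untracer_pct}% less time per test case")
```

Under the same measure Untracer's mean reduction is somewhat larger than ADFuzz's, which agrees with the abstract's statement that ADFuzz is slightly slower than Untracer on average.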

References (40)

[1]	Manes V, Han H S, Han C, et al. The art, science and engineering of Fuzzing: A survey[J]. IEEE Transactions on Software Engineering, 2019, 47(11): 2312−2331
[2]	Ren Zezhong, Zheng Han, Zhang Jiayuan, et al. A review of Fuzzing techniques[J]. Journal of Computer Research and Development, 2021, 58(5): 944−963 (in Chinese) doi: 10.7544/issn1000-1239.2021.20201018
[3]	Google. OSS-Fuzz: Continuous fuzzing of open source software [CP/OL]. 2021[2021-12-21].https://google.github.io/oss-fuzz/
[4]	Böhme M, Pham V T, Roychoudhury A. Coverage-based greybox fuzzing as Markov chain[J]. IEEE Transactions on Software Engineering, 2017, 45(5): 489−506
[5]	Gan Shuitao, Zhang Chao, Qin Xiaojun, et al. CollAFL: Path sensitive Fuzzing[C] //Proc of the 39th IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2018: 679−696
[6]	Rawat S, Jain V, Kumar A, et al. VUzzer: Application-aware evolutionary fuzzing[C/OL] //Proc of the 24th NDSS 2017. San Diego, CA: University of California, 2017 [2021-12-21].https://dblp.org/db/conf/ndss/ndss2017.html#0001JKCGB17
[7]	Herrera A, Gunadi H, Magrath S, et al. Seed selection for successful fuzzing[C] //Proc of the 30th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2021: 230−243
[8]	Cha S K, Woo M, Brumley D. Program-adaptive mutational fuzzing[C] //Proc of the 36th IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2015: 725−741
[9]	Lemieux C, Sen K. Fairfuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage[C] //Proc of the 33rd IEEE Int Conf on Automated Software Engineering. Piscataway, NJ: IEEE, 2018: 475−485
[10]	Lv Chenyang, Ji Shouling, Zhang Chao, et al. MOPT: Optimized mutation scheduling for Fuzzers[C] //Proc of the 28th USENIX Security Symp. Berkeley, CA: USENIX Association, 2019: 1949−1966
[11]	Zhang Hangwei, Lu Kai, Zhou Xu, et al. SIoTFuzzer: Fuzzing web interface in IoT firmware via stateful message generation[J]. Applied Sciences, 2021, 11(7): 3120−3138 doi: 10.3390/app11073120
[12]	Yue Tai, Wang Pengfei, Tang Yong, et al. EcoFuzz: Adaptive energy-saving greybox Fuzzing as a variant of the adversarial multi-armed bandit[C] //Proc of the 29th USENIX Security Symp. Berkeley, CA: USENIX Association, 2020: 2307−2324
[13]	She Dongdong, Chen Yizheng, Shah A, et al. NEUTAINT: Efficient dynamic taint analysis with neural networks[C] //Proc of the 41st IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2020: 1527−1543
[14]	Lu Hui, Jin Chengjie, Helu Xiaohan, et al. Research on intelligent detection of command level stack pollution for binary program analysis[J]. Mobile Networks and Applications, 2021, 26(4): 1723−1732 doi: 10.1007/s11036-019-01507-0
[15]	Aschermann C, Schumilo S, Blazytko T, et al. RED-QUEEN: Fuzzing with input-to-state correspondence[C/OL] //Proc of the 26th NDSS 2019. San Diego, CA: University of California, 2019 [2021-12-21].https://dblp.org/db/conf/ndss/ndss2019.html#ZhaoDYX19
[16]	Chen Peng, Chen Hao. Angora: Efficient Fuzzing by principled search[C] //Proc of the 39th IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2018: 711−725
[17]	Stephens N, Grosen J, Salls C, et al. Driller: Augmenting fuzzing through selective symbolic execution[C/OL] //Proc of the 23rd NDSS 2016. San Diego, CA: University of California, 2016 [2021-12-21].https://dblp.org/db/conf/ndss/ndss2016.html#StephensGSDWCSK16
[18]	Wang Mingzhe, Jie Liang, Chen Yuanliang, et al. SAFL: Increasing and accelerating testing coverage with symbolic execution and guided fuzzing[C] //Proc of the 40th Int Conf on Software Engineering: Companion. Piscataway, NJ: IEEE, 2018: 61−64
[19]	Yun Insu, Lee S, Xu Meng, et al. QSYM: A practical concolic execution engine tailored for hybrid fuzzing[C] //Proc of the 27th USENIX Security Symp. Berkeley, CA: USENIX Association, 2018: 745−761
[20]	Zhao Lei, Duan Yue, Yin Heng, et al. Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing[C/OL] //Proc of the 26th NDSS 2019. San Diego, CA: University of California, 2019 [2021-12-21].https://dblp.org/db/conf/ndss/ndss2019.html#ZhaoDYX19
[21]	Nagy S, Hicks M. Full-speed fuzzing: Reducing fuzzing overhead through coverage-guided tracing[C] //Proc of the 40th IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2019: 787−802
[22]	Pang Guansong, Cao Longbing, Aggarwal C. Deep learning for anomaly detection: Challenges, methods and opportunities[C] //Proc of the 14th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2021: 1127−1130
[23]	Zalewski M. American fuzzy lop [CP/OL]. 2021[2021-12-21].https://lcamtuf.coredump.cx/afl/
[24]	Chen Jinghui, Sathe S, Aggarwal C, et al. Outlier detection with autoencoder ensembles[C] //Proc of the 41st SIAM Int Conf on Data Mining. Piscataway, NJ: IEEE, 2020: 90−98
[25]	Saxena D, Cao Jiannong. Generative adversarial networks: Challenges, solutions, and future directions[J/OL]. ACM Computing Surveys, 2020 [2021-12-21].https://dl.acm.org/doi/10.1145/3446374
[26]	Zhu Xiaogang, Feng Xiaotao, Meng Xiaozhu, et al. CSI-Fuzz: Full-speed edge tracing using coverage sensitive instrumentation[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(2): 912−923
[27]	Zhou Chijin, Wang Mingzhe, Jie Liang, et al. Zeror: Speed up fuzzing with coverage-sensitive tracing and scheduling[C] //Proc of the 35th IEEE Int Conf on Automated Software Engineering. Piscataway, NJ: IEEE, 2020: 858−870
[28]	Manès V J M, Kim S, Cha S K. Ankou: Guiding grey-box fuzzing towards combinatorial difference[C] //Proc of the 42nd IEEE Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2020: 1024−1036
[29]	Akcay S, Atapour-Abarghouei A, Breckon T P. GANomaly: Semi-supervised anomaly detection via adversarial training[C] //Proc of the 17th Asian Conf on Computer Vision. Berlin: Springer, 2018: 622−637
[30]	Nagy S, Hicks M. FoRTE-FuzzBench: FoRTE-research’s fuzzing benchmarks [CP/OL]. 2021[2021-12-21].https://github.com/FoRTE-Research/FoRTE-FuzzBench
[31]	lcamtuf. Fast LLVM-based instrumentation for AFL-Fuzz [CP/OL]. 2021[2021-12-21].https://github.com/google/AFL/blob/master/llvm_mode/afl-clang
[32]	Godefroid P, Peleg H, Singh R. Learn&Fuzz: Machine learning for input fuzzing[C] //Proc of the 32nd IEEE Int Conf on Automated Software Engineering (ASE). Piscataway, NJ: IEEE, 2017: 50−59
[33]	Hu Zhicheng, Shi Jiangqi, Huang Yanhong, et al. GANFuzz: A GAN-based industrial network protocol fuzzing framework[C] //Proc of the 15th ACM Int Conf on Computing Frontiers. New York: ACM, 2018: 138−145
[34]	Ispoglou K, Austin D, Mohan V, et al. FuzzGen: Automatic Fuzzer generation[C] //Proc of the 29th USENIX Security Symp. Berkeley, CA: USENIX Association, 2020: 2271−2287
[35]	Karamcheti S, Mann G, Rosenberg D. Improving grey-box fuzzing by modeling program behavior[J]. arXiv preprint, arXiv: 1811.08973, 2018
[36]	Schlegl T, Seeböck P, Waldstein S M, et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery[C] //Proc of the 23rd Int Conf on Information Processing in Medical Imaging. Berlin: Springer, 2017: 146−157
[37]	Zenati H, Foo C S, Lecouat B, et al. Efficient GAN-based anomaly detection[J]. arXiv preprint, arXiv: 1802.06222, 2018
[38]	Khandait P, Hubballi N, Mazumdar B. IoTHunter: IoT network traffic classification using device specific keywords[J]. IET Networks, 2021, 10(2): 59−75 doi: 10.1049/ntw2.12007
[39]	Hu Ning, Tian Zhidong, Lu Hui, et al. A multiple-kernel clustering based intrusion detection scheme for 5G and IoT networks[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(11): 3129−3144 doi: 10.1007/s13042-020-01253-w
[40]	Lu Hui, Jin Chengjie, Helu Xiaohan, et al. AutoD: Intelligent blockchain application unpacking based on JNI layer deception call[J]. IEEE Network, 2020, 35(2): 215−221
