Citation: Guo Hongjing, Tao Chuanqi, Huang Zhiqiu. Surprise Adequacy-Guided Deep Neural Network Test Inputs Generation[J]. Journal of Computer Research and Development, 2024, 61(4): 1003-1017. DOI: 10.7544/issn1000-1239.202220745
Due to the complexity and uncertainty of deep neural network (DNN) models, generating test inputs that comprehensively exercise both the general and the corner-case behaviors of DNN models is of great significance for ensuring model quality. Current research primarily focuses on designing coverage criteria and applying fuzz testing techniques to generate test inputs and thereby improve test adequacy. However, few studies consider the diversity and the individual fault-revealing ability of test inputs. Surprise adequacy quantifies the difference in neuron activations between a test input and the training set. It is an important measure of test adequacy, but it has not yet been leveraged for test input generation. Therefore, we propose a surprise adequacy-guided test input generation approach. First, the approach selects the important neurons that contribute most to the model's decision-making; the activation values of these neurons are used as features to improve the surprise adequacy metric. Then, seed test inputs with error-revealing capability are selected based on the improved surprise adequacy measurements. Finally, the approach applies the idea of coverage-guided fuzz testing to jointly optimize the surprise adequacy value of test inputs and the prediction probability differences among classes; the gradient ascent algorithm is adopted to compute perturbations and iteratively generate test inputs. Empirical studies on 5 DNN models covering 4 image datasets demonstrate that the improved surprise adequacy metric effectively captures surprising test inputs and reduces computation time. Concerning test input generation, compared with DeepGini and RobOT, the follow-up test sets generated with the proposed seed input selection strategy achieve surprise coverage improvements of up to 5.9% and 15.9%, respectively. Compared with DLFuzz and DeepXplore, the proposed approach achieves surprise coverage improvements of up to 26.5% and 33.7%, respectively.
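The abstract's core metric, likelihood-based surprise adequacy, scores a test input by how rare its neuron activations are relative to the training set's activations, here restricted to a subset of important neurons. The following is a minimal NumPy sketch of that idea under stated assumptions: the function name, the fixed Gaussian kernel bandwidth, and the plain kernel density estimate are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def likelihood_sa(train_acts, test_act, important_idx, bandwidth=1.0):
    """Sketch of likelihood-based surprise adequacy over selected neurons.

    train_acts: (n_train, n_neurons) activations collected on the training set
    test_act:   (n_neurons,) activations of one test input
    important_idx: indices of the important neurons used as features
    """
    # Restrict both the training traces and the test trace to the
    # important neurons (the paper's feature-selection step).
    train = train_acts[:, important_idx]          # (n_train, k)
    test = test_act[important_idx]                # (k,)
    k = len(important_idx)
    # Gaussian kernel density estimate of the test activation under
    # the training activation distribution, with a fixed bandwidth.
    sq_dists = np.sum((train - test) ** 2, axis=1) / (2.0 * bandwidth ** 2)
    density = np.mean(np.exp(-sq_dists)) / ((2.0 * np.pi * bandwidth ** 2) ** (k / 2.0))
    # Lower density means a more surprising input, so the score is the
    # negative log-likelihood (clamped to avoid log(0)).
    return -np.log(density + 1e-30)
```

In the generation loop the abstract describes, a score like this would be combined with the prediction probability differences among classes into one objective, and the input would be perturbed along the gradient of that objective; the sketch above covers only the measurement side.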
[1] The New York Times. After fatal Uber crash, a self-driving start-up moves forward[EB/OL]. [2022-06-10]. https://www.nytimes.com/2018/05/07/technology/uber-crash-autonomous-driveai.html
[2] Zhang Jie, Harman M, Ma Lei, et al. Machine learning testing: Survey, landscapes and horizons[J]. IEEE Transactions on Software Engineering, 2022, 48(1): 1−36
[3] Huang Xiaowei, Kroening D, Ruan Wenjie, et al. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability[J]. Computer Science Review, 2020, 37: 100270
[4] Wang Zan, Yan Ming, Liu Shuang, et al. Survey on testing of deep neural networks[J]. Journal of Software, 2020, 31(5): 1255−1275 (in Chinese)
[5] Xie Xiaofei, Ma Lei, Juefei-Xu F, et al. DeepHunter: A coverage guided fuzz testing framework for deep neural networks[C] //Proc of the 28th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2019: 146−157
[6] Dai Hepeng, Sun Chang'ai, Jin Hui, et al. State-of-the-art survey of fuzzing for deep learning systems[J]. Journal of Software, 2023, 34(11): 5008−5028 (in Chinese)
[7] Pei Kexin, Cao Yinzhi, Yang Junfeng, et al. DeepXplore: Automated whitebox testing of deep learning systems[C] //Proc of the 26th Symp on Operating Systems Principles. New York: ACM, 2017: 1−18
[8] Guo Jianmin, Jiang Yu, Zhao Yue, et al. DLFuzz: Differential fuzzing testing of deep learning systems[C] //Proc of the 26th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2018: 739−743
[9] Kim J, Feldt R, Yoo S. Guiding deep learning system testing using surprise adequacy[C] //Proc of the 41st Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2019: 1039−1049
[10] Kim J, Ju J, Feldt R, et al. Reducing DNN labelling cost using surprise adequacy: An industrial case study for autonomous driving[C] //Proc of the 28th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2020: 1466−1476
[11] Kim S, Yoo S. Evaluating surprise adequacy for question answering[C] //Proc of the 42nd Int Conf on Software Engineering Workshops. New York: ACM, 2020: 197−202
[12] Weiss M, Chakraborty R, Tonella P. A review and refinement of surprise adequacy[C] //Proc of the 3rd IEEE/ACM Int Workshop on Deep Learning for Testing and Testing for Deep Learning. Piscataway, NJ: IEEE, 2021: 17−24
[13] Gerasimou S, Eniser H F, Sen A, et al. Importance-driven deep learning system testing[C] //Proc of the 42nd ACM/IEEE Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2020: 702−713
[14] Xie Xiaofei, Li Tianlin, Wang Jian, et al. NPC: Neuron path coverage via characterizing decision logic of deep neural networks[J]. ACM Transactions on Software Engineering and Methodology, 2022, 31(3): 47:1−47:27
[15] Ma Lei, Juefei-Xu F, Zhang Fuyuan, et al. DeepGauge: Multi-granularity testing criteria for deep learning systems[C] //Proc of the 33rd ACM/IEEE Int Conf on Automated Software Engineering. New York: ACM, 2018: 120−131
[16] Feng Yang, Shi Qingkai, Gao Xinyu, et al. DeepGini: Prioritizing massive tests to enhance the robustness of deep neural networks[C] //Proc of the 29th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2020: 177−188
[17] Wang Jingyi, Chen Jialuo, Sun Youcheng, et al. RobOT: Robustness-oriented testing for deep learning systems[C] //Proc of the 43rd Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2021: 300−311
[18] Wang Dong, Wang Ziyuan, Fang Chunrong, et al. DeepPath: Path-driven testing criteria for deep neural networks[C] //Proc of the 1st IEEE Int Conf on Artificial Intelligence Testing. Piscataway, NJ: IEEE, 2019: 119−120
[19] Ma Lei, Juefei-Xu F, Xue Minhui, et al. DeepCT: Tomographic combinatorial testing for deep learning systems[C] //Proc of the 26th Int Conf on Software Analysis, Evolution and Reengineering. Piscataway, NJ: IEEE, 2019: 614−618
[20] Du Xiaoning, Xie Xiaofei, Li Yi, et al. DeepStellar: Model-based quantitative analysis of stateful deep learning systems[C] //Proc of the 27th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2019: 477−487
[21] Li Duo, Dong Chaoqun, Si Pinchao, et al. Survey of research on neural network verification and testing technology[J]. Computer Engineering and Applications, 2021, 57(22): 53−67 (in Chinese)
[22] Bach S, Binder A, Montavon G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation[J]. PLoS ONE, 2015, 10(7): 1−46
[23] Ji Shouling, Li Jinfeng, Du Tianyu, et al. Survey on techniques, applications and security of machine learning interpretability[J]. Journal of Computer Research and Development, 2019, 56(10): 2071−2096 (in Chinese)
[24] Mu Yanzhou, Wang Zan, Chen Xiang, et al. A deep learning test optimization method using multi-objective optimization[J]. Journal of Software, 2022, 33(7): 2499−2524 (in Chinese)
[25] LeCun Y, Cortes C. The MNIST database of handwritten digits[EB/OL]. [2022-06-10]. http://yann.lecun.com/exdb/mnist/
[26] Krizhevsky A, Nair V, Hinton G. The CIFAR-10 dataset[EB/OL]. [2022-06-10]. http://www.cs.toronto.edu/~kriz/cifar.html
[27] Xiao Han, Rasul K, Vollgraf R. Fashion-MNIST: A dataset of Zalando's article images[EB/OL]. [2022-06-10]. https://github.com/zalandoresearch/fashion-mnist
[28] Udacity. Dataset wiki[EB/OL]. [2022-06-10]. https://github.com/udacity/self-driving-car/tree/master/datasets
[29] Alber M, Lapuschkin S, Seegerer P, et al. iNNvestigate neural networks![J]. Journal of Machine Learning Research, 2019, 20(93): 1−8
[30] Zhou Zhiyang, Dou Wensheng, Liu Jie, et al. DeepCon: Contribution coverage testing for deep learning systems[C] //Proc of the 28th IEEE Int Conf on Software Analysis, Evolution and Reengineering. Piscataway, NJ: IEEE, 2021: 189−200
[31] Goodfellow I, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. arXiv preprint, arXiv:1412.6572, 2015
[32] Carlini N, Wagner D. Towards evaluating the robustness of neural networks[C] //Proc of the 38th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2017: 39−57
[33] Lee S, Cha S, Lee D, et al. Effective white-box testing of deep neural networks with adaptive neuron-selection strategy[C] //Proc of the 29th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2020: 165−176
[34] Zhang Pengcheng, Ren Bin, Dong Hai, et al. CAGFuzz: Coverage-guided adversarial generative fuzzing testing for image-based deep learning systems[J]. IEEE Transactions on Software Engineering, 2022, 48(11): 4630−4646
[35] Shen Weijun, Li Yanhui, Chen Lin, et al. Multiple-boundary clustering and prioritization to promote neural network retraining[C] //Proc of the 35th IEEE/ACM Int Conf on Automated Software Engineering. Piscataway, NJ: IEEE, 2020: 410−422