Guo Hongjing, Tao Chuanqi, Huang Zhiqiu. Surprise Adequacy-Guided Deep Neural Network Test Inputs Generation[J]. Journal of Computer Research and Development, 2024, 61(4): 1003-1017. DOI: 10.7544/issn1000-1239.202220745

Surprise Adequacy-Guided Deep Neural Network Test Inputs Generation

Funds: This work was supported by the Key Program of the National Natural Science Foundation of China (U224120044), the National Natural Science Foundation of China (62202223), the Natural Science Foundation of Jiangsu Province (BK20220881), the Open Fund Project of the State Key Laboratory for Novel Software Technology (KFKT2021B32), and the Fundamental Research Funds for the Central Universities (NT2022027).
More Information
  • Author Bio:

    Guo Hongjing: born in 1996. PhD candidate. Student member of CCF. Her main research interest is intelligent software testing

    Tao Chuanqi: born in 1984. PhD, associate professor. Senior member of CCF. His main research interests include intelligent software development and quality assurance of intelligent software

    Huang Zhiqiu: born in 1965. PhD, professor. Distinguished member of CCF. His main research interests include software quality assurance, system safety, and formal methods

  • Received Date: August 23, 2022
  • Revised Date: August 14, 2023
  • Available Online: January 24, 2024
  • Abstract: Due to the complexity and uncertainty of deep neural network (DNN) models, generating test inputs that comprehensively exercise both the general and the corner-case behaviors of a model is of great significance for ensuring model quality. Current research primarily focuses on designing coverage criteria and applying fuzz testing techniques to generate test inputs, thereby improving test adequacy. However, few studies take into consideration the diversity and the individual fault-revealing ability of test inputs. Surprise adequacy quantifies the difference in neuron activations between a test input and the training set. It is an important measure of test adequacy, yet it has not been leveraged for test input generation. Therefore, we propose a surprise adequacy-guided test input generation approach. Firstly, the approach selects important neurons that contribute most to decision-making and uses their activation values as features to improve the surprise adequacy metric. Then, seed test inputs with error-revealing capability are selected based on the improved surprise adequacy measurements. Finally, the approach follows the idea of coverage-guided fuzz testing to jointly optimize the surprise adequacy values of test inputs and the prediction probability differences among classes, adopting a gradient ascent algorithm to compute perturbations and iteratively generate test inputs. Empirical studies on 5 DNN models covering 4 image datasets demonstrate that the improved surprise adequacy metric effectively captures surprising test inputs while reducing computation time. Regarding test input generation, compared with DeepGini and RobOT, the follow-up test sets generated with the proposed seed input selection strategy improve surprise coverage by up to 5.9% and 15.9%, respectively. Compared with DLFuzz and DeepXplore, the proposed approach improves surprise coverage by up to 26.5% and 33.7%, respectively.
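
    As a rough illustration of the surprise adequacy idea described above, the sketch below computes likelihood-based surprise adequacy (LSA) over a subset of important neurons. It is a minimal Python sketch under stated assumptions, not the authors' implementation: the importance ranking here is a placeholder (mean absolute activation), whereas the paper derives importance from each neuron's contribution to decision-making, and the activations are synthetic stand-ins for a real DNN layer.

    import numpy as np
    from scipy.stats import gaussian_kde

    def select_important_neurons(train_acts, k):
        """Placeholder importance ranking (assumption): keep the k neurons
        with the highest mean absolute activation over the training set."""
        scores = np.abs(train_acts).mean(axis=0)
        return np.argsort(scores)[-k:]

    def fit_surprise_model(train_acts, important_idx):
        """Fit a kernel density estimate on the training activations of the
        selected neurons; gaussian_kde expects shape (n_dims, n_samples)."""
        feats = train_acts[:, important_idx]
        return gaussian_kde(feats.T)

    def likelihood_surprise(kde, acts, important_idx, eps=1e-30):
        """LSA(x) = -log density of x's selected activations under the KDE;
        higher values mean the input is more surprising to the model."""
        feats = acts[:, important_idx]
        return -np.log(np.maximum(kde(feats.T), eps))

    # Synthetic demo: shifted test activations should score as more surprising.
    rng = np.random.default_rng(0)
    train_acts = rng.normal(size=(1000, 64))        # 1000 training inputs, 64 neurons
    test_acts = rng.normal(loc=2.0, size=(10, 64))  # distribution shift
    idx = select_important_neurons(train_acts, k=8)
    kde = fit_surprise_model(train_acts, idx)
    print(likelihood_surprise(kde, test_acts, idx))

    In the approach summarized in the abstract, such surprise scores would then guide seed input selection and serve as one term of the fuzzing objective maximized by gradient ascent.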

  • [1]
    The New York Times. After fatal Uber crash, a self-driving start-up moves forward[EB/OL]. [2022-06-10]. https://www.nytimes.com/2018/05/07/technology/uber-crash-autonomous-driveai.html
    [2]
    Zhang Jie, Harman M, Ma Lei, et al. Machine learning testing: Survey, landscapes and horizons[J]. IEEE Transactions on Software Engineering, 2022, 48(1): 1−36
    [3]
    Huang Xiaowei, Kroening D, Ruan Wenjie, et al. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability[J]. Computer Science Review, 2020, 37: 100270
    [4]
    Wang Zan, Yan Ming, Liu Shuang, et al. Survey on testing of deep neural networks[J]. Journal of Software, 2020, 31(5): 1255−1275 (in Chinese)
    [5]
    Xie Xiaofei, Ma Lei, Juefei-Xu F, et al. DeepHunter: A coverage guided fuzz testing framework for deep neural networks[C] //Proc of the 28th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2019: 146−157
    [6]
    Dai Hepeng, Sun Chang’ai, Jin Hui, et al. State-of-the-art survey of fuzzing for deep learning systems[J]. Journal of Software, 2023, 34(11): 5008−5028 (in Chinese)
    [7]
    Pei Kexin, Cao Yinzhi, Yang Junfeng, et al. DeepXplore: Automated whitebox testing of deep learning systems[C] //Proc of the 26th Symp on Operating Systems Principles. New York: ACM, 2017: 1−18
    [8]
    Guo Jianmin, Jiang Yu, Zhao Yue, et al. DLFuzz: Differential fuzzing testing of deep learning systems[C] //Proc of the 26th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2018: 739−743
    [9]
    Kim J, Feldt R, Yoo S. Guiding deep learning system testing using surprise adequacy[C] //Proc of the 41st Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2019: 1039−1049
    [10]
    Kim J, Ju J, Feldt R, et al. Reducing DNN labelling cost using surprise adequacy: An industrial case study for autonomous driving[C] //Proc of the 28th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2020: 1466−1476
    [11]
    Kim S, Yoo S. Evaluating surprise adequacy for question answering[C] //Proc of the 42nd Int Conf on Software Engineering Workshops. New York: ACM, 2020: 197−202
    [12]
    Weiss M, Chakraborty R, Tonella P. A review and refinement of surprise adequacy[C] //Proc of the 3rd IEEE/ACM Int Workshop on Deep Learning for Testing and Testing for Deep Learning. Piscataway, NJ: IEEE, 2021: 17−24
    [13]
    Gerasimou S, Eniser H, Sen A, et al. Importance-driven deep learning system testing[C] //Proc of the 42nd ACM/IEEE Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2020: 702−713
    [14]
    Xie Xiaofei, Li Tianlin, Wang Jian, et al. NPC: Neuron path coverage via characterizing decision logic of deep neural networks[J]. ACM Transactions on Software Engineering and Methodology, 2022, 31(3): 47:1−47:27
    [15]
    Ma Lei, Juefei-Xu F, Zhang Fuyuan, et al. DeepGauge: Multi-granularity testing criteria for deep learning systems[C] //Proc of the 33rd ACM/IEEE Int Conf on Automated Software Engineering. New York: ACM, 2018: 120−131
    [16]
    Feng Yang, Shi Qingkai, Gao Xinyu, et al. DeepGini: Prioritizing massive tests to enhance the robustness of deep neural networks[C] //Proc of the 29th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2020: 177−188
    [17]
    Wang Jingyi, Chen Jialuo, Sun Youcheng, et al. RobOT: Robustness-oriented testing for deep learning systems[C] //Proc of the 43rd Int Conf on Software Engineering. Piscataway, NJ: IEEE, 2021: 300−311
    [18]
    Wang Dong, Wang Ziyuan, Fang Chunrong, et al. DeepPath: Path-driven testing criteria for deep neural networks[C] //Proc of the 1st IEEE Int Conf on Artificial Intelligence Testing. Piscataway, NJ: IEEE, 2019: 119−120
    [19]
    Ma Lei, Juefei-Xu F, Xue Minhui, et al. DeepCT: Tomographic combinatorial testing for deep learning systems[C] //Proc of the 26th Int Conf on Software Analysis, Evolution and Reengineering. Piscataway, NJ: IEEE, 2019: 614−618
    [20]
    Du Xiaoning, Xie Xiaofei, Li Yi, et al. DeepStellar: Model-based quantitative analysis of stateful deep learning systems[C] //Proc of the 27th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2019: 477−487
    [21]
    Li Duo, Dong Chaoqun, Si Pinchao, et al. Survey of research on neural network verification and testing technology[J]. Computer Engineering and Applications, 2021, 57(22): 53−67 (in Chinese)
    [22]
    Bach S, Binder A, Montavon G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation[J]. PLoS ONE, 2015, 10(7): e0130140
    [23]
    Ji Shouling, Li Jinfeng, Du Tianyu, et al. Survey on techniques, applications and security of machine learning interpretability[J]. Journal of Computer Research and Development, 2019, 56(10): 2071−2096 (in Chinese)
    [24]
    Mu Yanzhou, Wang Zan, Chen Xiang, et al. A deep learning test optimization method using multi-objective optimization[J]. Journal of Software, 2022, 33(7): 2499−2524 (in Chinese)
    [25]
    LeCun Y, Cortes C. The MNIST database of handwritten digits[EB/OL]. [2022-06-10]. http://yann.lecun.com/exdb/mnist/
    [26]
    Krizhevsky A, Nair V, Hinton G. The CIFAR-10 dataset[EB/OL]. [2022-06-10]. http://www.cs.toronto.edu/~kriz/cifar.html
    [27]
    Xiao Han, Rasul K, Vollgraf R. Fashion-MNIST: A dataset of Zalando’s article images[EB/OL]. [2022-06-10]. https://github.com/zalandoresearch/fashion-mnist
    [28]
    Udacity. Dataset wiki[EB/OL]. [2022-06-10]. https://github.com/udacity/self-driving-car/tree/master/datasets
    [29]
    Alber M, Lapuschkin S, Seegerer P, et al. iNNvestigate neural networks![J]. Journal of Machine Learning Research, 2019, 20(93): 1−8
    [30]
    Zhou Zhiyang, Dou Wensheng, Liu Jie, et al. DeepCon: Contribution coverage testing for deep learning systems[C] //Proc of the 28th IEEE Int Conf on Software Analysis, Evolution and Reengineering. Piscataway, NJ: IEEE, 2021: 189−200
    [31]
    Goodfellow I, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. arXiv preprint, arXiv: 1412.6572, 2015
    [32]
    Carlini N, Wagner D. Towards evaluating the robustness of neural networks[C] //Proc of the 38th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2017: 39−57
    [33]
    Lee S, Cha S, Lee D, et al. Effective white-box testing of deep neural networks with adaptive neuron-selection strategy[C] //Proc of the 29th ACM SIGSOFT Int Symp on Software Testing and Analysis. New York: ACM, 2020: 165−176
    [34]
    Zhang Pengcheng, Ren Bin, Dong Hai, et al. CAGFuzz: Coverage-guided adversarial generative fuzzing testing for image-based deep learning systems[J]. IEEE Transactions on Software Engineering, 2022, 48(11): 4630−4646
    [35]
    Shen Weijun, Li Yanhui, Chen Lin, et al. Multiple-boundary clustering and prioritization to promote neural network retraining[C] //Proc of the 35th IEEE/ACM Int Conf on Automated Software Engineering. Piscataway, NJ: IEEE, 2020: 410−422
