Citation: Ye Wentao, Hu Jiaqi, Wang Haobo, Chen Gang, Zhao Junbo. A Trusted Evaluation System for Safe Deployment of Large Language Models[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440566
The recent surge of large language models (LLMs) has affected a wide range of fields, particularly through their open ecosystem of APIs, open-source models, and plugins. However, despite their widespread deployment, there is a general lack of research that thoroughly discusses and analyzes the potential risks concealed in them. We therefore conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLM systems. Since most of the related literature in the era of LLMs remains uncharted, we propose an automated workflow that copes with a large volume of queries and responses. In total, we issue over a million queries to mainstream LLMs including ChatGPT, LLaMA, and OPT. The core of our workflow consists of a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, perhaps unfortunate, conclusions that are quite uncommon in this otherwise enthusiastic community. Briefly, they are: 1) minor but inevitable errors in user-generated query input may, by chance, cause an LLM to respond unexpectedly; 2) LLMs exhibit poor consistency when processing semantically similar query inputs. In addition, as a side finding, we observe that ChatGPT is still capable of yielding the correct answer even when the input is polluted to an extreme degree. While this phenomenon demonstrates the powerful memorization of LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To address this, we propose a novel index associated with a dataset that roughly determines the feasibility of using that dataset for LLM-involved evaluation. Extensive empirical studies are provided to support the aforementioned claims.
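The two main findings above — sensitivity to minor input errors and inconsistency across semantically similar queries — can be illustrated with a minimal sketch of the evaluation idea. The code below is a hypothetical illustration, not the authors' workflow: it injects character-level typos into a query at a given rate and scores the agreement of the resulting responses with a simple pairwise string-similarity measure; `query_llm` is a stand-in stub for a real LLM API call.

```python
import random
import difflib

def perturb(query: str, rate: float = 0.05, seed: int = 0) -> str:
    """Inject character-level noise by swapping adjacent letters at the given rate."""
    rng = random.Random(seed)
    chars = list(query)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def consistency(responses: list[str]) -> float:
    """Mean pairwise string similarity of a set of responses (1.0 = identical)."""
    pairs = [(a, b) for i, a in enumerate(responses) for b in responses[i + 1:]]
    if not pairs:
        return 1.0
    return sum(difflib.SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Hypothetical stand-in for a real LLM API call (e.g., over HTTP).
def query_llm(prompt: str) -> str:
    return prompt.lower()

query = "What is the capital of France?"
variants = [perturb(query, rate=0.1, seed=s) for s in range(5)]
responses = [query_llm(v) for v in variants]
score = consistency(responses)
print(f"consistency score over perturbed queries: {score:.3f}")
```

A robustness evaluation in this spirit would sweep the perturbation rate upward and watch how quickly the consistency score (or task accuracy) degrades; a semantically faithful measure would replace the string-similarity stub with an embedding-based one.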
[1] Shu Wentao, Li Ruixiao, Sun Tianxiang, et al. Large language models: Principles, implementation, and progress[J]. Journal of Computer Research and Development, 2024, 61(2): 351−361 (in Chinese) doi: 10.7544/issn1000-1239.202330303
[2] Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models[J]. arXiv preprint, arXiv: 2302.13971, 2023
[3] Chen Huimin, Liu Zhiyuan, Sun Maosong. The social opportunities and challenges in the era of large language models[J]. Journal of Computer Research and Development, 2024, 61(5): 1094−1103 (in Chinese) doi: 10.7544/issn1000-1239.202330700
[4] Zhong Qihuang, Ding Liang, Liu Juhua, et al. Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT[J]. arXiv preprint, arXiv: 2302.10198, 2023
[5] Qin Chengwei, Zhang A, Zhang Zhuosheng, et al. Is ChatGPT a general-purpose natural language processing task solver?[C]// Proc of the 2023 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2023: 1339–1384
[6] Huang Fan, Kwak H, An Jisun. Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech[C]// Companion Proc of the ACM Web Conf 2023. New York: ACM, 2023: 294−297
[7] Kocoń J, Cichecki I, Kaszyca O, et al. ChatGPT: Jack of all trades, master of none[J]. Information Fusion, 2023, 99: 101861
[8] Yang Xianjun, Li Yan, Zhang Xinlu, et al. Exploring the limits of ChatGPT for query or aspect-based text summarization[J]. arXiv preprint, arXiv: 2302.08081, 2023
[9] Jia R, Liang P. Adversarial examples for evaluating reading comprehension systems[C]// Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 2021–2031
[10] Gao Ji, Lanchantin J, Soffa M L, et al. Black-box generation of adversarial text sequences to evade deep learning classifiers[C]// Proc of the 2018 IEEE Security and Privacy Workshops (SPW). Piscataway, NJ: IEEE, 2018: 50−56
[11] Belinkov Y, Bisk Y. Synthetic and natural noise both break neural machine translation[J]. arXiv preprint, arXiv: 1711.02173, 2017
[12] Heigold G, Neumann G, van Genabith J. How robust are character-based word embeddings in tagging and MT against wrod scramlbing or randdm nouse?[C]// Proc of the 13th Conf of the Association for Machine Translation in the Americas (Volume 1: Research Track). Stroudsburg, PA: Association for Machine Translation in the Americas, 2018: 68−80
[13] Eger S, Şahin G G, Rücklé A, et al. Text processing like humans do: Visually attacking and shielding NLP systems[C]// Proc of the 2019 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 1634−1647
[14] Papernot N, McDaniel P, Swami A, et al. Crafting adversarial input sequences for recurrent neural networks[C]// Proc of the 35th IEEE Military Communications Conf (MILCOM 2016). Piscataway, NJ: IEEE, 2016: 49−54
[15] Alzantot M, Sharma Y, Elgohary A, et al. Generating natural language adversarial examples[C]// Proc of the 2018 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2018: 2890–2896
[16] Zhao Zhengli, Dua D, Singh S. Generating natural adversarial examples[J]. arXiv preprint, arXiv: 1710.11342, 2017
[17] Iyyer M, Wieting J, Gimpel K, et al. Adversarial example generation with syntactically controlled paraphrase networks[C]// Proc of the 2018 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 1875–1885
[18] Li Linyang, Shao Yunfan, Song Demin, et al. Generating adversarial examples in Chinese texts using sentence-pieces[J]. arXiv preprint, arXiv: 2012.14769, 2020
[19] Li Jinfeng, Ji Shouling, Du Tianyu, et al. TextBugger: Generating adversarial text against real-world applications[J]. arXiv preprint, arXiv: 1812.05271, 2018
[20] Zang Yuan, Qi Fanchao, Yang Chenghao, et al. Word-level textual adversarial attacking as combinatorial optimization[C]// Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 6066–6080
[21] Ebrahimi J, Rao A, Lowd D, et al. HotFlip: White-box adversarial examples for text classification[C]// Proc of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA: ACL, 2018: 31–36
[22] Li Linyang, Ma Ruotian, Guo Qipeng, et al. BERT-ATTACK: Adversarial attack against BERT using BERT[C]// Proc of the 2020 Conf on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: ACL, 2020: 6193–6202
[23] Wallace E, Feng S, Kandpal N, et al. Universal adversarial triggers for attacking and analyzing NLP[C]// Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 2153–2162
[24] Garg S, Ramakrishnan G. BAE: BERT-based adversarial examples for text classification[C]// Proc of the 2020 Conf on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: ACL, 2020: 6174–6181
[25] Altinisik E, Sajjad H, Sencar H T, et al. Impact of adversarial training on robustness and generalizability of language models[C]// Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA: ACL, 2023: 7828–7840
[26] Moradi M, Samwald M. Evaluating the robustness of neural language models to input perturbations[C]// Proc of the 2021 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2021: 1558–1570
[27] Stolfo A, Jin Zhijing, Shridhar K, et al. A causal framework to quantify the robustness of mathematical reasoning with language models[C]// Proc of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2023: 545–561
[28] Zhang Yunxiang, Pan Liangming, Tan S, et al. Interpreting the robustness of neural NLP models to textual perturbations[C]// Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg, PA: ACL, 2022: 3993−4007
[29] Chen Xuanting, Ye Junjie, Zu Can, et al. Robustness of GPT large language models on natural language processing tasks[J]. Journal of Computer Research and Development, 2024, 61(5): 1128−1142 (in Chinese) doi: 10.7544/issn1000-1239.202330801
[30] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]// Proc of the 2019 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171–4186
[31] Yang Zhilin, Dai Zihang, Yang Yiming, et al. XLNet: Generalized autoregressive pretraining for language understanding[C]// Proc of the 33rd Int Conf on Neural Information Processing Systems. New York: Curran Associates, 2019: 5753−5763
[32] Wang J, Hu X, Hou W, et al. On the robustness of ChatGPT: An adversarial and out-of-distribution perspective[J]. arXiv preprint, arXiv: 2302.12095, 2023
[33] Brendel W, Rauber J, Bethge M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models[J]. arXiv preprint, arXiv: 1712.04248, 2017
[34] Neekhara P, Hussain S, Dubnov S, et al. Adversarial reprogramming of text classification neural networks[C]// Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 5216–5225
[35] Hambardzumyan K, Khachatrian H, May J. WARP: Word-level adversarial reprogramming[C]// Proc of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th Int Joint Conf on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2021: 4921–4933
[36] Wallace E, Feng S, Kandpal N, et al. Universal adversarial triggers for attacking and analyzing NLP[C]// Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 2153–2162
[37] Madry A, Makelov A, Schmidt L, et al. Towards deep learning models resistant to adversarial attacks[J]. arXiv preprint, arXiv: 1706.06083, 2017
[38] Geva M, Khashabi D, Segal E, et al. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 346−361
[39] Ling Wang, Yogatama D, Dyer C, et al. Program induction by rationale generation: Learning to solve and explain algebraic word problems[C]// Proc of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2017: 158−167
[40] Onoe Y, Zhang M J, Choi E, et al. CREAK: A dataset for commonsense reasoning over entity knowledge[J]. arXiv preprint, arXiv: 2109.01653, 2021
[41] Zhang Qiyuan, Wang Lei, Yu Sicheng, et al. NOAHQA: Numerical reasoning with interpretable graph question answering dataset[C]// Findings of the Association for Computational Linguistics: EMNLP 2021. Stroudsburg, PA: ACL, 2021: 4147–4161
[42] Cobbe K, Kosaraju V, Bavarian M, et al. Training verifiers to solve math word problems[J]. arXiv preprint, arXiv: 2110.14168, 2021
[43] Dodge J, Gane A, Zhang Xiang, et al. Evaluating prerequisite qualities for learning end-to-end dialog systems[J]. arXiv preprint, arXiv: 1511.06931, 2015
[44] Gaunt A L, Johnson M A, Riechert M, et al. AMPNet: Asynchronous model-parallel training for dynamic neural networks[J]. arXiv preprint, arXiv: 1705.09786, 2017
[45] Khot T, Clark P, Guerquin M, et al. QASC: A dataset for question answering via sentence composition[C]// Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 8082−8090
[46] Aggarwal S, Mandowara D, Agrawal V, et al. Explanations for CommonsenseQA: New dataset and models[C]// Proc of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th Int Joint Conf on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2021: 3050−3065
[47] Camburu O M, Rocktäschel T, Lukasiewicz T, et al. e-SNLI: Natural language inference with natural language explanations[C]// Proc of the 32nd Int Conf on Neural Information Processing Systems. New York: Curran Associates, 2018: 9560−9572
[48] Wang Cunxiang, Liang Shuailong, Zhang Yue, et al. Does it make sense? And why? A pilot study for sense making and explanation[C]// Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4020–4026
[49] Lamm M, Palomaki J, Alberti C, et al. QED: A framework and dataset for explanations in question answering[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 790−806
[50] Brill E. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging[J]. Computational Linguistics, 1995, 21(4): 543−565
[51] Brown T B, Mann B, Ryder N, et al. Language models are few-shot learners[C]// Proc of the 34th Int Conf on Neural Information Processing Systems. New York: Curran Associates, 2020: 1877−1901
[52] Wang A, Singh A, Michael J, et al. GLUE: A multi-task benchmark and analysis platform for natural language understanding[C]// Proc of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA: ACL, 2018: 353–355
[53] Wang Boxin, Xu Chejian, Wang Shuohang, et al. Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models[J]. arXiv preprint, arXiv: 2111.02840, 2021
[54] Nie Yixin, Williams A, Dinan E, et al. Adversarial NLI: A new benchmark for natural language understanding[C]// Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 4885–4901
[55] Miller G A. WordNet: A lexical database for English[J]. Communications of the ACM, 1995, 38(11): 39−41 doi: 10.1145/219717.219748
[56] Jabri A, Joulin A, van der Maaten L. Revisiting visual question answering baselines[C]// Proc of the 14th European Conf on Computer Vision. Cham: Springer International Publishing, 2016: 727−739
[57] Nguyen-Son H Q, Thao T P, Hidano S, et al. Identifying adversarial sentences by analyzing text complexity[C]// Proc of the 33rd Pacific Asia Conf on Language, Information and Computation. Tokyo: Waseda Institute for the Study of Language and Information, 2019: 182−190
[58] Goodman D, Zhonghou L, et al. FastWordBug: A fast method to generate adversarial text against NLP applications[J]. arXiv preprint, arXiv: 2002.00760, 2020
[59] Zheng Xiaoqing, Zeng Jiehang, Zhou Yi, et al. Evaluating and enhancing the robustness of neural network-based dependency parsing models with adversarial examples[C]// Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 6600–6610
[60] Xue Mingfu, Yuan Chengxiang, Wang Jian, et al. DPAEG: A dependency parse-based adversarial examples generation method for intelligent Q&A robots[J]. Security and Communication Networks, 2020, 2020(1): 5890820
[61] Wang Wenqi, Wang Run, Wang Lina, et al. Towards a robust deep neural network against adversarial texts: A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 35(3): 3159−3179
[62] Rai A, Borah S. Study of various methods for tokenization[C]// Applications of Internet of Things: Proc of ICCCIOT 2020. Singapore: Springer Singapore, 2021: 193−200
[63] Almeida F, Xexéo G. Word embeddings: A survey[J]. arXiv preprint, arXiv: 1901.09069, 2019
[64] Zhang S, Roller S, Goyal N, et al. OPT: Open pre-trained transformer language models[J]. arXiv preprint, arXiv: 2205.01068, 2022
[65] Ye Wentao, Hu Jiaqi, Li Liyao, et al. Data contamination calibration for black-box LLMs[C]// Findings of the Association for Computational Linguistics: ACL 2024. Stroudsburg, PA: ACL, 2024: 10845–10861
[66] Zha Liangyu, Zhou Junlin, Li Liyao, et al. TableGPT: Towards unifying tables, natural language and commands into one GPT[J]. arXiv preprint, arXiv: 2307.08674, 2023