

    A Trusted Evaluation System for Safe Deployment of Large Language Models


      Abstract: The recent popularity of large language models (LLMs) has had a significant impact on countless fields, particularly through their open-ended ecosystem of APIs, open-source models, and plugins. However, despite their widespread deployment, research that thoroughly discusses and analyzes the potential risks they conceal remains scarce. We therefore conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLM systems. With most of the related literature in the era of LLMs still uncharted, we propose an automated workflow that copes with a large volume of queries and responses. Overall, we issue over a million queries to mainstream LLMs, including ChatGPT, LLaMA, and OPT. At the core of our workflow is a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, perhaps unfortunate, conclusions that are quite uncommon in this trendy community. Briefly, they are: 1) minor but inevitable errors in user-generated query input may, by chance, cause an LLM to respond unexpectedly; 2) LLMs exhibit poor consistency when processing semantically similar query inputs. In addition, as a side finding, we observe that ChatGPT can still yield the correct answer even when the input is polluted to an extreme degree. While this phenomenon demonstrates the powerful memorization of LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To address this, we propose a novel index associated with a dataset that roughly determines the feasibility of using that data for LLM-involved evaluation. Extensive empirical studies are provided to support the aforementioned claims.
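
      To make the described workflow concrete, the following is a minimal, self-contained Python sketch of the perturb-and-compare idea: inject small, typo-like errors into a query, issue the clean and perturbed variants to a model, and score how often the answers agree. Everything here is a hypothetical stand-in for illustration only: query_llm is any user-supplied callable, and the character-substitution perturbation, the exact-match scoring, and the contamination_index function are naive readings of the paper's data primitive, automated interpreter, and proposed dataset index, none of which is reproduced from the paper.

import random
import string


def perturb(query: str, rate: float = 0.05, seed: int = 0) -> str:
    # Inject minor, typo-like character substitutions into a query.
    rng = random.Random(seed)
    chars = list(query)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)


def consistency(responses: list) -> float:
    # Fraction of responses that match the modal (most common) answer.
    if not responses:
        return 0.0
    normalized = [r.strip().lower() for r in responses]
    modal = max(set(normalized), key=normalized.count)
    return normalized.count(modal) / len(normalized)


def evaluate(query_llm, query: str, n_variants: int = 5) -> dict:
    # Issue one clean query plus n perturbed variants, then score:
    #   robustness  = agreement between clean and perturbed answers,
    #   consistency = agreement among the perturbed answers themselves.
    clean = query_llm(query).strip().lower()
    variants = [query_llm(perturb(query, seed=s)) for s in range(n_variants)]
    agree = sum(v.strip().lower() == clean for v in variants)
    return {
        "robustness": agree / n_variants,
        "consistency": consistency(variants),
    }


def contamination_index(query_llm, qa_pairs, rate: float = 0.5) -> float:
    # Naive feasibility index: accuracy on heavily polluted queries.
    # High accuracy under extreme pollution suggests the model has
    # memorized the data, making it a poor basis for evaluation.
    hits = sum(
        query_llm(perturb(q, rate=rate)).strip().lower() == a.strip().lower()
        for q, a in qa_pairs
    )
    return hits / len(qa_pairs)


if __name__ == "__main__":
    # Stand-in "model" that always returns a canned answer.
    fake_llm = lambda q: "Paris"
    print(evaluate(fake_llm, "What is the capital of France?"))
    print(contamination_index(fake_llm, [("Capital of France?", "Paris")]))

      Under this sketch, a robustness score near 1.0 means perturbed queries rarely change the answer, while a high contamination_index on heavily polluted inputs hints that the model has memorized the data, echoing the abstract's concern about using such data for LLM-involved evaluation.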

