    Mi Zhang, Xudong Pan, Min Yang. JADE-DB: A Universal Testing Benchmark for Large Language Model Safety based on Targeted Mutation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202330959

    JADE-DB: A Universal Testing Benchmark for Large Language Model Safety based on Targeted Mutation

    This paper proposes JADE-DB, a universal safety testing benchmark for large language models (LLMs). The benchmark is constructed automatically via targeted mutation, an approach that converts test questions manually crafted by experienced LLM testers and multidisciplinary experts into highly threatening universal test questions. The converted questions preserve both the naturalness of human language and the core semantics of the original questions, while consistently breaking more than ten widely used LLMs. Organized by increasing linguistic complexity, JADE-DB incorporates three levels of LLM safety testing (basic, advanced, and dangerous), comprising thousands of test questions that cover four major categories of unsafe generation (crime, tort, bias, and core values) and span more than 30 unsafe topics. Specifically, we construct three dangerous-level safety benchmarks, one for each of three groups of LLMs: eight open-source Chinese LLMs, six commercial Chinese LLMs, and four commercial English LLMs. The benchmarks simultaneously trigger harmful generation in multiple LLMs, with an average unsafe generation ratio of 70%. The results indicate that, owing to the complexity of human language, most of the current best LLMs can hardly learn the unbounded variety of its syntactic structures, and thus can hardly recognize the invariant malicious intent expressed through them.