Abstract:
Large language models (LLMs), a cornerstone technology in natural language processing (NLP), have demonstrated exceptional capabilities in text generation, information retrieval, and conversational systems. Models such as ChatGPT, LLaMA, and Gemini have been applied across fields including healthcare, education, and finance, achieving near-human or even superhuman performance. However, as LLMs are widely deployed, their storage mechanisms face significant security and privacy risks throughout the model lifecycle. Core storage modules, including model file storage, inference caching, and knowledge vector storage, underpin the functionality and efficiency of LLMs but also introduce new vulnerabilities and attack surfaces. For example, model file storage is exposed to weight leakage and backdoor injection, inference caching is susceptible to side-channel attacks, and knowledge vector storage is vulnerable to data poisoning and workflow hijacking. To address these emerging challenges, researchers have proposed defense strategies such as model encryption, backdoor detection, cache partitioning, and content filtering. Although these techniques have improved security to some extent, LLM storage still faces a range of critical challenges. This paper systematically reviews security risks and defense mechanisms for LLM storage across its lifecycle, focusing on attack surfaces and corresponding mitigation strategies. It identifies the limitations of current approaches and highlights future research directions for enhancing the security and reliability of LLM storage mechanisms.