Anti-Mapping Representation Perturbation for RAG Sensitive Information Protection

谭学; 郑毅; 张云若; 陈怡然; 陈平; 薛向阳

doi:10.7544/issn1000-1239.202550713

Abstract: Retrieval-augmented generation (RAG) systems extend the capabilities of language models by integrating an external database. However, this augmentation introduces a new privacy vulnerability: mapping attacks (MA), which reveal whether a private fragment is indexed and how it is retrieved. No existing defense is designed specifically to counter such attacks. We introduce AMRP-SIP, a dual-randomization framework that protects both document embeddings and retrieval traces while preserving state-of-the-art utility. AMRP-SIP comprises three lightweight stages. First, a Random Orthogonal Projection compresses each query and document into a low-dimensional latent space, hiding the raw embeddings and reducing downstream noise. Second, Adaptive Differential Privacy injects cluster-adaptive Gaussian noise, guaranteeing fragment-level (ε, δ)-differential privacy. Third, a score-dropout layer perturbs similarity scores with noise and randomly drops retrieved documents with probability p, thereby obfuscating the retrieval trajectory. Experiments on Wiki-40B, PubMed, and IP-Database show that AMRP-SIP reduces the AUC of membership inference attacks (MIA) from 0.75 to 0.27 while keeping retrieval performance comparable to existing techniques, providing the first defense against mapping attacks for RAG systems.
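
The abstract names the three stages but gives no formulas, so the following is only a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation. Assumed here: the projection is a random k x d matrix with orthonormal rows built by Gram-Schmidt on Gaussian draws; the differential-privacy step is the classic Gaussian mechanism, sigma = Δ·sqrt(2 ln(1.25/δ))/ε (valid for ε < 1), applied to latent embeddings clipped to norm C so that Δ = 2C; "cluster-adaptive" is modeled as a hypothetical public per-cluster multiplier of at least 1; and the score-dropout layer adds Gaussian noise with a hypothetical scale `tau` to dot-product scores, keeps the top k, then drops each retrieved document with probability p. All function and parameter names are hypothetical.

```python
import math
import random


def random_orthogonal_projection(d: int, k: int, rng: random.Random) -> list[list[float]]:
    """Return a k x d matrix P with orthonormal rows (Gram-Schmidt on Gaussian draws).

    Because P has orthonormal rows, ||P x|| <= ||x|| for every x in R^d.
    """
    assert 0 < k <= d
    rows: list[list[float]] = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove the component along each row already chosen
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a (near-)degenerate draw
            rows.append([a / norm for a in v])
    return rows


def project(P: list[list[float]], x: list[float]) -> list[float]:
    """Map x in R^d to z = P x in R^k."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def clip(x: list[float], c: float) -> list[float]:
    """Scale x down so that its L2 norm is at most c."""
    norm = math.sqrt(sum(a * a for a in x))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [a * scale for a in x]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Classic Gaussian-mechanism noise scale, valid for 0 < epsilon < 1."""
    assert 0 < epsilon < 1 and 0 < delta < 1
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def privatize(z: list[float], clip_norm: float, epsilon: float, delta: float,
              multiplier: float, rng: random.Random) -> list[float]:
    """Release a noisy latent document embedding.

    After clipping, swapping one fragment moves z by at most 2 * clip_norm
    (the L2 sensitivity). `multiplier` is a hypothetical public per-cluster
    factor >= 1; adding more noise than the base scale keeps (epsilon, delta).
    """
    assert multiplier >= 1.0
    sigma = multiplier * gaussian_sigma(epsilon, delta, 2 * clip_norm)
    return [a + rng.gauss(0.0, sigma) for a in clip(z, clip_norm)]


def score_dropout_retrieve(q: list[float], docs: list[list[float]], top_k: int,
                           tau: float, p: float, rng: random.Random) -> list[int]:
    """Noisy dot-product scores, keep the top_k, then drop each with probability p."""
    noisy = sorted(
        ((sum(a * b for a, b in zip(q, d)) + rng.gauss(0.0, tau), i)
         for i, d in enumerate(docs)),
        reverse=True,
    )
    retrieved = [i for _, i in noisy[:top_k]]
    return [i for i in retrieved if rng.random() >= p]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 16, 4
    P = random_orthogonal_projection(d, k, rng)
    raw_docs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(10)]
    cluster_of = [i % 2 for i in range(10)]  # hypothetical cluster ids
    multipliers = {0: 1.0, 1: 1.5}           # hypothetical per-cluster factors
    docs = [privatize(project(P, x), 1.0, 0.5, 1e-5, multipliers[c], rng)
            for x, c in zip(raw_docs, cluster_of)]
    query = project(P, raw_docs[3])
    print(score_dropout_retrieve(query, docs, top_k=3, tau=0.1, p=0.2, rng=rng))
```

Since sigma applies per coordinate, the expected squared norm of the added noise is k·sigma², so compressing from d to k dimensions shrinks it by a factor of k/d; this is one plausible reading of "reducing downstream noise", not a claim about the paper's exact analysis.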