The paper introduces MEntA, a query-efficient, surrogate-free membership inference attack that uses natural-language entailment to detect whether a specific document is in a RAG system's retrieval corpus, achieving up to 0.991 AUC with only five queries.
Retrieval-augmented generation (RAG) has become central to large language model (LLM) deployments, grounding responses in enterprise or proprietary data to reduce hallucinations. However, this design introduces a new privacy risk: model outputs may signal the presence of specific documents in the retrieval corpus, enabling membership inference attacks (MIAs) that leak sensitive information. Existing MIAs are feasible, but they often rely on easily detected templated queries or require many non-templated yet costly and repetitive queries, limiting practicality. We ask: Can an adversary launch a limited-budget, surrogate-free, stealthy, and defense-agnostic membership inference attack using non-templated queries? We present MEntA (Membership Entailment Attack), a query-efficient MIA that leverages natural-language entailment to maximize information gained per query. By asking low-cost, broad, information-seeking questions and measuring entailment between model responses and candidate documents, MEntA eliminates the need for costly shadow models and large query budgets. Across NFCorpus, SCIDOCS, and TREC-COVID, MEntA achieves up to 0.991 AUC with only 5 queries, outperforming prior methods by up to 0.42 AUC under equivalent conditions. It remains effective under state-of-the-art (SOTA) RAG defenses, while current detectors either miss MEntA or flag benign queries at high rates. Regarding cost, MEntA reduces total attack cost by up to 65$\times$ compared to SOTA attacks under the same attack setting. Our findings expose the feasibility of realistic, low-cost privacy leakage in RAG systems and highlight the urgent need for privacy-aware retrieval and defense mechanisms.
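The scoring idea in the abstract (query the system a few times, then measure entailment between the responses and a candidate document) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_entailment` is a hypothetical token-overlap stand-in for a real NLI model, the choice of document as premise and response as hypothesis is an assumption, and `membership_score` and `auc` are illustrative names.

```python
from typing import Callable, List


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model.

    Returns the fraction of hypothesis tokens that also occur in the premise.
    A real attack would use an actual entailment classifier here.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def membership_score(
    document: str,
    responses: List[str],
    entail: Callable[[str, str], float] = toy_entailment,
) -> float:
    """Average how strongly the candidate document entails each RAG response.

    Assumption: document is the premise, response is the hypothesis.
    A higher score suggests the document was in the retrieval corpus.
    """
    if not responses:
        return 0.0
    return sum(entail(document, r) for r in responses) / len(responses)


def auc(member_scores: List[float], non_member_scores: List[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win (the standard pairwise AUC definition).
    """
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))
```

In this sketch the query budget is simply the length of `responses`, so a five-query attack would pass five responses per candidate document and evaluate the resulting scores with `auc` over known members and non-members.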
A Critical Review on the Effectiveness and Privacy Threats of Membership Inference Attacks
The paper proposes a new evaluation framework showing that, under realistic cond…
PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Gener…
The paper introduces PIDP-Attack, a novel compound adversarial attack that combi…
Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents
The paper introduces AutoMIA, a novel framework that uses LLM agents to automate…
Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Bench…
This paper provides the first comprehensive, end-to-end survey dedicated to the…
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retriev…
The paper systematically evaluates advanced retrieval-augmented generation (RAG)…
Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions
This paper proposes a comprehensive taxonomy (SLOT) to systematically categorize…
Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from th…
This paper empirically evaluates the effectiveness of Differential Privacy (DP)…
ReproMIA: A Comprehensive Analysis of Model Reprogramming for Proactive Membership Inference Attacks
The paper introduces ReproMIA, a novel and efficient framework that uses model r…