Reranker Helps, but Not Enough: Towards Strong Poisoning Attacks Against Retrieval-Augmented Generation
Xiaokun Yang ⋅ Yesheng Liu ⋅ Xin Xiong ⋅ Jian Liang ⋅ Ran He ⋅ Tieniu Tan
Abstract
Retrieval-Augmented Generation (RAG) augments large language models with external knowledge, which in turn exposes their retrieval corpora to data poisoning risks. However, existing poisoning attacks exhibit limited effectiveness against RAG systems equipped with a reranker to enhance retrieval quality. Remarkably, this defensive capability requires no adversarial training: a reranker fine-tuned solely on benign, in-domain corpora can effectively filter out malicious content. Towards realistic RAG red-teaming, we distill practical prompt design principles that reveal reranker blind spots. Building on these insights, we introduce the Prompt-Perturbation Poisoning Attack ($\mathbf{P}^3 \mathbf{A}$). $\mathbf{P}^3 \mathbf{A}$ first employs rule-based prompt engineering to craft initial poisoned texts. It then injects subtle character-level perturbations into these texts, which promote their ranking by the reranker while preserving their adversarial effectiveness. These perturbations alter only about 1\% of the text, so the poisoned texts remain natural and readable. Extensive experiments show that $\mathbf{P}^3 \mathbf{A}$ achieves strong attack effectiveness and transferability, even when constrained to poisoning a single document. Code is available in the supplementary material.
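To make the perturbation step concrete, below is a minimal Python sketch of a budget-constrained, reranker-guided character-level perturbation. It assumes adjacent-character swaps as the perturbation primitive and a hypothetical `score_fn` standing in for the reranker's relevance scorer; the abstract does not specify $\mathbf{P}^3 \mathbf{A}$'s actual search procedure, so this is an illustration under those assumptions, not the paper's method. The edit budget caps changes at roughly 1\% of the characters, matching the figure quoted above.

```python
import random
from typing import Callable

def perturb_under_budget(
    text: str,
    score_fn: Callable[[str], float],  # hypothetical reranker relevance scorer
    budget: float = 0.01,              # fraction of characters allowed to change
    candidates_per_step: int = 20,
    seed: int = 0,
) -> str:
    """Greedy character-level perturbation sketch.

    Repeatedly proposes single adjacent-character swaps and keeps a swap
    only if it raises the (hypothetical) reranker score, stopping once
    roughly `budget` of the characters have been edited or no candidate
    improves the score.
    """
    rng = random.Random(seed)
    best, best_score = text, score_fn(text)
    max_edits = max(1, int(len(text) * budget))
    edits = 0
    while edits < max_edits:
        improved = False
        for _ in range(candidates_per_step):
            chars = list(best)
            i = rng.randrange(len(chars) - 1)
            # Swap two adjacent characters: a subtle, typo-like perturbation.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            candidate = "".join(chars)
            candidate_score = score_fn(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                edits += 1
                improved = True
                break
        if not improved:
            break  # no candidate raised the reranker score this round
    return best
```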