Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
Shijie Liu ⋅ Andrew C. Cullen ⋅ Paul Montague ⋅ Sarah Erfani ⋅ Benjamin Rubinstein
Abstract
Existing backdoor attacks on Reinforcement Learning (RL) typically rely on unrealistic white-box access to the victim's parameters, rewards, or observations. Inspired by real-world behaviors, we introduce the Supply-Chain Backdoor (SCAB) attack to demonstrate that such assumptions are unnecessary. SCAB targets the common practice of training with third-party policies, poisoning the dataset solely through legitimate, black-box agent-environment interactions. With only 3% data corruption, SCAB achieves a 90% attack success rate and reduces victim returns by 80%. These findings expose a critical vulnerability in the modern RL supply chain: reliance on untrusted external agents constitutes a severe and practical security risk.