Toward Subspace-Perturbed Trajectory-Aware Backdoor Attacks in Deep Reinforcement Learning
Abstract
Deep Reinforcement Learning agents are increasingly used in safety-critical domains but remain vulnerable to stealthy backdoor attacks. Existing outer-loop attacks face a trade-off between perceptual stealth, poisoning efficiency, and value-function consistency, often making the attack ineffective or easily exposed. To address these challenges, we propose SpecDRL, a unified framework that ❶ embeds triggers in the least sensitive subspaces of the state manifold via Subspace-Aware Injection, exploiting perceptual blind spots, ❷ selects the most influential time steps for poisoning through Value-Guided Strategic Sampling based on Return-to-Go and Temporal-Difference error, and ❸ preserves reward integrity via Bellman-Consistent Dynamic Reward Poisoning, which analytically enforces ϵ-consistency of value functions and bounds global return deviations. Experiments across 12 Atari environments demonstrate that SpecDRL achieves near-100% attack success, accelerates backdoor convergence, and maintains benign task performance.
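To make the sampling criterion concrete, the following is a minimal sketch of how Value-Guided Strategic Sampling could rank time steps by a combined Return-to-Go and Temporal-Difference-error score. All names and the weighting scheme (`alpha`, `top_k`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def value_guided_sample(rewards, values, gamma=0.99, top_k=3, alpha=0.5):
    """Return indices of the top_k most influential time steps to poison.

    rewards: per-step rewards r_t for one episode
    values:  value estimates V(s_t); the terminal bootstrap is taken as 0
    score_t = alpha * RTG_t + (1 - alpha) * |TD-error_t|   (assumed form)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    # Return-to-Go: discounted sum of future rewards from each step.
    rtg = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        rtg[t] = running
    # TD error: r_t + gamma * V(s_{t+1}) - V(s_t), with V = 0 past the end.
    next_v = np.append(values[1:], 0.0)
    td = np.abs(rewards + gamma * next_v - values)
    # Combine the two signals and take the highest-scoring steps.
    score = alpha * rtg + (1 - alpha) * td
    return np.argsort(score)[::-1][:top_k]
```

With `alpha=1.0` the ranking reduces to pure Return-to-Go, so steps just before large rewards are selected first; lowering `alpha` shifts the budget toward steps where the value function is most surprised.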