Oral in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers
Keywords: [ adversarial attacks; sequential decision making; detectability of adversarial attacks ]
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of temporal consistency makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce perfect illusory attacks, a novel form of adversarial attack on sequential decision-makers that is both effective and provably statistically undetectable. We then propose the more versatile ε-illusory attacks, which result in observation transitions that are consistent with the state-transition function of the adversary-free environment and can be learned end-to-end. Compared to existing attacks, we empirically find ε-illusory attacks to be significantly harder to detect with automated methods, and a small study with human subjects (IRB approval under reference xxxxxx/xxxxx) suggests they are similarly harder to detect for humans. We propose that undetectability should be a central concern in the study of adversarial attacks in mixed-autonomy settings.
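The role of temporal consistency in detection can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' method or environment: the deterministic 1-D point-mass dynamics in `step`, the residual-based detector `is_detected`, and the tolerance `tol` are all hypothetical. The sketch compares a per-step observation perturbation, which breaks the environment's transition function, with a replaced observation sequence that obeys it.

```python
def step(state, action, dt=0.1):
    """Hypothetical adversary-free dynamics: a deterministic 1-D point mass."""
    x, v = state
    return (x + v * dt, v + action * dt)


def rollout(initial_state, actions):
    """Simulate the dynamics and return the state visited at every time step."""
    states = [initial_state]
    for a in actions:
        states.append(step(states[-1], a))
    return states


def max_transition_residual(observations, actions):
    """Largest gap between each observed transition and the one the dynamics predict."""
    worst = 0.0
    for obs, a, next_obs in zip(observations, actions, observations[1:]):
        predicted = step(obs, a)
        residual = max(abs(p - o) for p, o in zip(predicted, next_obs))
        worst = max(worst, residual)
    return worst


def is_detected(observations, actions, tol=1e-6):
    """Toy detector (an assumption): flag sequences that violate the known dynamics."""
    return max_transition_residual(observations, actions) > tol


actions = [1.0, 0.0, -1.0, 0.5, 0.0]
clean = rollout((0.0, 0.0), actions)

# Per-step attack: shift the observed position on odd time steps only.
perturbed = [(x + (0.05 if t % 2 else 0.0), v) for t, (x, v) in enumerate(clean)]

# Dynamics-consistent attack: show the victim a different trajectory that
# still follows the adversary-free transition function exactly.
consistent = rollout((0.3, -0.2), actions)

print("clean flagged:     ", is_detected(clean, actions))
print("perturbed flagged: ", is_detected(perturbed, actions))
print("consistent flagged:", is_detected(consistent, actions))
```

In this toy setting the detector flags only the per-step perturbation, even though the dynamics-consistent sequence also misreports the true state. Realistic environments are typically stochastic, so a practical detector would need a statistical test over transitions rather than an exact residual check.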