Poster
in
Workshop: Principles of Distribution Shift (PODS)
Adversarial Cheap Talk
Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster
Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the learning agent’s parameters, environment or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary has a minimal range of influence over the Victim. Parameterised as a deterministic policy that only conditions on the current state, an Adversary can merely append information to a Victim’s observation. To motivate the minimum-viability, we prove that in this setting the Adversary cannot occlude the ground truth, influence the underlying dynamics of the environment, introduce non-stationarity, add stochasticity, see the Victim’s actions, or access their parameters. Additionally, we present a novel meta-learning algorithm to train the Adversary, called adversarial cheap talk (ACT). Using ACT, we demonstrate that the resulting Adversary still manages to influence the Victim’s training and test performance despite these restrictive assumptions. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner’s function approximation and helping the Victim’s performance by appending useful features. Finally, we demonstrate that an ACT Adversary can append information during train-time to directly and arbitrarily control the Victim at test-time in a zero-shot manner.