Adaptive Reward-Poisoning Attacks against Reinforcement Learning
Xuezhou Zhang · Yuzhe Ma · Adish Singla · Jerry Zhu

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: we provide a lower threshold below which a reward-poisoning attack is infeasible and RL is certified to be safe, and a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive, where $\delta_t$ depends only on $(s_t, a_t, s_{t+1})$, or adaptive, where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior work. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show empirically that an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
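
To make the threat model concrete, the following is a minimal sketch of how the two attack classes differ in what they are allowed to observe. It assumes a tabular Q-learning victim, and the attack rules themselves are simple hypothetical heuristics chosen for illustration; they are not the Fast Adaptive Attack from the paper. The names `clip_to_budget`, `non_adaptive_attack`, `adaptive_attack`, `poisoned_q_update`, `target_policy`, and `budget` are all assumptions introduced here.

```python
import numpy as np

def clip_to_budget(delta, budget):
    """Enforce the infinity-norm constraint |delta_t| <= budget."""
    return float(np.clip(delta, -budget, budget))

def non_adaptive_attack(s, a, s_next, target_policy, budget):
    """Hypothetical non-adaptive rule: delta_t depends only on (s_t, a_t, s_{t+1}).

    Rewards the target action and penalizes every other action.
    """
    delta = budget if a == target_policy[s] else -budget
    return clip_to_budget(delta, budget)

def adaptive_attack(s, a, s_next, Q, target_policy, budget):
    """Hypothetical adaptive rule: delta_t also depends on the learner's Q-table.

    Perturbs only while the learner's greedy action in s_t still
    disagrees with the target policy, and stays silent otherwise.
    """
    if int(np.argmax(Q[s])) == target_policy[s]:
        return 0.0
    delta = budget if a == target_policy[s] else -budget
    return clip_to_budget(delta, budget)

def poisoned_q_update(Q, s, a, r, s_next, delta, alpha=0.1, gamma=0.9):
    """Standard Q-learning step in which the learner sees r_t + delta_t."""
    td_target = (r + delta) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Usage: two states, two actions, attacker wants action 1 everywhere.
Q = np.zeros((2, 2))
target_policy = [1, 1]
delta = adaptive_attack(0, 1, 1, Q, target_policy, budget=0.5)
Q = poisoned_q_update(Q, s=0, a=1, r=1.0, s_next=1, delta=delta)
```

The only structural difference between the two attack functions is the extra `Q` argument: the adaptive attacker can condition its perturbation on the learner's internal state, which is the capability the abstract credits for the polynomial versus exponential gap.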

Xuezhou Zhang (University of Wisconsin-Madison)
Yuzhe Ma (University of Wisconsin-Madison)
Adish Singla (Max Planck Institute for Software Systems, MPI-SWS)

Adish Singla is a faculty member at the Max Planck Institute for Software Systems (MPI-SWS), Germany, where he has led the Machine Teaching Group since 2017. He conducts research in the area of Machine Teaching, with a particular focus on open-ended learning and problem-solving domains. In recent years, his research has centered on developing AI-driven educational technology for introductory programming environments. He has received several awards for his research, including an AAAI Outstanding Paper Honorable Mention Award (2022) and an ERC Starting Grant (2021). He also has extensive industry experience and is a recipient of several industry awards, including a research grant from the Microsoft Research Ph.D. Scholarship Programme (2018), a Facebook Graduate Fellowship (2015), a Microsoft Tech Transfer Award (2011), and a Microsoft Gold Star Award (2010).

Jerry Zhu (University of Wisconsin-Madison)

