On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang

Wed Jul 21 09:00 AM -- 11:00 AM (PDT) @ Virtual
We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruption, where some episodes may be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA\right)\mathrm{polylog}(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting while remaining computationally efficient.

Author Information

Tianhao Wu (Peking University)
Yunchang Yang (Peking University)
Simon Du (University of Washington)
Liwei Wang (Peking University)
