Poster
in
Workshop: New Frontiers in Adversarial Machine Learning
Reducing Exploitability with Population Based Training
Pavel Czempin
Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: opponents trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new adversaries. We conjecture this limitation was due to insufficient diversity of adversaries seen during training. We propose a defense using population-based training to pit the victim against a range of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. We find that our defense significantly increases robustness against adversaries in both environments and show that robustness is correlated with the size of the opponent population.
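Below is a minimal sketch of the core idea of the defense as described in the abstract: instead of training the victim against one fixed adversary, each episode samples an opponent from a population, so the victim cannot overfit to a single exploit. The environment (a toy matching-pennies game), the `TabularAgent` class, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code) of population-based opponent sampling:
# the victim trains against an opponent drawn at random from a population
# each episode, rather than against a single fixed adversary.
import random


class TabularAgent:
    """Toy bandit-style agent: tracks a running value estimate per action."""

    def __init__(self, n_actions=2, lr=0.1):
        self.q = [0.0] * n_actions
        self.lr = lr

    def act(self, eps=0.1):
        # Epsilon-greedy action selection.
        if random.random() < eps:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])


def play_matching_pennies(victim, opponent):
    """Zero-sum stage game: victim gets +1 if actions match, else -1."""
    a_v, a_o = victim.act(), opponent.act()
    r_v = 1.0 if a_v == a_o else -1.0
    victim.update(a_v, r_v)
    opponent.update(a_o, -r_v)  # opponent maximizes the negated reward
    return r_v


def train_with_population(population_size=8, episodes=20000, seed=0):
    random.seed(seed)
    victim = TabularAgent()
    # Population of opponents; a larger population exposes the victim to a
    # more diverse set of strategies during training.
    population = [TabularAgent() for _ in range(population_size)]
    returns = []
    for _ in range(episodes):
        opponent = random.choice(population)  # sample an opponent per episode
        returns.append(play_matching_pennies(victim, opponent))
    avg_recent = sum(returns[-1000:]) / 1000.0
    return victim, population, avg_recent


if __name__ == "__main__":
    _, _, avg_return = train_with_population()
    print(f"Victim average return over last 1000 episodes: {avg_return:+.3f}")
```

In an actual self-play RL setting the tabular agents and toy game above would be replaced by deep RL policies and the target environment; the structural point is only the per-episode sampling from an opponent population, which the abstract argues increases robustness as the population grows.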