Robust reinforcement learning (RL) seeks to train policies that perform well under environment perturbations or adversarial attacks. Existing approaches typically assume that the space of possible perturbations remains the same across timesteps. In many settings, however, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, which present a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Experiments on a variety of continuous control tasks demonstrate that our approach achieves significant robustness gains over baselines against both standard and temporally-coupled attacks, in both state and action spaces.
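To make the notion concrete, below is a minimal Python sketch of one natural way a temporally-coupled perturbation set can be defined: the adversary has a standard per-step budget and, in addition, may only move a bounded distance from its previous perturbation. The function `project_temporally_coupled`, the infinity-norm constraints, and the budget names `eps` and `eps_bar` are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def project_temporally_coupled(proposed, delta_prev, eps, eps_bar):
    """Project a proposed perturbation onto an (assumed) temporally-coupled set:
    ||delta_t||_inf <= eps  and  ||delta_t - delta_prev||_inf <= eps_bar.
    """
    # Temporal coupling: the perturbation may drift at most eps_bar per step.
    delta_t = np.clip(proposed, delta_prev - eps_bar, delta_prev + eps_bar)
    # Standard budget: the perturbation must stay within the usual eps-ball.
    return np.clip(delta_t, -eps, eps)

# With a small coupling budget, the adversary cannot flip from a large
# positive to a large negative perturbation in a single timestep.
delta_prev = np.array([0.09, -0.02])
proposed = np.array([-0.10, 0.10])
print(project_temporally_coupled(proposed, delta_prev, eps=0.1, eps_bar=0.05))
# -> [0.04 0.03]
```

A standard (uncoupled) attacker corresponds to the limit where `eps_bar` is large enough to be inactive; shrinking `eps_bar` restricts the adversary to perturbation sequences that evolve smoothly over time, which is the setting GRAD is designed to handle.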
Author Information
Yongyuan Liang (Sun Yat-sen University)
Yanchao Sun (University of Maryland, College Park)
Ruijie Zheng (University of Maryland, College Park)
Xiangyu Liu (University of Maryland, College Park)
Tuomas Sandholm (Carnegie Mellon University)
Tuomas Sandholm is Angel Jordan Professor of Computer Science at Carnegie Mellon University. He is Founder and Director of the Electronic Marketplaces Laboratory. He has published over 450 papers. With his student Vince Conitzer, he initiated the study of automated mechanism design in 2001. In parallel with his academic career, he was Founder, Chairman, and CTO/Chief Scientist of CombineNet, Inc. from 1997 until its acquisition in 2010. During this period the company commercialized over 800 of the world's largest-scale generalized combinatorial multi-attribute auctions, with over $60 billion in total spend and over $6 billion in generated savings. He is Founder and CEO of Optimized Markets, Strategic Machine, and Strategy Robot. Also, his algorithms run the UNOS kidney exchange, which includes 69% of the transplant centers in the US. He has developed the leading algorithms for several general classes of games. The team that he leads is the two-time world champion in computer Heads-Up No-Limit Texas Hold'em poker, and Libratus became the first and only AI to beat top humans at that game. Among his many honors are the NSF CAREER Award, inaugural ACM Autonomous Agents Research Award, Sloan Fellowship, Carnegie Science Center Award for Excellence, Edelman Laureateship, Newell Award for Research Excellence, and Computers and Thought Award. He is a Fellow of the ACM, AAAI, and INFORMS. He holds an honorary doctorate from the University of Zurich.
Furong Huang (University of Maryland)
Furong Huang is an Assistant Professor in the Department of Computer Science at the University of Maryland. She works on statistical and trustworthy machine learning, reinforcement learning, graph neural networks, deep learning theory, and federated learning, with a specialization in domain adaptation, algorithmic robustness, and fairness. Furong is a recipient of the MIT Technology Review Innovators Under 35 Asia Pacific Award, the MLconf Industry Impact Research Award, the NSF CRII Award, the Adobe Faculty Research Award, three JP Morgan Faculty Research Awards, and a finalist for AI Researcher of the Year (AI in Research) at the Women in AI Awards North America. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she spent one year as a postdoctoral researcher at Microsoft Research NYC.
Stephen Mcaleer (UC Irvine)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 : Adapting Robust Reinforcement Learning to Handle Temporally-Coupled Perturbations »
More from the Same Authors
- 2022 : Everyone Matters: Customizing the Dynamics of Decision Boundary for Adversarial Robustness »
  Yuancheng Xu · Yanchao Sun · Furong Huang
- 2022 : Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy »
  xiyao wang · Wichayaporn Wongkamjan · Furong Huang
- 2022 : Certifiably Robust Multi-Agent Reinforcement Learning against Adversarial Communication »
  Yanchao Sun · Ruijie Zheng · Parisa Hassanzadeh · Yongyuan Liang · Soheil Feizi · Sumitra Ganesh · Furong Huang
- 2022 : Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning »
  Yongyuan Liang · Yanchao Sun · Ruijie Zheng · Furong Huang
- 2023 : Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers »
  Tim Franzmeyer · Stephen Mcaleer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder
- 2023 : Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making »
  Yuancheng Xu · Chenghao Deng · Yanchao Sun · Ruijie Zheng · xiyao wang · Jieyu Zhao · Furong Huang
- 2023 : Reviving Shift Equivariance in Vision Transformers »
  Peijian Ding · Davit Soselia · Thomas Armstrong · Jiahao Su · Furong Huang
- 2023 : C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder »
  Xiaoyu Liu · Jiaxin Yuan · Bang An · Yuancheng Xu · Yifan Yang · Furong Huang
- 2023 : Language Models can Solve Computer Tasks »
  Geunwoo Kim · Pierre Baldi · Stephen Mcaleer
- 2023 : Mental Calibration: Discovering and Adjusting for Latent Factors Improves Zero-Shot Inference of CLIP »
  Bang An · Sicheng Zhu · Michael-Andrei Panaitescu-Liess · Chaithanya Kumar Mummadi · Furong Huang
- 2023 : Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL »
  Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Mengdi Wang
- 2023 Poster: Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing »
  Xiangyu Liu · Kaiqing Zhang
- 2023 Poster: MANSA: Learning Fast and Slow in Multi-Agent Systems »
  David Mguni · Haojun Chen · Taher Jafferjee · Jianhong Wang · Longfei Yue · Xidong Feng · Stephen Mcaleer · Feifei Tong · Jun Wang · Yaodong Yang
- 2023 Poster: Regret-Minimizing Double Oracle for Extensive-Form Games »
  Xiaohang Tang · Le Cong Dinh · Stephen Mcaleer · Yaodong Yang
- 2023 Poster: A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems »
  Oliver Slumbers · David Mguni · Stefano Blumberg · Stephen Mcaleer · Yaodong Yang · Jun Wang
- 2023 Poster: Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games »
  Ioannis Anagnostides · Gabriele Farina · Tuomas Sandholm
- 2023 Poster: STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning »
  Souradip Chakraborty · Amrit Bedi · Alec Koppel · Mengdi Wang · Furong Huang · Dinesh Manocha
- 2023 Poster: Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy »
  xiyao wang · Wichayaporn Wongkamjan · Ruonan Jia · Furong Huang
- 2023 Poster: Team Belief DAG: Generalizing the Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria via Regret Minimization »
  Brian Zhang · Gabriele Farina · Tuomas Sandholm
- 2023 Poster: Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator »
  Sicheng Zhu · Bang An · Furong Huang · Sanghyun Hong
- 2022 Poster: On Last-Iterate Convergence Beyond Zero-Sum Games »
  Ioannis Anagnostides · Ioannis Panageas · Gabriele Farina · Tuomas Sandholm
- 2022 Poster: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
  Litian Liang · Yaosheng Xu · Stephen Mcaleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox
- 2022 Spotlight: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
  Litian Liang · Yaosheng Xu · Stephen Mcaleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox
- 2022 Spotlight: On Last-Iterate Convergence Beyond Zero-Sum Games »
  Ioannis Anagnostides · Ioannis Panageas · Gabriele Farina · Tuomas Sandholm
- 2022 Poster: Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework »
  Jiahao Su · Wonmin Byeon · Furong Huang
- 2022 Spotlight: Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework »
  Jiahao Su · Wonmin Byeon · Furong Huang
- 2021 Poster: Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results »
  Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm
- 2021 Spotlight: Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results »
  Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm
- 2020 Poster: Refined bounds for algorithm configuration: The knife-edge of dual class approximability »
  Nina Balcan · Tuomas Sandholm · Ellen Vitercik
- 2020 Poster: Sparsified Linear Programming for Zero-Sum Equilibrium Finding »
  Brian Zhang · Tuomas Sandholm
- 2020 Poster: Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination »
  Somdeb Majumdar · Shauharda Khadka · Santiago Miret · Stephen Mcaleer · Kagan Tumer
- 2020 Poster: Stochastic Regret Minimization in Extensive-Form Games »
  Gabriele Farina · Christian Kroer · Tuomas Sandholm
- 2019 Poster: Deep Counterfactual Regret Minimization »
  Noam Brown · Adam Lerer · Sam Gross · Tuomas Sandholm
- 2019 Poster: Stable-Predictive Optimistic Counterfactual Regret Minimization »
  Gabriele Farina · Christian Kroer · Noam Brown · Tuomas Sandholm
- 2019 Poster: Regret Circuits: Composability of Regret Minimizers »
  Gabriele Farina · Christian Kroer · Tuomas Sandholm
- 2019 Oral: Deep Counterfactual Regret Minimization »
  Noam Brown · Adam Lerer · Sam Gross · Tuomas Sandholm
- 2019 Oral: Stable-Predictive Optimistic Counterfactual Regret Minimization »
  Gabriele Farina · Christian Kroer · Noam Brown · Tuomas Sandholm
- 2019 Oral: Regret Circuits: Composability of Regret Minimizers »
  Gabriele Farina · Christian Kroer · Tuomas Sandholm
- 2018 Poster: Learning to Branch »
  Nina Balcan · Travis Dick · Tuomas Sandholm · Ellen Vitercik
- 2018 Oral: Learning to Branch »
  Nina Balcan · Travis Dick · Tuomas Sandholm · Ellen Vitercik
- 2018 Tutorial: Machine Learning in Automated Mechanism Design for Pricing and Auctions »
  Nina Balcan · Tuomas Sandholm · Ellen Vitercik
- 2017 Poster: Regret Minimization in Behaviorally-Constrained Zero-Sum Games »
  Gabriele Farina · Christian Kroer · Tuomas Sandholm
- 2017 Poster: Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning »
  Noam Brown · Tuomas Sandholm
- 2017 Talk: Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning »
  Noam Brown · Tuomas Sandholm
- 2017 Talk: Regret Minimization in Behaviorally-Constrained Zero-Sum Games »
  Gabriele Farina · Christian Kroer · Tuomas Sandholm