One proposed solution toward the goal of designing machines that can extrapolate experience across environments and tasks is the use of inductive biases. Equipping learning algorithms with inductive biases may help them learn invariances, e.g. a causal graph structure, which in turn allows an agent to generalize across environments and tasks.
While some inductive biases are already available and correspond to common knowledge, one key requirement for learning inductive biases from data appears to be the ability to perform and learn from interventions. This assumption is partly motivated by the widely accepted hypothesis in psychology that experimentation is necessary to discover causal relationships. This corresponds to a reinforcement learning environment in which the agent can discover causal factors by intervening and observing the effects.
We believe that one reason progress on building intelligent agents has been hampered is the limited availability of good inductive biases. Learning inductive biases from data is difficult because it corresponds to an interactive learning setting, which is far less well understood than classical regression or classification frameworks; e.g., formal definitions of generalization in RL have yet to be developed. While reinforcement learning has already achieved impressive results, the sample complexity required to achieve consistently good performance is often prohibitively high. This has limited most applications of RL to games or to settings where an accurate simulator is available. Another issue is that RL agents are often brittle in the face of even tiny changes to the environment, whether visual or mechanistic, that were unseen during training.
To build intuition for the scope of the generalization problem in RL, consider the task of training a robotic car mechanic that can diagnose and repair any problem with a car. Current methods are all insufficient in some respect -- on-policy policy gradient algorithms need to cycle through all possible broken cars on every single iteration, off-policy algorithms suffer from instability caused by perceptual complexity and highly diverse data, and model-based methods may struggle to fully capture a complex web of causality.
In our workshop we hope to explore research and new ideas on topics related to inductive biases, invariances and generalization, including:
- What are efficient ways to learn inductive biases from data?
- Which inductive biases are most suitable to achieve generalization?
- Can we make the problem of generalization, in particular for RL, more concrete, and establish standard terminology for discussing it?
- Causality and generalization, especially in RL.
- Model-based RL and generalization.
- Sample complexity in reinforcement learning.
- Can we create models that are robust to changes in the visual environment, assuming the underlying mechanics stay the same? Should this count as generalization or transfer learning?
- Robustness to changes in the mechanics of the environment, such as scaling of rewards.
- Can we create a theoretical understanding of generalization in RL, and understand how it relates to the well-developed ideas from statistical learning theory?
- In RL, the training data is collected by the agent and is affected by the agent's policy; the training distribution is therefore not a fixed distribution. How does this affect how we should think about generalization?
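The last point can be made concrete with a toy sketch (a hypothetical five-state chain environment; all names are illustrative, not from any paper at the workshop): two agents acting in the same environment under different policies collect very different training data, so "the training distribution" is itself a function of the policy.

```python
import random

def rollout(policy, steps=1000, seed=0):
    """Run a policy in a 5-state chain (states 0..4) and return the
    empirical state-visitation distribution, i.e. the distribution
    the agent's training data is actually drawn from."""
    rng = random.Random(seed)
    state, visits = 0, {}
    for _ in range(steps):
        action = policy(rng)                    # +1 moves right, -1 moves left
        state = max(0, min(4, state + action))  # clamp to the chain
        visits[state] = visits.get(state, 0) + 1
    return {s: c / steps for s, c in sorted(visits.items())}

# Two policies in the *same* environment:
right_biased = lambda rng: 1 if rng.random() < 0.8 else -1
left_biased = lambda rng: 1 if rng.random() < 0.2 else -1

# The induced state-visitation (training) distributions differ sharply:
# the right-biased agent mostly sees state 4, the left-biased one state 0.
print(rollout(right_biased))
print(rollout(left_biased))
```

Any generalization statement for RL has to account for this coupling: changing the policy during learning changes the data distribution, unlike the fixed i.i.d. setting of supervised learning.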
The question of generalization in reinforcement learning is essential to the field's future, both in theory and in practice. However, there are still open questions about the right way to think about generalization in RL, the right way to formalize the problem, and which tasks matter most. This workshop aims to address these issues by bringing together researchers from different backgrounds to discuss these challenges.
Schedule
Sat 3:00 a.m. - 3:15 a.m. | Opening remarks
Sat 3:15 a.m. - 4:30 a.m. | Poster Session 1
Sat 4:30 a.m. - 5:00 a.m. | Invited talk 1 (David Silver): Meta Gradient Reinforcement Learning
Sat 5:00 a.m. - 5:10 a.m. | Q&A for invited talk 1 (David Silver)
Sat 5:10 a.m. - 5:40 a.m. | Invited talk 2 (Caroline Uhler): Multi-Domain Data Integration: From Observations to Mechanistic Insights
Sat 5:40 a.m. - 5:50 a.m. | Q&A for invited talk 2 (Caroline Uhler)
Sat 5:50 a.m. - 6:05 a.m. | Spotlight (Roberta Raileanu): Automatic Data Augmentation for Generalization in Reinforcement Learning | http://slideslive.com/38931360
Sat 6:15 a.m. - 7:30 a.m. | Poster Session 2
Sat 7:30 a.m. - 8:10 a.m. | Invited talk 3 (Fanny Yang): Augmenting data to improve robustness – a blessing or a curse?
Sat 8:10 a.m. - 8:40 a.m. | Invited talk 4 (Yoshua Bengio): System 2 Priors
Sat 8:40 a.m. - 8:50 a.m. | Q&A for invited talk 4 (Yoshua Bengio)
Sat 8:50 a.m. - 9:05 a.m. | Spotlight (Swapnil Asawa, Benjamin Eysenbach): Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers | http://slideslive.com/38931338
Sat 9:05 a.m. - 9:15 a.m. | Q&A for invited talk 3 (Fanny Yang)
Sat 9:15 a.m. - 9:45 a.m. | Invited talk 5 (Martha White): A New RNN algorithm using the computational inductive bias of span independence
Sat 9:45 a.m. - 9:55 a.m. | Q&A for invited talk 5 (Martha White)
Sat 10:00 a.m. - 10:30 a.m. | Invited talk 6 (Nicolas Heess): From skills to tasks: Reusing and generalizing knowledge for motor control
Sat 10:30 a.m. - 10:40 a.m. | Q&A for invited talk 6 (Nicolas Heess)
Sat 10:40 a.m. - 11:55 a.m. | Poster Session 3
Sat 11:55 a.m. - 12:25 p.m. | Invited talk 7 (Mengdi Wang): Statistical Complexity of RL and the use of regression
Sat 12:25 p.m. - 12:35 p.m. | Q&A for invited talk 7 (Mengdi Wang)
Sat 12:35 p.m. - 1:05 p.m. | Invited talk 8 (Sham Kakade): On the Theory of Policy Gradient Methods: Optimality, Generalization and Distribution Shift
Sat 1:05 p.m. - 1:15 p.m. | Q&A for invited talk 8 (Sham Kakade)
Sat 1:15 p.m. - 2:15 p.m. | Panel Discussion
Posters
- Evaluating Agents without Rewards | Danijar Hafner | http://slideslive.com/38931301
- Reinforcement Learning Generalization with Surprise Minimization | Jerry Zikun Chen | http://slideslive.com/38931304
- Learning Action Priors for Visuomotor transfer | aajay3110 Ajay | http://slideslive.com/38931326
- MOPO: Model-based Offline Policy Optimization | Tianhe Yu | http://slideslive.com/38931302
- Meta Attention Networks: Meta Learning Attention To Modulate Information Between Sparsely Interacting Recurrent Modules | Kanika Madan | http://slideslive.com/38931327
- Spatially Structured Recurrent Modules | Nasim Rahaman | http://slideslive.com/38931303
- Neural Dynamic Policies for End-to-End Sensorimotor Learning | Abhinav Gupta | http://slideslive.com/38931329
- Watch your Weight Reinforcement Learning | Robert Müller | http://slideslive.com/38931330
- HAT: Hierarchical Alternative Training for Long Range Policy Transfer | Min Sun, Wei-Cheng Tseng | http://slideslive.com/38931331
- Learning to Learn from Failures Using Replay | Tao Chen | http://slideslive.com/38931328
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks | Gerrit Schoettler | http://slideslive.com/38931362
- Bridging Worlds in Reinforcement Learning with Model-Advantage | Ashwin Kalyan, Nirbhay Modhe | http://slideslive.com/38931333
- Fighting Copycat Agents in Behavioral Cloning From Multiple Observations | Chuan Wen | http://slideslive.com/38931530
- Conditioning of Reinforcement Learning Agents and its Policy Regularization Application (5 min) | Arip Asadulaev | http://slideslive.com/38931436
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors | Karl Pertsch | http://slideslive.com/38931332
- Planning to Explore via Self-Supervised World Models | Ramanan Sekar | http://slideslive.com/38931334
- Robust Reinforcement Learning using Adversarial Populations | Eugene Vinitsky | http://slideslive.com/38931336
- Structure Mapping for Transferability of Causal Models | Purva Pruthi | http://slideslive.com/38931337
- Efficient Imitation Learning with Local Trajectory Optimization | Jialin Song | http://slideslive.com/38931339
- Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation | Gaurav Sukhatme | http://slideslive.com/38931340
- Exact (Then Approximate) Dynamic Programming for Deep Reinforcement Learning | Henrik Marklund | http://slideslive.com/38931341
- Learning Long-term Dependencies Using Cognitive Inductive Biases in Self-attention RNNs | Giancarlo Kerg | http://slideslive.com/38931335
- A Differentiable Newton Euler Algorithm for Multi-body Model Learning | Michael Lutter | http://slideslive.com/38931342
- Nesterov Momentum Adversarial Perturbations in the Deep Reinforcement Learning Domain | Ezgi Korkmaz | http://slideslive.com/38931437
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction | Aviral Kumar | http://slideslive.com/38931343
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL | Saurabh Kumar, Aviral Kumar | http://slideslive.com/38931345
- Attention Option-Critic | Ravi Chunduru | http://slideslive.com/38931346
- Towards Self-Paced Context Evaluation for Contextual Reinforcement Learning | Theresa Eimer | http://slideslive.com/38931344
- Group Equivariant Deep Reinforcement Learning | Arnab Kumar Mondal | http://slideslive.com/38931347
- Probing Dynamic Environments with Informed Policy Regularization | Pierre-Alexan Kamienny | http://slideslive.com/38931348
- Counterfactual Transfer via Inductive Bias in Clinical Settings | Taylor Killian | http://slideslive.com/38931438
- Learning Invariant Representations for Reinforcement Learning without Reconstruction | Amy Zhang | http://slideslive.com/38931349
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling | Russell Mendonca | http://slideslive.com/38931350
- Towards TempoRL: Learning When to Act | André Biedenkapp | http://slideslive.com/38931351
- On the Equivalence of Bi-Level Optimization and Game-Theoretic Formulations of Invariant Risk Minimization | Kartik Ahuja | http://slideslive.com/38931353
- PAC Imitation and Model-based Batch Learning of Contextual MDPs | Yash Nair | http://slideslive.com/38931354
- Learning Robust Representations with Score Invariant Learning | Daksh Idnani | http://slideslive.com/38931352
- Learning Off-Policy with Online Planning | Harshit Sikchi | http://slideslive.com/38931439
- Model-based Adversarial Meta-Reinforcement Learning | Tengyu Ma, Zichuan Lin | http://slideslive.com/38931355
- If MaxEnt RL is the Answer, What is the Question? | Benjamin Eysenbach | http://slideslive.com/38931440
- Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP | Amy Zhang | http://slideslive.com/38931356
- Discrete Planning with End-to-end Trained Neuro-algorithmic Policies | Marin Vlastelica Pogancic | http://slideslive.com/38931357
- Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors | Chi Zhang | http://slideslive.com/38931358
- Counterfactual Data Augmentation using Locally Factored Dynamics | Silviu Pitis | http://slideslive.com/38931359
- Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels | Ilya Kostrikov | http://slideslive.com/38931361
Author Information
Anirudh Goyal (Université de Montréal)
Rosemary Nan Ke (MILA, University of Montreal)
I am a PhD student at Mila, advised by Chris Pal and Yoshua Bengio. My research interests are efficient credit assignment, causal learning and model-based reinforcement learning. Here is my homepage: https://nke001.github.io/
Stefan Bauer (Max Planck Institute for Intelligent Systems)
Jane Wang (DeepMind)
Theo Weber (DeepMind)
Fabio Viola (DeepMind)
Bernhard Schölkopf (MPI for Intelligent Systems Tübingen, Germany)
Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.