

Poster

Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

Brett Barkley · David Fridovich-Keil

West Exhibition Hall B2-B3 #W-710
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in control tasks in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process --- the backbone of Dyna-style algorithms --- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
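To make the core Dyna-style idea concrete, the following is a minimal, self-contained tabular Dyna-Q sketch: the agent learns a simple model of the environment and replays synthetic transitions drawn from that model alongside real experience. This is only an illustration of the general mechanism the abstract refers to, not the deep off-policy DMBRL algorithms or benchmark environments studied in the paper; the toy environment, function names, and hyperparameters (e.g. ToyEnv, dyna_q, planning_steps) are purely illustrative.

```python
import random

# Minimal, self-contained sketch of a Dyna-style training loop.
# All components here (ToyEnv, tabular model, Q-table) are illustrative
# stand-ins, not the algorithms or environments evaluated in the paper.

class ToyEnv:
    """A tiny deterministic chain MDP used only to make the sketch runnable."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action in {0: left, 1: right}
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + delta))
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        done = reward > 0
        return self.state, reward, done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = {}      # (state, action) -> estimated value
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Act in the real environment (epsilon-greedy).
            a = random.choice([0, 1]) if random.random() < epsilon \
                else max([0, 1], key=lambda a_: q(s, a_))
            s2, r, done = env.step(a)

            # Direct RL update from real experience.
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(s2, 0), q(s2, 1)) - q(s, a))

            # Update the learned model, then do Dyna-style planning:
            # replay synthetic transitions sampled from the model.
            model[(s, a)] = (r, s2)
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * max(q(ps2, 0), q(ps2, 1)) - q(ps, pa))

            s = s2
    return Q


if __name__ == "__main__":
    # Train on the toy chain task; with synthetic planning updates the
    # rightmost (rewarding) state is typically reached within a few episodes.
    print(dyna_q(ToyEnv()))
```

The paper's finding is that the planning step analogous to the synthetic-replay loop above, when scaled up to deep off-policy learners, does not reliably help and often hurts in DMC environments.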

Lay Summary:

Many AI systems learn by trial and error, using simulations to practice before making real-world decisions. A popular technique to speed up this learning process is to let algorithms imagine "what-if" scenarios using a model of the world. This idea, called model-based reinforcement learning, is supposed to make learning more efficient by generating synthetic training data.

However, our research found that this approach doesn't always work as expected. We compared its performance on two popular testing platforms for robotic control tasks, OpenAI Gym and DeepMind Control Suite, which have similar physics and task types. Surprisingly, model-based methods performed well in Gym but often failed in the DeepMind environments.

We investigated why this gap exists and found that adding these "what-if" experiences, the core idea of this technique, can sometimes hurt performance. Our findings challenge the assumption that model-based learning is always a method to improve efficiency and highlight the need for more robust techniques that work consistently across different environments.
