Poster
in
Workshop: Automated Reinforcement Learning: Exploring Meta-Learning, AutoML, and LLMs

Learning In-Context Decision Making with Synthetic MDPs

Akarsh Kumar · Christopher Lu · Louis Kirsch · Phillip Isola

Sat 27 Jul 1 a.m. PDT — 2 a.m. PDT

Abstract:

Current AI models are trained on huge datasets of real-world data. This is increasingly true in RL, where generalist agents are trained on data from hundreds of real environments. It is widely assumed that real data and environments are the only way to capture the intricate complexities of real-world RL tasks. In this paper, we challenge this notion by training generalist in-context decision-making agents solely on data generated by simple random processes. We investigate data generated from eight families of synthetic environments, ranging from Markov chains and bandits to discrete, continuous, and hybrid Markov decision processes (MDPs). Surprisingly, the resulting agents perform comparably to agents trained on real environment data. We additionally analyze which properties of the pretraining MDPs are ideal for producing good agents, giving RL practitioners insight into choosing which environments to train on.
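To make the idea of "data generated by simple random processes" concrete, here is a minimal sketch of sampling one synthetic tabular MDP at random. This is a hypothetical illustration, not the paper's actual generator: the function name `sample_random_mdp` and the choice of Dirichlet transitions with uniform rewards are assumptions for exposition only; the paper studies eight distinct families.

```python
import numpy as np

def sample_random_mdp(n_states=5, n_actions=3, seed=0):
    """Sample a random tabular MDP (illustrative sketch).

    Transition rows are drawn from a symmetric Dirichlet, so each
    P[s, a] is a valid distribution over next states; rewards are
    drawn uniformly in [0, 1] per (state, action) pair.
    """
    rng = np.random.default_rng(seed)
    # P has shape (n_states, n_actions, n_states); each last axis sums to 1.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    # R has shape (n_states, n_actions).
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

P, R = sample_random_mdp()
```

Rollouts from many such randomly sampled MDPs could then serve as pretraining trajectories for an in-context agent, with a fresh MDP drawn per episode so the agent never sees the same environment twice.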
