Designing Observation and Action Models for Efficient Reinforcement Learning with LLMs
Abstract
Large Language Models (LLMs) have emerged as powerful tools for semantic reasoning, enabling the formalization of tasks that traditionally relied on human intuition. This capability extends to environment design in Reinforcement Learning (RL). While prior research has focused predominantly on reward design, the design of observation and action spaces remains relatively underexplored. We propose LOAM, a framework that leverages LLMs to construct refined observation and action spaces from raw environments. To mitigate the computational burden of identifying the best candidate model among stochastic LLM outputs, LOAM incorporates a continuous racing mechanism that dynamically allocates resources to the most promising configurations without additional training overhead. Empirical evaluations on HumanoidBench and Isaac Lab demonstrate that LOAM consistently outperforms handcrafted baselines in both learning speed and asymptotic performance.
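For intuition, the sketch below illustrates the general idea of racing among candidate configurations; it is a minimal, hypothetical Python example, not LOAM's actual algorithm or interface (the names `race`, `evaluate`, `rounds`, and `keep_frac` are our own for illustration). Candidates are evaluated in rounds, and the weakest fraction is dropped each round, so evaluation budget concentrates on the most promising configurations.

```python
import random
from statistics import mean

def race(candidates, evaluate, rounds=5, keep_frac=0.5):
    """Hypothetical racing loop: repeatedly score surviving candidate
    configurations and drop the weakest fraction each round, so compute
    concentrates on the most promising ones."""
    scores = {c: [] for c in candidates}
    survivors = list(candidates)
    for _ in range(rounds):
        for c in survivors:
            scores[c].append(evaluate(c))  # e.g. an episodic return
        # Rank by running mean score and keep only the top fraction.
        survivors.sort(key=lambda c: mean(scores[c]), reverse=True)
        survivors = survivors[:max(1, int(len(survivors) * keep_frac))]
    return survivors[0]

# Toy usage: each "configuration" is a number whose noisy score we maximize.
best = race(
    candidates=[0.2, 0.5, 0.8, 0.9],
    evaluate=lambda c: c + random.gauss(0, 0.1),
)
print(best)
```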