Goal-Conditioned Agents that Learn Everything All at Once
Abstract
A goal-conditioned reinforcement learning agent acting in an environment sees a wealth of information throughout a trajectory, most of which is discarded when the trajectory is considered with respect to only a single goal. All-goals learning, where each transition is used for off-policy learning with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything All at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a 250x speed-up compared to all-goals relabelling. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open-source our code at https://anonymous.4open.science/r/CraftaxGC-D3E1.
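The central idea, that one forward pass can emit values for every goal simultaneously so that a single batch of transitions supports an all-goals TD update without relabelling, can be illustrated with a minimal JAX/Flax sketch. This is not the paper's implementation: the class name `AllGoalsQNet`, the assumption of a fixed, enumerable goal set, and the per-goal reward and termination tensors are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class AllGoalsQNet(nn.Module):
    """Q-network whose single forward pass yields values for ALL goals."""
    num_goals: int    # assumes a fixed, enumerable goal set (illustrative)
    num_actions: int
    hidden: int = 256

    @nn.compact
    def __call__(self, obs):                      # obs: (batch, obs_dim)
        x = nn.relu(nn.Dense(self.hidden)(obs))
        x = nn.relu(nn.Dense(self.hidden)(x))
        q = nn.Dense(self.num_goals * self.num_actions)(x)
        # One output head per goal: (batch, num_goals, num_actions).
        return q.reshape(-1, self.num_goals, self.num_actions)

def all_goals_td_targets(q_next, rewards, dones, gamma=0.99):
    """Parallel TD targets for every goal from one batch of transitions.

    q_next:  (batch, num_goals, num_actions) target-net values at s'
    rewards: (batch, num_goals)              per-goal reward signal
    dones:   (batch, num_goals)              per-goal termination flag
    """
    v_next = jnp.max(q_next, axis=-1)             # (batch, num_goals)
    return rewards + gamma * (1.0 - dones) * v_next

# Usage sketch: every goal's values come from one network evaluation.
net = AllGoalsQNet(num_goals=8, num_actions=4)
params = net.init(jax.random.PRNGKey(0), jnp.zeros((1, 32)))
q_all = net.apply(params, jnp.zeros((16, 32)))    # (16, 8, 4)
```

Because the targets for all goals share one forward pass, the cost of an all-goals update scales with a single network evaluation rather than with the number of goals, which is what makes this formulation tractable where naive per-goal relabelling is not.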