Poster in Workshop: RLxF: RL from World Feedback
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Can We Really Learn One Representation to Optimize All Rewards?

Chongyi Zheng ⋅ Royina Karegoudra Jayanth ⋅ Benjamin Eysenbach

Abstract

As machine learning has moved towards leveraging large models as priors for downstream tasks, the community has debated the right form of prior for solving reinforcement learning (RL) problems. If one were to prefetch as much computation as possible, one would attempt to learn a prior over the policies for some yet-to-be-determined reward function. Recent work on forward-backward (FB) representation learning has tried this, arguing that an unsupervised representation learning procedure can enable optimal control over arbitrary rewards without further fine-tuning. However, FB's training objective and learning behavior remain mysterious. In this paper, we shed light on FB by formally contextualizing the method within a broader class of recent methods that use classification to obtain a low-rank approximation of a successor measure ratio. Our analysis clarifies when such low-rank approximations can exist and how learning converges to them in practice. It also suggests a simplified unsupervised pre-training method for RL that is more amenable to theoretical analysis. The proposed method, **one-step forward-backward representation learning (one-step FB)**, serves as a stable plug-and-play alternative to FB for RL practitioners. Experiments in didactic settings, as well as in $10$ state-based and image-based continuous control domains, demonstrate that one-step FB converges to the desired representations with $10^5 \times$ smaller errors than FB and improves zero-shot performance by $+24\%$ on average. We also demonstrate that zero-shot policies inferred by one-step FB provide an efficient initialization if the user prefers further fine-tuning on downstream tasks.
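To make the phrase "low-rank approximation of a successor measure ratio" concrete, the following is a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' method: the toy Markov chain, the uniform state marginal `rho`, and the use of a truncated SVD in place of a learned, classification-based objective are all assumptions, and the helper name `low_rank_ratio` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 6, 0.9

# Assumption: a random row-stochastic transition matrix for one fixed policy.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Normalized discounted successor measure of the chain:
# M = (1 - gamma) * sum_t gamma^t P^t = (1 - gamma) * (I - gamma P)^{-1}.
M = (1.0 - gamma) * np.linalg.inv(np.eye(n_states) - gamma * P)

# Assumption: a uniform marginal over states as the reference distribution.
rho = np.full(n_states, 1.0 / n_states)

# Successor measure ratio: how much more (or less) often s' is visited
# from s than under the reference marginal.
ratio = M / rho[None, :]


def low_rank_ratio(ratio, rank):
    """Best rank-`rank` factorization ratio ~= F @ B.T (Frobenius norm)."""
    U, S, Vt = np.linalg.svd(ratio)
    F = U[:, :rank] * S[:rank]  # "forward" factors, one row per state s
    B = Vt[:rank].T             # "backward" factors, one row per state s'
    return F, B


# Approximation error for every rank from 1 up to the number of states.
errors = []
for rank in range(1, n_states + 1):
    F, B = low_rank_ratio(ratio, rank)
    errors.append(np.linalg.norm(ratio - F @ B.T))
```

In this toy setting the error shrinks as the rank grows and vanishes at full rank, which illustrates why the existence of a good low-rank approximation depends on the spectrum of the ratio matrix.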