Return-Critic: Bridging Goal Discrepancy for Efficient Visual Reinforcement Learning
Abstract
Sample inefficiency remains a challenge in pixel-based visual reinforcement learning (RL), primarily due to ineffective state representation learning. While recent advances employ auxiliary tasks to improve representation learning, their representation goals (e.g., mask reconstruction, state prediction) are misaligned with the ultimate RL goal of maximizing return, which constrains further improvements in representation quality. To achieve efficient visual reinforcement learning, we propose Return-Critic (RC), an auxiliary framework that bridges this goal discrepancy through return prediction. RC samples partial frames from an episode, processes them with a shared visual encoder, and employs a lightweight Transformer to predict the episode's return, forcing the encoder to learn return-relevant representations. The attention weights naturally highlight important frames, providing a key signal for prioritized learning. Theoretically, we show that RC bridges the goal discrepancy and thereby improves representation quality. Extensive experiments on both online (DMControl) and offline (V-D4RL) benchmarks demonstrate that RC significantly enhances sample efficiency, achieving in particular a 68% average performance boost across nine challenging tasks from DMControl.
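To make the described pipeline concrete, the sketch below shows one plausible PyTorch realization of the return-prediction head: sampled frame features from a shared encoder pass through a small Transformer, and a linear readout predicts the scalar episode return. This is an illustrative assumption rather than the authors' reference implementation; the class name ReturnCritic, the mean-pooled readout, and all hyperparameters are placeholders.

import torch
import torch.nn as nn


class ReturnCritic(nn.Module):
    """Predicts an episode's return from a subset of its frames' features.

    Hypothetical sketch of the RC auxiliary head: a lightweight Transformer
    over sampled frame features followed by a scalar regression readout.
    """

    def __init__(self, feat_dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim,
            nhead=n_heads,
            dim_feedforward=2 * feat_dim,
            batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 1)  # scalar return prediction

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, n_sampled_frames, feat_dim) from the shared encoder.
        # The Transformer's attention over frames is what the paper exploits
        # to highlight important frames for prioritized learning.
        h = self.transformer(frame_feats)
        return self.head(h.mean(dim=1)).squeeze(-1)  # (batch,)


def return_critic_loss(encoder, critic, frames, episode_return):
    """Auxiliary return-prediction loss; gradients flow into the shared encoder,
    pushing it toward return-relevant representations."""
    # frames: (batch, n_sampled_frames, C, H, W) partial frames from each episode
    b, t = frames.shape[:2]
    feats = encoder(frames.flatten(0, 1)).view(b, t, -1)
    pred = critic(feats)
    return nn.functional.mse_loss(pred, episode_return)

In practice this loss would be added to the base RL objective, so the shared encoder is shaped jointly by the policy/value losses and the return-prediction signal.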