Accelerating Q-learning through Efficient Value-sharing across Actions
Abstract
Learning action-values efficiently is central to reinforcement learning (RL), as they underpin many control algorithms such as Q-learning. However, action-value learning can be slow, requiring many updates to move values from their initialization, typically near zero, to their true values, which may be far from zero. Moreover, action-value learning algorithms typically update each state–action pair independently, without learning shared value structure across actions within a state. In this paper, we address these inefficiencies by introducing the mean-expansion transformation, which accelerates action-value learning by sharing values across actions within a state and by changing the problem from directly learning potentially large action-values to learning a lower-norm representation of them. In deep RL, this transformation can be applied as a parameter-free modification to Q-network architectures without altering the underlying algorithm. Empirically, we show that it improves DQN's aggregate performance across 57 Atari games while increasing action gaps and dramatically reducing value overestimation.
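
To make the architectural change concrete, the sketch below shows one way such a parameter-free head modification could look in PyTorch. The specific form Q(s, a) = u(s, a) + mean_a' u(s, a'), along with the names MeanExpandedQHead, in_features, and num_actions, are illustrative assumptions rather than the paper's exact formulation; the point is only that a shared mean term moves all of a state's values together while letting the learned per-action outputs u remain lower in norm than the Q-values they produce.

    # Illustrative sketch, not the paper's exact transformation: a Q head whose
    # outputs are expanded by their per-state mean before being used as Q-values.
    import torch
    import torch.nn as nn

    class MeanExpandedQHead(nn.Module):
        """Parameter-free output transformation layered on an ordinary linear Q head."""

        def __init__(self, in_features: int, num_actions: int):
            super().__init__()
            self.raw_values = nn.Linear(in_features, num_actions)  # learned low-norm values u(s, .)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            u = self.raw_values(features)              # shape: (batch, num_actions)
            shared_mean = u.mean(dim=1, keepdim=True)  # one value shared across all actions in a state
            return u + shared_mean                     # expanded Q-values, fed to the usual TD loss

    # Hypothetical usage with an Atari-sized action set:
    head = MeanExpandedQHead(in_features=512, num_actions=18)
    q_values = head(torch.randn(32, 512))  # (32, 18) expanded Q-values

Because such an expansion introduces no new parameters, it could replace the final layer of a standard DQN head while leaving the learning algorithm itself unchanged, consistent with the abstract's description of a parameter-free architectural modification.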