
Workshop: Workshop on Reinforcement Learning Theory

Bagged Critic for Continuous Control

Payal Bawa


Actor-critic methods have been successfully applied to several high-dimensional continuous control tasks. Despite their success, they are prone to overestimation bias, which leads to sub-optimal policies and divergent behaviour. Algorithms such as TD3 and Soft Actor-Critic (SAC) address overestimation bias by employing twin Q-functions and optimizing the policy with respect to the lower bound of the two Q-functions. Although this resolves overestimation bias, it inadvertently introduces underestimation bias. Underestimation bias, though less problematic than overestimation bias, degrades the asymptotic performance of these algorithms. To address both overestimation and underestimation bias in actor-critic methods, we propose Bagged Critic for Continuous Control (BC3). BC3 uses an ensemble of independently trained Q-functions as the critic to address estimation biases. We present theoretical bounds on the biases and a convergence analysis of our method, demonstrating its benefits. Empirical results on several challenging reinforcement learning benchmarks substantiate our theoretical analysis and demonstrate reduced bias and more robust policies overall.
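
The contrast between a twin-critic lower bound and an ensemble critic can be illustrated with a toy sketch. The abstract does not say how BC3 aggregates its ensemble, so averaging the critics is an assumption made here purely for illustration, and every function name below is hypothetical. The simulation only shows the underestimation caused by taking a minimum over noisy estimates; it does not model the policy-induced overestimation that motivates the minimum in the first place.

```python
import random
import statistics


def clipped_double_q_target(reward, gamma, q1_next, q2_next):
    """TD target using the lower of two critic estimates (TD3/SAC style)."""
    return reward + gamma * min(q1_next, q2_next)


def ensemble_mean_target(reward, gamma, q_next_values):
    """TD target using the mean of an ensemble of critic estimates.

    ASSUMPTION: mean aggregation is illustrative only; the abstract does
    not specify how BC3 combines its critics.
    """
    return reward + gamma * statistics.fmean(q_next_values)


def estimate_bias(aggregate, n_critics, noise_std=1.0, trials=10_000, seed=0):
    """Monte Carlo estimate of the bias of `aggregate` over noisy critics.

    Each critic returns the true value (fixed at 0.0) plus independent
    zero-mean Gaussian noise, a deliberately simplified error model.
    """
    rng = random.Random(seed)
    true_q = 0.0
    errors = []
    for _ in range(trials):
        estimates = [true_q + rng.gauss(0.0, noise_std) for _ in range(n_critics)]
        errors.append(aggregate(estimates) - true_q)
    return statistics.fmean(errors)


# Minimum of two noisy critics: systematically below the true value.
min_bias = estimate_bias(min, n_critics=2)
# Mean of five noisy critics: close to unbiased under this noise model.
mean_bias = estimate_bias(statistics.fmean, n_critics=5)

print(f"bias of min over 2 critics:  {min_bias:+.3f}")
print(f"bias of mean over 5 critics: {mean_bias:+.3f}")
```

Under this simplified noise model, the minimum over two critics is biased downward while the ensemble mean stays near zero, which is the kind of underestimation the abstract attributes to twin Q-functions.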
