

Poster

Learning the Target Network in Function Space

Ming Yin · Kavosh Asadi · Shoham Sabach · Yao Liu · Rasool Fakoor


Abstract:

We focus on the task of learning the value function in the approximate reinforcement learning (RL) setting. Existing algorithms solve this task by updating a pair of online and target networks while ensuring that the parameters of the two networks are kept equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the algorithm is designed to maintain an equivalence between the two networks in function space, which is obtained through a new target-network update. We show that LR leads to convergent behavior when learning the value function. We also present empirical results demonstrating that LR-based updates significantly improve the performance of deep RL on the Atari benchmark.
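The abstract does not spell out the exact form of the LR target-network update, so the following is only an illustrative sketch of the general idea it describes: instead of copying the online network's parameters into the target network (parameter-space equivalence), one can nudge the target network so that its *outputs* match the online network's outputs on sampled states (function-space equivalence). All names (`replicate_step`, the tiny two-layer networks, the learning rate) are hypothetical choices for this sketch, not the paper's actual algorithm.

```python
import numpy as np

def init_mlp(in_dim, hidden, seed):
    # A tiny two-layer value network with tanh activations (illustrative only).
    r = np.random.default_rng(seed)
    return {"W1": r.normal(0, 0.5, (in_dim, hidden)), "b1": np.zeros(hidden),
            "W2": r.normal(0, 0.5, (hidden, 1)),      "b2": np.zeros(1)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def replicate_step(target, online, states, lr=0.1):
    # One gradient step moving the target network's OUTPUTS toward the
    # online network's outputs on sampled states (function-space matching),
    # rather than copying parameters. Gradients are derived by hand for
    # the two-layer net above.
    h = np.tanh(states @ target["W1"] + target["b1"])
    pred = h @ target["W2"] + target["b2"]
    err = pred - forward(online, states)      # (N, 1) function-space gap
    n = len(states)
    gW2 = h.T @ err / n
    gb2 = err.mean(0)
    dh = (err @ target["W2"].T) * (1 - h ** 2)
    gW1 = states.T @ dh / n
    gb1 = dh.mean(0)
    for k, g in zip(("W1", "b1", "W2", "b2"), (gW1, gb1, gW2, gb2)):
        target[k] -= lr * g
    return float((err ** 2).mean())

rng = np.random.default_rng(0)
online = init_mlp(4, 16, seed=1)
target = init_mlp(4, 16, seed=2)   # parameters differ from the start
states = rng.normal(size=(256, 4))

losses = [replicate_step(target, online, states) for _ in range(200)]
print(f"function-space gap: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note that after these updates the two networks still hold different parameters; only their values on the sampled states have been brought close, which is exactly the distinction between parameter-space and function-space equivalence that the abstract draws.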
