Oral
Target-Based Temporal-Difference Learning
Donghwan Lee · Niao He

Thu Jun 13th 10:05 -- 10:10 AM @ Hall B

Target networks have become a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet their theoretical properties remain poorly understood. In this work, we introduce a new family of target-based temporal-difference (TD) learning algorithms and analyze their convergence. In contrast to standard TD learning, target-based TD algorithms maintain two separate learning parameters: the target variable and the online variable. In particular, we introduce three members of the family, called averaging TD, double TD, and periodic TD, in which the target variable is updated in an averaging, symmetric, or periodic fashion, respectively, mirroring the target-update schemes used in recent deep Q-learning.
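The abstract names the three target-update styles but does not give their formulas, so the Python sketch below assumes plausible forms to show how the online and target variables interact under linear function approximation. It assumes that the online variable `theta` bootstraps from the target in the TD error, r + γφ(s')ᵀθ', and that the target is updated by a soft exponential moving average ("averaging"), by two copies that each bootstrap from the other ("double"), or by a hard copy every `period` steps ("periodic"). These are illustrative assumptions, not the paper's exact algorithms or step-size conditions. The helpers `td_step`, `run`, `chain`, and `one_hot`, the toy two-state chain, and all step sizes are hypothetical.

```python
# Hypothetical sketch: standard vs. target-based TD(0) with linear features.
# The update rules are assumptions inferred from the abstract, not the
# paper's exact algorithms or step-size conditions.
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def td_step(theta: Vector, target: Vector, phi_s: Vector, phi_next: Vector,
            reward: float, gamma: float, alpha: float) -> Vector:
    """One semi-gradient TD(0) step on `theta`, bootstrapping from `target`.

    Passing target=theta recovers standard TD(0).
    """
    delta = reward + gamma * dot(phi_next, target) - dot(phi_s, theta)
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]


def run(method: str, phi: Callable[[int], Vector],
        transition: Callable[[int], Tuple[int, float]], dim: int, steps: int,
        gamma: float = 0.5, alpha: float = 0.1, beta: float = 0.05,
        period: int = 200) -> Vector:
    theta = [0.0] * dim   # online variable
    target = [0.0] * dim  # target variable (the second copy for "double")
    s = 0
    for k in range(1, steps + 1):
        s_next, r = transition(s)
        f, f_next = phi(s), phi(s_next)
        if method == "standard":
            theta = td_step(theta, theta, f, f_next, r, gamma, alpha)
        elif method == "averaging":
            # Online step against the target, then a soft (moving-average) target update.
            theta = td_step(theta, target, f, f_next, r, gamma, alpha)
            target = [tp + beta * (t - tp) for t, tp in zip(theta, target)]
        elif method == "periodic":
            # Target held fixed, then copied from the online variable every `period` steps.
            theta = td_step(theta, target, f, f_next, r, gamma, alpha)
            if k % period == 0:
                target = list(theta)
        elif method == "double":
            # Symmetric: each copy bootstraps from the other; both right-hand
            # sides are evaluated with the old values before assignment.
            theta, target = (td_step(theta, target, f, f_next, r, gamma, alpha),
                             td_step(target, theta, f, f_next, r, gamma, alpha))
        else:
            raise ValueError(f"unknown method: {method}")
        s = s_next
    return theta


# Toy deterministic chain (assumption for illustration): 0 -> 1 -> 0 -> ...,
# reward 1 on leaving state 0. With gamma = 0.5, the true values are V = (4/3, 2/3).
def chain(s: int) -> Tuple[int, float]:
    return 1 - s, (1.0 if s == 0 else 0.0)


def one_hot(s: int) -> Vector:
    return [1.0 if i == s else 0.0 for i in range(2)]


if __name__ == "__main__":
    for m in ("standard", "averaging", "double", "periodic"):
        print(m, [round(x, 4) for x in run(m, one_hot, chain, dim=2, steps=20_000)])
```

With one-hot (tabular) features on this deterministic chain, every variant in the sketch should settle near V = (4/3, 2/3). The toy is only a sanity check of the update mechanics and says nothing about the relative convergence speeds reported in the paper.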

We establish asymptotic convergence for both averaging TD and double TD, and provide a finite-sample analysis for periodic TD. We also present simulation results indicating potentially superior convergence of the target-based TD algorithms compared with standard TD learning. While this work focuses on linear function approximation in the policy-evaluation setting, we view it as a meaningful step toward a theoretical understanding of deep Q-learning variants with target networks.
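As a reference point for the linear policy-evaluation setting, the derivation below computes the expected (mean) update direction. It uses standard notation that the abstract does not define, assumed here: μ is the stationary state distribution of the evaluated policy, D = diag(μ), Φ is the feature matrix with rows φ(s)ᵀ, P is the policy's transition matrix, R is the expected reward vector, θ is the online variable, and θ' is the target variable. It further assumes that the online variable bootstraps from the target, as in r + γφ(s')ᵀθ'. Under these assumptions, any stationary point at which the target agrees with the online variable coincides with the usual linear TD fixed point.

```latex
\begin{aligned}
\bar g(\theta, \theta')
  &:= \mathbb{E}_{s \sim \mu,\, s' \sim P(s,\cdot)}\!\left[\phi(s)\big(r + \gamma\,\phi(s')^\top\theta' - \phi(s)^\top\theta\big)\right] \\
  &= \Phi^\top D\,\big(R + \gamma P \Phi\,\theta' - \Phi\,\theta\big)
   = b - \Phi^\top D \Phi\,\theta + \gamma\,\Phi^\top D P \Phi\,\theta', \\
\bar g(\theta, \theta)
  &= b - A\,\theta, \qquad A := \Phi^\top D\,(I - \gamma P)\,\Phi, \quad b := \Phi^\top D R, \\
\bar g(\theta^\star, \theta^\star) = 0
  &\iff \theta^\star = A^{-1} b \quad \text{(when } A \text{ is invertible)}.
\end{aligned}
```

Standard TD(0) follows the mean direction \bar g(θ, θ). A target-based variant instead follows \bar g(θ, θ'), so the role of the target-update rule (averaging, symmetric, or periodic) is to drive θ' toward θ. Once the two agree at equilibrium, the same θ* results.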

Author Information

Donghwan Lee (University of Illinois, Urbana-Champaign)
Niao He (University of Illinois, Urbana-Champaign)
