ICML Poster Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

Ashley Edwards · Himanshu Sahni · Rosanne Liu · Jane Hung · Ankit Jain · Rui Wang · Adrien Ecoffet · Thomas Miconi · Charles Isbell · Jason Yosinski

Keywords: [ Deep Reinforcement Learning ] [ Reinforcement Learning ] [ Reinforcement Learning - General ]

Abstract: In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
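To make the idea of a value defined over state pairs concrete, here is a minimal tabular sketch. It is not the authors' deep-learning implementation. It assumes one natural reading of the abstract, namely the recursion $Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s'')$ over neighboring states $s''$ of $s'$. The five-state chain, the reward, the step size, and every function name below are hypothetical and chosen only for illustration. The learner sees only (state, next state) pairs produced by random behavior and never sees an action.

```python
import random

# Hypothetical toy problem: a chain of states 0..4 with the goal at state 4.
N_STATES = 5
GOAL = N_STATES - 1
GAMMA = 0.9


def neighbors(s):
    """States reachable from s in one step (move left, stay, or move right)."""
    return sorted({max(s - 1, 0), s, min(s + 1, N_STATES - 1)})


def reward(s, s_next):
    """Assumed reward: 1 for entering the goal, 0 otherwise."""
    return 1.0 if s_next == GOAL and s != GOAL else 0.0


def learn_qss(num_transitions=5000, alpha=0.5, seed=0):
    """Learn Q(s, s') from random state transitions, with no actions observed."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s in range(N_STATES) for n in neighbors(s)}
    for _ in range(num_transitions):
        s = rng.randrange(GOAL)              # random non-goal state
        s_next = rng.choice(neighbors(s))    # random neighboring state
        target = reward(s, s_next)
        if s_next != GOAL:                   # the goal is treated as terminal
            target += GAMMA * max(q[(s_next, n)] for n in neighbors(s_next))
        q[(s, s_next)] += alpha * (target - q[(s, s_next)])
    return q


def best_next_state(q, s):
    """Tabular stand-in for a model that proposes the value-maximizing next state."""
    return max(neighbors(s), key=lambda n: q[(s, n)])


q = learn_qss()
greedy_path = [0]
while greedy_path[-1] != GOAL and len(greedy_path) < 2 * N_STATES:
    greedy_path.append(best_next_state(q, greedy_path[-1]))
```

In this toy setting the greedy choice of next state can be read directly from the table. In continuous state spaces that lookup is not available, which is where a learned forward dynamics model of the kind the abstract describes would come in.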
