Constructing Basis Functions from Directed Graphs for Value Function Approximation
Jeffrey Johns - University of Massachusetts Amherst, U.S.A.
Sridhar Mahadevan - University of Massachusetts Amherst, U.S.A.
Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success of this technique is attributed to the smoothness of the basis functions with respect to the state space geometry. This paper explores the properties of bases created from directed graphs, which are a more natural fit for expressing state connectivity. Directed graphs (digraphs) capture the dynamics of nonreversible MDPs, whose value functions may not be smooth across adjacent states. We provide an analysis using the Dirichlet sum of the directed graph Laplacian to show how the smoothness of the basis functions is affected by the graph's invariant distribution. Experiments in discrete and continuous MDPs with nonreversible actions demonstrate a significant improvement in the policies learned using directed graph bases.
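To make the abstract's central object concrete, the following is a minimal sketch of one standard construction of a directed graph Laplacian, due to Chung (2005): L = Φ − (ΦP + PᵀΦ)/2, where P is the random walk on the digraph and Φ = diag(ψ) with ψ the walk's invariant distribution. The example digraph (a nonreversible 3-state cycle), the teleporting factor used to make the chain irreducible, and the function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def directed_laplacian(W, eta=0.99):
    """Sketch of Chung's symmetrized directed graph Laplacian.

    W   : nonnegative weight matrix of a digraph (assumption: no zero rows).
    eta : mixing factor with the uniform distribution (PageRank-style
          teleporting, an assumption made so the walk is irreducible).
    """
    n = W.shape[0]
    # Random walk on the digraph, mixed with uniform teleporting.
    P = W / W.sum(axis=1, keepdims=True)
    P = eta * P + (1.0 - eta) * np.ones((n, n)) / n
    # Invariant distribution psi: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    psi = np.real(vecs[:, np.argmax(np.real(vals))])
    psi = psi / psi.sum()  # normalize to a probability distribution
    Phi = np.diag(psi)
    # Symmetrized combinatorial Laplacian: L = Phi - (Phi P + P^T Phi)/2.
    L = Phi - (Phi @ P + P.T @ Phi) / 2.0
    return L, psi

# Nonreversible 3-state cycle 0 -> 1 -> 2 -> 0 (illustrative digraph).
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
L, psi = directed_laplacian(W)
# Basis functions would be eigenvectors of L ordered by eigenvalue;
# the smoothness of an eigenvector f is measured by its Dirichlet sum f^T L f.
eigvals, basis = np.linalg.eigh(L)
```

By construction L is symmetric even though the underlying graph is not, so its eigenvectors form a real orthonormal basis; the invariant distribution ψ enters L directly, which is the quantity the paper's Dirichlet-sum analysis ties to basis smoothness.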