Impact of Connectivity on Laplacian Representations in Reinforcement Learning
Abstract
Learning state representations in Markov Decision Processes (MDPs) has proven crucial for addressing the curse of dimensionality in large-scale reinforcement learning (RL) problems. A widely recognized approach exploits structural priors on the MDP by constructing state representations as linear combinations of the eigenvectors of the state-graph Laplacian. When no prior knowledge of the graph is available, or the state space is too large, the spectral features can be estimated in a model-free fashion. In this work, we prove an upper bound on the error of linear value-function approximation based on learned spectral features. We show how this error scales with the algebraic connectivity of the state graph, making the bound interpretable in terms of the MDP's topology. We also bound the error arising from the estimation of the spectral features themselves, providing an end-to-end view of the representation-learning pipeline. Additionally, we provide a new expression for the Laplacian operator in the RL setting that resolves some subtle ambiguities in the literature. Our results apply to general (non-uniform) policies without any symmetry assumptions on the induced transition kernel. We validate our theoretical findings with numerical simulations on gridworld environments.
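As a companion to the abstract, the following is a minimal illustrative sketch of the pipeline it describes, restricted to the simplest symmetric special case (the combinatorial Laplacian of an undirected gridworld graph), not the paper's general non-symmetric operator. It builds the state graph of an N x N gridworld, computes the Laplacian eigenvectors as spectral features, reads off the algebraic connectivity (the second-smallest eigenvalue, i.e. the Fiedler value, which the bound depends on), and fits a linear value-function approximation. The grid size N, feature count d, and the toy target values v_true are hypothetical choices made here for illustration.

```python
import numpy as np

# Adjacency matrix of an N x N gridworld with 4-connected states.
N = 8
S = N * N
A = np.zeros((S, S))
for r in range(N):
    for c in range(N):
        s = r * N + c
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < N and 0 <= cc < N:
                A[s, rr * N + cc] = 1.0

# Combinatorial graph Laplacian L = D - A (symmetric case only).
D = np.diag(A.sum(axis=1))
L = D - A

# Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending
# order. The second-smallest eigenvalue is the algebraic connectivity
# (Fiedler value) that appears in the approximation-error bound.
eigvals, eigvecs = np.linalg.eigh(L)
algebraic_connectivity = eigvals[1]

# Use the d smoothest eigenvectors as spectral state features.
d = 10
Phi = eigvecs[:, :d]

# Hypothetical target value function: a distance-based potential toward
# a goal corner, used only to exercise the linear fit.
goal = np.array([N - 1, N - 1], dtype=float)
coords = np.array([(r, c) for r in range(N) for c in range(N)], dtype=float)
v_true = -np.linalg.norm(coords - goal, axis=1)

# Linear value-function approximation on the spectral features.
w, *_ = np.linalg.lstsq(Phi, v_true, rcond=None)
v_hat = Phi @ w
err = np.max(np.abs(v_true - v_hat))
print(f"algebraic connectivity: {algebraic_connectivity:.4f}")
print(f"max approximation error with d={d} features: {err:.4f}")
```

In this symmetric sketch the eigenvectors are exact; in the model-free setting the abstract refers to, Phi would instead be replaced by learned estimates, and the paper's second bound accounts for the additional error that estimation introduces.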