
Workshop: Workshop on Reinforcement Learning Theory

A Spectral Approach to Off-Policy Evaluation for POMDPs

Yash Nair · Nan Jiang


We consider the off-policy evaluation problem in POMDPs. Prior work on this problem uses a causal identification strategy based on one-step observable proxies of the hidden state (Tennenholtz et al., 2020a). In this work, we relax the assumptions made in that prior work by using spectral methods, and we relax them further by extending the one-step proxies into the past. Finally, we derive an importance sampling algorithm that assumes only rank, distinctness, and positivity conditions on certain probability matrices, rather than the sufficiency conditions on observable trajectories, with respect to the reward and hidden-state structure, required in the prior work.
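The spectral identification strategy itself is not spelled out in this abstract, so it is not reproduced here. As a hedged illustration of the general setting, the sketch below shows a generic trajectory-wise importance sampling estimator for off-policy evaluation: it reweights each behavior-policy trajectory by the product of per-step probability ratios between the evaluation and behavior policies. The function names and the tuple format of the trajectories are assumptions for this example, not the paper's algorithm, which additionally corrects for the hidden state via its matrix conditions.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.95):
    """Trajectory-wise importance sampling estimate of a target policy's value.

    trajectories: list of trajectories, each a list of (obs, action, reward) tuples
    pi_e, pi_b:   functions (obs, action) -> action probability under the
                  evaluation / behavior policy (hypothetical interface)
    gamma:        discount factor
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0   # cumulative likelihood ratio for this trajectory
        ret = 0.0      # discounted return of this trajectory
        for t, (obs, a, r) in enumerate(traj):
            weight *= pi_e(obs, a) / pi_b(obs, a)
            ret += gamma ** t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

When the two policies coincide, every weight is 1 and the estimator reduces to the empirical mean of discounted returns; in a POMDP, naive ratios over observations alone are biased, which is what the proxy-based causal identification in this line of work is designed to address.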
