Reinforcement Learning with Action-Triggered Observations
Alexander Ryabchenko ⋅ Wenlong Mou
Abstract
We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a novel reinforcement learning framework for partial observability in which full-state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent reformulation in which, upon each state observation, agents commit to a sequence of actions until the next observation. Under the linear MDP assumption, we show that the resulting action-sequence value functions admit linear representations in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (continuation probability), matching the known rate for linear MDPs with full observability.
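The interaction protocol described in the abstract can be made concrete with a small simulator. The sketch below is illustrative only, not code from the paper: the class name `ATSTMDPEnv` and the arrays `P`, `R`, and `obs_prob` are hypothetical, and the tabular setup is an assumption (the paper's analysis concerns linear MDPs). It captures the two defining features: the full state is revealed with an action-dependent probability, and episodes terminate with probability $1-\gamma$ per step (geometric horizon).

```python
import numpy as np


class ATSTMDPEnv:
    """Illustrative tabular simulator (hypothetical API) of an ATST-MDP:
    after each action, the full next state is revealed only with an
    action-dependent probability; otherwise the agent observes nothing.
    Episodes end with probability 1 - gamma per step (geometric horizon)."""

    def __init__(self, P, R, obs_prob, gamma, seed=0):
        self.P = P                # P[s, a] : next-state distribution, shape (S, A, S)
        self.R = R                # R[s, a] : expected reward, shape (S, A)
        self.obs_prob = obs_prob  # obs_prob[a] : chance that action a triggers an observation
        self.gamma = gamma        # per-step continuation probability
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self):
        # The initial state is observed in full.
        self.state = self.rng.integers(self.P.shape[0])
        return self.state

    def step(self, action):
        reward = self.R[self.state, action]
        # Sample the latent next state from the transition kernel.
        self.state = self.rng.choice(self.P.shape[2], p=self.P[self.state, action])
        done = self.rng.random() > self.gamma              # geometric episode length
        observed = self.rng.random() < self.obs_prob[action]
        obs = self.state if observed else None             # full state, or nothing
        return obs, reward, done
```

Between observations the agent receives only `None`, so it must act on the basis of the last observed state and the actions taken since, which is exactly why the abstract's reformulation, committing to an action sequence at each observation until the next one arrives, loses no generality.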