Reinforcement Learning with Action-Triggered Observations
Alexander Ryabchenko ⋅ Wenlong Mou
Abstract
We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a novel reinforcement learning framework for partial observability in which full-state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent reformulation in which, upon each state observation, agents commit to a sequence of actions until the next observation. Under the linear MDP assumption, we show that the resulting action-sequence value functions admit linear representations in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (continuation probability), matching the known rate for linear MDPs with full observability.
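The interaction protocol described in the abstract can be made concrete with a small simulator. The sketch below is illustrative only, not code from the paper: the class name `ATSTMDPEnv` and the arrays `P`, `R`, and `obs_prob` are hypothetical, and the tabular setup is an assumption (the paper's analysis concerns linear MDPs). It captures the two defining features: the full state is revealed with an action-dependent probability, and episodes terminate with probability $1-\gamma$ per step (geometric horizon).

```python
import numpy as np


class ATSTMDPEnv:
    """Illustrative tabular simulator (hypothetical API) of an ATST-MDP:
    after each action, the full next state is revealed only with an
    action-dependent probability; otherwise the agent observes nothing.
    Episodes end with probability 1 - gamma per step (geometric horizon)."""

    def __init__(self, P, R, obs_prob, gamma, seed=0):
        self.P = P                # P[s, a] : next-state distribution, shape (S, A, S)
        self.R = R                # R[s, a] : expected reward, shape (S, A)
        self.obs_prob = obs_prob  # obs_prob[a] : chance that action a triggers an observation
        self.gamma = gamma        # per-step continuation probability
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self):
        # The initial state is observed in full.
        self.state = self.rng.integers(self.P.shape[0])
        return self.state

    def step(self, action):
        reward = self.R[self.state, action]
        # Sample the latent next state from the transition kernel.
        self.state = self.rng.choice(self.P.shape[2], p=self.P[self.state, action])
        done = self.rng.random() > self.gamma              # geometric episode length
        observed = self.rng.random() < self.obs_prob[action]
        obs = self.state if observed else None             # full state, or nothing
        return obs, reward, done
```

Between observations the agent receives only `None`, so it must act on the basis of the last observed state and the actions taken since, which is exactly why the abstract's reformulation, committing to an action sequence at each observation until the next one arrives, loses no generality.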