Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang
Event URL: https://openreview.net/forum?id=3jO2Bfhpas
In real-world reinforcement learning (RL) systems, observability is often impaired: latency or lossy channels prevent an agent from observing the most recent state of the system, yet the agent must still make decisions in real time. This paper presents a theoretical study of efficient RL in control systems where agents must act with delayed and missing state observations. We establish near-optimal regret bounds of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$ for RL in both the delayed and missing observation settings. Although impaired observability poses significant challenges to the policy class and to planning, our results show that learning remains efficient, with the regret bound depending optimally on the state-action size of the original system. We also characterize the performance of the optimal policy under impaired observability, comparing it to the optimal value obtained with full observability.
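
To make the delayed-observation setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm or construction. It assumes a fixed delay of `delay` steps and shows the standard idea that an agent can act on the most recent state it has seen together with the actions it has taken since. The class name `DelayedObservation` and its methods are hypothetical, introduced only for this illustration.

```python
from collections import deque


class DelayedObservation:
    """Hypothetical helper: exposes the state from `delay` steps ago,
    together with the actions taken since that state."""

    def __init__(self, delay, initial_state):
        self.delay = delay
        # Keep the last delay + 1 states; the oldest one is what the agent sees.
        self.states = deque([initial_state], maxlen=delay + 1)
        # Keep the last `delay` actions, i.e. those taken since the visible state.
        self.actions = deque(maxlen=delay)

    def step(self, action, next_state):
        """Record the action just taken and the (not yet visible) next state."""
        self.actions.append(action)
        self.states.append(next_state)

    def observe(self):
        """Return (delayed state, actions taken since then).

        Convention assumed here: before `delay` steps have elapsed,
        the initial state is treated as the visible one.
        """
        return self.states[0], tuple(self.actions)


buf = DelayedObservation(delay=2, initial_state=0)
buf.step("a0", 1)
buf.step("a1", 2)
buf.step("a2", 3)
delayed_state, pending_actions = buf.observe()
print(delayed_state, pending_actions)
```

After three steps with a delay of two, the agent sees the state from step 1 and the two actions taken afterwards; a policy in this setting would map that pair to the next action.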

Author Information

Minshuo Chen (Princeton University)
Yu Bai (Salesforce Research)
H. Vincent Poor (Princeton University)
Mengdi Wang (Princeton University)
