Poster in Workshop: RLxF: RL from World Feedback
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Incentivizing Exploration with Returning Agents: Bandits with Decentralized Feedback for Invasive Species Removal

Icey Siyi Ai ⋅ Lily Xu


Abstract

We study a principal–agent online learning problem motivated by invasive species management in the Florida Everglades, in which a central agency coordinates a small pool of contractors to capture invasive Burmese pythons. Unlike prior work on incentivizing exploration, which assumes the principal and agents share a common history, our setting features decentralized feedback: each contractor sees only their own private survey history, while the principal observes the joint history. This creates a persistent information mismatch between the principal's global view and each agent's local belief. We formalize this setting and show that always paying the agent a bonus to follow the principal's preferred action can lead to overpayment: in many cases, simply waiting until the agent's continued sampling flips their preference resolves the disagreement for free. We propose an adaptive flip-time policy that randomizes between waiting and bonusing. For the always-bonus baseline, we prove a $\widetilde{O}(\sqrt{NKT/\varepsilon})$ upper bound on compensation, exposing a new disagreement term absent from single-history analyses. We further state two conjectures characterizing the improvement under our adaptive policy. Empirical results against three baselines support our conjectures: our adaptive flip-time policy achieves near-optimal regret while substantially reducing cumulative payments relative to always bonusing.
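
To make the disagreement between the principal's global view and an agent's local belief concrete, here is a minimal toy sketch. It is not the authors' policy. It assumes a myopic agent who picks the arm with the highest empirical mean in their own private history, and a bonus equal to the gap the agent perceives, which is a common modeling choice in incentivized exploration. The function names and all numbers are hypothetical.

```python
from typing import Sequence


def greedy_arm(means: Sequence[float]) -> int:
    """Index of the arm with the highest empirical mean (first one on ties)."""
    return max(range(len(means)), key=lambda arm: means[arm])


def required_bonus(agent_means: Sequence[float], target_arm: int) -> float:
    """Smallest payment that makes target_arm weakly best for a myopic agent.

    Assumption: the agent maximizes (own empirical mean + bonus).
    """
    return max(agent_means) - agent_means[target_arm]


# Hypothetical estimates for two survey sites (arms).
joint_means = [0.25, 0.75]   # principal's view, pooled over all contractors
agent_means = [0.5, 0.25]    # one contractor's private view

target = greedy_arm(joint_means)                    # principal prefers arm 1
bonus_now = required_bonus(agent_means, target)     # agent prefers arm 0, so a bonus is needed

# After further private sampling the agent's own estimates may flip,
# in which case following the principal's preference costs nothing.
agent_means_later = [0.5, 0.625]
bonus_later = required_bonus(agent_means_later, target)

print(target, bonus_now, bonus_later)
```

In this toy instance the principal would pay 0.25 to redirect the agent immediately, but nothing once the agent's own samples flip their preference. That gap is the trade-off a policy choosing between waiting and bonusing has to weigh.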