When Replanning Becomes the Bottleneck: Budgeted Replanning for Embodied Agents
Abstract
Embodied agents replan frequently to recover from execution drift, partial observability, and coordination hazards. In many LLM-based planners, each replanning call consumes an accumulated textual context that grows over time and across agents (history, failures, summaries, and messages). Once this context becomes large, replanning latency develops heavy tails and can miss real-time deadlines even when task success remains high. This failure mode is hard to detect from average latency or success rate alone. We present BRACE, a controller that formulates replanning for embodied agents as a budgeted control loop. At each replanning trigger, BRACE decides whether to replan, selects a replanning mode, and allocates an explicit token budget together with a latency service-level objective (SLO), while accounting for the overhead of optional efficiency modules. As a reusable component, we introduce E-RECAP, a cost-aware progressive token pruning method that predicts token utility and prunes replanning contexts across transformer layers while preserving critical head and tail tokens. On Habitat-Lab navigation with growing multi-agent context, E-RECAP reduces tokens per replanning call by 71-76% and end-to-end replanning latency by 2.1-2.6x with minimal impact on success or SPL. In Meta Habitat, BRACE combined with E-RECAP reduces SLO violation rates from 85.5% to 4.7% without degrading task success. Results across three embodied platforms demonstrate that tail-aware, per-call budgeting is an effective and practical design principle for replanning systems.
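To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-token utility scores are already available (in E-RECAP they come from a learned predictor applied across transformer layers; here they are plain floats), and it prunes in a single pass rather than progressively. The names `prune_to_budget`, `slo_violation_rate`, `keep_head`, and `keep_tail` are hypothetical and chosen only for illustration. The second function shows why an SLO violation rate can expose a tail problem that the mean latency hides.

```python
from typing import List, Sequence


def prune_to_budget(
    tokens: Sequence[str],
    utility: Sequence[float],
    budget: int,
    keep_head: int = 2,
    keep_tail: int = 2,
) -> List[str]:
    """Single-pass simplification of utility-based pruning (illustrative only).

    Always keeps the first `keep_head` and last `keep_tail` tokens, then fills
    the remaining budget with the highest-utility middle tokens, preserving
    the original token order.
    """
    n = len(tokens)
    if len(utility) != n:
        raise ValueError("tokens and utility must have the same length")
    if n <= budget:
        return list(tokens)  # already within budget, nothing to prune

    # Indices that are never pruned (head and tail of the context).
    protected = set(range(min(keep_head, n))) | set(range(max(n - keep_tail, 0), n))
    if len(protected) > budget:
        raise ValueError("budget is too small to hold the protected head and tail")

    # Rank the remaining (middle) tokens by utility and keep the best ones.
    middle = [i for i in range(n) if i not in protected]
    room = budget - len(protected)
    best = sorted(middle, key=lambda i: utility[i], reverse=True)[:room]

    keep = sorted(protected | set(best))
    return [tokens[i] for i in keep]


def slo_violation_rate(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Fraction of replanning calls whose latency exceeds the SLO."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    return sum(1 for x in latencies_ms if x > slo_ms) / len(latencies_ms)


if __name__ == "__main__":
    context = list("abcdefgh")
    scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
    print(prune_to_budget(context, scores, budget=5, keep_head=1, keep_tail=1))

    # Four fast calls and one slow call: the mean is under the SLO,
    # yet one call in five misses the deadline.
    latencies = [100.0, 100.0, 100.0, 100.0, 900.0]
    print(sum(latencies) / len(latencies), slo_violation_rate(latencies, slo_ms=500.0))
```

In this toy setting the pruned context keeps the protected first and last tokens plus the three highest-scoring middle tokens, and the latency sample has a mean of 260 ms against a 500 ms SLO while still violating it on 20% of calls.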