Poster in Workshop: RLxF: RL from World Feedback
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Adaptive Action Chunking Strategy from World Feedback in Mixed Traffic

Hongki Kim ⋅ Sangeun Park ⋅ Minhae Kwon

Abstract

Learning from real-world feedback requires policies that can act on measurable interaction outcomes rather than on human preference signals alone. In sequential decision-making tasks, such feedback often depends on consequences that unfold over multiple future timesteps under uncertainty. We propose $\textbf{UA2C}$, an uncertainty-aware adaptive action chunking framework that learns from world feedback in mixed traffic. UA2C first learns a flow-matching chunk policy from offline data and then refines it through online interaction. To account for behaviorally diverse surrounding vehicles, UA2C incorporates a driving-style inference module that augments the policy with local behavior context. At execution time, UA2C estimates uncertainty from sampled action chunks and executes only a reliable prefix of the chunk before replanning. Experiments show that UA2C improves driving performance over both a one-step method and fixed-chunk execution.
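
The prefix-selection idea can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured or how the prefix is cut, so everything here is an assumption made for illustration: uncertainty is taken as the variance across sampled chunks at each timestep (summed over action dimensions), the prefix ends just before the first timestep whose uncertainty exceeds a fixed `threshold`, at least one action is always executed, and the executed actions are the sample mean. The function names and the `threshold` and `min_steps` parameters are hypothetical and are not taken from UA2C.

```python
from statistics import pvariance
from typing import List, Sequence

# A chunk is a list of timesteps; each timestep is a list of action dimensions.
Chunk = Sequence[Sequence[float]]


def per_step_uncertainty(chunks: Sequence[Chunk]) -> List[float]:
    """Variance across sampled chunks at each timestep, summed over action dims.

    Assumption: sample disagreement is used as the uncertainty proxy.
    """
    horizon = len(chunks[0])
    action_dim = len(chunks[0][0])
    return [
        sum(pvariance([chunk[t][d] for chunk in chunks]) for d in range(action_dim))
        for t in range(horizon)
    ]


def reliable_prefix_length(
    uncertainty: Sequence[float], threshold: float, min_steps: int = 1
) -> int:
    """Count leading timesteps whose uncertainty stays at or below the threshold.

    Assumption: at least `min_steps` actions are executed so the agent always acts.
    """
    length = 0
    for u in uncertainty:
        if u > threshold:
            break
        length += 1
    return max(min_steps, length)


def select_prefix(chunks: Sequence[Chunk], threshold: float) -> List[List[float]]:
    """Return the mean action chunk truncated to its reliable prefix.

    Assumption: the executed chunk is the per-timestep mean of the samples.
    """
    horizon = len(chunks[0])
    action_dim = len(chunks[0][0])
    mean_chunk = [
        [sum(chunk[t][d] for chunk in chunks) / len(chunks) for d in range(action_dim)]
        for t in range(horizon)
    ]
    n = reliable_prefix_length(per_step_uncertainty(chunks), threshold)
    return mean_chunk[:n]


# Three sampled chunks, horizon 4, one action dimension (e.g. acceleration).
# The samples agree on the first two steps and diverge afterwards.
samples = [
    [[0.0], [0.1], [0.2], [0.3]],
    [[0.0], [0.1], [0.4], [0.9]],
    [[0.0], [0.1], [0.0], [-0.3]],
]
prefix = select_prefix(samples, threshold=0.01)
print(len(prefix))
```

In this toy case the sampled chunks agree on the first two timesteps and diverge from the third onward, so with the threshold shown only the first two actions would be executed before replanning. A larger threshold would execute more of the chunk, approaching fixed-chunk execution, while a threshold of zero with disagreeing samples would fall back to one-step execution.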