Poster Tue, Jul 7, 2026 • 10:30 PM – 12:15 AM PDT HALL A #1115

SC$^{2}$-WM: A Self-Correcting World Model with Closed-Loop Feedback for Vision-and-Language Navigation in Continuous Environments

Xuan Yao ⋅ Yuze Zhu ⋅ Junyu Gao ⋅ Zongmeng Wang ⋅ Changsheng Xu


Abstract

Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to make fine-grained navigation decisions under partial observability. However, most existing methods rely on open-loop execution and lack mechanisms to detect and correct drift in their internal state during inference. We propose SC$^{2}$-WM, a self-correcting world model framework that introduces internal feedback for closed-loop decision-making in VLN-CE. Our method derives feedback from world-model foresight to refine plans at the state level before actions are executed. To handle challenging scenarios, we further introduce conditional world-aware adaptation, which enables model-level correction by selectively updating the world model at test time when the feedback indicates that its capacity is insufficient. Experiments on standard VLN-CE benchmarks demonstrate improved navigation robustness and generalization. Our code is available at https://github.com/sunrise-ikun/SC2_WM.
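To make the two correction levels concrete, here is a toy sketch in Python. It is not the authors' implementation (that is in the linked repository). The 1-D dynamics, the `ToyWorldModel` class, `refine_plan`, `navigate`, the candidate action set, and the use of prediction error with an `err_tol` threshold as the trigger for model updates are all hypothetical stand-ins chosen for illustration. The sketch shows (1) state-level refinement, where the world model imagines the outcome of a proposed action before execution and the plan is revised if that outcome does not move toward the goal, and (2) conditional test-time adaptation, where the world model is updated only when feedback signals that it is inadequate.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical 1-D world model: next position = position + gain * action."""
    gain: float

    def predict(self, pos: float, action: float) -> float:
        # "Foresight": imagine the next state without touching the environment.
        return pos + self.gain * action

    def adapt(self, pos: float, action: float, observed: float, lr: float) -> None:
        # One gradient step on (prediction - observed)^2 with respect to gain.
        err = self.predict(pos, action) - observed
        self.gain -= lr * 2.0 * err * action


def refine_plan(model: ToyWorldModel, pos: float, goal: float,
                proposal: float, candidates: list[float]) -> float:
    """State-level correction before execution: keep the proposed action only if
    its imagined outcome gets closer to the goal; otherwise re-plan over candidates."""
    if abs(model.predict(pos, proposal) - goal) < abs(pos - goal):
        return proposal
    return min(candidates, key=lambda a: abs(model.predict(pos, a) - goal))


def navigate(model: ToyWorldModel, true_gain: float, start: float, goal: float,
             steps: int = 20, err_tol: float = 0.05, lr: float = 0.1):
    """Closed-loop rollout; returns (final position, final model gain, #model updates)."""
    candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
    pos, n_updates = start, 0
    for _ in range(steps):
        proposal = 1.0  # hypothetical open-loop policy: always "move forward"
        action = refine_plan(model, pos, goal, proposal, candidates)
        predicted = model.predict(pos, action)
        observed = pos + true_gain * action  # toy environment dynamics
        # Model-level correction, applied only when the feedback (here: the
        # prediction error, an assumed proxy) exceeds the tolerance.
        if abs(predicted - observed) > err_tol:
            model.adapt(pos, action, observed, lr)
            n_updates += 1
        pos = observed
    return pos, model.gain, n_updates
```

With `navigate(ToyWorldModel(gain=0.5), true_gain=1.0, start=0.0, goal=3.0)`, the agent reaches the goal after three forward steps and updates its world model only on those three steps, where the prediction error exceeds the tolerance. Once the agent is at the goal, the imagined outcome of "move forward" overshoots, so the plan is refined to stand still and no further updates are triggered. An open-loop agent that always executed the proposal would walk past the goal.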

Lay Summary

Vision-and-Language Navigation in Continuous Environments (VLN-CE) studies how robots navigate unfamiliar environments by following natural language instructions. However, existing systems often make decisions in an open-loop manner, meaning they cannot recognize when their internal understanding has become unreliable during navigation. As a result, errors can accumulate over time. We develop SC$^{2}$-WM, a self-correcting navigation framework that lets agents evaluate and refine their decisions internally as they move. Our method uses a world model to imagine possible future outcomes before executing actions, and it generates internal feedback to correct inconsistent navigation plans. In challenging situations, the system can also adapt its internal model online to better handle previously unseen environments. Experiments on standard VLN-CE benchmarks show improved robustness and generalization in complex continuous environments.