NeurVLA: Unleashing Failure-Handling Capability of Vision-Language-Action Models via Neural-Symbolic Reasoning
Abstract
Vision-Language-Action (VLA) models have recently shown promising progress in embodied robotic manipulation, yet their generalization to diverse open-ended embodied tasks is often hindered by execution failures. While prior work has explored failure handling, existing approaches still suffer from two fundamental limitations: coarse-grained failure correction and unreliable failure prevention. These limitations lead to brittle decision-making when VLA models are deployed in novel tasks and environments. To address them, we propose NeurVLA, a neural-symbolic framework that jointly tackles failure correction and prevention via neural-symbolic reasoning and further internalizes these failure-handling capabilities into VLA models. Experiments demonstrate that NeurVLA achieves strong performance and robust generalization across diverse tasks. Code is provided in the supplementary material.