Poster in Workshop: Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning
Shu Wang · Muzhi Han · Ziyuan Jiao · Zeyu Zhang · Ying Nian Wu · Song-Chun Zhu · Hangxin Liu
Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM3, a novel multi-modal foundation model-based TAMP framework featuring a domain-independent interface. Specifically, we leverage the powerful reasoning and planning capabilities of foundation models to propose symbolic action sequences and select continuous action parameters for motion planning. Through a series of simulations in a box-packing domain, we quantitatively demonstrate the effectiveness of our method. Ablation studies underscore the significant contribution of motion failure reasoning to the success of LLM3. Furthermore, we conduct qualitative experiments on a physical manipulator, demonstrating the practical applicability of our approach in real-world settings. Code is available at https://github.com/AssassinWS/LLM-TAMP.
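The sketch below illustrates, in schematic Python, the planning-with-failure-feedback loop summarized in the abstract: a foundation model proposes symbolic actions with continuous parameters, a motion planner attempts each action, and any motion failure is fed back for re-planning. All names here (query_llm, MotionPlanner, llm3_loop) are illustrative assumptions under this reading, not the actual LLM-TAMP API.

from dataclasses import dataclass

@dataclass
class Action:
    name: str            # symbolic action, e.g. "pick(box_1)"
    params: list[float]  # continuous parameters, e.g. a grasp or place pose

def query_llm(task_prompt: str, failure_log: list[str]) -> list[Action]:
    """Placeholder: ask the foundation model for a symbolic action sequence
    with continuous parameters, conditioned on previously observed motion failures."""
    raise NotImplementedError

class MotionPlanner:
    def plan(self, action: Action) -> tuple[bool, str]:
        """Placeholder: attempt to generate a motion for the action.
        Returns (success, failure_reason)."""
        raise NotImplementedError

def llm3_loop(task_prompt: str, planner: MotionPlanner, max_iters: int = 10):
    failure_log: list[str] = []
    for _ in range(max_iters):
        plan = query_llm(task_prompt, failure_log)   # propose actions and parameters
        executed = []
        for action in plan:
            ok, reason = planner.plan(action)        # try to realize the action as motion
            if not ok:
                # feed the motion failure back to the model and re-plan
                failure_log.append(f"{action.name} failed: {reason}")
                break
            executed.append(action)
        else:
            return executed                          # every action was feasible
    return None                                      # no feasible plan found within budget

In this sketch, the failure_log plays the role of the motion failure reasoning signal: each re-planning query is conditioned on the accumulated failure descriptions rather than starting from scratch.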