Distilling Task-Level Coordination Policies for Generalizable Multi-Agent Cooperation
Abstract
Large language models (LLMs) have shown strong reasoning abilities and are increasingly explored as high-level coordinators for multi-agent systems. However, directly deploying LLMs for coordination remains challenging: effective high-level plans often fail to translate into reliable low-level control, and inference costs limit scalability. We propose SynCoord (Synthetic Coordination Distillation), a self-supervised pipeline that distills task-level decision-making for cooperation from high-capacity reasoning models into lightweight agent policies. Our approach relies on neither explicit supervision nor handcrafted coordination rules. Instead, we define a set of task-level tool interfaces that constrain LLM interaction and enable the collection of interaction trajectories, which are then used to train compact coordination policies. This distillation transfers coordination behaviors that are difficult to elicit through prompting alone, while substantially reducing inference overhead at execution time. We evaluate our method on the Overcooked-AI multi-agent cooperation benchmark with varying team sizes and environment layouts. Experimental results show that the distilled policies match reinforcement learning–based methods in success rate and efficiency, while taking fewer erroneous or redundant actions and generalizing across team sizes without retraining.
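To make the pipeline concrete, the following is a minimal, self-contained sketch of the loop the abstract describes: an LLM teacher acts through a constrained task-level tool interface, its interaction trajectories are logged, and a compact student policy is trained on them by behavior cloning. All names here (ToolCall, teacher_policy, collect_trajectory, StudentPolicy) are illustrative assumptions, not the paper's actual API, and the teacher is a placeholder standing in for a real reasoning model.

```python
# Hypothetical sketch of a SynCoord-style distillation loop.
# Names and decision rules are illustrative, not the paper's implementation.

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """A task-level action exposed to the teacher (e.g. 'fetch_ingredient')."""
    name: str


# The tool interface constrains what the LLM coordinator may do.
TOOLS = [ToolCall("fetch_ingredient"), ToolCall("cook"), ToolCall("deliver")]


def teacher_policy(state: int) -> ToolCall:
    """Stand-in for an LLM coordinator restricted to the tool interface."""
    return TOOLS[state % len(TOOLS)]  # placeholder decision rule


def collect_trajectory(horizon: int = 10) -> list[tuple[int, str]]:
    """Log (state, chosen tool) pairs from a teacher rollout as distillation data."""
    return [(t, teacher_policy(t).name) for t in range(horizon)]


class StudentPolicy:
    """Compact tabular student trained by imitating the teacher's tool choices."""

    def __init__(self) -> None:
        self.table: dict[int, str] = {}

    def fit(self, trajectories: list[list[tuple[int, str]]]) -> None:
        # Behavior cloning: copy the teacher's label for each visited state.
        for traj in trajectories:
            for state, tool in traj:
                self.table[state] = tool

    def act(self, state: int) -> str:
        # Fall back to a random tool in unseen states.
        return self.table.get(state, random.choice(TOOLS).name)


if __name__ == "__main__":
    data = [collect_trajectory() for _ in range(5)]
    student = StudentPolicy()
    student.fit(data)
    print(student.act(3))  # reproduces the teacher's tool choice for state 3
```

In practice the tabular student would be replaced by a small neural policy and the placeholder teacher by prompted LLM rollouts, but the structure (constrained tool interface, trajectory logging, then supervised distillation) matches the pipeline summarized above.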