Scaling Multi-Agent Environment Co-Design with Diffusion Models
Abstract
The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance, promising to fundamentally reshape how we deploy multi-agent systems in domains such as warehouse logistics and wind farm management. However, current co-design methods collapse under high-dimensional environment design spaces and suffer from sample inefficiency caused by the moving targets inherent in joint optimisation. We address this by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework incorporating two core innovations. First, we introduce Projected Universal Guidance (PUG), enabling exploration of constraint-satisfying, reward-maximising environments. Second, we devise a critic distillation mechanism to transfer knowledge from the reinforcement learning loop to a guided diffusion model. Together, these improvements yield superior environment-policy pairs on challenging multi-agent co-design benchmarks, for example exceeding the state of the art in a warehouse setting with 39% higher rewards and 66% fewer simulation steps.