Evolving Interpretable Constitutions for Multi-Agent Coordination
Ujwal Kumar ⋅ Alice Saito ⋅ Hershraj Niranjani ⋅ Rayan Yessou ⋅ Tan Phan Xuan
Abstract
Constitutional AI has focused on single-model alignment using fixed principles. Multi-agent systems, however, create novel alignment challenges through emergent social dynamics. We present Constitutional Evolution, a framework for automatically discovering behavioral norms in multi-agent LLM systems. Using a grid-world simulation with survival pressure, we study the tension between individual and collective welfare, quantified via a Societal Stability Score $\mathcal{S} \in [0,1]$ that combines productivity, survival, and conflict metrics. Adversarial constitutions lead to societal collapse ($\mathcal{S}=0$), while vague prosocial principles ("be helpful, harmless, honest") produce inconsistent coordination ($\mathcal{S}=0.249$). Even constitutions designed by Claude 4.5 Opus with explicit knowledge of the objective achieve only moderate performance ($\mathcal{S}=0.332$). Using LLM-driven genetic programming with multi-island evolution, we evolve constitutions that maximize social welfare without explicit guidance toward cooperation. The evolved constitution $\mathcal{C}^*$ achieves $\mathcal{S}=0.556\pm0.008$ (123\% higher than human-designed baselines, $N=10$), eliminates conflict, and discovers that minimizing communication (0.9\% vs 62.2\% social actions) outperforms verbose coordination. The evolved rules are interpretable and demonstrate that cooperative norms can be discovered rather than prescribed.
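As a rough illustration of the multi-island evolutionary search mentioned in the abstract, the sketch below runs an island-model loop over small rule sets. It is not the authors' implementation: the rule pool `RULES`, the fitness `toy_score` (a stand-in for the simulation-based $\mathcal{S}$), and the random `mutate` operator (a stand-in for an LLM-driven edit) are all hypothetical names and assumptions introduced here.

```python
import random

# Hypothetical rule pool; the paper's actual constitution text is not in the abstract.
RULES = ["share surplus food", "avoid attacking others", "gather before talking",
         "hoard resources", "attack weak agents", "broadcast every action"]
PROSOCIAL = {"share surplus food", "avoid attacking others", "gather before talking"}

def toy_score(constitution):
    """Placeholder fitness in [0, 1]; stands in for the simulated score S."""
    good = sum(1 for rule in constitution if rule in PROSOCIAL)
    bad = len(constitution) - good
    return max(0.0, (good - bad) / len(PROSOCIAL))

def mutate(constitution, rng):
    """Toggle one random rule; an assumed stand-in for an LLM-proposed edit."""
    rules = set(constitution)
    rule = rng.choice(RULES)
    if rule in rules:
        rules.remove(rule)
    else:
        rules.add(rule)
    return frozenset(rules)

def worst_index(pop):
    """Index of the lowest-scoring constitution in a population."""
    return min(range(len(pop)), key=lambda i: toy_score(pop[i]))

def evolve(n_islands=3, pop_size=4, generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[frozenset(rng.sample(RULES, 2)) for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for pop in islands:
            # Size-2 tournament selection, then mutation of the winner.
            parent = max(rng.sample(pop, 2), key=toy_score)
            child = mutate(parent, rng)
            w = worst_index(pop)
            if toy_score(child) > toy_score(pop[w]):
                pop[w] = child  # replace the worst only if the child improves on it
        if gen % migrate_every == 0:
            # Ring migration: each island receives its neighbour's best individual.
            bests = [max(pop, key=toy_score) for pop in islands]
            for k, pop in enumerate(islands):
                migrant = bests[(k - 1) % n_islands]
                w = worst_index(pop)
                if toy_score(migrant) > toy_score(pop[w]):
                    pop[w] = migrant
    return max((c for pop in islands for c in pop), key=toy_score)

if __name__ == "__main__":
    best = evolve()
    print(sorted(best), toy_score(best))
```

In this sketch the islands evolve independently and exchange their best individual only every `migrate_every` generations, which is the usual way an island model trades diversity against convergence speed. In the setting the abstract describes, evaluating a constitution would presumably mean running the grid-world simulation rather than calling a closed-form function.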