Poster in Workshop: Models of Human Feedback for AI Alignment
Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents
David Hyland · Tomáš Gavenčiak · Lancelot Da Costa · Conor Heins · Vojtech Kovarik · Julian Gutierrez · Michael Wooldridge · Jan Kulveit
We propose a novel framework for modelling strategic interactions between boundedly-rational agents in complex, partially observable environments. Our approach introduces agents that minimize a free-energy functional, capturing the divergence between their beliefs about future trajectories and their preferences, which are represented by a biased probabilistic model. We extend this to multi-agent settings and introduce Free-Energy Equilibria, a new class of game-theoretic solution concepts. We begin by establishing the relationship between Free-Energy Equilibria and existing game-theoretic solution concepts. Then, we propose an approach to studying cooperation by contrasting Free-Energy Equilibria with joint free-energy minimization, extending key concepts from mechanism design. Our framework allows for modelling interactions between agents with varying levels of rationality and biased or incorrect world models, providing insights into human-AI interaction and AI alignment.
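As a rough illustration of the single-agent idea described above, the following sketch treats free energy as a KL divergence between an agent's predicted outcome distribution and a preference distribution over outcomes. This is an assumption made for illustration only and is not the paper's actual functional: the function names, the toy outcome model, and the grid search over mixed policies are all hypothetical.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions given as equal-length lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def free_energy(policy, outcome_model, preferences):
    """Toy free energy: divergence between predicted and preferred outcomes.

    policy:        probabilities over actions
    outcome_model: outcome_model[a][o] = agent's believed P(outcome o | action a)
                   (may be biased or incorrect relative to the true environment)
    preferences:   preferred distribution over outcomes (hypothetical encoding)
    """
    n_outcomes = len(preferences)
    predicted = [
        sum(policy[a] * outcome_model[a][o] for a in range(len(policy)))
        for o in range(n_outcomes)
    ]
    return kl_divergence(predicted, preferences)


# Hypothetical toy problem: two actions, two outcomes.
outcome_model = [[0.9, 0.1], [0.2, 0.8]]
preferences = [0.3, 0.7]

# Grid search over mixed policies (p, 1 - p) for the free-energy minimizer.
candidates = [(i / 100, 1 - i / 100) for i in range(101)]
best_policy = min(
    candidates, key=lambda pi: free_energy(pi, outcome_model, preferences)
)
```

In this toy setting the minimizing policy is mixed rather than deterministic, because the agent is matching a preference distribution instead of maximizing a scalar utility. A multi-agent analogue would have each agent minimize its own functional given the others' policies, which is the kind of fixed point the abstract calls a Free-Energy Equilibrium.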