What Preferences Can—and Cannot—Predict in Multi-Agent Online Learning
Abstract
We examine the interplay between ordinal, preference-based solution concepts in games and the outcomes of payoff-driven learning dynamics, asking to what extent the combinatorial data of a game—its preference graph—can predict the long-run behavior of no-regret dynamics such as follow-the-regularized-leader (FTRL). In one direction, we show that the skeleton of every dynamically stable set, i.e., the set of pure profiles it contains, must be preferentially stable, that is, closed under pure profitable deviations. We then ask the converse question: when are preferences sufficient to describe long-run behavior? For subgames—subsets of pure profiles obtained by restricting players’ action sets—preferences are enough to fully characterize asymptotic stability. Beyond subgames, however, we construct a three-player counterexample with a preferentially stable set whose span is dynamically unstable, thus establishing that preferences are not sufficient to describe dynamically stable behavior in general. To restore stability, we introduce the notion of leaklessness, a measure of aggregate payoff drift away from a set of pure profiles, and use it to identify a payoff-based condition under which the span of a set of pure profiles remains stable and attracting, thereby providing a natural cardinal guarantee of dynamic stability.