

Poster
in
Workshop: New Frontiers in Learning, Control, and Dynamical Systems

Analyzing the Sample Complexity of Model-Free Opponent Shaping

Kitty Fung · Qizhen Zhang · Christopher Lu · Timon Willi · Jakob Foerster


Abstract: In mixed-incentive multi-agent environments, methods developed for zero-sum games often yield collectively sub-optimal results. Addressing this, opponent shaping (OS) strategies aim to actively guide the learning processes of other agents, empirically leading to improved individual and group performance. Early OS methods use higher-order derivatives to shape the learning of co-players, which prevents them from anticipating multiple learning steps ahead. Follow-up work, Model-Free Opponent Shaping (M-FOS), addresses these shortcomings by reframing the OS problem as a meta-game in which each meta-step corresponds to an episode of the "inner" game. The meta-state corresponds to the inner policies, and the meta-policy outputs an inner policy at each meta-step. Leveraging model-free optimization techniques, M-FOS learns meta-policies that perform long-horizon opponent shaping, e.g., discovering a novel extortion strategy in the Iterated Prisoner's Dilemma (IPD). In contrast to early OS methods, however, the M-FOS framework has little theoretical grounding. In this work, we analyze the sample complexity of M-FOS agents both theoretically and empirically. To quantify the sample complexity, we adapt the $R_{max}$ algorithm, most prominently used to derive sample bounds for MDPs, as the meta-learner in the M-FOS framework and derive an exponential sample complexity bound. Our theoretical results are supported empirically in the Matching Pennies environment.
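To make the meta-game framing concrete, the following is a minimal Python sketch of an M-FOS-style interaction on Matching Pennies, the environment used in the experiments: the meta-state is the co-player's current inner policy, one meta-step corresponds to one inner episode, and the meta-policy outputs the shaping agent's inner policy for that episode. All names here (episode_return, naive_learner_step, run_meta_episode) and the hand-crafted best-response meta-policy are illustrative assumptions, not the paper's implementation; M-FOS learns the meta-policy with model-free RL, and the analysis in this work replaces it with an $R_{max}$-style count-based meta-learner.

```python
import numpy as np

# Row (shaper) payoff for Matching Pennies; the column player (opponent)
# receives the negative. Action 0 = Heads, action 1 = Tails.
PAYOFF = np.array([[ 1.0, -1.0],
                   [-1.0,  1.0]])

def episode_return(p_shaper, p_opponent):
    """Shaper's expected per-step payoff when each player plays Heads with
    the given probability (a one-shot stand-in for an inner episode)."""
    probs = np.outer([p_shaper, 1.0 - p_shaper],
                     [p_opponent, 1.0 - p_opponent])
    return float((probs * PAYOFF).sum())

def naive_learner_step(p_opponent, p_shaper, lr=0.3):
    """Inner learner: the opponent takes one gradient-ascent step on its own
    expected payoff (the negative of the shaper's), clipped to [0, 1]."""
    grad = 2.0 - 4.0 * p_shaper      # d/dp_opponent of (-episode_return)
    return float(np.clip(p_opponent + lr * grad, 0.0, 1.0))

def run_meta_episode(meta_policy, n_meta_steps=50, p_opponent=0.5):
    """One meta-episode: each meta-step maps the meta-state (the opponent's
    current inner policy) to the shaper's inner policy, plays out an inner
    episode, and then lets the opponent take one learning step."""
    total = 0.0
    for _ in range(n_meta_steps):
        p_shaper = meta_policy(p_opponent)    # meta-action = inner policy
        total += episode_return(p_shaper, p_opponent)
        p_opponent = naive_learner_step(p_opponent, p_shaper)
    return total

if __name__ == "__main__":
    # Hand-crafted shaping meta-policy: best-respond to the opponent's
    # current inner policy, exploiting its predictable learning dynamics.
    exploit = lambda p_opp: 1.0 if p_opp >= 0.5 else 0.0
    static = lambda p_opp: 0.5    # ignores the meta-state entirely
    print(f"shaping meta-policy return: {run_meta_episode(exploit):.2f}")
    print(f"static policy return:       {run_meta_episode(static):.2f}")
```

Under these assumptions, the meta-policy that conditions on the opponent's inner policy accumulates positive return over the meta-episode while the static uniform policy stays near zero, illustrating the long-horizon shaping advantage whose sample cost the $R_{max}$-based analysis quantifies.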
