Poster in Affinity Event: The 6th Muslims in ML (MusIML) Workshop

AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models

Adib Sakhawat ⋅ Fardeen Sadab Anonta ⋅ Tamjid H Fahim

Abstract

Evaluating LLM social intelligence requires moving beyond static text toward dynamic interactions. We introduce the Adversarial Resource Extraction Game (AREG), a benchmark operationalizing persuasion and resistance as a multi-turn, zero-sum financial negotiation. A tournament across frontier models reveals that offensive and defensive capabilities are empirically dissociated and weakly correlated ($\rho = 0.33$). While models show a systematic defensive advantage, effectiveness depends heavily on dialogue structure: incremental persuasion outperforms single asks, and verification-seeking defends better than explicit refusal. These findings demonstrate that social influence is not a monolithic capability, highlighting the need for dual-sided evaluation to uncover asymmetric behavioral vulnerabilities.
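
To make the reported correlation concrete, the following is a minimal sketch of how a rank correlation between per-model offensive and defensive scores could be computed. It assumes that $\rho$ denotes Spearman's rank correlation, which the abstract does not state. The score lists are hypothetical values invented purely for illustration and are not results from the AREG tournament; the function names are likewise illustrative.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-model scores (NOT data from the paper):
# offense = share of resources extracted when persuading,
# defense = share of resources retained when resisting.
offense = [0.42, 0.15, 0.30, 0.55, 0.22]
defense = [0.70, 0.65, 0.80, 0.60, 0.75]

print(f"Spearman rho = {spearman(offense, defense):.2f}")
```

A value near zero would indicate that a model's offensive ranking says little about its defensive ranking, which is the sense in which the two capabilities can be called dissociated.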