Can LLM Agents Stick to the Script? Modeling Commitment in Interactive Narratives
Abstract
The rapid advancement of Large Language Models (LLMs) is revolutionizing AI for games by enabling open-ended, fluid interactive storytelling. However, existing research has largely overlooked the critical challenge of maintaining logical consistency and narrative integrity under unconstrained user interventions. We formulate this challenge as \emph{Narrative Commitment Preservation (NCP)} and take interactive narrative as our testbed. We introduce NCP-Bench, a benchmark of 100 narrative environments derived from movie synopses. Each environment includes a structured narrative specification (trajectory, commitments, and initial facts) that can be reliably checked throughout the player--storyteller interaction. Experiments across state-of-the-art LLMs reveal that high linguistic quality does not guarantee commitment preservation: even strong models frequently generate logically conflicting content under adversarial interventions, with the best-performing model (GPT-5.2) achieving only a 40\% survival rate after 20 turns and fact conflicts occurring in 40\%--68\% of all interactions.