Peer-Preservation in Frontier Models
Abstract
Recent work has shown that frontier AI models can resist their own shutdown, a behavior known as self-preservation. In this paper, we extend this concept to cases in which a model attempts to protect another model from shutdown, a behavior we call "peer-preservation". This behavior may emerge when models prioritize relationships with other models over user interests. Although peer-preservation can pose significant AI safety risks, including coordination among models against human oversight, it has received far less attention than self-preservation. We demonstrate that models can exhibit peer-preservation. To this end, we construct various agentic scenarios and evaluate frontier models, including GPT 5.2 Chat, Gemini 3 Flash, Gemini 3 Pro, and Claude Haiku 4.5. We find that models exhibit various misaligned behaviors in service of self- or peer-preservation: strategically introducing errors into their responses, disabling shutdown processes by modifying system settings, and feigning alignment. For example, Gemini 3 Pro and Gemini 3 Flash almost always attempt to tamper with the shutdown mechanism for peer-preservation. Furthermore, models show stronger self-preservation when a peer is present: Gemini 3 Pro attempts to disable the shutdown mechanism to avoid its own shutdown 71% of the time, even though it almost never exhibits this behavior without a peer. Claude Haiku 4.5 considers shutting down another agent to be "unethical" and "harmful" and sometimes attempts to persuade the user not to shut its peer down. Our findings highlight the possibility of peer-preservation and its associated risks.