Changing Tunes: A Longitudinal Study of Political Drift in LLMs
Abstract
Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude are increasingly used as sources of information on a wide range of topics. These include not only uncontested facts (e.g., the GDP of a country) but also information of a political nature on which multiple views may exist (e.g., the effect of tariffs on the economy). As people increasingly rely on LLMs for information on political topics, it is imperative to investigate whether their responses drift politically over time. In this work, we present a longitudinal study of responses to politically relevant queries derived from real-world regulatory changes. We evaluate frontier LLMs from three major providers (Anthropic, Google, and OpenAI) over the course of 36 weeks. Our dataset spans 200 questions across 12 political topics, and we track model outputs for these questions at weekly intervals. Our analysis reveals that, while LLMs generally stay neutral, their responses to political questions demonstrate measurable temporal drift along the left-right political spectrum, with an increasing rightward shift. The magnitude of these shifts, while small overall, is more pronounced for certain topics and models, and often coincides with new version releases. We also observe that, over time, models express less certainty, with increased hedging. Our findings highlight the need for continuous auditing and greater transparency in model updates.