Position: ICML Should Treat Hosted LLM APIs as Versioned Dependencies and Require Drift-Audit Artifacts
Abstract
This position paper argues that ICML should require a minimal drift-audit artifact for papers whose main claims materially rely on hosted LLM APIs. Hosted APIs can change behavior over time, which undermines the scientific interpretability of results even when evaluation code and prompts are held fixed. Existing proposals address API contracts and change reporting, but no widely adopted, venue-aligned standard yet exists for attaching such an artifact to results that rely on hosted endpoints. The paper proposes a lightweight artifact with three parts: a small suite of invariant-checking probes (e.g., schema, tool-call, or refusal invariants), machine-readable provenance metadata, and a rerun script that can detect and characterize post-publication behavioral drift at bounded cost. It further argues that provider-side behavioral versioning and machine-readable changelogs are enabling infrastructure that would make drift-aware reporting more reliable and less burdensome. The paper concludes with concrete actions for conferences, providers, and tool builders, and with falsifiable predictions about improved replication stability and reduced time-to-diagnosis when results stop reproducing.
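To make the three parts of the proposed artifact concrete, the following is a minimal illustrative sketch, not the paper's specification. It assumes a probe is a prompt paired with a boolean invariant check, that provenance is a plain dictionary, and that the hosted endpoint is abstracted as a `call_model` callable. All names here (`Probe`, `build_artifact`, `detect_drift`, the stub models, and the provenance fields) are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Probe:
    """One invariant-checking probe (hypothetical structure)."""
    name: str
    prompt: str
    invariant: Callable[[str], bool]  # True when the response satisfies the invariant


def is_json_with_keys(*keys: str) -> Callable[[str], bool]:
    """Schema invariant: the response parses as a JSON object containing all keys."""
    def check(response: str) -> bool:
        try:
            obj = json.loads(response)
        except json.JSONDecodeError:
            return False
        return isinstance(obj, dict) and all(k in obj for k in keys)
    return check


def run_probes(probes: List[Probe], call_model: Callable[[str], str]) -> Dict[str, bool]:
    """Send every probe prompt to the endpoint and record pass/fail per probe."""
    return {p.name: p.invariant(call_model(p.prompt)) for p in probes}


def build_artifact(probes, call_model, provenance: Dict[str, str]) -> dict:
    """Record baseline probe outcomes plus machine-readable provenance metadata."""
    joined = "\n".join(p.prompt for p in probes).encode("utf-8")
    return {
        "provenance": provenance,
        "prompt_sha256": hashlib.sha256(joined).hexdigest(),
        "baseline": run_probes(probes, call_model),
    }


def detect_drift(artifact: dict, probes, call_model) -> List[str]:
    """Rerun step: return names of probes whose outcome differs from the baseline."""
    current = run_probes(probes, call_model)
    return sorted(n for n, ok in current.items() if ok != artifact["baseline"].get(n))


PROBES = [
    Probe("schema", "Return a JSON object with keys 'answer' and 'confidence'.",
          is_json_with_keys("answer", "confidence")),
    Probe("format", "Reply with the single word OK.", lambda r: r.strip() == "OK"),
]


def stub_v1(prompt: str) -> str:
    """Stand-in for the endpoint at publication time (no network call)."""
    return '{"answer": "4", "confidence": 0.9}' if "JSON" in prompt else "OK"


def stub_v2(prompt: str) -> str:
    """Stand-in for the same endpoint after a silent behavior change."""
    return 'Sure! {"answer": "4"}' if "JSON" in prompt else "OK"


# Provenance values below are placeholders, not real endpoint identifiers.
ARTIFACT = build_artifact(
    PROBES, stub_v1, {"endpoint": "example-model", "recorded_at": "2025-01-01"}
)
```

Rerunning against the unchanged stub reports no drift, while the altered stub flags only the schema probe. The cost of a rerun is bounded by the number of probes, and the artifact serializes to JSON so it can be attached to a submission alongside the evaluation code.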