Shopify

Expo Talk Panel

Model Optimization Flywheel: Continuously Self-Improving LLMs in Production

Andrew McNamara ⋅ Cody Mazza-Anthony ⋅ Shuying Sun

HALL D1
Mon 6 Jul 11:30 a.m. KST — 12:30 p.m. KST

Abstract:

We present Shopify's Model Optimization Flywheel, a practical methodology for turning frontier-quality LLM behavior into faster, cheaper, and continuously improving production systems. The flywheel starts with reliable evaluation: LLM-as-judge evaluators grounded in human-labeled data become the canonical metrics for prompt optimization, distillation, and production regressions.
Using Tangle-powered experimentation workflows, we optimize frontier-model system prompts, collect training data from production A/B traffic and synthetic merchant/user rollouts, and distill smaller models with SFT, on-policy distillation, and GRPO. These models can replicate, and in some cases exceed, frontier-model behavior at much lower serving cost. We then compress prompts with gist tokens to reduce context overhead and improve latency.
After deployment, the loop continues by sampling low-scoring production conversations, using stronger reasoning models to critique and "heal" them, folding repaired examples back into training, and re-running distillation. This flywheel has reduced serving cost and latency while improving production quality. We will share concrete recipes, quality-cost-latency trade-offs, and a blueprint for building self-improving LLM systems that get better and cheaper over time.
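
The post-deployment loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not Shopify's implementation: the `Conversation` type, the `critic` and `judge` callables, the 0-to-1 score scale, and the 0.5 threshold are all hypothetical stand-ins for the reasoning model and the LLM-as-judge evaluator. It also assumes that a repaired example is kept only if the judge rates it at or above the threshold, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Conversation:
    prompt: str
    response: str
    judge_score: float  # hypothetical judge score in [0, 1]


def select_low_scoring(convs: List[Conversation], threshold: float) -> List[Conversation]:
    """Keep production conversations the judge scored below the threshold."""
    return [c for c in convs if c.judge_score < threshold]


def heal(
    convs: List[Conversation],
    critic: Callable[[Conversation], str],
    judge: Callable[[str, str], float],
    threshold: float,
) -> List[Conversation]:
    """Ask a stronger model (critic) for a repaired response, then re-score it.

    Assumption: a repair is accepted only if the judge now rates it at or
    above the threshold; the abstract does not specify an acceptance rule.
    """
    repaired = []
    for c in convs:
        new_response = critic(c)
        new_score = judge(c.prompt, new_response)
        if new_score >= threshold:
            repaired.append(Conversation(c.prompt, new_response, new_score))
    return repaired


def flywheel_step(
    production: List[Conversation],
    training_set: List[Conversation],
    critic: Callable[[Conversation], str],
    judge: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Conversation]:
    """One turn of the loop: sample low scorers, heal them, fold them into training."""
    low = select_low_scoring(production, threshold)
    return training_set + heal(low, critic, judge, threshold)


# Toy usage with stand-in critic and judge functions.
production = [
    Conversation("a", "bad", 0.2),
    Conversation("b", "good", 0.9),
    Conversation("c", "bad", 0.1),
]
toy_critic = lambda c: "fixed " + c.prompt
toy_judge = lambda prompt, response: 0.8 if prompt != "c" else 0.3
new_training_set = flywheel_step(production, [], toy_critic, toy_judge)
```

In this sketch the returned training set would feed the next distillation run, closing the loop.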
