Characterizing Agents in Production
Abstract
LLM-based agents already operate in production across many industries, yet we lack a clear understanding of which technical methods make these deployments successful. We present Characterizing Agents in Production (CAP), the first systematic study of production agents based on first-hand data from agent developers. We conducted 20 in-depth case studies through interviews and surveyed 306 practitioners across 26 domains. We examine why organizations build agents, how they build them, how they evaluate them, and which key challenges they face in deployment. Our findings show that production agents rely on simple, controllable approaches: 68% execute at most 10 steps before human intervention, 70% rely on prompting off-the-shelf models rather than tuning model weights, and 74% depend primarily on human evaluation. Reliability (consistently correct behavior over time) emerges as the dominant challenge, which practitioners address through system-level design choices. CAP documents the current state of production agents, giving the research community visibility into real-world deployment practices and highlighting underexplored research opportunities.