Agent JIT Compilation for Latency-Optimizing Computer-Use Agent Planning and Scheduling
Caleb Winston ⋅ Ron Yifeng Wang ⋅ Azalia Mirhoseini ⋅ Christoforos Kozyrakis
Abstract
Computer-use agents (CUAs) automate tasks specified in natural language, such as "order the cheapest item from Taco Bell", by generating sequences of calls to tools such as click, type, and scroll in a browser. Current CUA implementations follow a sequential fetch-screenshot-execute loop in which each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use. We present agent JIT compilation, an alternative that compiles task descriptions directly into executable code that may include LLM calls, tool calls, and parallelization. Our approach comprises three components: (1) JIT-Planner, which generates multiple code plans from a task, validates each against tool specifications, and selects the minimum-cost candidate; (2) JIT-Scheduler, which explores parallelization strategies via Monte Carlo cost estimation from learned latency distributions; and (3) an invariant-enforcing tool protocol that specifies pre- and postcondition state requirements and reduces the rate of generating plans with incorrect tool use. Evaluation across 5 applications demonstrates that JIT-Planner achieves a $10.4\times$ speedup and a $+28\%$ accuracy improvement over Browser-Use, while JIT-Scheduler achieves a $2.6\times$ speedup and a $+9\%$ accuracy improvement over OpenAI CUA.
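To make the three components concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a plan is a list of stages whose tool calls run in parallel, that each tool declares pre- and postconditions as sets of state facts, and that tool latencies follow Gaussian distributions. The tool names, state facts, latency parameters, and the functions `is_valid`, `estimate_cost`, and `select_plan` are all hypothetical. For brevity the sketch folds plan selection and parallelization scoring into one step, whereas the abstract attributes them to JIT-Planner and JIT-Scheduler respectively.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    pre: frozenset    # state facts that must hold before the call (assumed form)
    post: frozenset   # state facts established by the call (assumed form)
    latency: tuple    # (mean, stddev) in seconds; hypothetical values


# Hypothetical tool specifications, not taken from the paper.
TOOLS = {
    "open_page": ToolSpec(frozenset(), frozenset({"page_loaded"}), (1.0, 0.2)),
    "read_menu": ToolSpec(frozenset({"page_loaded"}), frozenset({"menu_read"}), (1.0, 0.2)),
    "read_prices": ToolSpec(frozenset({"page_loaded"}), frozenset({"prices_read"}), (1.0, 0.2)),
    "pick_cheapest": ToolSpec(
        frozenset({"menu_read", "prices_read"}), frozenset({"item_chosen"}), (0.5, 0.1)
    ),
}


def is_valid(plan):
    """Check every call's preconditions against the state built by earlier stages."""
    state = set()
    for stage in plan:
        # Calls in one stage run concurrently, so they only see earlier stages' effects.
        if any(not TOOLS[name].pre <= state for name in stage):
            return False
        for name in stage:
            state |= TOOLS[name].post
    return True


def estimate_cost(plan, rng, samples=2000):
    """Monte Carlo estimate of expected latency: a stage costs its slowest call."""
    total = 0.0
    for _ in range(samples):
        for stage in plan:
            total += max(max(0.0, rng.gauss(*TOOLS[name].latency)) for name in stage)
    return total / samples


def select_plan(candidates, seed=0):
    """Discard invalid plans, then return the one with the lowest estimated latency."""
    rng = random.Random(seed)
    valid = [plan for plan in candidates if is_valid(plan)]
    return min(valid, key=lambda plan: estimate_cost(plan, rng))


# Three candidate plans for the same task (all hypothetical).
SEQUENTIAL = [["open_page"], ["read_menu"], ["read_prices"], ["pick_cheapest"]]
PARALLEL = [["open_page"], ["read_menu", "read_prices"], ["pick_cheapest"]]
BROKEN = [["open_page"], ["pick_cheapest"]]  # violates pick_cheapest's preconditions

best = select_plan([SEQUENTIAL, PARALLEL, BROKEN])
```

In this toy setting the precondition check rejects `BROKEN` before any cost is estimated, and the parallel plan wins because its two independent reads overlap instead of running back to back.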