Position: Web Agents Should Use Typed Actions Instead of Click-Based Browsing
Abstract
This position paper argues that building a reliable agentic web requires shifting from click-based browsing to typed actions supported by a standardized semantic layer. Today’s agents primarily operate over low-level primitives such as clicks, keystrokes, and DOM manipulation. This reliance leads to brittle long-horizon behavior, high execution cost, and limited auditability. We contend that a semantic layer of typed web actions, analogous to the abstraction provided by high-level programming languages, is necessary for agents to compose reliable workflows from stable, well-specified operations. We recommend Web Verbs as a concrete instantiation of this semantic layer. A verb is a typed, semantically documented function that exposes a site capability through a uniform interface, whether implemented via server APIs or by wrapping robust client-side workflows. Verbs can attach preconditions, postconditions, policy tags, and logging hooks, allowing agents to synthesize concise programs with explicit control and data flow and to produce checkable execution traces. Using representative cross-site case studies, we demonstrate that verb-level composition produces correct, reproducible outcomes, while GUI-level agents often exhibit brittle behavior or incorrect reasoning. We conclude with a call to action on standardization, developer tooling, and community processes needed to make this semantic layer deployable and trustworthy at web scale.
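To make the notion of a verb concrete, the following is a minimal sketch in Python and not the paper's actual interface. The names `Verb`, `search_products`, the in-memory catalog, and the trace record format are all hypothetical, chosen only for illustration. The sketch assumes a verb can be modeled as a documented function bundled with precondition and postcondition checks, policy tags, and a logging hook that appends to an execution trace.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Args = dict[str, Any]


@dataclass
class Verb:
    """Hypothetical typed action: a documented function plus checks, tags, and logging."""
    name: str
    doc: str
    run: Callable[..., Any]
    preconditions: list[Callable[[Args], bool]] = field(default_factory=list)
    postconditions: list[Callable[[Args, Any], bool]] = field(default_factory=list)
    policy_tags: frozenset[str] = frozenset()

    def __call__(self, trace: list[dict], **args: Any) -> Any:
        # Refuse to run if any precondition on the arguments fails.
        for check in self.preconditions:
            if not check(args):
                trace.append({"verb": self.name, "args": args,
                              "status": "precondition_failed"})
                raise ValueError(f"{self.name}: precondition failed")
        result = self.run(**args)
        # Verify the result before it flows into the next step.
        for check in self.postconditions:
            if not check(args, result):
                trace.append({"verb": self.name, "args": args,
                              "status": "postcondition_failed"})
                raise RuntimeError(f"{self.name}: postcondition failed")
        # Logging hook: record a checkable entry in the execution trace.
        trace.append({"verb": self.name, "args": args, "status": "ok",
                      "policy_tags": sorted(self.policy_tags)})
        return result


def _search_products(query: str, max_price: float) -> list[dict]:
    # Stand-in for a server API call or a wrapped client-side workflow.
    catalog = [
        {"title": "USB-C cable", "price": 9.0},
        {"title": "USB-C hub", "price": 35.0},
        {"title": "HDMI cable", "price": 12.0},
    ]
    return [p for p in catalog
            if query.lower() in p["title"].lower() and p["price"] <= max_price]


search_products = Verb(
    name="search_products",
    doc="Return catalog items matching `query` priced at or below `max_price`.",
    run=_search_products,
    preconditions=[lambda a: a["max_price"] > 0],
    postconditions=[lambda a, items: all(p["price"] <= a["max_price"] for p in items)],
    policy_tags=frozenset({"read_only"}),
)

# An agent-synthesized program: explicit data flow instead of clicks and keystrokes.
trace: list[dict] = []
items = search_products(trace, query="usb-c", max_price=20.0)
cheapest = min(items, key=lambda p: p["price"])
```

In this sketch the agent's program is two lines of ordinary code, and `trace` holds one record per verb call that an auditor could inspect afterwards. A real deployment would need a shared schema for verb signatures and trace records, which is the kind of standardization the paper calls for.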