PSBench: Editing Image via GUI Agents in Photoshop
Abstract
Photoshop is professional image-editing software whose complex multi-level menus, fine-grained operations, and layer-based non-destructive editing pose substantial challenges for automated agents. Existing GUI benchmarks and methods primarily target web interfaces and short-horizon, low-complexity tasks, and fall short of modeling the multi-step decision-making and semantic understanding that professional graphics software requires. We introduce PSBench, the first benchmark specifically designed for image editing in Adobe Photoshop. It consists of 600 human-annotated tasks across three difficulty levels, drawn from official tutorials and popular real-world workflows. PSBench covers core functionalities such as canvas adjustment, layer manipulation, and filter application, and provides fine-grained evaluation metrics tailored to each task category. Our experiments show that even the state-of-the-art system, Agent S3, achieves a success rate of only 18.09% on difficult tasks, indicating that GUI agents still face considerable challenges in operating complex professional software. Furthermore, human-in-the-loop evaluations reveal that MLLMs, when serving as interactive assistants, can significantly improve novice users’ task completion rates and reduce operation time.