Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

FRAME: Framework for Robotic Action and Motion Evaluation

Ameya Wagh ⋅ Vishnu Rudrasamudram

Abstract

The rapid emergence of Vision-Language-Action (VLA) models has fundamentally shifted robotics toward end-to-end, generalist architectures capable of complex semantic reasoning. However, the lack of a unified evaluation standard remains a critical bottleneck, as research often relies on disparate metrics that fail to bridge the gap between offline action accuracy and online physical deployment. We introduce FRAME, a comprehensive, open-source evaluation framework built natively on TorchMetrics to facilitate reproducible and scalable robot policy research. FRAME provides a modular taxonomy of 15+ standardized metrics across four critical dimensions: Task Performance, Trajectory Quality, Safety, and Efficiency. Safety metrics (collision rate, obstacle proximity, risk factor) are implemented in the library; their deployment requires contact or proximity sensing signals, which we leave to future hardware-instrumented experiments. We demonstrate the utility of FRAME through an empirical study of four pre-trained policies (Diffusion Policy, VQ-BeT, and ACT) on the PushT manipulation task, and a physical robot evaluation of SmolVLA and $\pi_0$ on the SO-101 manipulator. Our analysis reveals significant discrepancies between traditional success rates and trajectory-level quality, highlighting critical failure modes that binary metrics alone cannot capture. By providing interpretable process metrics and trace-level diagnostics, FRAME enables a more nuanced understanding of robot policy performance and establishes a common language for identifying and mitigating failure modes in embodied AI.
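To illustrate the gap between a binary success rate and a trajectory-level metric, here is a minimal sketch in plain Python. It is not FRAME's API. The class names `SuccessRate` and `MeanPathLength`, the `evaluate` helper, and the two toy trajectories are all hypothetical. The `update`/`compute` split only mirrors the accumulate-then-reduce pattern used by TorchMetrics-style metrics, and path length stands in here for whatever trajectory-quality measures the framework actually defines.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Episode = Tuple[List[Point], bool]  # (end-effector waypoints, reached goal?)


class SuccessRate:
    """Hypothetical binary outcome metric: fraction of successful episodes."""

    def __init__(self) -> None:
        self.successes = 0
        self.episodes = 0

    def update(self, success: bool) -> None:
        # Accumulate one episode outcome.
        self.successes += int(success)
        self.episodes += 1

    def compute(self) -> float:
        return self.successes / self.episodes if self.episodes else 0.0


class MeanPathLength:
    """Hypothetical trajectory metric: mean Euclidean path length per episode."""

    def __init__(self) -> None:
        self.total_length = 0.0
        self.episodes = 0

    def update(self, trajectory: Sequence[Point]) -> None:
        # Sum the distances between consecutive waypoints.
        self.total_length += sum(
            math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])
        )
        self.episodes += 1

    def compute(self) -> float:
        return self.total_length / self.episodes if self.episodes else 0.0


def evaluate(episodes: Sequence[Episode]) -> Tuple[float, float]:
    """Return (success rate, mean path length) over a batch of episodes."""
    success_rate, path_length = SuccessRate(), MeanPathLength()
    for trajectory, success in episodes:
        success_rate.update(success)
        path_length.update(trajectory)
    return success_rate.compute(), path_length.compute()


# Toy data (assumed, not from the paper): both policies reach the goal,
# but one takes a detour on the way.
direct = [([(0.0, 0.0), (3.0, 4.0)], True)]
detour = [([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)], True)]

print(evaluate(direct))
print(evaluate(detour))
```

Under these toy inputs both policies report the same success rate while their path lengths differ, which is the kind of discrepancy a binary metric alone would hide.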