

Spotlight Poster

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk ⋅ Tao Lin ⋅ Joel Becker ⋅ Sami Jawhar ⋅ Neev Parikh ⋅ Thomas Broadley ⋅ Lawrence Chan ⋅ Michael Chen ⋅ Joshua Clymer ⋅ Jai Dhyani ⋅ Elena Ericheva ⋅ Katharyn Garcia ⋅ Brian Goodrich ⋅ Nikola Jurkovic ⋅ Megan Kinniment ⋅ Aron Lajko ⋅ Seraphina Nix ⋅ Lucas Jun Koba Sato ⋅ William Saunders ⋅ Maksym Taran ⋅ Ben West ⋅ Elizabeth Barnes
2025 Spotlight Poster

