Position: AI Evaluation Should Work With Humans
Jan Kulveit ⋅ Gavin Leech ⋅ Tomáš Gavenčiak ⋅ Raymond Douglas
Abstract
We argue that the dominant paradigm of AI evaluation, which focuses on autonomous superhuman performance and therefore implicitly aims at replacing humans, is guiding AI development in the wrong direction. Instead, the AI community should pivot to evaluating the performance of human–AI teams. We argue that this collaborative shift in evaluation will foster AI systems that act as true complements to human capabilities and will therefore lead to far better societal outcomes than the current approach.