Turing

Expo Talk Panel

Improving LLM Benchmarks: Making AI Work for Real-World Needs

Jonathan Siddharth

West Meeting Room 220-222
Mon 14 Jul 8 a.m. PDT — 9 a.m. PDT

Abstract:

To make AI models truly useful in real-world settings, we need better ways to measure their performance. This talk will focus on how we can improve benchmarks, ensuring LLMs are tested in ways that reflect actual business challenges.

Jonathan will discuss how real user feedback and industry-specific examples can be used to build more meaningful tests for AI models. We’ll explore ways to measure performance on practical tasks that require a model to apply its conceptual understanding, complementing the many existing benchmarks that evaluate that understanding in isolation.
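To make the idea of task-based evaluation concrete, here is a minimal sketch of a harness that scores a model on practical, business-style tasks against rubrics derived from user feedback. The task, rubric, and `run_model` callable are illustrative assumptions, not material from the talk.

```python
# Illustrative sketch only: the task list, rubric, and run_model function
# are hypothetical stand-ins, not the speaker's actual benchmark.
from dataclasses import dataclass
from typing import Callable

@dataclass
class BusinessTask:
    prompt: str                   # a real-world task description
    check: Callable[[str], bool]  # acceptance rubric, e.g. derived from user feedback

def evaluate(run_model: Callable[[str], str], tasks: list[BusinessTask]) -> float:
    """Return the fraction of practical tasks the model completes acceptably."""
    passed = sum(1 for task in tasks if task.check(run_model(task.prompt)))
    return passed / len(tasks)

# Example usage with a trivial stand-in model and one hypothetical task.
tasks = [
    BusinessTask(
        prompt="Summarize this support ticket in one sentence: 'App crashes on login.'",
        check=lambda out: "crash" in out.lower() and len(out.split()) < 30,
    ),
]
print(evaluate(lambda prompt: "The app crashes when the user logs in.", tasks))
```

The design choice here is that each task carries its own acceptance check, so pass rates reflect whether outputs are usable for the task rather than how closely they match a fixed reference answer.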

By designing evaluation methods that reflect real-world use, we can help bridge the gap between research and business, making AI more effective and reliable in everyday applications.
