Expo Talk Panel
Improving LLM Benchmarks: Making AI Work for Real-World Needs
Jonathan Siddharth
West Meeting Room 220-222
To make AI models truly useful in real-world settings, we need better ways to measure their performance. This talk will focus on how we can improve benchmarks, ensuring LLMs are tested in ways that reflect actual business challenges.
Jonathan will discuss how real user feedback and industry-specific examples can produce more meaningful tests for AI models. We’ll explore ways to measure AI performance on practical tasks that require applying a model’s conceptual understanding, complementing the many existing benchmarks that evaluate that understanding in isolation.
By designing evaluation methods that reflect real-world use, we can help bridge the gap between research and business, making AI more effective and reliable in everyday applications.