Poster Wed, Jul 8, 2026 • 1:00 AM – 2:45 AM PDT HALL A #4103

Evaluating LLMs When They Do Not Know the Answer: Statistical Evaluation of Mathematical Reasoning via Comparative Signals

Zihan Dong ⋅ Zhixian Zhang ⋅ Yang Zhou ⋅ Can Jin ⋅ Ruijia Wu ⋅ Linjun Zhang

Abstract
