TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation
Kim Yundong ⋅ Heyoung Yang
Abstract
Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. We introduce TRACE (Toulmin-based Reasoning Assessment through Constructive Elements), a metric that analyzes Chain-of-Thought (CoT) reasoning processes. TRACE integrates Toulmin's argumentation theory with Flavell's metacognitive framework to assess reasoning structure. Experiments on 26.3K QA samples across 7 reasoning models show a strong correlation with benchmark accuracy (r = 0.74). Furthermore, TRACE is effective as a reinforcement learning reward signal, outperforming accuracy-only baselines. These results suggest that TRACE serves as a complementary metric for evaluating open-ended outputs.