Is Code Better Than Language for Algorithmic Reasoning?
Terry Tong ⋅ Yu Feng ⋅ Surbhi Goel ⋅ Dan Roth
Abstract
Large language models can solve algorithmic problems either through direct natural language (NL) reasoning or by generating executable code that is delegated to an external solver. However, little progress has been made on understanding why one approach outperforms the other. Comparing NL reasoning and solver-based pipelines directly is ill-posed, because they differ simultaneously in representation space and in execution mechanism. We introduce a three-route framework that makes this comparison tractable by adding an intermediate route: code generation with LLM-based execution. Our empirical analysis across 48 algorithmic tasks and 6 models shows a statistically significant gap of +28.9% in favor of code over NL. A statistical analysis indicates that NL reasoning provides no decision-relevant information beyond what code representations already capture. Consequently, replacing NL traces with code traces incurs minimal performance loss while enabling deterministic execution. A systematic comparison of LLM-based reasoning and external execution further shows that execution, rather than trace representation, is the primary performance bottleneck.
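The logic of the three-route design can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the route names and the two-factor encoding (representation, executor) are hypothetical labels chosen here. It shows that the direct comparison between NL reasoning and the solver pipeline changes both factors at once, whereas each comparison through the intermediate route changes exactly one factor, so any performance difference can be attributed to that factor.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Route:
    """One pipeline: how the reasoning trace is written and who executes it."""
    name: str
    representation: str  # hypothetical encoding: "nl" or "code"
    executor: str        # hypothetical encoding: "llm" or "external"


# Hypothetical labels for the three routes described in the abstract.
NL_REASONING = Route("nl_reasoning", representation="nl", executor="llm")
CODE_LLM_EXEC = Route("code_llm_exec", representation="code", executor="llm")
CODE_SOLVER = Route("code_solver", representation="code", executor="external")


def differing_factors(a: Route, b: Route) -> set:
    """Return the set of factors on which two routes differ."""
    factors = set()
    if a.representation != b.representation:
        factors.add("representation")
    if a.executor != b.executor:
        factors.add("executor")
    return factors


def controlled_pairs(routes):
    """Keep only route pairs that differ in exactly one factor.

    Only these pairs allow a performance gap to be attributed to a single cause.
    """
    pairs = []
    for a, b in combinations(routes, 2):
        diff = differing_factors(a, b)
        if len(diff) == 1:
            pairs.append((a.name, b.name, next(iter(diff))))
    return pairs


if __name__ == "__main__":
    for left, right, factor in controlled_pairs([NL_REASONING, CODE_LLM_EXEC, CODE_SOLVER]):
        print(f"{left} vs {right}: isolates {factor}")
```

Running the script lists two controlled comparisons (NL reasoning vs. code with LLM execution, which isolates representation; code with LLM execution vs. code with an external solver, which isolates the executor). The direct NL-vs-solver pair is filtered out because it confounds both factors.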