LLMs Can Reason Faster Only If We Let Them
Abstract
Large language models (LLMs) are making inroads into classical AI problems such as automated planning, yet key shortcomings continue to hamper their integration. Chain-of-Thought (CoT) prompting struggles with complex multi-step reasoning, and Tree-of-Thoughts requires multiple queries, increasing computational overhead. Recently, Algorithm-of-Thoughts (AoT) has shown promise using in-context examples, at the cost of significantly longer solutions compared to CoT. To bridge the solution-length gap between CoT and AoT, this paper introduces AoT-O3, which combines supervised finetuning on AoT-style plans with a reinforcement learning (RL) framework designed to reduce solution length. The RL component uses a reward model that favors concise, valid solutions while maintaining planning accuracy. Empirical evaluations indicate that AoT-O3 shortens solution length by up to 80\% compared to baseline AoT while maintaining or surpassing prior performance. These findings suggest a promising pathway for more efficient, scalable LLM-based planning.
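The abstract describes a reward model that favors concise, valid solutions. The snippet below is a minimal illustrative sketch of such a length-penalized reward, not the paper's implementation; the validator `is_valid_plan`, the constants, and all names are assumptions introduced only for illustration.

\begin{verbatim}
# Illustrative sketch of a length-penalized planning reward.
# Assumes a plan validator `is_valid_plan(problem, plan_tokens)` is available;
# all names and constants here are hypothetical, not from the paper.

def length_penalized_reward(problem, plan_tokens, is_valid_plan,
                            max_len=2048, length_weight=0.5):
    """Reward valid plans, with an extra bonus for brevity.

    Invalid plans earn zero, so correctness is never traded away;
    valid plans earn a base reward plus a term that grows as the
    solution gets shorter relative to `max_len`.
    """
    if not is_valid_plan(problem, plan_tokens):
        return 0.0  # invalid plans get no reward, preserving planning accuracy
    brevity_bonus = length_weight * (1.0 - min(len(plan_tokens), max_len) / max_len)
    return 1.0 + brevity_bonus
\end{verbatim}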
Lay Summary
Large language models (LLMs) can solve complex problems better when they are guided in smarter ways. This paper introduces a new method called AoT-O3 that helps these models plan more efficiently by rewarding shorter, accurate solutions. The approach cuts the number of steps needed to reach a solution by up to 80\% without sacrificing quality. As a result, it also reduces energy use and makes AI planning more scalable and environmentally friendly.