Entropy-informed Decoding: Adaptive Information-Driven Branching
Benjamin Evans ⋅ Sumitra Ganesh ⋅ Leo Ardon
Abstract
Large language models (LLMs) achieve remarkable generative performance, yet their output quality depends on the decoding strategy. While sampling-based methods (e.g., top-$k$, nucleus) and search-and-select methods (e.g., beam search, best-of-$n$, majority voting) can improve upon greedy decoding, both approaches suffer from limitations: sampling commits to a single path, while search often expends excessive computation regardless of task complexity. We introduce **Entropy-informed DEcodiNg** (EDEN), a plug-and-play, model-agnostic decoding framework that adaptively allocates computation based on the model's own uncertainty, approximating higher-width beam search with *fewer generations required*. At each generation step, EDEN estimates the entropy of the next-token distribution and adjusts the branching factor monotonically with the entropy, expanding more candidates in high-entropy regions and following a greedier path in low-entropy regions, improving sample efficiency. Experiments across complex tasks, including mathematical reasoning, code generation, and scientific question answering, demonstrate that EDEN consistently improves output quality over existing decoding strategies, achieving better trade-offs between accuracy and tokens generated than fixed beam search. By treating next-token selection as a noisy maximisation problem, we prove that branching factors monotone in entropy are guaranteed to find better (i.e., more probable) continuations than any fixed branching factor within the same total computation budget, motivating the dynamic branching.
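The core mechanism described above can be sketched in a few lines: compute the Shannon entropy of the next-token distribution, normalise it by its maximum, and map it monotonically to a branch count. This is an illustrative sketch, not the paper's implementation; the bounds `b_min`/`b_max` and the linear mapping are assumptions for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def branching_factor(probs, b_min=1, b_max=8):
    """Map entropy monotonically to a branch count in [b_min, b_max].

    Entropy is normalised by its maximum possible value, log(vocab_size),
    so a near-deterministic distribution branches little (greedy-like)
    and a near-uniform one branches up to b_max. The linear mapping is
    a hypothetical choice; any monotone map fits the framework.
    """
    h_max = math.log(len(probs))
    frac = entropy(probs) / h_max if h_max > 0 else 0.0
    return b_min + round(frac * (b_max - b_min))

# A peaked (low-entropy) distribution yields few branches; a uniform
# (high-entropy) one yields the maximum.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
```

In a full decoder, `branching_factor` would be called at each step on the model's softmax output to decide how many candidate tokens to expand before pruning back to the beam.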