Poster Mon, Jul 6, 2026 • 10:00 PM – 11:45 PM PDT HALL A #2811

Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space

Houjun Liu ⋅ Shikhar Murty ⋅ Christopher Manning ⋅ Róbert Csordás

Abstract

Current approaches for scaling inference-time compute in transformers train them to emit explicit chain-of-thought tokens before producing an answer. While these methods are powerful, they are limited because they cannot be applied during pretraining and rely solely on serially-generated, natural-language verbalization. In this work, we propose Thoughtbubbles, a transformer variant that natively performs parallel adaptive computation in latent space by learning to fork or delete residual streams. Thus, tokens requiring more computation can form a "bubble" of cloned residuals in the middle of the network. Crucially, this behavior is learned during pretraining with only language modeling loss. Using half of the training budget, Thoughtbubbles outperforms the perplexity and zero-shot evals of both standard decoder LMs and those using non-adaptive parallel computation approaches. These results hold across model sizes from 150M to 1.9B. Thoughtbubbles achieves competitive GSM8K results using half of the baseline's token budget. The implicit nature of our method enables models to begin learning adaptive computation at pretraining time, paving the way to unified train-time and test-time scaling behaviors.
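As a rough illustration of what "forking or deleting residual streams" could look like, the sketch below applies one toy fork/delete step to a list of streams. It is not the authors' implementation: the `Stream` class, the `fork_or_delete` function, the two score lists, and the fixed thresholds are all hypothetical names and choices made for this example. In the method described above the forking behavior is learned during pretraining, whereas here the scores are simply passed in as numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stream:
    token_index: int       # which input token this residual stream belongs to
    residual: List[float]  # the stream's hidden state (toy-sized here)


def fork_or_delete(streams, fork_scores, keep_scores,
                   fork_threshold=0.5, keep_threshold=0.5):
    """Toy fork/delete step (illustrative assumption, not the paper's rule).

    A stream is kept if its keep score clears the threshold, or if it is the
    first stream seen for its token, so every token retains at least one
    stream. A kept stream with a high fork score is followed by a clone.
    """
    out = []
    seen_tokens = set()
    for stream, fork, keep in zip(streams, fork_scores, keep_scores):
        is_first_for_token = stream.token_index not in seen_tokens
        seen_tokens.add(stream.token_index)
        if keep >= keep_threshold or is_first_for_token:
            out.append(stream)
            if fork >= fork_threshold:
                # Clone the residual so the copy can diverge in later layers.
                out.append(Stream(stream.token_index, list(stream.residual)))
    return out


streams = [Stream(0, [1.0, 0.0]), Stream(1, [0.0, 1.0]), Stream(2, [0.5, 0.5])]

# Step 1: token 1 gets a high fork score, so it grows a "bubble" of two streams.
grown = fork_or_delete(streams,
                       fork_scores=[0.1, 0.9, 0.2],
                       keep_scores=[0.9, 0.9, 0.1])
print([s.token_index for s in grown])   # [0, 1, 1, 2]

# Step 2: the clone of token 1 gets a low keep score and is deleted again.
shrunk = fork_or_delete(grown,
                        fork_scores=[0.0, 0.0, 0.0, 0.0],
                        keep_scores=[1.0, 1.0, 0.0, 1.0])
print([s.token_index for s in shrunk])  # [0, 1, 2]
```

The sequence widens only where a token is forked, which is the sense in which extra computation is allocated adaptively per token in this toy setting.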

Lay Summary

Current language models must be taught to think step by step using human-written, step-by-step reasoning instructions. These instructions are difficult to write and hard to produce at scale. We introduce Thoughtbubbles, a new variant of traditional decoder language models that can learn to think adaptively on their own by opening multiple parallel threads of computation. We show that this approach can match the performance of a normal language model even when it is trained on only half the amount of input data. When trained on the same amount of input data, our approach outperforms a normal language model. This "parallel thinking" property means that future implementations of adaptive computation are possible without the need to write step-by-step instructions at post-training time.