

Poster

Large Language Models to Diffusion Finetuning

Edoardo Cetin · Tianyu Zhao · Yujin Tang

East Exhibition Hall A-B #E-2400
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

We propose a new finetuning method that gives pre-trained large language models (LMs) the ability to scale test-time compute through the diffusion framework. By increasing the number of diffusion steps, we show our finetuned models achieve monotonically increasing accuracy, directly translating to improved performance across downstream tasks. Furthermore, our finetuned models can expertly answer questions on specific topics by integrating powerful guidance techniques, and autonomously determine the compute required for a given problem by leveraging adaptive ODE solvers. Our method is applicable to any foundation model pre-trained with cross-entropy and does not modify any of its original weights, fully preserving its strong single-step generation capabilities. We show our method can be more effective than, and is fully compatible with, traditional finetuning and search approaches, introducing an orthogonal new direction to unify the strengths of the autoregressive and diffusion frameworks.
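The abstract does not include code, but the claim that more diffusion steps yield monotonically better results can be illustrated with a toy sketch. The example below is purely hypothetical (the function names, the velocity field, and the scalar latent are illustrative inventions, not the paper's method): it Euler-integrates a simple ODE dx/dt = target - x over the unit interval, and the integration error shrinks as the number of solver steps grows, mimicking how extra diffusion steps buy extra refinement at test time.

```python
import math

def refine(x0, target, num_steps):
    """Euler-integrate dx/dt = target - x over t in [0, 1].

    More steps give a finer integration of the same trajectory,
    standing in for running more diffusion steps at inference.
    """
    x = x0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x += dt * (target - x)
    return x

# Toy scalar "latent": start at 3.0, refine toward a target of 1.0.
x0, target = 3.0, 1.0

# Exact solution of this linear ODE at t = 1, used as the reference.
exact = target + math.exp(-1.0) * (x0 - target)

# Error against the exact trajectory for increasing step budgets.
errors = [abs(refine(x0, target, n) - exact) for n in (1, 2, 8, 32)]
print(errors)  # error shrinks monotonically as the step count grows
```

The monotone decrease here is just the first-order convergence of the Euler solver; in the paper the analogous budget is the number of diffusion steps, and the payoff is accuracy on downstream tasks rather than ODE integration error.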

Lay Summary:

We developed a new method to enhance how large language models (like ChatGPT, Claude, and DeepSeek) answer questions by letting them dynamically increase their processing power at test time. Normally, language models respond instantly or produce longer explanations by formulating their thoughts in plain text. Instead, our technique lets them gradually refine their answers by going beyond the space of language, much like a person reasoning about a problem without saying every word in their thoughts out loud. Using a process called "diffusion," our models steadily improve their accuracy the more computation they use, an amount the user can specify on demand. This means our method could let users allocate more and more thinking time to tough questions until the model returns an appropriate answer. Our models can even decide automatically how many thinking steps are necessary, saving resources on simpler questions. Overall, our approach helps language models provide smarter answers in a new, orthogonal way that gives users a greater level of agency.
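The idea that the model can decide its own step count can be sketched with a minimal adaptive scheme. Everything below is an illustrative assumption, not the paper's adaptive ODE solver: we keep doubling the number of Euler steps on a toy refinement ODE until the answer stops changing by more than a tolerance, so an "easy" input (already close to its target) stops early while a "hard" one spends more compute.

```python
def refine(x0, target, num_steps):
    # Euler-integrate dx/dt = target - x over t in [0, 1].
    x = x0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x += dt * (target - x)
    return x

def adaptive_refine(x0, target, tol=0.01, max_steps=1024):
    """Double the step budget until successive answers agree within tol.

    Returns the refined value and the step budget actually used, so
    easy inputs consume fewer steps than hard ones.
    """
    n = 1
    prev = refine(x0, target, n)
    while n < max_steps:
        n *= 2
        cur = refine(x0, target, n)
        if abs(cur - prev) < tol:
            return cur, n
        prev = cur
    return prev, n

# An "easy" input starts near its target; a "hard" one starts far away.
_, easy_n = adaptive_refine(1.1, 1.0)
_, hard_n = adaptive_refine(3.0, 1.0)
print(easy_n, hard_n)  # the easy question settles with far fewer steps
```

Production adaptive ODE solvers use local error estimates and step-size control rather than this doubling heuristic, but the resource-saving behavior is the same: compute scales with how hard the input is.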
