Llama-Nemotron: Efficient Reasoning Models
Abstract
We introduce the Llama-Nemotron series: open, heterogeneous reasoning models in three sizes (Nano 8B, Super 49B, and Ultra 253B) that combine strong reasoning capability with inference efficiency under a permissive license. Our training pipeline pairs neural architecture search and knowledge distillation with a reasoning-focused post-training stage of supervised fine-tuning and large-scale reinforcement learning. The RL stage uses an exploration-driven curriculum and data-filtering strategies to confront the model with progressively harder reasoning tasks, enabling it to discover and refine problem-solving chains beyond the reach of supervised learning and to surpass its teacher on challenging benchmarks. Ultra achieves higher GPQA accuracy than DeepSeek-R1 and outperforms other open models on key reasoning tasks. Llama-Nemotron models are also the first open-source models to support a dynamic reasoning toggle, letting users switch between standard chat and detailed reasoning modes at inference time. We open-source all data, models, and code to support open research.
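To make the reasoning toggle concrete, the sketch below shows how a system-prompt-based toggle might be invoked with Hugging Face transformers. This is a minimal illustration, not the paper's reference implementation: the checkpoint name and the exact toggle strings ("detailed thinking on/off") are assumptions made for this example.

```python
# Hypothetical sketch of a system-prompt reasoning toggle.
# Assumes a transformers causal LM with a chat template; the model id
# and toggle strings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def generate(question: str, reasoning: bool) -> str:
    # The toggle is expressed as a system message rather than separate
    # weights: the same model serves both chat and reasoning modes.
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # long chain-of-thought
print(generate("What is 17 * 24?", reasoning=False))  # concise direct answer
```

Because the toggle lives in the prompt rather than in the weights, switching modes carries no deployment cost: one set of weights can serve both latency-sensitive chat traffic and reasoning-heavy workloads.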