Llama-Nemotron: Efficient Reasoning Models
Abstract
We introduce the Llama-Nemotron series: open, heterogeneous reasoning models in three sizes (Nano 8B, Super 49B, and Ultra 253B) that combine strong reasoning capability with inference efficiency under a permissive license. Our training pipeline pairs neural architecture search and knowledge distillation with a reasoning-focused post-training stage of supervised fine-tuning and large-scale reinforcement learning. The RL stage uses an exploration-driven curriculum and data-filtering strategies to confront the model with progressively harder reasoning tasks, enabling it to discover and refine problem-solving chains beyond the reach of supervised learning and to surpass its teacher on challenging benchmarks. Ultra achieves higher GPQA accuracy than DeepSeek-R1 and outperforms other open models on key reasoning tasks. Llama-Nemotron models are also the first open-source models to support a dynamic reasoning toggle, letting users switch between standard chat and detailed reasoning modes at inference time. We open-source all data, models, and code to support open research.
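To make the reasoning toggle concrete, the sketch below shows how a system-prompt-based toggle might be invoked with Hugging Face transformers. This is a minimal illustration, not the paper's reference implementation: the checkpoint name and the exact toggle strings ("detailed thinking on/off") are assumptions made for this example.

```python
# Hypothetical sketch of a system-prompt reasoning toggle.
# Assumes a transformers causal LM with a chat template; the model id
# and toggle strings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def generate(question: str, reasoning: bool) -> str:
    # The toggle is expressed as a system message rather than separate
    # weights: the same model serves both chat and reasoning modes.
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # long chain-of-thought
print(generate("What is 17 * 24?", reasoning=False))  # concise direct answer
```

Because the toggle lives in the prompt rather than in the weights, switching modes carries no deployment cost: one set of weights can serve both latency-sensitive chat traffic and reasoning-heavy workloads.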