Privileged Information Distillation for Language Models
Abstract
Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their internal reasoning and expose only action trajectories. This breaks standard distillation pipelines, since successful behavior is observable but the reasoning process is not. We introduce π-Distill, a joint teacher–student framework that trains a PI-conditioned teacher and an unconditioned student simultaneously within a single shared-parameter model, enabling the teacher to learn how to use PI while mitigating distribution shift during transfer. We show that π-Distill effectively distills frontier agents using action-only privileged information, matching or outperforming industry-standard pipelines that assume access to full Chain-of-Thought supervision, across multiple agentic benchmarks, models, and forms of PI. We complement these results with extensive analyses that characterize which factors enable effective learning with PI.
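To make the joint training concrete, the sketch below shows one way a single shared-parameter model could be optimized as both a PI-conditioned teacher and an unconditioned student in the same update. This is a minimal illustration under assumptions, not the paper's implementation: the HuggingFace-style `model(...).logits` interface, the tensors `pi_tokens` and `traj_tokens`, and the mixing weight `alpha` are all hypothetical names introduced here.

```python
# Minimal sketch (assumed, not the paper's code): one joint update in which
# the same parameters serve as the PI-conditioned teacher and the
# unconditioned student.
import torch
import torch.nn.functional as F

def joint_pi_distill_step(model, pi_tokens, traj_tokens, alpha=0.5):
    """model: a causal LM with a HuggingFace-style `.logits` output (assumed).
    pi_tokens: privileged-information prefix, shape (batch, P).
    traj_tokens: action trajectory to imitate, shape (batch, T).
    alpha: hypothetical weight balancing the two roles."""
    vocab = model.config.vocab_size
    targets = traj_tokens[:, 1:].reshape(-1)  # next-token targets, shared by both roles

    # Teacher pass: predict the trajectory *given* the privileged prefix.
    teacher_in = torch.cat([pi_tokens, traj_tokens], dim=1)
    teacher_logits = model(teacher_in).logits[:, pi_tokens.size(1):-1]
    teacher_loss = F.cross_entropy(teacher_logits.reshape(-1, vocab), targets)

    # Student pass: predict the same trajectory with no privileged context.
    student_logits = model(traj_tokens).logits[:, :-1]
    student_loss = F.cross_entropy(student_logits.reshape(-1, vocab), targets)

    # Because the parameters are shared, gradients from both roles arrive in
    # the same step; this coupling is what the abstract credits with
    # mitigating distribution shift when transferring to the PI-free student.
    loss = alpha * teacher_loss + (1.0 - alpha) * student_loss
    loss.backward()
    return loss.item()
```

A simple alternating schedule (teacher batch, then student batch) would also fit the abstract's description; the single blended loss above is just one plausible instantiation of training both roles "simultaneously".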