Oral Tue, Jul 7, 2026 • 6:15 PM – 6:30 PM PDT HALL B2

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Rulin Shao ⋅ Akari Asai ⋅ Shannon Shen ⋅ Hamish Ivison ⋅ Varsha Kishore ⋅ Jingming Zhuo ⋅ Xinran Zhao ⋅ Molly Park ⋅ Samuel Finlayson ⋅ David Sontag ⋅ Tyler Murray ⋅ Sewon Min ⋅ Pradeep Dasigi ⋅ Luca Soldaini ⋅ Faeze Brahman ⋅ Scott Yih ⋅ Sherry Wu ⋅ Luke Zettlemoyer ⋅ Yoon Kim ⋅ Hannaneh Hajishirzi ⋅ Pang Wei Koh

Abstract

Deep research agents perform multi-step research to produce long-form, well-attributed answers. However, most open deep research agents are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards, an approach that does not extend to realistic long-form tasks. We address this with Reinforcement Learning with Evolving Rubrics (RLER), in which rubrics are constructed and maintained to co-evolve with the policy model during training. This allows the rubrics to incorporate newly explored information from search and contrasting model responses, enabling better fact checking and more discriminative on-policy feedback. Using RLER, we develop Deep Research Tulu (DR Tulu-8B), the first fully open model that is directly trained for open-ended, long-form deep research. Across four long-form deep research benchmarks in science, healthcare, and general domains, DR Tulu-8B substantially outperforms existing open deep research agents (by 15.6% over Tongyi DR on average) and matches or exceeds proprietary deep research agents (by 0.7% over OpenAI DR on average), while being significantly smaller and 1000x cheaper per query than OpenAI DR.
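
To make the idea of a rubric that co-evolves with the policy more concrete, the following is a toy sketch and not the authors' implementation. Every name in it (`Criterion`, `score`, `evolve`) is hypothetical, a keyword match stands in for the LLM judge a real system would presumably use, and the rule for retiring saturated criteria is an assumption made for illustration; the abstract only states that rubrics are constructed and maintained during training.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Criterion:
    text: str            # human-readable description of the criterion
    keyword: str         # toy stand-in for a judge: satisfied if the keyword appears
    weight: float = 1.0


def satisfied(c: Criterion, response: str) -> bool:
    # Assumption: a real system would call an LLM judge here, not match keywords.
    return c.keyword.lower() in response.lower()


def score(response: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria that the response satisfies."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    return sum(c.weight for c in rubric if satisfied(c, response)) / total


def evolve(rubric: list[Criterion], responses: list[str],
           candidates: list[Criterion]) -> list[Criterion]:
    """Update the rubric against the current on-policy responses.

    Assumed rules (illustrative only):
      - retire criteria that every response already satisfies (no signal left);
      - add candidate criteria that separate the responses from one another.
    """
    def hits(c: Criterion) -> list[bool]:
        return [satisfied(c, r) for r in responses]

    kept = [c for c in rubric if not all(hits(c))]
    seen = {c.keyword for c in kept}
    added = [c for c in candidates
             if c.keyword not in seen and any(hits(c)) and not all(hits(c))]
    return kept + added


# Two contrasting responses sampled from the current policy (made-up text).
responses = [
    "Aspirin reduces fever and cites a 2020 trial.",
    "Aspirin reduces fever.",
]
rubric = [Criterion("mentions aspirin", "aspirin")]
# Candidate criteria, e.g. proposed after comparing the responses (made up).
candidates = [
    Criterion("cites a trial", "trial"),
    Criterion("mentions fever", "fever"),
]

before = [score(r, rubric) for r in responses]
evolved = evolve(rubric, responses, candidates)
after = [score(r, evolved) for r in responses]
```

In this toy run the initial rubric gives both responses the same score, so it provides no signal for choosing between them. After one `evolve` step the rubric keeps only the criterion that separates them, and the scores differ. This is one simple reading of "more discriminative on-policy feedback".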

Lay Summary

Deep research agents are AI systems that can search for information, read multiple sources, and write detailed answers with citations. They could be useful for tasks like scientific literature review, healthcare research, or answering complex real-world questions. However, most open-source systems today are not directly trained for this kind of long, open-ended research. Instead, they are often trained on short questions with simple answers, which does not capture the difficulty of producing a careful, well-supported research report. In this work, we introduce Deep Research Tulu, a fully open AI model trained specifically for long-form deep research. The key idea is a new training method called Reinforcement Learning with Evolving Rubrics. Instead of using fixed grading rules, our method builds and updates evaluation rubrics during training. These rubrics learn from the model’s own searches and from comparisons between different answers, allowing the model to receive more accurate feedback as it improves. Deep Research Tulu performs strongly across science, healthcare, and general research benchmarks. It outperforms existing open deep research agents, matches or exceeds several proprietary systems, and is much cheaper to run. We also release the model, data, code, and training infrastructure to make future research on open deep research agents easier and more reproducible.