Poster in Workshop: 2nd AI for Math Workshop @ ICML 2025

Omni-Think: Scaling Multi-Task Learning in LLMs via Reinforcement Learning

Derek Li ⋅ Jiaming Zhou ⋅ Amirreza Kazemi ⋅ Qianyi Sun ⋅ Abbas Ghaddar ⋅ Liheng Ma ⋅ Yu Luo ⋅ Dong Li ⋅ Jianye Hao ⋅ Yingxue Zhang


Abstract

The pursuit of general-purpose artificial intelligence demands large language models (LLMs) capable of excelling across diverse tasks, ranging from symbolic reasoning to open-ended generation. However, existing post-training methods, such as Supervised Fine-Tuning (SFT), often fall short in multi-task settings, leading to poor generalization and memorization rather than transferable capabilities. In this work, we introduce Omni-Think, a unified framework that enhances LLM performance across both structured and open-ended tasks. Our approach integrates rule-based verifiable rewards with generative preference signals obtained through LLM-as-a-Judge evaluations, enabling consistent optimization across heterogeneous task types. To better understand the dynamics of multi-task reinforcement learning (RL), we explore different task scheduling strategies and find that introducing tasks in a progression from structured to open-ended leads to better generalization and reduced forgetting. Experiments across four domains reveal that curriculum training improves average relative performance by 5.2% over joint multi-task RL and by 9.1% over merging models trained via RL on individual tasks. These findings highlight the value of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.
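
The two ideas in the abstract, hybrid rewards and a structured-to-open-ended curriculum, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Task` fields, the `openness` ordering key, the exact-match rule, the task names, and the stub judge are all hypothetical and chosen only for illustration. A real setup would replace `stub_judge` with an actual LLM-as-a-Judge call and use the rewards inside an RL training loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str          # hypothetical task label
    verifiable: bool   # True: answer can be checked by a rule
    openness: int      # hypothetical key: lower means more structured


def rule_based_reward(response: str, reference: str) -> float:
    """Assumed rule: exact match after trimming whitespace."""
    return 1.0 if response.strip() == reference.strip() else 0.0


def hybrid_reward(
    task: Task,
    response: str,
    reference: str,
    judge: Callable[[str, str], float],
) -> float:
    """Use the rule for verifiable tasks, the judge score otherwise."""
    if task.verifiable:
        return rule_based_reward(response, reference)
    # Clamp the judge score to [0, 1] so both reward types share one scale.
    return min(1.0, max(0.0, judge(response, reference)))


def curriculum_order(tasks: List[Task]) -> List[Task]:
    """Schedule tasks from most structured to most open-ended."""
    return sorted(tasks, key=lambda t: t.openness)


def stub_judge(response: str, reference: str) -> float:
    """Placeholder for an LLM-as-a-Judge call (assumption, not a real API)."""
    return 0.5


# Hypothetical tasks, not the four domains used in the paper.
tasks = [
    Task("open_ended_writing", verifiable=False, openness=2),
    Task("math", verifiable=True, openness=0),
    Task("question_answering", verifiable=False, openness=1),
]

for task in curriculum_order(tasks):
    print(task.name, hybrid_reward(task, "42", "42", stub_judge))
```

Clamping the judge score keeps the rule-based and judged rewards on the same scale, which is one simple way to keep the optimization signal comparable across task types. The abstract does not say how the authors normalize the two signals.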
