SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
Abstract
Speculative decoding mitigates the memory-bound nature of LLM decoding by using a lightweight draft model to propose multiple tokens for parallel verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source and efficient framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target–draft decoupling, hybrid parallelism, optimized training kernels, and tight integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B compared to the baseline. We further release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs, achieving up to 4.48x end-to-end inference speedup on SGLang and addressing the scarcity of high-quality draft models. Finally, we distill a systematic study of speculative decoding training into practical and actionable recipes to guide real-world adoption.
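To make the propose-then-verify mechanism in the abstract concrete, the following is a minimal toy sketch of a single speculative decoding step. It is not SpecForge's or EAGLE-3's actual implementation: `target_model`, `draft_model`, and `speculative_step` are hypothetical names, tokens are plain integers, and the target's parallel scoring of all drafted positions is simulated with sequential calls.

```python
# Toy illustration of speculative decoding's propose-then-verify loop.
# Both "models" are deterministic next-token functions over integer tokens;
# draft_model is a cheap approximation of target_model that diverges on
# long contexts. All names here are hypothetical.

def target_model(ctx):
    # Expensive "ground truth": next token is the context sum mod 7.
    return sum(ctx) % 7

def draft_model(ctx):
    # Cheap draft: agrees with the target until the context grows long.
    return sum(ctx) % 7 if len(ctx) < 6 else (sum(ctx) + 1) % 7

def speculative_step(ctx, k=4):
    """Draft proposes k tokens; target verifies them.

    In a real engine the target scores all k drafted positions in a
    single parallel forward pass; here that is simulated with k calls.
    Returns the tokens accepted this step (always at least one).
    """
    # 1) Draft autoregressively proposes k tokens (cheap).
    proposed, draft_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_model(draft_ctx)
        proposed.append(t)
        draft_ctx.append(t)

    # 2) Target verifies the proposals; on the first mismatch it
    #    substitutes its own token and stops, so output always matches
    #    what the target alone would have generated greedily.
    accepted, verify_ctx = [], list(ctx)
    for t in proposed:
        expected = target_model(verify_ctx)
        accepted.append(expected)
        verify_ctx.append(expected)
        if expected != t:
            break
    return accepted
```

Because rejected drafts are replaced by the target's own token, the accepted sequence is identical to plain greedy decoding with the target; the speedup comes from verifying several positions per target pass instead of one.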