Poster

Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

Haoyu Zheng ⋅ Yongqiang Zhang ⋅ Fangcheng Fu ⋅ Xiaokai Zhou ⋅ Hao Luo ⋅ Hongchao Zhu ⋅ Yuanyuan Zhu ⋅ Hao Wang ⋅ Xiao Yan ⋅ Jiawei Jiang

Abstract
