

Oral

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Ying Sheng ⋅ Lianmin Zheng ⋅ Binhang Yuan ⋅ Zhuohan Li ⋅ Max Ryabinin ⋅ Beidi Chen ⋅ Percy Liang ⋅ Christopher Re ⋅ Ion Stoica ⋅ Ce Zhang
2023 Oral

