Oral

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Ré · Ion Stoica · Ce Zhang
2023 Oral