Poster

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Ying Sheng ⋅ Lianmin Zheng ⋅ Binhang Yuan ⋅ Zhuohan Li ⋅ Max Ryabinin ⋅ Beidi Chen ⋅ Percy Liang ⋅ Christopher Re ⋅ Ion Stoica ⋅ Ce Zhang
2023 Poster