

Poster

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Harry Dong ⋅ Xinyu Yang ⋅ Zhenyu Zhang ⋅ Zhangyang “Atlas” Wang ⋅ Yuejie Chi ⋅ Beidi Chen
2024 Poster

Abstract
