Poster

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Harry Dong · Xinyu Yang · Zhenyu Zhang · Zhangyang “Atlas” Wang · Yuejie Chi · Beidi Chen
2024 Poster

Abstract
