Poster

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal ⋅ Bilge Acun ⋅ Basil Hosmer ⋅ Mostafa Elhoushi ⋅ Yejin Lee ⋅ Shivaram Venkataraman ⋅ Dimitris Papailiopoulos ⋅ Carole-Jean Wu
2024 Poster

Abstract
