Poster

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal · Bilge Acun · Basil Hosmer · Mostafa Elhoushi · Yejin Lee · Shivaram Venkataraman · Dimitris Papailiopoulos · Carole-Jean Wu
2024 Poster

Abstract
