

Poster

QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jiaming Tang ⋅ Yilong Zhao ⋅ Kan Zhu ⋅ Guangxuan Xiao ⋅ Baris Kasikci ⋅ Song Han
2024 Poster

Abstract
