MInference: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Huiqiang Jiang ⋅ Yucheng Li ⋅ Chengruidong Zhang ⋅ Qianhui Wu ⋅ Xufang Luo ⋅ Surin Ahn ⋅ Zhenhua Han ⋅ Amir Abdi ⋅ Dongsheng Li ⋅ Chin-Yew Lin ⋅ Yuqing Yang ⋅ Lili Qiu

Abstract
