

Poster

FAFO: Lossy KV Cache Compression for Lossless Inference Acceleration via Draftless Fumble Decoding

Hoang Anh Duy Le ⋅ Shaochen (Henry) Zhong ⋅ Yifan Lu ⋅ Yingtong Dou ⋅ Jiayi Yuan ⋅ Yu-Neng Chuang ⋅ Xiran Fan ⋅ Guanchu Wang ⋅ Yuzhong Chen ⋅ Xia Hu

Abstract
