Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

Seongjun Yang ⋅ Gibbeum Lee ⋅ Jaewoong Cho ⋅ Dimitris Papailiopoulos ⋅ Kangwook Lee

