Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
Event URL: https://openreview.net/forum?id=xK9FnwDMZp »
This paper presents Predictive Pipelined Decoding (PPD), a novel approach that speeds up greedy decoding in Large Language Models (LLMs) while producing exactly the same output as standard decoding. Unlike conventional strategies, PPD uses additional compute resources to begin decoding subsequent tokens in parallel while the current token is still being verified. This pipelining reduces decoding latency and reframes the trade-offs inherent in LLM decoding strategies. We develop a theoretical framework for analyzing the trade-off between computation and latency, and use it to estimate the latency reduction our method can achieve as a function of the match rate $p_\text{correct}$, the probability that an early prediction matches the final decoded token. Our results demonstrate that extra computational resources can significantly accelerate LLM greedy decoding.
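To make the compute-latency trade-off concrete, here is a minimal toy model of the pipelining idea, not the authors' implementation. Assume an early prediction is made after a fraction $d$ of each forward pass (of total time $T$), and a correct prediction lets the next token's computation overlap with the remaining $(1-d)T$; the expected per-token latency is then $T\,(1 - p_\text{correct}(1-d))$. In the sketch below, `exit_fraction`, `t_full`, and the overlap model are illustrative assumptions; only $p_\text{correct}$ comes from the abstract.

```python
import random

def ppd_speedup(num_tokens: int, p_correct: float, exit_fraction: float,
                t_full: float = 1.0, seed: int = 0) -> float:
    """Toy latency model of Predictive Pipelined Decoding (illustrative only).

    Baseline greedy decoding spends t_full per token. In this sketch, PPD
    makes an early prediction after `exit_fraction` of the forward pass and
    speculatively starts the next token on a spare worker. With probability
    p_correct the prediction matches the final greedy token, so the
    overlapped tail of the next step, (1 - exit_fraction) * t_full, is
    saved; otherwise the speculative work is discarded and nothing is saved.
    """
    rng = random.Random(seed)
    overlap = (1.0 - exit_fraction) * t_full  # time saved on a match
    latency = 0.0
    for _ in range(num_tokens):
        latency += t_full
        if rng.random() < p_correct:
            latency -= overlap
    return num_tokens * t_full / latency  # speedup over the baseline

# Example: early exit halfway through the forward pass, 40% match rate.
# Closed form: 1 / (1 - 0.4 * 0.5) = 1.25x; the simulation agrees.
print(f"{ppd_speedup(100_000, p_correct=0.4, exit_fraction=0.5):.3f}x")
```

In principle, spare workers could hedge across several candidate early predictions, raising the effective match rate at the cost of more compute; that compute-latency trade-off is what the paper's framework quantifies.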
Author Information
Seongjun Yang (KRAFTON)
Gibbeum Lee (KRAFTON)
Jaewoong Cho (KRAFTON)
Dimitris Papailiopoulos (University of Wisconsin-Madison)
Kangwook Lee (KAIST)
More from the Same Authors
- 2023 : Teaching Arithmetic to Small Transformers »
  Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos
- 2023 : Looped Transformers are Better at Learning Learning Algorithms »
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023 : A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks »
  Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak
- 2023 Poster: Looped Transformers as Programmable Computers »
  Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos
- 2023 Poster: Transformers as Algorithms: Generalization and Stability in In-context Learning »
  Yingcong Li · Muhammed Ildiz · Dimitris Papailiopoulos · Samet Oymak
- 2023 Poster: Improving Fair Training under Correlation Shifts »
  Yuji Roh · Kangwook Lee · Steven Whang · Changho Suh
- 2023 Poster: Optimizing DDPM Sampling with Shortcut Fine-Tuning »
  Ying Fan · Kangwook Lee
- 2022 Poster: GenLabel: Mixup Relabeling using Generative Models »
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2022 Spotlight: GenLabel: Mixup Relabeling using Generative Models »
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2021 : Dreaming of Federated Robustness: Inherent Barriers and Unavoidable Tradeoffs »
  Dimitris Papailiopoulos
- 2020 Poster: Closing the convergence gap of SGD without replacement »
  Shashank Rajput · Anant Gupta · Dimitris Papailiopoulos
- 2019 Workshop: Coding Theory For Large-scale Machine Learning »
  Viveck Cadambe · Pulkit Grover · Dimitris Papailiopoulos · Gauri Joshi
- 2019 Poster: Does Data Augmentation Lead to Positive Margin? »
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2019 Oral: Does Data Augmentation Lead to Positive Margin? »
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2018 Poster: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients »
  Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos
- 2018 Oral: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients »
  Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos
- 2018 Poster: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
  Zachary Charles · Dimitris Papailiopoulos
- 2018 Oral: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
  Zachary Charles · Dimitris Papailiopoulos