TokenDrop: Token-Level Importance-Aware Backward Propagation Skipping for Efficient LLM Fine-Tuning
Beomseok Kim ⋅ Sol Namkung ⋅ Dongsuk Jeon
Abstract
Despite the success of parameter-efficient fine-tuning (PEFT) methods in reducing parameter-related overhead, fine-tuning large language models (LLMs) is still bottlenecked by significant memory and computational demands. In this paper, we propose **TokenDrop**, a token-level importance-aware backpropagation skipping method that reduces activation memory and accelerates LLM fine-tuning by skipping backward computations for less informative tokens. TokenDrop evaluates token importance based on the magnitude of residual updates during the forward pass, enabling lightweight, gradient-free importance estimation. Furthermore, we introduce cumulative token selection to preserve gradient continuity across layers and lazy selection scheduling that defers token selection to facilitate globally informed importance scoring under memory constraints. Across a range of experiments, TokenDrop achieves up to **42.9**\% reduction in memory usage and up to **1.50**$\times$ training speedup, while preserving accuracy and outperforming existing backpropagation-skipping baselines. The code is available at https://anonymous.4open.science/r/tokendrop_official-B469.
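To make the core idea concrete, here is a minimal, illustrative PyTorch sketch of token-level importance scoring and backward skipping as described above. It is not the authors' implementation: the function name `forward_with_token_drop` and the `keep_ratio` parameter are hypothetical, and a real implementation would gather only the kept tokens so that dropped tokens' activations are never stored; this sketch only illustrates the gradient-free scoring and the cut in the backward graph.

```python
import torch

def forward_with_token_drop(block, h, keep_ratio=0.5):
    """Score tokens by the magnitude of their residual update during the
    forward pass and detach low-scoring tokens so the backward pass
    skips them (illustrative sketch, not the official implementation)."""
    out = block(h)                                   # [batch, seq_len, dim]
    # Gradient-free importance: L2 norm of the residual update per token.
    scores = (out - h).detach().norm(dim=-1)         # [batch, seq_len]
    k = max(1, int(keep_ratio * scores.size(1)))
    keep_idx = scores.topk(k, dim=1).indices         # indices of important tokens
    keep_mask = torch.zeros_like(scores, dtype=torch.bool)
    keep_mask.scatter_(1, keep_idx, True)
    keep_mask = keep_mask.unsqueeze(-1)              # [batch, seq_len, 1]
    # Kept tokens stay on the autograd graph; dropped tokens are detached,
    # so no gradient flows through the block for them.
    return torch.where(keep_mask, out, out.detach())
```

Cumulative token selection and lazy selection scheduling, which govern how the kept set evolves across layers and when selection is materialized, are not reflected in this sketch.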