PADA-Coder: Improving Plan-Following Code Generation via Perturbation-Verified Attention Distillation and Dynamic Alignment
Abstract
The Plan-then-Code paradigm effectively enhances Large Language Models (LLMs) on complex code generation by decomposing reasoning into explicit, interpretable steps. However, introducing the plan and verification report substantially enlarges the context, which in turn misdirects the model’s attention toward irrelevant tokens and the most recently generated code. As a result, the model overlooks critical constraints and generates incorrect code, especially for small-scale LLMs (fewer than 8B parameters). To address this issue, we propose \textbf{P}erturbation-Verified \textbf{A}ttention \textbf{D}istillation and Dynamic \textbf{A}lignment (PADA). PADA identifies the tokens most critical to the student model, constructs an optimal attention target matrix from them, and dynamically aligns the student’s focus with these key tokens at each plan step. We evaluate PADA with two teacher models and three student models across seven benchmarks; the results show that PADA improves Pass@1 by up to 16.7\% and outperforms state-of-the-art (SOTA) methods in all settings. Our code is available at https://anonymous.4open.science/r/PADA-coder
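The attention-alignment idea sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch only: it assumes one simple instantiation in which the target attention distribution places uniform mass on perturbation-verified key tokens, and the distillation loss is a KL divergence between that target and the student's attention. The function names and the choice of KL are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def target_attention(seq_len, key_token_ids, eps=1e-8):
    """Hypothetical attention target: uniform mass on key tokens,
    near-zero mass elsewhere (one possible 'optimal target matrix' row)."""
    t = np.full(seq_len, eps)
    t[key_token_ids] = 1.0
    return t / t.sum()

def attention_distillation_loss(student_attn, target):
    """KL(target || student): penalizes student attention mass
    that drifts away from the verified key tokens."""
    s = student_attn / student_attn.sum()
    return float(np.sum(target * (np.log(target) - np.log(s + 1e-12))))

# Toy example: the student attends mostly to the last token (recency bias),
# while the key tokens for this plan step are at positions 1 and 3.
student = np.array([0.05, 0.05, 0.05, 0.05, 0.10, 0.70])
target = target_attention(seq_len=6, key_token_ids=[1, 3])
loss = attention_distillation_loss(student, target)  # large: attention is misaligned
```

Under this sketch, a student whose attention already matches the target incurs near-zero loss, while recency-biased attention is heavily penalized, which is the qualitative behavior the alignment objective is meant to induce.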