The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order
John Sweeney
Abstract
Sequential fine-tuning on multiple datasets is ubiquitous, but the training order of sources can measurably change downstream performance; testing both orders roughly doubles compute. We model a single gradient step on a dataset as a nonlinear operator and show that non-commutativity induces order-dependent effects governed by a commutator (Lie-bracket) term. For two sources $A,B$ and target domain $E$, this yields a directional score $\sigma_{AB}^{(E)} = \langle g_E, H_B g_A - H_A g_B \rangle$ that predicts whether $A \to B$ or $B \to A$ yields lower $L_E$. We evaluate $g_E$ at a reference point capturing the shared drift of both orders (Trotter scoring) and develop a theory-driven $\eta$-autopilot that selects step sizes from pilot data by balancing signal-to-noise against higher-order stability constraints. On four LLMs and a diffusion UNet, our planner achieves 81–94% overall sign accuracy and 82–100% on highest-impact decisions, enabling practical transfer-order planning without manual hyperparameter tuning.
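The directional score defined above can be sketched numerically. The following toy example (hypothetical random data, not the paper's experimental setup; all names `g_A`, `g_B`, `g_E`, `H_A`, `H_B` follow the abstract's notation) computes $\sigma_{AB}^{(E)} = \langle g_E, H_B g_A - H_A g_B \rangle$ and checks its defining antisymmetry: swapping the two sources flips the sign, so exactly one order is favored.

```python
import numpy as np

def directional_score(g_E, H_A, H_B, g_A, g_B):
    """sigma_AB^(E) = <g_E, H_B g_A - H_A g_B>.
    Its sign predicts which training order (A -> B vs. B -> A)
    yields lower target loss L_E; the sign convention here is illustrative."""
    return g_E @ (H_B @ g_A - H_A @ g_B)

rng = np.random.default_rng(0)
d = 5
# Toy gradients for sources A, B and target E (random stand-ins).
g_A, g_B, g_E = rng.normal(size=(3, d))
# Symmetric PSD surrogates for the source Hessians.
M_A, M_B = rng.normal(size=(2, d, d))
H_A, H_B = M_A @ M_A.T, M_B @ M_B.T

s_ab = directional_score(g_E, H_A, H_B, g_A, g_B)
s_ba = directional_score(g_E, H_B, H_A, g_B, g_A)
# Antisymmetry in (A, B): the commutator structure guarantees s_ab == -s_ba.
assert np.isclose(s_ab, -s_ba)
```

Note that in the full method of the abstract, $g_E$ is evaluated at a Trotter-style reference point capturing the shared drift of both orders; the sketch above simply treats all gradients as given vectors.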