d2p: Fast and Scalable Structured Attention with Differentiable Dynamic Programming
Abstract
Dynamic programming (DP) algorithms are central to sequence alignment, parsing, and time-series analysis, yet their non-differentiability has limited their integration into end-to-end learned systems. We present d2p, a unified differentiable framework covering twelve fundamental DP algorithms across the alignment, edit distance, and parsing families. Replacing hard max/min operations with temperature-scaled soft operators turns each algorithm's value into a smooth log-partition function. Our main theoretical contribution is a complete characterization of the mixed second-order derivatives: cross-Jacobians of posterior marginals with respect to algorithm parameters (gap penalties, edit costs, temperature), each with a natural covariance interpretation under the induced Gibbs distribution. Our GPU-accelerated implementations achieve up to 10,000× speedups over naive PyTorch baselines, with full support for torch.compile, automatic mixed precision, and variable-length batching. We demonstrate d2p on protein structure alignment, where neural network encoders (ProteinMPNN, GVP, IPA) reach 0.75 F1, versus 0.51 for substitution matrices over discrete structural alphabets and 0.21 for amino acid substitution matrices alone, showing that differentiable DP enables learning continuous representations that discrete methods cannot capture.
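
To make the abstract's central objects concrete, the sketch below states the temperature-scaled soft operator and the standard log-partition identities it induces. The notation is ours, not the paper's, and it assumes scores linear in the parameters, i.e. s_theta(y) = <theta, phi(y)>:

```latex
% Temperature-scaled soft maximum; recovers the hard max as tau -> 0+.
\[
  \max\nolimits_{\tau}(x_1,\dots,x_k) \;=\; \tau \log \sum_{i=1}^{k} e^{x_i/\tau},
  \qquad
  \lim_{\tau \to 0^{+}} \max\nolimits_{\tau}(x_1,\dots,x_k) \;=\; \max_i x_i .
\]
% Running a DP with this operator yields the smooth log-partition
%   A_tau(theta) = tau * log sum_y exp(s_theta(y) / tau)
% over structures y (alignments, edit scripts, parses). With linear scores:
\[
  \nabla_{\theta} A_{\tau}(\theta) \;=\; \mathbb{E}_{p_{\tau}}\!\left[\phi(y)\right]
  \quad \text{(posterior marginals)},
  \qquad
  \frac{\partial^{2} A_{\tau}}{\partial \theta_i \,\partial \theta_j}
  \;=\; \frac{1}{\tau}\,\mathrm{Cov}_{p_{\tau}}\!\bigl(\phi_i(y),\,\phi_j(y)\bigr),
\]
% where p_tau(y) is proportional to exp(s_theta(y) / tau), the induced
% Gibbs distribution referenced in the abstract.
```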
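A minimal, self-contained code sketch of the same recipe applied to one algorithm, a smoothed Needleman-Wunsch. The function `soft_nw`, its signature, and all names here are illustrative assumptions for exposition, not the d2p API:

```python
# Illustrative sketch only -- not the d2p API. Smoothed Needleman-Wunsch:
# the hard max over predecessor cells is replaced by a temperature-scaled
# log-sum-exp, so the final score is a differentiable log-partition value.
import torch

def soft_nw(scores: torch.Tensor, gap: float = 1.0, tau: float = 1.0) -> torch.Tensor:
    """scores: (n, m) match scores; gap: linear gap penalty; tau: temperature.

    As tau -> 0 the recursion recovers the hard Needleman-Wunsch max.
    """
    n, m = scores.shape
    # Scalar DP table kept as Python lists so autograd tracks every cell.
    V = [[None] * (m + 1) for _ in range(n + 1)]
    V[0][0] = scores.new_zeros(())
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] - gap   # leading gaps along the first row
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] - gap   # leading gaps along the first column
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = torch.stack([
                V[i - 1][j - 1] + scores[i - 1, j - 1],  # (mis)match
                V[i - 1][j] - gap,                       # gap in sequence y
                V[i][j - 1] - gap,                       # gap in sequence x
            ])
            # Temperature-scaled soft max: tau * logsumexp(x / tau).
            V[i][j] = tau * torch.logsumexp(cands / tau, dim=0)
    return V[n][m]

# Gradients of the soft score are posterior match marginals under the
# induced Gibbs distribution; differentiating again (create_graph=True)
# would expose the mixed second-order derivatives the paper characterizes.
S = torch.randn(5, 7, requires_grad=True)
soft_nw(S, gap=0.5, tau=0.5).backward()
marginals = S.grad  # (5, 7) expected alignment matrix
```

This naive nested loop is exactly the kind of baseline the reported speedups are measured against; the point of the sketch is only the soft operator and the marginals-as-gradients reading, not performance.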