HilbertA: Hilbert-Curve–Aligned Sparse Attention for 2D Structured Data
Shaoyi Zheng ⋅ Wenbo Lu ⋅ Yuxuan Xia ⋅ Shenji Wan
Abstract
Designing sparse attention for 2-dimensional image data in diffusion models and vision-language models (VLMs) requires reconciling spatial locality with hardware-efficient execution, a fundamental trade-off that existing methods struggle to resolve. Prior approaches preserve 2D structure through handcrafted sparsity patterns but often incur uncoalesced memory access, limiting practical speedups on modern GPUs. We present HilbertA, a 2D-aware and GPU-efficient sparse attention mechanism, and show that Hilbert curves provide a hardware-aligned inductive bias for sparse attention over 2D data. By reordering image tokens along Hilbert curves, HilbertA preserves local spatial neighborhoods while inducing a contiguous memory layout aligned with efficient GPU execution. To enable global information flow without uncoalesced access, HilbertA further employs a layer-wise sliding schedule, allowing long-range interactions to emerge progressively across depth. In addition, a small central shared region facilitates cross-tile communication and enhances positional awareness. Implemented in Triton, HilbertA achieves substantial acceleration while maintaining or improving model quality across both diffusion models and VLMs. On Flux.1-dev, HilbertA delivers up to $4.17\times$ speedup at $2048\times2048$ resolution with image quality comparable to baselines. On Qwen3-VL-8B, HilbertA achieves over $2.08\times$ attention acceleration and a $1.55\times$ improvement in Time-To-First-Token while maintaining competitive model performance.
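To make the reordering step concrete, the sketch below builds a Hilbert-order permutation for a square token grid and applies it to a raster-order token sequence, so spatial neighbors land (mostly) contiguously in memory. This is a minimal illustration only, not the paper's Triton kernel: the helper names (`hilbert_d2xy`, `hilbert_permutation`) and the assumption of a $2^k \times 2^k$ grid are ours, and the code uses the classic iterative d2xy Hilbert-curve algorithm.

```python
import torch

def hilbert_d2xy(order: int, d: int):
    """Map distance d along a Hilbert curve covering a 2^order x 2^order
    grid to its (x, y) cell (classic iterative d2xy algorithm)."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_permutation(order: int) -> torch.Tensor:
    """Permutation taking Hilbert positions to raster-order token indices."""
    n = 1 << order
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    return torch.tensor([y * n + x for x, y in coords], dtype=torch.long)

# Reorder a (batch, H*W, dim) token sequence so that spatially adjacent
# tokens become (mostly) adjacent in memory before sparse attention.
perm = hilbert_permutation(order=5)          # 32 x 32 = 1024 tokens
tokens = torch.randn(2, 1024, 64)            # raster-order tokens
tokens_hilbert = tokens[:, perm, :]          # Hilbert-order layout
inv = torch.argsort(perm)                    # inverse permutation
assert torch.equal(tokens_hilbert[:, inv, :], tokens)  # round-trips
```

Under this layout, a contiguous chunk of Hilbert-ordered tokens corresponds to a compact 2D region, so block-sparse attention over consecutive chunks can read coalesced memory, which is the hardware-alignment property the abstract attributes to the Hilbert-curve ordering.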