Progressive Cramming: Reliable Token Compression and What It Reveals
Abstract
Token cramming compresses sequences into learned embeddings with near-perfect reconstruction, but prior work used fixed token budgets and 99\% accuracy thresholds, obscuring whether residual errors reflect optimization failures or fundamental limits. We introduce progressive cramming, which grows the target prefix token-by-token and stops only when reconstruction is no longer achievable within a fixed optimization budget. Analyzing the resulting trajectories, we find that optimization paths trace out surprisingly low-dimensional structure in embedding space. Attention analysis shows that compression embeddings often become attention sinks at specific intermediate layers, a behavior that correlates with both optimization difficulty and downstream degradation. On likelihood-based multiple-choice evaluation, prepending a crammed embedding drops accuracy to near chance, even when the original prefix remains in context. These results suggest that perfect reconstruction can arise from brittle steering and attention hijacking rather than from a transferable semantic representation. We therefore position progressive cramming as a tool for studying compression limits, while showing that perfect reconstruction alone is insufficient for meaningful compression.
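For concreteness, the following is a minimal sketch of the progressive cramming loop described above. It assumes a HuggingFace-style causal language model with frozen weights, a single trainable soft-token embedding, and teacher-forced exact-match reconstruction as the stopping criterion; the names (\texttt{progressive\_cram}, \texttt{max\_steps\_per\_token}) are illustrative, not the paper's actual implementation.

\begin{verbatim}
import torch

def progressive_cram(model, tokenizer, text, max_steps_per_token=500, lr=1e-2):
    """Grow the target prefix token-by-token; stop once the crammed
    embedding can no longer drive exact reconstruction within the budget."""
    model.requires_grad_(False)
    model.eval()
    device = model.device
    token_ids = tokenizer(text, return_tensors="pt").input_ids[0].to(device)
    d_model = model.get_input_embeddings().embedding_dim
    # A single trainable "cram" embedding prepended to the sequence
    # (assumption: one soft token; a larger budget works the same way).
    cram = torch.randn(1, 1, d_model, device=device, requires_grad=True)
    opt = torch.optim.Adam([cram], lr=lr)

    longest_reconstructed = 0
    for prefix_len in range(1, len(token_ids) + 1):
        target = token_ids[:prefix_len].unsqueeze(0)            # (1, L)
        solved = False
        for _ in range(max_steps_per_token):
            tok_embeds = model.get_input_embeddings()(target)   # (1, L, d)
            # Teacher forcing: position 0 is the cram embedding and must
            # predict token 1; token i must predict token i+1.
            inputs = torch.cat([cram, tok_embeds[:, :-1]], dim=1)
            logits = model(inputs_embeds=inputs).logits          # (1, L, V)
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Exact reconstruction: greedy argmax must match every token.
            if (logits.argmax(-1) == target).all():
                solved = True
                break
        if not solved:      # optimization budget exhausted: stop growing
            break
        longest_reconstructed = prefix_len
    return cram.detach(), longest_reconstructed
\end{verbatim}

In this sketch, \texttt{longest\_reconstructed} is the quantity a progressive trajectory records: the longest prefix the crammed embedding can still reconstruct exactly before the optimization budget is exhausted.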