Efficient Mismatch-Tolerant Coding for Model-Driven Compression
Abstract
A central insight in lossless data compression is the close connection between probabilistic next-symbol prediction and efficient sequence compression: predictive models can be combined with classical coding techniques to achieve strong compression performance. Applying this approach with powerful modern learned models, such as LLMs, has been shown to achieve markedly better compression than traditional techniques across a wide range of domains. However, significant practical challenges remain. One is model non-determinism, in which a model produces different predictions on different machines despite identical parameters and inputs. Such mismatches between the encoder and decoder can lead to complete decoding failure. Probability Matching Interval Coding (PMATIC) was recently introduced as a drop-in framework for mismatch-robust coding and shown to enable reliable compression and decompression in the presence of bounded prediction mismatch (Adler & Tang, 2026). In this work, we present PMATIC+, a generalization of PMATIC whose design incorporates tight theoretical results and permits more flexible parameter optimization, resulting in substantial improvements in compression efficiency and robustness.
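As a purely illustrative aside, the two ideas above can be sketched in a few lines of Python: the ideal code length implied by a predictive model, and the way a plain interval coder breaks when the decoder's predictions differ from the encoder's. This is a toy binary interval coder with exact rational arithmetic, not PMATIC or PMATIC+. All names (`encode`, `decode`, `encoder_model`, `mismatched_model`) are hypothetical, and the mismatch is exaggerated so the failure shows up on a four-symbol message.

```python
from fractions import Fraction
import math


def ideal_code_length_bits(probs):
    """Ideal code length: sum of -log2 p over the probabilities the model
    assigned to the symbols that actually occurred."""
    return sum(-math.log2(p) for p in probs)


def encode(symbols, model):
    """Narrow [0, 1) once per binary symbol and return a point in the final
    interval. model(i) is the predicted probability that symbol i is 0."""
    low, width = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        p0 = model(i)
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low + width / 2  # midpoint of the final interval


def decode(point, n, model):
    """Replay the same interval narrowing, reading each symbol off the side
    of the split on which the encoded point falls."""
    low, width = Fraction(0), Fraction(1)
    out = []
    for i in range(n):
        p0 = model(i)
        split = low + width * p0
        if point < split:
            out.append(0)
            width *= p0
        else:
            out.append(1)
            low = split
            width *= 1 - p0
    return out


def encoder_model(i):
    # Assumed toy predictor: always P(0) = 1/2.
    return Fraction(1, 2)


def mismatched_model(i):
    # Same predictor as seen by a decoder whose output drifted to 11/20.
    # Real mismatches are far smaller; this one is exaggerated on purpose.
    return Fraction(11, 20)


message = [0, 1, 1, 0]
point = encode(message, encoder_model)

matched = decode(point, len(message), encoder_model)
mismatched = decode(point, len(message), mismatched_model)

print("ideal length (bits):", ideal_code_length_bits([0.5] * len(message)))
print("matched decode:     ", matched)
print("mismatched decode:  ", mismatched)
```

With matching predictions the decoder recovers the message. With the perturbed predictor the encoded point lands on the wrong side of a split partway through, and every later interval is then computed from the wrong state. This is the failure mode that mismatch-robust coding is meant to prevent.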