Poster Wed, Jul 8, 2026 • 10:30 PM – 12:15 AM PDT HALL A #2611

SJD-SV: Speculative Jacobi Decoding with Semantics Verification for Autoregressive Image Generation

Baoquan Zhang ⋅ Bingqi Shan ⋅ Shihao Fang ⋅ Kenghong Lin ⋅ Xutao Li ⋅ Yunming Ye

Abstract

Speculative Jacobi Decoding (SJD) is an important approach for accelerating autoregressive image generation. Although SJD performs well, recent studies point out that it often suffers from token ambiguity during token verification, and the cause has not been well explained. To identify the cause, we conduct a visualization analysis of vision tokens and find that, unlike text tokens, vision tokens generally correspond to local, small, and unclear visual details. A single token therefore struggles to express a specific semantic accurately, which causes the token ambiguity issue. To address this, we propose Speculative Jacobi Decoding with Semantics Verification (SJD-SV) for accelerating autoregressive image generation. The key idea is to leverage the strong correlations between tokens to identify semantic-aware token subsequences, and then to verify at the level of these subsequences rather than token by token. Our method is plug-in and can be directly integrated into existing SJD and its variants. Extensive experiments on various datasets show that existing SJD methods achieve significant performance improvements after integrating SJD-SV.
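The difference between token-level and subsequence-level verification can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how correlations are measured or how a subsequence is scored, so the adjacent-token correlation scores, the grouping threshold, the mean-ratio acceptance rule, and the function names `group_by_correlation` and `verify_segments` are all assumptions made for illustration. SJD-style verification is normally probabilistic; a deterministic threshold is used here only to keep the example reproducible.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open token range [start, end)


def group_by_correlation(corr: Sequence[float], threshold: float) -> List[Segment]:
    """Split len(corr) + 1 draft tokens into contiguous segments.

    corr[i] is a hypothetical correlation score between token i and token i + 1.
    A score below `threshold` closes the current segment (assumed rule).
    """
    n = len(corr) + 1
    segments: List[Segment] = []
    start = 0
    for i, c in enumerate(corr):
        if c < threshold:
            segments.append((start, i + 1))
            start = i + 1
    segments.append((start, n))
    return segments


def verify_segments(
    p_target: Sequence[float],
    p_draft: Sequence[float],
    segments: Sequence[Segment],
    min_ratio: float,
) -> int:
    """Return how many leading draft tokens are accepted.

    Each segment is scored by the mean of min(1, p_target / p_draft) over its
    tokens (assumed aggregation). Verification stops at the first segment
    whose score falls below `min_ratio`.
    """
    accepted = 0
    for start, end in segments:
        ratios = [min(1.0, p_target[i] / p_draft[i]) for i in range(start, end)]
        if sum(ratios) / len(ratios) < min_ratio:
            break
        accepted = end
    return accepted


# Toy inputs (made up): six draft tokens and their adjacent correlations.
corr = [0.9, 0.8, 0.2, 0.7, 0.1]
p_target = [0.5, 0.1, 0.4, 0.3, 0.05, 0.2]
p_draft = [0.5, 0.2, 0.4, 0.3, 0.5, 0.2]

segments = group_by_correlation(corr, threshold=0.5)
per_token = [(i, i + 1) for i in range(len(p_draft))]

accepted_grouped = verify_segments(p_target, p_draft, segments, min_ratio=0.8)
accepted_per_token = verify_segments(p_target, p_draft, per_token, min_ratio=0.8)
```

In this toy input the second token looks unlikely in isolation, so token-by-token checking stops after one token, while the grouped check accepts the whole first three-token segment because its neighbours support it.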

Lay Summary

AI models typically generate images one tiny token at a time. Methods exist to speed up this process, but they often struggle to verify whether the generated pieces are correct. By visualizing how image tokens behave, we found that this happens because a single visual piece is too small and unclear to carry meaning on its own, unlike a distinct word in a written sentence, so these pieces must be evaluated in context. We built a new method called SJD-SV. Instead of forcing the AI to verify ambiguous puzzle pieces one by one, our method identifies the strong connections between them, groups them into larger, recognizable chunks, and verifies each chunk at once. It works as an easy-to-use plug-in for existing image-generation systems and significantly speeds up image creation without losing accuracy.