SJD-SV: Speculative Jacobi Decoding with Semantics Verification for Autoregressive Image Generation
Abstract
Speculative Jacobi Decoding (SJD) is an important approach for accelerating autoregressive image generation. Although SJD has shown superior performance, recent studies point out that it often suffers from a token ambiguity issue during token verification, the cause of which has not been well explained. To identify this cause, we conduct a visualization analysis of vision tokens and find that, unlike text tokens, vision tokens generally correspond to local, small, and unclear visual details. A single token therefore struggles to express a specific semantic accurately, which gives rise to the token ambiguity issue. To address this, we propose a novel Speculative Jacobi Decoding with Semantics Verification (SJD-SV) for accelerating autoregressive image generation. The key idea is to leverage the strong correlations between tokens to identify semantic-aware token subsequences and then, instead of verifying token by token, to perform verification at the level of these subsequences. Our method is plug-and-play and can be directly integrated into existing SJD and its variants. Extensive experiments on various datasets show that existing SJD methods achieve significant performance improvements after integrating SJD-SV.