Autoregression with Self-Token Prediction
Chen Dengsheng ⋅ Yangming Shi ⋅ Enhua Wu
Abstract
Conventional autoregressive models achieve causality through next-token prediction, but they suffer from prohibitive latency and typically underperform non-causal alternatives such as masked prediction and diffusion. We propose self-token prediction, which allows a flexible number of tokens to be predicted per step, and introduce AGARIC, the first spatially causal image generator built on this paradigm. AGARIC delivers markedly faster inference and consistently outperforms prior autoregressive baselines, matching the performance of state-of-the-art non-causal models. Our findings point to self-token prediction as a key step toward unified and efficient multimodal autoregressive modeling.