Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
Yixian Shen ⋅ Zhiheng Yang ⋅ Qi Bi ⋅ Changshuo Wang ⋅ Jia-Hong Huang ⋅ Shuai Wang ⋅ Prayag Tiwari ⋅ George Floros ⋅ Anuj Pathania
Abstract
Multimodal reasoning often relies on long chains of intermediate textual and visual thoughts, where accumulating visual tokens and dense cross-modal attention incur substantial computation and memory overhead. To address this challenge, we propose Spectral-Progressive Thought Flow (*SpecFlow*), a novel lightweight multimodal reasoning framework that represents intermediate visual thoughts in a fixed-size discrete cosine space. By exploiting the strong energy compaction of the discrete cosine transform, *SpecFlow* preserves global layout and relational structure while introducing high-frequency details only when increased spatial precision is required. To align visual state evolution with linguistic intent, classifier-free guidance enables autoregressive textual thoughts to steer flow-based updates of the visual workspace without expanding the context. As a result, *SpecFlow* maintains a bounded visual workspace whose updates depend only on the current visual state and the accumulated textual trace, enabling long-horizon inference with stable latency and memory usage independent of reasoning depth. Empirical results show that *SpecFlow* achieves competitive or superior reasoning performance while reducing computation and memory costs by up to *$2.1\times$*.
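The energy-compaction property the abstract relies on can be illustrated with a minimal NumPy sketch (not the paper's implementation; all names and sizes here are illustrative): a smooth 2D "visual state" is transformed with an orthonormal 2D DCT-II, only a small low-frequency block of coefficients is kept as the fixed-size workspace, and the state is reconstructed with little loss.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    idx = np.arange(n)
    M = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    M[0] *= np.sqrt(1.0 / n)   # DC row
    M[1:] *= np.sqrt(2.0 / n)  # AC rows
    return M

def dct2(x: np.ndarray) -> np.ndarray:
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def idct2(c: np.ndarray) -> np.ndarray:
    return dct_matrix(c.shape[0]).T @ c @ dct_matrix(c.shape[1])

# Smooth synthetic "visual state" (illustrative stand-in for a feature map).
h = w = 32
yy, xx = np.mgrid[0:h, 0:w]
img = np.sin(2 * np.pi * xx / w) + np.cos(2 * np.pi * yy / h)

coef = dct2(img)
k = 8                          # keep an 8x8 low-frequency block: the bounded workspace
trunc = np.zeros_like(coef)
trunc[:k, :k] = coef[:k, :k]
recon = idct2(trunc)

kept_energy = (trunc ** 2).sum() / (coef ** 2).sum()
err = np.abs(recon - img).max()
print(f"fraction of energy kept: {kept_energy:.4f}, max reconstruction error: {err:.3f}")
```

For smooth, low-frequency content, almost all spectral energy falls in the retained block, so the truncated representation stays a fixed 8x8 size regardless of how often it is updated; higher-frequency coefficients would only be reintroduced when finer spatial precision is needed.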