Poster Thu, Jul 9, 2026 • 5:00 PM – 6:45 PM KST Coex: HALL A

Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

Onkar Susladkar ⋅ Tushar Prakash ⋅ Gayatri Deshmukh ⋅ Kiet Nguyen ⋅ Jiaxun Zhang ⋅ Adheesh Juvekar ⋅ Tianshu Bao ⋅ Lin Chai ⋅ Sparsh Mittal ⋅ Inderjit Dhillon ⋅ Ismini Lourentzou

Abstract

We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. UniDFlow decouples understanding and generation through task-specific low-rank adapters, which avoids objective interference and representation entanglement. A novel reference-based multimodal preference alignment method optimizes relative outcomes under identical conditioning, improving faithfulness and controllability without large-scale retraining. UniDFlow achieves state-of-the-art performance across eight benchmarks and generalizes zero-shot to tasks such as inpainting, in-context image generation, reference-based editing, and compositional generation, despite receiving no explicit task-specific training.
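
To illustrate the idea of task-specific low-rank adapters mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a single linear layer whose shared weight stays fixed while each task owns its own low-rank update, as in standard LoRA; the abstract does not state whether the backbone is frozen or where adapters are attached. The class name `TaskLoRALinear`, the helper `matvec`, the task labels, and all dimensions are hypothetical and chosen only for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class TaskLoRALinear:
    """Hypothetical layer: shared weight W plus one low-rank update B @ A per task."""

    def __init__(self, d_in, d_out, rank, tasks, seed=0):
        rng = random.Random(seed)
        # Shared base weight (assumed fixed here, as in standard LoRA).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        self.adapters = {}
        for task in tasks:
            # A projects down to `rank` dimensions; B projects back up.
            A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
            # B starts at zero, so every adapter is initially a no-op.
            B = [[0.0] * rank for _ in range(d_out)]
            self.adapters[task] = (A, B)

    def forward(self, x, task):
        # Only the adapter belonging to `task` contributes to the output.
        A, B = self.adapters[task]
        base = matvec(self.W, x)
        delta = matvec(B, matvec(A, x))
        return [b + d for b, d in zip(base, delta)]


layer = TaskLoRALinear(d_in=4, d_out=3, rank=2, tasks=["understanding", "generation"])
x = [1.0, 2.0, 3.0, 4.0]
y_understanding = layer.forward(x, "understanding")
y_generation = layer.forward(x, "generation")
```

Because each task reads and writes only its own `(A, B)` pair, changing the generation adapter in this sketch leaves the understanding path untouched, which is the kind of separation the abstract attributes to the adapters.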