FlowMAP: Flow Matching for Generalizable Agent Planning
Abstract
Agent planning faces dynamic heterogeneity: nonstationary observations, dynamics, and objectives, coupled with sparse, delayed rewards. Dominant methods largely ignore this heterogeneity and consequently generalize poorly under environment shifts. We propose Flow Matching for Agent Planning (FlowMAP), which formulates planning as continuous-time flow matching: a planning-time velocity field is learned to transport an initial meta-state distribution toward a task-conditioned target. FlowMAP introduces Value-Transport Flow Matching, a distribution-level planning objective that steers transport toward high-value regions of the meta-state distribution, mitigating error accumulation under environment shifts. To align meta-state distribution transport with action-environment interaction, FlowMAP further employs Flow-Policy Co-Training, which jointly optimizes the planning flow and the policy so that the flow transport directly regularizes the policy-induced meta-distribution dynamics. Across diverse agent planning benchmarks, FlowMAP consistently outperforms strong baselines, yielding clear improvements in planning generalization.
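To make the transport objective concrete, below is a minimal sketch of a value-weighted conditional flow-matching loss in PyTorch. It assumes Euclidean meta-states and a straight-line interpolation path between the initial and target distributions; the `VelocityField` architecture, the `value_fn` head, and the batch-softmax value weighting are illustrative stand-ins, not the paper's actual model or objective.

```python
# A minimal sketch of value-weighted conditional flow matching over meta-states.
# All module names and the weighting scheme are hypothetical, chosen for clarity.
import torch
import torch.nn as nn


class VelocityField(nn.Module):
    """Time-conditioned velocity field v_theta(x, t) over meta-states."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate the state with the scalar time before the MLP.
        return self.net(torch.cat([x, t], dim=-1))


def value_weighted_fm_loss(v_theta, x0, x1, value_fn):
    """Conditional flow-matching loss along the straight path
    x_t = (1 - t) * x0 + t * x1, with per-sample weights derived from a
    task value estimate (value_fn is a hypothetical value head)."""
    t = torch.rand(x0.size(0), 1)            # t ~ U[0, 1], one per sample
    xt = (1.0 - t) * x0 + t * x1             # linear interpolant
    target_v = x1 - x0                       # velocity of the straight path
    pred_v = v_theta(xt, t)
    # Weight transport toward high-value target meta-states
    # (softmax over the batch; one of several possible weightings).
    w = torch.softmax(value_fn(x1).squeeze(-1), dim=0)
    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)
    return (w * per_sample).sum()


# Usage: regress the field on pairs (x0 from the initial meta-state
# distribution, x1 from the task-conditioned target distribution).
dim = 16
v = VelocityField(dim)
value_fn = nn.Linear(dim, 1)                 # stand-in value head
x0, x1 = torch.randn(64, dim), torch.randn(64, dim)
loss = value_weighted_fm_loss(v, x0, x1, value_fn)
loss.backward()
```

Under this reading, the value weighting biases the regression toward transport directions that end in high-value meta-states; co-training with the policy would additionally backpropagate the policy-induced distribution through the same loss, which the sketch above does not attempt.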