LangPrecip: Language-Aware Multimodal Precipitation Nowcasting
Abstract
Short-term precipitation nowcasting is inherently under-constrained because the historical observation window is limited: identical observations can lead to multiple plausible future trajectories, especially for extreme events. Existing generative methods rely solely on visual features and lack explicit constraints on precipitation motion semantics, resulting in ambiguous dynamics, blurred details, and unstable predictions. We propose LangPrecip, the first language-guided precipitation nowcasting framework, and contribute LangPrecip-160K, a large-scale radar-text paired dataset with 160K annotated sequences. LangPrecip addresses the under-constrained problem in two ways: it uses natural-language motion descriptions as explicit semantic constraints to reduce motion ambiguity, and it introduces a dual-path wavelet consistency unfolding decoder that enforces physical data fidelity during latent-to-pixel reconstruction. By reformulating nowcasting as semantically constrained trajectory generation under the Rectified Flow paradigm with model-based decoder optimization, LangPrecip produces sharper and more physically consistent forecasts. Experiments on the Swedish and MRMS benchmarks demonstrate substantial improvements over state-of-the-art vision-only methods, with relative gains of over 60\% and 19\%, respectively, in heavy-rainfall CSI at an 80-minute lead time, along with better preservation of spatial detail.
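For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Critical Success Index (CSI) and a relative gain are commonly computed. It uses the standard definition CSI = hits / (hits + misses + false alarms). The function names `csi` and `relative_gain`, the threshold of 10, and the toy grids are hypothetical choices for illustration only; the abstract does not state the heavy-rainfall threshold or the evaluation code used.

```python
import math


def csi(pred, obs, threshold):
    """Critical Success Index over two equally shaped 2D grids.

    A grid cell counts as an event when its value is >= threshold.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    hits = misses = false_alarms = 0
    for pred_row, obs_row in zip(pred, obs):
        for p, o in zip(pred_row, obs_row):
            predicted = p >= threshold
            observed = o >= threshold
            if predicted and observed:
                hits += 1
            elif observed:
                misses += 1
            elif predicted:
                false_alarms += 1
    total = hits + misses + false_alarms
    # With no observed or predicted events the score is undefined.
    return hits / total if total else math.nan


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score (assumed definition)."""
    return (new_score - baseline_score) / baseline_score


# Toy 2x3 rain-rate grids (hypothetical values).
observed = [[0, 12, 20], [0, 0, 15]]
forecast = [[0, 11, 5], [13, 0, 16]]

score = csi(forecast, observed, threshold=10)
print(f"CSI = {score:.2f}")
print(f"relative gain = {relative_gain(0.32, 0.20):.0%}")
```

In the toy grids, two cells are hits, one is a miss, and one is a false alarm, so the CSI is 2 / 4. Under the assumed definition of relative gain, a CSI rising from 0.20 to 0.32 would be a 60\% relative gain.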