Solving Spatial-Spectral Fusion with Latent Spectral Operators
Abstract
Existing deep spatial–spectral fusion (SSF) methods typically learn the fusion mapping in the coordinate domain using convolution and attention. This makes them hard to scale across varying spatial resolutions and gives limited control over the frequency content of the reconstructions, which can in turn cause severe spectral distortion. In this work, we propose Latent Spectral Operators (LSO), an SSF framework that learns fusion mappings between spectral functions through a structured operator parameterization. Specifically, LSO first applies a cross-attention projection, in which learned latent tokens serve as spectral prompts, to compress high-dimensional observations into a compact latent representation, and then adopts a hierarchical, patch-based architecture to integrate rich multi-scale cues. Furthermore, to parameterize the latent fusion operator in a controllable manner, we introduce a Trigonometric Basis Solver, which represents the mapping as a trigonometric basis expansion. This formulation naturally supports multi-frequency modeling, with a capacity–stability trade-off governed by the number of basis functions. Extensive experiments on the CAVE and Harvard benchmarks demonstrate that LSO consistently achieves state-of-the-art performance and transfers well across different spatial scales. Code is attached.
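To make the idea of a trigonometric basis expansion concrete, the following is a minimal sketch and not the Trigonometric Basis Solver itself. The abstract does not specify the basis or how coefficients are learned, so the sine/cosine form with integer frequencies, the function names `trig_basis` and `apply_trig_operator`, and the hand-set weights are all illustrative assumptions. The parameter `num_freqs` plays the role of the number of basis functions that governs the capacity–stability trade-off.

```python
import math


def trig_basis(x, num_freqs):
    """Return [1, sin(x), cos(x), ..., sin(Kx), cos(Kx)] for a scalar x.

    Assumption: integer frequencies k = 1..K; the paper's basis may differ.
    """
    feats = [1.0]
    for k in range(1, num_freqs + 1):
        feats.append(math.sin(k * x))
        feats.append(math.cos(k * x))
    return feats


def apply_trig_operator(latent, weights, num_freqs):
    """Map each latent value through a weighted trigonometric expansion.

    latent:  list of floats (a hypothetical single latent channel)
    weights: list of 2 * num_freqs + 1 coefficients (learned in practice,
             set by hand here for illustration)
    """
    if len(weights) != 2 * num_freqs + 1:
        raise ValueError("expected 2 * num_freqs + 1 weights")
    out = []
    for z in latent:
        feats = trig_basis(z, num_freqs)
        out.append(sum(w * f for w, f in zip(weights, feats)))
    return out
```

With `num_freqs = K` the expansion has `2K + 1` coefficients per mapped value, so a larger `K` can represent higher-frequency content at the cost of more parameters to fit.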