Must All Negatives Be Pushed Away Equally? Uncertainty-Aware Cross-View Geo-Localization via Normal Inverse Gamma Distribution
Songsong Ouyang ⋅ Le Wu ⋅ Yingying Zhu
Abstract
Cross-view geo-localization (CVGL) aims to retrieve the satellite image corresponding to a given street-view query and is critical for autonomous navigation. Although recent methods perform well on benchmarks, they often fail to generalize to unseen environments. A key limitation is their reliance on contrastive learning, which assigns the same label to every negative sample and induces similarity-amplified repulsion. But should all negatives be treated equally? In CVGL, semi-positive samples that are geographically close to the positive often share important semantic cues. Treating them as ordinary negatives forces the model to overfit to noise, causing generalization to collapse. To address this issue, we propose an uncertainty-aware framework grounded in Deep Evidential Regression (DER) that models the Normal-Inverse-Gamma (NIG) distribution as a conjugate prior to quantify environmental complexity $u$ in a single forward pass. The estimated $u$ adaptively softens the labels of hard negatives in Soft InfoNCE, mitigating excessive repulsion of semi-positive samples. An Uncertainty Head with cls-to-spatial cross-attention and attention statistics is designed to fit the NIG distribution accurately. Extensive experiments demonstrate state-of-the-art performance, including an average 18\% R@1 improvement in zero-shot cross-dataset transfer, bridging the critical gap between laboratory benchmarks and robust real-world deployment.
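To make the idea of uncertainty-softened labels concrete, the following is a minimal NumPy sketch, not the authors' implementation. The uncertainty formulas are the standard ones from Deep Evidential Regression for an NIG with parameters $(\gamma, \nu, \alpha, \beta)$. Everything else is an assumption made for illustration: the function names, the `max_soft` and `top_k` parameters, the choice to move label mass only onto the most similar negatives, and the premise that $u$ has already been scaled to $[0, 1]$.

```python
import numpy as np

def nig_uncertainty(nu, alpha, beta):
    """Standard DER uncertainties of an NIG distribution (requires alpha > 1)."""
    aleatoric = beta / (alpha - 1.0)          # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]
    return aleatoric, epistemic

def soft_targets(sim, u, max_soft=0.3, top_k=1):
    """Hypothetical label softening: shift u_i * max_soft of the label mass
    from the positive (diagonal) onto the top_k most similar negatives."""
    sim = np.asarray(sim, dtype=float)
    b = sim.shape[0]
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)                 # exclude the positive
    hard = np.argsort(-masked, axis=1)[:, :top_k]     # hardest negatives per row
    mass = np.clip(np.asarray(u, dtype=float), 0.0, 1.0) * max_soft
    targets = np.zeros_like(sim)
    targets[np.arange(b), np.arange(b)] = 1.0 - mass
    np.put_along_axis(targets, hard, (mass / top_k)[:, None], axis=1)
    return targets

def soft_info_nce(sim, targets, temperature=0.07):
    """Cross-entropy between soft targets and softmax(sim / temperature)."""
    logits = np.asarray(sim, dtype=float) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_prob).sum(axis=1).mean())

# Toy batch: rows are street-view queries, columns are satellite images.
sim = np.array([[0.9, 0.8, 0.1],
                [0.2, 0.9, 0.3],
                [0.1, 0.4, 0.9]])
u = np.array([1.0, 0.0, 0.5])   # assumed to be already scaled to [0, 1]
targets = soft_targets(sim, u)
loss = soft_info_nce(sim, targets)
```

With $u_i = 0$ the target row reduces to the one-hot label of standard InfoNCE, so in this sketch only queries judged complex have their hardest negatives repelled less strongly.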