When Random Saliency Looks Trained: Architectural Center Bias in CNN Interpretability
Abstract
Saliency maps are widely used to interpret image classification models and build trust in their predictions; however, their reliability remains a central concern, as randomized networks can produce saliency maps that closely resemble those of trained models. We identify a previously underappreciated architectural contributor to this phenomenon: a center-focused saliency bias induced by common convolutional design choices. Through controlled ablations, we show that this bias arises from architectural components such as zero padding and receptive field growth, and persists even in randomly initialized convolutional neural networks (CNNs) and under randomized inputs. In contrast, this behavior is largely absent in non-convolutional architectures such as Vision Transformers (ViTs) and multilayer perceptrons (MLPs). To investigate the interaction between architectural priors and learning, we introduce a corner-shift benchmark and a Center-Shift Index that quantify how saliency redistributes under object relocation. We show that training partially shifts saliency toward object regions, whereas randomized models remain dominated by architectural priors; this contrast helps explain the previously observed similarity between trained and random saliency maps and clarifies how architectural priors can confound standard saliency evaluations.
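As a rough illustration of the kind of measurement the abstract refers to, the sketch below scores how much saliency mass sits in a central window before and after object relocation. The function names, the window fraction, and the difference-based score are assumptions made for illustration only; they are not the paper's definition of the Center-Shift Index or the corner-shift benchmark.

```python
import numpy as np

def center_mass_fraction(saliency: np.ndarray, frac: float = 0.5) -> float:
    """Fraction of total saliency mass inside a central window covering
    `frac` of each spatial dimension (hypothetical center-bias measure)."""
    h, w = saliency.shape
    ch, cw = int(h * frac), int(w * frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    center = saliency[top:top + ch, left:left + cw]
    total = saliency.sum()
    return float(center.sum() / total) if total > 0 else 0.0

def center_shift_score(saliency_centered: np.ndarray,
                       saliency_corner: np.ndarray) -> float:
    """Hypothetical score: drop in centered saliency mass when the object
    is relocated from the image center to a corner. A model dominated by an
    architectural center prior would keep this difference near zero."""
    return (center_mass_fraction(saliency_centered)
            - center_mass_fraction(saliency_corner))
```

Under these assumptions, a trained model that follows the object should show a clearly positive score on a corner-shifted input, while a randomly initialized CNN with a strong center prior should not.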