Position: Evidence and Implications of Texture Bias in Deep Neural Networks
Abstract
Whether deep vision models recognize objects primarily by shape or by texture remains a central and unresolved question in computer vision. Early studies report a strong texture bias in convolutional neural networks (CNNs), while other work reports shape-biased representations. We argue that much of this apparent discrepancy reflects methodological confounds and a conflation of local contour sensitivity with genuine global shape understanding. Using minimal, tightly controlled stimuli, we directly compare cue-conflict and cue-suppression paradigms within a unified experimental framework. We show that standard CNNs consistently prioritize texture over global shape when the two cues compete, even when shape information is explicitly available. Apparent evidence for shape bias typically reflects reliance on local fragments rather than invariant, relational representations of object structure. Our findings support the view that texture bias is fundamentally rooted in architectural inductive biases rather than in data or optimization alone. This representational gap has direct consequences for robustness, safety, and generalization, and motivates the development of architectures that explicitly support global integration and relational reasoning, moving beyond incremental data-driven fixes.