AURA: Visually Interpretable Affective Understanding via Robust Archetypes
Abstract
Interpretable methods such as Vision–Language Models (VLMs) have advanced affect analysis by aligning images with textual descriptions. However, relying on text as an intermediate proxy faces critical limitations: linguistic templates are inherently discrete, making them fundamentally incompatible with continuous Valence–Arousal (VA) regression, while also acting as a bottleneck for fine-grained visual nuance. Cognitive psychology suggests that human affective perception is not mediated by linguistic translation but is grounded in direct perceptual resemblance to internalized Visual Archetypes. Motivated by this, we propose AURA, an archetype framework that replaces the brittle linguistic proxy with a self-organizing archetype manifold. By adaptively allocating representational density based on data complexity, AURA enables precise continuous regression and reshapes affective taxonomies, decomposing coarse labels into interpretable, geometrically coherent visual primitives. This paradigm offers a transparent, perceptually grounded decision trail and achieves state-of-the-art performance across discrete and continuous affect tasks.