Identifying Latent Concepts and Structures for Generalized Category Discovery
Abstract
Generalized Category Discovery (GCD) aims to recognize known classes while autonomously discovering novel ones in open-world settings. However, current approaches primarily focus on designing clustering objectives, often overlooking a critical bottleneck: standard vision backbones yield high-rank, entangled token representations that are ill-suited for unsupervised discovery of latent concepts and structures. In this paper, we propose Compositional Primitive Fields (CPF-GCD), a novel representation learning framework that reshapes the feature space to make such latent structure identifiable by enforcing a low-rank compositional organization. Our core hypothesis is that all categories (known/novel) reside on a latent manifold spanned by a finite set of learnable visual primitives that capture reusable concepts. CPF instantiates this geometric constraint via a spatial field mechanism. Inserted between the backbone and the head, it projects noisy patch tokens onto this structured manifold, effectively decomposing images into reusable atomic parts and their spatial layouts. By explicitly modeling the spatial distribution of primitives, CPF enables novel categories to emerge naturally as new activation patterns over a shared vocabulary. Extensive experiments demonstrate that CPF serves as a generic, plug-and-play module that consistently boosts performance across diverse GCD baselines, validating that identifying and leveraging low-rank compositional structure is a crucial inductive bias for open-world recognition.