Poster Wed, Jul 8, 2026 • 6:30 PM – 8:15 PM PDT HALL A #4200

GRPO-based Cluster Decision Agent for Unknown-$\boldsymbol{K}$ Multi-view Clustering

Xuqian Xue ⋅ Jun Zhang ⋅ Qi Cai ⋅ Zhizhong Huang ⋅ Hongming Shan ⋅ Junping Zhang

Abstract

Existing contrastive multi-view clustering methods rely on a pre-defined number of clusters, which limits their flexibility in real-world scenarios where this prior knowledge is unavailable. To address this, we propose GROK, a novel framework driven by a cluster decision agent for unknown-$K$ multi-view clustering. It pioneers the adaptation of group relative policy optimization (GRPO), a reinforcement learning strategy for LLM reasoning, to the unsupervised domain in order to autonomously determine the optimal $K$. Specifically, the agent orchestrates the clustering process through three synergistic phases. First, in the state perception phase, we employ a structure-aware adaptive backbone to aggregate multi-view data, providing the agent with consistent and discriminative consensus observations. Second, in the group decision phase, we introduce a divide-and-conquer strategy over the action space and an adaptive reward function. Equipped with these mechanisms, the agent performs group sampling and relative advantage estimation within the discrete action space of candidate $K$ values, autonomously searching for the optimal $K$ via reward maximization. Finally, in the geometric feedback phase, a geometric clustering guidance mechanism transforms the agent's structural hypotheses into explicit differentiable constraints that reshape the feature manifolds, thereby closing the perception-decision-feedback loop. Experimental results demonstrate that GROK achieves superior clustering performance in unknown-$K$ scenarios by autonomously exploring the underlying cluster structure.
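
To make the group decision phase more concrete, the following is a minimal sketch of group sampling and relative advantage estimation over a discrete set of candidate $K$ values. It is not the authors' implementation: the abstract does not specify the policy parameterization, the adaptive reward, or the divide-and-conquer strategy, so the softmax policy over logits, the function names (`group_relative_advantages`, `grpo_step`), the toy reward, and the omission of GRPO's clipping and KL terms are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (mean 0, unit scale)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def grpo_step(logits, reward_fn, group_size, lr, rng):
    """One simplified policy-gradient step with group-relative advantages.

    Assumption: a plain softmax policy over candidate indices, with no
    clipping or KL regularization. `reward_fn` maps a candidate index to a
    scalar reward and is a hypothetical stand-in for the paper's reward.
    """
    probs = softmax(logits)
    # Group sampling: draw a group of candidate indices from the policy.
    actions = rng.choice(len(logits), size=group_size, p=probs)
    rewards = np.array([reward_fn(a) for a in actions], dtype=float)
    adv = group_relative_advantages(rewards)

    grad = np.zeros_like(logits, dtype=float)
    for a, a_adv in zip(actions, adv):
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        # Gradient of log pi(a) with respect to the logits is one_hot - probs.
        grad += a_adv * (one_hot - probs)
    return logits + lr * grad / group_size


if __name__ == "__main__":
    candidate_ks = np.arange(2, 9)  # hypothetical candidate K values 2..8

    def toy_reward(index):
        # Hypothetical reward peaked at K = 4; a real reward would score
        # the clustering obtained with candidate_ks[index].
        return -abs(int(candidate_ks[index]) - 4)

    rng = np.random.default_rng(0)
    logits = np.zeros(len(candidate_ks))
    for _ in range(300):
        logits = grpo_step(logits, toy_reward, group_size=16, lr=0.5, rng=rng)
    print("selected K:", candidate_ks[int(np.argmax(logits))])
```

Because advantages are standardized within each group, no separate value network is needed: a candidate $K$ is reinforced only to the extent that it outperforms the other candidates sampled alongside it. When every sample in a group receives the same reward, the advantages are all zero and the policy is left unchanged.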