Learning What to Generate: A Reinforcement Learning-based Closed-Loop Augmentation Framework for Person Re-identification
Abstract
Person re-identification (ReID) models are sensitive to long-tail nuisances (e.g., rare viewpoints, occlusions, and complex backgrounds), yet current generative augmentation is largely open-loop: prompts and conditions are sampled heuristically, without verifying that the synthesized samples actually improve ReID discriminability. We introduce ReasonAug, a closed-loop framework that learns an image-conditioned instruction policy for a frozen generator, recasting augmentation as a sequential decision problem over instruction tokens. A Semantic Reasoning Agent (SRA) performs hierarchical planning from global semantics down to identity-critical local cues, producing structured edit instructions whose utility is verified by downstream ReID feedback. To make closed-loop optimization reliable, we propose two components: Metric-Aligned Gated Reward (MAGR), which converts metric-learning objectives into a dense reward while gating the task-shaping term on identity preservation to prevent reward hacking, and Structure-Aware Entropy (SAE), which allocates exploration per token so as to lock identity-critical cues while diversifying nuisance factors. Experiments on Market-1501 and MSMT17 demonstrate state-of-the-art performance, confirming that closing the augmentation loop and learning what to generate yield more discriminative training data than open-loop alternatives.
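The gated-reward and per-token entropy ideas summarized above can be illustrated with a minimal sketch. All function names, thresholds, and coefficients below (`magr_reward`, `sae_entropy_bonus`, `tau`, `beta_id`, `beta_nuisance`) are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def magr_reward(metric_gain: float, id_similarity: float, tau: float = 0.8) -> float:
    """Sketch of a Metric-Aligned Gated Reward (assumed form).

    metric_gain: dense task-shaping signal derived from a metric-learning
        objective (e.g., improvement in triplet margin on the ReID model).
    id_similarity: identity-preservation score between the edited image and
        the original (e.g., cosine similarity of ID embeddings).
    The shaping term only counts when identity is preserved
    (id_similarity >= tau), which blocks reward hacking via
    identity-destroying edits.
    """
    gate = 1.0 if id_similarity >= tau else 0.0
    return gate * metric_gain

def sae_entropy_bonus(token_probs: np.ndarray,
                      is_identity_critical: np.ndarray,
                      beta_id: float = 0.01,
                      beta_nuisance: float = 0.1) -> float:
    """Sketch of Structure-Aware Entropy (assumed form).

    token_probs: (T, V) policy distributions over the instruction vocabulary.
    is_identity_critical: (T,) boolean mask over instruction-token slots.
    Identity-critical slots get a small entropy coefficient (exploration is
    suppressed, locking those cues); nuisance slots get a large coefficient
    (exploration is encouraged, diversifying viewpoint/background/etc.).
    """
    entropy = -np.sum(token_probs * np.log(token_probs + 1e-12), axis=-1)  # (T,)
    beta = np.where(is_identity_critical, beta_id, beta_nuisance)          # (T,)
    return float(np.sum(beta * entropy))

# Example: the reward is zeroed when identity preservation fails the gate,
# and nuisance tokens contribute more entropy bonus than identity tokens.
probs = np.full((2, 4), 0.25)                       # two uniform token slots
mask = np.array([True, False])                      # slot 0 is identity-critical
r_ok = magr_reward(metric_gain=0.5, id_similarity=0.9)    # passes the gate
r_bad = magr_reward(metric_gain=0.5, id_similarity=0.5)   # gated to zero
bonus = sae_entropy_bonus(probs, mask)
```

In an RL training loop, `magr_reward` would be the per-episode return for the instruction policy and `sae_entropy_bonus` would be added to the policy-gradient objective; both are shown here only to make the gating and per-token allocation concrete.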