PMSPO: Progressive Matching and Semantic-Aware Policy Optimization for Camouflaged Object Detection
Abstract
Reinforcement learning-based Multimodal Large Language Models (MLLMs) offer new perspectives on visual grounding, yet they face significant challenges in Camouflaged Object Detection (COD), where objects blend seamlessly into their backgrounds. These challenges stem primarily from three factors: the difficulty of matching multiple objects, the detrimental effect of low-quality training samples, and the erroneous localization of visual distractors whose textures resemble those of true objects. We propose Progressive Matching and Semantic-aware Policy Optimization (PMSPO), a curriculum learning-based framework that employs a Sinkhorn-based multi-object matching IoU reward during training to align predictions with multiple targets, applies Positive Learning Gain Filtering (PLGF) to curate high-quality samples, and transforms deep visual features into semantic contrastive reward rules that calibrate target-background semantics. Experiments on COD benchmarks demonstrate that PMSPO achieves state-of-the-art (SOTA) performance among reinforcement learning methods across all evaluation metrics.
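To make the Sinkhorn multi-object matching IoU reward concrete, the following is a minimal sketch of one plausible formulation: pairwise IoUs between predicted and ground-truth boxes form a similarity kernel, Sinkhorn iterations produce a soft assignment with uniform marginals, and the reward is the assignment-weighted IoU. The function names, the entropic temperature `eps`, and the iteration count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def box_iou(a, b):
    # a: (N, 4), b: (M, 4) boxes as [x1, y1, x2, y2]; returns (N, M) pairwise IoU.
    lt = np.maximum(a[:, None, :2], b[None, :, :2])   # intersection top-left
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])   # intersection bottom-right
    wh = np.clip(rb - lt, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def sinkhorn_match_reward(preds, gts, eps=0.1, n_iters=50):
    # Soft bipartite matching between predictions and ground truths
    # via Sinkhorn normalization of an IoU-based kernel (hypothetical reward rule).
    n, m = len(preds), len(gts)
    iou = box_iou(preds, gts)
    K = np.exp(iou / eps)                 # entropic similarity kernel
    u = np.ones(n) / n
    v = np.ones(m) / m
    for _ in range(n_iters):              # alternate projections onto uniform marginals
        u = (1.0 / n) / (K @ v)
        v = (1.0 / m) / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # transport plan, sums to ~1
    return float((P * iou).sum())         # assignment-weighted IoU in [0, 1]
```

With well-separated boxes and an exact prediction for each ground truth, the plan concentrates on the diagonal and the reward approaches 1; fully disjoint predictions yield a reward near 0.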