Benchmarking Dense and Indiscernible Object Counting with Blueberries
Weihao Bo ⋅ Jingwen Qin ⋅ Yanpeng Sun ⋅ Fei Shen ⋅ Xiaofan Li ⋅ Zechao Li
Abstract
Real-world agricultural counting often operates in the extreme regime of \textbf{Dense and Indiscernible Object Counting (DIOC)}, where targets are tiny, clustered, and highly camouflaged. To facilitate research in this domain, we introduce \textbf{DIOCblueberry}, a large-scale benchmark that pushes the boundaries of visual perception. Unlike general datasets with salient objects, DIOCblueberry features extreme occlusion and camouflage. Compared to the popular FSC147 benchmark, it contains \textbf{1.9$\times$ more instances} per image (avg. 108) with an average box-to-image pixel ratio that is \textbf{7.9$\times$ smaller}, serving as a rigorous testbed for model robustness. Standard counting methods struggle in these scenarios due to severe visual ambiguity and scale mismatch. To address this, we propose \textbf{MaskCount}, a coarse-to-fine framework that incorporates semantic guidance. MaskCount leverages a vision-language model (CLIP) to generate pseudo segmentation masks for background suppression and employs a contrastive loss to maximize feature discriminability between fruits and foliage. Additionally, we design an edge-aware cropping mechanism to resolve boundary truncation in dense clusters. Extensive experiments demonstrate that MaskCount sets a new state of the art, reducing MAE and RMSE by \textbf{49.16\%} and \textbf{70.50\%} respectively on DIOCblueberry, with strong generalization to other agricultural scenes.
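To make the contrastive-discriminability idea concrete, the following is a minimal, hedged sketch (not the paper's actual loss; the function name, margin value, and hinge formulation are illustrative assumptions): given pixel features labeled "fruit" and "foliage" by the pseudo masks, it penalizes cross-class pairs whose cosine similarity exceeds a margin, pushing the two classes apart in feature space.

```python
import numpy as np

def contrastive_loss(fruit_feats, foliage_feats, margin=0.5):
    """Hypothetical cross-class hinge loss on cosine similarity.

    fruit_feats, foliage_feats: (N, D) and (M, D) feature arrays.
    Pairs of fruit/foliage features that look alike (cosine similarity
    above `margin`) incur a penalty; well-separated pairs cost nothing.
    """
    # L2-normalize so the dot product is cosine similarity
    f = fruit_feats / np.linalg.norm(fruit_feats, axis=1, keepdims=True)
    g = foliage_feats / np.linalg.norm(foliage_feats, axis=1, keepdims=True)
    sim = f @ g.T                                   # (N, M) cross-class similarities
    return float(np.maximum(0.0, sim - margin).mean())  # hinge: penalize similar pairs

# Toy usage: random "fruit" and "foliage" feature batches
rng = np.random.default_rng(0)
fruit = rng.normal(size=(8, 16))
leaf = rng.normal(size=(8, 16))
loss = contrastive_loss(fruit, leaf)
```

Minimizing this term (jointly with the counting loss) would encourage the backbone to map camouflaged berries away from visually similar foliage, which is the discriminability goal the abstract describes.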