Patch-level Contrastive Learning via Positional Query for Visual Pre-training
Shaofeng Zhang · Qiang Zhou · Zhibin Wang · Fan Wang · Junchi Yan

Thu Jul 27 04:30 PM -- 06:00 PM (PDT) @ Exhibit Hall 1 #617

Dense contrastive learning (DCL) has recently been explored as a way to learn localized information for dense prediction tasks (e.g., detection and segmentation). However, it still struggles to mine pixel/patch correspondences between two views. A simple workaround is to input the same view twice and align the pixel/patch representations, but this reduces the variance of the inputs and hurts performance. We propose a plug-in method, PQCL (Positional Query for patch-level Contrastive Learning), which allows patch-level contrasts between two views with exact patch correspondence. In addition, by using positional queries, PQCL increases the variance of the inputs, which enhances training. We apply PQCL to popular transformer-based CL frameworks (DINO and iBOT) and evaluate them on classification, detection, and segmentation tasks, where our method obtains stable improvements, especially on dense tasks. It achieves a new state of the art in most settings. Code is available at https://github.com/Sherrylone/Query_Contrastive.
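
To make the idea of a patch-level contrast with known correspondence concrete, here is a minimal sketch of a generic InfoNCE-style loss over patches. It is not the authors' implementation and does not model positional queries: the function names, the temperature value, and the assumption that patch i in one view matches patch i in the other are all illustrative.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def patch_info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over patches.

    Assumption (illustrative): view_a[i] and view_b[i] are embeddings of the
    same image patch, so index i is the positive and all other patches in
    view_b act as negatives.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views need the same non-zero number of patches")

    a = [l2_normalize(p) for p in view_a]
    b = [l2_normalize(p) for p in view_b]

    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this patch to every patch in the other view.
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching patch as the target class.
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0]]
    print(patch_info_nce(patches, patches))        # correctly matched patches
    print(patch_info_nce(patches, patches[::-1]))  # mismatched patches
```

With correctly matched patches the loss is close to zero, while swapping the patch order raises it sharply, which is why a reliable correspondence between views matters for this kind of objective.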

Author Information

Shaofeng Zhang (Shanghai Jiao Tong University, Tsinghua University)
Qiang Zhou (Alibaba Group)
Zhibin Wang (Alibaba Group)
Fan Wang (Alibaba Group)
Junchi Yan (Shanghai Jiao Tong University)
