

Spotlight Poster

Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Jian Liang · Lijun Sheng · Zhengbo Wang · Ran He · Tieniu Tan

Hall C 4-9 #2201
Wed 24 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

The emergence of vision-language models, such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes within the unlabeled data. In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes. To tackle this problem, we present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompt, UEO incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Extensive experiments across 15 domains and 4 different types of prior knowledge validate the effectiveness of UEO compared to baseline methods. The code is publicly available at https://github.com/tim-learn/UEO.
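To make the confidence-weighted entropy objective in the abstract concrete, the sketch below illustrates one way such a loss could be written in PyTorch. It is not the authors' implementation (see the linked repository for that): the function name `ueo_style_loss` and the choice of deriving confidence weights from per-sample softmax entropy are illustrative assumptions; only the overall structure, minimizing conditional entropy for confident samples while maximizing the marginal entropy over less confident ones, follows the abstract.

```python
import torch


def entropy(p, eps=1e-8):
    # Shannon entropy of each row of a probability matrix (batch, num_classes).
    return -(p * (p + eps).log()).sum(dim=-1)


def ueo_style_loss(logits):
    """Illustrative weighted-entropy objective in the spirit of UEO.

    `logits`: CLIP image-text similarity scores for an unlabeled batch,
    shape (batch, num_known_classes). The exact confidence weighting in
    the paper may differ; here confidence is the negated per-sample entropy.
    """
    probs = logits.softmax(dim=-1)          # per-sample predictions over known classes
    ent = entropy(probs)                    # per-sample entropy (low = confident)

    # Sample-level weights over the batch: confident samples get larger alpha,
    # less confident samples get larger beta (each sums to 1 over the batch).
    alpha = torch.softmax(-ent.detach(), dim=0)
    beta = torch.softmax(ent.detach(), dim=0)

    # Minimize the conditional entropy of confident instances ...
    cond_term = (alpha * ent).sum()

    # ... and maximize the marginal entropy of less confident instances.
    marginal = (beta.unsqueeze(1) * probs).sum(dim=0)
    marg_term = entropy(marginal.unsqueeze(0)).squeeze(0)

    return cond_term - marg_term
```

In such a setup, the loss would be backpropagated only into a small set of parameters, e.g. the learnable textual prompt tokens and the channel-wise affine (scale and shift) parameters of the normalization layers in CLIP's visual encoder, while the pretrained backbone weights stay frozen.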
