

Poster

Tell, Don't Show: Language Guidance Eases Transfer Across Domains in Images and Videos

Tarun Kalluri · Bodhisattwa Prasad Majumder · Manmohan Chandraker


Abstract:

We introduce LagTrAN, a novel framework that uses readily available or easily acquired text descriptions to guide the robust transfer of discriminative knowledge from labeled source data to an unlabeled target across domain gaps. While unsupervised adaptation methods are established for this problem, they struggle with challenging domain shifts because they operate exclusively in image space. Motivated by our observation that the semantically richer text modality has more favorable domain-transfer properties, we devise a transfer mechanism that uses a source-trained text classifier to generate predictions on the target text descriptions, and uses these predictions as supervision for the corresponding images. Our language-guided approach is remarkably simple, yet outperforms all prior approaches on challenging datasets such as GeoNet and DomainNet, demonstrating its effectiveness. To extend the scope of our study beyond images, we introduce a new benchmark for ego-exo transfer in videos, and find that our language-aided approach LagTrAN yields significant gains in this novel transfer setting. Code, models, and the proposed datasets will be publicly released.
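The core transfer mechanism described above, i.e. training a text classifier on labeled source descriptions and using its predictions on target descriptions as pseudo-labels for the corresponding target images, can be sketched as follows. This is a minimal illustrative sketch with hypothetical names and a toy bag-of-words classifier, not the authors' implementation:

```python
# Sketch of language-guided transfer: a text classifier trained on labeled
# source captions pseudo-labels target captions; those pseudo-labels then
# supervise the (unlabeled) target images. All names here are illustrative.
from collections import Counter, defaultdict

def bag_of_words(text):
    # Trivial text featurizer: lowercase word counts.
    return Counter(text.lower().split())

def train_text_classifier(source_texts, source_labels):
    # Per-class word-count centroids built from labeled source captions.
    centroids = defaultdict(Counter)
    for text, label in zip(source_texts, source_labels):
        centroids[label].update(bag_of_words(text))
    return centroids

def predict(centroids, text):
    # Assign the class whose centroid best overlaps the caption's words.
    words = bag_of_words(text)
    return max(centroids,
               key=lambda label: sum(words[w] * centroids[label][w]
                                     for w in words))

# Toy source domain: captions with class labels (e.g., sketches).
source_texts = ["a sketch of a dog running", "a sketch of a cat sleeping"]
source_labels = ["dog", "cat"]
clf = train_text_classifier(source_texts, source_labels)

# Target domain (e.g., photos): images are unlabeled, but each has a caption.
target = [("img_001.jpg", "photo of a dog in a park"),
          ("img_002.jpg", "photo of a cat on a sofa")]

# Text-derived pseudo-labels become supervision for the paired images.
pseudo_labeled = [(img, predict(clf, caption)) for img, caption in target]
print(pseudo_labeled)  # [('img_001.jpg', 'dog'), ('img_002.jpg', 'cat')]
```

In the actual framework, the toy classifier above would be replaced by a learned text model and the pseudo-labeled pairs would train an image classifier on the target domain.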
