We present Contextual Vision Transformers (ContextViT), a method for producing robust feature representations for images that exhibit grouped structure, such as shared covariates. ContextViT introduces an extra context token that encodes group-specific information, allowing the model to explain away group-specific covariate structure while keeping core visual features shared across groups. Specifically, given an input image, ContextViT maps images that share the same covariate to a shared context token, which is appended to the input image's patch tokens to capture the effect of conditioning the model on group membership. We further introduce a context inference network that predicts such tokens on the fly from a few samples of a group's distribution, enabling ContextViT to generalize to new test distributions at inference time. We demonstrate the performance of ContextViT across a diverse range of applications.
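The mechanism described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the authors' implementation): we assume the context inference network is a mean-pool over the patch embeddings of a few group samples followed by a linear map, and that the resulting context token is stacked alongside the usual [CLS] and patch tokens before the transformer encoder. All names, dimensions, and the pooling choice here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # embedding dimension (illustrative)
n_patches = 9   # patch tokens per image (illustrative)

def infer_context_token(group_embeddings, W):
    # Context inference network (sketch, an assumption): mean-pool the
    # patch embeddings of a few sample images from the group, then
    # apply a learned linear map to produce one context token.
    pooled = group_embeddings.mean(axis=(0, 1))   # shape (d,)
    return pooled @ W                             # shape (d,)

def contextvit_tokens(patch_tokens, cls_token, context_token):
    # Append the group-specific context token alongside the usual
    # [CLS] token and the image's patch tokens; the combined sequence
    # would then be fed to the transformer encoder.
    return np.vstack([cls_token, context_token[None, :], patch_tokens])

# Toy group of 4 images, each with n_patches patch embeddings.
group = rng.normal(size=(4, n_patches, d))
W = rng.normal(size=(d, d))
ctx = infer_context_token(group, W)

cls_tok = rng.normal(size=(1, d))
tokens = contextvit_tokens(group[0], cls_tok, ctx)
print(tokens.shape)  # (1 + 1 + n_patches, d)
```

Because the context token is inferred from samples rather than looked up from a fixed table, the same procedure applies unchanged to a group never seen during training, which is what enables generalization to new test distributions.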
Author Information
Yujia Bao (Insitro)
Theofanis Karaletsos (Insitro)
More from the Same Authors
- 2022 Poster: Learning Stable Classifiers by Transferring Unstable Features »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2022 Spotlight: Learning Stable Classifiers by Transferring Unstable Features »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2021 Poster: Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2021 Poster: Variational Auto-Regressive Gaussian Processes for Continual Learning »
  Sanyam Kapoor · Theofanis Karaletsos · Thang Bui
- 2021 Spotlight: Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers »
  Yujia Bao · Shiyu Chang · Regina Barzilay
- 2021 Spotlight: Variational Auto-Regressive Gaussian Processes for Continual Learning »
  Sanyam Kapoor · Theofanis Karaletsos · Thang Bui