Poster in Workshop: Machine Learning for Astrophysics
Toward Galaxy Foundation Models with Hybrid Contrastive Learning
Mike Walmsley · Inigo Slijepcevic · Micah Bowles · Anna Scaife
New astronomical tasks are often related to earlier tasks for which labels have already been collected. We adapt the contrastive framework BYOL to leverage those labels as a pretraining task while also enforcing augmentation invariance. For large-scale pretraining, we introduce GZ-Evo, a set of 96.5M volunteer responses for 552k galaxy images plus a further 1.34M comparable unlabelled galaxies. Most of the 206 possible GZ-Evo answers are unknown for any given galaxy, and so our pretraining task uses a Dirichlet loss that naturally handles missing answers. Our hybrid pretraining/contrastive method achieves higher accuracy on our downstream task (classifying ringed galaxies) than both direct training and the purely contrastive equivalent. Surprisingly, the simple approach of purely supervised pretraining performs best, achieving a relative error reduction of 17% vs. direct training on 50k+ labels.
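The abstract does not spell out the Dirichlet loss, but one common formulation for volunteer vote data is a Dirichlet-Multinomial likelihood over the vote counts of each question, which contributes zero loss whenever a question received no votes. The sketch below illustrates that behaviour; it is a minimal assumption-laden example, not the authors' implementation, and the function and argument names are hypothetical.

```python
import torch

def dirichlet_multinomial_nll(vote_counts, concentrations):
    """Negative log-likelihood of observed vote counts under a
    Dirichlet-Multinomial with predicted concentrations (alpha > 0).

    vote_counts:    (batch, answers) volunteer votes for one question
    concentrations: (batch, answers) model outputs, e.g. softplus(logits) + 1

    Questions with zero total votes (i.e. missing answers) yield exactly
    zero loss, so no explicit masking is needed.
    """
    total_votes = vote_counts.sum(dim=-1)        # N per galaxy
    total_conc = concentrations.sum(dim=-1)      # sum of alphas per galaxy

    log_like = (
        torch.lgamma(total_votes + 1)
        - torch.lgamma(vote_counts + 1).sum(dim=-1)
        + torch.lgamma(total_conc)
        - torch.lgamma(total_votes + total_conc)
        + (torch.lgamma(vote_counts + concentrations)
           - torch.lgamma(concentrations)).sum(dim=-1)
    )
    return -log_like  # zero wherever total_votes == 0
```

In a hybrid setup along the lines described, this supervised term would be summed over all questions and combined with a BYOL-style augmentation-invariance objective; how the two terms are weighted is not stated in the abstract.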