Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
Major TOM: Expandable Datasets for Earth Observation
Mikolaj Czerkawski · Alistair Francis
The performance of state-of-the-art deep learning models owes much to the high-quality, large-scale datasets used to train them, and the field of Earth Observation (EO) should be no exception. However, the current landscape of EO datasets is relatively atomised, with many datasets curated in diverse formats and data structures. To enable the next generation of datasets, a shared framework is proposed: Major TOM (Major Terrestrial Observation Metaset). It consists primarily of a geographical indexing system based on a set of grid points across the globe. Besides specifying Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM Core, which covers the vast majority of the Earth's land surface with Sentinel-2 (multi-spectral optical) and Sentinel-1 (SAR) satellite images. This dataset provides the community with an immediately useful resource and also acts as a template for future additions to the Major TOM ecosystem.