Toward Scientific Foundation Models for Aquatic Ecosystems
Abstract
Understanding and forecasting lake dynamics is essential for monitoring water quality and ecosystem health in lakes and reservoirs. While machine learning models trained on ecological time-series data have shown promise, they tend to be task-specific and struggle to generalize across diverse aquatic environments. Current research is constrained by single-lake, single-variable models, inconsistent observation frequencies, and the absence of foundation models that can generalize across ecosystems, all of which hinder reproducibility and transferability. To address these challenges, we introduce LakeFM, a foundation model for lake ecosystems, pre-trained on multi-variable and multi-depth data drawn from a combination of simulated and observational lake datasets. Through empirical results and qualitative analysis, we demonstrate that LakeFM learns meaningful representations spanning both fine-grained variable-level dynamics and broader lake-level patterns. Furthermore, it achieves competitive, and in some cases superior, forecasting performance compared to existing time-series foundation models.