Large-Scale Deep Learning for the Earth System – Abstracts:

AIFS, ECMWF’s data-driven forecast model

Simon Lang, Rilwan Adewoyin, Mihai Alexe, Zied Ben Bouallègue, Matthew Chantry, Mariana Clare, Jesper Dramsch, Sara Hahner, Christian Lessig, Linus Magnusson, Michael Maier-Gerber, Gert Mertes, Gabriel Moldovan, Ana Prieto Nemesio, Cathal O’Brien, Florian Pinault, Baudouin Raoult, Mario Santa Cruz, Helen Theissen, Steffen Tietsche, Martin Leutbecher, Peter Düben

ECMWF has developed a data-driven forecast model, the Artificial Intelligence Forecasting System (AIFS). AIFS is based on an attention-based graph neural network (GNN) encoder and decoder and a sliding-window transformer processor, and is trained on ECMWF’s ERA5 reanalysis and ECMWF’s operational numerical weather prediction (NWP) analyses. We will describe the current state of AIFS, its architecture and framework, and discuss ensemble generation methods, e.g. via probabilistic score optimisation or diffusion-based training. Both a deterministic version and an ensemble version of AIFS are now run in experimental mode alongside ECMWF’s physics-based NWP model. AIFS forecasts are available to the public under ECMWF’s open data policy.

———————————–

Regional data-driven modelling with a global stretched-grid approach

Magnus Sikora Ingstad, Håvard Haugen, Thomas Nipen, Even Nordhagen, Aram Farhad Shafiq Salihi, Paulina Tedesco, Ivar Seierstad and Jørn Kristiansen (Norwegian Meteorological Institute).

We present a data-driven weather model developed to support high-resolution regional forecasting needs at the Norwegian Meteorological Institute. The model has global coverage with a variable resolution grid, dedicating higher resolution to the Nordic region, which is our region of interest. As input, the model uses high-resolution (2.5 km) analyses from the MetCoOp ensemble prediction system (MEPS) covering the Nordics and 0.25° resolution ERA5 covering the rest of the globe. This stretched-grid model allows weather systems to seamlessly pass from the global domain into the regional and vice versa.
The model is based on components of ECMWF’s data-driven modelling framework Anemoi and uses a graph neural network architecture. The model is initially pre-trained on 43 years of 1° ERA5 global reanalysis, fine-tuned on 0.25°, and finally transferred to 3 years of combined ERA5 and MEPS analyses. This setup allows for a computationally cheap transition to higher resolution, while also maintaining information from the longer period of global data. We show that this is essential for accurately simulating the synoptic flow patterns when only a limited period of high-resolution training data is available.
The model is verified against surface observations from SYNOP stations covering Norway. We assess the model’s potential to be used in automated public weather forecasting, focusing on short-range (up to 66 hour lead times) forecasts of 2m temperature, 10m wind speed, and 6 hour precipitation accumulation. Results indicate that the model outperforms our high-resolution NWP system on 2m temperature and is able to faithfully represent the intricate temperature field in the mountainous coastal terrain of Norway. For wind speed, the model outperforms ECMWF’s IFS forecasts in mountainous regions. Although the model struggles to represent convective scale precipitation events, it accurately predicts large-scale precipitation events lasting multiple days.

———————————–

ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Guillaume Couairon1, Christian Lessig2, Anastase Charantonis1,3, Claire Monteleoni1,4
1 INRIA, Paris, France
2 ECMWF, Bonn, Germany
3 ENSIIE, Evry, France
4 CU Boulder, Colorado, USA

The field of weather forecasting is undergoing a revolution, as AI models trained on the ERA5 reanalysis dataset can now outperform IFS-HRES, the reference numerical weather prediction model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), with inference costs that are orders of magnitude lower. However, current state-of-the-art AI weather forecasting models like GraphCast require thousands of GPU-days to train, and at least 30 TB of ERA5 data at 0.25° resolution.
We present ArchesWeather, a new AI weather model at 1.5° resolution, designed to be trained on academic resources. ArchesWeather can be trained in ~2 days on 2 A100 GPUs, with a dataset size of less than 1 TB, and a small ensemble (4) of independently trained ArchesWeather models reaches competitive performance with models trained with a much higher computational budget (see Figure 1). Like Pangu-Weather, ArchesWeather is based on a Swin Transformer with Earth-Positional Bias. We identify that the 3D local processing in Pangu-Weather is computationally sub-optimal, and propose an architecture improvement that allows training our weather model much faster. We also show a small distribution shift in ERA5 before and after 2000, which we attribute to shifts in the observation system, and we improve forecasting by fine-tuning on recent data.
We will also present ongoing work on diffusion models for weather forecasting. We train architectures based on ArchesWeather and compare models that learn to generate the entire future weather state, versus models that learn to generate the residual between future weather states and predictions of a deterministic model like ArchesWeather.

———————————–

Aurora: A Foundation Model of the Atmosphere

Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris

Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models. Taken together, these results indicate that foundation models can transform environmental forecasting.

———————————–

Leveraging data-driven weather forecasting for improving numerical weather prediction skill through large-scale spectral nudging

Leo Separovic1, Syed Z. Husain1, Jean-François Caron2, Rabah Aider1, Mark Buehner2, Stéphane Chamberland1, Ervig Lapalme2, Ron McTaggart-Cowan1, Christopher Subich1, Paul Vaillancourt1, Jing Yang1 and Ayrton Zadra1


1 Atmospheric Numerical Weather Prediction Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada
2 Data Assimilation Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada

Operational meteorological forecasting has long been carried out using physics-based numerical weather prediction (NWP) models. The recent emergence of data-driven artificial intelligence (AI)-based weather emulators has begun to disrupt this existing landscape. However, most deterministic data-driven models for medium-range forecasting suffer from major limitations, including low effective resolution and a narrow range of predicted variables.
This study illustrates the relative strengths and weaknesses of these two competing forecasting paradigms using the GEM (Global Environmental Multiscale) and GraphCast models to represent physics-based and AI-based approaches, respectively. By analyzing global predictions from GEM and GraphCast against observations and analyses in both physical and spectral spaces, this study demonstrates that GraphCast-predicted large scales outperform GEM, particularly for longer lead times. Building on this insight, a hybrid NWP-AI system is proposed, wherein GEM-predicted large-scale state variables (e.g., temperature, horizontal wind) are spectrally nudged toward GraphCast inferences, while allowing GEM to freely generate fine-scale details (see Fig. 1), which are often critical for weather extremes.

Results from different verifications reveal that this hybrid approach is capable of leveraging the strengths of GraphCast to enhance the prediction skill of the GEM model. For example, RMSE of the 500 hPa geopotential height is reduced by 5-10 %, with an overall predictability improvement of 12-18 hours, peaking at day 7 of a 10-day forecast. Notably, trajectories of tropical cyclones are predicted with enhanced accuracy without significant changes in intensity. In addition to improving the forecasting skill, this new hybrid system ensures that meteorologists have access to all the forecast variables, including those relevant for high-impact weather.
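
As a rough illustration of the nudging step described in this abstract, the sketch below relaxes only the low-wavenumber Fourier coefficients of a 1-D periodic field toward a reference field and leaves the fine scales untouched. The function names, the cutoff wavenumber and the relaxation factor `alpha` are illustrative assumptions, not the GEM implementation, and a plain discrete Fourier transform stands in for the spectral transform a global model would use.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2)), fine for a small demo."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse of dft() above; returns the real part of the reconstruction."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real
        for j in range(n)
    ]


def spectral_nudge(model_field, reference_field, cutoff, alpha):
    """Relax wavenumbers <= cutoff of model_field toward reference_field.

    alpha = 0 leaves the model untouched, alpha = 1 replaces its large scales.
    Wavenumbers above the cutoff are never modified.
    """
    n = len(model_field)
    model_hat = dft(model_field)
    ref_hat = dft(reference_field)
    nudged = []
    for k in range(n):
        wavenumber = min(k, n - k)  # indices k and n - k share one wavenumber
        if wavenumber <= cutoff:
            nudged.append(model_hat[k] + alpha * (ref_hat[k] - model_hat[k]))
        else:
            nudged.append(model_hat[k])
    return inverse_dft(nudged)


n = 32
x = [2 * math.pi * j / n for j in range(n)]
# "Model": slightly wrong large-scale wave plus fine-scale detail (wavenumber 8).
model = [0.8 * math.sin(xi) + 0.3 * math.sin(8 * xi) for xi in x]
# "Reference": better large-scale wave, but no fine-scale detail.
reference = [1.0 * math.sin(xi) for xi in x]
hybrid = spectral_nudge(model, reference, cutoff=2, alpha=1.0)
```

In this toy setting the hybrid field takes its wavenumber-1 amplitude from the reference while keeping the model's wavenumber-8 detail. An operational scheme would typically apply a weaker, possibly scale- and height-dependent relaxation at every time step rather than a full replacement.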

———————————–

Aardvark Weather: End-to-end data driven weather prediction

Anna Vaughan*, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking and Richard E. Turner*
* corresponding authors: av555@cam.ac.uk and ret26@cam.ac.uk

Weather forecasting is critical for a range of human activities including transportation, agriculture, industry, as well as the safety of the general public. Machine learning models have the potential to transform the complex medium-range weather prediction pipeline, but current approaches still rely on numerical weather prediction (NWP) systems, limiting forecast speed and accuracy. Here we demonstrate that a machine learning model can replace the entire operational NWP pipeline. Aardvark Weather, an end-to-end data-driven medium-range weather prediction system, ingests raw observations and outputs global gridded forecasts and local station forecasts.
Further, it can be optimized end-to-end to maximize performance over quantities of interest. Global forecasts outperform an operational NWP baseline for multiple variables and lead times. Local station forecasts are skilful up to ten days lead time and achieve comparable and often lower errors than a post-processed global NWP baseline and a state-of-the-art end-to-end forecasting system with input from human forecasters. These forecasts are produced with a remarkably simple neural process model using just 8% of the input data and three orders of magnitude less compute than existing NWP and hybrid AI-NWP methods. We anticipate that Aardvark Weather will initiate a new generation of end-to-end machine learning models for medium-range forecasting that will reduce computational costs by orders of magnitude and enable rapid and cheap creation of bespoke models for users in a variety of fields, including for the developing world where state-of-the-art local models are not currently available.

———————————–

Machine learning for predicting high-resolution extreme precipitation

Peter Watson

Prediction of precipitation at high resolution, with realistic spatial structure, is necessary to predict climate impacts such as flooding. For many applications, knowledge is needed about events of extreme intensities, with probabilities of occurrence of around 1% per year or less at a given location. Few studies on applying machine learning for climate applications have considered performance on such rare and intense events, which are especially challenging due to having relatively small numbers of examples to learn from. This presentation will discuss methods to develop and evaluate machine learning-based systems to perform well for such events.
This will be illustrated using results from three studies, all showing skill for extreme weather events with intensities expected at most once every few decades:

Emulating precipitation simulations from a regional convection-permitting model (CPM) over the UK using a diffusion model, a generative machine learning model (Addison et al., in prep.). We show that the model can match the properties of the CPM output for a tiny fraction of the cost, including for ~1-in-100 year events.

Super-resolution of tropical cyclone rainfall using generative adversarial networks (GANs) (Vosper et al., 2023). This includes demonstration that one form of GAN can generalise to out-of-sample extremes. It will also be demonstrated that attaining good performance on typical events does not guarantee good performance on extreme events.

Post-processing East African weather forecasts using a GAN (Antonio et al., in review). It will be shown that the system can generalise to perform well in a wet season beyond any seen in training.

———————————–

DownscaleBench: A benchmark dataset for statistical downscaling of meteorological fields

Michael Langguth1, Sebastian Lehner2, Paula Harder3, Ankit Patnala1, Irene Schicker2, Markus Dabernig2, Konrad Mayer2, Martin G. Schultz1

1 Jülich Supercomputing Centre (JSC), Jülich, Germany
2 GeoSphere Austria, Vienna, Austria
3 Mila Quebec AI Institute, Montreal, Canada

Localized and high-resolution atmospheric data is of particular relevance for fields like agriculture, natural hazard management, and the renewable energy sector. As an alternative to computationally costly numerical simulations at high resolution, statistical downscaling with deep neural networks has recently gained momentum. Although numerous methods match or may even surpass the accuracy of classical statistical downscaling methods, intercomparison is challenging due to the wide range of methods and datasets used.
Inspired by the success of benchmark datasets for various computer vision tasks and for weather forecasting (e.g. WeatherBench 2), we provide a benchmark dataset for statistical downscaling of meteorological fields. We choose the coarse-grained ERA5 reanalysis ($\Delta x_{\mathrm{ERA5}} \simeq 30$ km) and the fine-scaled COSMO-REA6 ($\Delta x_{\mathrm{CREA6}} \simeq 6$ km) as input and target datasets. Both datasets enable the formulation of a real downscaling task: super-resolve the data and correct for model biases.
The benchmark dataset offers ready-to-use data for three standard downscaling tasks: downscaling of the 2m temperature, the 100m wind speed and the global horizontal irradiance. Alongside the provided training and test datasets, baseline neural network architectures from the literature such as U-Nets, GANs and a Swin-Transformer complemented network are selected. The baselines are complemented by SAMOS, a competitor for downscaling using classical statistical methods, to account for the long history of statistical downscaling in the meteorological domain. Suitable sets of metrics and diagnostics are defined and provided in code, so that the performance of novel deep neural network solutions can be easily evaluated. Intercomparison with the baselines or other solutions can be performed with an interactive Jupyter Notebook.
Lowering the barrier to obtaining training and test data through the benchmark dataset, and establishing a standardized evaluation framework, are considered fundamental for steering progress in deep neural network-based statistical downscaling. Furthermore, fair intercomparison between different approaches enhances confidence in, and the transparency of, novel deep learning methods for statistical downscaling, which in turn is believed to foster the advance of deep learning in Earth system science in general.

———————————–

FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Nonetheless, developing a higher-resolution numerical model remains a long-standing challenge due to the substantial consumption of computational resources. Recent advances in data-driven global weather forecasting models utilize reanalysis data for model training and have demonstrated comparable or even higher forecasting skills than numerical models. However, they are all limited by the resolution of reanalysis data and incapable of generating higher-resolution forecasts. This work presents FengWu-GHR, the first data-driven global weather forecasting model running at 0.09° horizontal resolution. FengWu-GHR introduces a novel approach that opens the door for operating ML-based high-resolution forecasts by inheriting prior knowledge from a pretrained low-resolution model. The hindcast of weather prediction in 2022 indicates that FengWu-GHR is superior to the IFS-HRES. Furthermore, evaluations on station observations and case studies of extreme events support the competitive operational forecasting skill of FengWu-GHR at the high resolution.

———————————–

Stormer – A state-of-the-art transformer for medium-range weather forecasting

Troy Arcomano, Tung Nguyen, Rohan Shah, Hritik Bansal, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover

Recently, several deep learning-based models for weather forecasting have been demonstrated with skill approaching or even exceeding traditional numerical weather prediction (NWP). These models include GraphCast, FourCastNet, FuXi, Pangu-Weather, ClimaX, and the AIFS, each with vastly different training methods, machine learning architectures, and variables predicted. It is unknown whether these complex architectures are needed or which parts of their training protocols are necessary to achieve their impressive accuracy for medium-range weather forecasting.
Here we introduce Stormer, a simple transformer-based model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. Using several ablations, we identify the key components of previous work and develop new methodologies, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss, to improve deep learning-based weather forecast models. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days while requiring orders of magnitude less training data and compute resources. We also investigate downstream applications for our newly developed model. Specifically, we use it to perform data assimilation with real, in-situ observations of the atmosphere and for uncertainty quantification with a combined initial-condition and model-based ensemble system.
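
The inference-time combination mentioned above can be sketched as follows: roll the model out to the target lead time once per time interval that divides it, then average the results. The interval set (6, 12, 24 hours), the function names and `toy_model` are illustrative assumptions; `toy_model` is a hypothetical stand-in for a trained network, not Stormer itself.

```python
def rollout(model, state, interval_hours, lead_time_hours):
    """Apply the model repeatedly with one fixed interval until the lead time."""
    steps = lead_time_hours // interval_hours
    for _ in range(steps):
        state = model(state, interval_hours)
    return state


def combined_forecast(model, state, lead_time_hours, intervals=(6, 12, 24)):
    """Average the roll-outs of every interval that divides the lead time."""
    usable = [dt for dt in intervals if lead_time_hours % dt == 0]
    forecasts = [rollout(model, state, dt, lead_time_hours) for dt in usable]
    n_vars = len(state)
    mean = [sum(f[i] for f in forecasts) / len(forecasts) for i in range(n_vars)]
    return mean, usable


def toy_model(state, interval_hours):
    """Hypothetical stand-in for a trained network: exponential decay with an
    interval-dependent bias, so different intervals give different answers."""
    decay = 0.99 ** interval_hours
    bias = 0.001 * interval_hours
    return [value * decay + bias for value in state]


initial_state = [1.0, 2.0, 3.0]
forecast, used_intervals = combined_forecast(toy_model, initial_state, 24)
```

For a 24-hour lead time all three assumed intervals are usable (4 x 6 h, 2 x 12 h, 1 x 24 h); for an 18-hour lead time only the 6-hour roll-out would contribute.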

———————————–

Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks

Joel Oskarsson1, Tomas Landelius2, Marc Peter Deisenroth3,4, and Fredrik Lindsten1

1 Linköping University
2 Swedish Meteorological and Hydrological Institute
3 University College London
4 The Alan Turing Institute

In recent years, machine learning has established itself as a powerful tool for high-resolution weather forecasting. While most current machine learning models focus on deterministic forecasts, accurately capturing the uncertainty in the chaotic weather system calls for probabilistic modeling. In this work, we propose a probabilistic weather forecasting model called Graph-EFM, combining a flexible latent-variable formulation with the successful graph-based forecasting framework.
The goal of probabilistic weather forecasting is to model the distribution over future weather states $X_{1:T}$, conditioned on initial states $X_{-1:0}$ and forcing $F_{1:T}$. In Graph-EFM we decompose this distribution as

$$p(X_{1:T} \mid X_{-1:0}, F_{1:T}) = \prod_{t=1}^{T} \int p(X_t \mid Z_t, X_{t-2:t-1}, F_t)\, p(Z_t \mid X_{t-2:t-1}, F_t)\, dZ_t.$$ (1)*

We have here introduced a latent random variable $Z_t$ at each time step. The stochasticity in $Z_t$ should capture the uncertainty over the weather state $X_t$. By implementing both $p(X_t \mid Z_t, X_{t-2:t-1}, F_t)$ and $p(Z_t \mid X_{t-2:t-1}, F_t)$ as neural networks, this formulation specifies a flexible parametrization for the forecast distribution. Sampling a forecast is efficient, requiring only a single forward pass per time step. Thus Graph-EFM allows for quickly generating arbitrarily large ensembles.
Our probabilistic model builds on the graph-based weather forecasting framework [1, 2]. Gridded input data is encoded using Graph Neural Networks (GNNs) onto a constructed mesh graph covering the forecasting area. Different from previous works, we use a hierarchical mesh graph consisting of multiple levels with a decreasing number of nodes in each. This offers a natural, spatially aware, dimensionality reduction of the input data. To utilize this, we associate the different dimensions of the latent variable $Z_t$ with the nodes of the mesh graph at the top level of the hierarchy. As the independent components of $Z_t$ are then propagated down through the mesh graph, gradually increasing the spatial resolution, spatial dependencies are introduced by the model in the GNN layers. This leads to realistic, spatially coherent atmospheric fields in $X_t$. Figure 1 shows an overview of the Graph-EFM model with the hierarchical mesh graph.
We experiment with the model on both global and limited area forecasting. Ensemble forecasts from Graph-EFM achieve equivalent or lower errors than comparable deterministic models, with the added benefit of accurately capturing forecast uncertainty.
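
The sampling procedure implied by the decomposition in this abstract (one draw of the latent variable and one predictor call per time step) can be sketched with scalar toy functions. `latent_prior` and `predictor` are hypothetical stand-ins for the two neural networks, with made-up coefficients; the predictor returns a single value, i.e. it treats the conditional distribution of the next state as a point mass for simplicity.

```python
import random


def latent_prior(prev_states, forcing, rng):
    """Sample Z_t given X_{t-2:t-1} and F_t (toy Gaussian prior)."""
    mean = 0.1 * (prev_states[1] - prev_states[0]) + 0.05 * forcing
    return rng.gauss(mean, 0.2)


def predictor(latent, prev_states, forcing):
    """Map Z_t, X_{t-2:t-1} and F_t to the next state X_t (toy dynamics)."""
    return prev_states[1] + 0.5 * (prev_states[1] - prev_states[0]) + forcing + latent


def sample_forecast(initial_states, forcings, rng):
    """Draw one trajectory X_{1:T}: one prior draw and one predictor call per step."""
    prev = tuple(initial_states)  # (X_{-1}, X_0)
    trajectory = []
    for forcing in forcings:
        z_t = latent_prior(prev, forcing, rng)
        x_t = predictor(z_t, prev, forcing)
        trajectory.append(x_t)
        prev = (prev[1], x_t)  # slide the two-state window forward
    return trajectory


def sample_ensemble(initial_states, forcings, n_members, seed=0):
    """Each ensemble member is an independent trajectory sample."""
    rng = random.Random(seed)
    return [sample_forecast(initial_states, forcings, rng) for _ in range(n_members)]


ensemble = sample_ensemble(initial_states=(0.0, 0.1), forcings=[0.0, 0.01, 0.02], n_members=5)
```

Because every member only needs its own sequence of latent draws, the ensemble size is limited by compute rather than by the model formulation.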
More details about this work can be found in the preprint [3].

References
[1] R. Keisler. Forecasting global weather with graph neural networks. arXiv preprint arXiv:2202.07575, 2022.
[2] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, et al. Learning skillful medium-range global weather forecasting. Science, 2023.
[3] J. Oskarsson, T. Landelius, M. P. Deisenroth, and F. Lindsten. Probabilistic weather forecasting with hierarchical graph neural networks. arXiv preprint arXiv:2406.04759, 2024.

*Note by the website editor: The display of the function is limited due to the text implementation; this is not the authors’ fault. Please excuse us.

———————————–

Continuous Ensemble Forecasting with Diffusion

Martin Andrae1, Tomas Landelius2

1 Div of Statistics and Machine Learning, Dept of Comp and Info Sci, Linköping Univ, Sweden
2 Swedish Meteorological and Hydrological Institute, Norrköping, Sweden

Following the success of deterministic machine learning weather prediction (MLWP), the focus has shifted towards probabilistic modeling. This approach addresses the blurriness often observed in deterministic MLWP models and provides methods to quantify forecast uncertainty and detect extreme events. Thanks to the low computational demand of MLWP, we can now generate large ensembles previously out of reach with NWP models, enabling us to better capture the distribution of future states. However, first-generation probabilistic MLWP models face several issues. They either only address initial state uncertainty, are limited to coarse resolutions, or require computationally expensive diffusion models for finer resolutions. Moreover, they are often applied iteratively using auto-regression to roll out longer forecasts. In a probabilistic setting using diffusion models it becomes non-trivial to implement multi-step losses and to obtain time consistency between successive roll-outs of ensemble members. We propose a continuous forecasting model (similar to [1]) that takes lead time as input and forecasts the future weather state in a single step using a diffusion model. This enables both iterative and direct forecasting within a single model and stabilizes the outputs of iterative forecasting methods. Moreover, we present a novel ensemble forecasting method based on diffusion models that tracks individual ensemble members over time, without resorting to iterative predictions. To generate ensemble forecasts, the model uses a deterministic ODE-solver to solve the probability flow ODE starting in different noise samples. This limits the randomness in the sampling to the random noise. Thus, by using a single noise sample for all timesteps, as illustrated in the figure below, a continuous trajectory can be generated for each member.
This enables parallel sampling of individual ensemble members and eliminates the dependency on multi-step loss functions, significantly enhancing the efficiency of ensemble forecasting. To keep turnaround times short, our experiments used a simple U-Net applied to the geopotential height at 500 hPa (Z500) from the WeatherBench dataset (5.625° resolution). First results are encouraging, showing time-consistent ensembles (as seen here), with good spread/skill and error statistics comparable to other MLWP models (as seen in the graph above). Finally, it is important to note that the potential of this method is not limited by the scores shown here.
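
A minimal sketch of the member-tracking idea, under strong simplifying assumptions: the state is a single number, an analytic Gaussian score (`toy_score`) stands in for the trained lead-time-conditioned diffusion model, and a plain Euler scheme integrates the probability flow ODE from a maximum noise level `SIGMA_MAX` down to zero. All names and constants are hypothetical. The point illustrated is only that a deterministic solver plus one fixed noise sample per member gives each member a forecast at every lead time without iterating the model.

```python
import math
import random

SIGMA_MAX = 10.0  # largest noise level of the (assumed) diffusion schedule


def toy_score(x, sigma, initial_state, lead_time):
    """Analytic score of a Gaussian forecast distribution blurred by noise sigma.

    Stand-in for a trained model: the forecast mean drifts with lead time and
    the forecast spread grows with it.
    """
    mean = initial_state * math.cos(0.1 * lead_time)
    spread = 0.05 * lead_time
    return -(x - mean) / (spread ** 2 + sigma ** 2)


def solve_probability_flow(noise, initial_state, lead_time, n_steps=200):
    """Euler integration of dx/dsigma = -sigma * score from SIGMA_MAX to 0."""
    x = SIGMA_MAX * noise
    sigmas = [SIGMA_MAX * (1 - i / n_steps) for i in range(n_steps + 1)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = -sigma * toy_score(x, sigma, initial_state, lead_time)
        x += (sigma_next - sigma) * slope
    return x


def ensemble_trajectories(initial_state, lead_times, n_members, seed=0):
    """One noise sample per member, reused for every lead time."""
    rng = random.Random(seed)
    noises = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
    return [
        [solve_probability_flow(noise, initial_state, t) for t in lead_times]
        for noise in noises
    ]


lead_times = [1, 2, 3, 4, 5, 6]
members = ensemble_trajectories(initial_state=1.0, lead_times=lead_times, n_members=4)
```

Each inner list is one member's trajectory. Since every (member, lead time) pair is an independent solve, the calls can run in parallel, and no forecast depends on a previous roll-out step.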

Link to animated forecasts: https://martinandrae.github.io/Continuous-Ensemble-Forecasting/

———————————–

Efficiently fine-tuning 37-level GraphCast for the Canadian GDPS

Christopher Subich
christopher.subich@ec.gc.ca

GraphCast [Lam 2023] provides state-of-the-art forecast accuracy when applied to initial conditions given by the ERA5 dataset or the closely related HRES operational system. However, as trained by DeepMind, GraphCast produces significantly better forecasts when initialized with ERA5-like data than when initialized with data from other operational analysis systems.
At Environment & Climate Change Canada, we have found that the 37-level, quarter-degree version of GraphCast (GraphCast-37) provides significantly degraded forecasts when initialized with analysis fields from the Global Deterministic Prediction System (GDPS), likely attributable to residual but systematic differences in the stratosphere compared with ERA5/HRES. The 13-level version of GraphCast provides better forecasts, but its limited set of output levels makes the model less suitable for predicting atmospheric vertical structure, including inversions and associated freezing rain events. Fortunately, this problem can be fixed. Through a few numerical optimizations and an improved training schedule, GraphCast-37 can be fine-tuned on just 29 months of GDPS analysis data (July 2019 – December 2021) for one GPU-month, producing an optimized version of the model that recovers state-of-the-art performance on an out-of-sample test set (calendar year 2022).
This talk will discuss:
• The computational and numerical challenges of training the 37-level GraphCast model in a memory-limited environment,
• How to derive a more robust, physically motivated weighting of errors versus vertical level, and
• The development of an optimized training schedule, matching DeepMind’s training on forecasts up to 72h in length with fewer intermediate steps.

———————————–

Design and generation of ensemble weather forecasts using Spherical Fourier Neural Operators

Ankur Mahesh*1,2, William Collins*1,2, Boris Bonev3, Noah Brenowitz3, Yair Cohen3, Peter Harrington4, Karthik Kashinath3, Thorsten Kurth3, Joshua North1, Travis O’Brien5, Michael Pritchard3,6, David Pruitt3, Mark Risser1, Shashank Subramanian4, Jared Willard4

1. Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California, USA
2. Department of Earth and Planetary Science, University of California, Berkeley, USA
3. NVIDIA Corporation, Santa Clara, California, USA
4. National Energy Research Scientific Computing Center (NERSC), LBNL, Berkeley, California, USA
5. Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, Indiana, USA
6. Department of Earth System Science, University of California, Irvine, USA
* These authors contributed equally to this work.

Studying low-likelihood high-impact extreme weather events in a warming world is a significant and challenging task for current ensemble forecasting systems. While these systems presently use up to 100 members, larger ensembles could enrich the sampling of internal variability. They may capture the long tails associated with climate hazards better than traditional ensemble sizes. Due to computational constraints, it is simply impossible to generate huge ensembles, of say 10,000 members, using traditional numerical simulations of climate models at high resolution. We replace traditional numerical simulations with machine learning (ML) models to generate hindcasts of huge ensembles. We construct an ensemble weather forecasting system based on Spherical Fourier Neural Operators, and we discuss important design decisions for constructing such an ensemble. The ensemble represents model uncertainty through perturbed-parameter techniques, and it represents initial condition uncertainty through bred vectors, which sample the fastest growing modes of the forecast. Using the IFS operational weather forecasting system as a baseline, we present a rigorous diagnostics pipeline. We evaluate the ML ensemble’s overall performance as a probabilistic forecasting system and show that its performance is comparable to (though approximately 18 hours behind) that of IFS. Based on the spectra of the individual ensemble members and the ensemble mean, the ML ensemble trajectories have realistic error growth. We also evaluate the extreme weather forecasts in particular, by assessing the ensemble’s reliability and discrimination. These diagnostics test the physical fidelity of the ML emulator. They ensure that the ensemble can reliably simulate the overall time evolution of the atmosphere, including low likelihood high-impact extremes. We generate a huge ensemble initialized each day in summer 2023, and we characterize the statistics of extremes.
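
The bred-vector technique mentioned above can be sketched on a toy system: a perturbed and an unperturbed forecast are advanced together, their difference is rescaled to a fixed amplitude, and the cycle repeats so that the perturbation aligns with fast-growing directions. The Lorenz-63 system with forward-Euler time stepping is an assumed stand-in for the ML emulator, and the amplitude, cycle length and number of cycles are arbitrary illustrative choices.

```python
import math


def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 system (toy 'forecast model')."""
    x, y, z = state
    return (
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    )


def forecast(state, n_steps):
    """Advance the toy model by n_steps time steps."""
    for _ in range(n_steps):
        state = lorenz63_step(state)
    return state


def norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def breed(control, perturbation, amplitude, n_cycles, steps_per_cycle):
    """Return the control state and the bred vector after n_cycles cycles."""
    scale = amplitude / norm(perturbation)
    bred = tuple(scale * p for p in perturbation)
    for _ in range(n_cycles):
        perturbed = tuple(c + b for c, b in zip(control, bred))
        control = forecast(control, steps_per_cycle)
        perturbed = forecast(perturbed, steps_per_cycle)
        difference = tuple(p - c for p, c in zip(perturbed, control))
        scale = amplitude / norm(difference)  # rescale to the fixed amplitude
        bred = tuple(scale * d for d in difference)
    return control, bred


control, bred_vector = breed(
    control=(1.0, 1.0, 20.0),
    perturbation=(1.0, 0.0, 0.0),
    amplitude=1e-3,
    n_cycles=50,
    steps_per_cycle=8,
)
# Initial conditions for a symmetric pair of perturbed ensemble members.
member_plus = tuple(c + b for c, b in zip(control, bred_vector))
member_minus = tuple(c - b for c, b in zip(control, bred_vector))
```

Adding and subtracting the bred vector yields a pair of members centred on the control state; a large ensemble would need many such vectors, for example from different initial perturbations or amplitudes.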

———————————–

Machine Learning Weather Prediction Model Development for Global Forecast

Jun Wang*1, Sadegh Tabas2, Linlin Cui3, Wei Li4, Jun Du1, Bing Fu1, Jacob Carley1
1NOAA/NWS/NCEP/EMC
2Axiom at NOAA/NWS/NCEP/EMC
3Lynker at NOAA/NWS/NCEP/EMC
4SAIC at NOAA/NWS/NCEP/EMC

Machine learning-based weather prediction (MLWP) models have been under rapid development in the past couple of years. These models leverage autoregressive neural network architectures, are trained using reanalysis data generated by operational centers, and demonstrate proficient forecasting abilities. Once trained, these models take significantly fewer computational resources to produce forecasts than traditional numerical weather prediction (NWP) models, while maintaining or surpassing conventional NWP forecast performance.
This presentation will provide an overview of the development of MLWP models for the global deterministic and ensemble forecast systems at NOAA/NCEP Environment Modeling Center (EMC). Several state-of-the-art MLWP models, including GraphCast and FourCastNet, have been installed and fine tuned. Experimental real time global forecasts with products comparable to operational GFSv16 have been made publicly available on the NOAA cloud. Evaluation has been set up to compare the graphcastGFS with operational GFS and IFS and the results show that the GraphcastGFS significantly outperforms operational GFS and is very close to IFS. Bias correction of operational products using machine learning (ML) models has been actively worked on. A hybrid multi-model ensemble using GraphCast, ForeCastNet and GEFS are under development.
A U-NET based bias correction ML model was developed and applied to correct operational GFS products such as 2m temperature and Convective Available Potential Energy (CAPE) with promising results. Future plans to improve the forecast performance for possible MLWP operational implementation will also be discussed.
Keywords: MLWP, global forecast, ML bias correction

———————————–

AI-Var – A data-driven data assimilation system

Jan D Keller, Roland Potthast, Stefanie Hollborn, Thomas Deppisch

In the evolving domain of Numerical Weather Prediction (NWP), the integration of artificial intelligence (AI) presents transformative opportunities that promise to enhance forecast accuracy and computational efficiency. Traditional data assimilation methods, such as variational techniques and ensemble Kalman filters, have been foundational in NWP. Over the past 60 years, these methods have enabled the continuous improvement of weather predictions.
With the advent of AI, a paradigm shift begins in NWP as new techniques allow for the production of weather forecasts at significantly reduced computational costs, making classical data assimilation (DA) a bottleneck in a data-driven NWP value chain. In this regard, we introduce a novel AI-based variational DA approach (AI-Var) designed to replace classical methods by leveraging deep learning techniques. Unlike hybrid approaches that combine traditional methods with AI enhancements, AI-Var integrates the DA process directly into a neural network. This integration allows the system to learn the complex functional mappings required for solving the underlying optimization problem, thus bypassing the computational burdens of traditional DA methods. Further, by minimizing the variational cost function within a neural network, AI-Var is able to learn the DA functional without relying on pre-existing analysis datasets.
A first proof-of-concept implementation (Keller and Potthast, 2024) has demonstrated AI-Var’s efficiency in assimilating observations and generating accurate initial conditions through a series of idealized and real-world test cases. One major advantage of our approach over classical methods is the ability of the trained DA system to circumvent the necessity of calculating model equivalents and observation biases as these are learned during training.
The AI-Var system is currently being integrated into the Anemoi framework to enable seamless integration with the development of AI-based weather forecast models. In addition, AI-Var will be extended with AI-based modules for variational bias correction and error covariance estimation, which will improve the quality of the analysis estimates without incurring additional computational costs during the inference step. These advancements pave the way for fully data-driven NWP systems, offering substantial improvements in computational efficiency, flexibility, and forecast accuracy.
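For orientation, the variational cost function referred to above is commonly written in the standard 3D-Var form shown below. The abstract does not state which exact formulation AI-Var uses, so the symbols here (background $x_b$, observations $y$, observation operator $H$, background and observation error covariances $B$ and $R$, network $f_\theta$) follow textbook convention and are assumptions rather than the authors' notation.

```latex
J(x) = \tfrac{1}{2}\,(x - x_b)^{\mathrm{T}} B^{-1} (x - x_b)
     + \tfrac{1}{2}\,\bigl(y - H(x)\bigr)^{\mathrm{T}} R^{-1} \bigl(y - H(x)\bigr),
\qquad
\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{(x_b,\,y)}\!\left[ J\bigl(f_\theta(x_b, y)\bigr) \right]
```

Under this reading, the network output $f_\theta(x_b, y)$ plays the role of the analysis and $J$ itself serves as the training loss, which is one way to see why no pre-existing analysis dataset would be needed as a training target.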

———————————–

Skillful multi-day forecasts directly from observations

Christian Lessig, Peter Lean, Mihai Alexe, Eulalie Boucher, Simon Lang, Matthew Chantry, Peter Dueben,
Ewan Pinnington, Marcin Chrust, Sean Healy, Chris Burrows, Tony McNally
(European Centre for Medium-Range Weather Forecasts)

The global observing system provides a rich set of information about the atmosphere and the overall Earth system with state-of-the-art satellites collecting terabytes of high-quality, scientifically validated data per day and with the entire historical record consisting of many petabytes. In this work, we propose to directly leverage this data for medium-range forecasting and build an end-to-end forecasting system that is trained and runs entirely on level 1 observations. For this, we do not rely on any part of the conventional data assimilation or forecasting pipeline and learn the relationship between different observation types, instruments, and channels entirely from the data. This enables us to make use of many more observations than a conventional data assimilation system, e.g. use all-sky data for all channels and directly include visible channels which are not assimilated operationally at the moment.
In our talk, we will introduce the general concept and principles of our forecasting system directly from observations and present results on skillful multi-day forecasts. As input, our system uses a wide range of observations from polar orbiting and geostationary satellites within a 12h window. From this, a latent representation of the global atmospheric state is constructed in the network. Roll-out, i.e. time stepping, is performed in this latent space. The final latent representation at the forecast lead time is then decoded to gridded output of surface and upper atmosphere variables. Our model is pre-trained by performing short-term predictions in observation space, i.e. from brightness temperatures to brightness temperatures, with gridded output being obtained through virtual SYNOP stations and radiosondes.

———————————–

AICON – A data-driven weather forecasting model based on ICON

Tobias Göcke, Marek Jacob, Florian Prill, Hendrik Reich, Maria Reinhard, Sabrina Wahl, Jan D Keller, Roland Potthast

DWD operates a global to regional model chain to seamlessly provide weather forecasts, warnings and services. Its tasks include modeling, data assimilation, and verification of numerical weather prediction (NWP), as well as operational use and continuous quality monitoring. Aiming to exploit current and future machine learning techniques, DWD initiated a project with the goal to complement its classical NWP model chain with an AI-based model. Here, we present a prototype of the Artificial intelligence ICON (AICON), a data-driven weather forecasting model based on the ICOsahedral Non-hydrostatic (ICON) NWP model established at DWD. AICON is envisioned to run in a global mode with regional refinements at higher resolutions, mirroring DWD’s current NWP model chain setup. AICON utilizes a graph neural network approach with an encoder-processor-decoder architecture, following the implementations of Google’s GraphCast and ECMWF’s AIFS. The hidden processor mesh is based on DWD’s global ICON triangular grid, albeit at a lower refinement level.
AICON is trained on DWD’s ICON-DREAM (Dual resolution Reanalysis for Emulators, Applications and Monitoring) reanalysis data. The global to regional reanalysis system uses a setup corresponding to the operational NWP chain at DWD. It operates on a global domain with 13 km horizontal resolution and a European nest at 6.3 km. Data from ICON-DREAM is currently in production and will cover a time period from 2010 until now in a preliminary version. The prototype AICON currently has a horizontal resolution of about 50 km and provides 3-hour forecasts with autoregressive training up to 24 hours for surface pressure and four upper air variables on model levels. For future developments, the AICON setup is currently being integrated into the Anemoi framework to better support pan-European collaboration on AI-based weather forecasting.

———————————–

Overview on progress toward AI-driven Numerical Weather Prediction at NOAA

Daryl Kleist1, Adam Clark2, Sergey Frolov3, Kevin Garrett4, Isidora Jankov5, Corey Potvin2, Jebb Stewart5, John Ten Hoeve6, Monica Youngman4
1 NOAA/NWS/NCEP/EMC
2 NOAA/OAR/NSSL
3 NOAA/OAR/PSL
4 NOAA/NWS/STI
5 NOAA/OAR/GSL
6 NOAA/OAR/WPO

The past several years have seen significant developments in the advancement of data-driven models for numerical weather prediction, particularly from efforts trained on reanalysis data such as that from ERA5. Several initiatives have begun at the National Oceanic and Atmospheric Administration (NOAA) in an effort to chart a path forward for the agency to fully embrace the opportunity that such data-driven models bring to better enable delivery of services to meet the agency mission. In November 2023, NOAA convened a workshop to focus on the integration of emerging data-driven models into the NOAA research-to-operations pipeline for numerical weather prediction (Frolov et al. 2024). From this workshop, a team from various NOAA offices has come together to coordinate and accelerate activities around identified grand challenges. This presentation will provide an overview of the workshop, the main findings, and specific activities that have since begun. Some details on the progress toward building shared infrastructure, performing additional training using NOAA internal data, and preliminary results for emulators of systems like the Global Forecast System will be presented. A brief overview of plans toward emulators for kilometer-scale models will be provided.

———————————–

Data-driven weather forecasting at kilometre scale: towards new Arome-AI systems at Météo-France

Laure Raynaud and contributors, Météo-France

Following the success of first global data-driven weather forecasting systems, and more recently of limited-area adaptations such as Neural LAM, several efforts are ongoing at Météo-France in order to develop new deep learning applications for kilometre to hectometric scale weather forecasting.
Our current roadmap includes several topics under investigation, among which:
– the full emulation of the operational Arome-France model based on GNNs;
– stochastic downscaling with diffusion models;
– super-sampling ensemble prediction systems with generative approaches (DE_371 project);
– assessment of the physical consistency and explainability of DL model predictions, through the development of specific metrics and the application of XAI tools.
Although still at a preliminary stage of development, these efforts already show promising performance and open exciting perspectives, as well as challenges. An overview of our main findings and short-term plans will be presented for the different applications.
In parallel with this research progress, another major recent achievement has been the development and publication of the Py4Cast framework (https://github.com/meteofrance/py4cast), which aims at training a variety of DL architectures (CNNs, GNNs, Vision Transformers, …) for several purposes and using various datasets. Initially meant to be the backbone for research work and future operational DL-based applications at Météo-France, this framework could also be made interoperable with other similar initiatives.

———————————–

AI/ML Initiatives at the Canadian Meteorological Centre

Stephane Beauregard, manager Design and Integration of Numerical Prediction Systems and AI, CCMEP/MSC/ECCC

This presentation or poster will give an overview of the AI/ML initiatives underway at the Canadian Centre for Meteorological and Environmental Prediction of the Meteorological Service of Canada, as well as within the Atmospheric Science and Technology Directorate of the Science and Technology Branch, both of which are part of Environment and Climate Change Canada.
Our spectrum of activities includes:

  • AI Downscaling: Enhancing resolution for precise local forecasts.
  • AI Nowcasting: Utilizing AI to improve immediate weather predictions.
  • AI Emulators Evaluation: Assessing available AI emulators in real-time forecasting environments.
  • Fine-tuning of AI Emulator: Using the Canadian global model data to adapt an available AI emulator to improve Canadian produced forecasts.
  • Hybrid Modelling: Merging traditional methods with AI to improve forecast reliability.
  • Technological Enablers: Projects that pave the way for AI integration in meteorology.
  • Strategic AI Roadmap: Strategic plans to incorporate AI emulators into the Canadian Numerical Weather and Environmental Prediction System, promising a significant leap in forecasting capabilities.

These initiatives represent our commitment to harnessing AI/ML for enhanced meteorological and environmental forecasts, ensuring Canada remains at the forefront of environmental prediction technology.

———————————–

Probabilistic Emulation of a Global Climate Model with Spherical DYffusion

Salva Rühling Cachay1, Brian Henn2, Oliver Watt-Meyer2, Christopher S. Bretherton2, Rose Yu1

1 UC San Diego,
2 Allen Institute for AI

Data-driven deep learning models are on the verge of transforming global weather forecasting. It is an open question if this success can extend to climate modeling, where long inference rollouts and data complexity pose significant challenges. Building on the pioneering setup from ACE [1], which emulated the operational US primary global forecast model, FV3GFS, using a deterministic SFNO [3] architecture, we present the first conditional generative model that produces accurate and physically consistent global climate ensemble simulations. Our model runs at 6-hourly time steps with low computational overhead compared to single-step deterministic baselines. It runs stably for 10-year-long simulations, outperforming relevant baselines and nearly achieving a gold standard for successful climate model emulation. We show that the success of our method is powered by our careful integration of the dynamics-informed diffusion model framework [2] with the SFNO architecture. We detail the key design choices that enable this significant step towards efficient, data-driven climate simulations that can help us better understand the Earth and adapt to a changing climate.

Overview of key results:
Our study begins by demonstrating a divergence between the medium-range weather forecasting skill of ML models (measured as the average RMSE on 5-day forecasts) and their performance on longer climate time scales (measured as the RMSE of the 10-year time-mean climatology), indicating that dedicated approaches may be necessary for climate model emulation. In Figure 2, we demonstrate that our proposed framework, when operating in ensemble mode, achieves a performance within 29% of the reference noise floor for emulating the time-mean 10-year climatology. Using a single sample, it reaches 50% of the noise floor. This represents a 2- to 4-fold improvement over the next best method, ACE.

References.
[1] Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K Clark, James Duncan, Brian Henn, Matthew Peters, Noah D Brenowitz, Karthik Kashinath, Mike Pritchard, Boris Bonev, and Christopher Bretherton. “ACE: A fast, skillful learned global atmospheric model for climate prediction”. NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning, 2023.
[2] Salva Rühling Cachay, Bo Zhao, Hailey Joren, and Rose Yu. “DYffusion: A dynamics-informed diffusion model for spatiotemporal forecasting”. Advances in Neural Information Processing Systems (NeurIPS), 2023.
[3] Boris Bonev, Thorsten Kurth, Christian Hundt, Jaideep Pathak, Maximilian Baust, Karthik Kashinath, and Anima Anandkumar. “Spherical Fourier neural operators: Learning stable dynamics on the sphere”. International Conference on Machine Learning (ICML), 2023.

———————————–

Coupling the AI2 Climate Emulator to a slab ocean and learning the sensitivity to changes in CO2

Spencer K. Clark1,2, Oliver Watt-Meyer1, Anna Kwa1, Jeremy McGibbon1, Brian Henn1, Gideon Dresdner1, W. Andre Perkins1, Lucas M. Harris2, and Christopher S. Bretherton1

1 Allen Institute for Artificial Intelligence
2 NOAA / Geophysical Fluid Dynamics Laboratory

The AI2 Climate Emulator (ACE) is a machine-learning-based emulator of a physics-based global atmosphere model. Based on a select set of prescribed forcing variables and a physically inspired set of prognostic variables, it learns to predict the state of the atmosphere six hours later. Predictions of the prognostic variables are fed back in auto-regressively to obtain a simulation of weather and climate. Key for modeling climate, ACE maintains realistic weather variability throughout decades-long rollouts with prescribed sea surface temperatures and sea ice, and is over 60x faster than the traditional model it emulates. However, while running with prescribed ocean conditions can be appropriate when simulating present-day or historical climate, it is not possible to simulate climate with increased carbon dioxide without a model of how these ocean conditions will respond. In this presentation we will describe work that extends ACE in this direction, coupling it to a slab ocean model and training it to learn the sensitivity of climate to changes in CO2.
In this work our reference physics-based model is GFDL’s SHiELD model coupled to a slab ocean (SHiELD-SOM). The slab ocean model predicts the temperature evolution of the ocean mixed layer based on energy exchange with the atmosphere, and prescribed climatologies of the heating due to ocean currents (the “Q-flux”) and the mixed layer depth. For simplicity, sea ice remains prescribed from a climatology, though we acknowledge this will limit the simulated climate sensitivity. We implement an identically formulated differentiable slab ocean model in ACE, making sea surface temperature a prognostic variable, and adding the Q-flux and mixed layer depth as forcings. We refer to this configuration as “ACE-SOM.” Including CO2 concentration as an additional forcing, we train ACE-SOM on a collection of output from 100 km resolution equilibrium-climate SHiELD-SOM simulations with different CO2 concentrations. We show that ACE-SOM trained in this manner produces stable and accurate rollouts in multiple climates. See, for example, Figure 1, which compares the difference in time-mean surface temperature and precipitation rate between climates with 4x and 1x present-day CO2 in 10-year periods with SHiELD-SOM (our target model) and ACE-SOM. In addition to results from in-sample equilibrium-climate emulation, we will also discuss results from applying ACE-SOM in out-of-sample settings such as emulating the evolution of climate with gradually increasing CO2, or the response of climate to an abrupt CO2 change, and what these reveal about the strengths and weaknesses of our current approach.
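A slab ocean of the kind described above can be sketched in a few lines. This is a generic textbook formulation, not the SHiELD-SOM or ACE-SOM code: the constants, the function name, the forward-Euler update and the sign convention (fluxes positive into the ocean) are all assumptions made for illustration. The mixed-layer temperature changes in proportion to the net surface heat flux plus the prescribed Q-flux, divided by the heat capacity of the layer.

```python
import numpy as np

# Assumed sea-water properties; the values used by the authors are not stated.
RHO_WATER = 1025.0  # density, kg m^-3
CP_WATER = 3990.0   # specific heat capacity, J kg^-1 K^-1


def slab_ocean_step(sst, net_surface_flux, q_flux, mixed_layer_depth,
                    dt_seconds=6 * 3600):
    """Advance mixed-layer temperature by one forward-Euler step.

    sst               : sea surface temperature, K
    net_surface_flux  : net atmosphere-to-ocean heat flux, W m^-2 (positive down)
    q_flux            : prescribed heating from ocean currents, W m^-2
    mixed_layer_depth : prescribed mixed layer depth, m
    """
    heat_capacity = RHO_WATER * CP_WATER * np.asarray(mixed_layer_depth)  # J m^-2 K^-1
    tendency = (np.asarray(net_surface_flux) + np.asarray(q_flux)) / heat_capacity
    return np.asarray(sst) + dt_seconds * tendency


# Example: 100 W m^-2 of net heating over a 50 m mixed layer for six hours.
new_sst = slab_ocean_step(sst=290.0, net_surface_flux=100.0, q_flux=0.0,
                          mixed_layer_depth=50.0)
```

Because the update uses only elementwise arithmetic, the same expression written in an automatic-differentiation framework would be differentiable, which is consistent with the "differentiable slab ocean model" the abstract describes.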

———————————–

A Hybrid Model for El Niño Southern Oscillation Dynamics

Jakob Schlör1, Matthew Newman3, Jannik Thuemmel1, Antonietta Capotondi2,3, Bedartha Goswami1

1 Machine Learning in Climate Science, University of Tübingen, Germany
2 Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, USA
3 NOAA/Physical Sciences Laboratory, Boulder, CO, USA

Seasonal forecasting worldwide rests largely on the predictability of tropical sea surface temperature anomalies (SSTA). As the dominant mode of SSTA variability, El Niño Southern Oscillation (ENSO) results in different global patterns of extreme climate conditions, requiring early and accurate ENSO forecasts. While deep-learning models have demonstrated skillful ENSO forecasts up to 1 year in advance, they are predominantly trained on climate model simulations that provide thousands of years of training data at the expense of introducing climate model biases. Simpler Linear Inverse Models (LIMs) trained on the much shorter observational record also make skillful ENSO predictions but do not capture predictable nonlinear processes. This motivates a hybrid approach, combining the LIM’s modest data needs with a recurrent neural network that captures nonlinear dynamics. For O(100 yr) datasets, our resulting Hybrid model is more skillful than the LIM while also exceeding the skill of a full deep-learning model when trained on O(1500 yr) of CESM2 simulation. Additionally, the LIM allows us to identify the most predictable ENSO events in advance, which are better predicted by the Hybrid model. We obtain improved forecast accuracy, particularly in the western tropical Pacific within the 9 to 18-month range, by capturing the subsequent asymmetric (warm versus cold phases) evolution of ENSO.
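The linear component of such a hybrid can be illustrated with the standard LIM estimate of the propagator from lagged covariances, G(τ) = C(τ) C(0)⁻¹. The sketch below covers only this linear part; the recurrent network that models the nonlinear dynamics is not shown, and the function names and the use of a pseudo-inverse are assumptions rather than details taken from the abstract.

```python
import numpy as np


def fit_lim_propagator(x, lag=1):
    """Estimate the LIM propagator G(lag) = C(lag) C(0)^-1.

    x : array of shape (time, state), e.g. leading principal components
        of SST anomalies.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)           # work with anomalies
    x_now, x_later = x[:-lag], x[lag:]
    n = len(x_now)
    c0 = x_now.T @ x_now / n         # lag-0 covariance
    c_lag = x_later.T @ x_now / n    # lag-covariance E[x(t+lag) x(t)^T]
    return c_lag @ np.linalg.pinv(c0)


def lim_forecast(propagator, state, n_steps=1):
    """Forecast n_steps * lag ahead by repeated application of the propagator."""
    return np.linalg.matrix_power(propagator, n_steps) @ np.asarray(state, dtype=float)
```

A typical use would fit the propagator on monthly anomalies with `lag=1` and call `lim_forecast(g, current_state, n_steps=9)` for a nine-month lead; in the hybrid setting a learned nonlinear correction would then be added on top of this linear forecast.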

———————————–

Probabilistic Representations of Subseasonal to Annual Ocean Dynamics

Jannik Thuemmel, Jakob Schlör, Florian Ebmeier, Bedartha Goswami, (University of Tübingen)

Data-driven forecasts on subseasonal-to-annual scales face a two-fold challenge: the scope of observational records is comparatively short and predictability strongly depends on the coupling of earth systems evolving on very different timescales. Here, we take a first step towards general subseasonal-to-annual predictions by modelling the El Niño Southern Oscillation (ENSO), one of the principal sources of predictability on seasonal scales, which occurs irregularly every 2 to 8 years.
Characterised by anomalously warm sea surface temperatures in the tropical Pacific, an El Niño event arises from a combination of slow heat transfer within the Pacific and between ocean basins, as well as fast atmospheric dynamics, such as westerly wind bursts and convection. To supplement the short observational record for ENSO, we use data from preindustrial control runs of the CESM2 climate model for pretraining and validation. Inspired by the Perceiver architecture, we design a deep learning model that can integrate information from modalities on different timescales and grid representations by utilising the flexibility of cross-attention in an Encoder-Processor-Decoder scheme. In contrast to concurrent work, we do not enforce a spatiotemporal inductive bias in our Processor component, leaving the learning algorithm free to determine the appropriate correlation structure and reducing the computational cost associated with our global, multiyear context.
The model is trained as a Masked Autoencoder using the empirical Continuous Ranked Probability Score on an ensemble of small tail networks. Rather than generating forecasts autoregressively, we predict a distribution of possible states of the ocean at a target time and location, which allows the model to better handle fluctuations in predictability across time. To improve generalisation to the forecasting downstream task, we adopt a Mixture of Denoisers objective, resulting in substantially better calibration and performance than a standard masking strategy. We find that our representation learning approach exhibits zero-shot performance that is competitive with task-specific models on subseasonal-to-annual forecasts of ENSO and, to the best of our knowledge, is the first to predict well-calibrated forecast uncertainties. Linear probing reveals that the model’s latent state encodes information that is predictive for ENSO over several years beyond the trained window. As a next step we plan to fine-tune the model on reanalysis data and utilise the flexibility afforded by the Perceiver architecture to directly investigate the effects of different atmospheric and oceanic components on the predictability of ENSO.
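The empirical Continuous Ranked Probability Score used as the training objective can be computed directly from ensemble members. The sketch below uses the common estimator CRPS = mean|x_i − y| − (1/(2M²)) Σ_i Σ_j |x_i − x_j|; the abstract does not say whether this or the "fair" variant (with M(M−1) in the denominator) is used, so the choice, like the function name, is an assumption.

```python
import numpy as np


def empirical_crps(ensemble, observation):
    """Empirical CRPS of an ensemble forecast.

    ensemble    : array of shape (members, ...) with one prediction per member
    observation : array of shape (...) with the verifying values
    Returns an array of shape (...); lower is better.
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observation, dtype=float)
    m = ens.shape[0]
    # Mean absolute error of the members against the observation.
    skill = np.abs(ens - obs).mean(axis=0)
    # Mean absolute difference between all member pairs, halved.
    spread = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1)) / (2.0 * m * m)
    return skill - spread
```

With a single member the score reduces to the absolute error, and the spread term rewards ensembles whose members differ from one another by about as much as they differ from the observation.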

———————————–

Prithvi WxC: A Multi-Regional Foundation Model for Weather and Climate

Johannes Jakubik1, Thomas Brunschwiler1, Sujit Roy2, Johannes Schmude3, Manil Maskey4, Rahul Ramachandran4
1 IBM Research Europe
2 NASA IMPACT
3 IBM Research Yorktown
4 NASA Marshall Space Flight Center, USA

Deep learning is increasingly disrupting weather applications by, for example, producing highly accurate forecasts at reduced computational costs compared to numerical weather prediction [1, 2]. The underlying approach of deep learning models for weather applications is vastly different from previous physics-based approaches. Instead of modeling the underlying physics directly, deep learning represents it through probability distributions as a result of model training. This approach has been borrowed from the domains of natural language processing and computer vision, and is surprisingly effective in approximating physical systems like weather applications. However, most existing deep learning models for weather applications are represented by task-specific forecast emulators with very limited capabilities on applications different from their pretraining task. To unlock a range of different weather applications with a single model more efficiently, we observe the emergence of task-agnostic foundation models for weather [3]. Accordingly, the compute-intensive pre-training can be amortized across multiple applications.

Within this abstract, we introduce Prithvi WxC, a large-scale foundation model for weather applications trained on the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) data set with 158 atmospheric variables. MERRA-2 is a widely used reanalysis dataset from NASA providing global atmospheric data, including temperature, humidity, and wind. Spanning from 1980 to the present day with spatial resolution of 0.625 degrees and temporal resolution of 3 hours [4], it is valuable for climate research and atmospheric studies.
Prithvi WxC is a transformer-based deep learning architecture which combines ideas from several recent transformer architectures in order to effectively process regional and global dependencies of the input data and to efficiently process longer sequence lengths of tokens. This allows the model to, for example, infuse additional tokens from off-grid measurements to the model during finetuning. We additionally experiment with different loss functions, for example, by removing task-specific temporal variances from loss functions of forecast emulators and replacing them with task-agnostic climatology variances. Further, we are experimenting with scaling Prithvi WxC to larger parameter counts in order to better understand whether the capabilities of the model change with an increasing number of weights. For that we shard the data via the fully-sharded data parallel (FSDP) framework and train the model across dozens of GPUs for multiple days on NASA Advanced Supercomputing (NAS) clusters. By experimenting with how far we can effectively scale the model, we naturally address the inherent tradeoff between a higher number of model parameters and larger batch sizes during pretraining.
The validation of Prithvi WxC extends over downstream tasks such as gravity wave flux parameterization derived using ERA5 U, V, T, and P variables, downscaling of weather and climate datasets, and the insertion of off-grid observational data. Additionally, we validate to which degree the model learns the underlying physics by adding several perturbations to the input data. We then compare whether the perturbations propagate spatially and temporally as expected from physical equations, an approach inspired by [5]. As stated, our current validation setting includes several downstream applications. However, we are eager to obtain additional validations and get more feedback through additional downstream tasks by the community. We therefore plan to make our model with its pretrained weights available via Hugging Face and make the code to run the model available on GitHub.

References
[1] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Accurate medium-range global weather
forecasting with 3d neural networks,” Nature, vol. 619, no. 7970, pp. 533–538, 2023.
[2] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri,
T. Ewalds, Z. Eaton-Rosen, W. Hu et al., “Learning skillful medium-range global weather forecasting,”
Science, vol. 382, no. 6677, pp. 1416–1421, 2023.
[3] C. Lessig, I. Luise, B. Gong, M. Langguth, S. Stadler, and M. Schultz, “AtmoRep: A stochastic model
of atmosphere dynamics using large scale representation learning,” arXiv preprint arXiv:2308.13280,
2023.
[4] R. Gelaro, W. McCarty, M. J. Suárez, R. Todling, A. Molod, L. Takacs, C. A. Randles, A. Darmenov,
M. G. Bosilovich, R. Reichle et al., “The modern-era retrospective analysis for research and
applications, version 2 (MERRA-2),” Journal of Climate, vol. 30, no. 14, pp. 5419–5454, 2017.
[5] G. J. Hakim and S. Masanam, “Dynamical tests of a deep-learning weather prediction model,” Artificial
Intelligence for the Earth Systems, 2024.

———————————–

Using AI-based numerical weather prediction models for climate applications

Nikolay Koldunov1, Thomas Rackow2, Christian Lessig2

1 Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI)
2 European Centre for Medium-Range Weather Forecasts (ECMWF)

State-of-the-art AI-based numerical weather prediction models (AI-NWP) produce forecasts that are comparable to, or even outperform, conventional forecasting systems while being orders of magnitude faster. Since climate projections are obtained by simulating the long-term evolution of weather states with appropriate forcing, the use of AI-NWP models for climate modeling is a promising avenue that has received little attention so far. We present two applications of AI-NWP models for climate modeling: (i) downscaling and (ii) weather forecasting initialised from climate projection data. Both applications use ERA5-pre-trained AI-NWP models without fine-tuning for the tasks or for the input data. For downscaling, we use low-resolution CMIP6 simulation data as initial condition and obtain high-resolution, bias-corrected output fields by performing short-term forecasting with the existing model; see Fig. 1 for an example. Our results show a remarkable robustness of AI-NWP to unseen states from historical and climate simulations of different resolutions. For AI-based weather forecasting in future climates, we obtain almost unchanged RMSE scores in a 2° warmer climate, although a more detailed analysis shows a cold bias in the forecasts. We believe that differences between climate model results and AI-NWP forecasts have the potential to provide insights into the physics and deficiencies of both climate models (e.g. for short time scales) and AI-NWP models (on long time scales). Based on our results, we discuss how existing AI-NWP models can be extended for climate projections, e.g. to sample extreme weather events, and hence help with adaptation to climate change.

———————————–

Learning causal representations of climate model data

Sebastian Hickman1, 2, Julien Boussard1, 3, Ilija Trajkovic6, Charlotte Lange1, Julia Kaltenborn1, 4, Yaniv Gurwicz5, Peer Nowack6, and David Rolnick1, 4

1 Mila – Quebec AI Institute,
2 University of Cambridge,
3 Columbia University,
4 McGill University,
5 Intel Labs,
6 Karlsruhe Institute of Technology

Making projections of possible future climates with models is essential to improve our understanding of the causes and implications of anthropogenic climate change. Earth system models (ESMs), which couple together complex numerical models describing different components of the Earth system, are the dominant tool for making these projections. While ESMs are currently the most complete description of the physical processes of the Earth system, these models are computationally expensive. Simpler models (emulators) are thus useful to explore the large space of possible future climate scenarios and to generate large ensembles. One class of emulators is simple climate models (SCMs), which model the Earth system with simplified physics, such as an energy balance model (1), and typically model climatic variables on a global scale, e.g. outputting global mean surface temperature. A second class of emulators is statistical models, which learn relationships directly from climate model data, often without the physical grounding of SCMs (2), instead relying on correlations in the data. In this preliminary work, we seek to combine the benefits of the physical grounding of SCMs with those of purely statistical emulators, using tools from causal representation learning. The end goal, a data-driven causal climate emulator, would allow climate scientists to explore the effect of various unseen interventions on the climate, including the effect of changing emissions, and facilitate causal attribution of climate phenomena. The goal of causal representation learning (3) is to simultaneously learn low-dimensional latent representations from high-dimensional data and the causal graph between these latent representations. In the context of climate model data, we aim to infer latent variables representing regions with shared climate variability from fine-grid climate measurements, and causal teleconnections between these regions, representing climate dynamics.
We build on previous work by Boussard et al. (4), which illustrated how causal representation learning methods, specifically Causal Discovery with Single-parent Decoding (CDSD) (5), might be used for this task. CDSD provides a continuous optimization method to learn a distribution over latent variables such that, at each timestep, every grid-point observation is driven by a single latent variable; a causal graph between these latents is learned at the same time. The single-parent assumption ensures identifiability of the causal graph. The resulting probabilistic generative model can then be used to generate next-timestep predictions.
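
The single-parent structure described above can be illustrated with a small generative sketch. This is not the CDSD implementation: the linear dynamics, the Gaussian noise, the sizes, and all names (`parent`, `W`, `A`, `step`) are assumptions made for illustration, and the actual method learns the decoder and the graph by continuous optimization rather than fixing them by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

n_grid, n_latent = 12, 3   # hypothetical sizes, for illustration only

# Single-parent decoding: each grid point is driven by exactly one latent,
# so every row of the decoder matrix W has a single nonzero entry.
parent = rng.integers(0, n_latent, size=n_grid)
W = np.zeros((n_grid, n_latent))
W[np.arange(n_grid), parent] = rng.uniform(0.5, 1.5, size=n_grid)

# Assumed linear latent dynamics. A nonzero A[i, j] stands for a causal
# edge from latent j at the previous timestep to latent i now.
A = np.array([[0.8, 0.0, 0.0],
              [0.3, 0.7, 0.0],
              [0.0, 0.0, 0.9]])

def step(z_prev, noise_z=0.1, noise_x=0.05):
    """Sample the next latent state and the grid-point field it generates."""
    z = A @ z_prev + noise_z * rng.standard_normal(n_latent)
    x = W @ z + noise_x * rng.standard_normal(n_grid)
    return z, x

z0 = rng.standard_normal(n_latent)
z1, x1 = step(z0)   # one sampled next-timestep prediction
```

Calling `step` repeatedly on its own latent output gives an autoregressive rollout, and repeating the call from the same `z0` gives different samples of the next timestep.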

We show that, on deseasonalised single-variable fields of monthly pre-industrial climate model data, CDSD learns physically reasonable latent variables, although learning the causal graph between latent variables remains a challenge. Furthermore, by autoregressively rolling out the model, we can generate possible future climate trajectories. As CDSD learns a distribution over latents and corresponding grid points, we can sample multiple predictions at each time step to effectively produce an ensemble of predictions. Finally, since causal modelling on real-world data suffers from the lack of a ground truth for the causal graph, we evaluate CDSD on a synthetic dataset designed to simulate teleconnections. To emulate the climate system usefully for policy-makers and scientists, we also explore approaches for including the effect of forcings such as greenhouse gases in the causal model.

1 Smith, C. et al., 2018. Geoscientific Model Development.
2 Watson-Parris, D., 2022. Journal of Advances in Modeling Earth Systems.
3 Schölkopf, B. et al., 2021. Proceedings of the IEEE.
4 Boussard et al., 2023. Tackling Climate Change with Machine Learning Workshop, NeurIPS.
5 Brouillard et al., 2024. In preparation.

———————————–

Evaluation of data-driven models on S2S timescales

Catherine de Burgh-Day1, Harrison Cook1, Robin Wedd1, Debbie Hudson1, Li Shi1, Hongyan Zhu1, Chen Li1, Griffith Young1
1 The Bureau of Meteorology, Australia

The rate of development in the field of data-driven atmosphere and coupled models is accelerating. Initially these models focused almost exclusively on the atmosphere, on weather timescales, and were presented with only high-level evaluation metrics and limited case studies. The utility and skill of these models for longer timescales remained unexplored, despite the clear benefits of using data-driven models for these timescales. For example, the generation of the required hindcast sets for sub-seasonal to seasonal (S2S) forecast bias correction and calibration could be significantly expedited with the use of data-driven models. Here we present an early evaluation of two data-driven models, GraphCast and FourCastNetV2, against cutting-edge global atmosphere and global coupled models, for prediction up to S2S timescales. We assess regionally averaged skill scores and skill for selected S2S climate drivers, and explore the models’ performance for specific tropical cyclone cases around Australia. We show that while GraphCast is generally considered the more skillful of the two models in the literature for weather forecasts, FourCastNetV2 shows unexpectedly high skill for some drivers and cases examined. We will also discuss some of the challenges posed by prediction beyond the medium range with data-driven models, such as the lack of available evaluation data once model training is completed.
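
As an illustration of what a regionally averaged skill score can look like, the sketch below computes an area-weighted RMSE over a latitude/longitude box on a regular grid. The abstract does not say which scores are used, so the choice of RMSE, the function name `regional_rmse`, the box bounds, and the toy fields are all assumptions.

```python
import numpy as np

def regional_rmse(forecast, truth, lats, lons, lat_bounds, lon_bounds):
    """Area-weighted RMSE over a lat/lon box on a regular grid.

    forecast, truth: arrays of shape (n_lat, n_lon).
    lats, lons: 1-D coordinate arrays in degrees.
    """
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    err2 = (forecast - truth)[np.ix_(lat_mask, lon_mask)] ** 2
    # Grid cells shrink towards the poles, so weight by cos(latitude).
    weights = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return float(np.sqrt(np.sum(weights * err2) / np.sum(weights)))

# Toy fields on a 10-degree grid with a constant error of 2 units, so the
# RMSE over any box equals 2 (up to floating-point rounding).
lats = np.arange(-90.0, 91.0, 10.0)
lons = np.arange(0.0, 360.0, 10.0)
truth = np.zeros((lats.size, lons.size))
forecast = truth + 2.0
rmse = regional_rmse(forecast, truth, lats, lons, (-45.0, -10.0), (110.0, 155.0))
```

The box used here roughly covers Australia. Other regions only need different bounds, and the same weighting applies to other pointwise scores.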

———————————–

LUCIE – A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles

Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik

We present LUCIE, a data-driven atmospheric emulator for 1000-member ensembles that remains stable during autoregressive inference for thousands of years without a drifting climatology. LUCIE has been trained on 9.5 years of coarse-resolution ERA5 data with 4 prognostic variables on a single A100 GPU for 2.4 h. Owing to the low computational cost of inference, a 1000-member ensemble is run for 5 years to compute an uncertainty-quantified climatology for the prognostic variables that closely matches the climatology obtained from ERA5. Unlike other state-of-the-art AI weather models, LUCIE neither becomes unstable nor produces hallucinations that result in unphysical drift of the emulated climate.
LUCIE is stable and able to run for 1000 years with little climate drift. The climatology of LUCIE matches ERA5 well, and LUCIE reproduces the general circulation of the atmosphere (Fig. 1). LUCIE also has a realistic representation of tropical convective processes (Fig. 2), including an MJO-like signal and equatorial Rossby (ER) waves; however, it tends to underestimate precipitation variability driven by Kelvin waves. The global precipitation bias is ~1 mm/day (not shown), comparable to that of higher-resolution, modern climate models.
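
The idea of an uncertainty-quantified climatology and a drift check can be sketched with a toy ensemble. The AR(1) process below is only a stand-in for one emulator step and is not LUCIE. The parameters, the sizes, and the use of a linear trend as the drift measure are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_members, n_steps = 1000, 500            # illustrative sizes, not LUCIE's
true_mean, phi, sigma = 288.0, 0.9, 0.5   # toy AR(1) parameters (assumed)

# Perturbed initial conditions, one per ensemble member.
x = true_mean + rng.standard_normal(n_members)
traj = np.empty((n_steps, n_members))
for t in range(n_steps):
    # Toy stand-in for one autoregressive emulator step.
    x = true_mean + phi * (x - true_mean) + sigma * rng.standard_normal(n_members)
    traj[t] = x

member_clim = traj.mean(axis=0)         # time-mean climatology per member
clim_mean = member_clim.mean()          # ensemble-mean climatology
clim_spread = member_clim.std(ddof=1)   # uncertainty: spread across members

# Drift check: linear trend of the ensemble mean over the rollout.
drift_per_step = np.polyfit(np.arange(n_steps), traj.mean(axis=1), 1)[0]
```

A stable emulator should give a `drift_per_step` close to zero, while `clim_spread` indicates how much the estimated climatology varies between members.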

———————————–

ClimateSet: A Large-Scale Consistent Climate Model Dataset for Machine Learning

Julia Kaltenborn1,2,*, Charlotte Lange1, Venkatesh Ramesh1,4, Philippe Brouillard1,4, Yaniv Gurwicz3, Chandni Nagda7, Jakob Runge6, Peer Nowack4, and David Rolnick1,2

1 Mila – Quebec AI Institute,
2 McGill University,
3 Intel Labs,
4 University of Montreal,
5 Karlsruhe Institute of Technology,
6 Dresden University of Technology, 
7 University of Illinois at Urbana-Champaign,
*julia.kaltenborn[at]mail.mcgill.ca

Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists’ efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, policy makers and the climate science community do not rely solely on projections from a single climate model, but on a set of climate models that are part of the Coupled Model Intercomparison Project (CMIP; latest completed phase: 6). To ensure that ML’s current progress in tackling climate science tasks can impact policy making, we need ML models to capture the inter-model variability of climate models. Both the climate science and ML communities have suggested that, to address this, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a “super-emulator” has the potential to quickly emulate climate change scenarios across multiple climate models, and could thus complement existing scenarios already provided to policymakers. Currently, we are working on an open-source package that provides access to ClimateSet’s pipelines for retrieving and preprocessing climate model data from different Model Intercomparison Projects. We envision that both the core dataset and the package will be particularly useful for researchers who need large training datasets, e.g. for climate foundation models. We hope ClimateSet enables the ML community to address a much wider range of climate-related tasks. Tackling those tasks on the scale of CMIP6 will help the ML community to contribute meaningfully to climate policy making.

———————————–

Coupled atmosphere-ocean simulations with a parsimonious deep learning model

Dale Durran1,2, Nathaniel Cresswell-Clay1, Zac Espinosa1, Bowen Liu1, Zihui Liu1

1 Atmospheric Sciences, University of Washington, Seattle, WA, USA

2 NVIDIA

A deep-learning model using convolutional neural networks is shown to produce physically realistic simulations of atmospheric and ocean circulations for the current climate state over 100-year autoregressive rollouts. The model employs 10 prognostic variables above each cell on a 110-km resolution HEALPix mesh. The atmosphere and ocean are coupled asynchronously, with 6-hour and 2-day time resolution in the atmosphere and ocean, respectively. The model is trained on both ERA5 reanalysis data and observations from the International Satellite Cloud Climatology Project (ISCCP).
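
A minimal sketch of asynchronous coupling with the two time resolutions quoted above is given here. Only the 6-hour and 2-day values come from the abstract. The scalar toy updates, the function names, and the exchange scheme (ocean held fixed within each window, then forced by the window-mean atmosphere) are assumptions for illustration and not the authors' method.

```python
ATMOS_DT_H = 6    # atmosphere time resolution from the abstract (hours)
OCEAN_DT_H = 48   # ocean time resolution from the abstract (2 days, in hours)
ATMOS_PER_OCEAN = OCEAN_DT_H // ATMOS_DT_H   # 8 atmosphere steps per ocean step

def atmos_step(atmos, sst):
    """Toy atmosphere update relaxing towards the current ocean state."""
    return atmos + 0.1 * (sst - atmos)

def ocean_step(sst, mean_atmos):
    """Toy ocean update forced by the time-averaged atmosphere."""
    return sst + 0.05 * (mean_atmos - sst)

def run_coupled(atmos, sst, n_days):
    """Advance both components, exchanging information once per ocean step."""
    for _ in range(n_days * 24 // OCEAN_DT_H):
        window = []
        for _ in range(ATMOS_PER_OCEAN):
            atmos = atmos_step(atmos, sst)   # ocean held fixed in the window
            window.append(atmos)
        sst = ocean_step(sst, sum(window) / len(window))
    return atmos, sst

atmos, sst = run_coupled(atmos=280.0, sst=290.0, n_days=20)
```

The design point is that the slower component is updated less often and sees an aggregate of the faster one, which keeps the cost of the ocean update small relative to the atmosphere.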
The model’s representation of large-scale low-frequency atmospheric variability, the structure of mid-latitude cyclones, the climatology of western-Pacific tropical cyclones, and ENSO behavior will be compared against historical ERA5 data and the skill of more traditional numerical models of the Earth system.
It is sometimes erroneously assumed that auto-regressively generated ML forecasts inevitably smooth with time. After an initial two-day adjustment, our model maintains sharp representations of the atmospheric structure indefinitely. This is demonstrated in the figure below, which shows an intense winter-time mid-latitude cyclone roughly 100 years (73,000 steps) after the start of the simulation.

———————————–