On Some Limitations of Current Machine Learning Weather Prediction Models

Machine Learning (ML) is having a profound impact in the domain of Weather and Climate Prediction. A recent development in this area has been the emergence of fully data-driven ML prediction models which routinely claim superior performance to that of traditional physics-based models. We examine some aspects of the forecasts produced by three of the leading current ML models, Pangu-Weather, FourCastNet and GraphCast, with a focus on their fidelity and physical consistency. The main conclusion is that these ML models are not able to properly reproduce sub-synoptic and mesoscale weather phenomena, and that they lack the fidelity and physical consistency of physics-based models. This affects both the interpretation of their forecasts and their perceived skill. Balancing forecast skill and physical realism will be an important consideration for future ML models.


Introduction
As in other areas of applied science and engineering, Machine Learning (ML) methods are having a profound impact in Numerical Weather Prediction (NWP) and Climate monitoring and prediction (e.g., Bonavita et al., 2023; Schneider et al., 2022, for recent overviews). Researchers have sought to deploy ML algorithms in specific parts of the NWP and Climate prediction workflow (e.g., Krasnopolsky, 2023, for a recent review), aiming to take advantage of the extremely low computational cost of trained ML models and of the fact that ML algorithms can be effective at learning complex, nonlinear mappings if large and accurate training data sets are available.
In the last few years, a parallel and rapidly growing area of development has emerged which aims to apply ML methods to produce fully data-driven forecasts for NWP and Climate prediction. These efforts have been facilitated by the availability of high-quality, multi-decadal Earth system reanalyses, such as the ECMWF ERA5 reanalysis (Hersbach et al., 2020). The first notable results in this area were achieved by Keisler (2022), whose ML model shows forecast skill scores that are competitive with NOAA Global Forecast System operational forecasts. Keisler's model was trained on a data set of ERA5 reanalysis fields at 1° lat/lon horizontal resolution and 13 pressure levels (50-1,000 hPa) every 6 hr, with the stated aim of learning the set of physical laws underlying the ECMWF IFS model. Soon after the appearance of Keisler's work, various independent groups, often affiliated with large technology corporations, presented fully data-driven ML weather forecast models and published evaluations of their performance, for example, FourCastNet (Pathak et al., 2022); Pangu-Weather (Bi et al., 2022, 2023); SwinRDM (L. Chen et al., 2023); ClimaX (Tung et al., 2023); GraphCast (Lam et al., 2022); FengWu (K. Chen et al., 2023).
While these ML models show some variations in their architecture and training, they share many fundamental characteristics. Compared with Keisler's original model, they are built using larger training data sets obtained by sampling the ERA5 reanalysis at higher horizontal, vertical and temporal resolution (typical values are a 0.25° lat/lon regular grid, 13 to 37 vertical pressure levels, and 1 to 6-hr temporal sampling). This aims at increasing the realism and fidelity of the weather patterns the ML model is able to predict, though at the cost of vastly increasing the memory footprint and computational training costs (Bi et al., 2022; Lam et al., 2022). In parallel with increased memory and computational costs, the forecast performance of the ML models has improved. The most recent ML models routinely claim to outperform the ECMWF IFS system (currently considered the most accurate physics-based global NWP forecasting system) on a variety of performance metrics for deterministic prediction (Bi et al., 2022, 2023; K. Chen et al., 2023; L. Chen et al., 2023; Lam et al., 2022). Together with the low computational cost and energy consumption of the ML models during the deployment phase (i.e., O(10^4) times faster, and correspondingly cheaper in computation and energy, than standard NWP systems), this has made claims that the era of traditional NWP is rapidly coming to an end in favor of a new generation of ML-driven Weather Prediction (MLWP) increasingly common (e.g., Bi et al., 2022, 2023; K. Chen et al., 2023; L. Chen et al., 2023; Lam et al., 2022).
We consider here the question of to what extent, and with which caveats, these claims are justified, building on the initial evaluation in Ben-Bouallegue et al. (2023), but with a specific focus on the physical realism of the forecasts produced by MLWP models. The approach we have taken is to choose a representative sample of the current generation of MLWP models (Pangu-Weather, FourCastNet and GraphCast) and analyze their forecast output from the point of view of physical consistency and fidelity. This has been done by comparing the characteristics of the ML models' outputs to those of the ERA5 reanalysis fields used in their training and to those of the operational ECMWF IFS physics-based model, which is the NWP system against which the forecast performance of MLWP models is typically validated.

Data and Methods
Pangu-Weather (Bi et al., 2022, 2023) is an MLWP model developed by researchers at Huawei Cloud Computing and trained on 43 years of ERA5 reanalyses retrieved hourly at 0.25° lat/lon horizontal resolution and 13 pressure levels, plus a small selection of near-surface fields (T2m, u/v10m, mslp). Similar choices are made in FourCastNet and GraphCast; see the provided references for more details. The architecture of Pangu-Weather is based on a variation of the Transformer model (Vaswani et al., 2017) widely adopted in large language models (GPT-3, BERT), and its adaptation to Computer Vision tasks (Vision Transformers, Dosovitskiy et al., 2021). A similar architecture is used in FourCastNet, while GraphCast uses a Graph network (Sanchez-Gonzalez et al., 2020) backbone. The Pangu-Weather model is one of the largest in terms of number of trainable parameters (∼256 M; for comparison, GraphCast has 36.7 M parameters), possibly due to the choice of the Transformer architecture and the need to train separate ML models for different forecast ranges (see below).
Most MLWP models (FourCastNet and GraphCast among them) predict the evolution of the atmosphere with a Δt = 6 hr timestep, and forecasts at longer lead times, which are always integer multiples of Δt, are obtained autoregressively, that is, by repeatedly feeding the model's output back in as its next input. As noted by the Pangu-Weather developers, the repeated application of an imperfect model leads to rapid accumulation of errors and can limit its predictive skill. MLWP models mitigate this problem by progressively increasing the forecast horizon over which the ML model is trained (e.g., Lam et al., 2022), typically at the price of increased smoothness of the forecasted states. The solution adopted in Pangu-Weather is called "Hierarchical Temporal Aggregation" (HTA) and involves the development of four separate ML models trained to forecast at lead times of 1, 3, 6 and 24 hr, which are then combined during inference. This reduces the number of applications of the ML model for any given forecast lead time and allows forecasts with 1-hr granularity, but may lead to unphysical discontinuities in forecast evolution.
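To make the difference in the number of model applications concrete, the following minimal sketch counts the steps needed to reach a given lead time with a fixed 6-hr autoregressive rollout and with a combination of 24-, 6-, 3- and 1-hr models. The greedy rule used here (always apply the longest-lead model that still fits) and the function names are assumptions made for illustration; the sketch only counts steps and does not run any forecast model.

```python
def autoregressive_steps(lead_hr, dt_hr=6):
    """Number of model applications for a fixed-timestep autoregressive rollout."""
    if lead_hr % dt_hr != 0:
        raise ValueError("lead time must be an integer multiple of the timestep")
    return lead_hr // dt_hr


def hta_schedule(lead_hr, model_steps_hr=(24, 6, 3, 1)):
    """Sequence of model lead times (hr) used to reach lead_hr.

    Assumption: a greedy rule that repeatedly applies the longest-lead
    model that still fits into the remaining forecast range.
    """
    schedule = []
    remaining = lead_hr
    for step in sorted(model_steps_hr, reverse=True):
        count, remaining = divmod(remaining, step)
        schedule.extend([step] * count)
    return schedule


if __name__ == "__main__":
    for lead in (24, 56, 132):
        plan = hta_schedule(lead)
        print(f"t + {lead} hr: {len(plan)} applications {plan}")
    print("fixed 6-hr rollout to t + 132 hr:", autoregressive_steps(132), "applications")
```

Under these assumptions, a t + 132 hr forecast needs 7 model applications (five 24-hr steps and two 6-hr steps) instead of the 22 required by a 6-hr rollout. The switch between models with different lead times is where discontinuities in the forecast evolution can arise.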
The Pangu-Weather model has been used to build a six-month data set of forecasts (October 2018 to March 2019, every three days) started from both ERA5 analysis fields and ECMWF IFS analyses. This data set has been used in this study, together with publicly available ERA5 reanalysis and forecast fields and ECMWF IFS analyses and forecasts over the same period. Additionally, Pangu-Weather and other MLWP models like FourCastNet and GraphCast have been run in a semi-operational configuration at ECMWF since August 2023, and their outputs, both graphical and numerical, are freely available from the ECMWF archives (see the Data Availability Statement section for more details). These products have also been used in this work.

Spectral Diagnostics
A widely known issue with forecasts produced by MLWP models is that they appear to become increasingly "blurry" with increasing forecast lead times (Keisler, 2022; Lam et al., 2022). This behavior is to be expected on general grounds (Sønderby et al., 2020), as they are usually trained to optimize a weighted mean squared/absolute error (L2/L1) norm of forecast errors. Thus, the way the ML models express increasing uncertainty over longer lead times is by producing forecasts closer to the mean of the forecast pdf, that is, smoothing out unpredictable details from the forecast. This smoothing is indeed visible in the Pangu-Weather model as well.
In Figure 1, top row, we show examples of power spectra from the ERA5 analysis, Pangu-Weather forecasts and ECMWF IFS forecasts at different forecast lead times. Equivalent plots for the ML models GraphCast and FourCastNet are provided in Figure S1 of Supporting Information S1. ECMWF IFS forecast spectra do not change appreciably with forecast lead time and remain close to the ERA5 analysis spectra until about wavenumber 200 (∼200 km wavelength), above which they start to diverge (being more energetic) due to the higher spatial resolution of the operational ECMWF IFS forecasts (∼9 km horizontal grid spacing vs. ∼31 km for ERA5 analyses) and, to a lesser extent, the impact of employing models from different IFS cycles (IFS Cycle 45r1 for the ECMWF IFS vs. IFS Cycle 41r2 for ERA5). Conversely, the spectra of Pangu-Weather forecasts, and those of the other ML models, show a noticeable divergence from the ERA5 analysis spectra, in the form of reduced energy, already from wavenumbers in the 60-80 range (∼500-700 km wavelength). This reduction in spectral energy of the Pangu-Weather forecasts shows a marked sensitivity to forecast lead time, especially over the first 24 hr. For completeness, note that the spectra of forecasts started from ERA5 reanalyses and run at the original resolution and IFS model cycle (ERA5 hindcasts in the following) are not shown here because they are largely indistinguishable from the spectra of ERA5 analysis fields.
Geophysical Research Letters, 10.1029/2023GL107377
The plots in Figure 1 and Figure S1 in Supporting Information S1, top row, quantitatively confirm that the ML models produce less spectrally resolved forecasts than the analysis fields used in their training and than the ECMWF IFS forecasts. The effective resolution of the ML models' forecasts is closer to 500-700 km than to the nominal 0.25° and decreases gradually with forecast lead time. In the case of Pangu-Weather, however, the Hierarchical Temporal Aggregation idea of training and deploying different models for different lead times seems partially effective in limiting the further loss of spectral energy beyond t + 24 hr lead time.
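The link between total wavenumber, wavelength and loss of small-scale energy can be illustrated with a short sketch. It is not the spherical-harmonic diagnostic used for Figure 1: as a simplifying assumption it uses the approximate relation wavelength ≈ 2πa/n and a one-dimensional periodic signal (for example, values along a latitude circle), with a running mean standing in for the smoothing seen in ML forecasts. All names are illustrative.

```python
import cmath
import math

EARTH_RADIUS_KM = 6371.0


def wavelength_km(total_wavenumber):
    """Approximate wavelength (km) associated with total wavenumber n."""
    return 2.0 * math.pi * EARTH_RADIUS_KM / total_wavenumber


def power_spectrum(signal):
    """Power at wavenumbers 0..N/2 of a periodic 1-D signal (naive DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(signal))
        spectrum.append(abs(coeff / n) ** 2)
    return spectrum


def running_mean(signal, half_width):
    """Periodic running mean; a crude stand-in for forecast smoothing."""
    n = len(signal)
    width = 2 * half_width + 1
    return [sum(signal[(j + d) % n] for d in range(-half_width, half_width + 1)) / width
            for j in range(n)]


# Toy field: one large-scale wave (k = 4) plus one small-scale wave (k = 40).
n_points = 128
field = [math.cos(2 * math.pi * 4 * j / n_points)
         + math.cos(2 * math.pi * 40 * j / n_points) for j in range(n_points)]
raw = power_spectrum(field)
smooth = power_spectrum(running_mean(field, half_width=2))

print(f"n = 200 corresponds to about {wavelength_km(200):.0f} km")
print(f"power retained at k = 4:  {smooth[4] / raw[4]:.2f}")
print(f"power retained at k = 40: {smooth[40] / raw[40]:.2f}")
```

In this toy case the smoothing leaves the large-scale wave almost untouched while removing most of the power of the small-scale wave, which is the qualitative signature described above for the ML forecast spectra.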
What are the consequences of the rapid reduction in the Pangu-Weather forecast spectral energy with increasing wavenumber? On standard forecast maps used for synoptic evaluation, the differences between Pangu-Weather and ECMWF IFS forecasts are not striking (for example, Figure S2 in Supporting Information S1, top row), although the Pangu-Weather contours appear somewhat smoother. However, for weather phenomena with high variability on sub-synoptic and mesoscale spatial modes, the effects are noticeable. An example is given in Figure S2 of Supporting Information S1, bottom row, where we show the t + 132 hr forecasts of the evolution of Typhoon Doksuri, a destructive category 4 tropical cyclone in late July 2023. In the Pangu-Weather forecast, Typhoon Doksuri evolves into a shallow low pressure system (986 hPa min mslp), while in the ECMWF IFS forecast it remains an active tropical cyclone (957 hPa min mslp), though not as deep as in reality (944 hPa; IBTrACS version v04r00, Knapp et al., 2018). This behavior leads to relatively poor performance in forecasting the intensity of tropical cyclones in comparison to state-of-the-art NWP Earth system simulators like the ECMWF IFS (see Ben-Bouallegue et al., 2023, for an extensive evaluation).
A common explanation of the progressive loss of detail for increasing forecast ranges of MLWP models is that these models are trained to produce forecasts that are closer to an ensemble forecast mean than to a deterministic forecast (e.g., Lam et al., 2022). This is based on the established property that ML models trained to optimize L2 loss functions converge to the conditional mean of the target data (Hsieh, 2023). It is thus of interest to compare the characteristics of Pangu-Weather forecasts with those of the ensemble mean (EM) forecasts of the ECMWF ensemble prediction system (ENS; ECMWF, 2022), which is a forecast system expressly designed to sample the forecast pdf of the ECMWF IFS starting from a sample of the initial uncertainties estimated by the ECMWF Ensemble of Data Assimilation (EDA; Bonavita et al., 2016).
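The conditional-mean property invoked here can be checked on a toy example. The sketch below is an illustration under simple assumptions (a handful of equally likely outcomes, and an idealized Gaussian-shaped feature whose position is uncertain); the names and numbers are not taken from any of the forecast systems discussed.

```python
import math


def mse(prediction, outcomes):
    """Mean squared error of a single predicted value against several outcomes."""
    return sum((prediction - y) ** 2 for y in outcomes) / len(outcomes)


def gaussian_bump(x, centre, width=1.0):
    """Idealized weather feature of unit amplitude centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


# 1) For equally likely outcomes, the mean has a lower MSE than other guesses.
outcomes = [1.0, 2.0, 6.0]
mean_value = sum(outcomes) / len(outcomes)   # 3.0
median_value = sorted(outcomes)[1]           # 2.0
print("MSE of mean:  ", mse(mean_value, outcomes))
print("MSE of median:", mse(median_value, outcomes))

# 2) Averaging members that agree on the feature but not on its position
#    gives a weaker and broader feature than any single member.
grid = [0.1 * i for i in range(-100, 101)]
centres = [-3.0, -1.5, 0.0, 1.5, 3.0]
members = [[gaussian_bump(x, c) for x in grid] for c in centres]
mean_field = [sum(m[i] for m in members) / len(members) for i in range(len(grid))]
print("peak of each member:", max(members[0]))
print("peak of the mean:   ", round(max(mean_field), 3))
```

A model that minimizes an L2 loss is therefore rewarded for damping features whose position is uncertain, which is the mechanism behind the smoothing argument. Whether the resulting forecasts actually resemble an ensemble mean is a separate question, examined next.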
In Figure 1, bottom row, we show the same curves as in the top row but insert the energy spectra of the ECMWF EM in lieu of those of the deterministic ECMWF IFS forecast. The spectral signature of the ECMWF EM forecasts is noticeably different from that of the Pangu-Weather forecasts. In the short forecast range (12-24 hr), error evolution on synoptic and sub-synoptic scales is approximately linear and the EM spectra closely track the ERA5 analysis spectra, while the Pangu-Weather forecasts already show heavy damping of spectral modes above approximately wavenumber 60. On the other hand, at lead times well into the medium range (t + 120 hr), the ECMWF EM fields show reduced energy at synoptic scales, where error growth becomes nonlinear and ensemble averaging acts to smooth out unpredictable features (Leith, 1974; Toth & Kalnay, 1997), while they maintain a comparable or larger spectral signature than Pangu-Weather forecasts for higher spatial frequency modes. Thus, Pangu-Weather forecasts differ from ECMWF EM forecasts in the sense that they have both too little energy at short forecast ranges and high wavenumbers, and too much energy in the medium range at synoptic and sub-synoptic scales. Similar considerations apply to the other ML models discussed here (e.g., Figure S1 in Supporting Information S1). These results have implications both for employing the Pangu-Weather model in ensemble prediction (discussed in Section 5) and for the interpretation of the results on Pangu-Weather forecast skill.
The spectral characteristics of the forecasts can in general affect standard deterministic forecast skill measures. An example is given in Figure S3 of Supporting Information S1, where the RMS forecast error evolution is presented for the same ECMWF IFS forecasts, but with the input fields to the verification spectrally truncated at different wavenumbers. It is apparent that the less spectrally resolved forecasts appear to be more skillful than their higher resolution versions. When we compare the RMS forecast performance of the Pangu-Weather model relative to the ECMWF IFS and EM operational forecasts (Figure S4 in Supporting Information S1), we find broadly similar skill in the short range (day 1-3) for all the models, consistent with the fact that they show similar levels of activity up to synoptic scale wavenumbers. In the medium range (day 3-10), Pangu-Weather shows marginally improved performance over the ECMWF IFS, and significantly worse performance than the ECMWF ensemble forecast mean (EM), which is again consistent with the different levels of spectral energy in the three forecasts.
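The reason why smoother forecasts can score better on RMS error is easy to reproduce on an idealized case. In the sketch below, which is an illustration with made-up one-dimensional fields rather than real forecast data, a forecast that has the right feature in the wrong place is compared with a damped, broadened version of the same misplaced feature and with a forecast that contains no feature at all.

```python
import math


def rmse(forecast, truth):
    """Root mean squared error between two fields on the same grid."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))


def bump(x, centre, amplitude=1.0, width=1.0):
    """Idealized weather feature (Gaussian shape)."""
    return amplitude * math.exp(-0.5 * ((x - centre) / width) ** 2)


grid = [0.1 * i for i in range(-100, 101)]
truth = [bump(x, 0.0) for x in grid]

# Realistic intensity and sharpness, but displaced by three feature widths.
sharp_displaced = [bump(x, 3.0) for x in grid]
# Same displacement, but damped and broadened (a "blurry" forecast).
smooth_displaced = [bump(x, 3.0, amplitude=0.4, width=2.0) for x in grid]
# No feature at all.
featureless = [0.0 for _ in grid]

for name, fc in [("sharp, displaced", sharp_displaced),
                 ("smooth, displaced", smooth_displaced),
                 ("featureless", featureless)]:
    print(f"{name:18s} RMSE = {rmse(fc, truth):.3f}")
```

The sharp forecast is penalized twice, once for missing the feature where it occurred and once for predicting it where it did not, so it scores worse than both the blurred and the featureless forecast. This is why comparisons of RMS skill between forecasts with different spectral content need to be interpreted with care.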

Physical Consistency of Pangu-Weather Forecasts
Physics-based NWP models predict the evolution of the atmosphere by solving discretized forms of the governing physics-based equations (Pu & Kalnay, 2018). This set of equations describes fundamental conservation laws (momentum, mass, energy and atmospheric constituents) that the atmospheric system obeys, and which implicitly enforce balances between different physical variables.

Geostrophic Wind Balance
One of the fundamental physical balances in atmospheric flow is that between the mass variables (temperature, geopotential) and the wind field. Neglecting friction and acceleration, scale analysis (Holton & Hakim, 2013) leads to a stationary diagnostic balance between horizontal wind and geopotential which in local Cartesian coordinates reads:
$$\mathbf{V}_g = \frac{1}{f}\,\mathbf{k} \times \nabla_p \Phi \qquad (1)$$
where $\mathbf{V}_g = (u_g, v_g)$ is the geostrophic wind, $\mathbf{k}$ is the vertical unit vector, $f = 2\Omega\sin\varphi$ is the Coriolis parameter, and $\nabla_p \Phi$ is the gradient of the geopotential on an isobaric surface.
In Figure 2, top row, we present vertical profiles of the ratio of the intensity of ageostrophic ($\mathbf{V}_{ag} \equiv \mathbf{V} - \mathbf{V}_g$) over geostrophic winds for Pangu-Weather, ERA5 hindcasts and ECMWF IFS forecasts in the extra-tropics (equivalent plots for FourCastNet and GraphCast are provided in Figure S5 of Supporting Information S1, top row). This ratio remains close to that of the ERA5 analysis at all lead times for ERA5 hindcasts (not shown) and ECMWF IFS forecasts (in fact, the ageostrophic wind shows a modest increase around the tropopause and lower stratosphere at longer lead times, possibly due to spin-up effects). On the other hand, the Pangu-Weather profiles show a reduced ageostrophic/geostrophic wind ratio, which tends to weaken further with increasing lead time. This implies that the wind and geopotential forecasts from the Pangu-Weather model are increasingly dynamically inconsistent with one another.
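As an illustration of how such a diagnostic can be computed, the sketch below evaluates the geostrophic wind from a geopotential field by centred finite differences on a latitude/longitude grid, and then the ageostrophic-to-geostrophic ratio at one point. It is a simplified stand-in, not the code used for Figure 2: the geopotential is supplied as a hypothetical Python callable, the 0.25° differencing interval is an assumption, and no area or time averaging is performed.

```python
import math

OMEGA = 7.292e-5        # Earth's rotation rate (s^-1)
EARTH_RADIUS = 6.371e6  # mean Earth radius (m)


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def geostrophic_wind(phi, lat_deg, lon_deg, delta_deg=0.25):
    """Geostrophic wind (u_g, v_g) in m/s at one point on a pressure level.

    phi: callable (lat_deg, lon_deg) -> geopotential in m^2 s^-2
         (hypothetical interface chosen for this example).
    Not valid near the equator, where f approaches zero.
    """
    d_rad = math.radians(delta_deg)
    dy = EARTH_RADIUS * d_rad
    dx = EARTH_RADIUS * math.cos(math.radians(lat_deg)) * d_rad
    dphi_dy = (phi(lat_deg + delta_deg, lon_deg) - phi(lat_deg - delta_deg, lon_deg)) / (2.0 * dy)
    dphi_dx = (phi(lat_deg, lon_deg + delta_deg) - phi(lat_deg, lon_deg - delta_deg)) / (2.0 * dx)
    f = coriolis(lat_deg)
    return -dphi_dy / f, dphi_dx / f


def ageostrophic_ratio(u, v, u_g, v_g):
    """|V - V_g| / |V_g| for a single wind vector."""
    return math.hypot(u - u_g, v - v_g) / math.hypot(u_g, v_g)


if __name__ == "__main__":
    # Illustrative geopotential decreasing poleward by 200 m^2 s^-2 per degree.
    u_g, v_g = geostrophic_wind(lambda lat, lon: -200.0 * lat, 45.0, 0.0)
    print(f"u_g = {u_g:.1f} m/s, v_g = {v_g:.1f} m/s")
    print("ratio for a wind 10% stronger than geostrophic:",
          round(ageostrophic_ratio(1.1 * u_g, 0.0, u_g, v_g), 3))
```

A geopotential that decreases poleward in the Northern Hemisphere yields a westerly geostrophic wind, as expected. A forecast whose wind relaxes toward the geostrophic value implied by its own geopotential shows up as a ratio that shrinks with lead time.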
Ageostrophic winds in midlatitude synoptic systems are connected with areas of convergence/divergence which, through the continuity equation, are linked to areas of vertical motions and active weather. As shown in the example in Figure 2, bottom row, while the broad-scale geopotential pattern of the Pangu-Weather forecast appears plausible if somewhat smoothed, the intensity and spatial distribution of the ageostrophic motions (and the implied vertical motions) are less so. Similar considerations are applicable to the FourCastNet and GraphCast forecasts (Figure S5 in Supporting Information S1, bottom row).

Rotational and Divergent Wind Components
The Helmholtz decomposition (Dutton, 1976) allows the total wind field $\mathbf{u} = (u, v)$ to be uniquely partitioned into divergent ("vorticity free") and rotational ("divergence free") components:
$$\mathbf{u} = \nabla\chi + \mathbf{k} \times \nabla\psi \qquad (2)$$
where χ is the velocity potential function and ψ is a streamfunction. χ and ψ can be obtained from the divergence (δ) and vorticity (ζ) fields, that is:
$$\nabla^2 \chi = \delta \qquad (3)$$
$$\nabla^2 \psi = \zeta \qquad (4)$$
Divergence and vorticity fields thus make it possible to assess the dynamical consistency of the forecasted wind, in a manner analogous to the way the geostrophic balance is used to assess the dynamical consistency of mass and wind fields.
In Figure 3, top row, we present vertical profiles of the ratio of the globally averaged absolute values of divergence and vorticity for the ERA5 analysis, Pangu-Weather forecasts, ERA5 hindcasts and IFS operational forecasts (equivalent plots for the ML models FourCastNet and GraphCast are provided in Figure S8 of Supporting Information S1, top row). This ratio is significantly and progressively reduced in all the ML models' forecasts, while it remains approximately constant and close to that of the ERA5 analysis in both ERA5 hindcasts and operational IFS forecasts. The divergent component of the flow is suppressed in the ML models' forecasts with respect to the rotational component, which is unphysical and implies that forecasted vertical motions would also be suppressed, as discussed in the next section.
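The divergence-to-vorticity ratio can be illustrated on a synthetic wind field. The following sketch makes several simplifying assumptions that differ from the global diagnostic of Figure 3: it works on a doubly periodic Cartesian grid rather than the sphere, uses centred finite differences, and builds the wind from a prescribed streamfunction and velocity potential so that the expected answer is known. All function names are illustrative.

```python
import math


def div_vort_ratio(u, v, dx, dy):
    """Domain-mean |divergence| over domain-mean |vorticity|.

    u, v: lists of rows, indexed [j][i] with j along y and i along x,
    on a doubly periodic grid with spacings dx and dy.
    """
    ny, nx = len(u), len(u[0])
    sum_div = 0.0
    sum_vort = 0.0
    for j in range(ny):
        jp, jm = (j + 1) % ny, (j - 1) % ny
        for i in range(nx):
            ip, im = (i + 1) % nx, (i - 1) % nx
            du_dx = (u[j][ip] - u[j][im]) / (2.0 * dx)
            dv_dy = (v[jp][i] - v[jm][i]) / (2.0 * dy)
            dv_dx = (v[j][ip] - v[j][im]) / (2.0 * dx)
            du_dy = (u[jp][i] - u[jm][i]) / (2.0 * dy)
            sum_div += abs(du_dx + dv_dy)
            sum_vort += abs(dv_dx - du_dy)
    return sum_div / sum_vort


def synthetic_wind(n, rot_amp, div_amp):
    """Wind from psi = rot_amp*sin(kx)sin(ky) and chi = div_amp*cos(kx)cos(ky).

    u = d(chi)/dx - d(psi)/dy and v = d(chi)/dy + d(psi)/dx on an n x n
    unit-spaced periodic grid, with k = 2*pi/n.
    """
    k = 2.0 * math.pi / n
    u = [[-(rot_amp + div_amp) * k * math.sin(k * i) * math.cos(k * j)
          for i in range(n)] for j in range(n)]
    v = [[(rot_amp - div_amp) * k * math.cos(k * i) * math.sin(k * j)
          for i in range(n)] for j in range(n)]
    return u, v


if __name__ == "__main__":
    for div_amp in (0.10, 0.05):
        u, v = synthetic_wind(32, rot_amp=1.0, div_amp=div_amp)
        print(f"divergent amplitude {div_amp:.2f}: ratio = "
              f"{div_vort_ratio(u, v, 1.0, 1.0):.3f}")
```

In this construction the ratio equals the relative amplitude of the divergent part, so damping the divergent flow while leaving the rotational flow unchanged lowers the ratio in direct proportion, which is the kind of behavior the profiles for the ML models show.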

Vertical Motions
None of the MLWP models considered in this work explicitly forecasts vertical velocities. However, it is possible, under certain assumptions, to diagnose vertical velocity from horizontal velocity on constant pressure levels by integrating the continuity equation in the vertical (Holton & Hakim, 2013, Section 3.5.1):
$$\omega(p) = \omega(p_s) - \int_{p_s}^{p} \left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\right)_p \, dp \qquad (5)$$
$$w \approx -\frac{\omega}{\rho g} \qquad (6)$$
where $\omega \equiv dp/dt$ is the vertical velocity in pressure coordinates, $p_s$ is the surface pressure, $\rho$ is the air density and $g$ is the acceleration of gravity. As the geostrophic wind is approximately non-divergent, vertical velocity can be diagnosed from the mean layer horizontal divergence together with the standard hydrostatic balance assumption.
An example output of this field is given in Figure S6 of Supporting Information S1, where we show the vertical velocity field (w) at 500 hPa forecasted by the ERA5 hindcast (top panel); the vertical velocity field at 500 hPa diagnosed using Equations 5 and 6 from the ERA5 hindcast (middle panel); and the vertical velocity field at 500 hPa diagnosed using Equations 5 and 6 from the Pangu-Weather forecast (bottom panel). Comparing the top and middle panels of Figure S6 in Supporting Information S1, it is apparent that while the diagnosed w is unrealistic in regions with significant topography (i.e., where isobaric surfaces end up below ground), it is a qualitatively good approximation in low-lying areas and over the oceans. Inspection of Figure S6 in Supporting Information S1 indicates that the Pangu-Weather diagnosed vertical velocity field is weaker and more diffuse than the ERA5 hindcast fields. This is quantitatively confirmed in Figure S7 of Supporting Information S1, where the evolution of the absolute value of the diagnosed vertical velocities over the ocean is presented for the ECMWF IFS forecasts, the ERA5 hindcasts and forecasts from all the ML models discussed here. Pangu-Weather vertical velocities are about 40% smaller than those of the ERA5 hindcasts, which have the same nominal spatial resolution, and about half the magnitude of those diagnosed from the IFS forecasts, which have higher resolution. Also notable is the reduction of the intensity of vertical velocities in all the ML models, which is more evident for those that employ an autoregressive time-stepping technique (i.e., FourCastNet and GraphCast).
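The kinematic diagnosis of vertical velocity can be sketched in a few lines. The example below is an illustration under stated assumptions, not the code used for the figures: it integrates the horizontal divergence upward from the surface with a trapezoidal rule over a hypothetical set of pressure levels, assumes zero vertical motion at the lowest level, and converts pressure velocity to geometric vertical velocity using the hydrostatic relation with an ideal-gas density.

```python
G = 9.80665      # acceleration of gravity (m s^-2)
R_DRY = 287.05   # gas constant for dry air (J kg^-1 K^-1)


def kinematic_omega(pressures_pa, divergences, omega_surface=0.0):
    """Pressure vertical velocity (Pa/s) at each level.

    pressures_pa: levels ordered from the surface upward (decreasing pressure).
    divergences:  horizontal divergence (s^-1) at the same levels.
    Assumption: omega at the lowest level is omega_surface (zero by default).
    """
    omega = [omega_surface]
    for idx in range(1, len(pressures_pa)):
        dp = pressures_pa[idx] - pressures_pa[idx - 1]   # negative going up
        mean_div = 0.5 * (divergences[idx] + divergences[idx - 1])
        omega.append(omega[-1] - mean_div * dp)
    return omega


def omega_to_w(omega, pressure_pa, temperature_k):
    """Geometric vertical velocity (m/s) from omega, using w ~ -omega / (rho g)."""
    rho = pressure_pa / (R_DRY * temperature_k)
    return -omega / (rho * G)


if __name__ == "__main__":
    # Illustrative column: uniform convergence of 1e-5 s^-1 from 1000 to 500 hPa.
    levels = [100000.0, 92500.0, 85000.0, 70000.0, 50000.0]
    divergence = [-1.0e-5] * len(levels)
    omega = kinematic_omega(levels, divergence)
    w500 = omega_to_w(omega[-1], levels[-1], temperature_k=255.0)
    print(f"omega(500 hPa) = {omega[-1]:.3f} Pa/s, w(500 hPa) = {100 * w500:.1f} cm/s")
```

Because the diagnosed vertical velocity is proportional to the layer-mean divergence, any suppression of the divergent wind component carries over directly to weaker vertical motions. The lower boundary assumption is also the reason the diagnosis breaks down over high topography.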
The forecast maps presented in Figure S6 of Supporting Information S1 reveal the signature of a developed tropical cyclone in the Caribbean Sea. This corresponds to Hurricane Lee, the strongest hurricane of the 2023 Atlantic hurricane season, and a category 3 hurricane with a 948 hPa minimum mslp at the verification time of the maps (12 September 2023, 00 UTC). In Figure 3, bottom row, we show a magnified view around the forecasted position of Hurricane Lee for the ECMWF IFS, ERA5 hindcast and Pangu-Weather. The vertical velocities shown in all these plots are derived by applying Equations 5 and 6 to forecasted horizontal wind fields on standard pressure levels. While the relative shallowness of the Pangu-Weather forecasted tropical cyclone is consistent with the global diagnostics presented in Figures 2 and 3, the general noisiness and lack of realism of the tropical cyclone in the Pangu-Weather forecast (and in the other ML models, Figure S8 in Supporting Information S1, bottom row) raise further questions about the ability of MLWP models to provide a physically consistent picture of the evolution of the atmosphere.

Discussion and Conclusions
The field of ML weather forecasting has made huge progress in a very short (by traditional NWP/Climate standards) period of time, and the best performing MLWP models routinely claim better performance than state-of-the-art traditional NWP models, using orders of magnitude less computational power and energy during their deployment phase. While these advantages appear compelling, there are some caveats.
One main finding of this analysis is that Pangu-Weather (and other well-known MLWP models) cannot be considered general-purpose atmosphere simulators or atmospheric "digital twins." This is already apparent from the power spectra of ML models' forecasts when compared to those of the ERA5 reanalyses used for their training and those of the ECMWF IFS model. ML models' forecast spectra show decreasing energy with increasing wavenumber (i.e., at smaller spatial scales) and with increasing forecast lead time. This is in line with the observation that ML weather models produce progressively smoother forecasts. What is possibly not so widely appreciated is that the shape and evolution of their forecast spectra imply that ML models' forecasts have problems in representing fundamental dynamical balance relationships in the atmosphere, for example, those implied by geostrophic and ageostrophic flows and the ratio between divergent and rotational wind components. Additionally, the fact that these physical balances are not satisfied implies that other quantities that can be diagnosed from balance relationships, for example, vertical motion fields and, by extension, areas of precipitation/active weather, are also unrealistic.
The hypothesis that MLWP models should better be viewed as estimators of the mean of the forecast pdf is also problematic. A comparison of the spectra of Pangu-Weather forecasts with those of the ECMWF operational ensemble forecast mean shows significant discrepancies. In particular, ML models' forecasts do not present the signature drop in energy of the ECMWF EM at synoptic scales in the medium range (3-5 days), which is associated with the loss of predictability at these lead times/spatial scales due to the chaotic growth of initial and forecast uncertainties (Žagar, 2017), while they consistently show reduced forecast variability at smaller spatial scales. This reduced forecast activity is also dependent on forecast lead time, which implies heteroscedasticity in the distribution of forecast errors with forecast lead time. These results are consistent with recent work by Selz and Craig (2023), documenting the inability of Pangu-Weather to produce realistic error growth from small-amplitude initial condition perturbations (i.e., lack of a "butterfly effect"). For these reasons, the application of Pangu-Weather and similar MLWP models in ensemble prediction may turn out to be challenging, at least in terms of following the current paradigm of forecast ensembles as collections of perturbed realizations of physically valid model trajectories.
The above considerations may have an impact on the evaluation of the forecast performance of MLWP models. Forecast models with reduced variability, and which do not present the standard upscale error growth of physics-based models (Selz & Craig, 2023), tend to perform better on deterministic forecast skill measures, especially at longer lead times ("double penalty" effect), and this is confirmed by results presented here. To what extent this contributes to the forecast skill of MLWP models will require more work to determine.
While they cannot be considered atmospheric emulators/digital twins, ML models like Pangu-Weather can be better understood as forecast applications targeted at optimizing specific aspects of forecast performance, that is, medium-range mean squared/absolute errors over a range of atmospheric and near-surface weather parameters. This is effectively a similar objective to that pursued by modern multivariate NWP post-processing techniques (Lakatos et al., 2023), with the obvious advantage that MLWP models only require the analysis state (or an ensemble of analyses) as predictors. This can be both effective and efficient for various medium and extended-range user applications (Lam et al., 2022) where the main drivers of predictability are on synoptic or larger scales, standard NWP models are affected by significant systematic errors, and producing physically consistent forecast states is not crucial for the end user. This also suggests that similar ML technologies could be effectively applied to different forecast ranges with a targeted choice of loss functions and training curricula. On the other hand, the results presented here highlight some of the outstanding challenges for data-driven ML prediction models, namely, how to produce forecasts that are skillful and at the same time dynamically and physically consistent at all relevant spatial scales.
While current MLWP models have a significant role to play in forecast applications, they still fundamentally depend on physics-based model and data assimilation systems for their training, their initialization and the further development of their forecasting capabilities.

Acknowledgments
The Author would like to thank Matt Chantry (ECMWF) for making the Pangu-Weather forecast data set available and for interesting discussions on the subject of ML. Gregory Hakim (Univ. of Washington), Linus Magnusson, Tony McNally, Andy Brown, and Florian Pappenberger (all ECMWF) provided insightful comments on an earlier version of the manuscript, which are gratefully acknowledged.

Key Points
• Forecasts from Machine Learning (ML) models have energy spectra notably different from those of their training reanalysis fields and Numerical Weather Prediction models
• This results in overly smooth predictions, and weather phenomena at spatial scales shorter than 300-400 km are not properly represented
• Fundamental physical balances and derived quantities are not realistically represented in the forecasts of the ML models

Supporting Information: Supporting Information may be found in the online version of this article.

Figure 1. Top row: Power spectral density as a function of total wavenumber of ERA5 analysis (continuous black lines), ECMWF IFS operational forecasts (dotted lines) and Pangu-Weather forecasts (dashed lines) at t + 12 hr, t + 24 hr, and t + 120 hr for 850 hPa temperature (left panel) and 250 hPa (right panel). Bottom row: Same as top row for ERA5 analysis (continuous black lines), ECMWF IFS operational ensemble forecast mean (ensemble mean, continuous lines) and Pangu-Weather forecasts (dashed lines). Values averaged over the period 2023-09-07 to 2023-09-10.

Figure 2. Top row: Vertical profiles of the ratio of the intensity of ageostrophic over geostrophic wind over the extra-tropics (|lat| ≥ 20°) for the ERA5 reanalysis (continuous black line) and Pangu-Weather forecasts (dashed lines, left panel); and ECMWF IFS forecasts (right panel) at lead times t + 12 hr, t + 24 hr, and t + 120 hr. Values averaged over the period 2023-09-07 to 2023-09-10. Bottom row: t + 120 hr forecasts of geopotential height (continuous lines, units dam), and ageostrophic wind (wind arrows and shaded areas for intensity, units m/s) at 250 hPa valid on 2023-09-12 00 UTC. Left panel: Pangu-Weather forecast. Right panel: ECMWF IFS forecast.