Adaptive tuning of uncertain parameters in a numerical weather prediction model based upon data assimilation

In numerical weather prediction models, near‐surface quantities like 10‐m wind speed (FF10M) or 2‐m temperature (T2M) tend to exhibit significantly larger forecast errors than the related variables in the free troposphere. Besides representativeness issues of surface stations, this is primarily related to parametrization errors and insufficient knowledge of relevant physical properties of the soil and the surface layer. For instance, the vegetation roughness length is usually derived from land‐cover classifications that may contain errors and reflect only part of the natural variability. This article describes a methodology implemented into Deutscher Wetterdienst's operational numerical weather prediction model ICON in order to improve the estimate of such parameters by using information from the data assimilation system. Building upon the condition that FF10M, T2M, and 2‐m relative humidity are assimilated, time‐filtered assimilation increments are calculated for the respective fields at the lowest model level. These are taken as proxies for the related model biases. For T2M, an additional weighted increment field is computed that indicates the model bias in the diurnal temperature amplitude. Based on these increment fields, several physical parameter fields and a few model tuning parameters are varied around their base values. This adaptive parameter adjustment is used operationally in the global and regional forecasting systems of Deutscher Wetterdienst. The ensuing reduction of the FF10M, T2M, and 2‐m relative humidity errors typically lies on the order of 5% on a hemispheric average but has substantial regional and seasonal variability that depends on the original magnitude of the model error. 
A weaker but still statistically significant positive impact is seen in the radiosonde verification of wind speed, humidity, and temperature in the lower troposphere, giving confidence that the adaptive tuning indeed reduces model errors rather than pushing the model towards unrepresentative station observations.


INTRODUCTION
The forecast quality of a numerical weather prediction (NWP) model depends on a wide range of factors, including the quality of the associated data assimilation (DA) system, its interaction with the NWP model, the accuracy of the dynamical core, the sophistication of the physics parametrizations and their coupling with the dynamical core, and the tuning of the system as a whole. In this context, the term tuning should not be interpreted as merely playing around with a few namelist parameters, but rather be taken to include the optimization of individual process formulations and their interaction with each other. Since many processes depend on parameters describing physical properties of, for instance, vegetation, soil, or a snow cover, estimating these parameters from external data is also an important aspect of tuning. In many cases, the available external information allows only for a rough estimate of the required quantities, making a significant contribution to model uncertainties and systematic errors.
Efforts of model developers and users to optimize various aspects of forecast quality by parameter calibration have a long history, and for a bit more than a decade there have been increasing attempts to use objective optimization algorithms for this purpose. For regional climate applications, Bellprat et al. (2012) reported an error reduction of about 7% from optimizing five selected parameters of a model that previously was primarily tuned for NWP purposes. Even more pronounced improvements were found in a follow-up study by Bellprat et al. (2016), who actually identified a systematic model deficiency that subsequently led to an improvement of the respective process formulation. Regional NWP applications were considered, for example, by Duan et al. (2017), who found substantial quality improvements with respect to the default settings as well, whereas Voudouri et al. (2021) reported only a slight gain in forecast scores. Generally, the potential of such parameter tuning depends on the selection of the tuning parameters, which is subjective even when using an objective algorithm afterwards, and on the quality of the baseline tuning, for which one needs to keep in mind that namelist defaults are not necessarily the tuned values used for operational applications. Apart from that, it appears to be more difficult to obtain a benefit from parameter tuning in global models than in regional models because systematic errors of the reference configuration usually differ between regions and seasons. The experience gained at Deutscher Wetterdienst (DWD) in tuning experiments for the operational global ICON configuration (Reinert et al., 2022) is that, for the vast majority of parameters, a certain range of values exists within which there is no clear optimum because changing a parameter improves the scores in some regions or seasons and degrades them in others, or improves some forecast variables and degrades others. Trying to select optimal parameters would thus involve subjective decisions (e.g., by weighting some forecast variables or regions of the Earth higher than others), and so passing the optimization task to an objective algorithm would just shift the subjective decision-making to the definition of the cost function or target score. With very few exceptions, the model-related skill improvements made since the start of operational ICON forecasts in January 2015 were related to advances in the formulation of individual physical processes, the completion of missing processes, or upgrades in the realm of the external parameter data. To make further progress, we therefore need to seek more sophisticated approaches than simple parameter tuning.
In recent years, there has been increasing activity in using DA not only for providing the optimal initial state of the prognostic model variables but also for estimating uncertain parameters and for correcting model biases without addressing their source at process level. The success of these efforts was limited until a few years ago, but recent results look increasingly promising. An example of an advanced bias correction method is the weak-constraint four-dimensional variational data assimilation used at the European Centre for Medium-Range Weather Forecasts (ECMWF) for more than a decade, first restricted to the upper stratosphere, but extended to the whole stratosphere in 2020 (Laloyaux et al., 2020). The weak-constraint four-dimensional variational data assimilation basically detects model biases during the variational minimization procedure and applies bias-correcting forcing terms to the subsequent forecast. Obviously, separating systematic from non-systematic model errors is of key importance for the success of this method, and it is interesting to note that achieving this separation required a decade of research and development work (Laloyaux et al., 2020). Similar approaches have been developed for ensemble DA and prediction schemes; for instance, the analysis-correction-based additive inflation method (Crawford et al., 2020), which was found to provide significant improvements on NWP time-scales but may lead to overcorrection later on if biases of the baseline model change sign. Another field of increasing activity is using machine-learning methods to complement DA-based bias correction (Bonavita and Laloyaux, 2020), and currently available results give a promising perspective for future use in operational NWP. However, from a model development point of view, it appears to be preferable to reduce model biases by optimizing uncertain physical parameters rather than just offsetting biases by external forcing tendencies, because conditional biases depending on the actual weather situation are more likely to be minimized when the underlying physical dependencies are well captured by the model. Efforts to extend ensemble Kalman filter (EnKF)-based DA schemes to model parameter estimation started with simplified models and idealized configurations, but more recent studies considered full general circulation models as well (e.g., Schirber et al., 2013; Kotsuki et al., 2018). Yet, despite some promising results, these methods do not appear to be mature enough for use in operational NWP. In particular, estimating many parameters at once entails the risk of instabilities, in the sense that the parameters evolve towards unphysical values that degrade the forecast skill at longer lead times or for variables not addressed in the DA process. Limited-area models were considered, for instance, by Ruckstuhl and Janjić (2020), investigating specifically the adaptation of the surface roughness length $z_0$ in order to optimize the surface fluxes feeding the formation of clouds and precipitation. Though this adaptation was found to deliver some benefit for the optimization target, the authors reported a large diurnal variation of $z_0$, which appears to be undesirable from a physical point of view. Moreover, they did not investigate the impact on the 10-m wind speed.
At DWD, a technically different method for adaptive parameter estimation has been developed and implemented that is not related to a specific DA algorithm but infers the spatial distribution of certain model biases from the information provided by the DA system. A-priori knowledge about the relationship between these biases and sensitive model parameters is used to avoid the difficulties EnKF-based methods have with estimating many parameters at once. Currently, the methodology focuses on the near-surface variables 2-m temperature (T2M), 2-m relative humidity (RH2M), and 10-m wind speed (FF10M), for which the attribution of systematic forecast errors to model parameters is relatively clear. The majority of the model parameters selected are physical properties derived from external parameter data, like the roughness length $z_0$ or physical properties of the soil, but adapting tuning parameters of parametrization schemes is possible as well. Based upon the model biases diagnosed from DA increments, these parameters are varied around their base value within prescribed bounds, and an input bias of zero implies that the parameters attain their base value determined from external data or prescribed by namelist. Thus, the algorithm is inherently stable, and it will be referred to as adaptive parameter tuning (APT) rather than parameter estimation in the remainder of this article in order to highlight the methodological differences from EnKF-based approaches, where parameters may drift away from their reference values permanently. In the NWP applications of DWD, various components of APT have been successively introduced since 2021, and the goal of this article is to describe the status used in the operational global NWP system in March 2023 together with the associated benefits on forecast quality. After providing some background information on the NWP configurations operated at DWD, Section 2 will describe the main algorithmic components of APT together with the procedure used for testing and optimizing the scheme. The gain in forecast quality achieved with APT is presented in Section 3, followed by a summary and an outlook towards possible future extensions in Section 4.

2.1 Background: The operational NWP system of DWD
The operational NWP system of DWD builds upon the Icosahedral Nonhydrostatic (ICON) modelling framework (Zängl et al., 2015; Zängl et al., 2022), which has been developed in collaboration with the Max Planck Institute for Meteorology, the Karlsruhe Institute of Technology, and the German Climate Computing Center. The model equations are solved on a triangular Arakawa-C grid, for which we define the mesh size as the square root of the average cell area. The global deterministic model configuration started operational production in January 2015 with a mesh size of 13 km and a model top at 75 km. The number of model levels was 90 until November 2022 and has been 120 since then. The global system includes a two-way nested domain over Europe and adjacent regions with a mesh size of 6.5 km and a vertical interface to the global domain at about 23 km, the number of levels being 60 until November 2022 and 74 since then. The associated ensemble prediction system had mesh sizes of 40 km (20 km) until November 2022 and has used 26 km (13 km) since then in the global (Europe) domain, the tops and level counts being the same as in the deterministic system. More detailed information can be found in Reinert et al. (2022). Since February 2021, ICON has also been operated in a convection-permitting limited-area configuration over central Europe, which is named ICON-D2 and has a mesh size of 2.1 km with 65 levels up to about 22 km. The discussion in the remainder of this article will focus on the global model, but most elements of the APT are applied to ICON-D2 as well.
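The mesh-size definition used here (square root of the average cell area) can be illustrated with a short calculation. This is a minimal sketch, not taken from the ICON code: the function name `mesh_size_km`, the Earth radius value, and in particular the cell count (an icosahedron refined to 20 × 3² × 4⁷ triangles) are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def mesh_size_km(n_cells: int, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Square root of the average cell area for n_cells cells covering a sphere."""
    sphere_area_km2 = 4.0 * math.pi * radius_km ** 2
    return math.sqrt(sphere_area_km2 / n_cells)


# Hypothetical cell count: 20 icosahedron faces, edges divided into 3 parts,
# followed by 7 bisection steps (each step quadruples the number of triangles).
n_cells_global = 20 * 3 ** 2 * 4 ** 7

print(f"{mesh_size_km(n_cells_global):.2f} km")
```

Under these assumptions the result is close to the nominal 13 km quoted for the global deterministic configuration; halving the mesh size, as in the nested Europe domain, corresponds to four times as many cells per unit area.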
The associated DA system is a hybrid ensemble-variational scheme (Buehner, 2005) building upon a three-dimensional variational data assimilation algorithm for the deterministic component and a local ensemble transform Kalman filter (Hunt et al., 2007) with recentring of the ensemble mean analysis towards the deterministic analysis. The error covariance matrix used for the variational analysis is partly derived from the ensemble first guess and partly climatological, the weights currently being 70% and 30% respectively. The global assimilation cycle is performed in three-hourly steps with assimilation windows of ±90 min centred around the nominal validity date. Filtering of dynamical imbalances is done using an incremental analysis update approach following Bloom et al. (1996), which implies that the DA scheme passes three-hourly increments rather than full fields to the model. Specifically, analysis increments are provided for pressure $P$, temperature $T$, the horizontal wind components $U$ and $V$, and specific humidity $Q_V$.

2.2 Calculation of filtered analysis increments
Building upon the fact that FF10M, T2M, and RH2M are assimilated in DWD's present operational system, the analysis increments at the lowest model level (located at 10 m above ground) are used as a proxy for the related model (first-guess) error at a given point in space and time. The analysis increments naturally have the opposite sign of the model error and a smaller magnitude because DA schemes never push the model fully towards the observations, but they can be expected to be roughly proportional to the first-guess error. A probably even more accurate way to diagnose surface-level model errors would be to generate separate analyses for FF10M, T2M, and RH2M, followed by calculating differences to model forecasts, but taking the existing analysis increments is preferred because this adds much less complexity to the operational workflow. To separate systematic model errors (biases) from random errors, an appropriate time filtering of the analysis increments is needed. This filtering is executed in the ICON model at each step of the assimilation cycle (i.e., every 3 hr in the global system) and follows a simple Newtonian relaxation method:
$$\phi_{fi}(t) = \phi_{fi}(t - dt_{ana}) + \frac{dt_{ana}}{dt_{filt}} \left[ \phi_i(t) - \phi_{fi}(t - dt_{ana}) \right] \qquad (1)$$
where $\phi$ represents $T$, wind speed $FF$, or relative humidity $RH$, and subscripts $i$ and $fi$ signify analysis increments and filtered increments respectively, $t$ is the validity time, $dt_{ana}$ = 3 hr is the analysis interval (1 hr for ICON-D2), and $dt_{filt}$ = 2.5 days is the filtering time-scale. The latter choice is motivated by the desire to remove diurnal cycles from the filtered model biases while keeping the possibility of the scheme to adjust to model bias changes related to changes in weather conditions. Sensitivity experiments revealed that a moderate increase of $dt_{filt}$ to, for instance, 5 days has very little impact on the forecast scores, indicating that sudden bias changes are an exception rather than the rule. They may occur, however, under special circumstances, like the appearance or disappearance of a snow cover, and a short $dt_{filt}$ was found to be advantageous during the development phase because this reduces the initial spin-up time of the filtered increment fields. Note that the wind-speed increment is computed as
$$FF_i = \sqrt{(U_{fg} + U_i)^2 + (V_{fg} + V_i)^2} - \sqrt{U_{fg}^2 + V_{fg}^2} \qquad (2)$$
where the subscript $fg$ denotes the model first guess, in order to ensure that wind direction changes are not counted as speed increments. As will be described in detail later herein, these filtered increments serve as predictors for the APT applied in ICON. For temperature, an additional filtered increment field is computed that is weighted with the cosine of local time in order to provide a proxy for the model's diurnal temperature amplitude bias:
$$T_{wfi}(t) = T_{wfi}(t - dt_{ana}) + \frac{dt_{ana}}{dt_{filt}} \left[ T_i(t) \cos\left( \frac{2\pi\, t_{loc}}{86400\ \mathrm{s}} \right) - T_{wfi}(t - dt_{ana}) \right] \qquad (3)$$
where $t_{loc}$ denotes local time (in seconds) at a given model grid point. The sign convention is such that negative values of $T_{wfi}$ correspond to an underestimated diurnal temperature amplitude. During the assimilation cycle, the filtered increment fields $T_{fi}$, $T_{wfi}$, $RH_{fi}$, and $FF_{fi}$ are passed from each step to the subsequent one, and they are also part of the first guess passed to the forecasts. Apart from this, the technical workflow is not changed by the APT.
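The filtering steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the ICON implementation: the function names (`relax`, `wind_speed_increment`, `diurnal_weight`) are hypothetical, the relaxation is written as a standard first-order update with weight dt_ana/dt_filt, and the cosine weight is assumed to be +1 at local midnight and -1 at local noon, which reproduces the stated sign convention for the weighted temperature increment.

```python
import math

DT_ANA_S = 3 * 3600.0       # analysis interval of the global system (3 hr)
DT_FILT_S = 2.5 * 86400.0   # filtering time-scale (2.5 days)


def relax(filtered_prev, increment, dt_ana_s=DT_ANA_S, dt_filt_s=DT_FILT_S):
    """One Newtonian-relaxation step of a filtered increment towards a new increment."""
    return filtered_prev + (dt_ana_s / dt_filt_s) * (increment - filtered_prev)


def wind_speed_increment(u_fg, v_fg, u_i, v_i):
    """Speed of the analysed wind minus speed of the first guess.

    A pure change of wind direction therefore yields a (near-)zero increment.
    """
    return math.hypot(u_fg + u_i, v_fg + v_i) - math.hypot(u_fg, v_fg)


def diurnal_weight(t_loc_s):
    """Cosine of local time (assumed convention: +1 at midnight, -1 at noon)."""
    return math.cos(2.0 * math.pi * t_loc_s / 86400.0)


# Example: a model that is too cold at local noon receives a positive
# temperature increment; weighted by cos(local time) it drives the
# weighted filtered increment towards negative values (underestimated amplitude).
t_wfi = 0.0
t_wfi = relax(t_wfi, 1.0 * diurnal_weight(12 * 3600.0))
print(t_wfi)
```

With these settings each analysis step contributes a weight of 3 hr / 2.5 days = 0.05, so a persistent bias needs a few days to be fully reflected in the filtered fields, while a single outlier increment has little effect.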
It is important to note in this context that relatively strict selection criteria are applied for the assimilation of surface data in order to exclude stations that are unrepresentative of the related model equivalent (obtained by bilinear interpolation to the station location) due to their topographic exposure. Limits are imposed both on the height difference between the interpolated model topography and the station and on the interpolated standard deviation of subgrid-scale orography (SSO) obtained from the external parameter data. The latter is needed to exclude stations in mountainous terrain that happen to lie at a similar elevation as the model orography but are nevertheless unrepresentative due to local topographic effects. The current operational settings for the height differences are 100 m for FF10M and 150 m for T2M and RH2M. Additional limits for the SSO standard deviation of 70 and 200 m are applied for FF10M and T2M respectively.
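The selection criteria can be expressed as a simple per-variable check. The following is a sketch only: the names `Limits`, `LIMITS`, and `is_representative` are hypothetical, the comparison is assumed to be inclusive at the limit, and no SSO limit is applied to RH2M because none is stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_height_diff_m: float          # |station height - interpolated model height|
    max_sso_stdev_m: Optional[float]  # interpolated SSO standard deviation, if limited


# Values as quoted in the text; the RH2M SSO limit is not stated and left unset.
LIMITS = {
    "FF10M": Limits(100.0, 70.0),
    "T2M": Limits(150.0, 200.0),
    "RH2M": Limits(150.0, None),
}


def is_representative(variable, station_height_m, model_height_m, sso_stdev_m):
    """Return True if a station observation passes both topographic checks."""
    lim = LIMITS[variable]
    if abs(station_height_m - model_height_m) > lim.max_height_diff_m:
        return False
    if lim.max_sso_stdev_m is not None and sso_stdev_m > lim.max_sso_stdev_m:
        return False
    return True


# A station 50 m above the model orography in moderately rough terrain:
print(is_representative("FF10M", 500.0, 450.0, 80.0))  # rejected by the SSO limit
print(is_representative("T2M", 500.0, 450.0, 80.0))    # accepted
```

The second check is what removes stations in mountainous terrain that happen to sit at the model elevation but are still exposed to local topographic effects.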

2.3 Adaptive surface friction
To optimize the forecast skill of FF10M, the vegetation roughness length $z_0$ derived from the land-cover class and the SSO blocking tendency at the lowest model level are adapted based on $FF_{fi}$. Regarding the SSO scheme, it is noted that ICON uses the Lott and Miller (1997) scheme developed at the ECMWF with a few modifications. Unlike the ECMWF, we do not combine the SSO scheme with a turbulent orographic form drag scheme but rather use the SSO scheme for the full spectrum of SSO and compute the related external parameter fields accordingly. As a consequence, the minimum SSO standard deviation required for applying the scheme has been reduced from 50 to 1 m, implying that the SSO scheme is active in ICON over nearly all land surfaces except for very smooth glaciers. In addition, the vertical profile function for the low-level drag has been adjusted so that the scheme can be used for SSO values much smaller than the height of the lowest model level. These changes had already been found to have a large beneficial impact on the FF10M scores prior to implementing the APT.
In our approach, APT is generally realized by multiplicative scaling factors. Thus, the scaling factors are desired to be antisymmetric in a multiplicative sense (i.e., when displayed with a logarithmic scale; see Supporting Information Figure S1) with respect to the zero value of a given filtered increment. For example, a linear relationship $f(x) = 1 + x$ for $x > 0$ needs to be complemented by $f(x) = 1/(1 - x)$ for $x < 0$. The simplest possibility to obtain a strongly nonlinear relationship (again in a multiplicative sense) is to just flip the branches; that is, $f(x) = 1/(1 - x)$ for $x > 0$ and $f(x) = 1 + x$ for $x < 0$. Obviously, the argument then needs to be limited in order to avoid unphysically large/small factors or even a division by zero. Note in addition that the signs change if the physical relationship requires that a positive (negative) filtered increment induces a scaling factor below (above) 1. Since these options delivered satisfying results for all elements of the APT described later herein, more complex formulations have not been considered with one exception noted later.
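The two families of scaling functions can be written down directly from the expressions above. In this sketch the function names and the argument limit `x_max` are illustrative choices; a limit of 0.75 happens to confine the nonlinear factor to the range 0.25 to 4.

```python
def linear_factor(x):
    """Multiplicatively antisymmetric 'linear' form: f(-x) == 1 / f(x)."""
    return 1.0 + x if x >= 0.0 else 1.0 / (1.0 - x)


def nonlinear_factor(x, x_max=0.75):
    """Flipped branches give a strongly nonlinear response.

    The argument is limited to [-x_max, x_max] (x_max < 1, illustrative value)
    so that the factor stays bounded and no division by zero can occur.
    """
    x = max(-x_max, min(x_max, x))
    return 1.0 / (1.0 - x) if x >= 0.0 else 1.0 + x


for x in (0.1, 0.5, 0.75):
    # The product f(x) * f(-x) equals 1 for both forms.
    print(x, linear_factor(x) * linear_factor(-x),
          nonlinear_factor(x) * nonlinear_factor(-x))
```

Both forms pass through 1 at zero input, so a vanishing filtered increment leaves the base parameter unchanged, and both respond equally strongly (in a logarithmic sense) to increments of either sign.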
For surface friction, the formulation that was found to provide the best results defines the scaling factor $f_{sf}$ piecewise, with one branch for negative $FF_{fi}$ and another for positive $FF_{fi}$ (see Supporting Information Figure S1a for visualization). $f_{sf}$ is limited to a range between 0.25 and 4 and applied multiplicatively to $z_0$ and the aforementioned SSO blocking tendency. For $z_0$, an additional upper limit of 1.5 m is prescribed in order to prevent the surface fluxes from growing too large. Moreover, glacier surfaces are excluded because the results obtained over Antarctica were not convincing. The coupling factor of 2.5 has been determined empirically by increasing it successively until negative side effects on other forecast variables like T2M or surface pressure became apparent. An example of this tuning procedure, referring to the parameters addressed in Section 2.5, is provided in Supporting Information Section S2 and the corresponding Figures S2-S6. $f_{ai}$ is a scaling factor that allows adjustment of the coupling factor to analysis intervals other than 3 hr (e.g., 1 hr for ICON-D2). As adaptive surface friction is not yet operational in ICON-D2 (testing is still ongoing), $f_{ai}$ is provisionally set to 1 because the dynamical adjustment time for the near-surface wind profile tends to be so short that hourly analysis increments are not much smaller than three-hourly increments. Temperature and humidity behave differently in this respect (see later). The nonlinear dependence of $f_{sf}$ on $FF_{fi}$ is motivated by the finding that the root-mean-square error (RMSE) is most effectively reduced when large biases enforce disproportionately large changes in surface friction. On the other hand, it is important that $f_{sf}$ does not overreact to small wind-speed biases, because this could give rise to oscillations under adverse conditions.
Finally, it should be mentioned that there is no objective argument for using the same $f_{sf}$ for $z_0$ and the blocking tendency; but likewise, there appears to be no justification for considering one of these quantities less uncertain than the other one. The roughness length $z_0$ may be inaccurate due to errors in the land-use data and the lack of information on the density of trees in forests, whereas the SSO slopes, which are the most important parameter affecting the blocking tendency, may be inaccurate when the horizontal scales of small-scale orography are not well resolved by the raw data. Depending on the local circumstances, either vegetation roughness or SSO blocking may constitute the dominant momentum sink term at the surface level; and by varying them proportionally, their relative contribution is approximately maintained without the need of explicit case discriminations.
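How such a factor might be applied to both friction terms can be sketched as follows. The functional form below is an assumption, not the published formula: it uses the flipped-branch (nonlinear) form with the sign reversed, on the reasoning that a positive filtered wind-speed increment (model wind too weak) should reduce friction. The coupling factor 2.5, the limits 0.25 and 4, the 1.5 m cap on the roughness length, and the glacier exclusion are taken from the text; all function and argument names are hypothetical.

```python
def friction_factor(ff_fi, coupling=2.5, f_ai=1.0, f_min=0.25, f_max=4.0):
    """Assumed nonlinear scaling factor for surface friction.

    ff_fi: filtered wind-speed increment in m/s. Positive values (model too
    slow) give factors below 1, negative values give factors above 1.
    """
    x = -coupling * f_ai * ff_fi
    x_lim = 1.0 - 1.0 / f_max  # argument at which 1/(1-x) reaches f_max
    if x >= x_lim:
        f = f_max
    elif x >= 0.0:
        f = 1.0 / (1.0 - x)
    else:
        f = 1.0 + x
    return min(f_max, max(f_min, f))


def adapt_friction(z0_base, sso_tend_base, ff_fi, is_glacier, z0_max=1.5):
    """Scale roughness length (m) and SSO blocking tendency by the same factor."""
    if is_glacier:
        return z0_base, sso_tend_base  # glacier surfaces are excluded
    f = friction_factor(ff_fi)
    return min(z0_max, f * z0_base), f * sso_tend_base


# Model wind too strong (negative filtered increment): friction is increased,
# with the roughness length capped at 1.5 m.
print(adapt_friction(1.0, 2.0, -0.2, is_glacier=False))
```

Using one factor for both quantities keeps their ratio fixed, which mirrors the argument that their relative contribution to the momentum sink is approximately maintained.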

2.4 Adaptive adjustment of soil and plant evaporation
The importance of a realistic partitioning of the incoming solar flux into sensible and latent heat fluxes (Bowen ratio) is large, particularly during the warm season, and the need of appropriately adjusting or assimilating the soil moisture was recognized a long time ago at many NWP centres. At DWD, a soil moisture analysis (SMA) was introduced into the global NWP system about 20 years ago, performing a variational minimization of the T2M bias around local noon diagnosed from the 0000 UTC forecast of each day (Hess, 2001; Hess et al., 2008). The SMA continues to have a substantial positive impact on the T2M and RH2M scores of ICON despite significant improvements implemented in the land-surface model TERRA during the last two decades (Schulz and Vogel, 2020; Doms et al., 2021). However, limitations arise because RH2M is not used as an additional predictor to better distinguish errors originating from an incorrect Bowen ratio from errors related to other sources; for example, cloud and radiation biases. Moreover, the response time related to adapting the soil moisture reservoir is rather long, and the SMA increments may attain a physically questionable magnitude when model biases change rapidly. This issue becomes evident almost every year during the first sunny and mild weeks in early spring, when the evaporation tends to be systematically too large due to an unknown deficiency in the physical process description of TERRA. Rather than removing a lot of soil water in early spring and feeding it back in summer, it was felt to be more meaningful to additionally adjust the most important model parameters controlling evaporation, which not only improves the T2M and RH2M forecasts but also indirectly reduces the SMA increments.
For this purpose, a combined temperature-humidity predictor $TRH_{fb}$ (Equation 5) is used that was found to provide a good compromise for jointly minimizing T2M and RH2M errors. It is built from $T_{fi}$, $T_{wfi}$, and $RH_{fi}$ together with the scaling factor $f_{ai}$. The scaling of the temperature contribution is motivated by the experience that three-hourly analysis increments for $T$ are typically about a quarter of the model bias in a 1-day forecast. $TRH_{fb}$ thus approximates a filtered model bias (but this is just a matter of convenience, because the coupling factor defined later herein could compensate any other scaling). The $T_{wfi}$ term means that contributions around local noon get three times the weight of contributions around midnight; this is beneficial because the Bowen ratio has a much larger impact on daytime biases than on nocturnal ones. However, further increasing the day-night weighting ratio turned out to degrade the T2M scores again on a daily average. The scaling of $RH_{fi}$ has the effect that the RH and T contributions are of similar magnitude on average over larger domains and periods, which was found to minimize the risk of false corrections in regions where other error sources are significant. Finally, $f_{ai}$ is taken to be the ratio between $dt_{ana,ref}$ (3 hr) and $dt_{ana}$ in this case because T and RH increments turn out to be almost proportional to $dt_{ana}$ in the relevant range. In contrast to wind speed, model biases of T and RH need some time to reach their equilibrium because of the heat and moisture content of the air.
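The statement that noon contributions receive three times the weight of midnight contributions can be made plausible with a short calculation. This is a sketch under assumptions, not the published predictor: it supposes that the temperature part of the predictor combines the two filtered fields linearly as $T_{fi} - a\,T_{wfi}$ with an unknown coefficient $a$, and that the weighted field accumulates the increments multiplied by the cosine of local time (+1 at midnight, -1 at noon). Because the filter is linear, each individual increment $T_i$ then enters with the effective weight

```latex
w(t_{loc}) = 1 - a \cos\!\left(\frac{2\pi\, t_{loc}}{86400\ \mathrm{s}}\right),
\qquad
w(\text{noon}) = 1 + a, \quad w(\text{midnight}) = 1 - a .
```

Requiring a day-night ratio of three gives

```latex
\frac{1 + a}{1 - a} = 3 \;\Longrightarrow\; a = \tfrac{1}{2},
```

so that, under these assumptions, noon increments are weighted by 1.5 and midnight increments by 0.5. A larger $a$ would increase the day-night ratio, which the text reports to degrade the daily-average T2M scores.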
Based upon $TRH_{fb}$, the minimum evaporation resistances of bare soil and plant stomata, $rmin_{bs}$ and $rmin_{pl}$, are modified with a multiplicative factor $f_{rmin}$ that is defined separately for positive and for negative $TRH_{fb}$ (see Supporting Information Figure S1b for visualization). The coupling factor $f_c$ is set to 0.75 for $rmin_{pl}$ and to 1 for $rmin_{bs}$. Note that, in contrast to the factor $f_{sf}$ for surface friction, $f_{rmin}$ does not need to be limited because its dependence on $TRH_{fb}$ is effectively linear (appearing for the negative function branch because a negative $TRH_{fb}$ needs to be associated with $f_{rmin} > 1$). This is motivated by the fact that the adaptive evaporation parameters are supposed to be used in combination with the SMA and thus should not reduce biases so strongly that the SMA gets largely deactivated. For completeness, we note that the base values of $rmin_{pl}$ are derived from the land-cover class, whereas $rmin_{bs}$ is currently a constant that can be specified by namelist. Considerations of making $rmin_{bs}$ dependent on the soil properties have been postponed until the implementation of a more accurate and more highly resolved soil dataset has been completed.

2.5 Adaptive adjustment of heat conductivities and capacities
Besides the sensible and latent heat fluxes, the ground heat flux is an important component of the surface energy balance that is subject to large uncertainties or errors in NWP models. Usually, the relative importance of the ground heat flux is much larger at night than during the day due to the smaller size of the sensible and latent heat fluxes, and related model deficiencies typically show up as a bias in the diurnal temperature amplitude. Provided that the turbulence scheme is able to handle stable boundary layers properly, the largest error sources over snow-free land are related to insufficient knowledge of the heat capacity and heat conductivity of the soil and to the related physical properties of the canopy layer, which are parametrized using a skin conductivity approach in TERRA (Schulz and Vogel, 2020). In ICON, the soil heat capacity and conductivity depend on the soil type and the prognostic soil water content, whereas the skin conductivity is specified as a function of the land-cover class. Particularly in regions with sparse or no vegetation, ICON used to exhibit large diurnal temperature amplitude errors of either sign, which could not be addressed by changing the dependence of the physical soil properties on the soil type because the latter has almost no correlation with the bias patterns. Therefore, the heat capacities and conductivities are adapted by multiplicative factors $f_{hc}$ that are defined separately for positive and for negative $T_{wfi}$. Here, the scaling factor $f_{ai}$ is chosen to account for the fact that, if the DA works against a model bias, hourly increments of $T_{wfi}$ tend to be larger than one-third of three-hourly increments. The coupling factor is $f_c$ = 2.5 for skin and soil heat conductivity and $f_c$ = 2 for soil heat capacity; and it is pointed out that, for the soil parameters, $f_{hc}$ is applied to the base values for dry soil derived from the external parameter data while retaining their dependence on the soil water content. Note also that the functional relationship between $T_{wfi}$ and $f_{hc}$ follows the same nonlinear approach as for surface friction in order to effectively reduce regionally large biases without entailing the risk of oscillations for nearly zero biases.
The lower and upper limits imposed on $f_{hc}$ are 0.1 and 10 respectively for skin and soil heat conductivity, and 0.25 and 4 respectively for soil heat capacity. More details about the tuning procedure leading to these parameter choices are provided in Supporting Information Section S2. Additional limits are set inside TERRA in order to ensure that the tuned parameters stay within a physically justifiable range (e.g., do not fall below the values for dry peat) and do not give rise to numerical instabilities at long time steps.

2.6 Adaptive adjustment of snow properties
Since the adaptive parameter settings described in Sections 2.4 and 2.5 have little or no impact in the presence of a snow cover, they should be complemented in an appropriate way for this case. For evaporation (or, more precisely, sublimation) of snow, there is seemingly little uncertainty because it is known to occur at its potential rate, but predicting the snow surface temperature needed to calculate the correct saturation vapour pressure actually turns out to be very difficult, particularly for snow lying beneath higher vegetation. Owing to the lack of a canopy-layer scheme in TERRA, this process is currently not treated in a physically satisfactory way. Specifically, snow beneath a forest is parametrized as dark snow in order to obtain a reasonable bulk surface energy balance, but this implies that the temperature at which snow sublimation occurs is too high during daytime. This issue is currently treated with an auxiliary parametrization that diagnoses the temperature difference between the "dark snow" and the "true snow" depending on the surface radiation balance and the saturation deficit of the near-surface air. As making the adjustable parameter of this scheme dependent on $RH_{fi}$ has only a minor positive impact on the forecast skill, this aspect is skipped here.
The heat conductivity and capacity of snow are primarily determined by the snow density $\rho_{sn}$, which would be the next variable to be considered for adaptive tuning. Unlike the parameters discussed so far, $\rho_{sn}$ is a prognostic variable, implying that an assimilation increment would need to be derived from the filtered increments rather than varying a time-independent parameter around its base value. Unfortunately, the attempts made so far in this direction have not been sufficiently successful to be considered for operational use. This is apparently related to the fact that large model biases in the diurnal temperature amplitude are primarily due to missed local cold-air pools in unresolved or poorly resolved valleys (e.g., in the mountainous parts of eastern Siberia), whereas the biases in flat, snow-covered regions are relatively small despite the simplicity of TERRA's single-layer snow scheme. Obviously, compensating unresolved local effects by adjusting $\rho_{sn}$ would be undesirable.
Another uncertain parameter, which did turn out to be suitable for APT, is the albedo of snow-covered surfaces α_sn. As already indicated, α_sn corresponds either to the actual snow albedo if the vegetation is low enough to be fully covered by snow, or to the mixed albedo of the snow lying on the ground and the vegetation elements sticking out of the snow. In ICON, α_sn is parametrized as a function of the land-cover class, the snow-cover fraction, and an auxiliary prognostic variable used to predict the aging process of the snow. Large uncertainties arise because the land-cover classification provides only crude information on the vegetation height and density, which reflects only part of the natural variability, and because there is currently no parametrization for the interception of snow on vegetation. Over glaciers, the uncertainty of α_sn is generally smaller; on the other hand, the sensitivity of the T2M bias to α_sn can be very high. Experiments for the Antarctic Plateau revealed that, in December and January, an albedo difference of 2% corresponds to a T2M bias difference of 1 K. As such a high accuracy is unreachable with a simple snow aging parametrization, adaptive tuning of α_sn turned out to be beneficial over glaciers as well. This is realized through a multiplicative tuning factor applied to α_sn, which is specified by [equation missing] for positive T_fi and [equation missing] for negative T_fi, with upper and lower limits of 4/3 and 3/4 respectively. Moreover, the final snow albedo is not allowed to fall below the lower limit of the regular snow aging scheme (0.6 times the fresh snow albedo) and is not allowed to exceed the fresh snow albedo over glaciers, which is set to 80%.
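The limits stated above can be illustrated with a minimal Python sketch. It covers only the bounds, not the dependence of the tuning factor on T_fi, which is therefore passed in as an input. The function and argument names are hypothetical, and the sketch assumes that the lower limit applies everywhere whereas the upper limit applies only over glaciers; it is not the ICON implementation.

```python
def limit_snow_albedo(alpha_sn, tuning_factor, alpha_fresh, is_glacier):
    """Apply a multiplicative tuning factor to the snow albedo and enforce the stated limits.

    alpha_sn      : snow albedo from the regular parametrization (fraction, 0..1)
    tuning_factor : adaptive factor derived from T_fi (its formula is not reproduced here)
    alpha_fresh   : fresh snow albedo (0.80 over glaciers according to the text)
    is_glacier    : True for glacier points
    """
    # The factor itself is bounded between 3/4 and 4/3.
    factor = min(max(tuning_factor, 3.0 / 4.0), 4.0 / 3.0)
    alpha = alpha_sn * factor

    # Lower limit of the regular snow aging scheme: 0.6 times the fresh snow albedo.
    alpha = max(alpha, 0.6 * alpha_fresh)

    # Assumption: the cap at the fresh snow albedo is applied over glaciers only.
    if is_glacier:
        alpha = min(alpha, alpha_fresh)
    return alpha
```

For a glacier point with alpha_sn = 0.75 and a requested factor of 1.5, the factor is first limited to 4/3 and the result is then capped at the fresh snow albedo of 0.80.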

Adaptive adjustment of the near-surface minimum diffusion coefficient
Though the elements of the APT presented so far have focused on uncertain physical properties derived from external parameter data, the methodology can be applied to tuning parameters as well. However, rather few tuning parameters have a clear functional relationship to near-surface model errors. The most promising one refers to the near-surface profile function of the minimum diffusion coefficient for heat, which is relevant in the presence of strong surface inversions. The general motivation for using minimum diffusion coefficients in ICON's turbulent kinetic energy (TKE)-based turbulence scheme (Doms et al., 2021) is that the diffusion coefficients derived from the prognostic TKE decrease too rapidly with increasing Richardson number Ri in the stable regime. By default, the minimum diffusion coefficients for heat K_h,min and momentum K_m,min have the form [equation missing], where x represents "h" or "m", K_x0 are tunable parameters, and the reduction profile function p_r(z) is given by [equation missing], where z (m) denotes the height above ground of a model surface, a_0 = 0 over glaciers and 0.25 otherwise, and a_1 = 4 × 10⁻³ m⁻¹ over glaciers and 7.5 × 10⁻³ m⁻¹ otherwise. For the adaptive tuning factor f_kh, which is applied to K_h,min only, we define T_kh = T_fi + 0.5 T_wfi and set [equation missing] for positive T_kh and [equation missing] for negative T_kh. The largest impact of the K_h,min tuning is encountered in snow-covered regions with a very strong surface inversion, most notably the Antarctic Plateau in winter. Over snow-free land, the impact is typically one order of magnitude smaller than that of the measures described in Section 2.5. Note in this context that the prognostic TKE calculation is not touched by the APT and that K_h,min ceases to be relevant when the basic K_h derived from the prognostic TKE is larger than K_h,min, which limits the impact of reducing K_h,min under moderately stable conditions. The asymmetry between positive and negative values of T_kh was found to be beneficial, because a cold bias during the polar night is usually due more to errors in the long-wave radiation balance than to turbulent mixing, whereas a warm bias under extremely stable conditions is more likely related to mixing.
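The following Python sketch illustrates two statements from this paragraph: the definition of the combined predictor T_kh, and the remark that K_h,min is irrelevant whenever the TKE-based K_h exceeds it. The function names are hypothetical, and taking the maximum of the two coefficients is an assumption about how the lower bound acts, inferred from the wording above. The expression for f_kh is not reproduced, so the factor is passed in as an input.

```python
def combined_predictor(t_fi, t_wfi):
    """T_kh = T_fi + 0.5 * T_wfi, as defined in the text."""
    return t_fi + 0.5 * t_wfi


def effective_heat_diffusion(kh_tke, kh_min, f_kh):
    """Heat diffusion coefficient after applying the tuned lower bound.

    kh_tke : coefficient derived from the prognostic TKE (not modified by the APT)
    kh_min : default minimum diffusion coefficient for heat
    f_kh   : adaptive tuning factor (its dependence on T_kh is not sketched here)
    Assumption: the minimum acts as a simple lower bound on the TKE-based value.
    """
    return max(kh_tke, f_kh * kh_min)


# Moderately stable case: the TKE-based value dominates, so tuning has no effect.
print(effective_heat_diffusion(kh_tke=1.0, kh_min=0.5, f_kh=0.5))
# Very stable case: the tuned minimum determines the coefficient.
print(effective_heat_diffusion(kh_tke=0.1, kh_min=0.5, f_kh=0.5))
```

The first call shows why reducing K_h,min has limited impact under moderately stable conditions: the tuned bound is simply not reached.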

Conceptual remarks
As is evident from the equations presented thus far, the APT can be regarded as an inherently stable force-restore algorithm. A filtered bias of zero implies no parameter adjustment, which means that the respective model parameters take their default values derived from the external parameter fields or specified by namelist. Consequently, any model bias in T2M, RH2M, or FF10M can be corrected only partly by the adaptive adjustment. This might be viewed as a disadvantage compared with bias correction approaches, which are usually designed to relax biases towards zero. However, it would be unrealistic to assume that the parameters selected for APT in ICON are responsible for the full difference between model and observations. In fact, the tests for the coupling factors between the predictors (filtered increments) and the resulting adaptive tuning factors usually show that forecast quantities other than the current predictor variable start to degrade when the coupling factor becomes too large (indicating overtuning; see Supporting Information Section S2). This furthermore implies that APT does not replace a continuous improvement of the physical parametrizations.
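The partial nature of the correction can be made concrete with a toy scalar model. Everything in the sketch below is a hypothetical illustration rather than the ICON formulation: the linear dependence of the bias on the tuning factor, the exponential filter, the linear coupling, and all parameter values are assumptions chosen only to show the force-restore behaviour.

```python
def run_toy_apt(b0, sensitivity=1.0, coupling=1.0, da_gain=0.5,
                filter_weight=0.2, n_cycles=200):
    """Iterate a toy force-restore loop and return the bias of each cycle.

    b0            : model bias with default parameters (tuning factor = 1)
    sensitivity   : bias reduction per unit change of the tuning factor (assumed linear)
    coupling      : coupling between filtered increment and tuning factor
    da_gain       : fraction of the bias removed by one analysis increment (< 1)
    filter_weight : weight of the newest increment in the time filter
    """
    filtered_increment = 0.0
    history = []
    for _ in range(n_cycles):
        # A zero filtered increment leaves the parameter at its default value.
        factor = 1.0 - coupling * filtered_increment
        bias = b0 - sensitivity * (factor - 1.0)
        history.append(bias)
        # The analysis increment has the opposite sign of the bias.
        increment = -da_gain * bias
        # Restore term (decay of the old value) plus forcing by the new increment.
        filtered_increment = ((1.0 - filter_weight) * filtered_increment
                              + filter_weight * increment)
    return history


biases = run_toy_apt(b0=2.0)
print(biases[0], biases[-1])
```

In this toy setting the bias converges to b0 / (1 + sensitivity × coupling × da_gain), which is smaller than b0 but never zero. A larger coupling reduces the remaining bias further, which corresponds to the regime in which the text reports degradation of other forecast quantities.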
For clarity, it is pointed out that the APT is currently restricted to land points, even though the filtered increment fields are calculated globally. As will be mentioned later, the increments over oceans in some cases point towards weaknesses of our current DA procedure that are scheduled for future investigation. We also note that the description given in this section does not reflect the chronological order in which the individual components were introduced into the operational system. It rather summarizes the status at the time of writing this article and omits intermediate steps of the development. For instance, the components building upon T_fi and/or T_wfi were operationalized in May 2022 together with the assimilation of T2M (Zängl and Anlauf, 2022), which previously did not perform sufficiently well in regions with a large model bias in the diurnal temperature amplitude. Prior to this date, the evaporation tuning (Section 2.4) made use of a separate T2M analysis conducted for the SMA. Corresponding to the successive development and implementation of the APT, the coupling parameter tuning was done sequentially with occasional cross-checks that previously determined values are still appropriate after adding a new component. As may be expected, significant retuning is usually needed when extending the set of model parameters addressed within an existing APT component (see Supporting Information Section S2), and when the formulation of a parametrization depending on an adapted parameter is revised (which happened for the bare-soil evaporation), but no "bad surprises" in the sense of unexplainable side effects were encountered. Nevertheless, some cross-sensitivities between the main APT components exist that are robust against the fine-tuning of the coupling parameters. These will be summarized in Section 3.3.

RESULTS AND DISCUSSION
To evaluate the impact of the APT without the superposition of other changes in the operational system, dedicated experiments have been conducted. In the following, we will report results of an experiment performed for October 15 to December 31, 2020, using the operational settings of March 2023 for all components relevant to APT. The assimilation cycle and the corresponding forecasts were conducted with full APT, and a second set of reference forecasts was performed without any adaptive tuning. Note that this approach disregards the synergistic interaction between the APT and the quality of the DA cycle (particularly for T2M), but concentrating on the direct APT impact simplifies the subsequent discussion. An additional reference experiment in which the APT was also disabled during the assimilation cycle was conducted for a shorter period ending on November 30, 2020. It serves to document the bias reduction achieved by the APT (Section 3.1) and confirms the presence of the aforementioned synergistic interaction for T2M, whereas RH2M and 2-m dew-point (TD2M) tend to be slightly better when the APT is consistently disabled. As all findings reported in the following are robust against the choice of the reference experiment, further discussion of the differences is left to Supporting Information Section S4.

Filtered increment fields and related model biases
To illustrate the typical structure of the filtered analysis increment fields, Figures 1 and 2 display FF_fi, RH_fi, T_fi and T_wfi for November 2020, averaged over the 0000 UTC field of each day. The fields shown in Figure 1 are taken from the assimilation cycle with full APT, implying that they reflect biases persisting despite the adaptive tuning, whereas Figure 2 provides the reference without APT. In any case, the sign of the filtered increments is opposite to that of the related model bias, and their magnitude is smaller by at least a factor of 2 because the observations are not assumed to be perfect in the DA system. For FF_fi, Figures 1a and 2a show negative values over large parts of Asia, whereas positive values prevail over North America, particularly in sparsely vegetated regions at high latitudes. The bias reduction achieved by adapting the surface friction is more pronounced in the Asian regions with a positive wind speed bias (negative FF_fi) than in northern Canada, and the verification against near-surface (1,000 hPa) radiosonde wind speeds confirms a more favourable impact in Asia than in Canada (see Section 3.2). Large negative FF_fi values remain along the coastlines of India and southeastern Asia, indicating representativeness issues of coastal stations that will be the subject of further investigation. Moreover, a striking line-shaped bias appears near the Equator in large parts of the Pacific and the Atlantic, where surface wind information primarily originates from scatterometers. As scatterometers measure the difference between the atmospheric and oceanic motion and the feature is collocated with the North Equatorial Countercurrent, it is believed that the scatterometer data need to be corrected for the ocean current speed. This is currently under investigation but beyond the scope of this article. As already mentioned, glacier regions (Greenland and Antarctica) are excluded from the z_0 adaptation because of the unclear interpretation of the biases.
The largest RH biases are found in northern Canada and parts of northern Siberia (Figures 1b and 2b). There, positive RH_fi values indicate a dry model bias, which occurs in combination with a large cold bias in northern Canada (Figures 1c and 2c). Neither of these biases is significantly reduced by the APT, which is easy to understand because Arctic regions receive little to no sunlight in November, so that the snow albedo adjustment (see Supporting Information Figure S7 and the related discussion) is ineffective. Most likely, the biases are related to a liquid water deficit in the Arctic stratus clouds, which is a well-known issue shared by many other models. The resulting underestimation of downward long-wave radiation induces a cold bias that is even larger at the snow surface than in the adjacent air, so that excessive resublimation of atmospheric water vapour on the snow induces a dry bias even in terms of relative humidity. However, most lower latitude land regions showing moderate biases benefit significantly from the APT, as will be demonstrated in more detail for central Europe and central Asia in Section 3.2. A feature deserving further investigation later on is the warm bias (negative T_fi) over the northern Atlantic and northern Pacific, which is surprising given the fact that the sea-surface temperature analyses are subject to relatively little uncertainty and only few surface temperature data are available for DA in these regions.

FIGURE 2 Same as Figure 1, but with adaptive parameter tuning turned off during the assimilation cycle.
The global overview is completed by T_wfi displayed in Figures 1d and 2d, highlighting regions with a bias in the diurnal temperature amplitude. Most prominently, a large underestimation is found in parts of central and eastern Asia, northern Africa, and the Arabian Peninsula, whereas comparatively small regions with an overestimation exist, for instance, south of the Sahara, north of the Black Sea, and in parts of western North America. Most of these biases are markedly reduced by the APT. As will be shown in more detail in Section 3.2, underestimated temperature amplitudes usually originate from a nocturnal warm bias, which can be effectively reduced by the adjustment measures described in Section 2.5. An issue needing further examination is the negative T_wfi occurring over the Mediterranean Sea and along several coasts of Asia and Australia. Although the assimilation of T2M is restricted to stations for which the interpolated model data have a weighted land fraction of at least 50%, it seems that some coastal stations are still subject to appreciable representativeness issues. Owing to the structure of the error covariances used in the DA system, the related signals are then spread out over the adjacent oceans.

Impact on forecast quality
To quantify the impact of the APT, Figures 3-5 display biases (mean error) and standard deviations (SD; i.e., bias-corrected RMSEs) against forecast lead time for selected regions in which substantial model biases exist in the reference experiment without APT in the forecasts. Results are averages over the 0000 UTC forecast of each day in the period October 20 to December 31, 2020, verified every 6 hr against the routinely available SYNOP observations for FF10M, RH2M, T2M, and TD2M up to the operational lead time of 180 hr. For central Asia, Figure 3 shows a pronounced positive wind-speed bias and a marked underestimation of the diurnal temperature amplitude in the reference run, which are reduced by roughly a factor of 2 due to the APT. While the reduction of the wind-speed bias is an almost constant shift, the improvement of the diurnal temperature amplitude is primarily accomplished by reducing the nocturnal warm bias, whereas there is little change during daytime. The latter can be easily understood by the fact that the ground heat flux makes a much larger contribution to the surface energy balance at night than during the day. For TD2M, Figure 3 indicates a substantial reduction of the positive bias, but there is only a slight bias improvement for RH2M.
The SD improvement related to the APT is largest for FF10M and RH2M, followed by TD2M, whereas the T2M improvement is largely confined to the amplitude bias.
It is noted that the SD reduction found for FF10M and RH2M/TD2M should not be interpreted as a reduction of random forecast errors because it primarily originates from the fact that the bias improvements are not uniform within the region (cf. Figures 1 and 2). Considering individual stations would reveal only a minor SD improvement (not shown). As can be inferred from Figures 1 and 2, even larger improvements are found for FF10M to the north of the domain considered here, whereas the gains in T2M skill are smaller at higher latitudes. Regarding the temporal evolution of the bias differences, it is worth noting that they are almost independent of the forecast lead time for FF10M, reflecting the fact mentioned in Section 2.3 that the dynamical balance determining the near-surface wind profile is established very quickly. A moderate growth with lead time is seen for T2M, and this would be even stronger in regions or seasons with a more positive surface radiation balance (not shown).

FIGURE 4 Same as Figure 3, but for central Europe (45 …

Different bias characteristics are found in central Europe (Figure 4). There, a slight negative FF10M bias prevails, which remains almost unaffected by the adaptive tuning. This lack of sensitivity is partly related to the nonlinear formulation of the scheme (Equations 3 and 4) and, more importantly, the fact that the magnitude of FF_fi is smaller in this region than expected from the FF10M bias alone, probably due to the impact of other sources of wind observations. On the other hand, a large improvement is obtained for RH2M, where the adaptive tuning reduces a pronounced dry bias by increasing the surface evaporation. This goes along with a slight cooling and a slight increase of TD2M, and it is interesting to note that the SD reduction achieved for RH2M in this region is much larger than for TD2M. A moderate improvement is also achieved for low-level cloud cover, which has a negative bias related to the negative RH2M bias (not shown). The reason for the late-autumn dry bias, which recurs every year in central Europe, has already been investigated thoroughly at DWD but is not fully understood. It is most prominent under anticyclonic conditions with fog or low stratus and was even larger in most other global models than in ICON prior to introducing the APT. Besides a deficit in surface evaporation, excessive vertical mixing in stable boundary layers could contribute to this bias, but a general reduction of the minimum vertical diffusion coefficients would induce degradations in other situations and regions.
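The verification measures used in this section, and the remark made for central Asia that a regional SD reduction can result purely from spatially non-uniform bias improvements, can be illustrated with a short Python sketch. The station errors below are invented numbers chosen for easy arithmetic, and the function name is hypothetical.

```python
import math


def me_sd_rmse(errors):
    """Return mean error (bias), bias-corrected SD and RMSE of a list of forecast errors."""
    n = len(errors)
    me = sum(errors) / n
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / n)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return me, sd, rmse


# Two hypothetical stations with the same random error but different biases.
station_a = [1.5, 2.5, 1.5, 2.5]     # bias 2.0, SD 0.5
station_b = [-0.5, 0.5, -0.5, 0.5]   # bias 0.0, SD 0.5

# Pooling both stations mixes the bias difference into the regional SD.
me, sd, rmse = me_sd_rmse(station_a + station_b)
print(me, sd, rmse)
assert math.isclose(rmse ** 2, me ** 2 + sd ** 2)  # RMSE^2 = ME^2 + SD^2

# Removing only the bias of station A lowers the pooled SD,
# although the SD at each individual station is unchanged.
station_a_tuned = [e - 2.0 for e in station_a]
print(me_sd_rmse(station_a_tuned + station_b))
```

In this example the pooled SD drops from about 1.12 to 0.5 while the per-station SD stays at 0.5 throughout, which is the effect described in the text.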
The third region selected for closer discussion is Canada (Figure 5). As already mentioned, this region exhibits a pronounced negative FF10M bias, which is significantly reduced by the adaptive surface friction, but not as much as the positive bias prevailing in large parts of Asia. The corresponding change of SD is very small because the bias improvement is spatially more uniform than in central Asia. A reason for the smaller impact might be that the surface roughness is generally small in northern Canada, with an original z_0 < 1 cm in the presence of a snow cover and weak SSO impact due to the lack of significant mountains. In fact, sensitivity tests with even stronger adaptive reduction of surface friction showed only a minimal impact. However, as already indicated earlier herein, there is also less consistency between the FF10M bias and the near-surface wind-speed bias against radiosondes than in Asia. The number of low-elevation radiosonde stations reporting a 1,000-hPa wind speed is quite small in this region, but they indicate a positive model bias that is further increased by the adaptive surface friction (not shown). Thus, it would not be desirable to further enhance the impact of the FF10M observations in this region. For RH2M, a moderate improvement is achieved by adapting surface evaporation in order to reduce the prevailing dry bias, and it is noted that this change originates from the midlatitude parts of Canada that are not snow covered in the first half of the verification period. The aforementioned cold/dry bias north of 70°N is essentially unaffected by the APT but does not contribute much to the domain averages shown in Figure 5 because of the low station density in the High North. Apart from that, T2M shows a slight cooling that is consistent with increased surface evaporation but has negligible impact on the SDs. Unlike RH2M, there is essentially no SD reduction obtained for TD2M.
The global impact of the APT on FF10M, RH2M, T2M, and TD2M and on the corresponding variables in the free atmosphere is summarized in Figures 6 and 7 respectively. The scorecards display the relative RMSE reduction of the operational configuration with respect to the reference experiment without APT, with filled bars/boxes indicating statistical significance at the 95% level. Moreover, they contain averages over 0000 UTC and 1200 UTC forecasts and therefore do not reflect the diurnal cycles seen in some of the previous figures. The surface verification (Figure 6) indicates improvements for all variables directly affected by the APT, most of them staying statistically significant over the entire forecast range of 180 hr, although the RMSE reduction gradually decreases with lead time. The largest improvements for FF10M are found in Asia and in the Tropics, the latter primarily originating from the tropical parts of Asia, for which Figures 1 and 2 indicate large negative FF_fi. On average over the extratropical hemispheres, the RMSE reduction is about 5% during the first forecast days. Similarly large improvements are found for RH2M and TD2M on a hemispheric average, even though Europe and North America show smaller improvements for TD2M than for RH2M (this is compensated by subregions not shown explicitly here, particularly the extratropical part of northern Africa). For T2M, the improvements are smaller than for the other variables except for the Asian regions with a pronounced underestimation of the diurnal temperature amplitude. However, this disregards the fact that T2M would not be assimilated operationally without the bias reduction contributed by the APT (see Supporting Information Section S4 for further discussion). The combined impact of both will be addressed in Section 3.4. For completeness, it is noted that the improvements achieved for RH2M/TD2M tend to have a slight beneficial impact on low-level cloud cover and global radiation, with an RMSE reduction of about 1% on average over the hemispheres (not shown). During the first forecast days, the adaptive surface friction also tends to have a slight beneficial impact on surface pressure/geopotential, but this has substantial regional and seasonal variability, and a safe statement would be that the impact is neutral to slightly positive.
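The scorecard quantity, a relative RMSE reduction with a 95% significance flag, can be sketched as follows. The text does not state which significance test underlies the scorecards, so the paired bootstrap over forecast dates used here is an assumption, as are the function names and the invented daily mean-squared errors. Serial correlation between consecutive forecasts is ignored in this sketch.

```python
import math
import random


def relative_rmse_reduction(sq_err_ref, sq_err_exp):
    """Relative RMSE reduction of the experiment with respect to the reference."""
    rmse_ref = math.sqrt(sum(sq_err_ref) / len(sq_err_ref))
    rmse_exp = math.sqrt(sum(sq_err_exp) / len(sq_err_exp))
    return (rmse_ref - rmse_exp) / rmse_ref


def bootstrap_interval(sq_err_ref, sq_err_exp, n_resamples=2000, seed=0):
    """95% interval of the relative RMSE reduction from a paired bootstrap over dates."""
    rng = random.Random(seed)
    n = len(sq_err_ref)
    samples = []
    for _ in range(n_resamples):
        # Resample forecast dates with replacement, keeping ref/exp pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(relative_rmse_reduction([sq_err_ref[i] for i in idx],
                                               [sq_err_exp[i] for i in idx]))
    samples.sort()
    lower = samples[int(0.025 * n_resamples)]
    upper = samples[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical daily mean-squared errors for one variable, region and lead time.
ref = [1.00, 1.20, 0.80, 1.10, 0.90, 1.30, 0.95, 1.05]
exp = [0.92, 1.05, 0.78, 1.00, 0.85, 1.15, 0.90, 0.97]

reduction = relative_rmse_reduction(ref, exp)
lower, upper = bootstrap_interval(ref, exp)
# The change counts as significant if the interval excludes zero.
significant = lower > 0.0 or upper < 0.0
print(f"reduction = {100 * reduction:.1f}%, significant = {significant}")
```

Pairing the resampled dates matters here because both experiments verify against the same weather situations, so their errors are strongly correlated from day to day.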
The verification against radiosondes shown in Figure 7 serves as a cross-check that the APT really reduces model errors rather than pushing the model towards observations suffering from representativeness or measurement errors. This appears to be the case in most parts of the world, with a minor exception for FF10M in Canada, as already indicated earlier herein. The largest beneficial impact on the atmospheric forecast quality is obtained in Asia, being significantly positive throughout the troposphere for wind speed and temperature, and up to 700 hPa for humidity at least until forecast day 5. The improvement at upper levels also pertains to geopotential (not shown) and originates from the Tibetan Plateau and the surrounding high mountain ranges. In the lower troposphere, significant improvements of all three variables are also found in Europe and in the Tropics, whereas improvements are restricted to temperature and humidity in North America. There, the wind speed at 1,000 hPa shows a slight degradation, which primarily originates from Canada and is due to inconsistent wind-speed biases against surface observations and radiosonde data. In the Southern Hemisphere, the changes in the atmospheric scores are small and mostly insignificant. The differences arising from turning off APT completely rather than just in the forecasts are presented in Supporting Information Figures S8 and S9 and the related discussion (Section S4).

Sensitivities and cross-dependencies
As may be expected intuitively, the individual APT components described in the previous section primarily optimize the variables passed as predictor; that is, the adaptive surface friction optimizes FF10M, the adaptive heat conductivities and heat capacities (as well as the adaptive K_h,min) optimize the diurnal temperature amplitude, and the adaptive evaporation parameters provide a combined optimization of daytime temperature and humidity biases. However, there are also cross-dependencies that shall be briefly summarized here. As already indicated, the adaptive heat conductivities and capacities have a much stronger impact at night than during the day because the relative contribution of the ground heat flux to the surface energy balance is larger at night, so that an APT-induced increase of the diurnal temperature amplitude usually goes along with a reduction of the daily average temperature and dew-point. The side effect on RH2M errors is variable but tends to become predominantly negative when increasing f_hc beyond the values used in our implementation (see Supporting Information Section S2). In addition, changes of surface friction affect the vertical wind shear in the boundary layer, which alters not only the momentum fluxes but also the heat and moisture fluxes, and there are side effects on the heat and moisture transfer at the surface. Since both surface and atmospheric fluxes increase with increasing z_0, the effects on T2M and RH2M get partly compensated and turn out to be rather small even for very stable boundary layers. However, existing inconsistencies or oversimplifications in the physical formulation of the land-surface scheme may be amplified. This was observed most prominently in snow-covered, forested regions at temperatures far below freezing (Siberia in midwinter), where the RH2M scores, and to a lesser extent the T2M scores, tend to be degraded by increasing z_0. We hypothesize that this is related to the missing distinction between the heat and moisture fluxes from the soil/snow and those from the vegetation layer, which would require a much more sophisticated canopy-layer scheme than what is available in TERRA. Specifically, the transfer coefficients for the fluxes from/to the snow cover should be smaller than those for the vegetation layer rather than being all based on the z_0 of the trees, and the related overestimation of the surface fluxes gets aggravated when increasing z_0 in order to reduce the FF10M bias. Additional tests have confirmed that this sensitivity occurs irrespective of whether APT is activated or not, implying that this issue is not caused but just highlighted by APT, and it was decided to postpone its solution to the availability of a more advanced canopy-layer formulation.

International comparison
To put the performance of DWD's operational NWP system, including the recent improvements achieved with the APT, into an international context, Figure 8 displays time series of FF10M, RH2M, and T2M scores from the global NWP centres participating in the surface verification data exchange programme coordinated by the World Meteorological Organization (WMO, 2019).Results are considered for a validity time of 1200 UTC and a short lead time of 24 hr in order keep the impact from errors in the synoptic-scale evolution small.It can be seen that DWD has already been leading in the Northern Hemisphere and the Tropics for several years (i.e., prior to the implementation of APT), whereas ECMWF had similar forecast skill for T2M and RH2M in the Southern Hemisphere.A marked jump appears in May 2022 with the introduction of the T2M assimilation and the related APT components, and it is obvious that the total improvement achieved in the Northern Hemisphere is larger than the pure APT impact shown in Figure 6 of the FF10M assimilation in Russia and some adjacent countries, which was deactivated so far because the large model biases (Figures 2a and 3) had negative impacts in the assimilation cycle prior to introducing the APT.For the first step, the improvements seen in Figure 8 are similar to those in Figure 6, and thus less striking than those for T2M half a year before.

SUMMARY AND OUTLOOK
This article presented a new approach for the adaptive tuning of model parameters that was recently introduced into the operational NWP system of DWD.The current implementation focuses on optimizing the forecast skill of the near-surface variables FF10M, T2M, and RH2M, for which it is relatively well known which parameters have a strong influence on systematic forecast errors.Building upon the condition that these variables are assimilated and dominate the assimilation increments of the related prognostic model variables at the lowest model level, the algorithm starts with computing time-filtered fields of these assimilation increments.These are taken as proxies for the related model biases and serve as predictors for the APT.The most important model parameters subject to adaptive tuning are the roughness length and the SSO blocking tendency at the lowest model level for FF10M, the minimum evaporation resistances of bare soil and plants for the combined optimization of T2M and RH2M in daytime, and the heat conductivities of the soil and the skin layer as well as the soil heat capacity in order to optimize the diurnal amplitude of T2M.The ensuing improvement of the forecast scores is on the order of 5% in the early part of the forecast range, gradually decreasing with increasing lead time.Smaller but statistically significant improvements are also achieved in the lower troposphere, as revealed by the verification against radiosonde ascents.
Comparing our APT approach with previously proposed methods summarized in Section 1, it is clear that the closest similarity exists to the combined state-parameter estimation approaches considered, for instance, by Schirber et al. (2013) and Kotsuki et al. (2018).However, the parameter estimation is not integrated into the DA scheme in our case.In this sense, APT could be viewed as an extended (or indirect) usage of DA information in addition to the usual analysis of the prognostic state variables.Since APT does not involve any postprocessing after solving the prognostic equations and the model-internal diagnostics of T2M and RH2M are unaffected by APT, ICON forecasts are nevertheless not calibrated forecasts in the usual meaning of this term (calibrated forecast products are generated in addition at DWD, but these are unrelated to what is presented in this work).Moreover, there is no dependence on pre-existing training data as for most machine-learning methods, and no artificial forcing terms potentially compromising the physical consistency of the model variables are applied.
A strategically important aspect for the future model development process is the interaction of the APT with subsequent changes in the physics parametrization package, which can be either improvements in the formulation of specific processes or changes of the basic tuning.Since APT adapts to bias changes within a few days by construction of the filtering time-scale, no significant complication is expected for the development and testing process.The experience gained so far is that improvements seen in pure forecast tests with deactivated APT are qualitatively maintained in subsequent assimilation experiments with full APT but exhibit a reduced magnitude, as may be anticipated from the fact that a smaller bias automatically results in a smaller parameter adaptation.On the other hand, the information gained from analysing the filtered increment fields shown exemplarily in Figures 1 and 2 may give hints for improving specific aspects of parametrizations.This has already happened for the dependence of z 0 on the vegetation phase and snow coverage, which was found to be too strong for forest vegetation and resulted in an annual cycle of the FF10M bias in some parts of the world.
Extensions of APT to further external or model tuning parameters appear possible but depend on the existence of a clear functional relationship between the selected predictor and the tuned parameter.For instance, it would probably be difficult to find appropriate predictor fields for convection tuning parameters affecting precipitation as well as the temperature and humidity profiles in large parts of the troposphere.Moreover, predictor fields should not be strongly affected by bias-corrected observations like satellite radiances, because this entails the risk of feedback loops between the (adaptively tuned) model bias and the bias correction even though the APT algorithm itself is inherently stable.Thus, the surface-based parameters addressed in this work are presumably the most suitable ones.Looking beyond NWP, one could think of creating climatologies of filtered increments in order to reduce the model biases in applications with seasonal or climate time-scales, for which the persistence assumption underlying the algorithm presented in this work may not be valid.Alternatively, one could try augmenting the external parameter data themselves by the information gained from APT if the required corrections are essentially static (e.g., for soil properties).These ideas have not been pursued yet because extending ICON's NWP physics package for applications at longer time-scales is still ongoing, but they may be considered as options for quality optimization later on.Evidently, the need for training data would come up in these cases, implying that at least a few years of DA cycles with APT need to be available first.

FIGURE … day of November 2020 for (a) wind speed (m⋅s⁻¹), (b) relative humidity [0, 1], (c) temperature (K), and (d) temperature with opposite weighting for day and night (see text for further explanation).

FIGURE 3 Bias (mean error, ME) and standard deviation (SD) of forecast error for 10-m wind speed (FF10M), 2-m relative humidity (RH2M), 2-m temperature (T2M), and 2-m dew-point (TD2M) against forecast lead time for central Asia, defined as 35°N–50°N, 50°E–100°E. Solid (dashed) lines indicate the operational configuration with full adaptive parameter tuning (reference experiment without adaptive parameter tuning in the forecasts). Results are shown for the 0000 UTC forecasts started between October 20 and December 31, 2020; the date range given in the header indicates the bounds of the validation period. OPER: operational configuration; REF: reference experiment.

FIGURE … Scorecard for the verification against radiosonde ascents, averaged over 0000 UTC and 1200 UTC forecasts of the full experiment period. Green (red) bars indicate an improvement (degradation) due to adaptive parameter tuning; filling indicates statistical significance at the 95% level. Regions are defined as in Figure 6. FF: wind speed; RH: relative humidity; T: temperature.
… A beneficial side impact is also found for RH2M. The adaptive surface friction was put into operations in two steps in November 2022 and March 2023, the second one lying in the activation …

FIGURE 8 World Meteorological Organization (WMO) intercomparison for operational forecast verification against SYNOP stations for 10-m wind speed (FF10M; m⋅s⁻¹), 2-m relative humidity (RH2M; %), and 2-m temperature (T2M; K) at a lead time of 24 hr. Regions are defined as in Figure 6. Participating numerical weather prediction centres are the Japan Meteorological Agency, Météo-France, UK Met Office, the European Centre for Medium-Range Weather Forecasts (ECMWF), and Deutscher Wetterdienst. RMSE: root-mean-square error; NH: Northern Hemisphere; SH: Southern Hemisphere; TR: Tropics.