Ensemble‐based estimates of the impact of potential observations

In order to conduct observation‐network design experiments for a forecast system, methods are needed to estimate the benefit of potential observations. Ideally, these methods are flexible enough to accommodate multiple observation types and forecast lead times while being computationally fast enough to evaluate many potential observational network layouts. For ensemble forecasts, this can be achieved by assuming that analysis and forecast errors are represented by the respective ensemble variance and by assuming a linear ensemble sensitivity between the background state and a forecast quantity of choice. These assumptions enable estimating how much the forecast error of a forecast quantity would be reduced for an arbitrary combination of observation locations and types without the need to run additional forecasts. An aspect that has received little attention is that to apply these variance‐reduction estimates to a specific ensemble‐forecast system consistently, the system's localization needs to be taken into account. We introduce and compare three methods to take localization into account when estimating the benefit of potential observations. One method requires explicitly inverting the background ensemble covariance matrix, one method avoids the inversion but needs to be provided with estimates of signal propagation over time, and the third method requires both the inversion and the signal propagation estimate. We introduce a simple linear‐advection toy model to perform various observing‐system simulation experiments to test the three methods with point and indirect observations. Our results indicate that the methods requiring signal propagation to be provided are best suited to short lead times, whereas the remaining method is more accurate if signal propagation is not known well; for example, for longer lead times.


INTRODUCTION
Weather forecasts rely on a combination of up-to-date observations and past model forecasts to estimate the current state of the atmosphere, which is then used to initialize new forecasts. Gradually incorporating more and more diverse observations has been one of the main reasons numerical weather prediction (NWP) models have improved steadily over the last decades (Bauer et al., 2015). In practice, acquiring and assimilating new observations requires substantial investments. In particular, for regional convection-permitting NWP models, it remains an open question which additional observations would be most beneficial to assimilate (Gustafsson et al., 2018; Hu et al., 2022). To make informed decisions on where to deploy which types of observations, methods are needed to estimate potential observations' benefits before they are deployed and incorporated into the data assimilation system.
The most direct method is to use simulated observations and then run data-denial experiments; that is, the benefit of the observation is determined by how much better forecasts with the simulated observation are than forecasts without it. The weakness of these so-called observing-system simulation experiments (OSSEs) is that they are computationally expensive, as running the NWP system is one of the most demanding routine tasks in the world of supercomputing. As such, OSSEs are typically only done in small numbers for high-value and cost-intensive observing systems, such as satellites (Zeng et al., 2020). One approach to avoid the high cost of OSSEs is to evaluate potential observations on how much they would influence the analysis, assuming that reducing initial uncertainty reduces forecast error. However, this approach cannot offer a quantitative estimate of the reduced forecast error and how it propagates over time. Adjoint models can propagate the observation impact through time and space (Langland and Baker, 2004), but developing and maintaining adjoint code is a major and non-trivial endeavor.
For ensemble NWP systems, linear ensemble sensitivities can be used instead of an adjoint model to estimate the relationship between ensemble deviations in the analysis state and a scalar forecast metric of choice (Ancell and Hakim, 2007). Ensemble sensitivities, also referred to as ensemble regression (Gombos and Hansen, 2008), can be used to estimate the individual contribution of observations to the reduction of forecast error (Kalnay et al., 2012; Weissmann, 2014, 2016; Necker et al., 2018; Kotsuki et al., 2019). Additionally, ensemble sensitivities can be used to estimate the benefit of potential observations on a forecast metric, as long as the metric is only a function of the model state and not dependent on observations. Instead of estimating the error reduction, the ensemble variance reduction of the chosen forecast metric is estimated. For an unbiased ensemble with an ensemble spread that reflects forecast uncertainty, variance reduction is equivalent to error reduction. However, to avoid confusion, we will refer to all ensemble approaches and experiments that use potential observations as 'variance reduction'.
Variance reduction has been used to estimate the ideal location to deploy a one-time observation for a specific forecast, commonly referred to as observation targeting (e.g., Reynolds et al., 2007; Majumdar, 2011, 2016; Xie et al., 2013; Hill et al., 2020; Kerr and Wang, 2020). In contrast, network design experiments usually attempt to find the ideal static location of a set number of observations (Mauger et al., 2013; Hakim et al., 2020; Tardif et al., 2021). Ensemble-based network design experiments tend to take a climatological approach using an ensemble sampled over time. Generating a large ensemble comes at no computational cost for such climatological studies, as members can be sourced from various datasets.
So far, network design studies using ensemble variance reduction have only used point observations of a single variable. To our knowledge, none of these studies have taken the analysis localization used in a forecast system into account. Analysis localization, however, is a crucial component of ensemble assimilation systems, especially for convective-permitting regional forecasts. Hill et al. (2020) found in their observation targeting study that the impact of analysis localization is dramatic and that their estimates performed best when localization was disabled. Previous variance-reduction studies that looked at larger scales often explicitly disabled localization (e.g. Xie et al., 2013). In this study, we revisit the methods used in the network design studies of Mauger et al. (2013), Hakim et al. (2020), and Tardif et al. (2021) and present three different methods to account for analysis localization in full detail. The first method has not been applied in any study before our companion paper (Nomokonova et al., 2022) and requires inverting the background ensemble covariance matrix, which is computationally rather expensive and introduces a free parameter needed to regularize the matrix. The second method avoids the inversion but needs to be provided with estimates of signal propagation over time. The latter approach has been used in previous studies, which omitted the localization aspect when deriving the method. The final method combines the strengths and weaknesses of both previous methods. We will test all three methods using a linear advection toy model that includes two types of observations: a point (conventional) observation of a model state variable with a linear observation operator, and an indirect (non-conventional) observation with a nonlinear observation operator that was inspired by cloud-affected satellite radiance observations. Our main scientific questions are whether the variance-reduction methods can correctly account for analysis localization and how sensitive the methods are to the applied regularization and errors in the estimated signal propagation. These questions are tested over a range of lead times, analysis localization lengths, and ensemble sizes.
The remainder of the article begins by introducing the linear advection toy model, followed by a description of the two types of observations, the assimilation method, and experimental set-ups used throughout the article. We then explain the variance-reduction methodology with a step-by-step derivation in Section 3 and also examine the dependence of the methods on ensemble sensitivity regularization and signal propagation. Thereafter, Section 4 begins by illustrating our experimental approach and compares the observations with a linear and nonlinear observation operator. After that, we explore how the variance-reduction methods respond to changes in analysis localization length, forecast lead time, and ensemble size before looking into the effects of regularization and signal propagation errors. Finally, we summarize our conclusions and discuss which method is suited to which NWP application in Section 5.

Model dynamics and default parameters
The toy model we developed for this article is based on the one-dimensional advection of a unitless passive tracer in a domain with periodic boundary conditions, which we will refer to as the linear-advection model. We set the velocity u constant throughout the domain so that φ(x), the only variable, is continuously advected downstream. Accordingly, the one-dimensional advection equation, ∂φ/∂t + u ∂φ/∂x = 0, fully describes the model's dynamics. To generate an ensemble forecast, we initialize all ensemble members with the same initial field but perturb the u values of each member around the true value u, as shown in Figure 1a.
We chose to create this linear model instead of using one of the popular nonlinear models (e.g., Lorenz) for two main reasons. First, in the absence of wave dynamics, signal propagation is solely dictated by the prescribed velocity u, and the forecast model spread results from the prescribed u perturbations and the gradient ∂φ/∂x. Second, the numerical implementation via a semi-Lagrangian advection scheme is easy, cheap, and unconditionally stable for all grid sizes and time steps.
Our default model set-up has a spatial resolution of 100 m for a 30 km long channel, so that the state size m is 300 and an order of magnitude larger than our default ensemble size n of 32 (model parameters are listed in Table 1). Our true field is a single Gaussian peak and is used to initialize all ensemble members (Figure 1). The truth has a constant velocity u of 10 m·s⁻¹, whereas the velocities of the individual ensemble members are perturbed with random Gaussian noise at each assimilation step. Note that the mean over the perturbed velocities is not forced to match the true velocity u. As a result of the perturbed velocities, the model spread is largest at the peak's flanks. The 10 m·s⁻¹ true velocity makes it easy to convert from time step to distance, and the default time step of 550 s between assimilation steps was chosen to avoid the peak always being at the same few positions at assimilation time. Given that the periodic domain is 30 km long and the true velocity 10 m·s⁻¹, the 550 s time step results in 60 recurring peak locations, each 500 m apart.
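The following Python sketch illustrates how such a linear-advection ensemble can be implemented with a semi-Lagrangian step on a periodic grid. It is a minimal illustration rather than the code used for our experiments; the velocity-perturbation spread, the peak position, and the peak width are assumed values for the example.

```python
# Minimal sketch (not the authors' code) of the linear-advection toy model:
# a periodic 1-D tracer field advected semi-Lagrangian style, with each
# ensemble member using its own randomly perturbed velocity.
import numpy as np

nx, dx, dt = 300, 100.0, 550.0            # grid points, grid spacing (m), assimilation step (s)
n_ens, u_true, u_sigma = 32, 10.0, 1.0    # ensemble size, true speed, speed spread (assumed value)
rng = np.random.default_rng(0)

x = np.arange(nx) * dx
phi0 = np.exp(-0.5 * ((x - 15000.0) / 1000.0) ** 2)   # single Gaussian peak (illustrative width)
ensemble = np.tile(phi0, (n_ens, 1))                   # identical initial members

def advect(field, u, dt):
    """Semi-Lagrangian advection on a periodic domain via linear interpolation."""
    departure = (x - u * dt) % (nx * dx)               # departure point of each grid point
    return np.interp(departure, x, field, period=nx * dx)

def step_ensemble(ensemble):
    """Advance all members one assimilation interval with perturbed velocities."""
    u_members = u_true + u_sigma * rng.standard_normal(n_ens)
    return np.array([advect(m, u, dt) for m, u in zip(ensemble, u_members)])

ensemble = step_ensemble(ensemble)
print(ensemble.shape, ensemble.std(axis=0).max())      # spread is largest on the peak flanks
```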

Observations
We use two types of observations. The first observation type is direct point observations of the model variable φ, shown by the black circles in Figures 1 and 2. Converting from model space to observation space is achieved by selecting the value of φ at the specific location, making it trivial to transform from model to observation space and back. We generate these observations by adding Gaussian noise to the truth. These observations thereby reflect conventional observations with a linear observation operator.

Figure 1: The first two assimilation steps of the linear model using the default settings listed in Table 1, along with the point observations randomly generated using the truth. Shown in (a) and (c) are the background ensembles before the observations are assimilated, and (b) and (d) show the analysis ensembles after the observations are assimilated. Not shown are the indirect observations displayed in Figure 2. The model spread in (a) results only from the perturbations of u, as the initial conditions were identical. Although assimilating the observations always leads to a reduced spread in the analysis, the resulting ensemble mean can, on rare occasions, be pulled away from the truth.
The second type of observation has a larger footprint than the grid size and a nonlinear operator (Figure 2b). We designed these observations as a simplified imitation of cloud-affected satellite radiance or reflectance observations that cover the entire domain with a constant spacing. All grid points where φ is above the threshold value of φ_c = 0.5 are considered 'cloudy' and have a reflectance of 0.7. The remaining grid points have a 'clear-sky' reflectance of 0.3. The indirect observations are then calculated by averaging the reflectance horizontally over 1,500 m. This averaging results in 20 equally spaced reflectance observations, as our model has 300 grid points and a spatial resolution of 100 m (Table 1). The precise observation values are generated by calculating the reflectances of the truth and then adding Gaussian noise (Figure 2). As a final step, we limit the reflectance observation values to between 0 and 1.
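A minimal sketch of such a thresholded, footprint-averaged reflectance operator is given below; the observation-error magnitude (0.05) and the Gaussian truth are illustrative assumptions, not the values of Table 1.

```python
# Minimal sketch (assumptions marked) of the nonlinear reflectance operator:
# threshold the tracer into cloudy/clear reflectances and average over a
# 1,500 m footprint, giving 20 indirect observations on the 300-point grid.
import numpy as np

def reflectance_obs(phi, threshold=0.5, cloudy=0.7, clear=0.3, footprint=15):
    """Map a tracer field (length 300) to 20 footprint-averaged reflectances."""
    refl = np.where(phi > threshold, cloudy, clear)
    return refl.reshape(-1, footprint).mean(axis=1)

rng = np.random.default_rng(1)
x = np.arange(300) * 100.0
truth = np.exp(-0.5 * ((x - 15000.0) / 1000.0) ** 2)            # illustrative truth

obs = reflectance_obs(truth) + 0.05 * rng.standard_normal(20)   # 0.05: assumed obs error
obs = np.clip(obs, 0.0, 1.0)                                    # limit to [0, 1]
print(obs.round(2))
```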
In contrast to the point observations, the indirect (reflectance) observations only provide information on whether φ is above or below φ_c. They provide no information on how much higher or lower φ is than φ_c. Accordingly, reflectances are most useful along the flanks of the peak, where the true reflectance changes between 0.3 and 0.7 and the ensemble spread is highest (Figure 2). In the valley, the Kalman filter gives no weight to the reflectance observations, as the background spread of the ensemble reflectance equivalents Y is zero there. Another difference from the point observations is that it is no longer possible to invert the observation operator. Though we are not aware of other studies that applied such a nonlinear observation operator to a toy model, our cloud-affected reflectance is somewhat similar to the prognostic rain variable in the convective shallow-water model of Würsch and Craig (2014). In that model, rain occurs when the height surpasses a threshold and decays with time.
We included these indirect reflectance observations for three reasons. First, to show how the theory needs to be adapted to cope with a nonlinear observation operator. Second, to test how the variance-reduction methods are affected by a nonlinear observation operator. And third, to test whether the methods can deal with two different observation types simultaneously.

Assimilation set-up
To assimilate the observations, we use the ensemble square-root Kalman filter (EnSRF) proposed by Whitaker and Hamill (2002); its precise formulation follows in Section 3.1. The linear advection model is run forward in time to create the background ensemble, after which the analysis is created by assimilating the randomly generated observations (see Figures 1 and 2). The change in the ensemble mean is determined by the distance between the ensemble mean and the observations. The reduction in model spread is not affected by the observation values and is only determined by the model and observation errors. We assimilate all observations, both direct and indirect, at once. If desired, however, the observations could be assimilated iteratively. The EnSRF of Whitaker and Hamill (2002) was also used in most previous ensemble variance-reduction publications (e.g., Mauger et al., 2013; Hakim et al., 2020; Tardif et al., 2021), and by using the same Kalman filter in the assimilation update as well as in the variance-reduction estimates we ensure that errors in the variance-reduction estimates are not related to differences between assimilation methods. This is an advantage for our detailed evaluation of the estimates, but we note that the variance-reduction methodology can also provide useful information for other assimilation systems, as long as the peculiarities of the respective system are kept in mind.
For localization, we use the common Gaspari and Cohn (1999) function, which is very similar to a Gaussian but becomes zero after a cut-off distance. Throughout this article, the localization length we refer to is half the cut-off distance. An example of the Gaspari-Cohn function is included in the discussion of signal propagation in Section 3.3.3. When localizing, we treat the reflectance observations as point observations located at the center of their footprint.
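For reference, a minimal implementation of the Gaspari-Cohn weighting as we use it (weights reaching zero at twice the localization length, following the standard piecewise fifth-order polynomial) could look as follows; the grid and localization length in the example are illustrative.

```python
# Minimal sketch of the Gaspari and Cohn (1999) localization function:
# weights fall to zero at twice the localization length (the cut-off distance).
import numpy as np

def gaspari_cohn(dist, loc_length):
    """Localization weights for distances `dist`; zero beyond 2 * loc_length."""
    r = np.abs(dist) / loc_length
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - 5.0 / 3.0 * ri**2 + 1.0
    w[outer] = (1.0 / 12.0 * ro**5 - 0.5 * ro**4 + 0.625 * ro**3
                + 5.0 / 3.0 * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

# Example: weights on a 300-point, 100 m grid for a 3,200 m localization length
dist = np.abs(np.arange(300) - 150) * 100.0
print(gaspari_cohn(dist, 3200.0)[145:155].round(3))
```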

The OSSEs
OSSEs use a model simulation as a virtual truth (nature run), from which simulated observations are generated and assimilated (Errico and Privé, 2018). Many studies use OSSEs to assess the impacts of potential observations and evaluate impact-estimation methods (e.g. Privé et al., 2021). OSSEs should not be confused with observing system experiments, which assess the impact of real observations (Gelaro and Zhu, 2009). We evaluate the variance-reduction methods by running multiple OSSEs and then comparing the estimated versus the true variance reduction. Each OSSE is a self-contained and independent experiment, with no memory from one experiment to the next. One such experiment is shown as an example in Figure 3. Experiments are initialized with the background ensembles from the time integration of the linear advection model shown in Figures 1 and 2 using the default values listed in Table 1. The only exception is for the experiments in which the ensemble size is varied. For these experiments, the default ensemble size is increased from 32 to 512 members, and then, for each experiment, as many ensemble members as required are randomly selected from the 512 members.
We can generate an arbitrary number of OSSEs from each background ensemble by varying which ensemble member is selected as the truth and changing the seed used to randomize the observation noise and velocity perturbations. For each set of parameters to test, we run 15 experiments for each of the default assimilation steps between 40 and 100, resulting in 900 experiments. The 60 time steps were chosen to sample all 60 possible peak locations in the default model run equally. The indirect reflectance observation settings of the default run remain unchanged in the OSSEs. However, in contrast to the default run, which only used three direct observations to allow for more model spread, we use six equally spaced point observations in the OSSEs (Figure 1 versus Figure 3). The number was increased so that the indirect and point observations contribute roughly equally to the analysis increment (see Section 4.1). To ensure that our results are not only valid for the specific set-up we just described, we also performed the tests with randomized point observation locations and numbers. Though there were small differences in the results, the main qualitative conclusions were unaffected, so we decided to keep the observations stationary. Throughout this article, we use the sum of the tracer φ over the 100 grid points in the middle of the domain as the forecast metric (shown by the gray background in Figure 3).

Figure 3: Example experiment generated by selecting the 43rd assimilation step as the background and selecting a random ensemble member as truth, as explained in Section 2.4. Shown are (a) the background ensemble along with the member chosen as the truth, (b) the analysis ensemble after assimilating the observations, (c) the free forecast launched from the background, and (d) the forecast launched from the analysis. The default settings are used (Table 1).

The precise recipe we use to generate the OSSEs, as shown in Figure 3, is as follows (a schematic code sketch follows below):
1. Use a forecast ensemble generated by the default time integration as the background ensemble.
2. Select an ensemble member as the truth and use it to generate both φ and reflectance observations by forward modeling and adding a random error.
3. Assimilate the observations to create the analysis.
4. Perturb the velocities randomly for each ensemble member and use the same velocities to create the free forecast from the background and the forecast from the analysis.
5. Calculate the forecast metric of each free-forecast and forecast ensemble member, and compute the resulting variance reduction.
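The following self-contained sketch walks through the five steps for a single OSSE under simplifying assumptions (point observations only, a serial scalar EnSRF update, no localization, and illustrative parameter values); it is meant to clarify the recipe, not to reproduce our exact set-up.

```python
# Schematic sketch of one OSSE following the five-step recipe above.
import numpy as np

rng = np.random.default_rng(2)
nx, dx, dt_fc = 300, 100.0, 1000.0
n_ens, u_true, u_sigma, r_obs = 32, 10.0, 1.0, 0.05   # u_sigma, r_obs: assumed values
x = np.arange(nx) * dx

def advect(field, u, dt):
    return np.interp((x - u * dt) % (nx * dx), x, field, period=nx * dx)

def metric(ens):
    """Forecast metric: sum of the tracer over the middle third of the domain."""
    return ens[:, nx // 3: 2 * nx // 3].sum(axis=1)

# 1. Background ensemble: identical peak advected with perturbed velocities
peak = np.exp(-0.5 * ((x - 10000.0) / 1000.0) ** 2)
bg = np.array([advect(peak, u_true + u_sigma * rng.standard_normal(), 2000.0)
               for _ in range(n_ens)])

# 2. Pick one member as truth and generate noisy point observations
truth = bg[rng.integers(n_ens)].copy()
obs_idx = np.arange(25, 300, 50)                       # six equally spaced points
obs = truth[obs_idx] + np.sqrt(r_obs) * rng.standard_normal(obs_idx.size)

# 3. Serial EnSRF update (Whitaker and Hamill, 2002); no localization for brevity
ana = bg.copy()
for k, iy in enumerate(obs_idx):
    dX = ana - ana.mean(axis=0)
    dy = dX[:, iy]
    var_y = dy @ dy / (n_ens - 1)
    gain = (dX.T @ dy) / (n_ens - 1) / (var_y + r_obs)       # Kalman gain column
    ana += gain * (obs[k] - ana[:, iy].mean())                # mean update
    alpha = 1.0 / (1.0 + np.sqrt(r_obs / (var_y + r_obs)))    # deviation damping factor
    ana -= alpha * np.outer(dy, gain)                         # deviation update

# 4. Free forecast (from background) and forecast (from analysis), same velocities
u_mem = u_true + u_sigma * rng.standard_normal(n_ens)
ff = np.array([advect(m, u, dt_fc) for m, u in zip(bg, u_mem)])
fc = np.array([advect(m, u, dt_fc) for m, u in zip(ana, u_mem)])

# 5. Forecast metric and resulting variance reduction
delta_var = metric(fc).var(ddof=1) - metric(ff).var(ddof=1)
print("variance reduction:", delta_var)
```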
To illustrate the nonlinear relationship between variance reduction and lead time, Figure 4 shows the time evolution of the forecast metric J and of its ensemble variance σ² for the same OSSE shown in Figure 3. For the chosen example, the variance is very small for lead times below 300 s because all ensemble members have forecast metrics close to zero due to the initial conditions shown in Figure 3. As the peak enters the middle of the domain over which J is computed, both the forecast metric (Figure 4a,b) and the variance (Figure 4c) increase strongly. There is a dip in variance when the mean ensemble peak is in the middle of the domain after about 1,300 s, before a second peak as the ensemble members exit the middle of the domain at different times. In addition to the changes caused by the peak moving through the middle of the domain, the variance increases with time as the ensemble members diverge more and more due to their different velocities.

ESTIMATING THE BENEFIT OF POTENTIAL OBSERVATIONS
We begin by introducing the ensemble Kalman filter along with the terminology and notation we use throughout this article. We then define what we refer to as variance reduction and the assumptions needed to estimate it. This is followed by a step-by-step derivation of the different methods we use in this study, along with a detailed description of the respective assumptions.

Figure 5: The ensemble deviations of the forecast metric for the free-forecast (δj_ff) and forecast (δj_fc) ensembles from the experiment shown in Figure 3, as well as the first-order linear approximations (ĵ_ff, ĵ_fc) in Equation (18). The forecast metric is the sum over the middle of the state vector, and the black lines mark the standard deviation of each of the four ensembles. For this specific example, the variance reduction was −123.6 and the estimated variance reduction was −119.4.

Analysis error reduction
Ensemble Kalman filters reduce the error of a prior ensemble forecast, the background state X_b, by assimilating observations, resulting in the analysis X_a (Evensen et al., 2022). We capitalize all matrices, such as the ensemble state X and the ensemble deviations δX, which are both of dimension m × n, with m being the size of the state vector and n being the number of ensemble members (see Table 1). Throughout the article, we will always use δ to refer to deviations from the ensemble mean and bold lower-case letters for vectors, such as the state vector x. As our toy model only has a single variable, the state vector x is equal to the discrete tracer field φ, and the length of x equals the number of grid points. A critical component of ensemble filters is the observation operator H, which transforms from state space to observation space by calculating what observation values would result from each ensemble member, Y = H(X). Y is a p × n matrix, with p being the number of observations. If the operator is linear, it can be expressed through a matrix H, with H(X) = HX = Y.
The difference between the ensemble covariance matrix of the background ensemble, B = δX_b δX_b^T/(n − 1), and that of the analysis ensemble, A, can be calculated from the Kalman gain K (Evensen, 1992):

A = B − KHB,  with  K = BH^T (HBH^T + R)^{-1}.  (2)

The observation errors are represented through the observation covariance matrix R, and superscript T represents the transpose. The equations can be reformulated using the ensemble deviations in observation space δY instead of H (Evensen et al., 2022):

A = B − K δY_b δX_b^T/(n − 1),  with  K = [δX_b δY_b^T/(n − 1)] [δY_b δY_b^T/(n − 1) + R]^{-1}.  (3)

There are various ways to calculate the changes to the individual ensemble members so that the ensemble variance reduction reflects the error reduction, the diagonal of A − B, of Equation (3). For the reasons mentioned in Section 2.3, we use the square-root form of the ensemble Kalman filter proposed by Whitaker and Hamill (2002), which calculates the analysis deviations via the modified Kalman gain K̃:

δX_a = δX_b − K̃ δY_b,  (4)

K̃ = [δX_b δY_b^T/(n − 1)] {[δY_b δY_b^T/(n − 1) + R]^{1/2}}^{-T} {[δY_b δY_b^T/(n − 1) + R]^{1/2} + R^{1/2}}^{-1}.  (5)

In our code, we compute the square roots of the matrices using the blocked Schur algorithm implemented in scipy.linalg.sqrtm.
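A minimal sketch of this matrix-form update, using scipy.linalg.sqrtm for the matrix square roots, is shown below with synthetic toy numbers (a random ensemble, point observations, and an assumed diagonal R). The check at the end verifies that the deviation update of Equations (4) and (5) reproduces the analysis covariance of Equation (3).

```python
# Minimal sketch of the matrix-form EnSRF update of Equations (4) and (5).
import numpy as np
from scipy.linalg import sqrtm, inv

rng = np.random.default_rng(3)
m, n, p = 60, 32, 4                        # state size, members, observations (toy numbers)
Xb = rng.standard_normal((m, n))           # background ensemble (columns = members)
H = np.zeros((p, m))
H[np.arange(p), [5, 20, 35, 50]] = 1.0     # point-observation operator
R = 0.25 * np.eye(p)                       # assumed observation-error covariance

dXb = Xb - Xb.mean(axis=1, keepdims=True)
dYb = H @ dXb                              # deviations in observation space
S = dYb @ dYb.T / (n - 1) + R              # innovation covariance (= H B H^T + R)

K = dXb @ dYb.T / (n - 1) @ inv(S)                                        # Kalman gain, Eq. (3)
sqrtS = np.real(sqrtm(S))
Ktilde = dXb @ dYb.T / (n - 1) @ inv(sqrtS).T @ inv(sqrtS + np.real(sqrtm(R)))  # Eq. (5)

dXa = dXb - Ktilde @ dYb                   # analysis deviations, Eq. (4)
A = dXa @ dXa.T / (n - 1)                  # should match B - K dYb dXb^T/(n-1) up to round-off
print(np.allclose(A, dXb @ dXb.T / (n - 1) - K @ dYb @ dXb.T / (n - 1), atol=1e-8))
```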
Localization reduces the influence of observations on the analysis update based on their distance and is commonly implemented by multiplying the background covariance matrix element-wise (Schur product) with a matrix of weights, L ∘ B. Ensemble data assimilation primarily uses localization to avoid spurious correlations (Houtekamer and Mitchell, 2001). However, for some approaches, localization is essential to break down the assimilation problem into smaller chunks that can be computed in parallel, for example, the local ensemble transform Kalman filter of Hunt et al. (2007).
We use two different analysis localization matrices in this article, whose dimensions are denoted via their superscripts: L^mp and L^pp. L^mp contains the Gaspari-Cohn weights based on the distances from all grid points to all observation locations, and L^pp is a symmetric matrix based on the distances between observation locations. The respective localized forms of Equations (3) and (5) are

K_loc = [L^mp ∘ (δX_b δY_b^T/(n − 1))] [L^pp ∘ (δY_b δY_b^T/(n − 1)) + R]^{-1},  (6)

K̃_loc = [L^mp ∘ (δX_b δY_b^T/(n − 1))] {[L^pp ∘ (δY_b δY_b^T/(n − 1)) + R]^{1/2}}^{-T} {[L^pp ∘ (δY_b δY_b^T/(n − 1)) + R]^{1/2} + R^{1/2}}^{-1}.  (7)
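The sketch below illustrates the element-wise localization of the Kalman gain in Equation (6) for a toy ensemble; Gaussian weights stand in for the Gaspari-Cohn function, and all numbers are illustrative.

```python
# Minimal sketch of Schur-product analysis localization of the Kalman gain, Eq. (6):
# L_mp weights state-to-observation covariances, L_pp weights observation covariances.
import numpy as np

rng = np.random.default_rng(4)
m, n, p, loc = 60, 32, 4, 8.0                        # grid points, members, obs, loc. length (grid units)
obs_idx = np.array([5, 20, 35, 50])
dXb = rng.standard_normal((m, n))
dXb -= dXb.mean(axis=1, keepdims=True)
dYb = dXb[obs_idx]                                   # point-observation equivalents
R = 0.25 * np.eye(p)

dist_mp = np.abs(np.arange(m)[:, None] - obs_idx[None, :])
dist_pp = np.abs(obs_idx[:, None] - obs_idx[None, :])
L_mp = np.exp(-0.5 * (dist_mp / loc) ** 2)           # placeholder for Gaspari-Cohn weights
L_pp = np.exp(-0.5 * (dist_pp / loc) ** 2)

K_loc = (L_mp * (dXb @ dYb.T / (n - 1))) @ np.linalg.inv(L_pp * (dYb @ dYb.T / (n - 1)) + R)
print(K_loc.shape)   # (m, p): localized influence of each observation on each grid point
```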

Forecast variance reduction
The goal of our study is to estimate how much potential observations would reduce the forecast error, which is reflected in the reduction of ensemble variance of a specific forecast metric of interest. In this article, we use a scalar forecast metric J, which is determined from the model state vector x through a response function f, so that J = f(x). We apply the response function to the ensemble members of the free forecast and the forecast, as shown in Figure 3.
In the literature, J is also referred to as a performance measure, monitoring goal, or response function (Ancell and Coleman, 2022). Though it is possible to use multiple forecast metrics at different locations (e.g. Tardif et al., 2021), the fundamentals remain unchanged. The forecast metric is computed for each of the n ensemble members individually, resulting in a 1 × n vector j.
The variance of j is calculated from the ensemble-mean deviations δj = j − j̄ through var(j) = δj δj^T/(n − 1). In addition to the index 'a' for the analysis and 'b' for the background, we use 'ff' to denote the free forecast that is initialized from the background ensemble and 'fc' to denote the forecast initialized from the analysis (Figure 3). Accordingly, the variance reduction Δσ² we seek to estimate is the difference between the forecast and the free forecast: Δσ² = var(j_fc) − var(j_ff). An example of Δσ² and how it changes with lead time is shown by the difference between the two lines in Figure 4. The information available to estimate the variance reduction is the background ensemble state X_b and the forecast metric j_ff of the free forecast.
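For clarity, the quantity we are trying to estimate can be computed directly from the two forecast ensembles, as in this minimal sketch with synthetic metric values:

```python
# Minimal sketch of the true variance reduction: difference between the metric
# variance of the forecast (from the analysis) and the free forecast (from the
# background). The metric values are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(5)
j_ff = 10.0 + 2.0 * rng.standard_normal(32)          # free-forecast metric, one value per member
j_fc = 10.0 + 1.5 * rng.standard_normal(32)          # forecast metric after assimilation

delta_var = j_fc.var(ddof=1) - j_ff.var(ddof=1)      # negative when the observations help
print(round(delta_var, 2))
```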
Ensemble variance-reduction estimates, as proposed by Ancell and Hakim (2007) and Torn and Hakim (2008), are based on two assumptions. The first is that the ensemble deviations of the free-forecast metric δj_ff have a linear dependence on the ensemble state deviations of the background ensemble δX_b through the sensitivity 1 × m vector ∂j_ff/∂x_b, abbreviated as s. The second assumption is that the linear dependence between the background and free-forecast ensemble deviations also applies to the deviations of the analysis and forecast ensemble; that is, the same sensitivity:

δj_ff ≈ s δX_b,  δj_fc ≈ s δX_a.  (8)

Using Equation (8), the variance reduction can be approximated by

Δσ² ≈ [s δX_a (s δX_a)^T − s δX_b (s δX_b)^T]/(n − 1) = s (A − B) s^T.  (9)

So, in order to estimate the variance reduction, both the sensitivity s and the analysis ensemble deviations δX_a need to be estimated from δX_b and δj_ff. This is where the three methods used in this article differ. We first describe what we refer to as the explicit methods, which explicitly calculate s as well as δX_a. The final method, which is computationally much cheaper, calculates neither one directly. Therefore, we refer to the latter method as the implicit sensitivity method.
Before deriving the variance reduction methods in Sections 3.4 and 3.5, we first explore how the ensemble sensitivity can be computed in Section 3.3. The variance reduction methods are first derived without considering analysis localization and then modified to provide consistent variance-reduction estimates for ensemble forecast systems that localize the analysis update by localizing the covariance matrix.

Ensemble sensitivity
The ensemble sensitivity is commonly reformulated by multiplying our starting assumption δj_ff ≈ s δX_b with δX_b^T/(n − 1) and the inverse of the background ensemble covariance matrix, B^{-1}, resulting in

s ≈ [δj_ff δX_b^T/(n − 1)] B^{-1}.  (10)

Computing s directly by inverting B would only be possible if the state size m were smaller than the number of independent ensemble members n, which is unlikely to be the case in any meteorological application. Moreover, even if it were possible, directly inverting B without taking ensemble sampling errors into account would yield a poor approximation. In mathematical terms, Equation (10) is an ill-posed problem because B is rank-deficient and ill-conditioned.
The most common solution used to calculate the ensemble sensitivity despite the problem being ill-posed is to reduce it to a univariate regression problem by neglecting all non-diagonal elements of the B matrix (e.g. Garcies and Homar, 2009; Torn, 2014; Hill et al., 2016, 2020; Berman et al., 2017; Necker et al., 2020; George and Kutty, 2021; Ancell and Coleman, 2022), resulting in

s_uni ≈ [δj_ff δX_b^T/(n − 1)] (I ∘ B)^{-1},  (11)

with I being the identity matrix. Although this univariate sensitivity is cheap to compute and a useful qualitative analysis tool, it cannot be used for quantitative estimates (Hacker and Lei, 2015). In order to derive a meaningful multivariate sensitivity despite Equation (10) being ill-posed, one must add a constraint to the system, commonly known as regularizing the problem. We use Tikhonov regularization (Tikhonov, 1965), which we introduce in Section 3.3.1. An additional issue that arises when computing the ensemble sensitivity from a finite ensemble is that both covariance terms in Equation (10) (δj_ff δX_b^T and B) are affected by sampling noise. The most common approach in ensemble data assimilation to handle the resulting spurious covariances is to apply distance-based weighting, known as localization. In Section 3.3.2, we detail when and how sensitivity localization can be applied.

Sensitivity regularization
The most popular method to regularize ensemble sensitivities is singular-value decomposition (Ancell and Hakim, 2007; Gombos and Hansen, 2008; Hacker and Lei, 2015). We use Tikhonov regularization instead (Tikhonov, 1965), because the regularization strength is directly controllable by an easy-to-understand parameter λ. Tikhonov regularization is used in many fields under various names, such as ridge regression, weight decay, or L2 regularization.
Tikhonov regularization as applied to Equation (10) seeks a sensitivity vector s_λ that minimizes the combined L2 norm

||δj_ff − s δX_b||₂² + λ (n − 1) ||s||₂²,  (12)

with ||·||₂ used to denote the L2 norm. That is, the regularized problem balances finding an s that best fits our original linear approximation δj_ff ≈ s δX_b against keeping the absolute values of s small. For λ = 0, the Tikhonov regularization is equivalent to the Moore-Penrose inverse, commonly referred to as the pseudoinverse, which can be computed via singular-value decomposition. For any value of the regularization parameter λ > 0, there exists the following unique analytical solution for s_λ:

s_λ = [δj_ff δX_b^T/(n − 1)] (B + λI)^{-1}.  (13)

To understand how the ensemble sensitivity is affected by the regularization parameter λ, we computed the sensitivity for a single experiment for λ values of 1, 0.1, and 0.01. We begin with the trivial case of a forecast lead time dt = 0. Then the background and free-forecast states are identical, and for a linear response function f the linear relationship δj_ff = s δX_b holds exactly, so the sensitivity s can be computed analytically. For our response function, which is the sum over the middle third of the domain, the sensitivity is s = 1 in the middle and 0 outside (dashed line in Figure 6a,d).
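A minimal sketch of the regularized sensitivity of Equation (13) for a toy ensemble is given below; the scaling of the regularization term, the toy metric (a band sum at zero lead time), and the random ensemble are assumptions for illustration.

```python
# Minimal sketch of the Tikhonov-regularized ensemble sensitivity, Eq. (13).
import numpy as np

rng = np.random.default_rng(6)
m, n, lam = 60, 32, 0.1                               # toy state size, ensemble size, lambda
dXb = rng.standard_normal((m, n))
dXb -= dXb.mean(axis=1, keepdims=True)
j_ff = dXb[20:40].sum(axis=0)                          # metric: sum over a band of grid points
dj_ff = j_ff - j_ff.mean()

B = dXb @ dXb.T / (n - 1)
s_lam = dj_ff @ dXb.T / (n - 1) @ np.linalg.inv(B + lam * np.eye(m))
# Values inside the band should be elevated relative to outside
# (still noisy for such a small ensemble).
print(s_lam[18:42].round(2))
```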
If the velocity were the same in all ensemble members, the sensitivity for the forecast would be shifted upstream without changing shape, as shown in Figure 6b-f. However, owing to the free-forecast ensemble and forecast ensemble diverging over time, the sensitivity will no longer be a step function and will develop a rounded shape.
Regularization will always be needed to compute the ensemble sensitivity if the ensemble size is smaller than the state space, as the problem is underdetermined. Any λ > 0 results in an analytical solution, and the larger λ is, the smaller the resulting sensitivities should be, following Equation (12) (shown in Figure 6a-c). However, which λ value is ideal is not trivial to determine, even for a model as simple as ours. For the trivial case with a lead time of zero, the smallest λ shown (0.01) provides the best agreement with the analytical solution (Figure 6a). But with increasing lead time, a higher λ value is needed to suppress the strong oscillations that occur (Figure 6b,c). However, if λ is too large, the sensitivity is suppressed too far.
The more ensemble members there are available, the less underdetermined the problem is and the more precise the sensitivity estimate will be (Figure 6d-f). As the lead time, and thus the nonlinearity, increases, regularization needs to be increased to compensate. Even having a larger ensemble size than state space is insufficient to avoid these oscillations in our example, highlighting that some form of regularization is needed no matter how large the ensemble size is. Using the pseudoinverse of B, which is equivalent to the Tikhonov regularization with λ = 0 and can be computed via singular-value decomposition, is insufficient. For the example shown in Figure 6f, using the pseudoinverse results in highly oscillating sensitivity values above 10⁶ and below −10⁶ (not shown). Ideally, the amount of regularization applied would change with forecast lead time, ensemble size, forecast metric, and the current model state. Since regularization is a mature topic of mathematics, many methods exist that automatically adjust the applied regularization strength. However, we will prescribe a static regularization parameter for this article to make it easier to interpret the results. We will evaluate how the static regularization parameter affects the variance-reduction estimates and explain why we selected a default value of λ = 0.1 in Section 4.3.
To highlight the importance of using a multivariate sensitivity (Hacker and Lei, 2015), we also computed the univariate sensitivity of Equation (11), which is widely used (e.g., Garcies and Homar, 2009; Torn, 2014; Hill et al., 2016, 2020; Berman et al., 2017; Necker et al., 2019, 2020; George and Kutty, 2021). For our highly correlated one-dimensional state vector, the univariate approach overestimates sensitivities by at least an order of magnitude (Figure 7). This overestimation due to neglecting covariances scales with state size, which is why Ancell and Hakim (2007) found univariate sensitivities roughly 10,000 times higher than adjoint sensitivities for NWP data. The benefits of neglecting the covariances are the much lower computational cost and the absence of numerical instabilities caused by the matrix inversion. Though the resulting sensitivities are not quantitatively useful, they can still serve as a useful qualitative diagnostic. But our examples shown in Figure 7 highlight that many possibly misleading artifacts can occur, such as the strong negative sensitivity values or the sensitivity being lower for the 300 s lead time than for the 600 s lead time.

Sensitivity localization
So far in this article we have only discussed localization applied to the analysis. We refer to this as analysis localization, which for our purposes is a constraint given by the forecast system we are providing variance-reduction estimates for. But localization can also be applied to the sensitivity to reduce the impact of spurious sampling covariances. We refer to this as sensitivity localization, which can be set independently of the analysis localization and is not determined by the forecast system. Sensitivity localization can be applied under the following two conditions.
I. The forecast metric can be split into individual localized grid-point components, J = Σ_{i=1}^{m} J_i, with the ensemble deviations of all localized components, stacked into δJ, having the same dimension as the state matrix δX. The response function we choose for this article, J = Σ_{i=m/3}^{2m/3} φ_i (the sum over the middle third of the domain), fulfills this condition. But other possible response functions, such as the maximum value, do not. An alternative way of fulfilling this criterion is to use multiple forecast metrics at different locations, as in Tardif et al. (2021). Although the dimensions of the equations change, the principle stays the same.
II. The signal propagation of the observation impact over time can be estimated.
Condition I enables a distance localization of the covariances between the individual forecast-metric components and the background ensemble deviations because, in contrast to j, each individual j_i has a specific location. So, if condition I is fulfilled, an individual localized sensitivity s_i^loc can be calculated for each j_i separately. To do so, the B matrix is weighted with an m × m localization matrix C, and the covariance δj_ff,i δX_b^T is weighted by the ith row of the localization matrix, C_i, resulting in

s_i^loc = [C_i ∘ (δj_ff,i δX_b^T/(n − 1))] (C ∘ B)^{-1}.  (14)

And owing to the assumed linearity of the sensitivity, the full localized sensitivity is the sum of the individual components, s^loc = Σ_{i=1}^{m} s_i^loc, so that the localized equivalent of Equation (10) is

s^loc = [Σ_{i=1}^{m} C_i ∘ (δj_ff,i δX_b^T/(n − 1))] (C ∘ B)^{-1}.  (15)

We refer to this sensitivity localization matrix as C to differentiate it from the analysis localization matrix L used in Equation (6). Although both localization matrices are symmetric, C is not required to be positive semi-definite, in contrast to L, as C is independent of the Kalman filter. Despite C and L being independent, for simplicity we use the Gaspari-Cohn function for both, which has one free parameter, the localization length (Gaspari and Cohn, 1999) (example in Figure 8). Our default sensitivity localization length is 2,000 m, compared with the 3,200 m of the analysis localization length.
However, applying the same sensitivity localization matrix C to B as well as to δJ_ff δX_b^T only works if no signals are propagated spatially over time. Under these conditions, the localized sensitivity s^loc and the non-localized sensitivity s would be equivalent if an infinite number of ensemble members were available. But if signals propagate, that is, if the ensemble background deviations at a specific point influence the forecast at different grid points, then the C matrix used to localize δJ_ff δX_b^T needs to be adapted to take the signal propagation into account. This is where our second condition for applying sensitivity localization comes in. We will discuss how signal propagation can be estimated in NWP models in the next subsection, but, for our toy model, the downstream signal propagation can be taken into account by advecting the localization weights upstream using the ensemble mean speed ū, as shown in Figure 8. Taking the signal propagation into account and denoting the advected weights by ⃗C, the resulting equation for the unregularized localized sensitivity is

s^loc = [Σ_{i=1}^{m} ⃗C_i ∘ (δj_ff,i δX_b^T/(n − 1))] (C ∘ B)^{-1}.  (16)

To illustrate the effect of sensitivity localization, we apply three different localization lengths to the same experiments shown in Figures 6 and 7. After applying regularization, we see that the sensitivity localization successfully suppresses spurious sensitivities outside of the expected area (Figure 9). The shorter the applied localization length, the tighter the sensitivity is restricted to the expected area. But to achieve this effect, it is crucial that localization is applied both to δJ_ff δX_b^T and to B. Localizing only one or the other results in the poor sensitivities shown in Figure A.1a,b. An additional effect of adding localization is that no regularization is needed to solve the inverse problem. However, the longer the localization length used, the more regularization is required to avoid the same oscillations that occur in the non-localized sensitivity, which are visible in Figure 9c. For the smallest localization length shown, of 2 dx, no regularization seems to be necessary (Figure A.1c). But as we will see in Section 4.4, such short localization lengths can only be applied if the signal propagation is very well known.
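The sketch below illustrates the idea with a toy ensemble: the same localization weights are applied to B, while the weights applied to the metric covariances are shifted upstream by an assumed mean advection distance. Gaussian weights and a small ridge term replace the Gaspari-Cohn function and the regularization discussed above, so this is a simplified stand-in rather than our exact formulation.

```python
# Simplified sketch of sensitivity localization with advected weights:
# C localizes B, while the covariance of each metric component with the
# background uses weights shifted upstream by the mean advection distance.
import numpy as np

rng = np.random.default_rng(7)
m, n, dx = 60, 32, 100.0
loc_len, shift = 20.0 * dx, 10              # localization length (m), mean advection (grid points, assumed)
x = np.arange(m) * dx

dXb = rng.standard_normal((m, n))
dXb -= dXb.mean(axis=1, keepdims=True)
dj = dXb[np.arange(20, 40)]                 # per-grid-point metric deviations (band-sum metric)

dist = np.abs(x[:, None] - x[None, :])
C = np.exp(-0.5 * (dist / loc_len) ** 2)    # sensitivity localization weights (Gaussian stand-in)
C_adv = np.roll(C, -shift, axis=1)          # weights advected upstream (periodic shift)

B_loc = C * (dXb @ dXb.T / (n - 1))
num = np.zeros(m)
for i, row in zip(np.arange(20, 40), dj):   # sum of localized covariances over metric components
    num += C_adv[i] * (row @ dXb.T / (n - 1))
s_loc = num @ np.linalg.inv(B_loc + 1e-6 * np.eye(m))   # small ridge for numerical safety
print(s_loc.round(2))
```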

Estimating signal propagation
We use the term signal propagation to describe how the background state at a grid point affects the forecast at other grid points. In our toy model, this is trivial to determine: signals propagate downstream with a constant velocity, and the velocity is not affected by the observations. Accordingly, signals propagate just as fast in the free forecast as in the forecast. However, since each ensemble member has a different velocity, each grid point will affect different points in the forecasts of different ensemble members. As a result, by using the ensemble mean advection speed to account for signal propagation in Equation (15), we are introducing an error source (Figure 8). This error grows with lead time and ensemble velocity spread and decreases with localization length. In order to evaluate the effects of misjudging the mean signal propagation on the variance-reduction estimates in Section 4.4, we run experiments in which we apply a percentage error to u, as shown in Figure 8. However, the more complicated and nonlinear the model dynamics are, the harder the signal propagation is to estimate. Possibly the easiest way to account for signal propagation is to advect the localization with a fraction of the ensemble mean wind. For example, both Ota et al. (2013) and Buehner et al. (2018) use a fraction of 0.75. This approach works best over short time-scales in the free troposphere and is commonly used to extrapolate convective cells in nowcasting. A more complicated approach is to add a passive tracer to the model, assuming that the signal of the observation is only propagated through advection and dispersion, as done by Kalnay et al. (2012) for the simplified Lorenz model. Though conceptually easy, this is computationally expensive, as an additional tracer needs to be added for each observation, increasing run time and memory requirements. A different idea that can be used to estimate signal propagation is to use many sub-ensembles and determine where the sub-ensemble correlations between initial state and forecast differ the least (Anderson, 2007). This approach was applied by Hacker and Lei (2015) to localize the ensemble sensitivity in a Lorenz model and by Gasperoni and Wang (2015) to determine optimal temporal localization for a dry, global, two-layer primitive-equation spectral model. The drawback of this sub-ensemble approach is that many sub-ensembles are needed. If no large ensemble is available, additional members can be obtained by collecting members over time or space. Accordingly, the estimated signal propagation will reflect either a temporal or a spatial average. To our knowledge, no studies have addressed whether such an average signal propagation is beneficial to observation-impact estimation. Although signal propagation can be approximated, if only roughly, through an advection term, it is not feasible to compute signal propagation precisely in an NWP model. A single observation can affect multiple variables, which can, in turn, propagate at different speeds in different directions. Assuming a simple Rossby wave, a change in a passive tracer, such as water vapor, would propagate eastward with the mean zonal flow. In contrast, a change in pressure or wind that affects the Rossby wave could propagate either east- or westward depending on the wave number and resulting group velocity. The speed of propagation varies widely depending on which processes are involved.
For example, a signal in the upper troposphere could propagate to the surface quickly via radiation or precipitation, or the signal may not propagate downwards at all. On the flip side, the impact of a surface observation within the boundary layer can be dispersed in all directions quickly through turbulence or be trapped close to the surface in a highly stable boundary layer.

Without analysis localization
Both explicit methods calculate δX_a and s directly and then plug these into Equation (9) to estimate the variance reduction. The difference between the two explicit methods is that one applies sensitivity localization when calculating s (see Section 3.3.2), which we refer to as the 'explicit local' method, whereas the other does not, which we refer to as the 'explicit global' method.
Since we are dealing with potential observations, it is impossible to calculate the analysis state X_a directly, which would require knowing the differences between the observation values and the model equivalents. But ensemble filters can determine the analysis state deviations δX_a from the ensemble mean for observations of a given uncertainty. We do so using Equation (4).
As discussed in Section 3.3.1, the regularization parameter λ needs to be set in order to calculate the sensitivity; Section 4.3 analyzes the resulting effect on the variance-reduction estimates. Once a λ value has been selected, we can compute the variance-reduction estimate from

Δσ̂² = (ĵ_fc ĵ_fc^T − ĵ_ff ĵ_ff^T)/(n − 1),  with  ĵ_ff = s_λ δX_b  and  ĵ_fc = s_λ δX_a = s_λ (δX_b − K̃ δY_b).  (18)

We define ĵ_fc and ĵ_ff as the approximated ensemble deviations of δj_fc and δj_ff. For the example experiment shown in Figure 3, both the approximated and true ensemble deviations of δj_ff and δj_fc are plotted in Figure 5.
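Putting the pieces together, a minimal sketch of the explicit global estimate (regularized sensitivity from Equation (13), potential-analysis deviations from the modified gain of Equations (4) and (5), and the estimate of Equation (18)) could look as follows; the observation locations, error variances, and zero-lead-time metric are illustrative assumptions, and no localization is applied.

```python
# Minimal sketch of the explicit global variance-reduction estimate.
import numpy as np
from scipy.linalg import sqrtm, inv

rng = np.random.default_rng(8)
m, n, lam, r = 60, 32, 0.1, 0.25
obs_idx = np.array([10, 30, 50])
dXb = rng.standard_normal((m, n))
dXb -= dXb.mean(axis=1, keepdims=True)
dYb = dXb[obs_idx]
R = r * np.eye(obs_idx.size)

# Free-forecast metric deviations (here: band sum at zero lead time for simplicity)
dj_ff = dXb[20:40].sum(axis=0)

B = dXb @ dXb.T / (n - 1)
s = dj_ff @ dXb.T / (n - 1) @ inv(B + lam * np.eye(m))        # regularized sensitivity, Eq. (13)

S = dYb @ dYb.T / (n - 1) + R
sqrtS = np.real(sqrtm(S))
Ktilde = dXb @ dYb.T / (n - 1) @ inv(sqrtS).T @ inv(sqrtS + np.real(sqrtm(R)))
dXa = dXb - Ktilde @ dYb                                       # potential-analysis deviations, Eq. (4)

j_hat_ff, j_hat_fc = s @ dXb, s @ dXa
dvar_est = (j_hat_fc @ j_hat_fc - j_hat_ff @ j_hat_ff) / (n - 1)   # Eq. (18)
print(round(dvar_est, 3))                                      # negative: estimated benefit
```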

With analysis localization
To include analysis localization in the explicit global and explicit local methods, the only change needed is to swap K̃ in Equation (18) with the localized version K̃_loc from Equation (7).

Implicit sensitivity method
There is a good reason why the explicit methods shown in the previous subsections were not used in any study before Nomokonova et al. (2022): without analysis localization, the same result can be achieved without explicitly calculating the sensitivity. Instead, the variance reduction can be computed much more cheaply directly in observation space, without requiring an inversion of the background covariance matrix.

Without analysis localization
We follow the steps of Hakim et al. (2020) and also stay close to their notation, with the exception that we use ensemble deviations in observation space instead of the linear observation operator used by Hakim et al. (2020). We begin by expressing the forecast metric variances through B and A:

σ̂²_ff = s B s^T,  σ̂²_fc = s A s^T = s B s^T − s K δY_b δX_b^T s^T/(n − 1).  (21)

By replacing s through Equation (10), it is possible to eliminate the second inverse B, thanks to

s B = δj_ff δX_b^T/(n − 1).  (22)

The remaining inverse B can be eliminated through the δX_b δY_b^T at the start of the Kalman gain in the same way, resulting in the final form used by Hakim et al. (2020) and Tardif et al. (2021), which not only no longer contains an inverse of B but also reduces the equation from state space down to observation space:

Δσ̂² = −[δj_ff δY_b^T/(n − 1)] [δY_b δY_b^T/(n − 1) + R]^{-1} [δY_b δj_ff^T/(n − 1)].  (23)

This method can also be used iteratively, either with single observations or with groups of observations. When used iteratively, the model state δX_b, as well as the metric deviations δj_ff, need to be updated after each step to take the impact of the already processed observations into account. To do so, Hakim et al. (2020) and Tardif et al. (2021) append j_ff to the state vector and apply the EnSRF of Equation (4).
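A minimal sketch of this observation-space estimate, Equation (23), is given below with an illustrative toy ensemble; in the absence of localization and regularization it is equivalent to the explicit global estimate.

```python
# Minimal sketch of the implicit (observation-space) estimate of Equation (23):
# no inverse of B and no explicit sensitivity are needed.
import numpy as np

rng = np.random.default_rng(9)
m, n, r = 60, 32, 0.25
obs_idx = np.array([10, 30, 50])
dXb = rng.standard_normal((m, n))
dXb -= dXb.mean(axis=1, keepdims=True)
dYb = dXb[obs_idx]
R = r * np.eye(obs_idx.size)

dj_ff = dXb[20:40].sum(axis=0)                        # free-forecast metric deviations (band sum)

cov_jy = dj_ff @ dYb.T / (n - 1)                      # covariance of metric with obs equivalents
S = dYb @ dYb.T / (n - 1) + R
dvar_est = -cov_jy @ np.linalg.inv(S) @ cov_jy        # Eq. (23), always non-positive
print(round(dvar_est, 3))
```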

With analysis localization
Analysis localization affects the implicit sensitivity method through the localized Kalman gain K_loc of Equation (6). Crucially, when K_loc is inserted into the derivation of Equation (23), the inverse of B contained in the sensitivity (Equation 10) no longer cancels against the element-wise localized covariance L^mp ∘ (δX_b δY_b^T). As a result, the equation no longer reduces down to observation space.
But there is a way to avoid the inversion of B by using the localized sensitivity (Equation 15). The first step is to replace the first sensitivity in Equation (21) with the localized sensitivity. Keep in mind that, under ideal conditions (i.e., unlimited ensemble size and exactly known signal propagation), the localized and global sensitivities should be equal. The second step is to use the analysis localization matrix as the sensitivity localization matrix (C = L), so that the analysis localization in the Kalman gain balances out against the sensitivity localization. As a result, the inverse is no longer needed, as it can be approximated via a sum over the localized metric components (details in the Appendix). This approximation comes with the drawback that the estimated signal propagation needs to be applied to the sensitivity localization matrix (indicated by the arrow of ⃗L), as discussed in Sections 3.3.2 and 3.3.3. Using this approximation in Equation (25) yields the variance-reduction estimate of the implicit method with analysis localization, Equation (26). If the localized version is applied iteratively, then, in addition to the state vector, all the sub-components j_i also need to be updated iteratively. The updates of the state vector should occur using the original analysis localization matrix L^mp, whereas the propagated sensitivity localization matrix ⃗L^mp needs to be used for the j_i updates.

RESULTS
Now that we have introduced the model, the experimental set-up, and the three methods listed in Table 2, we can conduct the experiments needed to answer our research questions and determine how sensitive the methods are to changes in localization length, observation type, lead time, and ensemble size. First, we showcase the experimental set-up and the different observation types using scatter plots. All later figures show only the mean and root-mean-square error (RMSE) statistics of the variance reduction.
To illustrate the experimental set-up described in Section 2.4, we show the actual variance reduction of each of the 900 OSSEs versus the estimated variance reduction for lead times of 300 and 1,000 s (Figure 10). All other parameters are the same as in the default set-up; for example, a localization length of 3,200 m and an ensemble size of 32. Since the variance reduction is defined as a decrease, an overestimated variance reduction means that the benefit of the observations was overestimated; that is, the dot lies below the dashed 1:1 line in Figure 10. Vice versa, underestimated variance reductions lie above the dashed line.
The first aspect we want to highlight is that even though the observations' number, precision, and locations remain constant, the variance reduction varies substantially between experiments for the same lead time. Accordingly, the benefit of the observations varies greatly depending on the current background and free forecast. As a result of this variability, many experiments are needed to provide statistically robust results.
As expected, all methods are more precise for the shorter lead time of 300 s (Figure 10a-c). The forecast metric and its spread evolve nonlinearly in time, as the regions of greatest spread along the flanks of φ advect into and out of the limited response-function domain (see Figure 3). The shorter the forecast lead time, the less nonlinear the relationship between background state and forecast metric is, and therefore the better the basic linearity assumptions are fulfilled (Equation 8). The well-estimated variance reduction for the 300 s lead time shows that all methods properly account for analysis localization and that no substantial systematic errors were introduced by regularizing the ensemble sensitivity or estimating the signal propagation.

Figure 10: Scatter plots of variance reduction versus estimated variance reduction for forecast lead times of (a-c) 300 s and (d-f) 1,000 s. The observing-system simulation experiment settings other than the lead time are the default values listed in Table 1. (a, d) Explicit global method; (b, e) implicit method; (c, f) explicit local method (Table 2). The dashed line marks the 1:1 line, the white dot the mean over all experiments, and the black line the linear regression. As our forecast metric has no unit, the variance reduction also has no unit. ME: mean error; RMSE: root-mean-square error.
For the longer lead time, the RMSE of all methods increases. This increase again matches our expectations, as we have already shown that the ensemble sensitivities deteriorate with lead time (Figure 6). All methods lead to overly conservative estimates, in that they underestimate the largest variance reductions and overestimate the weakest ones (visible in the linear regressions tilting away from the diagonal in Figure 10d-f). The mismatch is most apparent when the true variance increases. A variance increase occurs when adding observations leads to a larger spread in the forecast metric. In these cases, the assumption that the linear sensitivity linking δX_b to δj_ff also applies to δX_a and δj_fc breaks down. Despite all methods struggling with the strongest and weakest variance reductions, the mean error increases much more slowly with lead time than the RMSE. This indicates that ensemble estimates of the mean variance reduction can be useful even at long lead times, although more experiments are needed to achieve robust results. The differences between the methods are too small to be seen by eye in the scatter plots and only reveal themselves in the statistical values of the errors and the slope of the regression. Although the implicit method has a slightly smaller mean error and RMSE than the other two methods, we will see that it does not systematically outperform the explicit methods.

Observation types
To assess whether any of the methods is sensitive to the observation type used, we ran a test using only the direct or only the indirect observations (Figure 11). Since fewer observations are used, the resulting variance reductions are lower (compared with Figure 10). It is not a coincidence that both observation types result in a similar variance reduction on average, as we designed the experimental set-up to achieve this. However, our results would not change qualitatively if the observation types had unequal contributions. All methods have a slightly lower RMSE for the direct observations. The higher RMSE for the indirect observations could be caused by these observations more often leading to a variance increase, which none of the methods can, by design, capture. But the main result of this experiment is that all methods perform similarly well for both types of observations. This supports that the derivation of the variance-reduction methods using the observation equivalents Y is valid, and that no linearized H operator is needed for nonlinear observations.

OSSE settings
In this subsection, we check how the three methods to estimate the variance reduction cope with changes to the OSSE settings; namely, analysis localization length, forecast lead time, and ensemble size. We rerun the test shown in Figure 10b,e with analysis localization lengths ranging from 1,024 grid points down to 1 grid point (Figure 12a,d). Remember that, since the whole domain is only 300 grid points long and the localization weights only become zero after two localization lengths, the differences between a localization length of 1,024 grid points and no localization are negligible.
The smaller the localization applied, the less impact the observations have and the smaller the resulting variance reduction.
We conclude that all three methods successfully account for analysis localization if signal propagation is known. All three methods capture the decrease in variance reduction with analysis localization length, and all lie within or close to the shading marking the sample mean standard deviation for all lengths (Figure 12a). And though there are some visible differences in the mean variance reduction for the largest analysis localization lengths, the RMSE shows that all three methods have remarkably similar performance (Figure 12d).
Next, we examine how the variance-reduction estimates change with lead time. With increased lead time, the relative variance reduction decreases, as the observations' relative impact shrinks compared with the growing ensemble spread. For lead times from 1 to 1,000 s (Figure 12b,e), we see results that match what we deduced from Figure 10. All methods do well at estimating the mean variance reduction for all lead times (within the shading of the sample-mean standard deviation), but the RMSE increases with lead time. Again, all three methods have similar RMSEs, but a noteworthy detail is that the explicit method with a localized sensitivity (explicit local) has a much smaller RMSE relative to the other methods for the shortest lead time of 1 s. We attribute this to the sensitivity localization successfully filtering out spurious covariances, which are the only error source in the estimates at such small lead times. This also explains why the implicit approach, which combines a localized with a non-localized sensitivity, Equation (26), performs better than the explicit global method for these short lead times. That the RMSE of all methods has a wave-like shape, with a small dip in RMSE for a lead time of 900 s, arises from the placement of the point observations in the default model set-up (Figure 1) and the OSSEs (Figure 3). Randomizing the observation locations would result in a monotonic increase in RMSE (not shown), but the wave-like error increase nicely illustrates the complexity even such a simple toy model can develop.

Figure 12: Dependence of the true and estimated variance reduction on (a, d) analysis localization length, (b, e) forecast lead time, and (c, f) ensemble size. The mean variance reduction is divided by the mean total variance and given in percent. As our forecast metric has no unit, the RMSE is also unitless. The shading around the truth shows the sample mean standard deviation; that is, the standard deviation over the 900 observing-system simulation experiments (OSSEs) divided by √900. For the ensemble size test, 3,600 OSSEs were used and an additional method is shown: the global explicit method with a regularization parameter of 1 (hi reg) instead of the default 0.1. The vertical dashed line marks the default value used for the other experiments, as listed in Table 1. See Section 2.4 for details on how the experiments are set up. The methods used to estimate the variance reduction are listed in Table 2.
Finally, we check how our methods are affected by the ensemble size. To make the experiments with differing ensemble sizes comparable to each other, all experiments use the same ensemble size of 512 for the default run to generate the background ensembles. The desired number of members is then randomly drawn from the 512 available ensemble members. We also increased the number of experiments from 900 to 3,600 because the OSSEs with small ensembles exhibit a large spread.
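As an illustration of this set-up, the snippet below draws a sub-ensemble from a 512-member background. The array shapes and the choice of sampling without replacement are our assumptions, made for illustration only.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    X_full = rng.standard_normal((300, 512))    # placeholder for the 512-member background
                                                # ensemble (state dimension x members)

    def subsample_members(X_full, n_members, rng):
        # Randomly draw n_members members (columns) without replacement.
        idx = rng.choice(X_full.shape[1], size=n_members, replace=False)
        return X_full[:, idx]

    X_sub = subsample_members(X_full, 32, rng)  # e.g. a 32-member sub-ensemble
    print(X_sub.shape)                          # (300, 32)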
The true variance reduction is only weakly affected by the ensemble size, with the variance reduction increasing slightly for ensemble sizes of 16 and below (Figure 12c). All three methods show the same qualitative behavior, but with larger amplitude. It is not unexpected that all three methods overestimate the benefit of observations for the smallest ensemble size, as the relative impact of spurious covariances increases and the ensemble sensitivity is overestimated. For these small ensembles, the benefit of localizing the sensitivity is strongest, which explains why the implicit and explicit local methods substantially outperform the explicit global method (Figure 12f). But it is possible to counter the overestimation of the explicit global approach by increasing the regularization strength (thin purple line in Figure 12c,f). All three methods benefit from increasing the ensemble size, but the changes are minute beyond 32 members. That the RMSE does not approach zero shows the limit of the linear assumptions that all methods are based on, which cannot be overcome by adding ensemble members. Given that our state size is only 300, the 32 members after which the mean and RMSE stabilize correspond to a ratio of ensemble size to state-space size of roughly 0.1. Such a ratio is difficult to achieve in common NWP applications. But in our companion study using the 1,000-member ensemble, we achieve an even higher ratio of 0.2 by reducing the state size.
Looking across all three experiments shown in Figure 12, no method outperforms the others noticeably. The only condition under which a method performs substantially differently is the small-ensemble case, where the explicit global method performs poorly; this can be partially compensated for by regularizing the sensitivity more strongly.

Regularization
This subsection examines how sensitive the explicit global method is to the regularization used and explains why we chose 0.1 as the default value of the regularization parameter. The explicit local method also requires regularization, but the quantitative effect of regularization varies with the sensitivity localization (Figure 9), so we focus on the less complicated explicit global method. We varied the regularization parameter from 10⁻⁸ to 10² and ran experiments for five lead times ranging from 200 to 1,800 s. For the highest values, the sensitivity approaches zero; as a result, the estimated variance reduction also becomes zero (Figure 13). This result is not desired and is caused by regularizing too strongly. If too little regularization is applied, the mean variance reduction becomes positive; that is, the estimates falsely indicate a variance increase. These positive values are caused by high absolute sensitivity values resulting from the oscillatory behavior visible in Figure 6c. As the regularization parameter approaches zero, the absolute values of s reached 10⁶ for a 600 s lead time (not shown). The longer the lead time, the more regularization is needed to avoid positive mean variance-reduction estimates. Moreover, both the mean variance reduction and the RMSE of the variance reduction are less sensitive to the regularization for shorter lead times. Errors vary only slightly for values between 10⁻⁷ and 10⁻¹ for the 200 s lead time; for a lead time of 1,000 s, however, changing the regularization parameter by one order of magnitude has a substantial impact. For the default value of 0.1, marked by the dotted line in Figure 13, the average overestimation of the variance reduction grows with lead time. This matches what we saw in Figure 10: the basic linear assumptions of the variance-reduction estimates break down for long lead times, and adding observations no longer reliably leads to a variance reduction in the forecast. This overestimation can be compensated for by increasing the regularization, thus weakening the ensemble sensitivities and reducing both the mean error and the RMSE (Figure 13b).
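To make the role of the regularization parameter concrete, the following sketch computes an ensemble sensitivity with a simple Tikhonov-style regularization, that is, a multiple of the mean background variance added to the diagonal of B before the inversion. The exact regularization used in the paper may differ; the variable names and the scaling are our assumptions.

    import numpy as np

    def ensemble_sensitivity(Xb, j, gamma):
        # Xb: background ensemble (state dim m x members n); j: forecast metric per member (n,)
        # gamma: regularization strength (e.g. the default 0.1 or the 'hi reg' value of 1)
        n = Xb.shape[1]
        Xp = Xb - Xb.mean(axis=1, keepdims=True)   # ensemble perturbations
        jp = j - j.mean()
        B = Xp @ Xp.T / (n - 1)                    # sampled background covariance (rank deficient for n < m)
        c = Xp @ jp / (n - 1)                      # covariance between background state and forecast metric
        ridge = gamma * np.mean(np.diag(B)) * np.eye(B.shape[0])
        return np.linalg.solve(B + ridge, c)       # sensitivity vector s

In this form, the two limits discussed above are easy to see: for very large gamma the ridge term dominates and s shrinks towards zero, whereas for gamma approaching zero the ill-conditioned B admits the oscillatory, very large sensitivities described above.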
Although we know that the ideal regularization parameter changes with lead time and ensemble size (Figures 13 and 12c,f), we decided to choose a single value that seems suitable for lead times up to 1,000 s with 32 ensemble members. The value of 0.1 was selected because it results in plausible ensemble sensitivities, and the resulting mean errors and RMSEs are close to the minimum values shown in Figure 13. As the motivation of the ensemble variance-reduction estimates is to avoid running OSSEs, it is likely not possible to evaluate the regularization strength based on the resulting error. In that case, we recommend choosing the strongest regularization that does not substantially decrease the mean variance reduction, as overregularization is less prone to cause high errors than underregularization is. We used this approach to determine the value for our study of the potential of wind lidar observations for improving wind forecasts (Nomokonova et al., 2022).
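The recommendation above can be expressed as a small selection rule. The tolerance and the reference point (the weakest regularization tried) are our choices, not values from the paper, and the variance-reduction numbers below are purely hypothetical.

    import numpy as np

    def pick_regularization(gammas, mean_reduction, tol=0.05):
        # Choose the strongest regularization whose mean variance-reduction estimate stays
        # within a relative tolerance of the weakly regularized reference value.
        gammas = np.asarray(gammas, float)
        mean_reduction = np.asarray(mean_reduction, float)
        order = np.argsort(gammas)
        gammas, mean_reduction = gammas[order], mean_reduction[order]
        reference = mean_reduction[0]
        keep = np.abs(mean_reduction - reference) <= tol * np.abs(reference)
        return gammas[keep].max()

    # Hypothetical scan: estimates barely change for weak regularization and
    # collapse towards zero (no estimated reduction) when regularizing too strongly.
    gammas = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1]
    mean_red = [4.1, 4.0, 3.95, 3.6, 2.0, 0.3]      # |mean variance reduction| in percent
    print(pick_regularization(gammas, mean_red))    # -> 0.01 for these made-up numbers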

Misjudging signal propagation
We have shown in Section 4.2 that, for small ensemble sizes, the methods that contain some form of sensitivity localization (implicit and explicit local) outperform the explicit global approach. But in those experiments, the correct signal propagation was used, which is impossible to estimate accurately in an NWP model (see Section 3.3.3).
In this subsection, we determine how over- or underestimating signal propagation (see Figure 8) affects the implicit and the explicit local methods. We begin by analyzing the mean error of the implicit method for ensemble sizes from 4 to 128 (Figure 14a). For no signal propagation error, the results match what we saw in Figure 12, with observations being given too much impact for small ensembles and slightly too little impact for larger ensembles. Both over- and underestimating the signal propagation decrease the variance-reduction estimates. This is because the signal propagation error causes the implicit method to look for correlations between areas of the background and the free forecast that do not affect each other. Given that the estimated variance reduction is greatest for the correct signal propagation, this could be used in reverse to estimate signal propagation. However, since this relationship only holds when averaging over a large number of experiments, it can only provide information on the mean signal propagation, similar to the temporal localization of Anderson (2007) and Gasperoni and Wang (2015). By comparing the RMSE of the implicit method with that of the explicit global method, which does not rely on estimated signal propagation (Figure 14b), we find a clear dependence on ensemble size. For the smallest ensembles, the benefit of localizing the sensitivity is so large that the implicit method outperforms the explicit global method even if the assumed signal propagation is 60% higher or lower than the truth. With increasing ensemble size, the explicit global method becomes better, and the implicit method needs a more exact signal propagation estimate to compete. At 32 ensemble members and higher, misjudging the signal propagation by just 10% already leads to the implicit method performing worse than the explicit global method.
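For concreteness, one way to impose such a relative error is sketched below. It assumes that the propagation distance supplied to the implicit (or explicit local) method is simply the advection speed times the lead time, perturbed by a relative error, and that this distance shifts the centre of the localization taper; the actual construction used for Figure 8 may differ, and all numbers are illustrative.

    def assumed_propagation(advection_speed, lead_time, dx, rel_error):
        # Assumed propagation distance (in grid points) handed to the implicit method,
        # with a relative error such as -0.6/+0.6 (+-60%) or -0.1/+0.1 (+-10%).
        return advection_speed * lead_time * (1.0 + rel_error) / dx

    # Purely illustrative numbers: 10 m/s advection, 600 s lead time, 1,000 m grid spacing.
    shift_under = assumed_propagation(10.0, 600.0, 1000.0, -0.6)   # 2.4 grid points instead of 6
    # The localization between an observation at grid point i_obs and the forecast metric
    # would then be centred at i_obs + shift, e.g. using the gaspari_cohn taper sketched above:
    #   weights = gaspari_cohn(grid_indices - (i_obs + shift_under), loc_length)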
There are two other variables that affect the impact of misjudging signal propagation. The first is the lead time. For longer lead times, signals propagate further, so an over- or underestimation of the propagation speed has a larger effect. For short enough lead times, the signal has so little time to propagate that misjudging the propagation speed has barely any effect (not shown). The second variable is the sensitivity localization length. This is best illustrated using the explicit local method, as its sensitivity localization length can be modified independently of the analysis localization length; in the implicit method (Table 2), the sensitivity localization length must match the given analysis localization length.

F I G U R E 14
…the RMSE of the explicit method is constant in each column. To compute the relative RMSE difference, we divide the difference by the RMSE of the explicit method.
The smaller the sensitivity localization length, the larger the penalty for misjudging signal propagation (Figure 15). In contrast, longer length scales are less affected by signal propagation errors. But using longer sensitivity localization lengths comes with a trade-off because the benefit of sensitivity localization decreases with the length scale (as shown in Figure 9). The set-up shown in Figure 15 has a small ensemble size of eight, for which sensitivity localization is highly beneficial. One advantage of the explicit local method over the implicit method is that the user can attempt to balance signal propagation errors by using longer sensitivity localization lengths. But without OSSEs, it will be difficult to judge which settings result in the best estimates.

F I G U R E 15
Difference in the root-mean-square error (RMSE) of variance reduction between the explicit local and explicit global method. The RMSE differences are divided by the RMSE of the explicit global approach, resulting in a relative RMSE difference fraction. The sensitivity localization and signal propagation error only affect the explicit local method (Table 2). The observing-system simulation experiment settings are the default values listed in Table 1, except for a smaller ensemble size of n = 8. As our forecast metric has no unit, the RMSE is also unitless.

Conclusions
This article presents and evaluates three methods to estimate the contribution of potential observations to the reduction in forecast error: explicit global, explicit local, and implicit. All methods use an ensemble-based approach, where forecast error reduction is estimated based on the ensemble variance reduction of a scalar forecast metric. The explicit global method requires the background covariance matrix to be inverted, the implicit method requires signal propagation to be provided, and the explicit local method requires both the inversion and the propagation to be provided. In contrast to preceding studies, we take the analysis covariance localization of the assimilation system into account, which is necessary to achieve consistent variance-reduction estimates. We test these methods using a newly developed linear advection toy model. For each set of parameters to test, we ran 900 or more individual OSSEs that include two types of observations: one type with a linear and one type with a nonlinear observation operator. By comparing the estimated variance reduction to the variance reduction in the OSSEs, we conclude the following:
• All methods successfully estimated the correct mean variance reduction for a wide range of analysis localization lengths.
• All methods worked similarly well for a linear and a nonlinear observation operator.
• All methods benefit from larger ensemble sizes, but the gain becomes marginal beyond 32 members in our set-up.
• The RMSE of variance-reduction estimates increases much faster with lead time than the mean error does, for all methods. This means that observation targeting is only reliable for short lead times, whereas reliable mean estimates can still be achieved for longer lead times if sufficient experiments are conducted.
• Some form of regularization is needed to calculate meaningful ensemble sensitivities. Sensitivity localization can help reduce sampling noise but requires estimating signal propagation.
• The explicit methods falsely estimate variance increases if the sensitivity is underregularized. Overregularizing results in too weak variance-reduction estimates.
• The explicit global method, which contains no sensitivity localization, is more affected by sampling noise, making it more sensitive to ensemble size. In the absence of sensitivity localization, the regularization helps mitigate sampling noise.
• The implicit and explicit local methods will underestimate the variance reduction if the signal propagation is over- or underestimated. The RMSEs increase with over- or underestimation of the signal propagation.
• The larger the chosen sensitivity localization length, the less sensitive the variance-reduction estimates are to errors in the assumed signal propagation. But larger sensitivity localization lengths are less effective at mitigating sampling errors.

Discussion
We have shown that all methods work well for our toy model and could, in theory, improve the estimated variance reductions for NWP applications, such as observation targeting (e.g., Romine et al., 2016; Hill et al., 2020) and observational network design (e.g., Hakim et al., 2020). However, the methods have specific strengths and weaknesses that make them more or less suited to specific NWP applications.
The strengths of the implicit approach are that it is much cheaper than the explicit methods and that it outperforms the explicit global method for small ensemble sizes. Its main weakness is that it requires estimates of signal propagation, which, as discussed, are exceedingly difficult to obtain for many NWP applications. If the signal propagation can be provided accurately, however, the implicit method retains its advantage over the explicit global method, especially for small ensemble sizes.
Another drawback of the implicit method is that it can only be applied to forecast metrics that can be broken down into local components. In contrast, the explicit global method can be applied to any response function, such as gradients, variances, exponentials of sums, or maximum values, although the more nonlinear the response function is, the poorer the estimates are likely to become. Although the implicit method could be applied to time integrals, such as accumulated rainfall, accounting for signal propagation would then be even more difficult. For qualitative studies that compare the benefit of different observations, we recommend applying the implicit method only to the same type of observation (e.g., two dropsondes at different locations) and not to differing observation types (e.g., dropsonde vs. surface observation), because the errors due to estimating or neglecting the signal propagation are likely to be more similar for the same observation type.
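To illustrate the distinction, the snippet below contrasts a forecast metric that decomposes into per-grid-point components (a domain mean) with one that does not (a domain maximum); the arrays are synthetic stand-ins.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 32))          # forecast ensemble (grid points x members)

    # Decomposable metric: a domain mean splits into local components j_i = x_i / m,
    # so the implicit method can be applied.
    j_components = X / X.shape[0]               # j_i for every grid point and member
    j_mean = j_components.sum(axis=0)           # identical to X.mean(axis=0)

    # Non-decomposable metric: for the domain maximum, which grid point contributes
    # depends on the whole field, so there is no fixed local decomposition and only
    # the explicit global method applies (with the nonlinearity caveat noted above).
    j_max = X.max(axis=0)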
The greatest weakness of the explicit methods is that they require inverting the ensemble state covariance matrix. This introduces a tuning parameter needed to regularize the matrix, and the computational complexity of inverting an m × m matrix through Gaussian elimination scales with m³. Although the computational cost varies with the machine architecture, covariance matrix, and algorithm used, applying the explicit methods to an NWP model requires reducing the state space to an invertible size. One possibility is to remove unnecessary variables and crop the domain horizontally and vertically, which also helps avoid spurious long-range correlations. Grid subsampling is another possibility, although we recommend using a subsampled grid resolution that is finer than the localization length. To provide a rough frame of reference, we inverted an m = 5,000 matrix on the order of seconds using a single processor and standard Python libraries, which was the size used in our companion paper (Nomokonova et al., 2022). Although computing the ensemble sensitivity can be expensive, once computed, the same sensitivity can be reused to estimate the variance reduction of various combinations of observation types and locations for the same forecast time.
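As a rough illustration of the cost statement above, the snippet below times the factorization-based solve and the explicit inversion of a symmetric 5,000 × 5,000 matrix with standard NumPy; absolute timings depend on the machine and the BLAS library, so the numbers are only indicative.

    import time
    import numpy as np

    m = 5000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, 64))
    B = A @ A.T / 63 + 0.1 * np.eye(m)               # regularized, positive-definite stand-in
                                                     # for an ensemble state covariance matrix

    t0 = time.perf_counter()
    s = np.linalg.solve(B, rng.standard_normal(m))   # solve B s = c instead of forming B^-1
    t_solve = time.perf_counter() - t0

    t0 = time.perf_counter()
    B_inv = np.linalg.inv(B)                         # explicit inverse, O(m^3)
    t_inv = time.perf_counter() - t0

    print(f"solve: {t_solve:.1f} s, explicit inverse: {t_inv:.1f} s")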
Based on the characteristics listed, we recommend the implicit method for short lead times when signal propagation is considerably smaller than the analysis localization length and when the ensemble size is small. If sufficient ensemble members are available, we recommend the explicit global method, especially for longer lead times with uncertain signal propagation. The explicit local method has the combined drawbacks of the implicit and explicit global methods, as well as an additional free parameter in the sensitivity localization length. Despite having the highest overhead, the explicit local method did not substantially outperform the other methods in any of our tests. Therefore, we only recommend using the explicit local method if it is crucial to set the sensitivity localization independent of the analysis localization. This independence could be helpful for small ensembles if the expected signal propagation distance is multiple times the analysis localization length.
Our toy model can offer no guidance on how many ensemble members are needed to achieve robust results for various NWP applications. We expect the number of ensemble members needed to vary with lead time, forecast metric, and the state of the atmosphere. Climatological or time-lagged ensembles could possibly provide sufficiently large ensembles in an operational setting, but using a climatological ensemble will strongly affect signal propagation. Although only data-denial experiments can objectively evaluate how accurate the variance-reduction methods are for NWP applications, under specific conditions all methods can provide a cost-efficient alternative to classic OSSEs. This increase in efficiency allows evaluation of many more possible observational networks under a wider range of meteorological conditions. In our companion paper, we achieved robust results by applying the explicit global method to a 1,000-member forecast to estimate the benefit of wind lidar observations on short-term wind forecasts (Nomokonova et al., 2022).
AUTHOR CONTRIBUTIONS
Philipp J. Griewank: conceptualization; data curation; formal analysis; methodology; software; visualization; writing - original draft. Martin Weissmann: conceptualization; funding acquisition; methodology; writing - review and editing. Tobias Necker: conceptualization; methodology; writing - review and editing. Tatiana Nomokonova: methodology; writing - review and editing. Ulrich Löhnert: conceptualization; funding acquisition; writing - review and editing.

ACKNOWLEDGEMENTS
We thank our two anonymous reviewers, whose encouraging feedback led to substantial changes. This work was supported by the Hans-Ertel-Centre for Weather Research, funded by the German Federal Ministry for Transportation and Digital Infrastructure (grant number BMVI/DWD 4818DWDP5A). Finally, we would like to thank the community that develops and maintains the free software our code is based on, most importantly the Python packages matplotlib (Hunter, 2007), NumPy, and seaborn (Waskom, 2021).

DATA AVAILABILITY STATEMENT
The full code needed to reproduce the results and plots in this paper is available on Zenodo under the DOI https://doi.org/10.5281/zenodo.7891000.

A.1 IMPLICIT METHOD
Here, we show under which conditions analysis localization can be accounted for correctly in the implicit method. To simplify the notation, we refer to the analysis localization matrix from observation to state space, L_mp, as L, and we drop the indices from X_b, Y_b, and j_ff. The only index, i, refers to the index in the state space of size m. The prerequisite for this method is that the forecast metric can be split into individual grid-point components, j = ∑_i j_i, with the ensemble deviation of the forecast metric at grid point i given by δj_i = j_i − j̄_i, that is, the deviation from its ensemble mean.
If the forecast metric at each grid point is a linear function of the state variable at that grid point, Equation (A.1) is exact. Writing out this requirement of local linearity and applying it to all ensemble members at a specific grid point yields a sensitivity that consists of the scalar sensitivity at that grid point, s_i, multiplied by the unit vector e_i. To prove Equation (A.1), we show that the corresponding relation holds for each point i of the state vector.
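The prerequisite stated above can be checked numerically. The sketch below constructs a forecast metric whose grid-point components are linear in the local state and verifies that the member deviations of each component equal the local sensitivity times the local state deviation, and that the component deviations sum to the deviation of the total metric; all values are synthetic.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 300, 32
    X = rng.standard_normal((m, n))             # forecast ensemble at the metric's valid time
    s = rng.standard_normal(m)                  # scalar sensitivities s_i at each grid point

    J = s[:, None] * X                          # local components j_i = s_i * x_i
    dX = X - X.mean(axis=1, keepdims=True)      # state deviations
    dJ = J - J.mean(axis=1, keepdims=True)      # component deviations

    assert np.allclose(dJ, s[:, None] * dX)     # local linearity of each component
    j_total = J.sum(axis=0)
    assert np.allclose(dJ.sum(axis=0), j_total - j_total.mean())  # deviations add up to the total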

A.2 SENSITIVITY LOCALIZATION ISSUES
Here, we illustrate the effect of only partially localizing the sensitivity or of not applying regularization. The first variation is to localize only the covariance between the background and the forecast metric. As a result, the non-localized B spreads the localized forecast-metric variances throughout the whole domain (Figure A.1a), and the resulting sensitivities match our expectations neither quantitatively nor qualitatively. The second variation localizes only the B matrix. This variation has two large benefits over the first: one does not need to decompose the forecast metric into individual subcomponents, and no estimate of the signal propagation is required to obtain the vector C. The stronger B is localized, the more of its covariances are neglected and the stronger the resulting sensitivities become (Figure A.1b). Although this variant yields more qualitatively useful sensitivities than the first, we cannot easily identify an application for which this partial localization would be suited.
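As a reading aid, the sketch below spells out one plausible form of the two partially localized sensitivities discussed above; the Schur (element-wise) product is written with `*` on NumPy arrays, and the variable names, regularization form, and exact placement of the localization are our assumptions rather than the paper's definitions.

    import numpy as np

    def partial_localization_variants(Xb, j, L_state, L_vec, gamma):
        # Xb: background ensemble (m x n); j: forecast metric per member (n,)
        # L_state: m x m localization matrix; L_vec: length-m localization of cov(Xb, j)
        n = Xb.shape[1]
        Xp = Xb - Xb.mean(axis=1, keepdims=True)
        jp = j - j.mean()
        B = Xp @ Xp.T / (n - 1)
        c = Xp @ jp / (n - 1)
        ridge = gamma * np.mean(np.diag(B)) * np.eye(B.shape[0])

        # Variant 1: localize only the covariance between background and forecast metric;
        # the non-localized B then spreads the localized signal over the whole domain.
        s_variant1 = np.linalg.solve(B + ridge, L_vec * c)

        # Variant 2: localize only B; no decomposition of the forecast metric and no
        # signal-propagation estimate are needed, but neglected covariances strengthen s.
        s_variant2 = np.linalg.solve(L_state * B + ridge, c)
        return s_variant1, s_variant2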
Lastly, we illustrate what happens if no regularization is applied when calculating s_loc. When B is localized, regularization is not strictly required, as the localized B matrix is no longer rank deficient. Indeed, for the two smaller localization lengths, the resulting sensitivity matches our expectations; but for longer localization lengths, the sensitivity develops the large oscillations that the regularization is required to dampen (Figure A.1c). So, although it is possible to leave the localized sensitivity unregularized, there is no guarantee that the resulting sensitivity is meaningful.