Improving ensemble data assimilation through Probit-space Ensemble Size Expansion for Gaussian Copulas (PESE-GC)

Abstract. Small forecast ensemble sizes (< 100) are common in the ensemble data assimilation (EnsDA) component of geophysical forecast systems, thus limiting the error-constraining power of EnsDA. This study proposes an efficient and embarrassingly parallel method to generate additional ensemble members: the Probit-space Ensemble Size Expansion for Gaussian Copulas (PESE-GC; "peace gee see"). Such members are called "virtual members". PESE-GC utilizes the users' knowledge of the marginal distributions of forecast model variables. Virtual members can be generated from any (potentially non-Gaussian) multivariate forecast distribution that has a Gaussian copula. PESE-GC's impact on EnsDA is evaluated using the 40-variable Lorenz 1996 model, several EnsDA algorithms, several observation operators, a range of EnsDA cycling intervals, and a range of forecast ensemble sizes. Significant improvements to EnsDA (p < 0.01) are observed when either (1) the forecast ensemble size is small (≤ 20 members), (2) the user selects marginal distributions that improve the forecast model variable statistics, and/or (3) the rank histogram filter is used with non-parametric priors in high-forecast-spread situations. These results motivate development and testing of PESE-GC for EnsDA with high-order geophysical models.


Introduction
Geophysical forecast models are often computationally expensive to run. As a result, geophysical ensemble data assimilation (EnsDA) typically uses <100 forecast ensemble members. Such small forecast ensemble sizes result in sampling errors that degrade the performance of EnsDA. As such, low-cost methods that introduce additional ensemble members (henceforth, "virtual members") alongside the original forecast members (henceforth, "forecast members") have the potential to improve EnsDA.
Several types of ensemble expansion methods have been proposed in the literature, all of which have strengths and weaknesses. The first type is random draws from climatology (Castruccio et al., 2020; El Gharamti, 2020; Lei et al., 2021). Though computationally efficient, this type of ensemble expansion method cannot generate members with flow-dependent ensemble statistics.
An alternative type of ensemble expansion method is to aggregate forecast ensemble members across time (Xu et al., 2008; Yuan et al., 2009; Huang and Wang, 2018; Gasperoni et al., 2022). Though this type of method efficiently produces members with flow-dependent statistics, the number of virtual members created is limited (Huang and Wang, 2018; Gasperoni et al., 2022).
A third type of ensemble expansion method is to search a historical catalog for forecast states similar to the current forecast or observations (Van den Dool, 1994; Tippett and Delsole, 2013; Monache et al., 2013; Wan and Van Der Merwe, 2000; Grooms, 2021; Sun et al., 2022). The virtual members resulting from this search have flow-dependent statistics. Though such methods are historically expensive to employ, ongoing research may render them affordable in the near future (e.g., Sun et al. (2024)).
Ensemble modulation (Bishop and Hodyss, 2009, 2011; Bishop et al., 2017; Kotsuki and Bishop, 2022; Wang et al., 2021) is the fourth type of ensemble expansion method. Such methods expand ensembles by combining a localization matrix with the original ensembles' perturbations (see Bishop and Hodyss (2009) for details). Though the expanded ensemble possesses the same mean and variance as the original ensemble, the expanded ensemble's kurtosis can be much larger than the original ensemble's (see the Supplement). In other words, the expanded ensemble's kurtosis is likely biased. If nonlinear observation operators are applied to the expanded ensemble, this kurtosis bias will result in biased expanded-ensemble observation statistics (personal communication with Craig Bishop and Lili Lei).
The shortcomings of existing ensemble expansion methods motivate the development of a new ensemble expansion method.
This study proposes a new ensemble expansion method that explicitly utilizes the users' knowledge of prior marginals: the Probit-space Ensemble Size Expansion for Gaussian Copulas (PESE-GC). PESE-GC constructs virtual members using a generalization of the efficient and scalable Gaussian resampling algorithm of Chan et al. (2020) (henceforth, the CAC2020 algorithm). Unlike existing methods, PESE-GC efficiently and scalably generates an unlimited number of virtual members with flow-dependent statistics. Furthermore, PESE-GC produces virtual members that are consistent with user-specified prior marginals, and handles multivariate Gaussian distributions and many multivariate non-Gaussian distributions. PESE-GC is applied before running any observation operators for EnsDA, and the analyzed virtual members are discarded before generating the next forecast ensemble (see Fig. 1). To the author's knowledge, no other ensemble expansion method simultaneously possesses the same efficiency, scalability, generality, and flow-dependency as PESE-GC.
The remainder of this publication is divided into five sections. Section 2 discusses the formulation of PESE-GC, and its computational complexity and scalability. PESE-GC's impacts on EnsDA are discussed and illustrated in Section 3. PESE-GC is then tested with EnsDA using the Lorenz 1996 model in Section 4. Section 5.1 discusses an important caveat of the PESE-GC method. This publication then ends in Section 6 with a summary and a discussion of avenues for future research.
2 Formulation of PESE-GC

Figure 1. An illustration of how PESE-GC can be integrated into a typical EnsDA cyclic workflow. This workflow is meant to be read starting from the green box labeled "START". The arrows indicate the movement of various kinds of information (see legend). For example, the fat orange arrows indicate that the virtual members are created by PESE-GC (red polygon), passed to the observation operators (purple rounded box), passed to the EnsDA algorithm (brown rounded box), and then removed before applying the forecast model (black polygon). "Obs" stands for "observation" and "ens." stands for "ensemble".

The CAC2020 algorithm
The CAC2020 algorithm constructs Gaussian-distributed virtual members through linear combinations of the forecast ensemble perturbations. The resulting expanded ensemble has the same mean state and covariance matrix as the forecast ensemble. The CAC2020 algorithm was first formulated by Chan et al. (2020), and a more comprehensive derivation is presented in Chapter 7 of Chan (2022).
To write down the CAC2020 algorithm, a notation similar to that of Ide et al. (1997) is adopted. The N_V virtual members are constructed as linear combinations of the forecast ensemble perturbations, and the mean of the virtual members is the same as the mean of the forecast members [Eq. (3)].

Step 1 of the CAC2020 algorithm
The CAC2020 algorithm constructs N_V virtual members from N_E ensemble members using a three-step procedure. First, a weight matrix W is constructed, where all of W's entries ω_i,j are independently and identically distributed (i.i.d.) samples drawn from the standard normal distribution. In other words, all ω_i,j's are mutually independent.

Step 2 of the CAC2020 algorithm
The CAC2020 algorithm's second step is to generate the virtual members' perturbations by linearly combining the forecast ensemble perturbations.

Step 3 of the CAC2020 algorithm
In the third and final step, the CAC2020 algorithm generates the N_V virtual members from those perturbations.

2.1.4 On the W used in the CAC2020 algorithm's step 1

Note that this study's (and CAC2020's) W differs from that of Chan et al. (2023) (henceforth, "CAC2023"). This is because CAC2023's W [defined in Eq. (6) of CAC2023] generates virtual members with undesirable non-Gaussian properties.

Figure 2. Panels a and b demonstrate drawing virtual members using CAC2020's W [Eq. (7)] and CAC2023's W [Eq. (10)]. The former produces Gaussian virtual members while the latter produces non-Gaussian virtual members. Panels c and d demonstrate the importance of ensuring that the forecast ensemble in probit space has unit variance. If the probit-space forecast ensemble's variance is not unity, the virtual members generated by PESE-GC will deviate from the fitted marginal PDFs (a rank histogram in the case of panels c and d).
CAC2023's W can be written as Eq. (10), where δ_i,j is the Kronecker delta.
To illustrate the issue with CAC2023's W, suppose 5 forecast members are drawn from a standard normal distribution and 10^7 virtual members are generated using the CAC2020 algorithm with CAC2023's W. Though the expanded ensemble's mean and variance are correct (zero and unity, respectively), the virtual members' histogram-estimated PDF (blue curve in Fig. 2b) is incorrect (not standard normal). In contrast, using the W defined in Eq. (7) results in virtual members that follow the standard normal PDF (Fig. 2a). As such, this study uses CAC2020's W instead of CAC2023's W.

Properties of the CAC2020 algorithm
The CAC2020 algorithm is efficient and scales well with parallel computing. This is because steps 2 and 3 of the CAC2020 algorithm (Sections 2.1.2 and 2.1.3) are embarrassingly parallel and have computational complexities that scale linearly with N_x. If E is generated offline (i.e., not as part of the EnsDA procedure) and then read into the EnsDA program, the CAC2020 algorithm is reduced to only evaluating steps 2 and 3.
The CAC2020 algorithm also produces expanded ensembles with the same ensemble means and ensemble covariances as the forecast ensembles. Consequently, the rank of the expanded ensemble's covariance matrix is the same as that of the forecast ensemble. Future work can explore ways to incorporate localization into the expanded ensemble's covariance matrix.
Furthermore, the CAC2020 algorithm always generates Gaussian-distributed virtual members: even if the actual forecast distribution is highly non-Gaussian, the virtual members' distribution will still be Gaussian. The CAC2020 algorithm thus degrades the ensemble statistics in situations where the forecast distribution is non-Gaussian. This degradation limits the usefulness of the CAC2020 algorithm for situations with non-Gaussian forecast distributions.
Note that, except for the mean and covariance, the expanded ensemble's central moments (i.e., higher-order moments; e.g., skewness) likely differ from the forecast ensemble's. More specifically, the expanded ensemble's central moments will be closer to those of Gaussian distributions (e.g., zero skewness) than the forecast ensemble's central moments. This is because the virtual members are effectively drawn from a Gaussian distribution. If the forecast distribution is indeed a Gaussian distribution, then the expanded ensemble likely has better moments than the forecast ensemble.

The PESE-GC procedure
The CAC2020 algorithm is limited to generating Gaussian-distributed virtual members. PESE-GC overcomes this limitation by combining probit probability integral (PPI) transforms and their inverses with the CAC2020 algorithm. A PPI transform maps any univariate distribution with a continuous CDF into a standard normal distribution, and the inverse PPI transform reverses the process. The quantity resulting from applying the PPI transform to a random variable is called a "probit", and the coordinate space occupied by probits is called "probit space". Such transforms are often used in Gaussian anamorphosis (Amezcua and Van Leeuwen, 2014; Grooms, 2022).
To define the PPI transform, suppose F_i(x_i) represents the CDF of the i-th model variable x_i (i = 1, 2, ..., N_x), Φ⁻¹(q) represents the inverse CDF of the standard normal (q represents any quantile), and ϕ_i represents the i-th model probit. The double appearance of index i in F_i(x_i) is deliberate: the CDF varies with the chosen model variable. Note that Φ⁻¹(q) is sometimes called "the probit function" or the "quantile function of the standard normal". The conversion from x_i to ϕ_i (i.e., the PPI transform) is

ϕ_i = Φ⁻¹( F_i( x_i ) ).  (11)

The inverse PPI transform (i.e., converting ϕ_i to x_i) is

x_i = F_i⁻¹( Φ( ϕ_i ) ).  (12)

The PPI transform generalizes the CAC2020 algorithm to handle non-Gaussian forecast ensembles. The resulting PESE-GC procedure has four stages and is illustrated in Fig. 3. These four stages are:

1. For each model state variable, fit a user-specified univariate distribution to that model variable in the forecast ensemble (i.e., marginal distribution fitting).

2. For each model state variable, apply the PPI transform [Eq. (11)] using that variable's fitted distribution to convert the forecast ensemble of that model variable into forecast probits.
3. For each model state variable, adjust the mean and variance of that variable's forecast ensemble probits to zero and unity, respectively (explained in Section 2.3), and then apply the CAC2020 algorithm to that variable's forecast probits to generate virtual probits.
4. For each model state variable, apply the inverse PPI transform [Eq. (12)] with stage 1's fitted distribution to that variable's virtual probits to generate that variable's virtual ensemble.
To be clear, the CAC2020 algorithm is implemented and used in stage 3.
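For concreteness, the four stages above can be sketched in Python for the special case of Gaussian fitted marginals. This is a hypothetical illustration: the stage 3 resampling below is a simplified stand-in for the CAC2020 algorithm (it preserves the probit mean and variance only in expectation, not exactly), and the function name is invented.

```python
import numpy as np
from scipy.stats import norm

def pese_gc_gaussian(ens, n_virtual, rng):
    """Sketch of PESE-GC's four stages with Gaussian fitted marginals.

    ens: (n_x, n_e) forecast ensemble; returns (n_x, n_virtual) virtual members.
    The resampling in stage 3 is a simplified stand-in for CAC2020.
    """
    n_x, n_e = ens.shape
    # Stage 1: fit a univariate Gaussian marginal to each model variable
    mu = ens.mean(axis=1, keepdims=True)
    sd = ens.std(axis=1, ddof=1, keepdims=True)
    # Stage 2: PPI transform, phi = Phi^{-1}( F(x) )  [Eq. (11)]
    probits = norm.ppf(norm.cdf(ens, loc=mu, scale=sd))
    # Stage 3a: adjust probit mean/variance to zero/unity (Section 2.3)
    probits -= probits.mean(axis=1, keepdims=True)
    probits /= probits.std(axis=1, ddof=1, keepdims=True)
    # Stage 3b: Gaussian resampling of probits via random linear
    # combinations of the (zero-mean, unit-variance) forecast probits
    w = rng.standard_normal((n_e, n_virtual)) / np.sqrt(n_e - 1)
    virtual_probits = probits @ w
    # Stage 4: inverse PPI transform, x = F^{-1}( Phi(phi) )  [Eq. (12)]
    return norm.ppf(norm.cdf(virtual_probits), loc=mu, scale=sd)
```

For Gaussian marginals, stages 2 and 4 reduce to standardizing and de-standardizing each variable; the general CDF-based form is kept here to mirror Eqs. (11) and (12).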
Note that this four-stage procedure assumes that the multivariate forecast distribution is Gaussian in probit space. This assumption arises from the use of a Gaussian resampling algorithm (the CAC2020 algorithm) to generate virtual probits, and it is equivalent to assuming that the multivariate forecast distribution has a Gaussian copula. As such, this four-stage procedure is called the "Probit-space Ensemble Size Expansion for Gaussian Copulas".
PESE-GC's four-stage procedure is attractive for geophysical EnsDA for several reasons. Aside from the fact that it can generate non-Gaussian virtual members, PESE-GC can be implemented in an embarrassingly parallel fashion (every loop over the model state variables is embarrassingly parallel). Furthermore, PESE-GC is likely affordable for geophysical EnsDA because the CAC2020 algorithm (stage 3) is efficient (see Section 2.1).
Note that the quality of the virtual members depends on the distributions the user selects in stage 1 of PESE-GC. This will be discussed later in Section 3.
2.3 On the mean and variance adjustments in PESE-GC's stage 3

PESE-GC requires forecast ensemble probits with zero mean and unit variance. Otherwise, the resulting virtual members will disobey the marginal distributions fitted in PESE-GC's stage 1. However, because the forecast ensemble size is finite, the forecast ensemble's probits may have non-zero mean and non-unit variance. To illustrate, suppose PESE-GC is applied to 5 univariate forecast ensemble members (red crosses in Fig. 2c) and a Gaussian-tailed rank histogram distribution (Anderson, 2010) is fitted to those 5 members in PESE-GC's stage 1. Applying the PPI transform (PESE-GC's stage 2) results in forecast probits with zero mean and a variance of approximately 0.561. 10^7 virtual probits are then generated from these forecast probits using the CAC2020 algorithm, and the inverse PPI transform is applied to generate the virtual members (PESE-GC's stage 4). The histogram-estimated PDF of the virtual members (blue curve in Fig. 2c) disagrees with the fitted (i.e., desired) Gaussian-tailed rank histogram PDF.
This problematic disagreement is resolved by adjusting the forecast probits' mean and variance to zero and unity (respectively) before generating the virtual members. Suppose the probit of the n-th forecast member for model variable i is ϕ_i,n, and the pre-adjustment prior ensemble probits' sample mean and sample variance are µ and σ², respectively. The adjustment process is simply

ϕ_i,n ← ( ϕ_i,n − µ ) / σ.

The impact of this adjustment is illustrated in Fig. 2: the virtual members' histogram-estimated PDF (blue curve in Fig. 2d) now matches the Gaussian-tailed rank histogram PDF fitted in PESE-GC's stage 1.

3 Conceptual exploration of PESE-GC's EnsDA impacts
To understand the influence of PESE-GC on EnsDA, consider a joint model-observation space formulation of Bayes' rule, where y^o is an N_y-dimensional vector of observation values. PESE-GC influences EnsDA through two mechanisms: (1) by improving the ensemble-based representation of the observation likelihood function, and (2) by influencing the ensemble statistics used by EnsDA algorithms. This section explores the influence of PESE-GC on EnsDA through those two mechanisms. A bivariate example will be used to illustrate those mechanisms. Suppose a scalar forecast model variable x has a skewed-normal forecast distribution and 5 samples are drawn from this distribution (illustrated in Figure 4a). A signed square-root function h(x) will be used as the observation operator, and let y denote observation values. 10,000 virtual members will be generated by PESE-GC in this bivariate example. Note that the true forecast distribution of x is the previously mentioned skewed-normal distribution (illustrated in Figure 4a), and the true forecast distribution of h(x) is estimated by applying h(x) to 1,000,000 samples of x drawn from the true forecast distribution of x.
3.1 Mechanism 1: PESE-GC improves ensemble-based representation of observation likelihood function

Figure 6. Bivariate example demonstrating the impacts of drawing virtual members from a misinformed fitted marginal (gamma distribution). The panels here are similar to Fig. 5.
For certain EnsDA algorithms that employ ensemble-based representations of the observation likelihood function, PESE-GC can improve those representations (impact mechanism 1). Two EnsDA algorithms will be considered: 1) the Rank Histogram Filter (RHF; Anderson (2010)), and 2) the serial stochastic Ensemble Kalman Filter (serial stochastic EnKF; Burgers et al. (1998)). In the case of the RHF, an ensemble-based piecewise approximation to the observation likelihood function is used (illustrated in Fig. 4b). The accuracy of that piecewise approximation depends on the ensemble size. When small forecast ensembles are used, that piecewise approximation is crude (red curve in Figure 4b). Since PESE-GC increases ensemble size, PESE-GC refines that piecewise representation (blue curve in Figure 4b). In the absence of other factors, this refinement will improve the performance of the RHF.
Impact mechanism 1 also manifests for the serial stochastic EnKF. To see that, consider a situation with two observations, and recall that the serial stochastic EnKF uses random draws from a univariate Gaussian distribution to represent the likelihood function (one draw per ensemble member). For small ensembles, only a few of those random draws are made. In other words, there are sampling errors in representing the likelihood function. The ensemble statistics resulting from assimilating the first observation are thus degraded by those sampling errors. This degradation then affects the assimilation of the second observation. The assimilation of more than two observations compounds such sampling issues. Since PESE-GC increases the ensemble size, more draws from the likelihood function are made, thus suppressing sampling errors. As such, in the absence of other factors, PESE-GC will improve the performance of the serial stochastic EnKF.
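This sampling-error argument can be made concrete with a toy Monte Carlo sketch (a hypothetical illustration, not the paper's experiment): a scalar stochastic-EnKF update draws one likelihood perturbation per member, so the scatter of the analysis mean across repeated updates shrinks as the ensemble grows. Note that the scatter measured below mixes prior-sampling and likelihood-draw noise, both of which shrink with ensemble size.

```python
import numpy as np

def stochastic_enkf_scalar(prior, y, r, rng):
    """One stochastic-EnKF update of a scalar state observed directly (h(x) = x).

    Each member receives its own random draw from the observation
    likelihood N(y, r), following the perturbed-observation approach.
    """
    b = prior.var(ddof=1)                 # prior ensemble variance
    k = b / (b + r)                       # Kalman gain for a direct observation
    perturbed = y + rng.normal(0.0, np.sqrt(r), prior.size)  # likelihood draws
    return prior + k * (perturbed - prior)

rng = np.random.default_rng(1)
r, y = 1.0, 0.5
spreads = {}
for n in (10, 400):
    means = []
    for _ in range(2000):
        prior = rng.normal(0.0, 1.0, n)   # fresh prior ensemble each repeat
        post = stochastic_enkf_scalar(prior, y, r, rng)
        means.append(post.mean())
    # scatter of the analysis mean across repeated updates
    spreads[n] = np.std(means)
```

With 400 members, the analysis-mean scatter is markedly smaller than with 10, consistent with the suppression of likelihood-representation sampling errors described above.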
Note that many EnsDA algorithms are immune to impact mechanism 1. The deterministic variants of the EnKF (Bishop et al., 2001; Whitaker and Hamill, 2002; Anderson, 2003; Tippett et al., 2003; Sakov et al., 2008) and particle filters (Gordon et al., 1993; Snyder et al., 2008; Poterjoy, 2016; van Leeuwen et al., 2019) are immune to impact mechanism 1 because no ensemble-based representation of the likelihood function is used. Also, the stochastic EnKF that assimilates all observations simultaneously (all-at-once stochastic EnKF) is immune to this effect: the chain of events described in the previous paragraph will not occur for the all-at-once stochastic EnKF. Finally, EnsDA systems that run ensembles of variational data assimilation methods (e.g., the ECMWF) are also immune to impact mechanism 1. This immunity is due to the same reason as the all-at-once stochastic EnKF.

3.2 Mechanism 2: PESE-GC influences ensemble statistics
If the user specifies appropriate marginals in PESE-GC's stage 1, then PESE-GC will improve the ensemble statistics used by EnsDA algorithms (mechanism 2). This will be illustrated using the bivariate 5-member example discussed near the start of Section 3. Suppose the user knows that the prior marginal distribution of x is close to Gaussian. Thus, a Gaussian distribution is fitted to the forecast x values in PESE-GC's stage 1 (Section 2.2; Fig. 5a). Applying PESE-GC generates 10,000 virtual members. Figure 5 indicates that the virtual members' CDFs and x-y relationship are closer to the true CDFs and relationship than those of the forecast members: the virtual members' curves are visibly closer to the true curves than the forecast members' curves. In other words, the virtual members have better ensemble statistics than the forecast ensemble. This improvement in ensemble statistics is a combination of 1) the user selecting an appropriate marginal for x, and 2) the nonlinear h(x) being evaluated more often (i.e., better sampled) with a large ensemble. As such, if the user specifies appropriate marginals in PESE-GC's stage 1, PESE-GC will likely improve the performance of EnsDA algorithms.
An important caveat is that if the user selects misinformed marginal distributions, then PESE-GC may degrade the ensemble statistics used by EnsDA algorithms. To illustrate, suppose the user fits a shifted gamma distribution (Cheng and Amin, 1983) to the 5 forecast x values. This distribution has three parameters: shape, scale, and location. Since only 5 forecast values are used to fit three parameters, the fitted parameters' sampling errors are severe. Applying PESE-GC with this badly estimated shifted gamma distribution results in virtual ensemble statistics that are worse than those of the 5 forecast x values (Fig. 6). In such situations, the performance of EnsDA will likely be degraded by PESE-GC. As such, the selection of marginals to use with PESE-GC must be done with care.
In the absence of knowledge about the model variables' prior marginal distributions, users can use non-parametric marginal distributions with PESE-GC. Such distributions include the Gaussian-tailed rank histogram distribution (Anderson, 2010) and kernel distributions (Anderson and Anderson, 1999). Choosing such distributions may not improve the model-space ensemble statistics. However, because PESE-GC increases the number of observation operator evaluations, the observation-space ensemble statistics and the ensemble covariance/copula between observation and model quantities can be improved.
Note that when linear observations are assimilated, PESE-GC with Gaussian marginals will not change the performance of deterministic variants of the EnKF (Bishop et al., 2001; Whitaker and Hamill, 2002; Anderson, 2003; Tippett et al., 2003; Sakov et al., 2008). This is because the expanded ensemble will have the same joint-space mean and covariance as the forecast ensemble.

The Lorenz 1996 (L96) model uses 40 variables (i.e., 40 grid points in a ring), a forcing parameter value of 8 (i.e., F = 8), and a timestep of 0.05 L96 time units. The L96 time unit is henceforth referred to as τ. All results in this paper will be displayed and discussed in terms of τ. Forward time integration of the model is done via the Runge-Kutta fourth-order integration scheme.
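For reference, the standard L96 dynamics (dx_i/dt = (x_{i+1} − x_{i−2}) x_{i−1} − x_i + F on a ring of grid points) with the fourth-order Runge-Kutta stepping described above can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def l96_tendency(x, f=8.0):
    """Lorenz 1996 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + f

def rk4_step(x, dt=0.05, f=8.0):
    """Advance the L96 state by one fourth-order Runge-Kutta step."""
    k1 = l96_tendency(x, f)
    k2 = l96_tendency(x + 0.5 * dt * k1, f)
    k3 = l96_tendency(x + 0.5 * dt * k2, f)
    k4 = l96_tendency(x + dt * k3, f)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Note that the uniform state x_i = F is an (unstable) equilibrium of these dynamics; small perturbations about it grow chaotically, which is what makes L96 a useful EnsDA testbed.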
Every OSSE experiment runs for 5,500 cycles.Initial nature run states and the initial ensemble members are drawn from the L96's climatology.
In all experiments, there are 40 observations, and their locations are fixed throughout this study. Supposing that the model grid points have locations 0.025 m (m = 1, 2, ..., 40), each observation site location is a random draw from a uniform distribution between 0 and 1.
The PESE-GC-expanded ensemble sizes are specified in terms of factors: 5, 10 and 20 times the forecast ensemble size.
For example, if N E = 10, a PESE-GC factor of 10 means that the expanded ensemble has 100 members (i.e., N V = 90 virtual members are created).
Supposing the k-th observation site has location ℓ_k, the observation operators for the three observation types are

h_IDEN(x; ℓ_k) = x_ℓk,
h_SQRT(x; ℓ_k) = sgn(x_ℓk) √|x_ℓk|,
h_SQUARE(x; ℓ_k) = (x_ℓk)²,

where x is the L96 model's 40-element state vector, x_ℓk is the model variable linearly interpolated to location ℓ_k, and sgn(x_ℓk) returns the sign of x_ℓk. Observations created using h_IDEN(x; ℓ_k), h_SQRT(x; ℓ_k), and h_SQUARE(x; ℓ_k) will be respectively referred to as IDEN observations, SQRT observations, and SQUARE observations. Every observation is created by applying its corresponding observation operator to a nature run state, and then perturbing the output with a random draw from N(0, σ²).
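A minimal Python sketch of these three operators might look like the following. The periodic linear interpolation (`interp_ring`) and its grid-to-location mapping are assumptions based on the stated grid locations 0.025 m on a ring; the function names are invented.

```python
import numpy as np

def interp_ring(x, loc):
    """Linearly interpolate a 40-variable ring state to location loc in (0, 1].

    Grid point m (m = 1..40) is assumed to sit at location 0.025 * m,
    with periodic wrap-around between the last and first grid points.
    """
    n = x.size
    pos = (loc / 0.025) - 1.0          # fractional grid index of loc
    i0 = int(np.floor(pos)) % n
    i1 = (i0 + 1) % n
    w = pos - np.floor(pos)            # weight of the right-hand neighbor
    return (1.0 - w) * x[i0] + w * x[i1]

def h_iden(x, loc):
    """IDEN operator: the interpolated model value itself."""
    return interp_ring(x, loc)

def h_sqrt(x, loc):
    """SQRT operator: signed square root of the interpolated value."""
    v = interp_ring(x, loc)
    return np.sign(v) * np.sqrt(np.abs(v))

def h_square(x, loc):
    """SQUARE operator: square of the interpolated value."""
    return interp_ring(x, loc) ** 2
```

The signed square root keeps h_SQRT monotonic (and hence invertible) for negative model values, unlike a plain square root.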
Note that IDEN's σ² is between 7.4% and 15% of the climatological IDEN error variance, SQRT's σ² is between 10% and 17% of the climatological SQRT error variance, and SQUARE's σ² is between 2% and 6% of the climatological SQUARE error variance.
The four EnsDA algorithms tested with PESE-GC are: 1) the ensemble adjustment Kalman filter (EAKF), 2) the serial stochastic EnKF, 3) the RHF, and 4) the probit regression algorithm (PR). To be clear, the RHF algorithm first employs the rank histogram filter to generate observation increments and then uses linear regression to convert observation increments into model increments. The PR algorithm is similar to the RHF algorithm, except that probit regression is used to convert observation increments into model increments.
For each EnsDA algorithm, only one set of marginals is used with PESE-GC. When PESE-GC is used with the EAKF, EnKF, or RHF algorithm, Gaussian marginals are selected for all 40 model variables. PESE-GC with Gaussian marginals is identical to the CAC2020 algorithm. In other words, for the EAKF, EnKF, and RHF experiments, the virtual ensemble members follow multivariate Gaussian distributions. For the PR algorithm, the Gaussian-tailed rank histogram is selected as the marginal for every one of the 40 model variables. This means the PR experiments' virtual ensemble members follow multivariate non-Gaussian distributions. Future work can investigate the impacts of using PESE-GC with Gaussian-tailed rank histograms (or kernel density estimates) with the EAKF, EnKF, and RHF.
Each of the 1,440 configurations is trialed 36 times. These trials are enumerated (Trial 1, Trial 2, and so forth). All experiments with the same trial number and N_E share the same nature run and initial forecast ensemble states. For example, the following two EnsDA experiments have the same initial nature run and initial forecast ensemble: 1) Trial 10 using IDEN observations, 20 forecast ensemble members, 0.15τ cycling period, EAKF, and PESE-GC with an expansion factor of 20, and 2) Trial 10 using SQRT observations, 20 forecast ensemble members, 0.60τ cycling period, RHF, and PESE-GC with an expansion factor of 5. Note that experiments with different trial numbers have different nature runs and initial forecast ensemble states.
The Gaspari-Cohn fifth-order rational function (Gaspari and Cohn, 1999) is used for covariance localization. Performance is measured using the root-mean-square error (RMSE) of the forecast ensemble mean,

RMSE ≡ sqrt[ (1/N_x) Σ_{i=1}^{N_x} ( x̄_i^f − x_i^t )² ],  (20)

where x̄_i^f and x_i^t are, respectively, the forecast ensemble mean state and nature run state at model grid point i. The RMSEs of the first 500 of the 5,500 EnsDA cycles are discarded. The localization half-radius that results in the smallest cycle-averaged RMSE (i.e., averaged over cycles 501 to 5,500) is selected as the optimal localization half-radius. Note that for the same configuration, the optimal localization half-radius can vary with the trial number.
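For reference, the Gaspari-Cohn fifth-order rational function [Eq. (4.10) of Gaspari and Cohn (1999)] can be sketched as below. The half-radius convention used here (the function reaches ~0.21 at one half-radius and zero at two half-radii) is an assumption about how this study normalizes distances.

```python
import numpy as np

def gaspari_cohn(dist, half_radius):
    """Gaspari-Cohn fifth-order rational localization function.

    dist: separation distance(s) as a NumPy array; half_radius: normalizing
    distance (the function's compact support ends at 2 * half_radius).
    """
    z = np.abs(np.asarray(dist, dtype=float)) / half_radius
    out = np.zeros_like(z)
    near = z <= 1.0
    mid = (z > 1.0) & (z < 2.0)
    zn = z[near]
    # inner branch: -z^5/4 + z^4/2 + 5 z^3/8 - 5 z^2/3 + 1
    out[near] = (((-0.25 * zn + 0.5) * zn + 0.625) * zn - 5.0 / 3.0) * zn**2 + 1.0
    zm = z[mid]
    # outer branch: z^5/12 - z^4/2 + 5 z^3/8 + 5 z^2/3 - 5 z + 4 - 2/(3 z)
    out[mid] = ((((zm / 12.0 - 0.5) * zm + 0.625) * zm + 5.0 / 3.0) * zm - 5.0) * zm \
               + 4.0 - 2.0 / (3.0 * zm)
    return out
```

The function is 1 at zero separation, decays smoothly, and is exactly zero beyond twice the half-radius, which is what makes it a popular compactly supported localization taper.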
The inflation scheme used here is identical to the one used by Anderson (2023). The adaptive prior inflation algorithm of Anderson (2009) is used with an inflation lower bound of 1 (no deflation), an upper bound of 2, a fixed inflation standard deviation of 0.6, and an inflation damping of 0.9. While manually tuning a homogeneous inflation factor or a relaxation-to-posterior-spread (RTPS; Whitaker and Hamill (2012)) relaxation factor may give smaller RMSEs, an adaptive inflation approach is chosen to reduce the computational cost of this study. This study already runs 881,280 (1,440 × 36 × 17) combinations of configurations (1,440), trials (36), and localization half-radii (17), and each combination is run for 5,500 cycles.

Metric to assess PESE-GC's impact on EnsDA
The impacts of PESE-GC on EnsDA are assessed using the relative difference between cycle-averaged RMSEs [Eq. (20)] obtained with and without using PESE-GC. Let RMSE(ξ, r, True) denote the cycle-averaged RMSE of the experiment with configuration ξ and trial number r run with PESE-GC, and RMSE(ξ, r, False) the corresponding RMSE without PESE-GC. The relative difference between RMSE(ξ, r, True) and RMSE(ξ, r, False) is defined as

∆RMSE(ξ, r) ≡ [ RMSE(ξ, r, True) − RMSE(ξ, r, False) ] / ⟨RMSE(ξ, r, False)⟩.

Here, the denominator is the trial-averaged (indicated by angled brackets) cycle-averaged RMSE of EnsDA run with the same ξ as in the numerator, but with PESE-GC unused. For readability, (ξ, r) is omitted from the rest of this paper. Most importantly, a negative ∆RMSE indicates that PESE-GC improves the performance of EnsDA, and a positive value of ∆RMSE indicates that PESE-GC degrades the performance of EnsDA.
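The ∆RMSE metric above can be sketched in a few lines of Python; the function name and toy inputs are illustrative only.

```python
import numpy as np

def delta_rmse(rmse_with, rmse_without):
    """Per-trial relative RMSE difference for one configuration.

    rmse_with, rmse_without: 1-D arrays of cycle-averaged RMSEs, one entry
    per trial, from runs with and without PESE-GC. The denominator is the
    trial-averaged RMSE of the runs without PESE-GC. Negative values mean
    PESE-GC improved EnsDA; positive values mean it degraded EnsDA.
    """
    rmse_with = np.asarray(rmse_with, dtype=float)
    rmse_without = np.asarray(rmse_without, dtype=float)
    return (rmse_with - rmse_without) / rmse_without.mean()

# toy example: PESE-GC lowers the RMSE in both trials, so the
# trial-averaged Delta-RMSE is negative
vals = delta_rmse([0.9, 0.8], [1.0, 1.0])
```

In the study itself, the trial-averaged ⟨∆RMSE⟩ would then be screened with a two-tailed z-test before being interpreted.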
Only statistically significant trial-averaged ∆RMSE values (henceforth, ⟨∆RMSE⟩) will be discussed in this paper. A ⟨∆RMSE⟩ is considered statistically significant if its two-tailed z-test p value is smaller than 1%. These statistically significant ⟨∆RMSE⟩ values are plotted in Figs. 7, 8, and 9. Before proceeding, note that PESE-GC with Gaussian marginals only negligibly changes the performance of the EAKF with IDEN observations (Fig. 7a1, b1 & c1). This negligibility is predicted in Section 3.2. Any impact of PESE-GC on the performance of these experiments is likely due to rounding errors associated with the use of finite-precision arithmetic.

Impacts of using 20-fold PESE-GC on EnsDA
This study first examines the ⟨∆RMSE⟩ for a PESE-GC factor of 20 (panels a1, a2, a3, and a4 in Figs. 7, 8, and 9). The focus is on identifying common patterns in ⟨∆RMSE⟩ and explaining them through the lens of the two mechanisms laid out in Section 3. The variation of ⟨∆RMSE⟩ with different PESE-GC factors (remaining panels in Figs. 7, 8, and 9) will be discussed in Section 4.4.
The first common ⟨∆RMSE⟩ pattern in the 20-fold PESE-GC situations is that PESE-GC generally improves EnsDA performance (i.e., ⟨∆RMSE⟩ < 0) when N_E is 10 or 20 (panels a1, a2, a3, and a4 in Figs. 7, 8, and 9). This is because either one or both mechanisms described in Section 3 are acting to improve RMSEs. First, for the PR, RHF, and EnKF experiments, the performance improvements are partly (if not entirely) because PESE-GC improves the ensemble-based representation of the likelihood function (i.e., mechanism 1; Section 3.1). For the PR experiments with IDEN observations, mechanism 1 is likely the sole reason for the improved performance. Second, the PESE-GC-induced RMSE reductions in all 10/20 forecast member experiments with either SQRT or SQUARE observations are partly due to improved sampling of the observation operator (i.e., mechanism 2 described in Section 3.2). The RMSE reductions seen in the 10/20-member EAKF, EnKF, and RHF experiments are also partly due to improved ensemble statistics (i.e., mechanism 2; Section 3.2). This is because the L96 model's forecast statistics tend to be close to Gaussian (e.g., Chan et al. (2020)). Applying PESE-GC with Gaussian marginals thus improves the model variables' prior ensemble statistics, thereby improving the performance of the EAKF, EnKF, and RHF experiments.
The second common pattern in the 20-fold PESE-GC experiments is that with increasing N_E, PESE-GC's RMSE impacts can go from improving (⟨∆RMSE⟩ < 0) to degrading (⟨∆RMSE⟩ > 0). This pattern is likely related to both mechanisms in Section 3. First, for the EnKF, RHF, and PR, the ensemble-based representation of the observation likelihood function improves with increasing N_E. This improvement implies there is less room for PESE-GC to improve that representation. As such, mechanism 1 weakens with increasing N_E, thus reducing PESE-GC-induced EnsDA performance gains for the EnKF, RHF, and PR EnsDA algorithms.
For the EAKF, EnKF, and RHF experiments, impact mechanism 2 also contributes to the worsening of PESE-GC's RMSE impacts with increasing N_E. In the act of choosing Gaussian marginal distributions for PESE-GC, the user implicitly assumes that the true forecast PDF is Gaussian. With increasing N_E, imperfections in this Gaussian assumption become increasingly evident. The impacts of PESE-GC on the observation-space prior statistics can thus go from improving to degrading with increasing N_E. This change likely contributes to the worsening of PESE-GC's RMSE impacts with increasing N_E.
A third common pattern is that with longer cycling intervals, PESE-GC's RMSE impacts on the EAKF, EnKF and RHF degrade (i.e., ⟨∆RMSE⟩ goes from negative to zero, zero to positive, or negative to positive). This pattern is likely due to increasing non-Gaussianity in the forecast ensemble's statistics with longer cycling intervals. The choice to use Gaussian marginals with PESE-GC in these experiments thus becomes increasingly inappropriate, meaning that impact mechanism 2 increasingly degrades the performance of EnsDA.
The fourth common pattern is that in the PR experiments, for N_E ≥ 20, the 20-fold PESE-GC's impact improves with longer cycling intervals (Figs. 7a4, 8a4 and 9a4). A plausible explanation relates to PR's use of a piecewise approximation to the observation likelihood function (henceforth, the "piecewise approximation"). This approximation is more accurate when more ensemble members sample the regions where the observation likelihood function varies strongly. However, with increasing forecast ensemble variance, those regions tend to be sampled by fewer forecast members, thus degrading the piecewise approximation. Since longer cycling intervals increase forecast ensemble variance, they also increase the room for PESE-GC to improve the piecewise approximation. Future work can test this explanation.
Note that the chain of events discussed in the previous paragraph likely occurs in the RHF experiments as well. Since the RHF experiments do not exhibit the fourth common pattern, it is likely that the inappropriateness of the Gaussian marginals used with PESE-GC overwhelms the improvements introduced by refining the piecewise approximation.

Variations in PESE-GC's impacts with different amounts of ensemble expansion
This study now examines common patterns in how PESE-GC's impacts vary with PESE-GC expansion factors. The first common pattern is that PESE-GC's impacts on the PR experiments tend to weaken with smaller PESE-GC factors (panels a4, b4 and c4 in Figs. 7, 8 and 9). This pattern is likely caused by increasing sampling errors in the virtual members' statistics with fewer virtual members (i.e., smaller PESE-GC factors).
The second common pattern is that, for N_E ≥ 40, smaller PESE-GC factors tend to result in milder RMSE degradations introduced by PESE-GC in the EAKF, EnKF and RHF experiments. Since these RMSE degradations are likely due to the misinformed selection of Gaussian marginals with PESE-GC (see Section 4.3), reducing the number of virtual members reduces the amount of misinformation introduced by PESE-GC into the forecast statistics. These less misinformed forecast statistics explain the second common pattern. An interesting third common pattern is also visible in the EAKF, EnKF and RHF experiments: there are instances where reducing PESE-GC factors (1) turns insignificant RMSE impacts into RMSE improvements (e.g., the lower left corners of Figs. 8a1 and c1), or (2) turns RMSE degradations into RMSE improvements (e.g., the upper right corners of Figs. 7a3 and c3).
To explain this third pattern, notice that these instances are associated with short cycling intervals (0.05-0.10τ), and shorter cycling intervals are associated with increasingly Gaussian forecast PDFs. Based on these associations, a plausible explanation is that the true forecast statistics only mildly deviate from Gaussian, but the forecast ensemble statistics are often far from Gaussian. Introducing some Gaussian virtual members thus improves the ensemble statistics. However, if too many Gaussian virtual members are introduced, the expanded ensemble statistics become too close to Gaussian. This "Goldilocks" explanation can be tested in future work.
Most importantly, even with a mere 5-fold expansion, PESE-GC improves the performance of EnsDA in three types of situations. First, all EnsDA experiments involving small forecast ensemble sizes (10 members) are improved by PESE-GC.
Second, situations where using Gaussian marginals with PESE-GC improves ensemble statistics are also improved by PESE-GC. This second type of situation occurs for the EAKF, EnKF and RHF experiments that have (1) 20-40 ensemble members and/or (2) cycling intervals of 0.30τ or less. Third, PESE-GC improves the PR experiments for cycling intervals of 0.30τ or 0.60τ. This improvement is plausibly because with longer cycling intervals, PESE-GC better improves the piecewise observation likelihood approximation used by the PR EnsDA algorithm (explained in Section 4.3). These PESE-GC-introduced improvements are particularly encouraging because a geophysical EnsDA system is more likely to be able to afford 5-fold PESE-GC than 20-fold PESE-GC.

PESE-GC assumes Gaussian copulas
The results presented in the previous section are encouraging. However, a caveat about PESE-GC needs discussion: PESE-GC assumes that the forecast distribution is multivariate Gaussian in probit space (i.e., has a Gaussian copula). If that assumption (henceforth, the "Gaussian copula assumption") is violated, the virtual members will possess statistical artifacts.
Fig. 10a illustrates PESE-GC's ability to generate non-Gaussian virtual members in a situation where the Gaussian copula assumption holds. The true forecast multivariate PDF (Fig. 10a1) is created by applying two inverse PPI transforms to a bivariate Gaussian PDF with a specified two-dimensional mean vector µ and 2 × 2 covariance matrix Σ. The first inverse PPI transform is applied to the first variable (x_1), and the PDF it uses, p(x_1), is a bi-Gaussian mixture of G(x_1; −1, 2) and G(x_1; 3, 1), where G(x_1; −1, 2) represents the Gaussian PDF with scalar mean −1 and standard deviation 2, and likewise for G(x_1; 3, 1).
Since the forecast PDF in Fig. 10a1 has, by construction, a Gaussian copula (Fig. 10a3), PESE-GC can produce virtual members that approximately follow the forecast PDF. To demonstrate, 100 forecast members were drawn from the forecast PDF, and 1,000,000 virtual members were constructed using PESE-GC. The two marginal PDFs used in steps 1, 2 and 4 of PESE-GC are univariate bi-Gaussian PDFs (fitted via maximum likelihood estimation in step 1). The histogram-estimated PDFs of the virtual members (Fig. 10a2) and virtual probits (Fig. 10a4) are similar to the true forecast PDF (Fig. 10a1) and the true forecast's probit-space PDF (Fig. 10a3).
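The probit-space resampling idea behind this demonstration can be sketched in a few lines. This is a minimal illustration, not the DART implementation: it substitutes plain random Gaussian-copula draws for the deterministic CAC2020 resampler used in stage 3, and the usage example fits Gaussian marginals by moments rather than the bi-Gaussian marginals fitted by maximum likelihood in the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def pese_gc_sketch(forecast, n_virtual, cdf, ppf):
    """Minimal PESE-GC sketch. `cdf`/`ppf` are the fitted marginal CDF and
    its inverse, applied columnwise to the (n_members, n_vars) ensemble."""
    # Steps 1-2: fit marginals (done by the caller) and move the forecast
    # members into probit space via the PPI transform probit = Phi^{-1}(F(x)).
    probits = norm.ppf(cdf(forecast))
    # Step 3: generate virtual probits under the Gaussian copula assumption.
    # (Random draws here stand in for the CAC2020 resampling algorithm.)
    cov = np.cov(probits, rowvar=False)
    virtual_probits = rng.multivariate_normal(
        np.zeros(forecast.shape[1]), cov, size=n_virtual)
    # Step 4: map the virtual probits back through the inverse marginal CDFs.
    return ppf(norm.cdf(virtual_probits))

# Usage with Gaussian marginals fitted by moments (an assumption made for
# brevity; any fitted marginal family can be supplied via cdf/ppf):
members = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=100)
mu, sd = members.mean(axis=0), members.std(axis=0, ddof=1)
virtual = pese_gc_sketch(
    members, 2000,
    cdf=lambda x: norm.cdf(x, loc=mu, scale=sd),
    ppf=lambda u: norm.ppf(u, loc=mu, scale=sd))
```

With Gaussian marginals the PPI transform reduces to standardization, so the virtual members simply inherit the forecast correlation structure; the cdf/ppf hooks are what allow non-Gaussian marginals without changing the probit-space machinery.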
An example where the Gaussian copula assumption is violated is shown in Fig. 10b. Here, the forecast PDF (Fig. 10b1) is a bivariate bi-Gaussian PDF. Applying PPI transforms to this bivariate bi-Gaussian forecast PDF reveals that it violates the Gaussian copula assumption (Fig. 10b3). When PESE-GC is applied to generate 1,000,000 virtual members from 100 forecast members, the virtual members' probit-space bivariate PDF (Fig. 10b4) differs from the forecast PDF in probit space (Fig. 10b3). As such, the virtual members' bivariate PDF (Fig. 10b2) deviates from the true bivariate bi-Gaussian forecast PDF (Fig. 10b1; the virtual members have two small spurious modes).
Note that though the virtual members' PDF deviates from the forecast PDF, the two PDFs remain strongly similar: the two dominant modes of the virtual members' PDF are very similar to those of the bi-Gaussian forecast PDF. More generally, milder violations of the Gaussian copula assumption will likely lead to milder spurious statistical features in the virtual members.
More importantly, PESE-GC's Gaussian copula assumption may not be problematic for geophysical EnsDA. Due to the high dimensionality of geophysical models and small forecast ensemble sizes, it is difficult to identify the family of the multivariate forecast distribution in probit space. In other words, the forecast ensemble's statistics in probit space are likely indistinguishable from those of a multivariate Gaussian. This indistinguishability permits assuming Gaussian copulas. Future work can investigate this possibility.

Impacts of PESE-GC on the computational cost of EnsDA
It is also important to discuss the impacts of PESE-GC on the computational cost of the EnsDA process (i.e., the forecast step and the analysis step).
Since the virtual members are deleted before running forecast models (Figure 1), PESE-GC does not change the forecast step's computational cost. However, PESE-GC will increase the computational cost of the analysis step because (1) the number of observation operator calls scales linearly with the ensemble size, and (2) the computational complexity of the EnsDA filter algorithm (e.g., the EAKF algorithm) scales with the ensemble size. Supposing the computational cost of the EnsDA filter algorithm scales linearly with ensemble size, the computational cost of the EnsDA analysis step scales linearly with the number of ensemble members created by PESE-GC.
However, the increase in computational cost associated with PESE-GC is likely far more affordable than running a larger forecast ensemble. This is because the computational cost of the forecast step often accounts for > 80% of the overall computational cost of the EnsDA process, with the analysis step accounting for the remaining < 20%. Consider the situation where PESE-GC quintuples the ensemble size. The overall computational cost of the EnsDA process will then increase by < 80%.
In contrast, if the forecast ensemble size is quintupled, the overall computational cost of the EnsDA process increases by 400%. As such, PESE-GC is an alluring alternative to increasing the forecast ensemble size.
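This back-of-the-envelope arithmetic can be written out explicitly. The sketch below assumes an exact 80/20 forecast/analysis cost split and an analysis cost that scales linearly with the expanded ensemble size, both idealizations of the figures quoted above:

```python
# Baseline EnsDA process cost, normalized to 1: 80% forecast step, 20% analysis
# step (assumed exact split; the text gives >80% and <20%).
FORECAST, ANALYSIS = 0.80, 0.20

def cost_with_pese_gc(factor):
    """Forecast cost unchanged (virtual members are deleted before the
    forecast step); analysis cost scales with the expanded ensemble size."""
    return FORECAST + ANALYSIS * factor

def cost_with_bigger_ensemble(factor):
    """Both steps scale when the forecast ensemble itself is enlarged."""
    return (FORECAST + ANALYSIS) * factor

quintupled_pese = cost_with_pese_gc(5)        # -> 1.8, an 80% cost increase
quintupled_fcst = cost_with_bigger_ensemble(5)  # -> 5.0, a 400% cost increase
```

With the exact 80/20 split the PESE-GC route costs 80% more than the baseline; since the real analysis share is below 20%, the actual increase is below 80%, matching the text.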

Summary and future work
In this study, an efficient and embarrassingly parallel algorithm to increase ensemble sizes, PESE-GC, is formulated. PESE-GC generalizes the efficient and embarrassingly parallel Gaussian resampling algorithm of Chan et al. (2020) to handle non-Gaussian forecast distributions, which makes PESE-GC highly flexible. Furthermore, PESE-GC provides an avenue for users to apply their knowledge of the forecast statistics to improve EnsDA: users can choose to draw virtual members using marginal distribution families (e.g., Gaussian and gamma distribution families) that they believe are good approximations to the true forecast marginal distributions. If that knowledge is unavailable, users can instead use non-parametric marginal distributions (e.g., Gaussian-tailed rank histogram distributions).
Three mechanisms are then identified for PESE-GC to influence the performance of EnsDA. First, for EnsDA methods like the serial stochastic EnKF and the rank histogram filter, PESE-GC improves the representation of the observation likelihood function. Second, by expanding the number of ensemble members, PESE-GC increases the sampling of the observation operator, which improves the forecast observations' PDF. Third, when users employ PESE-GC with informative marginal distribution families, the forecast observations' statistics are improved. The impacts of PESE-GC on the performance of EnsDA are explored using the L96 model, a variety of observation systems and a variety of EnsDA algorithms. Results indicate that PESE-GC generally improves the performance of EnsDA when (1) the forecast ensemble size is small (10 members), (2) the marginal distribution families used with PESE-GC are informative, and/or (3) PESE-GC improves the representation of the observation likelihood function (the PR experiments in Sections 4.3 and 4.4). It is particularly encouraging that many of these improvements are retained even with a modest amount of ensemble expansion (5-fold expansion).
There are two general areas for future work with PESE-GC. The first is to move PESE-GC towards geophysical applications (EnsDA or forecast postprocessing). To do so, PESE-GC first needs to be tested with ensemble members created by geophysical models (e.g., the Weather Research and Forecasting model; Skamarock et al. (2008)). It will be particularly interesting to see whether the virtual members have realistic meteorological structures (e.g., convective clouds with supporting circulations). Then, PESE-GC can be tested in a geophysical EnsDA and/or postprocessing system. If PESE-GC does improve the performance of geophysical EnsDA/postprocessing, a comparison between PESE-GC and other ensemble expansion methods is warranted.
Another general area for future work is to develop the PESE-GC algorithm further. First, given the importance of localization in practical EnsDA, future work can and should explore inserting localization into PESE-GC. Second, the validity of PESE-GC's Gaussian copula assumption can be assessed in the context of geophysical modelling and forecasting. If the Gaussian copula assumption is inappropriate, non-parametric methods to generate virtual probits can be explored. Third, methods to detect the use of misinformed parametric marginal distribution families deserve exploration. One possible detection method is to employ hypothesis testing on the marginal distributions. For example, if Gaussian distributions are selected for PESE-GC, the Shapiro-Wilk test can be applied to the forecast ensemble to determine whether the selection is misinformed (e.g., Kurosawa and Poterjoy (2023)). Finally, the use of non-Gaussian marginals with PESE-GC may alter ensemble covariances between model variables. This possibility deserves future investigation.
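The proposed Shapiro-Wilk check can be sketched as follows. This is an illustrative use of `scipy.stats.shapiro`, not the procedure of Kurosawa and Poterjoy (2023); the ensemble, the variable layout, and the significance level are all hypothetical.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)

def gaussian_marginals_plausible(ensemble, alpha=0.05):
    """For each model variable (column of the (n_members, n_vars) ensemble),
    test whether the forecast ensemble is consistent with the Gaussian
    marginal family selected for PESE-GC.  Returns True where the
    Shapiro-Wilk test finds no evidence against Gaussianity at level alpha."""
    return np.array([shapiro(ensemble[:, j])[1] >= alpha
                     for j in range(ensemble.shape[1])])

# Hypothetical 80-member forecast ensemble with one near-Gaussian variable
# and one strongly skewed (exponential-like) variable:
ens = np.column_stack([rng.normal(size=80), rng.gamma(shape=1.0, size=80)])
flags = gaussian_marginals_plausible(ens)
```

Variables flagged False could then fall back to a non-parametric marginal (e.g., a Gaussian-tailed rank histogram distribution) instead of the misinformed Gaussian.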
The computational cost of running geophysical models will continue increasing in the coming years (higher spatial resolution, shorter time steps, more complex parameterization schemes, etc.). Geophysical EnsDA groups will continue to grapple with balancing the computational costs of increasing the number of forecast ensemble members against the computational costs of using more realistic geophysical models. If ensemble expansion methods can provide much of the benefit of a larger forecast ensemble size at a fraction of the cost, these methods will enable EnsDA groups to employ more realistic geophysical models.

Figure 2 .
Figure 2. Plots demonstrating the impacts of various heuristic choices in the formulation of PESE-GC.Panels a and b respectively demon-

Figure 3 .
Figure 3. Illustrations of PESE-GC's 4-stage algorithm. Panels a, b, c, and d respectively show the aftermath of stages 1, 2, 3, and 4. The details of these stages are described in Section 2.2. Note that the CAC2020 algorithm is applied in stage 3 (i.e., between panels c and d).

Figure 4 .
Figure 4. Plots of the true forecast PDF (a) and observation likelihood functions (b). The 5 forecast ensemble x values are indicated with red crosses in (a), and the 5 forecast ensemble y values are indicated with red crosses in (b). The red curve in (b) indicates the piecewise-approximated observation likelihood function used by a rank histogram filter that only uses the 5 forecast members. With PESE-GC, the rank histogram filter's piecewise-approximated observation likelihood function (blue curve) is improved.

Figure 5 .
Figure 5. Bivariate example demonstrating the impacts of drawing virtual members from an informative fitted marginal (normal distribution). Panel a shows the empirical CDFs of x from the initial members and virtual members. The estimated relationships between x and y (obtained by passing the initial members and virtual members through h(x)) are displayed in panel b. Finally, panel c shows the empirical CDFs of the initial members and virtual members for variable y. The true CDFs and x-y relationships are plotted in panels b and c with dotted black lines.

Tests with Lorenz 1996 model

Setup of experiments

This section explores the impacts of PESE-GC on the performance of EnsDA using perfect-model Observing System Simulation Experiments (OSSEs) with the Lorenz 1996 model (L96 model; Lorenz (2006)). The Data Assimilation Research Testbed (DART; https://github.com/NCAR/DART; Anderson et al. (2009)) is used in this exploration, and PESE-GC has been implemented into DART.
The performance metric compares cycle-averaged RMSEs with PESE-GC against cycle-averaged RMSEs without PESE-GC. The cycle averaging is done from cycles 501 to 5,500. To define this RMSE relative difference, suppose an arbitrary configuration of N_E, cycling interval, PESE-GC factor, observation type, and EnsDA algorithm is denoted by ξ. Let RMSE(ξ, r, True) denote the cycle-averaged RMSE of the r-th trial of configuration ξ with PESE-GC used. Furthermore, let RMSE(ξ, r, False) denote the cycle-averaged RMSE of the r-th trial of configuration ξ without PESE-GC.
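Assuming the relative difference is normalized by the no-PESE-GC RMSE of the same trial and then averaged over trials (the precise normalization is not reproduced here), the metric can be sketched as:

```python
import numpy as np

def mean_delta_rmse(rmse_with, rmse_without):
    """Trial-averaged relative RMSE difference <dRMSE>.  Negative values mean
    PESE-GC improved the cycle-averaged RMSE.  Assumed normalization: the
    no-PESE-GC RMSE, RMSE(xi, r, False), of the same trial r."""
    rmse_with = np.asarray(rmse_with, dtype=float)
    rmse_without = np.asarray(rmse_without, dtype=float)
    return np.mean((rmse_with - rmse_without) / rmse_without)

# Hypothetical cycle-averaged RMSEs from four trials of one configuration xi:
with_pese = [0.90, 0.95, 1.00, 0.85]
without_pese = [1.00, 1.00, 1.00, 1.00]
delta = mean_delta_rmse(with_pese, without_pese)  # about -0.075, i.e., a
                                                  # 7.5% average RMSE reduction
```

Per-trial normalization keeps trials with different error magnitudes from dominating the average.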

Figure 9 .
Figure 9. Similar to Fig. 7, except that SQUARE observations are assimilated.

Figure 10 .
Figure 10. Two bivariate demonstrations of PESE-GC. In each demonstration, 100 initial members are drawn from a true bivariate PDF (a1, b1), bi-Gaussian PDFs are fitted to each variable, and PESE-GC creates 1,000,000 virtual members. Panels a1 and b1 show the PDFs that the initial members are drawn from (i.e., the true bivariate PDFs), panels a2 and b2 show the bivariate empirical PDFs estimated by the virtual members, panels a3 and b3 show the true bivariate PDFs in probit space, and panels a4 and b4 show the virtual members' bivariate empirical PDFs in probit space. The two true bivariate PDFs are a trimodal PDF with an underlying Gaussian copula (a1) and a bi-Gaussian PDF (b1). Note that the bi-Gaussian PDF's copula is not a Gaussian copula (b3). The two true PDFs are described in Section 5.1.
is used to localize EnsDA increments. For each combination of trial and configuration, 17 localization half-radii are tested: 0.075 × 1.3^0, 0.075 × 1.3^1, 0.075 × 1.3^2, ..., 0.075 × 1.3^15, and infinity. To select the optimal localization half-radius for a given combination of trial and configuration, the root-mean-square error (RMSE) of the forecast ensemble mean is used.
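For reference, the 17 tested half-radii form a geometric sequence plus an un-localized option; they can be generated as follows (representing the un-localized case with `numpy`'s infinity is an assumption for illustration):

```python
import numpy as np

# 16 geometrically spaced localization half-radii, 0.075 * 1.3**k for
# k = 0..15, plus an infinite half-radius (no localization): 17 in total.
half_radii = [0.075 * 1.3**k for k in range(16)] + [np.inf]
n_radii = len(half_radii)  # 17
```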