Data Assimilation in Spatio-Temporal Models with Non-Gaussian Initial States—The Selection Ensemble Kalman Model

Abstract: Assimilation of spatio-temporal data poses a challenge when allowing non-Gaussian features in the prior distribution. It becomes even more complex with nonlinear forward and likelihood models. The ensemble Kalman model and its many variants have proven resilient when handling nonlinearity. However, owing to the linearized updates, conserving the non-Gaussian features in the posterior distribution remains an issue. When the prior model is chosen in the class of selection-Gaussian distributions, the selection ensemble Kalman model provides an approach that conserves non-Gaussianity in the posterior distribution. The synthetic case study features the prediction of a parameter field and the inversion of an initial state for the diffusion equation. By using the selection Kalman model, it is possible to represent multimodality in the posterior model while offering a 20 to 30% reduction in root mean square error relative to the traditional ensemble Kalman model.


Introduction
Data assimilation of spatio-temporal models is a challenge in many fields of study, including, but not limited to, air pollution mapping, weather forecasting, petroleum engineering, and groundwater flow assessment. Over the years, methods have been developed to handle increasingly complex problems. It started with the Kalman filter as presented in the seminal publication [1]. The Kalman filter is based on a Gaussian initial model and Gauss-linear forward and observation models. It defined the foundation for data assimilation and is still used in many assimilation studies. The extended Kalman filter (EKF) [2] appeared as a natural methodological extension that allowed for nonlinearity in the Kalman filter framework through linearization. The ensemble Kalman filter (EnKF) [3,4] defined a Monte Carlo approach to the filter, and it became popular as it allowed for nonlinearity in the forward and observation models without having to evaluate analytical gradients. The EnKF and its variants have proven to be efficient in solving high-dimensional and nonlinear problems, see [5,6]. In the EnKF, the initial ensemble members represent the initial state, which may not have an analytical expression. The forward model then propagates the ensemble members forward in time. Pseudo observations are generated using the observation model. The conditioning of each ensemble member is performed with the Kalman weights estimated from the ensemble to give the best linear update. In cases where the initial model is non-Gaussian, the distribution of the variable of interest conditioned on the data will tend toward Gaussianity as observations are assimilated, due to the linear assimilation rule.
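The linearized update described above can be sketched in a few lines. The following is a minimal NumPy sketch of one EnKF analysis step, assuming a linear observation operator and independent Gaussian observation errors; the function name and interface are ours, not from any particular EnKF library.

```python
import numpy as np

def enkf_analysis(E, d, H, sigma_d, rng):
    """One EnKF analysis step.

    E : (n, ne) ensemble of state vectors (columns are members)
    d : (m,) observation vector
    H : (m, n) linear observation operator
    sigma_d : observation-error standard deviation (independent errors)
    """
    n, ne = E.shape
    m = d.shape[0]
    # Ensemble anomalies around the ensemble mean
    A = E - E.mean(axis=1, keepdims=True)
    HA = H @ A
    # Ensemble estimates of cov(r, Hr) and cov(Hr, Hr) + observation error
    C_rd = A @ HA.T / (ne - 1)
    C_dd = HA @ HA.T / (ne - 1) + sigma_d**2 * np.eye(m)
    K = C_rd @ np.linalg.inv(C_dd)      # Kalman weights estimated from the ensemble
    # Perturbed (pseudo) observations, one per member
    D = d[:, None] + sigma_d * rng.standard_normal((m, ne))
    return E + K @ (D - H @ E)          # best linear update of each member
```

Each member is shifted toward the data by the same estimated gain, which is exactly why non-Gaussian features of the prior ensemble are progressively washed out.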
Non-Gaussian initial distributions may be conserved by using a univariate transform into Gaussian marginals while assuming multi-Gaussianity in the transformed space. A univariate back transform is then used to return to the original space. This approach has a long history in traditional statistics, geostatistics, and, more recently, ensemble methods for data assimilation, where it is referred to as copulas [7], normal score transforms [8], and Gaussian anamorphosis [9], respectively. The latter has been shown to improve the performance of the EnKF in many applications [10,11]. There are, however, some unresolved issues since Gaussian anamorphosis transforms the marginal distributions rather than the full distribution, and the effect on the interdependence of the resulting variables is uncertain.
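A minimal sketch of the univariate transform discussed above, using only NumPy and the standard library, may clarify the idea; the function name and the Hazen plotting positions are our choices, not prescribed by [8,9].

```python
import numpy as np
from statistics import NormalDist

def gaussian_anamorphosis(x):
    """Univariate Gaussian anamorphosis (normal-score transform).

    Maps the samples in x to standard-Gaussian scores through the empirical
    CDF, and returns the scores together with a back-transform that maps
    Gaussian values to the original marginal by interpolation.
    """
    nd = NormalDist()
    n = len(x)
    xs = np.sort(x)
    u_grid = (np.arange(n) + 0.5) / n            # Hazen plotting positions
    ranks = np.argsort(np.argsort(x))            # rank of each sample
    z = np.array([nd.inv_cdf(u) for u in (ranks + 0.5) / n])

    def back_transform(z_new):
        # Gaussian score -> uniform -> original marginal, by interpolation
        u_new = np.array([nd.cdf(v) for v in np.atleast_1d(z_new)])
        return np.interp(u_new, u_grid, xs)

    return z, back_transform
```

Note that the transform acts on each marginal separately, which is precisely the unresolved issue raised in the text: the joint dependence structure in the transformed space is assumed, not established.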
The Ensemble Randomized Maximum Likelihood Filter (EnRML) [12] and its close relative the Iterative EnKF (IEnKF) [6] are primarily used to handle nonlinearities in the forward and observation models, but they will also retain certain non-Gaussian features in the filtering distribution. These filters require gradient evaluations to execute the update, which can be complicated even when the adjoint state method is used. One alternative is to evaluate the gradient using the ensemble itself [13], but this approach introduces an approximation with unclear consequences, particularly in models with multimodal marginals.
Multimodality in the prior model can be represented using categorical auxiliary variables to construct Gaussian mixture prior models [14–16]. In a spatial setting, these models appear as a combination of Gaussian random fields whose parameters depend on the value taken by the categorical variable, but in order to retain spatial dependence, the categorical variable must also have a spatial dependence. This indicator spatial variable can be modeled as a Markov [17] or truncated pluri-Gaussian [18] random field. For both of these models, there are challenges related to temporal data assimilation, although some encouraging examples have been developed [19].
We define and study an alternative prior model, the selection-Gaussian random field [20,21], which may represent multimodality, skewness, and peakedness. This random field model is conjugate with respect to Gauss-linear forward and observation models, similarly to the Gaussian random field model. The posterior distribution is therefore analytically tractable under these assumptions [22]. For general forward and observation models, ensemble based algorithms along the lines of the EnKF can be designed. Such selection ensemble Kalman algorithms are the focus of this study, and they are evaluated in two synthetic test cases.
In Section 2, we introduce the selection ensemble Kalman model. It provides a framework for the use of the selection-Gaussian distribution as a prior in data assimilation. This framework is then used for ensemble filtering and smoothing through the selection EnKF (SEnKF) and the selection EnKS (SEnKS) algorithms. In Section 3, a synthetic case study of the diffusion equation, with two distinct test cases, showcases the ability of the proposed approaches to assess a parameter field and the initial state of a dynamic field. Results from the SEnKF and the SEnKS are compared to those of the traditional EnKF and EnKS, respectively. In Section 4, potential shortcomings are discussed and the results are put into perspective with respect to applicability in more realistic applications. In Section 5, conclusions are presented.
In this paper, f (y) denotes the probability density function (pdf) of a random variable y, and ϕ n (y; µ, Σ) denotes the pdf of the Gaussian n-vector y with expectation n-vector µ and covariance (n × n)-matrix Σ. Furthermore, Φ n (A; µ, Σ) denotes the probability of the aforementioned Gaussian n-vector y to be in A ⊂ R n . We also use i n to denote the all-ones n-vector, I n to denote the identity (n × n)-matrix, and 1(S) to denote the indicator function that equals 1 when S is true and 0 otherwise. We consider log-diffusivity to be a dimensionless quantity, and it will therefore not be given a unit.
Prior model: The prior model on r consists of an initial and a forward model, where f (r 0 ) is the pdf of the initial state and f (r 1:T+1 |r 0 ) defines the forward model.
(a) Initial distribution: The distribution of the initial state f (r 0 ) is assumed to be in the class of selection-Gaussian distributions [20,21]. Consider a Gaussian (n + n)-vector [r, ν] with expectation n-vectors µ r and µ ν , coupling (n × n)-matrix Γ ν|r , and covariance (n × n)-matrices Σ r , Σ ν , and Σ ν|r , with Σ ν = Γ ν|r Σ r Γ T ν|r + Σ ν|r . Define a selection set A ⊂ R n of dimension n and let r 0 = [r|ν ∈ A]; then, r 0 is in the class of selection-Gaussian distributions and its pdf is

f (r 0 ) = [Φ n (A; µ ν , Σ ν )] −1 × Φ n (A; µ ν + Γ ν|r (r 0 − µ r ), Σ ν|r ) × ϕ n (r 0 ; µ r , Σ r ).

Note that the class of Gaussian distributions constitutes a subset of the class of selection-Gaussian distributions, obtained with Γ ν|r = 0 × I n . The dependence in [r, ν] represented by Γ ν|r and the selection set A are crucial user-defined parameters, with the latter being temporally constant. The selection-Gaussian model may represent multimodal, skewed, and/or peaked marginal distributions, see [21]. In this study, the initial distribution is defined to be a discretized stationary selection-Gaussian random field. For a given spatial correlation (n × n)-matrix Σ ρ r , a stationary selection-Gaussian random field is fully parametrized by Θ SG = (µ r , µ ν , σ r , Σ ρ r , γ, A). Similarly, a stationary Gaussian random field is parametrized by Θ G = (µ r , σ r , Σ ρ r ).
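The definition r 0 = [r|ν ∈ A] suggests a simple, if inefficient, way to sample the selection-Gaussian distribution: draw the jointly Gaussian pair (r, ν) and keep r whenever ν falls in A. The sketch below does exactly that; it is a naive rejection sampler, workable only when the probability of ν ∈ A is not too small, and all names are ours.

```python
import numpy as np

def sample_selection_gaussian(mu_r, mu_nu, Sigma_r, Gamma, Sigma_nu_given_r,
                              in_A, n_samples, rng):
    """Sample r0 = [r | nu in A] by rejection.

    The pair (r, nu) is jointly Gaussian: r ~ N(mu_r, Sigma_r) and
    nu | r ~ N(mu_nu + Gamma (r - mu_r), Sigma_nu_given_r).
    in_A : callable taking an n-vector nu and returning True/False.
    """
    n = len(mu_r)
    L = np.linalg.cholesky(Sigma_r)
    Lc = np.linalg.cholesky(Sigma_nu_given_r)
    out = []
    while len(out) < n_samples:
        r = mu_r + L @ rng.standard_normal(n)
        nu = mu_nu + Gamma @ (r - mu_r) + Lc @ rng.standard_normal(n)
        if in_A(nu):          # keep r only when the selection variable is in A
            out.append(r)
    return np.array(out)
```

With a one-sided selection set such as {ν > 0} and a positive coupling Γ ν|r , the accepted r-samples are skewed toward high values, illustrating how the choice of A and Γ ν|r shapes the marginals.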
(b) Forward model: The forward model given the initial state, [r 1:T+1 |r 0 ], is defined by

r t+1 = ω t (r t , ε r t ), t = 0, . . . , T,

where ω t (·, ·) ∈ R n is the forward function and the random n-vectors ε r t are independent and identically distributed (iid) for each t. This forward model may be nonlinear but, since it only involves the variable at the previous time step r t , it defines a first-order Markov chain. Note that f (r t+1 |r t ) cannot generally be written in closed form.
(c) Likelihood model: The observations d 0:T are related to the states by

d t = ψ t (r t , ε d t ), t = 0, . . . , T,

where ψ t (·, ·) ∈ R m is the likelihood function and the random m-vectors ε d t are iid for each t. Note that f (d t |r t ) cannot generally be written in closed form.
Posterior model: The posterior model for the HM model in Figure 1 is given by the conditional pdf f (r 0:T+1 |d 0:T ) and is also a Markov chain, see [23,24]. This model is denoted the selection ensemble Kalman model. If the forward and likelihood models are Gauss-linear, the posterior model is also selection-Gaussian and analytically tractable, see [22]. When the forward and/or likelihood models are nonlinear, however, approximate or sampling based assessment of the posterior model must be made. For this purpose, we introduce the selection ensemble Kalman filter (SEnKF) and smoother (SEnKS) in the spirit of the traditional ensemble Kalman model [3]. The traditional EnKF algorithm aims at assessing the forecast pdf f (r T+1 |d 0:T ), and it is justified by the general HM model recursions, see [23]. The algorithm is initiated by

f (r 0 |d 0 ) ∝ f (d 0 |r 0 ) f (r 0 ),

and utilizes the recursion for t = 1, . . . , T,

f (r t |d 0:t ) ∝ f (d t |r t ) ∫ f (r t |r t−1 ) f (r t−1 |d 0:t−1 ) dr t−1 ,

with forecast f (r T+1 |d 0:T ) = ∫ f (r T+1 |r T ) f (r T |d 0:T ) dr T . The expressions are represented by an ensemble of realizations, which in each recursion is conditioned using a linearized approximation with Kalman weights estimated from the ensemble. Thereafter, the ensemble is forwarded to the next time step. The SEnKF introduced in this study relies on the same relations as above, but it operates on the augmented (n + n)-vector [r t , ν], see Equation (2). Hence, the forward model is defined as

[r t+1 , ν t+1 ] = [ω t (r t , ε r t ), ν t ],

where the auxiliary n-vector ν t is temporally constant. The likelihood model is defined as d t = ψ t (r t , ε d t ), so the observations depend on r t only. The SEnKF algorithm provides an ensemble representation of f (r T+1 , ν|d 0:T ) and, based on this ensemble, empirical sampling based inference, see [21], is used to obtain the forecast of interest, f (r T+1 |d 0:T ), by additionally conditioning on ν ∈ A. The SEnKF algorithm is specified in Algorithm A1 in Appendix A.
The traditional EnKS algorithm aims at evaluating the interpolation pdf f (r 0:T |d 0:T ) with corresponding HM model recursions, see [23]. The algorithm is initiated by and the recursions for t = 1, . . . , T, The expressions are represented by an ensemble of realizations. Forwarding is made on the ensemble and the conditioning is empirically linearized. Note that the dimension of the model increases very fast, one may therefore only store the interpolation pdf f (r s |d 0:T ) at the time point s of interest. The SEnKS introduced in this study relies on the relations defined above and uses an extended (n + n)-vector [r, ν] as defined in Equation (2). The forward and likelihood models are identical to those defined for the filter. The SEnKS algorithm provides an ensemble representation of and by using empirical sampling based inference, see [21], the interpolation of interest is assessed, The SEnKS algorithm is specified in Algorithm A2 in the Appendix A. Both algorithms, SEnKF and SEnKS, contain empirically linearized conditioning and asymptotic results, when the ensemble size goes to infinity, and are consistent only for Gauss-linear forward and likelihood models. Under these assumptions, the model is analytically tractable; however, see [22]. In spite of this lack of asymptotic consistency for general HM models, the ensemble Kalman scheme has proven surprisingly reliable for high-dimensional, weakly nonlinear models even with very modest ensemble sizes [25].

Results
We consider two test cases to illustrate the relevance of the selection ensemble Kalman algorithms presented in Section 2. The model, common to both test cases, is based on the diffusion equation.
The test cases are designed such that it will be opportune to consider bi-modal initial distributions. In the first test case, we compare the SEnKF to the traditional EnKF with a focus on predicting the diffusivity field that contains a high diffusivity channel. In the second test case, we compare the SEnKS to the traditional EnKS with a focus on evaluating the initial temperature field that is divided into two distinct areas where the initial temperature is substantially higher in one than in the other.

Model
Consider a discretized spatio-temporal random field, {r t (x), x ∈ L r ⊂ R 2 } where t ∈ L t : {0, 1, . . . . , T} and r t (·) ∈ R that represents temperature ( • C). Let a discretized spatial random field, . Let x be the spatial reference on the regular spatial grid L r on the domain D, while t is the temporal reference on the regular temporal grid L t . The number of spatial grid nodes is n = 21 × 21, and they are placed every 10 cm vertically and horizontally. The discretized temperature field at time t may be represented by the n-vector r t and the diffusivity field by the n-vector λ. Both are assumed to be unknown. Note that the Kalman models are defined on the joint variable [r t , λ].
Assume that, given the initial temperature field, the field evolves according to the diffusion equation: with n the outer normal to the domain and q a source term. The expression in Equation (20) is discretized using finite differences and the forward model is defined as with ω * (·, ·) ∈ R n . Convergence and stability of the numerical method are easily ensured for the finite difference scheme that is used. The initial temperature field r 0 is considered unknown in the test cases. The forward model is assumed to be perfect in the sense that there is no model error. The forward model in Equation (6) then takes the form, This forward model is nonlinear due to the product of r t and λ in Equation (20). Consequently, the assumption of Gauss-linearity required for both the traditional Kalman model [1] and the selection Kalman model [22] is violated and necessitates ensemble based algorithms.
The observations are acquired in a m = 5 location pattern on the spatial grid L r at each temporal node in L t , providing the set of observations m-vectors d t , t ∈ L t . The corresponding likelihood model is defined as where the observation (m × n)-matrix H is a binary selection matrix, while the centered Gaussian m-vector d t with the covariance (m × m)-matrix Σ d|r = σ 2 d|r I m , and σ 2 d|r = 0.1 represents independent observation errors. This likelihood model is in Gauss-linear form.

Test Case 1: Predicting the Parameter Field
The focus of this test case is to predict the unknown diffusivity field λ based on the observations d. Because diffusivity is constant in time, smoothing and filtering give an identical prediction of the field. However, filtering is preferred because it does not require updating the ensemble at all future times in addition to the previous one, see [26]. The posterior model is evaluated using the SEnKF, see Appendix A and the results are compared to those from the traditional EnKF algorithm.
The true diffusivity n-vector λ is displayed in Figure 2. The diffusivity λ is always positive. To ensure that ensemble updates do not lead to negative diffusivity values, we work on log(λ). The figure shows a channel in which the diffusivity is higher than in the rest of the field. The diffusivity field is formally defined as where D 1 ⊂ D is the low diffusivity area and D 2 ⊂ D is the high diffusivity channel. The parameter values are λ 1 = e −12 m 2 s −1 and λ 2 = e −5 m 2 s −1 . The true temperature field is initially at 20 • C and the heat source on the lower border of the high diffusivity channel starts pumping in heat at T = 0 at a constant volumetric rate q = 15 W m −3 , see Figure 2. The temporal evoluation of the temperature field, shown in Figure 3, is obtained by solving the diffusion equation in Equation (20) for the log-diffusivity field in Figure 2 and the initial temperature field defined above. The temperature observations d, see Figure 4, are then collected from the five locations shown in Figure 2 using the likelihood model defined in Equation (23). The measurements are taken every second from T = 0 to T = 100. As the heat from the source diffuses mostly along the high diffusivity channel, the observed temperature increases substantially at observation locations within the channel. The unknown initial field for log-diffusivity log(λ) is assigned a stationary selection-Gaussian random field prior model with parameters Θ SG λ = (µ λ r , µ λ ν , σ λ r , Σ ρ r , γ λ , A), see [21] and Equation (2). The parameter values for the prior model are listed in Table 1.    Table 1. Parameter values for the selection-Gaussian initial distribution for the initial log-diffusivity field.

Parameters
Values The unknown initial temperature field r 0 is assigned a stationary Gaussian random field prior model with parameters Θ G r = (µ r , σ r , Σ ρ r ) with expectation and variance levels µ r = 20 and σ 2 r = 2, respectively. The variance level is relatively large as we assume little prior knowledge of the initial temperature field. For both prior models, the spatial correlation (n × n)-matrix Σ ρ · is defined by the second order exponential spatial correlation function ρ · (τ) = exp (−τ 2 /δ 2 ); δ = 0.15, with interdistance τ. Figure 5 contains realizations from the prior model of the log-diffusivity field and their associated spatial histograms. The prior model is specified to be spatially stationary except for boundary effects with bi-modal spatial histograms. The selection set A ⊂ R n for the prior model is chosen to obtain bi-modal marginal distributions with a very dominant mode centered slightly above the value for λ 1 and a very small mode centered slightly below the value for λ 2 . The prior is therefore not centered at the true values. Note that the joint random field [log(λ), r 0 ] will appear as a bi-variate selection-Gaussian random field, see [21]. Realizations from the initial selection-Gaussian distribution of the log diffusivity f (log(λ)) at time T = 0 (upper panels) and associated spatial histogram (lower panels). Lower panels: the horizontal axes represent the log-diffusivity, the vertical axes represent the relative prevalence of each log-diffusivity value for the realization in the panel right above.
The SEnKF operates on the 3n-vector [log(λ), ν, r 0 ], and therefore we generate an initial ensemble with n e = 10,000 ensemble members that are sampled from the Gaussian 3n-vector [log(λ), ν, r 0 ] with pdf, The EnKF operates on the 2n-vector [log(λ), r 0 ], and therefore we generate an initial ensemble with n e = 10,000 ensemble members that are sampled from the selection-Gaussian distribution f (log(λ), r 0 ). The variables log(λ) and r 0 are independent, so we generate them independently: 10,000 samples from the selection-Gaussian n-vector log(λ) with parameters Θ SG λ and 10,000 samples from the Gaussian n-vector r 0 with parameters Θ G r . It is important to understand that both ensemble algorithms are initiated with an ensemble from an identical selection-Gaussian random field prior model for [log(λ), r 0 ] at T = 0, which reflects the bi-modality of the prior model. Due to the size of the ensemble relative to the dimension of the problem, we are using neither localization nor inflation in the algorithms.
To illustrate the differences between the SEnKF and the EnKF, we present the following results for both algorithms: 1.
The root mean square errors (RMSE) of the MMAP prediction of the log-diffusivity field relative to the true log-diffusivity field at time T = 100. Figures 6 and 7 show the marginal posterior pdfs f (log(λ i )|d 0:T ) at the four monitoring locations at time T = 0, 50, 80, 100 for the SEnKF and EnKF algorithms, respectively. Monitoring locations 1 and 2 are placed within the high diffusivity area while the two other locations are placed far into the low diffusivity area. At T = 0, all pdfs are identical, in all locations due to the stationary prior model and for both algorithms due to identical prior models. The SEnKF results appear to preserve bi-modality as observations are assimilated. As more data are made available, the high value mode increases at monitoring locations 1 and 2 that are inside the high diffusivity area. The low value mode remains dominant at monitoring locations within the low diffusivity areas. These results reflect expected behaviors. The traditional EnKF results are significantly different since the bi-modality of the marginal pdfs disappears already at T = 50. The marginal pdfs are Gaussian-like and are gently moved toward high and low values depending on which diffusivity area the monitoring locations are in. This regression toward the mean effect of the EnKF is generally recognized as it gives the best prediction in the squared error sense [27].   At T = 0, the predictions from the two algorithms are identical since they use identical prior models. As observations are assimilated, the SEnKF predictions reproduce the high diffusivity area relatively well, with clear contrast. The traditional EnKF predictions also indicate the diffusivity areas, but with less contrast. Figure 9 displays the MMAP prediction at T = 100 along the section B-B' shown in Figure 2. The high contrast reliable reconstruction of the high diffusivity channel by the SEnKF algorithm is confirmed. The traditional EnKF predictions appear less reliable. 
The 80% highest density interval (HDI) [28] covers the true diffusivity values for the SEnKF while these values are far outside the interval for the traditional EnKF results. The results are consistent with the observations made regarding the marginal posterior pdfs in Figures 6 and 7. Figures 10 and 11 show realizations and spatial histograms from the posterior distribution of the log-diffusivity at time T = 100 for the SEnKF and traditional EnKF algorithms, respectively. The realizations from the SEnKF largely reproduce the channel with clear contrast while the realizations from the EnKF also reproduce the channel, but with much less contrast. The spatial histograms also underline the difference in contrast in that they are clearly bi-modal for the SEnKS and much more Gaussian-like for the EnKS. Table 2 shows that the RMSE of the MMAP prediction relative to the true diffusivity field for the SEnKF is approximately 30% lower than for the EnKF . This test case clearly illustrates the SEnKF's ability to conserve multimodality in the posterior distribution and it leads to predictions with better constrast and accuracy. We conclude that the reconstruction of the true diffusivity field is done more reliably by the SEnKF algorithm than by the EnKF algorithm.    Lower panels: the horizontal axes represent the log-diffusivity, the vertical axes represent the relative prevalence of each log-diffusivity value for the realization in the panel right above.

Test Case 2: Reconstructing the Initial Field
The focus of the study is to evaluate the unknown initial state of the temperature field r 0 based on the observations d. The posterior model f (r 0 |d) is assessed using the SEnKS, see Appendix A, and the results are compared to those from the traditional EnKS.
The true initial temperature field r 0 is set at 20 • C except for a square shaped region with temperature set at 45 • C, see Figure 12. The temperature field is formally defined as where D 1 ⊂ D is the low temperature area and D 2 ⊂ D is the high temperature area, and τ 1 = 20 • C and τ 2 = 45 • C. Figure 12 shows the true log-diffusivity n-vector log(λ). The diffusivity λ is always positive. To ensure that ensemble updates do not lead to negative diffusivity values, we work on log(λ). The heat contained in the high temperature area will diffuse towards the rest of the field according to the diffusion equation in Equation (20), see Figure 13. The temporal observations are collected at five different observation locations according to the likelihood model in Equation (23), see Figure 12. Figure 14 displays the observations d where it is clear that the observed temperature increases substantially only at the observation locations close to the high temperature area. The measurements are taken every second from T = 0 to T = 50. The unknown initial temperature field r 0 is assigned a stationary selection-Gaussian random field prior model with parameters Θ SG r = (µr, µ ν , σr, Σ ρ r , γ, A). The parameter values are listed in Table 3. The unknown log-diffusivity field log(λ) is assigned a stationary Gaussian random field prior model with parameters Θ G λ = (µ λ , σ λ , Σ ρ λ ) with expectation and variance levels µ λ = −8.5 and σ 2 λ = 2, respectively. For both prior models, the spatial correlation (n × n)-matrix Σ ρ · is defined by the second order exponential spatial correlation function ρ · (τ) = exp (−τ 2 /δ 2 ); δ = 0.15, with interdistance τ. Figure 15 contains four realizations from the prior model of the temperature field and their spatial histograms. The marginal initial distributions of the realizations are bi-modal and spatially stationary except for boundary effects. 
The selection set A ⊂ R n in the prior model is chosen to obtain a bi-modal marginal distribution with one large mode approximately centered about 20 • C and a smaller mode centered close to 45 • C. Table 3. Parameter values for the selection-Gaussian initial distribution for the initial temperature prior model.

Parameters
Values Figure 12. Initial temperature ( • C) field (left) with data collection points · and monitoring locations × and reference log-diffusivity field (right). The SEnKS operates on the 3n-vector [r 0 , ν, log(λ)], and therefore we generate an initial ensemble with n e = 10,000 ensemble members that are sampled from the Gaussian 3n-vector [r 0 , ν, log(λ)] with pdf, The EnKS operates on the 2n-vector [r 0 , log(λ)] , and therefore we generate an initial ensemble with n e = 10,000 ensemble members that are sampled from selection-Gaussian distribution f (r 0 , log(λ)). The variables r 0 and log(λ) are independent, so we generate them independently: 10,000 samples from the selection-Gaussian n-vector r 0 with parameters Θ SG r and 10,000 samples from the Gaussian n-vector log(λ) with parameters Θ G λ . Due to the size of the ensemble relative to the dimension of the problem, we used neither localization nor inflation in the algorithms.
To illustrate the differences between the SEnKS and the EnKS we present the following results for both algorithms:

2.
The marginal maximum a posteriori (MMAP) prediction of the initial temperature field at time at time T = 0, 20, 30, 50.

3.
Realizations from the posterior distribution f (r 0 |d 0:T ) of the initial temperature field at time T = 50. 4.
The root mean square errors (RMSE) of the MMAP prediction of the initial temperature field relative to the true initial temperature field at time T = 50.  Figures 16 and 17 for the SEnKS and EnKS algorithms, respectively. At T = 0, the prior models for both algorithms are identical and so are the marginal pdfs. Monitoring location 1 is placed inside the high temperature area. As observations are assimilated, the marginal pdf from the SEnKS remain bi-modal, but the high value mode increases steadily. For the other monitoring locations, all placed outside the high temperature area, the bi-modality is reproduced but with a dominant low value mode. The relative size of the modes reflects the distance to the high temperature area and the observation locations. The marginal pdfs from the EnKS lose their bi-modality after a few assimilation steps and from then on the Gaussian-like marginal pdfs are only slightly shifted by the assimilation of observations. Figure 15. Realizations from the selection-Gaussian initial distribution of the initial temperature field f (r 0 ) at time T = 0 (upper panels) and associated spatial histogram (lower panels). Upper panels: the colorbar gives the temperature in • C. Lower panels: the horizontal axes represent the temperature ( • C), the vertical axes represent the relative prevalence of each temperature value for the realization right above.    Figure 19 displays the MMAP prediction of the initial temperature field at T = 50 along the section A-A', see Figure 12, for the SEnKS and the traditional EnKS. The SEnKS clearly identifies the high temperature area and the 80% HDI covers the truth, while the EnKS clearly fails to identify the high temperature area and the 80% HDI does not even cover it.  Realizations of the posterior model at T = 50 based on the SEnKS and the traditional EnKS algorithms are displayed in Figures 20 and 21, respectively. 
The SEnKS produces realizations that appear bi-modal while the ensemble members from the EnKS display more symmetric spatial histograms. Even though the differences between the realizations are quite subtle, they are consistent with previous results. Table 4 shows that RMSE of the MMAP prediction relative to the true initial temperature field for the SEnKS is approximately 20% lower than for the EnKS.
This test case clearly illustrates the ability of the SEnKS to conserve multimodality in the posterior distribution, and it leads to predictions with better contrast and accuracy. We conclude that the SEnKS algorithm provides a more reliable reconstruction of the initial state of the temperature field than the traditional EnKS algorithm. Note that the posterior model for the unknown diffusivity field f (log(λ)|d) can also be assessed with the two algorithms. When comparing the MMAP predictions relative to the true diffusivity field, see Figure 12, we observe that none of the algorithms provide reliable predictions. We conclude that the small scale variations in the field are not sufficiently distinct to be identified.   Table 4. RMSE comparing the MMAP prediction of the initial temperature field and the initial temperature field at time T = 50.

Discussion
The traditional EnKF and EnKS algorithms provide an ensemble that directly represents the posterior models f (r T+1 |d 0:T ) and f (r 0:T |d 0:T ), respectively. Hence, the posterior models can be assessed by displaying statistics based on these ensembles. Reliable assessment of the posterior model in the two test cases can be obtained with approximately 1000 ensemble members. The SEnKF and SEnKS algorithms under study provide an ensemble of the augmented posterior models, f (r T+1 , ν|d 0:T ) and f (r 0:T , ν 0:T |d 0:T ), respectively. In order to obtain the posterior models of interest, f (r T+1 |d 0:T ) and f (r 0:T |d 0:T ), the conditioning on ν ∈ A must be made by empirical sampling based inference, see [21,22]. This inference requires the estimation of the expectation vector µr ν and covariance matrix Σr ν . The two test cases are defined on a (21 × 21)-grid for bothr t and ν-hence in dimension 882. The expectation and covariance will have 882 and 389,403 unique entries, respectively. Our experience from this study is that approximately 10,000 ensemble members are required to obtain reliable assessment of the posterior models of interest. To reduce the ensemble size, we have tested various localization approaches [29], without notable success, and leave the subject for further research.

Conclusions
Data assimilation of spatio-temporal variables with multimodal spatial histograms is challenging. Traditional ensemble Kalman algorithms enforce a regression towards the mean due to the linearized conditioning on observations, hence the multimodality is averaged out. We introduce the selection ensemble Kalman algorithms, termed SEnKF and SEnKS. These algorithms are based on recursive expressions similar to the ones justifying the traditional ensemble Kalman algorithms, but they are defined in an augmented space including the selection variable. From the two case studies, we conclude that multimodality is much better represented by the selection ensemble Kalman algorithms than by the traditional ones. We obtain RMSE reductions in the range of 20 to 30%.
The traditional ensemble Kalman algorithms provide an ensemble representation of the posterior model of interest hence making assessment of the posterior pdf simple. The selection ensemble Kalman algorithms are defined in an augmented space and conditioning on the selection variable must be made a posteriori. For this conditioning to be reliable, the ensemble size needs to be much larger than for the traditional algorithms. Hence, there is a trade-off between improved reproduction of multimodal characteristics of the phenomenon under study and the computational demands. In our case study, the ensemble size needed to be increased by approximately a factor of ten.
We have not fully explored the possibilities of robust estimation of model parameters in the conditioning of the selection variable. This robustification may reduce the ensemble size requirements. Note that parallelization in forwarding of the ensemble is possible and it will reduce the computer demands. End iterate 21.

Algorithm A2 description:
The SEnKS is a two-step algorithm. The first step is a traditional EnKS that evaluates [r t , ν|d 0:T ]. The second step consists of a sampling step where the target quantity [r t |d 0 , ..., d T ] is evaluated using [r t , ν|d 0:T ] from the first step. 20.