Sensitivity Analysis of the Data Assimilation-Driven Decomposition in Space and Time to Solve PDE-Constrained Optimization Problems

This paper is presented in the context of sensitivity analysis (SA) of large-scale data assimilation (DA) models. We study the consistency, convergence, stability and round-off error propagation of the reduced-space optimization technique arising in parallel 4D variational DA problems. The results help in understanding the reliability of DA, in assessing the confidence one can have that the simulation results are correct and in determining its configuration in any application. The main contributions of the present work are as follows. By using forward error analysis, we derive the condition number of the parallel approach. We find that the parallel approach reduces the condition number, revealing that it is more appropriate than the standard approach usually implemented in most operational software. As the background values are used as initial conditions of the local PDE models, we analyze stability with respect to the time direction. Finally, we prove the consistency of the proposed approach by analyzing the local truncation errors of each computational kernel.


Introduction
Data assimilation (DA) has long played a crucial role in the quantification of uncertainties in numerical weather prediction (NWP), oceanography [1,2] and, more generally, in data science. Recently, DA has been applied more widely to numerical simulations beyond geophysical applications [3], as well as in medicine and the biological sciences [4], to improve the accuracy and reliability of computational approaches. DA encompasses the entire sequence of operations that, starting from observations/measurements of physical quantities and with additional information, such as mathematical models governing the evolution of these quantities, improves their estimation through a suitable function. In order to understand how such a function is obtained, we note that, from a mathematical perspective, DA is an inverse and ill-posed problem [5]. Hence, regularization methods are used to obtain a well-posed problem. A popular approach to obtain a unique solution to DA inverse problems in such circumstances is to formulate them as variational problems, minimizing the sum of two terms, the first of which is a combination of the residual between the observed and predicted outputs (the so-called misfit) in an appropriate norm and the second of which is a regularization term that penalizes unwanted features of the parameters. The inverse problem leads to a nonlinear variational problem in which the forward simulation model is embedded in the residual term. When the forward model takes the form of partial differential equations (PDEs) or some other expensive model, the result is a PDE-based variational problem [6][7][8][9][10]. In this way, DA provides mathematical methods to identify an optimal trade-off between the current estimate of the model state and the observations, accounting for uncertainties. This poses a formidable computational challenge, making DA an example of an ill-posed inverse big data problem.
In [11][12][13][14], we proposed the design of an innovative mathematical model and the development and analysis of related numerical algorithms based on the simultaneous introduction of an overlapping space-time decomposition of the PDEs governing the physical model and of the DA model. The proposed method is a so-called reduced-space optimization technique. In such an approach, the DA model acts as a coarse/predictor operator for the local PDE models by providing the background values as initial conditions of the local PDE models. Hereafter, we refer to such an approach as a DD-DA (domain decomposition data assimilation) method.
In this paper, we present a sensitivity analysis (SA) of DD-DA, studying consistency, convergence and stability, as well as a round-off error analysis. SA relates the contributions of uncertain data to the uncertainty of the solution. The aim of SA is to understand the errors that arise at the different stages of the solution process, namely the uncertainty in the mathematical model, in the model solution and in the measurements. Moreover, approximation errors are introduced by linearization, discretization and the local model. SA is helpful in understanding how all these errors impact the solution of a DD-DA model and in assessing its practical configuration [15]. The main contributions of the present work are as follows.
• By using forward error analysis (FEA), we derive the condition number of DD-DA. We find that DD-DA actually reduces the condition number of DA, revealing that it is much more appropriate than the standard approach that is usually implemented in most operational software;
• As the background values are used as initial conditions of the local PDE models, we prove that small changes in the initial values cannot cause large changes in the final result; we then analyze the stability with respect to the time direction;
• We analyze the consistency of DD-DA in terms of local truncation errors;
• Overall, the present work complements the study reported in [16], in which the authors performed SA of DD in 3D space in the context of a variational data assimilation problem.
The remainder of this article is structured as follows. Section 2 provides a brief introduction to DA, 4D variational models and DD-DA following the discretize-then-optimize approach. The main results are presented in Section 3. Validation analysis is addressed in Section 4 with respect to the one-dimensional shallow water equations.

4D Variational DA Formulation
Before presenting our results, we briefly review the main concepts of the 4D variational DA formulation. Let Ω ⊂ R^n, n ∈ N, be a spatial domain with a Lipschitz boundary, and let M be a symbolic description of the model of interest, where u is the state function of M, s ∈ N is the number of physical variables and f is a known function defined on the boundary ∂Ω. Let y be the observation function, and let H denote the non-linear observation mapping. To simplify the subsequent treatment, we assume s ≡ 1.

Definition 1 (Discretization of Ω × ∆ or Mesh Generation).
Let Ω_I be the discretization of Ω. If |I| denotes the cardinality of a set I, then I and N_p = |I| are, respectively, the set of indices of the nodes in Ω and its cardinality, i.e., the number of inner nodes in Ω. Similarly, K and N = |K| are, respectively, the set of indices of the time variable in ∆ and its cardinality, i.e., the number of time instants in ∆. Consequently, we refer to Ω_I × ∆_K as the discrete domain/mesh.
We introduce the 4D variational problem (see Figure 1).

Definition 2.
Let Ω ⊂ R^n and ∆ ⊂ R be the spatial domain and the time interval, respectively. The 4DVAR DA problem consists of computing the so-called analysis, where:
• N_p is the number of nodes in Ω ⊂ R^n;
• n_obs, with n_obs << N_p, is the number of observations in Ω;
• N is the number of time instants in ∆;
• α is the regularization parameter;
• u_0 = {u_{0,j}}_{j=1,...,N_p} ≡ {u_0(x_j)}_{j=1,...,N_p} ∈ R^{N_p} is the state at time t_0;
• the first operator is the discretization of the linear approximation of M_{t_{l-1},t_l}, from t_{l-1} to t_l;
• the second operator is the discretization of the linear approximation of M, running from t_0 to t_N;
• the matrix u^M is the solution of M, i.e., the background;
• y := {y(z_j, t_l)}_{j=1,...,n_obs; l=0,1,...,N-1} ∈ R^{n_obs × N} are the observations;
• H_l ∈ R^{n_obs × N_p}, l = 0, . . . , N-1, is the linear approximation of the observation mapping;
• R and B = VV^T are the covariance matrices of the errors on the observations and on the background, respectively.
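To make the structure of the functional concrete, the following sketch evaluates a 4DVAR-type cost in the discretize-then-optimize spirit of Definition 2. It is a minimal illustration, not the paper's implementation: the names (`fourdvar_cost`, `step`) are ours, and using a single observation-error weight `R_inv` for all time instants is a simplifying assumption.

```python
import numpy as np

def fourdvar_cost(u0, u_b, B_inv, obs, H, R_inv, step, alpha=1.0):
    """Evaluate a 4DVAR-type functional J(u0): a background (Tikhonov)
    term plus observation misfits accumulated while the state is
    propagated over the assimilation window."""
    d = u0 - u_b                      # departure from the background
    J = alpha * (d @ B_inv @ d)       # regularization term
    u = np.array(u0, dtype=float)
    for y_l, H_l in zip(obs, H):      # one (y_l, H_l) pair per time instant
        r = H_l @ u - y_l             # misfit in observation space
        J += r @ R_inv @ r
        u = step(u)                   # advance the (linearized) model
    return J
```

In a real 4DVAR configuration, `step` would be the tangent-linear model of M over one time interval and the weights would be the inverses of B and of each R_l.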

The Space and Time DA-Driven Domain Decomposition Method
The strength of the DD-DA approach is the exploitation of the coupling between the DA functional and the underlying PDE model. The idea goes back to the work of Schwarz [17] on overlapping domains and to Parallel in Time (PinT) methods, introduced by Lions [18]. Briefly, in DD-DA, DA acts as a predictor for the PDE-based local model, providing the approximations needed to locally solve the initial value problems on each subdomain, concurrently. Leveraging the consistency constraints of the Schwarz and PinT methods for PDE-based models, the DD-DA framework iteratively adjusts the local solutions by adding the contribution of adjacent subdomains to the local filter, along the overlapping regions. As a consequence, this approach increases the accuracy of the local solutions, and it allows us to apply both the fine and the coarse solvers in parallel.
In the following, we briefly summarize the method, according to its schematic description reported in Figure 2.
We describe the DD of Ω × ∆ and of Ω_I × ∆_K. The DD of Ω × ∆ consists of decomposing Ω ⊂ R^n into a set of overlapping subdomains Ω_i; consequently, we define the set J_i, with cardinality ad_i, made of the indices of the subdomains adjacent to Ω_i, that is, J_i ⊂ {1, . . . , N_sub} and ad_i = |J_i|.
For i = 1, . . . , N_sub, we define the overlap regions Ω_ij and the interfaces Γ_ij. In the same way, the time interval ∆ ⊂ R is decomposed into a sequence of intervals ∆_k, giving the local domains. The DD of Ω_I × ∆_K defined in (7) proceeds as follows: for i = 1, . . . , N_sub, the set I_i is made of the inner nodes of Ω_i, where I is the set of indices of inner nodes in Ω defined in (5), and δ := |I_ij| (20). The selection of the inner nodes belonging to the overlap regions {Ω_ij}_{i=1,...,N_sub, j∈J_i} proceeds in the same way: for i = 1, . . . , N_sub, I_ij collects the inner nodes of Ω_ij. Consequently, for i = 1, . . . , N_sub, we define the cardinality of I_i as the number of inner nodes of Ω_i. Finally, the selection of the time values in {∆_k}_{k=1,...,N_t} proceeds as follows.
The time values in ∆_k are indexed by the set K_k, whose cardinality is the number of time values belonging to K_k, where K is defined in (6). Consistently with Definition 1, for i = 1, . . . , N_sub and k = 1, . . . , N_t, we obtain the local discrete domain/mesh on Ω_i × ∆_k. We now define the restriction and extension operators underlying the DD method.

Definition 3 (Restriction Operator)
Given x ∈ R^{N_p × N} and z ∈ R^{N_p × 1}, for i = 1, . . . , N_sub we define the restriction of x to Ω_i and the restriction of z to Ω_i × ∆_k, where I_i and I_ij are, respectively, the sets of indices of the inner nodes in Ω_i and Ω_ij, ∀j ∈ J_i.

Definition 4 (Extension operator)
If x ∈ R^{N_loc × N_k}, the Extension Operator (EO) is defined accordingly. For fixed n = 0, 1, . . ., we now define the local model in each subdomain, whose background is used as the initial value of the local model (see Figure 3). In (25), u^M_{i,k} and b^n_{i,k} are, respectively, the background in Ω_i × ∆_k and the vector accounting for the boundary conditions of Ω_i, together with the restriction to Ω_i of the model matrix. Let (P^n_{i,k}) be the local 4DVAR DA model, with the overlapping operator defined in Γ_ij. The parameters α_{i,k} and β_j in (31) are the regularization parameters. For simplicity, we assume that B_i = V_i V_i^T, where I_i is the identity matrix. The solution of (P^n_{i,k})_{i=1,...,N_sub, k=1,...,N_t} is obtained by requiring that ∇J_{i,k}(w^n_{i,k}) = 0. This requirement leads to the linear system (34), where ad_i is the number of subdomains adjacent to Ω_i. For each n, the r.h.s. of (34) depends on the unknown values w^n_{j,k} defined on those Ω_ij, with j ∈ J_i, which are adjacent to Ω_i. According to the Additive Schwarz Method (ASM) [19], for r = 0, 1, . . ., r̄ we solve (36) by using the Conjugate Gradient (CG) method: at step r + 1, Ω_i receives w^{r,n}_{j,k} from Ω_ij, j ∈ J_i, to compute the r.h.s. of (36), and finally it sends w^{r+1,n}_{i,k} to Ω_ij, j ∈ J_i, since the updated r.h.s. of (36) is needed for the next iteration. At step 0, w^{0,n}_{j,k} is an arbitrary initial value.
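The restriction and extension operators of Definitions 3 and 4 amount to index-set arithmetic. The sketch below is a hypothetical setting (a 1D grid of 10 inner nodes split into two subdomains with δ = 2 shared nodes, as in the decomposition of Figure 6); the index sets `I1`, `I2` and the function names are ours.

```python
import numpy as np

# Hypothetical setting: 10 inner nodes, two overlapping subdomains
# sharing delta = 2 nodes.
I1 = np.arange(0, 6)     # indices of inner nodes of Omega_1
I2 = np.arange(4, 10)    # indices of inner nodes of Omega_2 (overlap: 4, 5)

def restrict(x, idx):
    """Restriction of a global vector x to a subdomain (cf. Definition 3)."""
    return x[idx]

def extend(x_loc, idx, n_global):
    """Extension operator (cf. Definition 4): embed a local vector into
    the global domain, padding with zeros outside the subdomain."""
    x = np.zeros(n_global)
    x[idx] = x_loc
    return x
```

When local solutions are patched together on the overlap region, some weighting (e.g., averaging the two local values) is typically applied; the zero-padding above is the plain extension operator.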
Finally, we pose w^n_{i,k} ≡ w^{r̄,n}_{i,k}. The local solution update is performed by using (25) and (33), and the global solution in Ω × ∆ is obtained by patching the local solutions together.
B_i and B_ij are the restrictions of the covariance matrix B, respectively, to the subdomain Ω_i and to the interface Γ_ij defined in (17), while G_{i,k} and R_{i,k} are the restrictions of G_k := G_{s_k} and of R_k := diag(R_0, R_1, . . . , R_{s_k}) to Ω_i and to Γ_ij, for i = 1, 2, . . . , N_sub and j ∈ J_i. In [11,12], the authors proved that the minimum of J can be obtained by patching together local solutions obtained as the minima of the local functionals J_{i,k}. In this way, the global minimum can be searched for among the minima of the local functionals.
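The Schwarz exchange described above can be sketched on a small linear system. This is an illustrative toy, not the paper's solver: for clarity, the subdomains are visited sequentially (alternating form) and the local systems are solved directly rather than by CG; in the additive variant used in the paper, all local solves run concurrently within each sweep.

```python
import numpy as np

def schwarz_sweep_solve(A, f, blocks, sweeps=200):
    """Overlapping Schwarz sketch for A w = f: each subdomain repeatedly
    solves its local block, taking the current values of the adjacent
    subdomains as boundary data (the r.h.s. exchange of the iteration)."""
    w = np.zeros_like(f)
    for _ in range(sweeps):
        for idx in blocks:
            Aii = A[np.ix_(idx, idx)]
            # local r.h.s.: f_i minus the coupling to nodes outside Omega_i
            r = f[idx] - A[idx, :] @ w + Aii @ w[idx]
            w[idx] = np.linalg.solve(Aii, r)
    return w
```

With overlapping index blocks, the iteration converges for symmetric positive definite, diagonally dominant matrices such as the 1D discrete Laplacian; a wider overlap δ speeds up convergence.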

Sensitivity Analysis
As already said, the core of the DD-DA approach is that the DA model acts as a coarse/predictor operator to solve the local PDE model by providing the background values as initial conditions of the local PDE models. We therefore analyze the propagation of the errors with respect to the time direction. In the following, we use ‖·‖ = ‖·‖_2; we refer to ‖z_{i,k}‖_2 = ‖{z_{i,k}(ī, k̄)}_{ī∈I_i, k̄∈K_k}‖_2, where z ∈ R^{N_p × N}, z_{i,k} := z/(Ω_i × ∆_k), and I_i and K_k are defined in (19) and (23), respectively.
Lemma 1 ([20]). Let R > 0 and T ≥ 0 be two positive constants, and let E_k be a sequence satisfying the stated recursive bound for k = 1, . . . , N_t.
Definition 5. Let u^{n+1}_{i,k} and ũ^{n+1}_{i,k} be, respectively, the numerical solution at step (n + 1) in (38) and the corresponding floating-point representation. For fixed n, for i = 1, . . . , N_sub, k = 1, . . . , N_t, we define the round-off error in Ω_i × ∆_k as their difference.
From (25), the numerical solution at step (n + 1) can be written accordingly; consequently, the corresponding floating-point representation involves the local round-off error. For fixed n + 1, and following [16], we obtain the bound (43). The upper bound in (43) is made of three terms: the first represents the propagation of the error introduced on the first time interval, the second represents the propagation of the error introduced at the previous step and the last depends on the local round-off error. In particular, we note that, as expected, the round-off error propagation grows with N_t, i.e., the number of subdomains of ∆.
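The qualitative behavior of the bound (43) can be observed in a toy experiment: the same linear step is applied in float64 (reference) and float32 (perturbed), and the accumulated gap is recorded at the end of each of N_t subintervals of the time window. The matrix, sizes and step counts below are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Round-off propagation demo: reference trajectory in float64,
# perturbed trajectory in float32, gap recorded per subinterval.
M = np.array([[1.0005, 0.01],
              [-0.01, 1.0005]])        # hypothetical, mildly amplifying step
u64 = np.array([0.3, -0.7])
u32 = u64.astype(np.float32)           # representation error enters here
M32 = M.astype(np.float32)

gaps = []
for k in range(8):                     # N_t = 8 subintervals
    for _ in range(50):                # 50 time steps per subinterval
        u64 = M @ u64
        u32 = M32 @ u32                # rounding at every step
    gaps.append(float(np.linalg.norm(u64 - u32.astype(np.float64))))
```

The recorded gaps stay tiny but nonzero, and they tend to accumulate across subintervals, consistent with the growth in N_t noted above.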

Convergence, Consistency and Stability of the DD-DA Method
In [12], we proved the convergence of the outer loop, i.e., the convergence of u^{DA,n}, which is defined in (39). The convergence of the ASM is proved in [19].

Consistency
We analyze the consistency in terms of the local truncation errors E^{M_{i,k}}_{i,k}, E^{ASM}_{i,k}, E_{i,k} and E_glob, which are reported in Figure 4.
Similar to [21], the local truncation errors E^{M_{i,k}}_{i,k}, E^{ASM}_{i,k}, E_{i,k} in Ω_i × ∆_k and E_glob in Ω × ∆ are defined as the remainders after the solution u^M of the model (1) and the solution u^DA of the 4D-DA problem (8) are substituted into the discrete models. To this aim, we give the following definitions.
Definition 7 (Local truncation errors in Ω_i × ∆_k). ∀i = 1, . . . , N_sub and k = 1, . . . , N_t, at iteration n̄, we define the local truncation error of M_{i,k} restricted to Ω_i × ∆_k; the local truncation error of the ASM restricted to Ω_i × ∆_k; the local truncation error of DD-DA restricted to Ω_i × ∆_k; and the local truncation error of DD-DA in Ω × ∆.
From (50), the approximation in Ω × ∆ defined in (40) becomes (51). For i = 1, . . . , N_sub and k = 1, . . . , N_t, we pose (52). Consequently, from (52) we pose (53) as the local model truncation error in Ω_i × ∆_k; (54) as the local ASM truncation error in Ω_i × ∆_k in (47); and (55) as the local truncation error in Ω_i × ∆_k in (48). We introduce the definition of consistency of the DD-DA method. We pose ‖·‖ = ‖·‖_2. (We refer to ‖z_{i,k}‖_2 = ‖{z_{i,k}(ī, k̄)}_{ī∈I_i, k̄∈K_k}‖_2, where z ∈ R^{N_p × N} and z_{i,k} := z/(Ω_i × ∆_k), and I_i and K_k are defined in (19) and (23), respectively.) In order to discretize the Shallow Water Equations (SWEs) model, we consider the Lax-Wendroff scheme [22]; hence, in that case, p = q = 2.
Lemma 2 (Local truncation error of the ASM in Ω_i × ∆_k). Let us consider the following quantities: B_i = V_i V_i^T, the restriction to Ω_i of the covariance matrix of the error on the background; G_{i,k}, the restriction to Ω_i × ∆_k of the matrix G defined in (13); ad_i, the number of subdomains adjacent to Ω_i, defined in (15); B_ij = V_ij V_ij^T, the restriction to Ω_ij of the covariance matrix of the error on the background; and µ(V_i), µ(G_{i,k}), µ(M_{i,k}) and µ(V_ij), the condition numbers of V_i, G_{i,k}, M_{i,k} and V_ij, respectively. Then, ∀i = 1, . . . , N_sub and k = 1, . . . , N_t, the bound (57) holds.
Proof. As in [16], the quantity under consideration is the error in Ω_i × ∆_1. As proved in [16], the bound (59) holds, where J_{i,k} and A_{i,k} are, respectively, defined in (30) and (35); by using the triangle inequality, we obtain (61). From (59) and (61), the bound (57) follows.

Theorem 1.
(Local truncation error in Ω_i × ∆_k). ∀i = 1, . . . , N_sub; k = 1, . . . , N_t, the following bound on E_{i,k} holds.
Proof. From (52) and (38), E_{i,k} defined in (55) can be rewritten accordingly; by using the triangle inequality and, as a consequence of Lemma 2 and (54), we obtain a bound in which E^{ASM}_{i,1} is defined in (54) and µ^{DD-DA}_{i,k} is defined in (58). In particular, by adding and subtracting suitable terms and by using the triangle inequality again, we obtain (67). In particular, equation (67) holds for n = n̄ and m = n̄ − 1, assuming that n̄ is large enough. Consequently, we can neglect the dependency on the outer loop.
Lemma 3. Under the assumption (69), the bound (70) holds, where E_{i,k} is the local truncation error defined in (55) and c is a positive constant independent of the DD.
Proof. By applying Theorem 1 to E^{ASM}_{i,1} on Ω_i × ∆_1, we obtain (71), where F_{i,1} is defined in (58) and e_0/Ω_i is the restriction of e_0 to Ω_i. Since e_0/Ω_i = 0, by replacing it in (71), we obtain (73). From (73), (56) and (63), we obtain the thesis in (70).

Theorem 2 (Truncation error in Ω × ∆).
Under the assumption of Lemma 3 in (69), the truncation error in Ω × ∆ satisfies the bound (74), where c > 0 is a positive constant independent of the DD.
Proof. From (51), it follows that E_glob, which is defined in (49), can be rewritten accordingly; by applying the restriction and extension operators (Definitions 3 and 4) to u^DA and by using the triangle inequality, we obtain a bound in terms of E_{i,k}, which is defined in (55). From Lemma 3, the corresponding estimate follows. Consequently, by defining ∆x := max_{i=1,...,N_sub}(∆x)_i and ∆t := max_{k=1,...,N_t}(∆t)_k, we obtain the final bound. Hence, (74) is proved.

Stability
Now, we prove the stability of the method with respect to the time direction, assuming that the predictive model is stable. We perform SA by obtaining worst-case error bounds with the aid of the condition number.
We assume that the discrete scheme applied to the model M in (1) is stable, i.e., ∃D > 0 such that the stability bound holds, where u^M is the computed solution of M in (11) and v^M is the solution of M̃, which is obtained by adding the error e_0 to the initial condition of M. For simplicity of notation, in the sequel we omit any subscripts of M.
Definition 9 (Propagation error from ∆_{k-1} to ∆_k). Let ṽ^{DD-DA} be the solution in Ω × ∆ computed by adding the perturbation ē_k to the initial condition of P^{M_{i,k},n}_{i,k}, defined in (25). We define Ē_k as the propagation error from ∆_{k-1} to ∆_k.

Theorem 3 (Stability).
If the error on the initial condition of M in (11) is equal to zero, then the propagation error satisfies the bound (90), where C_k is a constant depending on the model and on the ASM, and ē_k is the perturbation on the initial condition of P^{M_{i,k}}_{i,k}, defined in (25).

Proof.
To simplify the notation in the proof, we consider ē_k = ē, ∀k = 1, . . . , N_t. From (38), (52) and the triangle inequality, we obtain a first bound. From (67) and (54), we can neglect the dependency on n̄. From the assumption on the model, we may say that ∃D > 0 such that the stability bound holds, where e_0 is the error on the initial condition of M in (11). By adding and subtracting suitable terms, from (83) and (57), we obtain a bound involving the quantity µ^{DD-DA}_k. From (81), (82) and (84), together with the hypothesis in (79), we find that, for k = 1, . . . , N_t, ∃ C_k > 0 such that (90) holds. The thesis is proved.
From Theorem 3, we obtain the stability of DD-DA.
Remark. From Lemma 2, it follows that the quantity µ^{DD-DA}_{i,k} can be regarded as the condition number of the local problems restricted to the space-time directions. Further, in Theorem 3, we study the propagation error along the time direction according to forward error analysis. As a consequence, we may say that µ^{DD-DA}_k can be regarded as the condition number of the local problems restricted to the time direction. In [16], the authors applied SA to the reduced DA functional obtained by applying domain decomposition across space. The results in [16] proved that Tikhonov regularization is more appropriate than the truncation of EOFs for improving the conditioning of the covariance matrix. The results obtained in the present study complement the study in [16].
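A toy numerical illustration of why decomposition can reduce the condition number of the local problems (the claim made in the Introduction) compares the 2-norm condition number of a global operator with that of one local restriction. The 1D discrete Laplacian below is our stand-in for a Hessian-like operator, not the paper's matrix; the sizes mimic N_sub = 4 subdomains.

```python
import numpy as np

def lap1d(n):
    """1D discrete Laplacian with Dirichlet boundary conditions,
    used as a stand-in for a local Hessian-like operator."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Condition number of the global operator vs. one of 4 local ones.
cond_global = np.linalg.cond(lap1d(64))   # grows like O(n^2)
cond_local = np.linalg.cond(lap1d(16))    # much smaller local problem
```

Since the condition number of this operator grows quadratically with the number of nodes, each local problem is substantially better conditioned than the global one, in the same spirit as the interpretation of µ^{DD-DA}_{i,k} above.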

Validation Analysis
The validation is performed by mapping the space-time domain (Figure 5) onto the high-performance hybrid computing architecture of the SCoPE (Sistema Cooperativo Per Elaborazioni scientifiche multidisciplinari) data center, located at the University of Naples Federico II. Specifically, the architecture is composed of eight nodes consisting of distributed-memory DELL M600 blades. The blades are connected by 10 Gigabit Ethernet, and each is composed of 2 Intel Xeon@2.33 GHz quad-core processors sharing the same local 16 GB RAM, for 8 cores per blade and 64 cores in total. The experimental results allow us to verify that the experimental order of consistency corresponds to the theoretical one obtained in Theorem 2 and that the local problems are well-conditioned. We consider the following experimental setup:
• H_l ∈ R^{n_obs × N_p}: piecewise linear interpolation operator whose coefficients are computed using the nodes of Ω nearest to the observation values;
• G ∈ R^{N × n_obs × N_p}: obtained as in (13).
Here, u^DA denotes the minimum of the 4DVAR (global) functional J in (9), while ũ^{DD-DA} is obtained by gathering all the minima of the local 4DVAR functionals J_{i,k} in (30), considering different values of δ defined in (20). u^DA ∈ R^{N_p × N} is computed by running the DD-DA algorithm for N_sub = 1, while ũ^{DD-DA} ∈ R^{N_p × N} is computed by gathering the local solutions obtained by running the DD-DA algorithm for different values of N_sub > 1 and with δ ≥ 0, as shown in Figure 6.
Figure 6. Decomposition of the spatial domain Ω ⊂ R into two subdomains {Ω_i}_{i=1,2}, identifying the overlap region Ω_12 defined in (16) and the interfaces Γ_12 and Γ_21 defined in (17). On the left, δ = 0, i.e., no inner nodes in Ω_12; on the right, δ = 2, i.e., two inner nodes in the overlap region Ω_12.
In the following, we present the experimental results of the consistency and stability analysis by considering the initial-boundary value problem for the Shallow Water Equations (SWEs) in 1D. The discrete model is obtained using the Lax-Wendroff scheme [22] on Ω × ∆, where the orders of convergence in space and time are both equal to 2.
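The Lax-Wendroff update can be sketched on the scalar prototype of the SWEs, the linear advection equation; this is an illustrative simplification, not the full SWE system used in the experiments.

```python
import numpy as np

def lax_wendroff_step(u, c):
    """One Lax-Wendroff step for u_t + a u_x = 0 on a periodic grid,
    with Courant number c = a*dt/dx (|c| <= 1 for stability). The
    scheme is second order in both space and time, matching p = q = 2."""
    up = np.roll(u, -1)    # u_{j+1}
    um = np.roll(u, 1)     # u_{j-1}
    return u - 0.5 * c * (up - um) + 0.5 * c * c * (up - 2.0 * u + um)
```

Advecting a sine wave over one full period of the periodic domain recovers the initial profile up to the scheme's second-order dispersion error, which is how an experimental order of convergence can be measured.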
• Consistency. As shown in Table 1 and Figure 7, the experimental order of consistency corresponds to the theoretical one obtained in Theorem 2.
• Stability. In Table 2 and Figure 8, we report the values of Ē_k for different values of the perturbation ē_k on the initial condition of P^{M_{i,k}}_{i,k}, defined in (25). We may then estimate C_k in (90). In particular, we found that C_k ≈ 2.00 × 10^1, ∀k = 1, . . . , N_t.
Consequently, the local problems for the initial-boundary value problem of the 1D SWEs are well-conditioned with respect to the time direction.
Table 1. We fix N_p = 640, the number of inner nodes in Ω; N = 9, the number of time values in ∆; N_sub = 4, the number of spatial subdomains; and N_t = 4 time intervals. We report the values of e_p, defined in (92), for different values of ∆x and ∆t, the spatial and temporal step sizes of M_{i,k} defined in (26).
Table 2.
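The stability experiment of Table 2 can be reproduced in spirit with a small perturbation study. The function below is a hypothetical sketch (the name `estimate_stability` and the contractive test model are ours, not the SWE code of the paper): it perturbs the initial condition by a vector of norm eps and records the error amplification at each step, whose maximum gives a rough experimental estimate of the stability constant C_k.

```python
import numpy as np

def estimate_stability(step, u0, n_steps, eps=1e-6):
    """Perturb the initial condition by a vector of norm eps, propagate
    both trajectories with the same step function, and record the
    amplification ||u_k - v_k|| / eps at each step. Bounded ratios
    indicate stability in the time direction."""
    d = np.full_like(u0, 1.0)
    d *= eps / np.linalg.norm(d)      # perturbation of norm eps
    u, v = u0.copy(), u0 + d
    ratios = []
    for _ in range(n_steps):
        u, v = step(u), step(v)
        ratios.append(np.linalg.norm(u - v) / eps)
    return ratios
```

For a stable (here, contractive) model the ratios stay bounded, which is the experimental analogue of the bound (90).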

Conclusions
This work concerns sensitivity analysis for large-scale DA problems that require parallel solution. We believe that the whole analysis of uncertainties becomes crucial for the emerging approaches integrating DA with deep learning. We derived and discussed the main sources of errors of the parallel DD-DA framework. We proved that the order of consistency depends on the order of the local models and introduced the condition number of the local problems. As the core of such a parallel approach is that the DA model acts as a coarse/predictor operator solving the local PDE model, we analyzed the error propagation with respect to the time direction. The validation analysis confirms that the experimental order of consistency corresponds to the theoretical one. Finally, we note that the SA results in [16] proved that Tikhonov regularization is more appropriate than the truncation of EOFs for improving the conditioning of the covariance matrix. Our results complement the study in [16], and we may conclude that the same findings hold true for DD-DA, too.