A parallel space-time domain decomposition method for unsteady source inversion problems

In this paper, we propose a parallel space-time domain decomposition method for solving an unsteady source identification problem governed by the linear convection-diffusion equation. Traditional approaches require repeatedly solving a forward parabolic system, an adjoint system and a system for the unknowns, and the three systems have to be solved one after another. These sequential steps are not desirable for large scale parallel computing. We propose a space-time restricted additive Schwarz method for a fully implicit space-time coupled discretization scheme to recover the time-dependent pollutant source intensity functions. We show with numerical experiments that the scheme works well in the presence of noise in the observation data. More importantly, we demonstrate that the parallel space-time Schwarz preconditioner is scalable on a supercomputer with over $10^3$ processors, and is thus promising for large scale applications.


Introduction
Pollutant source inversion problems have wide applications in, for example, the detection and monitoring of indoor and outdoor air pollution, underground water pollution, etc. In the last several decades, physical, chemical and biological technologies have been developed to identify different types of sources [3,45,46]. In this paper, assuming the pollutant concentration is measured by distributed sensors, we reconstruct the source intensities numerically using noise-contaminated data. Like all inverse problems, such a reconstruction problem is ill-posed in the sense of Hadamard [14,34,41]. The lack of stability with respect to the measurement data is a major issue: small noise in the data may lead to significant changes in the reconstructed source strength. This problem has attracted much attention, and various methods have been developed, including both deterministic and statistical methods [27,37]. Among the deterministic methods, quasi-explicit reconstruction formulas are available for one-dimensional source location recovery problems [19,20]; quasi-reversibility methods can be used to retrace the pollutant history as in [36]; and optimization based methods are also widely used [2,22,23,35,40,42]. By reformulating an inverse problem into an output least-squares PDE-constrained optimization problem complemented with Tikhonov regularization, classical optimization methods such as regression methods [17], linear and nonlinear programming methods [17], linear and nonlinear conjugate gradient methods [1,40], Newton-type methods, etc., can be used to obtain approximate solutions. These methods can be categorized as sequential quadratic programming (SQP) methods. Reduced space SQP methods decouple the system and iteratively update the state variable, the adjoint variable and the optimization variables by solving each subsystem in a sequential order. In some sense this is a block Gauss-Seidel iteration with three large blocks.
Such methods require less memory due to the reduced subproblem size, but the number of outer iterations needed for a specified accuracy grows quickly with the number of optimization variables; they are therefore not ideal for supercomputers with a large number of processors. We introduce in this paper a full space approach that does not have the three large sequential steps of the reduced space approaches. Similar approaches have been applied to flow control problems in [31]. The full space method solves for the state variable, the adjoint variable and the optimization variables simultaneously, and thus avoids repeatedly solving the subsystems. However, the fully coupled system is several times larger in size and more ill-conditioned, so direct methods such as Gaussian elimination or LU factorization, as well as classical iterative methods such as the Jacobi and Gauss-Seidel methods, are not suitable. To ease the difficulty of solving the large system, a preconditioned Krylov subspace technique is employed to reduce the condition number and the computing time significantly [9,44].
The inverse problem of recovering the pollutant source intensity functions can be reformulated into a PDE-constrained optimization problem. In this paper, we derive its continuous Karush-Kuhn-Tucker (KKT) system [24], including the state equation, the adjoint equation and other derivative equations with respect to each unknown source intensity.
The problem poses two main challenges. Firstly, the adjoint equation needs the final state of the pollutant concentration, which implies that the state equation and the adjoint equation have to be solved in a sequential order; secondly, the time marching of the unsteady problem is directional and sequential, and thus difficult to break into parallel steps. For unsteady PDE-constrained optimization problems, a steady state optimization subproblem can be solved at each time step [44], and in [18] a block time-marching method is used to reduce the number of sequential steps and increase the degree of parallelism.
In this paper, we propose a fully coupled space-time domain decomposition method that couples the time with the space domain and decomposes the "space-time" domain into sub-domains, then apply an additive Schwarz preconditioned Krylov subspace technique to solve the "space-time" problem. Our algorithm is fully parallel in space and time, avoids the sequential time marching steps, and does not need to solve optimization subproblems. As far as we know, no published work has achieved such a degree of parallelism for time-dependent inverse problems.
The rest of this paper is arranged as follows. The mathematical model and its corresponding optimization functional, and the derivation of the KKT system are formulated in Section 2. The discretization of the KKT system is given in Section 3. The parallel algorithm for solving the KKT system is proposed in Section 4. Some numerical experiments are shown in Section 5 and concluding remarks are given in Section 6.

Model formulation
We consider a flow domain Ω ⊂ R² in which several point pollutant sources are present.
The distribution of the pollutant concentration is denoted by C(x, t) at location x and time t. The transport process is modeled by the following convection-diffusion equation [3,32]:
$$\frac{\partial C}{\partial t} - \nabla\cdot(a\nabla C) + \nabla\cdot(v C) = \sum_{i=1}^{s} f_i(t)\,\delta(x - x_i^*) \quad \text{in } \Omega\times(0,T), \qquad (1)$$
where f_i(t) is the temporal intensity of the i-th source at location x_i^*, i = 1, ..., s, s is the number of sources, and a(x) and v(x) are the diffusive and convective coefficients; δ(·) is the Dirac delta distribution [5]. The model is complemented by the following boundary and initial conditions:
$$C = p(x,t) \ \text{on } \Gamma_1\times(0,T), \qquad a\frac{\partial C}{\partial n} = q(x,t) \ \text{on } \Gamma_2\times(0,T), \qquad (2)$$
$$C(x,0) = C_0(x) \quad \text{in } \Omega, \qquad (3)$$
where Γ_1 and Γ_2 cover the physical boundary ∂Ω = Γ_1 ∪ Γ_2, and p(x, t) and q(x, t) are given functions for the Dirichlet and Neumann boundary conditions respectively. If the source locations x_i^* (i = 1, ..., s) and the corresponding time-dependent intensities f_i(t) (i = 1, ..., s) in (1) are all known, then the distribution of the pollutant concentration C(x, t) can be obtained by solving the convection-diffusion equation (1)-(3). This is usually called a forward or direct problem. In this paper, we are concerned with the inverse problem, that is, using the noise-contaminated data C^ε(x, t) (ε is the noise level) of the concentration C(x, t) in Ω at the terminal time T to recover the source intensity functions f_i(t) (i = 1, ..., s). In practice, the data C^ε(x, t) is measured by a sensor network placed at some discrete points inside the domain Ω [26,30]. A discussion of sensor networks can be found in [26]; here we shall assume that the measurement data is available at a set of uniformly distributed sensors inside Ω.
The Tikhonov optimization approach is popular for time-dependent parameter identification problems [22,23]. Its main ingredient is to reformulate the reconstruction process as the minimization of the following functional:
$$J(f) = \frac{1}{2}\int_\Omega \big|C(x,T;f) - C^\varepsilon(x,T)\big|^2\,dx + N_\beta(f), \qquad (4)$$
where f = (f_1, f_2, ..., f_s)^T and N_β(f) denotes a Tikhonov regularization term. Possible choices for the regularization include L², H¹ and BV regularizations. Here we consider a combination of the L² and H¹ regularizations in the following form:
$$N_\beta(f) = \sum_{i=1}^{s}\Big(\frac{\beta_i^1}{2}\,\|f_i\|_{L^2(0,T)}^2 + \frac{\beta_i^2}{2}\,\|f_i'\|_{L^2(0,T)}^2\Big), \qquad (5)$$
where β_i^1, β_i^2, i = 1, ..., s, are the L² and H¹ regularization parameters for the source intensities f_1(t), ..., f_s(t) respectively. The minimization of the functional (4) is subject to the constraints that C(x, t) satisfies the state equation (1) with the boundary conditions (2) and the initial condition (3). This transforms the original inverse source problem into a PDE-constrained optimization problem. Two kinds of approaches to the optimization problem (4) are available, the discretize-then-optimize approach and the optimize-then-discretize approach. The solutions from both approaches are credible, although they are not necessarily the same [31]. We shall use the optimize-then-discretize approach in this work.
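As an illustration of the regularization term (5), a discrete version can be evaluated from samples of the intensities on a uniform time grid. The sketch below uses trapezoidal quadrature and forward differences; the function names and the quadrature choices are ours, not the paper's:

```python
import numpy as np

def tikhonov_term(f, tau, beta1, beta2):
    """Discrete L2 + H1 regularization for one intensity f sampled at
    uniform time steps of size tau (a sketch; quadrature choices vary)."""
    # trapezoidal rule for ||f||_{L2(0,T)}^2
    l2 = tau * (np.sum(f**2) - 0.5 * (f[0]**2 + f[-1]**2))
    df = np.diff(f) / tau                 # forward differences ~ f'
    h1 = tau * np.sum(df**2)              # midpoint rule for ||f'||^2
    return beta1 * l2 + beta2 * h1

def N_beta(fs, tau, betas1, betas2):
    # Sum of the contributions of the s source intensities
    return sum(tikhonov_term(f, b1, b2) if False else tikhonov_term(f, tau, b1, b2)
               for f, b1, b2 in zip(fs, betas1, betas2))
```

For a constant intensity the H¹ part vanishes and the L² part reduces to the exact integral, which gives a quick sanity check of the quadrature.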
Taking the variations of (6) with respect to G, C and f_i, i = 1, ..., s, a system of partial differential equations is derived to characterize the first-order optimality conditions for the optimization problem (6). These are the so-called Karush-Kuhn-Tucker (KKT) optimality conditions [24]. It was verified in [11] that the minimization problem (4) is equivalent to solving the KKT system [24] of the Lagrangian functional J(C, f, G). The three sets of equations in the KKT system are obtained as follows: (a) The Gâteaux derivative of J with respect to G at direction v recovers the weak form of the state equation. (b) The Gâteaux derivative of J in (6) with respect to C at direction w ∈ W^{1,q}(Ω) is obtained by integrating the second term of (7) by parts. Then, applying the boundary and terminal conditions of w, i.e. w = 0 on Γ_1, and noting the arbitrariness of w, we can deduce the adjoint system for the Lagrange multiplier G, namely G(x, T) = 0 for x ∈ Ω, G(x, t) = 0 on Γ_1, and G(x, t) satisfies
$$-(G_t, w) + (a\nabla G, \nabla w) + (\nabla\cdot(v w), G) = -\big(\delta(t-T)(C(\cdot,t) - C^\varepsilon(\cdot,t)),\, w\big) \qquad (8)$$
for all w ∈ W^{1,q}(Ω) such that w = 0 on Γ_1, where δ(t − T) is the Dirac delta distribution at t = T.
(c) The Gâteaux derivative of J in (6) with respect to f_i at direction g ∈ H¹(0, T) yields the derivative equations for the source intensities. Putting (a)-(c) together, the KKT system (11) is formulated as the coupled system consisting of the state equation, the adjoint equation and the derivative equations with respect to the f_i, holding for any v ∈ W^{1,p}(Ω) and w ∈ W^{1,q}(Ω), with C(x, 0) = C_0(x) and G(x, T) = 0. The rest of the paper is devoted to solving (11) as a coupled space-time system.

Finite element discretization
Let T_h be a triangulation of Ω with triangular elements; we define V_h as the finite element space [12] consisting of continuous piecewise linear functions on T_h, and V̊_h as the subspace of V_h with functions vanishing on the Dirichlet boundary Γ_1. To fully discretize the system (11), we partition the time interval [0, T] as 0 = t_0 < t_1 < ... < t_M = T, with t_n = nτ, τ = T/M. Define U_τ as a piecewise linear continuous finite element space in time. For a given sequence {H^n(x) = H(x, t_n)}, we define the difference quotient and the averaging function respectively by
$$\partial H^n = \frac{H^n - H^{n-1}}{\tau}, \qquad \bar{H}^n = \frac{H^n + H^{n-1}}{2}.$$
Let π_h be the finite element interpolation associated with the space V_h, and C^n_h(x) be the finite element approximation of C(x, t_n). We discretize the state and adjoint equations of the system (11) by the Crank-Nicolson scheme in time and piecewise linear finite elements in space, and use piecewise linear finite elements in time to discretize the equations with respect to f. The finite element approximation of the KKT system (11) is then formulated with χ^n = 1/2 when n = M and χ^n = 0 when 1 ≤ n < M. We denote the basis functions of the finite element spaces V_h and U_τ by φ_i, i = 1, ..., N, and g_n, n = 0, ..., M, respectively, and introduce the corresponding stiffness, mass and source matrices and vectors; the matrix form (13) of the KKT system is then obtained using these notations. We can follow the approaches in [21,22,23,42] to obtain the convergence of the discretized problem (13) to the continuous optimization problem (6).
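The difference quotient and the averaging function underlying the Crank-Nicolson scheme are simple to state in code. The sketch below (function names are ours) applies them to samples H^n = H(t_n) of a scalar function:

```python
import numpy as np

def dq(H, n, tau):
    """Backward difference quotient: (H^n - H^{n-1}) / tau."""
    return (H[n] - H[n - 1]) / tau

def avg(H, n):
    """Time average used by Crank-Nicolson: (H^n + H^{n-1}) / 2."""
    return 0.5 * (H[n] + H[n - 1])
```

For u(t) = t² the backward difference quotient at step n equals u'(t_{n-1/2}) exactly, reflecting the second-order accuracy of the scheme at the time-interval midpoint.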
A space-time domain decomposition method for the KKT system

Fully coupled KKT system with special ordering of unknowns

The ordering of the unknowns of the discretized KKT system (13) has a significant influence on the convergence and computing efficiency of the iterative solver. Traditional reduced space SQP methods split the system into three subsystems and solve the subsystems for C, G, and f one by one in sequential steps [13]; in this case the unknowns are ordered physical variable by physical variable. To develop a scalable and fully coupled method for solving the KKT system, we use the so-called fully coupled ordering: the unknowns C and G are ordered mesh point by mesh point and time step by time step. At each mesh point x_j, j = 1, ..., N, and time step t_n, n = 0, ..., M, the unknowns are ordered in the order C^n_j, G^n_j, j = 1, ..., N, n = 0, ..., M. Such an ordering gathers the unknowns of the same space-time subdomain in a subblock, so preconditioners such as additive Schwarz can be applied naturally to each subblock of the fully coupled KKT system; the ordering also improves the cache performance of LU factorization based solvers. Since f is defined only in the time dimension, we put all the unknowns of f at the end, after C and G. More precisely, we define the solution vector U accordingly; then the system (13), with unknowns C^n_j and G^n_j, j = 1, ..., N, n = 0, ..., M, and f^n_k, k = 1, ..., s, n = 0, ..., M, is reformulated into the following linear system:
$$F\,U = b, \qquad (15)$$
where F is a sparse matrix of size (M + 1)(2N + s) × (M + 1)(2N + s) derived from the finite element discretization of the KKT system, with a block structure in which the right-hand side b is formed correspondingly. In the matrix F, the block matrices S_ij, 0 ≤ i, j ≤ M, are of size 2N × 2N and are zero matrices except the ones in tridiagonal stripes.
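The fully coupled ordering can be described by an explicit index map. The sketch below uses 0-based indices; the nesting order (time step outermost, then mesh point, with C and G interleaved) is our reading of the text, and the grouping of the f unknowns source by source is our assumption, as the paper only states that they come last:

```python
def kkt_index(n, j, comp, N):
    """Position of C^n_j (comp=0) or G^n_j (comp=1) in the solution
    vector U: time step n in 0..M outermost, mesh point j in 0..N-1
    next, with the C/G pair interleaved at each space-time point."""
    return 2 * (n * N + j) + comp

def f_index(k, n, N, M, s):
    """Position of the intensity unknown f^n_k, k in 0..s-1, placed
    after all C/G unknowns (per-source grouping is an assumption)."""
    return 2 * N * (M + 1) + k * (M + 1) + n
```

With N mesh points, M + 1 time levels and s sources, the last index is (M + 1)(2N + s) − 1, matching the size of F stated above.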

Space-time Schwarz preconditioners
The KKT system (15) is usually large in size and severely ill-conditioned. In traditional reduced space SQP methods, the subsystems corresponding to the unknowns C and G are time-dependent, and time-marching algorithms starting from the initial or terminal moment are applied. But we notice that the adjoint equation needs the concentration distribution C at the terminal time t = T, which means that the state equation and the adjoint equation have to be solved in a sequential order. In addition, for time-dependent inverse problems, sequential steps within reduced space SQP methods exist both between the KKT subsystems and within the time marching, which makes efficient parallelization quite challenging.
To overcome the lack of parallelism in SQP methods, we propose to solve the fully coupled system (15) all at once. This is a very large system, and the all-at-once method is traditionally regarded as very expensive and unsuitable for small computers.
But on high-performance computers, especially on the upcoming exascale computers, we believe this approach is more attractive than the reduced space methods. It is well known that a direct solver such as Gaussian elimination or LU factorization is not suitable for very large problems due to the lack of parallel scalability. We shall use a preconditioned Krylov subspace method, where a preconditioning technique is introduced to reduce the condition number of the KKT system and accelerate the convergence of the Krylov subspace method. Various preconditioners have been developed and applied to elliptic and parabolic systems, such as the (block) Jacobi method, (incomplete) LU factorization, the (multiplicative or) additive Schwarz method, multigrid methods, multilevel methods, etc. [7,8,9]. Among these preconditioners, the Schwarz type domain decomposition method is shown to have an excellent preconditioning effect and parallel scalability [9,31].
We shall propose a "space-time" Schwarz type preconditioner for the unsteady inverse problem. A well-known approach to parallelization in the time dimension is the parareal algorithm, introduced by Lions et al. in [25]. The parareal algorithm is an iterative method which involves a coarse (coarse grid in the time dimension) solver for prediction and a fine (fine grid in the time dimension) solver for correction. Insights into the stability and convergence of the parareal algorithm were given in [16,38]. The parareal algorithm has been applied to problems in molecular dynamics [4], fluid and structural mechanics [15], quantum control [28], etc. However, in implementations of the parareal algorithm, the scalability is determined largely by the coarse time step and the space discretization scheme. In [29], the parareal algorithm was combined with domain decomposition in space to achieve a higher degree of parallelization. Different from the parareal algorithm, the new "space-time" Schwarz type preconditioner treats the time variable and the space variables equally, so the physical domain is a "space-time" domain instead of the conventional space domain.
We apply a domain decomposition technique to the coupled "space-time" domain.
In each "space-time" subdomain, a time-dependent subproblem with vanishing space boundary conditions and vanishing data at "artificial" initial and terminal time is solved.
As for the global problem, no time-marching is performed in each subproblem; all unknowns associated with the same space-time subdomain are solved simultaneously. The proposed "space-time" Schwarz preconditioner eliminates all sequential steps, and all unknowns are treated at the same level of priority. We use a right-preconditioned restarted GMRES to solve the system (15). To define the preconditioner, we partition the space domain Ω into non-overlapping subdomains Ω_i, i = 1, ..., N_1, and the time interval [0, T] into subintervals T_j = [t_{j−1}, t_j], j = 1, 2, ..., N_2, with 0 = t_0 < t_1 < ... < t_{N_2} = T. We remark that the time partition here is coarser than that used in the full finite element discretization described in Section 3, and each interval T_j contains a few consecutive time intervals [t_k, t_{k+1}]. The decomposition Θ consists of Θ_ij = Ω_i × T_j, i = 1, ..., N_1, j = 1, ..., N_2. In order to obtain an overlapping decomposition of Θ, we extend each subdomain Ω_i to a larger region Ω′_i and each subinterval T_j to a longer interval T′_j, satisfying Ω_i ⊂ Ω′_i and T_j ⊂ T′_j, and set Θ′_ij = Ω′_i × T′_j. The sizes of the Θ′_ij are chosen so that the overlap is as uniform as possible around the perimeter of the interior subdomains Θ′_ij ⊂ Θ. For boundary subdomains we neglect the part outside of Θ. See Figure 1 for an illustration of the space-time domain decomposition.
We denote the size of the KKT matrix F by Ñ × Ñ; there are two degrees of freedom at each space-time mesh point, corresponding to the state variable C and the adjoint variable G. The unknown time-dependent source intensity variables are allocated on the same processor as the last space-time subdomain Θ′_{N_1,N_2}. On each extended subdomain Θ′_ij, we define the Ñ_ij × Ñ matrix R^δ_ij: its 2 × 2 block element (R^δ_ij)_{l_1,l_2} is an identity block if the integer indices l_1 and l_2 are related to the same mesh point and time step and they belong to Θ′_ij, and a zero block otherwise. The multiplication of R^δ_ij with an Ñ × 1 vector generates a shorter vector by keeping all components corresponding to the subdomain Θ′_ij. R^0_ij is defined similarly to R^δ_ij, with the difference that its application to an Ñ × 1 vector excludes the mesh points in Θ′_ij \ Θ_ij. For each space-time subdomain we then define the local problem (16). It is complemented by the boundary conditions (17), along with the "initial" and "terminal" time boundary conditions
C(x, t_{j−1}) = 0, G(x, t_{j−1}) = 0, C(x, t_j) = 0, G(x, t_j) = 0.
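The restriction operators act as selection matrices. Below is a minimal sketch that treats a subdomain simply as an ordered list of global indices and ignores the 2 × 2 block structure for brevity; passing the extended index set gives R^δ_ij, passing the non-overlapping set gives R^0_ij:

```python
import numpy as np
import scipy.sparse as sp

def restriction(subdomain, N_tilde):
    """Sparse 0/1 matrix selecting the global indices listed in
    `subdomain` out of N_tilde global unknowns. Its transpose is the
    corresponding extension (scatter) operator."""
    rows = np.arange(len(subdomain))
    cols = np.asarray(subdomain)
    data = np.ones(len(subdomain))
    return sp.csr_matrix((data, (rows, cols)),
                         shape=(len(subdomain), N_tilde))
```

Applying the matrix gathers the subdomain components of a global vector, and applying its transpose scatters a local vector back, with zeros elsewhere.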
For the last subdomain, we include the additional variables corresponding to the source intensities f_i, i = 1, ..., s, satisfying (20) with the Neumann condition (21). We remark that (16) is a parabolic system and it is usually "illegal" to impose both the initial and terminal conditions (18)-(19). However, as inexact local solvers on space-time subdomains that form the global preconditioner, such local boundary conditions work well, as we shall see from our numerical experiments. Similar boundary conditions are used in the context of hyperbolic subdomain problems [43].
Let M_ij be a discretization of (16)-(19) and M^{-1}_ij an exact or approximate inverse of M_ij. The space-time additive Schwarz preconditioner is defined as
$$M^{-1}_{asm} = \sum_{i=1}^{N_1}\sum_{j=1}^{N_2} (R^\delta_{ij})^T M^{-1}_{ij} R^\delta_{ij}.$$
It is noted that the last space-time subdomain solver M^{-1}_{N_1,N_2} is an inverse or approximate inverse of the matrix arising from the discretization of the subproblem (16) with (20)-(21). Although its construction is slightly different from that of the other subdomain inverse matrices, we still use the same notation.
In addition to the standard additive Schwarz method (ASM) described above, the restricted version (RAS) of the method, developed in [10] for standard space domain decompositions, is also widely used. We extend it to our space-time domain decomposition; the space-time RAS preconditioner is defined as
$$M^{-1}_{ras} = \sum_{i=1}^{N_1}\sum_{j=1}^{N_2} (R^0_{ij})^T M^{-1}_{ij} R^\delta_{ij}.$$
For some applications, RAS achieves a better preconditioning effect with less communication time, since one of the restriction or extension operations does not involve any overlap. We use the restricted version in the experiments presented in the next section.
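The action of the RAS operator, M^{-1}_ras v = Σ_ij (R^0_ij)^T M^{-1}_ij R^δ_ij v, can be sketched serially: restrict with the overlapping operator, solve the local problem, and scatter back only the non-overlapping part. A small dense sketch (in the paper each term runs on its own processor, with LU or ILU as the subdomain solver):

```python
import numpy as np

def ras_apply(F, v, ext_sets, int_sets):
    """Serial restricted additive Schwarz: for each subdomain, ext_sets
    holds the overlapping (extended) global indices and int_sets the
    non-overlapping ones. Subdomain problems are solved exactly."""
    z = np.zeros_like(v)
    for ext, interior in zip(ext_sets, int_sets):
        sub = F[np.ix_(ext, ext)]          # M_ij = R^delta F (R^delta)^T
        w = np.linalg.solve(sub, v[ext])   # exact local solve (dense LU)
        mask = np.isin(ext, interior)      # R^0: drop the overlap region
        z[np.asarray(ext)[mask]] += w[mask]
    return z
```

As a sanity check, when F is the identity and the interior sets form a partition of the indices, the preconditioner reduces to the identity even though the extended sets overlap.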
We remark that, computationally, the matrix M_ij can be obtained as R^δ_ij F (R^δ_ij)^T. Moreover, if N_2 = 1, then no partition is performed in the time dimension, and M^{-1}_ras reduces to a "space-only" domain decomposition preconditioner for the fully coupled KKT system.

Numerical examples
We present in this section some numerical examples of recovering the intensity functions f_i(t) (i = 1, ..., s) at given source locations x*_1, ..., x*_s. We set the test domain to Ω = (−2, 2) × (−2, 2) and the terminal time to T = 1. We denote the number of time steps by n_t and the numbers of mesh points in the x and y directions by n_x and n_y, respectively. Homogeneous Dirichlet and Neumann boundary conditions are imposed on Γ_1 = {x = (x_1, x_2); |x_1| = 2} and Γ_2 = {x = (x_1, x_2); |x_2| = 2}, respectively. The diffusive coefficient a(x) and the convective coefficient v(x) are chosen to be 1.0 and (1.0, 1.0)^T, respectively.
The preconditioned KKT coupled system is solved by the restarted GMRES method with a restart number of 50 [33]. We briefly recall restarted GMRES for a general linear system Ax = b: an m-step Arnoldi process builds an orthonormal basis of the Krylov subspace, a small least-squares problem is solved for the correction, and the process is restarted from the updated solution. In this description, e_1 is the first column of the identity matrix, V_m stands for the n × m matrix with columns v_1, ..., v_m, and H̄_m denotes the (m + 1) × m Hessenberg matrix [33].
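A minimal serial sketch of restarted GMRES(m) for a general system Ax = b (no preconditioner here; the paper applies the space-time Schwarz preconditioner on the right with restart 50):

```python
import numpy as np

def gmres_restart(A, b, x0=None, restart=50, tol=1e-5, maxcycles=100):
    """Restarted GMRES(m) sketch: Arnoldi with modified Gram-Schmidt,
    then a small least-squares solve, repeated until convergence."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(maxcycles):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * np.linalg.norm(b):
            return x
        m = restart
        V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):                 # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:                # happy breakdown
                m = j + 1; break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m + 1); e1[0] = beta
        # minimize ||beta*e1 - H_m y|| over the Krylov subspace
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y
    return x
```

This is a textbook sketch for illustration only; the actual computations in the paper use the GMRES implementation provided by PETSc.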
The computational cost per cycle grows and the process becomes numerically less stable when m is large, which is why the iteration is restarted. For the subdomain solves we also use an incomplete LU (ILU) factorization with a prescribed level of fill. Based on Gaussian elimination, each location (i, j) of the sparse matrix A has a level of fill, denoted by lev_ij, the intent being that the higher the level, the smaller the element [33]:
1. The initial level of fill of an element a_ij of A is defined by lev_ij = 0 if a_ij ≠ 0 or i = j, and lev_ij = ∞ otherwise.
2. Each time the element a_ij is updated in the k-loop of Gaussian elimination by a_ij = a_ij − a_ik a_kj, the corresponding level of fill is updated by lev_ij = min{lev_ij, lev_ik + lev_kj + 1}.
In ILU with fill-in level p, an element whose level of fill lev_ij does not exceed p is kept. Hence the larger the fill-in level p, the more elements are kept in the factorization.
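The level-of-fill rules above are purely symbolic, so they can be computed from the sparsity pattern alone. A sketch with dense loops for clarity (a practical ILU(p) works on sparse rows):

```python
import numpy as np

def ilu_levels(pattern):
    """Symbolic level-of-fill computation for ILU(p): lev_ij = 0 where
    a_ij != 0 or i == j, infinity elsewhere, then the update
    lev_ij = min(lev_ij, lev_ik + lev_kj + 1) during elimination."""
    n = pattern.shape[0]
    lev = np.where(pattern | np.eye(n, dtype=bool), 0.0, np.inf)
    for k in range(n):
        for i in range(k + 1, n):
            if lev[i, k] == np.inf:      # no multiplier, nothing to update
                continue
            for j in range(k + 1, n):
                lev[i, j] = min(lev[i, j], lev[i, k] + lev[k, j] + 1)
    return lev
```

For a tridiagonal pattern no fill is generated, while an arrow-shaped pattern (dense first row and column) produces level-1 fill throughout the eliminated block, so ILU(0) and ILU(1) behave very differently on it.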
For the scalability tests we use LU factorization as the subdomain solver, and incomplete LU factorization with fill-in level ilulevel = 3 for the other tests, unless otherwise specified. The relative residual convergence tolerance of GMRES is set to 10^{-5}. The algorithm is implemented with the Portable, Extensible Toolkit for Scientific computation (PETSc) [6] and run on a Dawning TC3600 blade server system at the National Supercomputing Center in Shenzhen, China, with a peak performance of 1.271 PFlops.
We use a high resolution numerical solution of the concentration at the terminal time t = T as the noise-free observation data. In other words, we first solve the forward convection-diffusion system (1)-(3) on a very fine mesh, 640 × 640, with a small time step size τ = 1/160, and then add a random noise to the terminal concentration, where r is a random function uniformly distributed in [−1, 1] and ε is the noise level.
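A sketch of adding uniform noise to the terminal concentration; the multiplicative form C^ε = C(1 + ε r) is our assumption, chosen so that ε directly bounds the pointwise relative error:

```python
import numpy as np

def add_noise(C, eps, rng=None):
    """Perturb the terminal concentration: C_eps = C * (1 + eps * r)
    with r ~ U[-1, 1] pointwise. The multiplicative form is an
    assumption; only r's distribution and eps come from the text."""
    rng = np.random.default_rng(rng)
    r = rng.uniform(-1.0, 1.0, size=np.shape(C))
    return C * (1.0 + eps * r)
```

With ε = 0 the data is returned unchanged, and for ε = 0.01 the pointwise relative perturbation is bounded by 1%, matching the default noise level used below.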
In our numerical experiments, ε is set to 1% if not specified.

Reconstruction results and parallel efficiency tests
The tests are designed to investigate the quality of the recovered pollutant source intensity functions and to understand how the solution of the KKT system behaves for different mesh sizes, numbers of time steps, regularization parameters and numbers of processors, denoted by np. Assuming the source locations are known, we consider the following four examples.
Example 1. This is an example of recovering a quadratic polynomial source intensity function. An H¹ regularization is applied, and the parameter is chosen to be β² = 10⁻⁴ (β¹ = 0). Figure 2 shows the reconstruction with mesh n_x = 80, n_y = 80 and n_t = 320 time steps, when 64 processors are used. The blue dotted line represents the reconstructed source intensity, which is quite close to the red curve of the true intensity. This shows that the time-dependent intensity is successfully recovered by the algorithm.
We present the strong scalability results in Table 1. Sparse LU factorization is applied as the subdomain solver. The spatial mesh is 160 × 160 and the number of time steps is 160, giving 8,192,160 degrees of freedom in total. As the number of processors increases, the computing time decreases significantly, and superlinear speedup is obtained for np ≤ 1024, as shown in Figure 3. Since the number of processors equals the number of subdomains, more processors lead to an increasing number of iterations. This suggests that the condition number of the preconditioned KKT matrix depends on the number of subdomains; a similar dependency was proved for elliptic problems [39].
We fix the number of processors to np = 128, the mesh to n_x = 80, n_y = 80 and the number of time steps to n_t = 320, and test several choices of regularization parameters. ILU factorization is used as the subdomain solver with fill-in level ilulevel = 3. From the results in Table 2, as β² becomes smaller, the number of GMRES iterations increases, while no significant change is observed in the total computing time.

Table 3: Scalability test for Example 2: n_t = 160, n_x = 160, n_y = 160, DOF = 8,192,160.
Example 2. This is an example of recovering a polynomial source intensity function of degree 4. We set β¹ = 0 and use an H¹ regularization with β² = 10⁻⁴. A satisfactory result is shown in Figure 4 with mesh n_x = 80, n_y = 80 and n_t = 160 time steps, when 64 processors are used for the computation.
Using the same parameter settings as in Example 1, we perform the strong scalability test; the results are given in Table 3 and Figure 5. Superlinear speedup is obtained when np ≤ 1024. Next we test three sets of mesh and time step sizes in Table 4. The H¹ regularization parameter is set to β² = 10⁻⁶, 64 processors are used, the overlap is iovlp = 4, and the fill-in level of ILU is ilulevel = 3. The corresponding reconstructions are shown in Figure 6. It is observed from Table 5, under the same settings, that the "space-only" Schwarz preconditioner costs more iterations and computing time than the space-time Schwarz preconditioner. Thus the Schwarz preconditioner with a partition in time is more efficient than the "space-only" domain decomposition preconditioner. At the end of this example, we perform the noise level test; the results are given in Figure 7. They agree with our expectation that the reconstruction accuracy deteriorates with an increasing level of noise in the measurement data.

Example 3. In this example, we test the recovery of two source intensities. Compared with the single source intensity cases, more regularization parameters are needed. Here we test the following two sets of regularization parameters with np = 64: (1) β_1^1 = 0, β_1^2 = 10^{-4} for the source intensity function f_1, and β_2^1 = 10^{-6}, β_2^2 = 10^{-5} for f_2; the mesh is n_x = 80, n_y = 80 and the number of time steps is n_t = 80. (2) β_1^1 = 0, β_1^2 = 10^{-6} for f_1, and β_2^1 = 0, β_2^2 = 10^{-6} for f_2; the mesh and the number of time steps are n_x = 64, n_y = 64 and n_t = 256.
The numerical results are shown in Figure 8. The computed f_1 matches its original data very well, while the computed f_2 is less accurate. As seen in the tests for Examples 1 and 2, f_2 is intrinsically harder to recover than the simpler function f_1. Overall, the reconstruction of both source intensities is reasonable.
Next we show the strong scalability results in Figure 9 and Table 6. We still observe a superlinear speedup, although it is a bit worse than that of Examples 1 and 2. This suggests that it is more difficult to separate and identify multiple source intensities than a single one. Moreover, in our previous experiments with reduced space SQP methods, the recovery of multiple sources was much more difficult to converge than that of the single source case.
Lastly, we test the algorithm with parameters such as the fill-in level of ILU factorization and the size of overlap in Table 7 and Table 8, respectively.
It is observed that the number of iterations decreases with the increase of the overlapping size or the fill-in level, however, it costs more communication time when we increase the overlap between "space-time" subdomains, and more computing time is used in the preconditioning stage when we raise the fill-in level of the ILU factorization. For this test, we take the spatial mesh 160 × 160 and the number of time steps 160.
Table 8: Overlap test for Example 3: β_1^2 = 10^{-6}, β_2^2 = 10^{-6}, n_t = 400, n_x = 80, n_y = 80, DOF = 5,120,400, np = 128.

Example 4. Lastly, we test the recovery of three point sources and show the strong scalability and compute time in Figure 11 and Table 9. LU factorization is used as the subdomain solver. It is observed that the speedup for three point sources is almost linear, still satisfactory but a bit worse than in Examples 1, 2 and 3. In conclusion, the speedup deteriorates slowly with the number of unknown point sources.

Comparisons with two reduced space SQP methods
A reduced space method for the reconstruction of the location and intensity of a single point pollutant source was developed in [13]. With the source location known in our current case, the process of reconstructing the source intensity described in [13] alternates between solving the state system, solving the adjoint system and updating the intensity unknowns. In the comparison, the number of processors increases with the refinement of the meshes, and the subdomain solvers for all three kinds of methods are ILU. We first compute the result by the FS method with a zero initial guess and record the achieved error e = ‖f − f*‖. Then we use the same initial guess and the error bound e for the reduced space methods RS(1) and RS(2), and set the stopping criterion as ‖f_k − f*‖ < e. In this way we can compare the computing time of all these methods.
As shown in Table 10, the computing time of the FS method is much less than that of RS(1) and RS(2). For the two reduced space methods, RS(2), which uses the space-time domain decomposition solver, is faster than RS(1), which keeps the time marching process and uses a space domain decomposition solver. We can see that the space-time fully coupled preconditioner is much better suited for parallelization, and that the all-at-once method for the fully coupled KKT system is consistently more efficient than the reduced space iterative optimization methods on the parallel systems tested.

Concluding remarks
We developed a new space-time domain decomposition method for unsteady source inversion problems. The core of our algorithm is to solve the fully coupled KKT system by a GMRES iteration with a space-time additive Schwarz preconditioner.
Although the size of the linear system is significantly larger than in the reduced space SQP methods, the one-shot method avoids the sequential step between the state equation and the adjoint equation, as well as the time-marching process in the time dimension, and thus achieves a higher degree of parallelism. This is well confirmed by the numerical results shown in the last section. Another advantage of the new method is that the recovery of multiple sources is obtained within the same algorithmic and software framework as the single source case, and the framework is easily extended to recover other kinds of source intensities.
We have also observed from the numerical examples that the new space-time additive Schwarz method is quite robust with respect to the noise in the observation data.
Finally, the new space-time method is highly parallel and scalable, and extends naturally to three-dimensional problems.