ACCELERATING THE BAYESIAN INFERENCE OF INVERSE PROBLEMS BY USING DATA-DRIVEN COMPRESSIVE SENSING METHOD BASED ON PROPER ORTHOGONAL DECOMPOSITION

. In Bayesian inverse problems, using the Markov Chain Monte Carlo method to sample from the posterior space of unknown parameters is a formidable challenge due to the requirement of evaluating the forward model a large number of times. For the purpose of accelerating the inference of the Bayesian inverse problems, in this work, we present a proper orthogonal decomposition (POD) based data-driven compressive sensing (DCS) method and construct a low dimensional approximation to the stochastic surrogate model on the prior support. Speciﬁcally, we ﬁrst use POD to generate a reduced order model. Then we construct a compressed polynomial approximation by using a stochastic collocation method based on the generalized polynomial chaos ex- pansion and solving an l 1 -minimization problem. Rigorous error analysis and coeﬃcient estimation was provided. Numerical experiments on stochastic ellip- tic inverse problem were performed to verify the eﬀectiveness of our POD-DCS method.

1. Introduction. The inverse problems refer to exploring the inherent nature of things based on observable phenomena. There are many parameters of interest in scientific and engineering problems that cannot be directly observed. We often discuss these quantities indirectly through known data, that is, to formulate such problems in the form of inverse problems [14,7]. It can be seen from this that inverse problems are the products of the rapid development of science and engineering, which have been widely used in various fields such as geological engineering [25,10], medicine [4], environment [31], telemetry [28], control [2] and so on.
Due to the fact that parameters are sensitive to observation data, which is usually impure, this leads to the inverse problems are ill-posed, which can be addressed with regularization [9,27]. However, regularization methods only provide point estimates of the unknown parameters and without quantifying the uncertainties of the solution. Then, the statistical inference method [14,26] entered the researchers' vision. In Bayesian inverse problems, the unknown parameters are treated as random vectors with known prior distribution, and we need to infer their conditional distribution under observation data, namely the posterior distribution. In general, it is difficult to obtain the analytical form of the posterior distribution, so we often consider using the Markov Chain Monte Carlo (MCMC) method [3] to explore the posterior space. In these sampling algorithms, the forward model solution needs to be calculated for each candidate sample. We know that evaluating forward models is expensive in many practical problems, so direct sampling algorithms are prohibitive. To overcome this difficulty, many scholars have proposed a series of surrogate models and effective sampling algorithms to reduce the computational cost of Bayesian inverse problems. The former achieves the goal by reducing the cost of a single forward model evaluation. For example, the use of generalized polynomial chaos (gPC) basis functions to express the solutions of forward problems is proposed in [21,22], the projection-based model reduction techniques can be seen in [19,15], and adaptive methods to construct surrogate models are presented in [32,17]. The latter achieves this purpose by reducing the number of samples required, which has been studied in [23,20].
Here, we focus on the construction of effective surrogate models. Since unknown parameters are treated as random vector ξ(ω), the forward problems can be regarded as the stochastic problems on prior support Γ. Although the stochastic problem is more complicated than the original one, we can calculate the approximate posterior probability corresponding to every candidate sample at a negligible cost once its solution expression obtained. In the physical domain D and image probability space of unknown parameters, we can use the deterministic and stochastic basis functions to represent the solutions of stochastic problems [13,8]. For instance, with the stochastic finite element methods, an approximation of solution u(x, ξ) : D ×Γ → R has the form:ǔ where the coefficientč ij is with respect to (w.r.t.) a finite element basis ϕ j and a multivariate polynomial chaos (PC) basis ψ i . In the current work, we propose the data-driven compressive sensing method based on proper orthogonal decomposition for constructing the accurate approximate solutions of stochastic forward problems to accelerate the calculation of Bayesian inverse problems. The POD-DCS method is derived from the data-driven compressive sensing method proposed in [18]. But here, we use proper orthogonal decomposition (POD) basis functions instead of Karhunen-Loève basis, so as to generate a reduced order model (ROM) and avoid the recovery of covariance. According to the idea of compressive sensing (CS), an accurate approximate solution of the ROM can be obtained by calculating a basis pursuit (BP) problem [18,30] with a small amount of data. The advantage of our method is that it constructs the reduced order model using snapshots first, which causes the degree of freedom (DoF) of the forward problem to be reduced from the cardinality of the finite element to the that of POD, and then we only need to reconstruct the low-dimensional ROM at a lower cost. From the numerical results we can see that when using the same number of fully discrete finite element solutions, our scheme can improve accuracy and sparsity compared to the conventional CS method based on finite element basis. Moreover, the cost of evaluating a forward model using our method is only a fraction of that with the finite element method (FEM), so it can speed up the calculation of Bayesian inverse problems effectively. As we all know, many practical problems can be described by partial differential equations (PDE). Here, we concentrate only on elliptic PDE, which is widely used in the studies of oil reservoirs and groundwater [25], and its ill-posedness have been discussed in [16]. Of course, our method can be naturally extended to other partial differential equations. All the computations were performed using MATLAB R2014b on a personal computer with 1.60GHz CPU and 4GB RAM.
The rest of this paper is organized as follows. In Section 2, we describe the model problem used as the background of our study, then introduce the framework of Bayesian inference and the stochastic surrogate model. In Section 3, we discuss how to construct the POD-DCS approximate solution for a stochastic forward problem, and provide some direct simulation results and the corresponding algorithm. The error and sparsity analyses of our scheme are conducted in Section 4. In Section 5, we compare our POD-DCS scheme with other methods, and use it to solve elliptic inverse problem. Finally, we give some conclusions and indicate possible future work in Section 6.
2. Problem setup and Bayesian framework. Consider an underground steady state aquifers modelled by two-dimensional elliptic partial differential equation with Dirichlet boundary as where domain D = [0, 1] 2 . Let u(x) be the hydraulic head, a(x) be the permeability field, and apply a constant source term in the whole domain and satisfy f ≡ 2. The goal of this elliptic inverse problem is to estimate the unknown permeability field a(x) from a set of observations z = [u(x 1 , ξ), . . . , u(x nz , ξ)] T ∈ R nz of the solution u at points {x s } nz s=1 . In this work, we consider the permeability field defined by n p = 4 radial basis functions with centers x 1 , · · · , x np as: where · 2 is Euclidean norm. These radial basis functions are shown in Figure 1, and what we need to do is determine these weights Let ξ = [ξ 1 , · · · , ξ np ] T be the weight vector, and the permeability field and model output depending on ξ are denoted as a(x; ξ) and u(x; ξ), respectively. Without loss of generality, here we consider the additive error, i.e., z s = u(x s ; ξ) + e s , s = 1, . . . , n z where {x s } nz s=1 are sensor locations, ξ ∈ R np is the weight vector corresponding to the data z = [z 1 , . . . , z ns ] T ∈ R nz , and additive error e = [e 1 , . . . , e nz ] T ∈ R nz comes from measurement error, etc. Then, the forward model can be written as where G : R np → R nz is a mapping from input parameters ξ to noise-free data. Let the noise e and parameters ξ are independent mutually. Suppose further that the components of e are independent and identically distributed (i.i.d.), and satisfy e i ∼ N (0, σ 2 e ) for i = 1, · · · , n z . 2.1. Bayesian framework. In Bayesian inverse problems, the unknown parameters ξ is regarded as random vector in a properly defined probability space (Ω, F, P), whose components are i.i.d.. Moreover, the prior probability density function of ξ is known. In this work, we only consider the uniform prior in [0, 1] np .
According the forward model (5) and the assumption of noise independence, likelihood function π(z|ξ) has the form In many practical problems, the posterior distributions (6) are analytically intractable. Consequently, many sampling algorithms, e.g. MCMC, have been used to ascertain the posterior space. The framework of Bayesian inverse problems with direct sampling algorithm is shown in Figure 2. The form of the likelihood function (7) implies that for each candidate sample ξ, we need to evaluate the corresponding forward model G(ξ), which is expensive generally. Therefore, the naive sampling methods are prohibitive. A series of reduced order models and effective sampling algorithms have been developed to deal with this problem. In order to reduce the total computational cost, the former is by reducing the cost of a single evaluation of the forward model, and the latter is by reducing the number of samples required. In current work, we concentrate on constructing a surrogate model to reduce the cost of a single forward model evaluation.

2.2.
Stochastic surrogate model. In probability space (Ω, F, P), we denote the probability density function of random variable ξ i (ω) : Ω → R as π i (ξ i ), whose range is Γ i = ξ i (Ω) ⊂ R. Because the components of ξ are assumed to be independent, the joint probability density function of random vector ξ has the form Here, the joint probability density function π(ξ) is equivalent to the prior density function in Bayes' formula (6).
For any ω ∈ Ω, the realization of random vector ξ is belong to Γ. Therefore, in the Bayesian framework, we can restate the boundary value problem (2) as following stochastic forward problem in image probability space (Γ, B(Γ), π(ξ)dξ) rather than abstract probability space (Ω, F, P): Here, B(Γ) is the Borel σ-algebra on Γ, and π(ξ)dξ is the measure of random vector ξ. In fact, we can construct explicit expression of the solution u(x, ξ) to stochastic problem (9). Then for any ω ∈ Ω, substituting the realization ξ(ω) into u(x, ξ), the function value is consistent with the solution u(x; ξ) of the deterministic problem (2) which has the same input ξ. Using this solution expression, we can repeatedly evaluate the forward model (2) at a negligible cost. For this reason, the model (9) is called the stochastic surrogate model of problem (2).
3. Data-driven compressive sensing method based on proper orthogonal decomposition. In this section, we propose a data-driven compressive sensing method based on proper orthogonal decomposition, which can be used to construct efficient and sparse solution for a stochastic forward model. It's well-known that the solution u(x, ξ) : D × Γ → R of stochastic forward model (9) can be represented by stochastic and deterministic basis functions. Here, we propose the POD-DCS method, which chooses a set of data-driven POD basis functions {φ j (x)} m j=1 for domain D and a set of gPC basis functions {ψ i (ξ)} n i=1 for prior support Γ, and the coefficients are determined by solving a BP problem. Next, we discuss how to construct the POD-DCS approximate solution for a stochastic forward problem.
3.1. Proper orthogonal decomposition. The corresponding degree of freedom of the finite element method is expressed by n u , and let S ∈ R nu×Ks be the snapshot matrix: where u h (x, ξ i ) ∈ R nu is the finite element solution of problem (9) associated with input parameter ξ i ∈ Γ. These random inputs {ξ i } Ks i=1 are sampling from Γ randomly. By solving eigenvalue problem with we can obtain the diagonal singular value matrix Λ with Λ jj = λ j , and the matrix of Since R is positive semidefinite and symmetric, we can make the eigenvectors {v j } Ks j=1 to be orthogonal and sort the eigenvalues in a descending order. After normalization, the POD basis can be represented as Here, the dimension of POD basis, m, is determined by the energy ratio ν m defined as Once the basis functions are computed, the solution u(x, ξ) can be approximated as The coefficients {α j } m j=1 are called reduced states and satisfy the following algebraic system From the definition of the POD basis functions we known that φ i (x) and φ j (x) are orthogonal for i, j = 1, · · · , m, and {φ j (x)} m j=1 minimize the error measure with minimum Ks j=m+1 λ j [12]. When K s is large enough, M can be regarded as an approximation of the mean square error in the L 2 (D)-norm 3.1.1. Simulation results using POD-based stochastic ROM. For 4-dimensional stochastic problem (9), we draw K s samples of input parameters ξ from prior support Γ randomly, and evaluate the corresponding forward model (9) by using finite element method with mesh size h = 2 −6 (i.e. n u = 4225). These solutions compose the snapshot matrix S. For K s = 50, 100, 200 and 300, the first 15 eigenvalues of correlation matrix R are shown on the left of Figure 3. We can observe that the eigenvalues have very little differences when K s ≥ 100, but have large gaps between K s = 50 and K s = 100. In order to explore the prior space fully and ensure the accuracy of the POD-based reduced order model, we need use K s ≥ 100 realizations to construct the snapshot matrix. The right side of the Figure 3 shows the cumulative energy ratios of the first 15 POD basis functions with K s = 100, 200 and 300. We can see that the energy ratios of first 2 POD basis functions have slight differences for different K s . For each case, the first 3 basis functions can capture 99% energy of snapshot set, and the cumulative energy ratio of the first 9 basis functions is very close to 1. We define the expectation and variance of L 2 (D)-norm error between the full finite element solution u h (x, ξ) and the POD-based approximate solution u(x, ξ) as and where E[·] means the expectation and V ar[·] represents the variance. Here, we approximate these statistics with K r = 500 samples. Figure 4 displays these error estimates of the POD-based approximate solution associated with different K s and different degrees of freedom m. We observe that for a given m, the expectations and variances in these three cases only have a little differences. And when m ≤ 4, the errors decrease very quickly, while they are very small variations with m ≥ 9. The results for m = 3, 6, 9 and 12 are shown in Table 1. It is clear that the change in energy from 9 to 12 POD basis functions is so small that it is almost negligible, and the improvements in the corresponding error statistics are also small. Moreover, for m = 9, the order of magnitude of the differences in E is -4, and in variance V is -5 between these three cases, while they are larger for m = 3 and m = 6. Therefore, we consider using K s = 100 realizations to generate the POD basis functions and choose m to be 9 in the next experiment to ensure accuracy without consuming too much computing resources.

3.2.
Compressive sensing method. In our POD-DCS algorithm, using the POD method first can greatly reduce the DoF of the problem in hand, and simplify the subsequent processing. Next, we utilize the idea of compressive sensing to express the reduced states {α j (ξ)} m j=1 with gPC basis functions, and determine the    expression coefficients by using a small amount of data and solving a l 1 -minimization problem.
Clearly, once the POD basis functions {φ j (x)} m j=1 are obtained, we only need to compute the reduced states α j (ξ) : Γ → R in the expression (16) for j = 1, · · · , m. Here, we use the N th-order generalized polynomial chaos to represent the reduced states as the form where {ψ i (ξ)} n i=1 represents a set of multivariate Legendre polynomials with order N that are orthonormal w.r.t. the joint probability density function π(ξ), and ψ 1 (ξ) = 1. The cardinality of the stochastic polynomial basis is The stochastic collocation method [29] based on polynomial interpolation can be utilized to determine the coefficients {c ij } n, m i=1,j=1 . With collocation points we have a linear system Simply expressed as where A = (α j (ξ i )) ∈ R Kc×m is the matrix of reduced state generated by solving system (17), c = (c ij ) ∈ R n×m is the coefficient matrix that needs to be determined, Ψ = (ψ i (ξ j )) ∈ R Kc×n represents the stochastic information matrix, and K c is the number of stochastic collocation points. As shown in Equation (23), with the increase of random input dimension n p , the cardinality n of polynomial basis functions grows very quickly, which will cause linear system (24) to be underdetermined, i.e., K c < n.
Based on the idea of compressive sensing, given a highly incomplete set of reduced states by solving algebraic system (17), an accurate approximate solution of linear system (24) can be obtained by solving the BP problem where dictionary matrix Θ = I m ⊗ Ψ is the Kronecker product of an m × m identity matrix I m with stochastic information matrix Ψ, vec( A) denotes the vectorization of matrix A by column, and c = ( c ij ) represents the coefficient matrix we calculated. Then the reduced states can be approximated by Combining equations (16) and (26), we can get the POD-DCS approximate solution of stochastic elliptic problem (9) as Remark 1. By using the orthogonality of multivariate Legendre polynomials, we have The details of our POD-DCS method for stochastic forward problem is presented as following Algorithm 1.

3.2.1.
Simulation results of the POD-DCS method. Table 1 illustrates that using 100 realizations to generate the POD-based approximate solution with dimension 9 can guarantee the accuracy of the model. Thus, we utilize 100 snapshots to construct ROM, and retain only the first 9 basis functions. In this case, E = 2.8400 × 10 −3 and V = 1.4750 × 10 −4 .  (14) by solving eigenvalue problem (11), where m is the smallest positive integer such that ν m ≥ ν holds. 3: Construct the reduced state matrix A as (24) by solving algebraic system (17) with collocation points {ξ i } Kc i=1 . 4: Select appropriate stochastic basis functions {ψ i (ξ)} n i=1 with order N . 5: Obtain coefficient matrix c by solving BP problem (25). 6: Generate the approximate reduced states ( α j (ξ)) and the POD-DCS approximate solution u(x, ξ) of system (9) as Now, we need to determine the coefficients { c ij } n, m i=1,j=1 in the expression (26) by solving l 1 -minimization problem (25). Here, the 7th-order Legendre PC basis functions ψ(ξ) are used to represent the reduced states α j (ξ) for j = 1, · · · , m, and the total DoF of matirx c is n × m = 2970. Without loss of generality, we only consider the coefficients with absolute values larger than a fixed threshold τ , i.e., set the entries of c whose absolute value smaller than the threshold τ to 0, and let R τ be the proportion of non-zero coefficients in matrix c, that is Like the definitions of E and V, we denote the expectation and variance of L 2 (D)norm error between the full finite element solution u h (x, ξ) and the POD-DCS approximate solution u(x, ξ) as E and V respectively, namely and .
The sparsity and error estimation of the POD-DCS method w.r.t. different number of collocation points sampled from the prior space randomly and different thresholds are drawn in Figure 5. Compared with τ = 0, for each K c , the ratio of non-zero coefficients is less than 5% with τ = 0.1, but the mean of the L 2 (D)norm error is much larger. This is due to the fact that the threshold is so large that some influential basis functions are ignored. For τ = 0.01, the highest proportion of non-zero coefficients in matrix c is about 33%, and the error statistics are almost consistent with those of τ = 0. Therefore, we can take the threshold τ = 0.01 to ensure the accuracy. Through regression, the expectation of our approach converges with order 1.58 and variance converges with order 2.97, that is, , which can be seen in Figure 6. Such a highly "incomplete" set of reduced states does contain enough information to accurately recover the reduced order model. When K c ≥ 200, the expectations E and variances V of L 2 (D)-norm error do not change much as K c increase. So here we use 200 collocation points to determine the coefficients ( c ij ) of expression (27), that is, we need to calculate 200 algebraic systems (17).

4.
Analysis. The POD-DCS method has been described in the previous section. In order to introduce the conclusions of error analysis and coefficients estimation of our scheme, we first introduce several relevant properties of the compressive sensing method in this section.

Error analysis.
Definition 4.1 (see [11]). A vector b ∈ R n is called k-sparse if b 0 ≤ k holds, and its best k-term approximation error in the l p -norm is defined as where b 0 is the number of non-zero terms in vector b.
Note that for 0 < q < p ≤ ∞ and set s = 1/q − 1/p > 0, the prior estimation holds: In order to ensure that the matrix c can be reconstructed exactly by using l 1 minimization, we introduce the restricted isometry property (RIP).
Lemma 4.2 (see [5,24]). If for any k-sparse vector b ∈ R n , there exists a constant δ ∈ (0, 1) such that the inequality holds. Then δ k (A) := min(δ) is called the restricted isometry constant (RIC) of matrix A ∈ R m×n , and the matrix A is said to satisfy the RIP of order k with RIC δ k (A). Similarly, denoting the RIC of matrix B ∈ R p×q by δ k (B), we can obtain the inequality Based on Lemma 4.2 we know that the RIC of dictionary matrix Θ in (25) satisfies δ k (Θ) = δ k (Ψ). Lemma 4.3 (see [11]). Assuming that δ 3k (Θ) < 1/3, then the solution of l 1minimization problem (25) satisfies where the constant C δ > 0 depends only on δ 3k (Θ).
By Lemma 4.3, we give the error estimation of our POD-DCS method as following.
Theorem 4.4. There exist constants C 1 , C 2 , C 3 , C 4 , θ > 0 and 0 < q < 1, assuming that the dictionary matrix Θ in (25) satisfies δ 3k (Θ) < 1/3. Then with probability close to one, the expectation (31) of L 2 (D)-norm error between the finite element solution and the POD-DCS approximate solution satisfies where C 3 , C 4 depend on the smoothness of the reduced states and δ k (Θ) respectively, while constants C 1 , C 2 are universal.
Proof. By using the inequality 2ab ≤ a 2 + b 2 , the error (31) follows that . Denote We first estimate I 1 . The standard Monte Carlo Finite Element Method (MCFEM) approximates the expectation function I 1 by the average of samples. Note that the random inputs {ξ j } Ks j=1 used to generate the snapshot set are sampling from prior support Γ randomly. Thus, we can use these samples to approximate I 1 asĪ where u h j and u j represent the finite element solution and POD-based solution with random input ξ j for j = 1, · · · , K s , respectively. The number of realizations, K s , controls the statistical error E S = I 1 −Ī 1 [1], and .
According to the central limit theorem we have then choose a constant C q ≥ 1.65, such that holds with probability close to one, where C q is called quantile. With the error measure (18),Ī 1 satisfies As (22), we use multivariate polynomials with order N to approximate reduced state α j (ξ) for j = 1, · · · , m, then there exist constants C N , θ > 0 such that holds. The constant C N depends on the smoothness of {α j (ξ)} m j=1 . Using the biorthogonality of the POD-DCS solution and Lemma 4.3, we have By the prior estimation (34), we can arrive the error estimate with 0 < q < 1 which completes the proof.
This theorem implies that the mean square error E consists of four parts, including statistical error, truncation error, polynomial approximation error and sparse reconstruction error. Among them, the first two items are derived from the PODbased reduced order model, and the last two items are caused by the sparse reconstruction of the corresponding reduced order model. Therefore, we can balance these four errors and discrete error to save computing resources for a desired accuracy.

4.2.
Sparsity. The error analysis has been completed and we are now ready to show the sparsity of our POD-DCS solution.
Proof. According to the eigenvalue problem (11) and the POD basis function (14) we obtain Thus, we know that λ j φ j (x) can be regarded as an approximation of the expectation of u h (x, ξ), φ j (x) u h (x, ξ). Using the POD-DCS solution to approximate finite element solution, and combining the biorthogonality of the POD-DCS solution, λ j φ j (x) can be approximated by which implies that (39) holds true. Figure 7 confirms the conclusion of Theorem 4.5 numerically. The error in this figure is due to the fact that we only use the average of 100 samples to approximate the expectation, and the error of POD-DCS solution. It is well-known that the eigenvalues decay rapidly in many practical problems, so Theorem 4.5 implies that the coefficients are compressible and illustrates the feasibility of using the PODbased ROM first.

Numerical experiments.
In this section, we compare the POD-DCS scheme with POD-based ROM and the conventional CS method, and use our method to solve the 4-dimensional elliptic inverse problem (2)-(3) to further describe its feasibility and advantages.

5.1.
Comparison of accuracy. The error estimates and sparsity of our POD-DCS scheme and other two different methods are shown in Table 2. All three methods are constructed with 100 full discrete finite element solutions. Among them, the POD-DCS method is constructed as above discussion with K s = 100, m = 9, K c = 200, POD-based ROM has same dimension m, and CS represents the conventional compressive sensing method.
Obviously, the POD method has highest accuracy, while the reconstruction error of reduced states makes the accuracy of our POD-DCS method to be inferior to POD, and relatively little information leads to the worst accuracy of the conventional CS method. In terms of sparsity, the proposed POD-DCS scheme is slightly better than the CS method. However, the total DoF of our method is 0.2024 × n × m = 601, which is only 0.2% of that in CS method, and its DoF is 0.2182 × n × n u ≈ 3.04 × 10 5 . The reason for the sharply decrease in the DoF is due to the fact that we use the reduced order model, which leads to the DoF in physical space decline from n u to m. Therefore, compared to the CS method, the proposed method can not only improve the accuracy and sparsity, but also greatly reduce the degree of freedom.
From the discussions in Section 3 and Theorem 4.4 we know that the accuracy of the POD-DCS method can be improved by increasing the number of snapshots, the dimension of POD basis, the number of collocation points and the order of polynomials. These quantities can be determined with the required accuracy.

Comparison of efficiency.
Here we compare the efficiency of different methods. Table 3 summarizes the cost of each stage in the construction of different models. From the perspective of model construction, the number of full finite element solutions used in the three methods is the same. For POD-based ROM, although it has high accuracy and does not need to solve the BP problem, the time required to evaluate the model once is about 5 times that of the other two methods, which is not conducive to the implementation of the sampling algorithm. It takes only 0.2095s to evaluate the solution expression obtained by CS method, but the size of the dictionary matrix Θ in the corresponding BP problem (25) is (100 × n u ) × (n × n u ), which is usually so large that it will exceed the computing resources or the calculation speed will be very slow. According to the nature of the finite element basis functions, here we can decompose the BP problem into solving the each column of the coefficient matrix c, that is, transform the problem into solving n u l 1 -minimization problems with size of 100 × n. This process takes about 1.4 hours. Since the sparse reconstruction in our POD-DCS scheme is just for reduced states, and the size of dictionary matrix is (m × K c ) × (m × n). It takes only 21s to solve this BP problem, which is 0.42% of the time required by the CS method. For the POD-DCS method, we also need to evaluate the reduced order models w.r.t. K c collocation points, but it is usually much cheaper than the full finite element method. Therefore, from an efficiency perspective, the offline cost of POD-based ROM is relatively small, but the online cost is larger than the other two methods. While compared with the CS method, the online time of the POD-DCS method only has a little difference, but the offline time has a obvious advantage. Moreover, our method achieves 11 times acceleration when evaluating a forward model, and the time required to construct the solution expression of stochastic surrogate model can be offset by the repeated calculation of the forward model. It is well known that both POD and stochastic collocation methods can deal with highly nonlinear problems, so for such complex problems, our method will be more attractive due to its high efficiency.

5.3.
Elliptic inverse problem. The accuracy and efficiency comparison of the POD-DCS method with other two methods has been completed. Now, we utilize this method to deal with the elliptic inverse problem (2)-(3). We use finite element method with mesh size h = 2 −8 to generated noise-free data associated with input Note that the mesh size is finer than that used in inversion in order to prevent the "inverse crime". The true permeability field a(x; ξ o ) and noise-free output are shown in Figure 8. The n z = 49 measurement sensors are uniformly distributed over D with grid spacing 2 −3 .
y y x x In this Bayesian inverse problem, the components of weight vector ξ are assumed to be i.i.d. and have a prior ξ i ∼ U (0, 1) for i = 1, · · · , n p , where n p = 4. As described in section 3, we first construct the POD-DCS approximate solution u(x, ξ) of the stochastic problem (9) in prior support Γ = [0, 1] np with K s = 100, m = 9, K c = 200 and τ = 0.01. Then substituting this approximate solution expression into (7) to get the approximate likelihood function, thereby obtaining the approximate posterior probability density function from Bayes' rule (6). The framework for solving the Bayesian inverse problems using the POD-DCS approximate solution of the stochastic surrogate model is shown in Figure 9. Denote the approximate likelihood function and the approximate posterior probability density function associated with POD-DCS solution as π(z|ξ) and π(ξ|z) respectively. The Hellinger distance and Kullback-Leibler distance between the exact posterior distribution π(ξ|z) and its approximation π(ξ|z) have been shown in [26] and [30]. Here, we use delayed-rejection adaptive Metropolis (DRAM) algorithm to explore the approximate posterior distribution π(ξ|z), whose computational cost is only a fraction of that in exploring the true posterior distribution π(ξ|z). This is because the cost of evaluating a finite element solution is about 11 times that of a POD-DCS solution, which can be observed from Table 3.
Here we consider that the observation data z contains noise e, which satisfies zero-mean Gaussian distribution N (0, σ 2 e I nz ) with σ e = 0.05. For the finite element method, conventional CS method and POD-DCS method, we use the DRAM algorithm to draw 3 × 10 4 samples respectively. Among them, the first 5000 samples are discarded as burn-in periods and the last 25000 samples are used to estimate the posterior densities. The 95% confidence interval and posterior marginal probability density of each component of unknown input parameters ξ = [ξ 1 , ξ 2 , ξ 3 , ξ 4 ] are shown in Table 4 and Figures 10. Obviously, the 95% confidence intervals w.r.t. all these three methods contain true input parameters ξ o , and the length of these confidence intervals does not exceed 0.03. But compared with the conventional CS method, the posterior marginal densities obtained by our scheme are closer to that obtained by FEM. In Figure 11, we display the estimated permeability field and its error, which correspond to the maximum a posterior probability (MAP) parameters generated by the POD-DCS method. We can see that the error between the estimated permeability field and the true permeability field is small enough. Therefore, the proposed method can deal with Bayesian inverse problems efficiently. From the previous discussion, it is clear that the POD-DCS surrogate model speeds up the evaluation speed of model while ensuring the accuracy. Therefore, this method can be considered for Bayesian inverse problem, optimal control and other problems requiring repeated evaluation of forward model. Note that the BP problem (25) is solvable in polynomial time [6]. While the dictionary matrix Θ we used depends on the numbers of POD bases and stochastic bases, i.e. m and n. Therefore, for a specific problem, the dimension of POD-based ROM and the degree of the polynomial interpolation should be selected appropriately. It is further known that this method has limitations for the problem whose eigenvalue decays slowly, which is the inherent shortcoming of the POD method. 6. Conclusion. In summary, for statistical inverse problems, we can regard the deterministic forward problem as a stochastic forward problem on the prior support of unknown parameters, and the solutions of these two problems with the same input are equal. Therefore, in this work, we propose a data-driven compressive sensing method based on proper orthogonal decomposition to construct the solution expression of stochastic surrogate model to accelerate the Bayesian inference of an inverse problem. The snapshot-based POD method is first used to construct the ROM of stochastic problem, then the stochastic collocation method based on gPC basis functions is adopted to represent the reduced states, and the coefficients are determined by solving an l 1 -minimization problem. Substituting this approximate solution into the likelihood function can obtain an approximate likelihood function, thereby obtaining an approximate posterior. The approximate posterior space can be explored faster than the original one. We make error analysis and coefficient estimation for this method, and prove that as the number of snapshots, POD cardinality, polynomial order, and number of collocation points increase, the mean square error of approximate solution will decrease. In addition, the expression coefficients are related to the eigenvalues. A series of numerical experiments show that when using the same number of full finite element solutions, the POD-DCS method has better accuracy and sparsity than the conventional CS method. Although our method needs to solve the ROM w.r.t. the collocation points, the sparse reconstruction of low-dimensional reduced states is much cheaper than the CS method. Moreover, in inferring the permeability field for elliptic PDE, our method can not only get an accurate result, but also be much faster than the direct calculation. Accelerating the decline of eigenvalue in POD method is a problem worthy of consideration. And for complex problems, evaluating the forward model is costly. In order to construct an accurate reduced order model, we usually need many realizations, which can be prohibitive. Therefore, we plan to use a multi-fidelity scheme or select appropriate snapshots to overcome this difficulty. This is the subject of future work.