Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems

In this paper, we introduce an innovative approach for solving Bayesian inverse problems based on physics-informed invertible neural networks (PI-INN). The PI-INN framework comprises two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The primary role of the NB-Net is to model the spatial basis functions characterizing the solution to the forward problem dictated by the underlying partial differential equation. Simultaneously, the INN is designed to partition the parameter vector linked to the input physical field into two distinct components: the expansion coefficients representing the forward problem solution and the Gaussian latent noise. If the forward mapping is precisely estimated and the statistical independence between expansion coefficients and latent noise is well maintained, the PI-INN offers a precise and efficient generative model for Bayesian inverse problems, yielding tractable posterior density estimates. As a physics-informed deep learning model, the primary training challenge for PI-INN centers on enforcing the independence constraint, which we tackle by introducing a novel independence loss based on estimated density. We demonstrate the efficacy and precision of the proposed PI-INN through a series of numerical experiments, including inverse kinematics, 1-dimensional and 2-dimensional diffusion equations, and seismic traveltime tomography. In particular, our experimental results showcase the superior performance of the proposed independence loss in comparison to the commonly used but computationally demanding kernel-based maximum mean discrepancy loss.


Introduction
Inverse problems arise in various scientific and engineering fields, such as seismic tomography [1], material inspection [2], and medical imaging [3]. These problems aim to infer the input parameters of a system, such as physical fields, source locations, and initial and boundary conditions, from the corresponding measurement data. However, inverse problems are typically ill-posed and difficult to solve because the measurement data originate from indirect, sparse, and noisy observations. It is therefore of great practical value and theoretical significance to develop efficient computational models for accurate and stable solutions, a goal that has drawn much attention from scientists and engineers.
The classical approaches for inverse problems include regularization methods and Bayesian inference. Regularization methods aim to find point estimates by minimizing the misfit between observed data and solutions of the forward problem together with penalty functions [4,5]. Bayesian inference incorporates uncertainties in prior information and observations via Bayes' rule and constructs the posterior distribution of the parameters, and can therefore effectively quantify the uncertainty of the unknown input. However, in many applications, it is computationally expensive to sample from the intractable Bayesian posterior distributions using statistical simulation techniques, e.g., Markov chain Monte Carlo (MCMC) [6,7,8,9], Hamiltonian Monte Carlo [10,11], and sequential Monte Carlo [12,13]. To address these issues, many researchers have utilized variational inference algorithms to improve computational efficiency, including mean-field variational inference [14,15], automatic differentiation variational inference [16,17], and Stein variational gradient descent [16,18]. However, variational methods may face challenges in high-dimensional settings due to the difficulty of accurately approximating posterior distributions with a tractable surrogate distribution. Although all aforementioned methods have been shown to be relatively efficient, they still require numerous evaluations of the forward model and of complicated parametric derivatives. Consequently, deep learning-based methods offer an efficient alternative: they are capable of providing real-time inversions for certain classes of inverse problems given new measurement data of a similar type.
Deep learning methods have emerged as a promising approach for solving inverse problems, such as medical imaging [19,20] and electromagnetic inversion [21,22]. Nevertheless, these methods rely heavily on labeled data from solutions of the forward problem, rendering them inapplicable to inverse problems where such information is not available. Raissi et al. [23] proposed the physics-informed neural network (PINN), which utilizes the information of the physical equations instead of labeled data to solve partial differential equations (PDEs). A large body of related research has shown that PINNs can also efficiently recover unknown parameters [24,25,26,27,28,29,30,31,32]. In addition, deep generative models have made significant progress in the machine learning community, including generative adversarial networks (GANs) [33], variational autoencoders (VAEs) [34], and invertible neural network (INN) based models [35,36,37]. INNs have gained significant attention, particularly for their efficiency and accuracy in both sampling and density estimation of complex target distributions through a bijective transformation. This approach has shown potential in Bayesian inference [38,39,40,41,42]. However, there are still limitations in current works. For instance, the INN models of [38,39,40] require significant amounts of labeled data for pretraining, which is often unavailable in practical inverse problems. While invertible DeepONets [42] do not require labeled data, an extra variational inference procedure is necessary to approximate the true posterior distribution.
In this paper, we propose a novel approach to address Bayesian inverse problems, namely physics-informed invertible neural networks (PI-INN). The PI-INN establishes a bijective transformation between the parametric input and an output consisting of the expansion coefficients of the forward problem solution and Gaussian latent variables. Meanwhile, a neural basis net (NB-Net) is established to obtain the optimal spatial basis functions of the solution. The loss function of PI-INN is informed by the physical equation and by constraints on the distribution of the INN's output. Notably, we introduce a new independence loss term that can be naturally incorporated with physics-informed loss terms. This loss term fully utilizes the density estimated by the INN without requiring labeled data, unlike the maximum mean discrepancy (MMD) loss term in [38]. Furthermore, we theoretically prove that the PI-INN can efficiently sample from the true posterior distribution of the unknown parameters when the loss function approaches zero. The method admits the following main advantages:
• Our theoretical proof shows that PI-INN can efficiently sample from the true posterior distribution of the inverse problem when the loss function approaches zero. In applications, it provides precise estimates of posterior distributions, with accuracy comparable to time-consuming approximate Bayesian computation.
• PI-INN can provide a tractable estimation of the posterior distribution, which enables efficient sampling and high-accuracy density evaluation without requiring additional variational inference procedures [42].
• The novel independence loss term can be naturally incorporated with physics-informed loss terms based on the estimated density, and it is particularly advantageous for high-dimensional problems. In contrast, the MMD loss term used in [38,39] is only suitable for labeled data and easily suffers from the curse of dimensionality.
• PI-INN establishes a unified learning framework for Bayesian inverse problems, which combines physics-informed learning with data-driven learning.
The remainder of this paper is organized as follows. Section 2 provides the problem statement of Bayesian inverse problems. In Section 3, the framework of our physics-informed invertible neural networks (PI-INN) is introduced, and the training and inversion algorithms of PI-INN for physical systems are described in detail. In Section 4, we first present the advantage of the new loss term and then demonstrate the performance of PI-INN in solving Bayesian inverse problems through several numerical examples. Finally, some concluding remarks are given in Section 5.

Problem statement
Consider a steady physical system described by the following PDEs:

$$\mathcal{N}(u(x); \lambda(x)) = 0, \quad x \in D, \qquad \mathcal{B}(u(x)) = 0, \quad x \in \partial D,$$

where $\mathcal{N}$ denotes the general differential operator defined in the domain $D \subset \mathbb{R}^d$ ($d = 1, 2$), $\mathcal{B}$ is the boundary operator on the boundary $\partial D$, $\lambda(x)$ represents the input property field, and $u(x)$ denotes the solution of the PDEs.
The corresponding inverse problem aims to recover $\lambda \in \mathbb{R}^M$ from noisy observations $\tilde{u} \in \mathbb{R}^D$ of $u$ at specific sensor locations in the domain. The observation system is defined as follows:

$$\tilde{u} = F(\lambda) + \epsilon,$$

where $F$ is the forward operator and $\epsilon$ is the observational noise, assumed to follow a zero-mean Gaussian distribution with diagonal covariance $\sigma^2 I_D$. The relationship between the above variables and operators is illustrated in Fig. 1. For the Bayesian inference method, the target is to obtain samples from the posterior distribution $p(\lambda \mid \tilde{u})$:

$$p(\lambda \mid \tilde{u}) \propto p(\tilde{u} \mid \lambda)\, p(\lambda),$$

where $p(\lambda)$ represents the prior and $p(\tilde{u} \mid \lambda)$ is the likelihood. The main challenge associated with classical Bayesian approaches lies in the computational burden of evaluating the forward model $F$. To this end, the INN model can accurately and efficiently sample from the posterior while bypassing repeated applications of Bayes' rule, and provides rapid solutions for inversion problems within a similar problem class.
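For concreteness, under the additive Gaussian noise model above, the log-likelihood has a simple closed form; a minimal sketch is given below (function and argument names are illustrative):

```python
import numpy as np

def gaussian_log_likelihood(u_obs, forward, lam, sigma):
    """Evaluate log p(u_obs | lam) for the model u_obs = F(lam) + eps,
    eps ~ N(0, sigma^2 I_D). `forward` is a callable implementing F."""
    r = u_obs - forward(lam)  # data misfit
    D = u_obs.size
    return -0.5 * np.sum(r ** 2) / sigma ** 2 - 0.5 * D * np.log(2 * np.pi * sigma ** 2)
```

Classical samplers such as MCMC must evaluate this quantity (and hence the forward model $F$) at every step, which is precisely the cost the PI-INN construction avoids.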

Invertible neural networks
Invertible neural networks (INNs) [35,36,37], also known as flow-based models, are a particular type of neural network that builds bijective transformations between inputs and outputs. As shown in Fig. 2(a), an INN builds the bijection $g_\theta$ by stacking a series of reversible and differentiable coupling layers $g_1, \dots, g_n$, where $\Upsilon \in \mathbb{R}^F$ and $\chi \in \mathbb{R}^F$ are random variables with $\Upsilon = g_\theta(\chi)$. The probability densities can be related via the change-of-variables formula:

$$p_\Upsilon(\Upsilon) = p_\chi(\chi) \prod_{j=1}^{n} \left| \det \frac{\partial g_j(h_{j-1})}{\partial h_{j-1}} \right|^{-1}, \qquad (4)$$

where $h_j$ is the intermediate variable such that $h_j = g_j(h_{j-1})$, with $h_0 = \chi$ and $h_n = \Upsilon$. In this work, we utilize the affine coupling layer as the basic building block, which was introduced in [36,37]. Fig. 2(b) illustrates the forward and backward operations of the affine coupling layer. In the forward propagation, the input $h \in \mathbb{R}^F$ is first split into two parts, $h^{(1)} \in \mathbb{R}^f$ and $h^{(2)} \in \mathbb{R}^{F-f}$ with $f < F$. These two parts then undergo the coupling operation:

$$v^{(1)} = h^{(1)}, \qquad v^{(2)} = h^{(2)} \odot \exp(s(h^{(1)})) + t(h^{(1)}), \qquad (5)$$

where $s(\cdot)$ and $t(\cdot)$ are scale and translation functions, respectively, modeled by trainable neural networks, and $\odot$ represents element-wise multiplication. The inverse transformation is trivially given by

$$h^{(1)} = v^{(1)}, \qquad h^{(2)} = (v^{(2)} - t(v^{(1)})) \odot \exp(-s(v^{(1)})).$$
The Jacobian of the forward map is given by

$$\frac{\partial v}{\partial h} = \begin{pmatrix} I_f & 0 \\ \dfrac{\partial v^{(2)}}{\partial h^{(1)}} & \mathrm{diag}(\exp(s(h^{(1)}))) \end{pmatrix},$$

whose determinant reduces to the product of its diagonal entries. The probability density in Eq. (4) can therefore be easily calculated due to the triangular structure of the Jacobian. The Jacobian of the inverse map and $\log p_z(z)$ can be calculated in the same way.
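As an illustration, a minimal PyTorch sketch of the affine coupling layer in Eq. (5) follows; the hidden-layer sizes and module names are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Affine coupling layer: the first `dim_id` entries pass through unchanged;
    the remaining entries are scaled and shifted conditioned on them (Eq. (5))."""
    def __init__(self, dim_total, dim_id, hidden=100):
        super().__init__()
        self.f = dim_id
        def mlp(out_dim):
            return nn.Sequential(nn.Linear(dim_id, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))
        self.s = mlp(dim_total - dim_id)  # scale network s(.)
        self.t = mlp(dim_total - dim_id)  # translation network t(.)

    def forward(self, h):
        h1, h2 = h[:, :self.f], h[:, self.f:]
        s = self.s(h1)
        v2 = h2 * torch.exp(s) + self.t(h1)
        log_det = s.sum(dim=1)            # log|det J| = sum_k s_k(h1)
        return torch.cat([h1, v2], dim=1), log_det

    def inverse(self, v):
        v1, v2 = v[:, :self.f], v[:, self.f:]
        h2 = (v2 - self.t(v1)) * torch.exp(-self.s(v1))
        return torch.cat([v1, h2], dim=1)
```

Because the scale factors sit on the diagonal of the triangular Jacobian, the log-determinant is obtained for free during the forward pass.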
INNs have the capacity to learn the forward mapping of a physical system and simultaneously solve inverse problems, owing to their intrinsic invertibility. However, in practical situations, inverse problems often lack unique solutions due to the information loss in the measurements, which makes it impossible to establish a bijection between measurements and parameter inputs. Furthermore, the scarcity of labeled data makes it difficult to learn the forward mapping. To tackle these issues, we propose physics-informed invertible neural networks (PI-INN) in the following section.

Physics-informed invertible neural networks
In this subsection, the network architecture of PI-INN is first demonstrated. As depicted in Fig. 3, PI-INN consists of two subnetworks: an invertible neural network (INN) and a neural basis net (NB-Net). Similar to DeepONets [43,44], a separated representation of the solution is employed:

$$u(x; \lambda) = \sum_{i=1}^{P} c_i(\lambda)\, \varphi_i(x), \qquad (8)$$

where $c_i$ represents the expansion coefficients and $\varphi_i(x)$ are the basis functions defined on the spatial domain.
The INN is designed to learn a bijective mapping between the parametric input $\lambda$ and the unique pair $[c, z]$ of expansion coefficients and random latent variables. The random latent variable $z$ captures the information loss inherent in the output, i.e., the information not contained in $c$. Consequently, $z$ is assumed to be independent of $c$ and to follow a tractable probability distribution, which makes the mapping between $\lambda$ and $[c, z]$ unique [38]. In this work, the standard Gaussian distribution for $z$ is adopted. Furthermore, the INN architecture consists of alternating layers and affine coupling layers. In each coupling layer, one subset of the partitioned input remains unchanged while the other subset undergoes the coupling operation; the two parts of the output are then exchanged by the alternating layers, which ensures that every dimension of the input is updated [35]. A fully-connected neural network named NB-Net is employed to obtain the spatial basis functions, and the outputs of the NB-Net can be orthogonalized. It is worth highlighting that PI-INN is resolution independent, i.e., it is capable of predicting the numerical solutions at arbitrary positions of the domain $D$ and solving inverse problems with variable sensor locations. According to Eq. (8), the vector representation of the solution on arbitrary spatial coordinates $\{x^{(i)}\}_{i=1}^{M} \subset D$ is defined as

$$\mathbf{u} = \Phi\, c, \qquad [\Phi]_{ij} = \varphi_j(x^{(i)}),$$

where $\Phi \in \mathbb{R}^{M \times P}$ can be calculated by the NB-Net.
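A minimal sketch of stacking such layers with half-swaps in between is given below (reusing the AffineCoupling module sketched above; an even input dimension is assumed so that the swap is its own inverse):

```python
import torch
import torch.nn as nn

class INN(nn.Module):
    """Bijection g_theta built from affine coupling layers alternated with
    half-swaps, so that every input dimension is updated."""
    def __init__(self, dim, n_layers=8, hidden=100):
        super().__init__()
        self.dim = dim
        self.layers = nn.ModuleList(
            [AffineCoupling(dim, dim // 2, hidden) for _ in range(n_layers)])

    def swap(self, x):
        # exchange the two halves of the feature vector
        return torch.cat([x[:, self.dim // 2:], x[:, :self.dim // 2]], dim=1)

    def forward(self, x):
        log_det = x.new_zeros(x.shape[0])
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
            x = self.swap(x)
        return x, log_det

    def inverse(self, y):
        for layer in reversed(self.layers):
            y = self.swap(y)  # undo the swap (an involution for even dim)
            y = layer.inverse(y)
        return y
```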
Given noisy observations $\tilde{u}$ at sensor locations $\{\tilde{x}^{(j)}\}_{j=1}^{D}$, $c$ can be determined by solving the following regularized least-squares problem:

$$\min_{c}\ \|\tilde{\Phi}^{\top} c - \tilde{u}\|_2^2 + \rho \|c\|_2^2,$$

where $[\tilde{\Phi}]_{ij} = \varphi_i(\tilde{x}^{(j)})$ and $\rho$ is the regularization coefficient.
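For illustration, this ridge-regression step has the closed-form solution $c = (\tilde{\Phi}\tilde{\Phi}^{\top} + \rho I)^{-1}\tilde{\Phi}\tilde{u}$; a minimal sketch:

```python
import torch

def infer_coefficients(Phi_tilde, u_obs, rho=1e-4):
    """Solve min_c ||Phi_tilde^T c - u_obs||^2 + rho * ||c||^2 in closed form.
    Phi_tilde: (P, D) basis matrix with [Phi_tilde]_ij = phi_i(x_j)."""
    P = Phi_tilde.shape[0]
    A = Phi_tilde @ Phi_tilde.T + rho * torch.eye(P)  # regularized normal matrix
    b = Phi_tilde @ u_obs
    return torch.linalg.solve(A, b)
```

Next, the training and inversion processes with PI-INN are introduced in the following section.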

Training and inversion procedure of PI-INN
During the learning process of PI-INN (see the blue block in Fig. 3), the model not only needs to learn the solution of the forward problem but also needs to constrain the joint distribution $q(c, z)$. Here, $q(c, z)$ must satisfy two important assumptions [38]: $z$ must strictly follow the standard Gaussian distribution, and $c$ and $z$ must be independent of each other, i.e., $q(z \mid c) = q(z)$. This prevents the same information from being encoded in both $c$ and $z$, which would introduce additional errors in the inversion process. In [38], MMD is utilized to minimize the distance between the joint distribution of the output variables given by the INN and the product of the marginal distributions given by the data (see Appendix C for details). While MMD can effectively distinguish between two distributions in low dimensions, it becomes less effective and requires a larger number of samples as the dimensionality increases [45]. Unlike MMD, the proposed independence loss term does not require labeled data for $c$, and its efficiency does not decay as the dimensionality increases. It is defined in terms of the INN outputs $[\hat{c}^{(i)}, \hat{z}^{(i)}] = g_\theta(\lambda^{(i)})$ and fresh samples $z^{(i)}$ from the standard Gaussian distribution, where $\hat{z}^{(i)}$ represents the output of the INN for the latent variables, $p(z)$ is the probability density function of the standard Gaussian distribution, and $q(c, z)$ is the joint distribution given by the INN, which can be calculated using the change-of-variables formula. Finally, the explicit expression of the loss function is given by

$$L = \alpha L_{equ} + \beta L_{bound} + \gamma L_{ind},$$

where $L_{equ}$ represents the equation loss term, $L_{bound}$ the boundary loss term, and $L_{ind}$ the independence loss term; $\hat{u}^{(i)} = \sum_{j=1}^{P} \hat{c}_j(\lambda^{(i)}) \varphi_j$ is the predicted solution of the forward problem, and $\alpha$, $\beta$, $\gamma$ are the relative weights of the three loss terms. In the implementation, $L_{equ}$ can also be defined in the variational formulation (see Appendix D). Forward training optimizes the mapping $[\hat{c}^{(i)}, \hat{z}^{(i)}] = g_\theta(\lambda^{(i)})$ and implicitly determines its inverse $\lambda^{(i)} = g_\theta^{-1}(\hat{c}^{(i)}, \hat{z}^{(i)})$. The detailed training process is summarized in Algorithm 1.
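As a rough illustration, one training step assembles the three weighted terms as follows; the callables l_equ, l_bound, and l_ind stand in for the paper's concrete loss definitions, and all names are illustrative:

```python
import torch

def train_step(inn, nb_net, lam_batch, x_coords, P, optimizer,
               l_equ, l_bound, l_ind, alpha, beta, gamma):
    """One PI-INN optimization step (sketch)."""
    out, log_det = inn(lam_batch)        # [c_hat, z_hat] = g_theta(lambda)
    c_hat, z_hat = out[:, :P], out[:, P:]
    Phi = nb_net(x_coords)               # (M, P) matrix of basis values
    u_hat = c_hat @ Phi.T                # predicted solutions via Eq. (8)

    loss = (alpha * l_equ(u_hat, lam_batch)          # PDE residual term L_equ
            + beta * l_bound(u_hat)                  # boundary term L_bound
            + gamma * l_ind(c_hat, z_hat, log_det))  # independence term L_ind
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

The log-determinant returned by the INN is what makes the density $q(c, z)$, and hence the independence term, computable via the change-of-variables formula.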
In the inversion process, Bayesian inference is accomplished by running the INN in reverse, as shown by the green line in Fig. 3. Given the measurement $\tilde{u}$, the unknown input parameter can be repeatedly sampled by drawing the latent variable $z$ from the standard Gaussian distribution while keeping $c$ fixed. In this way, the desired posterior $p(\lambda \mid \tilde{u})$ is represented by the PI-INN, which transforms the known distribution $p(z)$ to the unknown input parameter space conditioned on different observations. We have theoretically proved that a well-trained PI-INN can efficiently sample from, and explicitly provide, the true posterior distribution without the multiple likelihood evaluations required by classical Bayesian inversion methods (see Appendix A). The detailed inversion procedure is illustrated in Algorithm 2. Although PI-INN is designed to be physics-informed, it can also be applied to labeled data; a unified framework for both data-driven learning and physics-informed learning is introduced in Appendix B.
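A minimal sketch of the inversion step in Algorithm 2: the coefficient vector $c$ inferred from the measurement is held fixed while the latent variable is resampled (dim_pad accounts for the padding variables of Remark 3.2 below):

```python
import torch

@torch.no_grad()
def sample_posterior(inn, c, n_samples, dim_z, dim_pad=0):
    """Draw samples lam ~ p(lam | u_obs) by running the trained INN in reverse
    with fixed coefficients c and fresh Gaussian latents z."""
    c_rep = c.expand(n_samples, -1)    # repeat c for every sample
    z = torch.randn(n_samples, dim_z)  # z ~ N(0, I)
    lam = inn.inverse(torch.cat([c_rep, z], dim=1))
    # discard padding variables zeta if any were appended to lambda
    return lam[:, :lam.shape[1] - dim_pad] if dim_pad else lam
```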
Remark 3.2. Note that the dimension of the input must be identical to the dimension of the output for the INN, i.e., ndim(λ) = ndim(c) + ndim(z). If the dimension of λ is smaller than the sum of ndim(c) and ndim(z), a padding variable ζ is recommended, such that the sum of ndim(λ) and ndim(ζ) equals that of ndim(c) and ndim(z). In this work, ζ ∼ N(0, I) is chosen.

Numerical experiments
In this section, several experiments are presented to confirm the effectiveness of the PI-INN model. First, the performance of our independence loss term is demonstrated and compared with the MMD loss term of [38] on an inverse kinematics problem. Then, the PI-INN model is applied to solve the inverse problems of 1-d and 2-d diffusion equations, and the inversion results are compared with MCMC. Finally, the practical applicability of the PI-INN model is demonstrated on a seismic traveltime tomography problem. The loss functions used for each experiment are defined in Appendix D, and the accuracy of the posterior samples is measured by the relative $L_2$ error:

$$\epsilon = \frac{\|\hat{\lambda} - \lambda\|_2}{\|\lambda\|_2},$$

where $\hat{\lambda}$ and $\lambda$ are the predicted and reference values, respectively.
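For reference, a one-line implementation of this error metric:

```python
import numpy as np

def relative_l2_error(lam_pred, lam_ref):
    """epsilon = ||lam_pred - lam_ref||_2 / ||lam_ref||_2."""
    return np.linalg.norm(lam_pred - lam_ref) / np.linalg.norm(lam_ref)
```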

Inverse kinematics
In this subsection, the accuracy and efficiency of our independence loss term are demonstrated on the inverse kinematics problem introduced in [38], where an articulated arm moves vertically along a rail and rotates at three joints. In Fig. 4(a), the left dotted line symbolizes the rail; the position coordinate of the arm is denoted by x_1, while the rotation angles of the three joints are represented by x_2, x_3, and x_4. The forward problem computes the coordinates of the end point y of the articulated arm from these four parameters, where l_1 = 0.5, l_2 = 0.5, and l_3 = 1.0 are the lengths of the three arm segments. The priors of the parameters are specified by normal distributions x_i ∼ N(0, σ_i), with σ_1 = 0.25 and σ_2 = σ_3 = σ_4 = 0.5 ≈ 28.65°. The inverse problem is to determine all feasible parameters that generate a given endpoint ỹ of the arm.
To demonstrate the efficiency of the independence loss term, we trained two data-driven INN models, denoted INN 1 and INN 2, to solve the problem (the NB-Net was not utilized in this example). Our independence loss and the MMD loss terms are leveraged to constrain the output distributions of INN 1 and INN 2, respectively; the detailed loss functions are provided in Appendix D.1. Here, INN 1 consists of 8 affine coupling layers, where both s(·) and t(·) contain 3 fully-connected layers with 48 neurons, and INN 2 has the same architecture as INN 1 for comparison (slightly different from the architecture in [38]). For the two INN models, the input and output are x = (x_1, x_2, x_3, x_4) and [y_1, y_2, z] with ndim(z) = 2, respectively. The models are trained with 4000 labeled data and a batch size of 64. The Adam optimizer is used with an initial learning rate of 5 × 10^{-4} for 1200 iterations, with the rate decaying by a factor of 0.8 at epochs 400, 600, and 1000. Approximate Bayesian computation (ABC) is also used to obtain samples from the true posterior for comparison with the two INN models (see Appendix D.1 for details). Fig. 4(b) presents boxplots of the relative L_2 errors for reconstructing the endpoint positions in two test cases. It can be seen that INN 1 outperforms INN 2, exhibiting a lower mean and variance of the relative errors. Moreover, the posterior samples for the two test cases are displayed in Fig. 5. We observe that the sample distribution of INN 1 is similar to that of ABC, both in terms of single-parameter marginal distributions and multi-parameter joint distributions. In contrast, the sample distribution of INN 2 differs significantly from the true distribution. It is worth noting that the effectiveness of the MMD loss is contingent on the sample size [45]; only 4000 training samples were used for INN 1 and INN 2 in this example, far fewer than in [38], which explains the lower efficacy of INN 2 compared to the model in [38]. Our independence loss term has therefore demonstrated its capability in small-data scenarios.
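For reference, a sketch of the forward kinematics map; the explicit trigonometric form below is our reconstruction of the benchmark in [38] (the extracted text omits the original equation), with x_1 the rail position and x_2, x_3, x_4 the joint angles:

```python
import numpy as np

def forward_kinematics(x, lengths=(0.5, 0.5, 1.0)):
    """Endpoint (y1, y2) of the articulated arm (reconstruction, see lead-in)."""
    l1, l2, l3 = lengths
    a1 = x[1]                       # cumulative joint angles
    a2 = x[1] + x[2]
    a3 = x[1] + x[2] + x[3]
    y1 = l1 * np.cos(a1) + l2 * np.cos(a2) + l3 * np.cos(a3)         # reach
    y2 = x[0] + l1 * np.sin(a1) + l2 * np.sin(a2) + l3 * np.sin(a3)  # height
    return np.array([y1, y2])
```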

Application to the 1-d diffusion equation with Gaussian and mixed non-Gaussian random fields
We consider the following 1-d diffusion equation:

$$-\nabla \cdot (D(x) \nabla u(x)) = f(x), \quad x \in (0, 1),$$

where $u(x)$ represents the concentration and $D(x)$ is the diffusion coefficient. In this example, we assume that $D(x; \omega) = \exp(\tilde{D}(x; \omega))$ follows the exponential of a Gaussian random field (GRF), where $\tilde{D}$ is defined in Eq. (17). The Karhunen-Loève expansion (KLE) of $\tilde{D}(x; \omega)$ is used to generate samples of $D(x; \omega)$, and the finite element method (FEM) is applied to obtain reference solutions on a uniform grid with $N_{Dof} = 31$ degrees of freedom. The observation data are obtained from the reference solutions with added noise at sparse grid points.
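A minimal sketch of drawing such samples via a truncated KLE on the FEM grid; since Eq. (17) is not reproduced here, the squared-exponential covariance kernel and its hyperparameters below are assumptions for illustration only:

```python
import numpy as np

def sample_diffusion_field(x, n_terms=5, length_scale=0.2, rng=None):
    """Sample D(x) = exp(D_tilde(x)) with D_tilde a zero-mean GRF, using the
    n_terms leading eigenpairs of the covariance matrix on the grid x."""
    rng = rng or np.random.default_rng()
    C = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * length_scale ** 2))
    vals, vecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    vals, vecs = vals[-n_terms:], vecs[:, -n_terms:]
    xi = rng.standard_normal(n_terms)    # KLE coefficients xi_i ~ N(0, 1)
    d_tilde = vecs @ (np.sqrt(np.clip(vals, 0.0, None)) * xi)
    return np.exp(d_tilde)
```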
We set $u(x; \lambda) = \sum_{i=1}^{5} c_i(\lambda) \varphi_i(x)$, where $\{\varphi_1(x), \dots, \varphi_5(x)\}$ are approximated by an NB-Net with 6 hidden layers of 64 neurons. It is assumed that $z$ follows a 5-dimensional standard Gaussian distribution. To avoid the computation of second-order derivatives, $L_{equ}$ is defined in the variational formulation. We use 4000 unlabeled data with a batch size of 64 and train the model for 1000 epochs using the Adam optimizer in PyTorch. The learning rate is initially set to $1 \times 10^{-3}$ and decays by a factor of 0.8 at epochs 600 and 900.
To test the efficiency of our model, we obtain posterior samples of D(x) using Algorithm 2 and compare the results with those obtained from MCMC, as shown in Fig. 6. It can be observed that, for both test cases, the posterior samples of PI-INN are consistent with the reference samples obtained through the MCMC method. To further demonstrate the performance of our model, we consider a mixed non-Gaussian random field as the prior of D(x), using the same hyperparameters for the network and training procedure as in the GRF prior case. The inversion results of the two test cases are displayed in Fig. 7. The posterior mean and standard deviation are in good accordance with the reference solutions from MCMC, which demonstrates the efficiency and reliability of our method.
Furthermore, the robustness of our model is validated by analyzing its performance under different combinations of measurement sensors and noise intensities. As shown in Fig. 8, the mean and standard deviation of the relative errors of the PI-INN posterior samples have the same order of magnitude as those of MCMC under the different observation conditions. This indicates that the performance of our model is consistent with the MCMC method, demonstrating the robustness and reliability of the PI-INN model.

Application to the 2-d Darcy equation
In this example, the Darcy equation in a two-dimensional domain is considered:

$$-\nabla \cdot (K(x) \nabla u(x)) = f(x), \quad x \in D,$$

where $u(x)$ represents the pressure field and $K(x)$ is the permeability field, sampled from $K(x, \omega) = \exp(\tilde{K}(x, \omega))$ with $\tilde{K}(x, \omega)$ defined by Eq. (20). The domain is divided into a 64 × 64 uniform mesh, and the reference solutions are obtained by FEM. Several evenly spaced grid points are selected as sensor positions for the inversion process.
The random parameters of the KLE are utilized as the input of the INN, and we set $u(x; \lambda) = \sum_{i=1}^{20} c_i(\lambda) \varphi_i(x)$, where the NB-Net is a fully-connected neural network with 5 hidden layers of 128 neurons and Tanh activation functions. The spatial coordinates $\{x^{(j)}\}_{j=1}^{M}$ used for training coincide with the FEM grid points. The INN consists of 8 coupling layers, where both s(·) and t(·) contain 1 hidden layer with 100 neurons and ReLU activation functions. The variational formulation of $L_{equ}$ is again adopted. 8000 unlabeled data with a batch size of 64 are utilized, and the Adam optimizer with a learning rate of $5 \times 10^{-4}$ is applied to train the model for 400 iterations. To quantify the uncertainty of the inversion results under various observation conditions, we present the mean and standard deviation of the relative $L_2$ errors of 1000 posterior samples in Fig. 11. The mean and standard deviation of the relative errors of PI-INN are consistent with those of MCMC, which demonstrates the effectiveness of our method.

Seismic traveltime tomography
Seismic traveltime tomography is a widely used technique to image subsurface properties and interior structures of the Earth [46]. In seismic imaging, it is typically formulated as an inverse problem that estimates underground velocity models from observed traveltime data [29]. In this example, PI-INN is applied to a practical seismic tomography problem based on first-arrival traveltimes.
Consider the 2-d eikonal equation:

$$|\nabla T(x)|^2 = \frac{1}{v(x)^2}, \quad x \in D, \qquad T(x_s) = 0,$$

where D = [0, 4]² represents the domain, T(x) is the traveltime from the point source $x_s$ to any position x, v(x) is the velocity model defined on D, and ∇ denotes the gradient operator. In this example, we assume that the velocity model can be parameterized in terms of the velocity v(x_0) at the origin x_0 = (0, 0) and the velocity gradients g_X and g_Y along the horizontal and vertical directions, respectively; the variation of the vertical velocity gradient is determined by the depth parameters h_i (i = 1, 2, 3, 4). Training data are generated from samples of the prior distributions of the random parameters (such as the vertical velocity gradient values and depths) in Eq. (24) and the velocity models in Eq. (22). 6 × 6 evenly spaced sensors are distributed to record the traveltime data, which is calculated using the fast sweeping method [47] on a 101 × 101 uniform mesh. The source point is fixed at (2, 0). For the network architecture, the INN comprises 8 coupling layers, where both s(·) and t(·) have 1 hidden layer with 100 neurons. The NB-Net contains 6 hidden layers with 128 neurons, and the ReLU activation function is utilized for both the INN and the NB-Net. Using the Adam optimizer with a learning rate of $5 \times 10^{-4}$, we train the model on N = 4000 unlabeled data with a batch size of 64.

Conclusions
In this paper, a novel PI-INN is proposed to efficiently solve Bayesian inverse problems. The framework is quite general and provides a unified learning approach for Bayesian inverse problems that combines physics-informed learning with data-driven learning. In contrast to traditional Bayesian inference, PI-INN provides a tractable estimation of the posterior distribution, which enables efficient sampling and high-accuracy density evaluation through the novel independence loss term incorporated with the physics-informed loss terms, and it is particularly advantageous for high-dimensional problems. Moreover, the asymptotic behavior of PI-INN is theoretically derived under the assumption that the loss function approaches zero. Numerical experiments show that the proposed PI-INN is stable and effective, and provides sufficient numerical accuracy for indirect, sparse, and noisy measurement data, which supports the theoretical results of this paper. Future work will focus on practical applications to Bayesian inverse problems in geophysics and Earth science.

Appendix D.2. 1-d diffusion equation

The equation loss term can be constructed from the variational form of the equation:

$$L_{equ} = \mathbb{E}_{K,c}\left[\left(-\nabla \cdot (K(x)\nabla u(x)) - f(x),\, h(x, c, r)\right)^2\right] = \mathbb{E}_{K,c}\left[\left(\left(K(x)\nabla u(x),\, \nabla h(x, c, r)\right) - \left(f(x),\, h(x, c, r)\right)\right)^2\right],$$

where the inner product is defined as $(f_1, f_2) := \int_D f_1(x)^{\top} f_2(x)\, dx$, and $h(x, c, r)$ is the test function

$$h(x, c, r) = \frac{1}{r^{D_x}}\, \mathbf{1}_{\{x \,:\, \|x - c\|_\infty \le r/2\}}(x),$$

where $\{x : \|x - c\|_\infty \le r/2\}$ defines a disk in $D$, $r > 0$ is a constant, $c$ is randomly chosen in $D$, and $D_x$ denotes the spatial dimension. It can be seen that $h(x, c, r)$ defines a uniform distribution on $\{x : \|x - c\|_\infty \le r/2\}$. An unbiased estimate of $L_{equ}$ [41] can be obtained by Monte Carlo sampling over $K$, $c$, and $x$ from the per-sample terms

$$L_{equ}^{(j),i} = \left[\left(K^{(j)}(x)\nabla u(x),\, \nabla h(x, c_i, r)\right) - f(x_i)\right] \cdot \left[\left(K^{(j)}(x)\nabla u(x),\, \nabla h(x, c_i, r)\right) - f(x_i)\right]. \qquad (D.7)$$

Appendix D.3. 2-d Darcy equation
An equation loss term similar to Eq. (D.3) is also used for the Darcy equation: for $D_x = 2$, let $x = (x_1, x_2)^{\top}$ and $c = (c_1, c_2)^{\top}$, so that the test function $h(x, c, r)$ is supported on the square $\{x : \|x - c\|_\infty \le r/2\}$.

Appendix D.4. Seismic traveltime tomography

For the seismic traveltime tomography example, the traveltime is written in the factored form $T(x) = T_0(x)\tau(x)$, and the equation loss term is

$$L_{equ} = \frac{1}{N}\sum_{i=1}^{N}\left\| (\hat{\tau}^{(i)})^2 |\nabla T_0|^2 + 2 T_0 \hat{\tau}^{(i)} \left(\nabla T_0 \cdot \nabla \hat{\tau}^{(i)}\right) + T_0^2 |\nabla \hat{\tau}^{(i)}|^2 - \frac{1}{(v^{(i)})^2} \right\|_2^2,$$

where $\{v^{(i)}\}_{i=1}^{N}$ is the training set and $\hat{\tau}^{(i)}$ is the PI-INN prediction of $\tau^{(i)}$. Note that $L_{bound}$ is not used for this example. For the dimensionality of the random variables, we set ndim(c) = 4 and ndim(z) = 6. Since seven-dimensional random parameters are considered for the velocity models, padding variables ζ with ndim(ζ) = 3 are added.
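A sketch of evaluating this factored-eikonal residual with automatic differentiation; `tau` must be computed from `x` (with requires_grad=True) so that autograd can supply its spatial gradient:

```python
import torch

def factored_eikonal_residual(tau, T0, grad_T0, x, v):
    """Residual of |grad(T0 * tau)|^2 - 1/v^2, expanded as in L_equ above.
    Shapes: tau, T0, v: (N,); grad_T0, x: (N, 2)."""
    grad_tau = torch.autograd.grad(tau.sum(), x, create_graph=True)[0]
    lhs = (tau ** 2 * (grad_T0 ** 2).sum(dim=1)
           + 2.0 * T0 * tau * (grad_T0 * grad_tau).sum(dim=1)
           + T0 ** 2 * (grad_tau ** 2).sum(dim=1))
    return lhs - 1.0 / v ** 2
```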

Figure 1: Graphical representation of the Bayesian inversion method.

Figure 3: Schematic of PI-INN. The INN coupled with the NB-Net can not only obtain the solutions of forward problems but also infer the unknown parameters from the observations. The blue and green solid lines represent the directions of the learning and Bayesian inversion processes, respectively. g_θ denotes the forward map, and g_θ^{-1} is the inverse map.

Algorithm 1 Training procedure for PI-INN
Input: training set {λ^{(i)}}_{i=1}^{N}; number of epochs E_train; learning rate η; batch size N_batch; loss function L; weights of the loss terms α, β, γ; coordinate set {x^{(j)}}_{j=1}^{M}; number of terms P in Eq. (8).
Substitute {x^{(j)}}_{j=1}^{M} into the NB-Net and assemble the matrix Φ.
for epoch = 1 : E_train do
    Sample minibatches from the training data and the latent space, and evaluate [ĉ^{(i)}, ẑ^{(i)}] = g_θ(λ^{(i)}), where g_θ is the forward map of the invertible neural network.

Figure 4: Inverse kinematics: (a) articulated arm model; (b) boxplots of the relative L_2 errors of reconstructing the endpoint positions by the three methods; the left and right figures correspond to Test 1 and Test 2, respectively.

Figure 5: Inverse kinematics: sample distributions of two test cases for ABC and the two INN models. In each figure, the diagonal displays the histograms of the four parameters, while the lower-left and upper-right triangular areas show the pairwise scatter diagrams and density estimates of the same four parameters, respectively.

Figure 6: Two different test cases for the 1-d diffusion equation under the GRF prior: comparison of inversion results for PI-INN and MCMC, where the blue line and the green line represent the reference solution and the posterior mean, respectively. The blue shaded area is the interval of the posterior mean ± 3 standard deviations, with 16 evenly spaced sensors and 1% zero-mean Gaussian noise.

Figure 7: Two different test cases for the 1-d diffusion equation under the mixed non-Gaussian prior: comparison of inversion results for PI-INN and MCMC, where the blue line and the green line represent the reference solution and the posterior mean, respectively. The blue shaded area is the interval of the posterior mean ± 3 standard deviations, with 16 evenly spaced sensors and 1% zero-mean Gaussian noise.

Figure 8: Relative L_2 errors of 1000 posterior samples under varying noise intensity and number of sensors for two test cases of the 1-d diffusion equation: bars represent the averages and error bars show the standard deviations. The subfigures in the first and second rows correspond to the GRF and mixed non-Gaussian random field prior cases, respectively. 16 sensors are equally spaced in [0, 1] for the first column, and measurements contain 1% Gaussian noise for the second column.

Figure 9: Comparison of the reference solutions u and the solutions predicted by the PI-INN model for the 2-d Darcy equation.

Figure 10: Comparison of the reference permeability field K with the posterior means of PI-INN and MCMC for the 2-d Darcy equation. 1% noise and 22 × 22 sensors are used in this example.

Figure 11: Relative L_2 errors of 1000 posterior samples under varying noise intensity and number of sensors for the 2-d Darcy equation: bars represent the averages and error bars show the standard deviations. 22 × 22 sensors and 0.5% noise are used in the two subfigures, respectively.

Figure 12: Comparison of the reference traveltimes and those predicted by PI-INN for two test cases of seismic traveltime tomography. The triangle marks in the first column represent the sensor locations for tomography.

Figure 13: Marginal posterior distributions obtained using PI-INN (two left subfigures) and MCMC (two right subfigures) for the test cases. 6 × 6 sensors and 10% noise are used in this example. Black and red lines show the reference solution and the posterior mean velocity, respectively. At the right side of each panel, the marginal posterior distributions are plotted for four positions, whose coordinates are marked in the upper right corner.

Figure 14: Bivariate marginal distributions of the velocities at the locations (2.0, 2.52) and (3.0, 3.52), obtained using PI-INN and MCMC, respectively, for the two velocity profiles in Fig. 13.
