Augmented Physics-Informed Neural Networks (APINNs): A gating network-based soft domain decomposition methodology

In this paper, we propose the augmented physics-informed neural network (APINN), which adopts soft and trainable domain decomposition and flexible parameter sharing to further improve both the extended PINN (XPINN) and the vanilla PINN methods. In particular, a trainable gate network is employed to mimic the hard decomposition of XPINN, and it can be flexibly fine-tuned to discover a potentially better partition. APINN outputs a weighted average of several sub-nets, with the weights given by the gate network. APINN does not require complex interface conditions, and its sub-nets can take advantage of all training samples rather than just the training data in their own subdomains. Lastly, each sub-net shares part of its parameters with the others to capture the similar components of the decomposed functions. Furthermore, following the PINN generalization theory in Hu et al. [2021], we show that APINN can improve generalization through proper gate-network initialization and general domain and function decomposition. Extensive experiments on different types of PDEs demonstrate how APINN improves the PINN and XPINN methods. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. We also show cases where XPINN is already better than PINN, so APINN can still slightly improve XPINN. Furthermore, we visualize the optimized gating networks and their optimization trajectories, and connect them with the resulting performance, which helps discover a possibly optimal decomposition. Interestingly, when initialized with different decompositions, the performance of the corresponding APINNs can differ drastically. This, in turn, shows the potential of designing an optimal domain decomposition for the differential equation problem under consideration.


Introduction
Deep learning has become popular in scientific computing and is widely adopted in solving forward and inverse problems involving partial differential equations (PDEs). The physics-informed neural network (PINN) Raissi et al. [2019] is one of the seminal works in utilizing deep neural networks to approximate PDE solutions by optimizing them to satisfy the data and the physical laws governed by the PDE. The extended PINN (XPINN) Jagtap and Karniadakis [2020] is a follow-up work of PINN, which first proposed space-time domain decomposition: the domain is partitioned into several subdomains, several sub-nets are employed to approximate the solution on their respective subdomains, and the solution continuity between them is enforced via interface losses. The XPINN output is then the ensemble of all sub-nets. The theoretical question of when XPINNs improve generalization over PINNs is of great interest. The recent work of Hu et al. [2021] analyzes the trade-off in XPINN generalization between the simplicity of the decomposed target function in each subdomain and the overfitting effect due to less available training data in each subdomain, which counterbalance each other to determine whether XPINN improves generalization over PINN. However, sometimes the negative overfitting effect incurred by the reduced training data in each subdomain dominates the positive effect of simpler partitioned target functions. Furthermore, XPINNs may also suffer from relatively large errors at the interfaces between subdomains, which degrades their overall performance.
In this paper, we propose the Augmented PINN (APINN), which employs a gate network for soft domain partitioning that mimics the hard XPINN decomposition and can be fine-tuned toward a better decomposition. The gate network removes the need for interface losses and weight-averages several sub-nets to form the output of APINN, where each sub-net is able to utilize all training samples in the domain in order to prevent overfitting. Moreover, APINN adopts an efficient partial parameter sharing scheme for the sub-nets, to capture the similar components of the decomposed functions. To further understand the benefits of APINN, we follow the theory in Hu et al. [2021] to analyze the generalization bound of APINN, compared to those of PINN and XPINN, which justifies our intuitive understanding of the advantages of APINN. Concretely, generalization bounds for APINNs with trainable or fixed gate networks are derived, which show the advantages of soft and trainable domain and function decomposition in APINN. We also perform extensive experiments on several PDEs that validate the effectiveness of APINN. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. Notation: for matrix norms, we denote the spectral norm by $\|\cdot\|_2$ and the $l_{p,q}$ norm by $\|W\|_{p,q} = (\sum_j (\sum_k |W_{j,k}|^p)^{q/p})^{1/q}$. In the following, we introduce the formulations of PINN Raissi et al. [2019] and XPINN Jagtap and Karniadakis [2020].

PINN and XPINN
The PINN is motivated by optimizing neural networks to satisfy the data and the physical laws governed by a PDE in order to approximate its solution. Given a set of n_b boundary training points {x_{b,i}}_{i=1}^{n_b} ⊂ ∂Ω and n_r residual training points {x_{r,i}}_{i=1}^{n_r} ⊂ Ω, the ground truth PDE solution u* : Ω → R is approximated by the PINN model u_θ by minimizing a training loss containing a boundary loss and a residual loss:

$$\mathcal{R}_S(\theta) = \frac{1}{n_b}\sum_{i=1}^{n_b}\left|u_\theta(x_{b,i}) - u^*(x_{b,i})\right|^2 + \frac{1}{n_r}\sum_{i=1}^{n_r}\left|\mathcal{L}u_\theta(x_{r,i}) - f(x_{r,i})\right|^2,$$

where $\mathcal{L}$ is the differential operator and $f$ the right-hand side of the PDE. PINN learns the boundary conditions through the first term, while learning the physical laws described by the PDE through the second term.
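The two-term loss above can be sketched numerically. The following is a hedged illustration, not the paper's implementation: the surrogate `u` stands in for the network u_θ, the operator is taken to be L u = u'' (a 1D Poisson problem), and central finite differences replace automatic differentiation; all names are illustrative.

```python
import numpy as np

def pinn_loss(u, x_b, u_b, x_r, f, eps=1e-4):
    """Two-term PINN loss: boundary data misfit + PDE residual misfit.

    u   : callable surrogate for the solution (stands in for u_theta)
    x_b : boundary training points, u_b their target values
    x_r : residual (collocation) points, f the PDE right-hand side
    The operator here is L u = u'' (1D Poisson), approximated by
    central finite differences in place of automatic differentiation.
    """
    boundary = np.mean((u(x_b) - u_b) ** 2)
    u_xx = (u(x_r + eps) - 2.0 * u(x_r) + u(x_r - eps)) / eps ** 2
    residual = np.mean((u_xx - f(x_r)) ** 2)
    return boundary + residual

# Sanity check with the exact solution of u'' = -pi^2 sin(pi x), u(+-1) = 0:
u_exact = lambda x: np.sin(np.pi * x)
f = lambda x: -np.pi ** 2 * np.sin(np.pi * x)
x_b = np.array([-1.0, 1.0]); u_b = np.array([0.0, 0.0])
x_r = np.linspace(-0.9, 0.9, 50)
loss = pinn_loss(u_exact, x_b, u_b, x_r, f)
```

For the exact solution the loss is near zero (up to finite-difference truncation error), while a wrong candidate such as the zero function incurs a large residual term.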
The XPINN extends PINN by decomposing the domain Ω into several subdomains on which several sub-PINNs are employed. The continuity between sub-PINNs is maintained via interface loss functions, and the output of XPINN is the ensemble of all sub-PINNs, where each makes predictions on its corresponding subdomain. Concretely, the domain Ω is decomposed into N_D subdomains as Ω = ∪_{i=1}^{N_D} Ω_i. The loss of XPINN contains the sum of the PINN losses of the sub-PINNs, including boundary and residual losses, plus interface losses using points on the interfaces ∂Ω_i ∩ ∂Ω_j of different subdomains, where i, j ∈ {1, 2, ..., N_D} such that ∂Ω_i ∩ ∂Ω_j ≠ ∅, to maintain the continuity between the sub-PINNs i and j. Specifically, the XPINN loss for the i-th sub-PINN is

$$\mathcal{R}_i(\theta_i) = \mathcal{R}_S^i(\theta_i) + \lambda_I \sum_{j:\ \partial\Omega_i \cap \partial\Omega_j \neq \emptyset} \mathcal{R}_I(\theta_i, \theta_j),$$

where λ_I is the weight controlling the strength of the interface loss, θ_i is the set of parameters for subdomain i, and each R_S^i(θ_i) is the PINN loss for subdomain i containing boundary and residual losses, i.e.,

$$\mathcal{R}_S^i(\theta_i) = \frac{1}{n_{b,i}}\sum_{j=1}^{n_{b,i}}\left|u_{\theta_i}(x^i_{b,j}) - u^*(x^i_{b,j})\right|^2 + \frac{1}{n_{r,i}}\sum_{j=1}^{n_{r,i}}\left|\mathcal{L}u_{\theta_i}(x^i_{r,j}) - f(x^i_{r,j})\right|^2,$$

where n_{b,i} and n_{r,i} are the numbers of boundary and residual points in subdomain i, respectively, and x^i_{b,j} and x^i_{r,j} are the j-th boundary and residual training points in subdomain i, respectively. Furthermore, R_I(θ_i, θ_j) is the interface loss between the i-th and j-th subdomains based on the interface training points:

$$\mathcal{R}_I(\theta_i, \theta_j) = \frac{1}{n_I^{ij}}\sum_{k=1}^{n_I^{ij}}\left(\left|u_{\theta_i}(x^{ij}_{I,k}) - \frac{u_{\theta_i}(x^{ij}_{I,k}) + u_{\theta_j}(x^{ij}_{I,k})}{2}\right|^2 + \left|\mathcal{L}u_{\theta_i}(x^{ij}_{I,k}) - \mathcal{L}u_{\theta_j}(x^{ij}_{I,k})\right|^2\right),$$

where n_I^{ij} is the number of interface points between the i-th and j-th subdomains, and x^{ij}_{I,k} is the k-th interface point between them. The first term is the average-solution continuity between the i-th and j-th sub-nets, while the second term is the residual continuity condition on the interface given by the i-th and j-th sub-nets. We will refer to the XPINN model introduced above as XPINNv1, since it is exactly the model proposed in the original work of Jagtap and Karniadakis [2020].
In practice, XPINNv1 may exhibit relatively large errors near the interface, i.e., the interface losses in XPINNv1 cannot necessarily maintain the continuity between different sub-PINNs. This is because the residual continuity condition is difficult to enforce accurately for PDEs involving higher-order derivatives. Therefore, De Ryck et al. [2022] introduces an additional term enforcing the continuity of first-order derivatives between different sub-PINNs to resolve the issue:

$$\frac{1}{n_I^{ij}}\sum_{k=1}^{n_I^{ij}}\sum_{\alpha=1}^{d}\left|\partial_{x_\alpha} u_{\theta_i}(x^{ij}_{I,k}) - \partial_{x_\alpha} u_{\theta_j}(x^{ij}_{I,k})\right|^2,$$

where d is the problem dimension, i.e., x ∈ R^d. With this additional term on first-order derivatives, we name the corresponding XPINN model XPINNv2.
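The interface terms can be sketched in one dimension. This is a hedged sketch under illustrative assumptions: scalar sub-net surrogates, an interface at x = 0, and central finite differences for the first-order derivative continuity term that XPINNv2 adds.

```python
import numpy as np

def interface_loss_v2(u_i, u_j, x_iface, eps=1e-5):
    """Interface terms between sub-nets u_i and u_j (1D sketch).

    First term: average-solution continuity, penalizing the gap between
    each sub-net and the average of the two at the interface points.
    Second term: first-order derivative continuity (the XPINNv2 addition),
    with d/dx approximated by central differences.
    """
    avg = 0.5 * (u_i(x_iface) + u_j(x_iface))
    continuity = np.mean((u_i(x_iface) - avg) ** 2 + (u_j(x_iface) - avg) ** 2)
    du_i = (u_i(x_iface + eps) - u_i(x_iface - eps)) / (2.0 * eps)
    du_j = (u_j(x_iface + eps) - u_j(x_iface - eps)) / (2.0 * eps)
    deriv = np.mean((du_i - du_j) ** 2)
    return continuity + deriv

x_iface = np.zeros(10)          # interface x = 0, sampled 10 times (1D toy)
same = lambda x: np.sin(x)      # two sub-nets agreeing across the interface
kinked = lambda x: np.abs(x)    # continuous value but derivative jump at x = 0
```

Two identical sub-nets incur zero loss, while a pair whose values agree but whose derivatives jump at the interface is penalized only through the first-derivative term, which is exactly the failure mode XPINNv2 targets.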

Parameterization of Augmented PINN
In this section, we introduce the model parameterization of APINN, which is shown graphically in Figure 1. We consider a shared network h : R^d → R^H (blue), where d is the input dimension and H is the hidden dimension; m sub-nets (E_i)_{i=1}^m (red), where each E_i : R^H → R; and a gating network G : R^d → Δ_m (green), where Δ_m is the m-dimensional simplex, for weight-averaging the outputs of the m sub-nets. The output of our augmented PINN (APINN) u_θ parameterized by θ is

$$u_\theta(x) = \sum_{i=1}^{m} (G(x))_i \, E_i(h(x)),$$

where (G(x))_i is the i-th entry of G(x), and θ is the collection of all parameters in h, G, and the E_i. Both h and the E_i are trainable in our APINN, while G can be either trainable or fixed. If G is trainable, we name the model APINN; otherwise, we call it APINN-F. The APINN is a universal approximator; the detailed proof is as follows.
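A forward pass of this parameterization can be sketched as follows. This is a minimal illustration with assumed shapes, a single linear head per sub-net as E_i, and a softmax gate; none of these architectural choices are claimed to be the paper's exact ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d, H, m = 2, 16, 2  # input dim, shared hidden dim, number of sub-nets

# Shared body h: R^d -> R^H, sub-net heads E_i: R^H -> R, gate G: R^d -> simplex
W_h = rng.normal(size=(H, d)); b_h = np.zeros(H)
W_E = rng.normal(size=(m, H))   # one linear head per sub-net (a minimal E_i)
W_G = rng.normal(size=(m, d))

def apinn(x):
    """u_theta(x) = sum_i (G(x))_i * E_i(h(x)) with a softmax gate."""
    h = np.tanh(W_h @ x + b_h)           # shared representation h(x)
    subs = W_E @ h                       # m sub-net outputs E_i(h(x))
    logits = W_G @ x
    gate = np.exp(logits - logits.max())
    gate = gate / gate.sum()             # softmax: lands on the simplex
    return gate @ subs, gate

out, gate = apinn(np.array([0.3, -0.7]))
```

The softmax guarantees the partition-of-unity property, so the output is always a convex combination of the sub-net outputs.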
Proof. (The APINN is a universal approximator.) Denote the function class of all neural networks by NN; it is universal, i.e., for every continuous function f ∈ C(Ω) and every ε > 0, there exists a neural network g ∈ NN such that ‖f − g‖_∞ < ε. In addition, denote the function class of gating networks by G, which collects all vector-valued neural networks mapping R^d to Δ_m. Returning to the APINN model, denote its function class by APINN, which consists of all functions of the form $\sum_{i=1}^{m} (G(x))_i E_i(h(x))$. If we choose h to be the identity mapping and all sub-nets to equal the same network, E_i = g, then the APINN output is $\sum_{i=1}^{m} (G(x))_i \, g(x) = g(x)$ since $\sum_{i=1}^{m} (G(x))_i = 1$; hence NN ⊂ APINN. Therefore, since multilayer neural networks are already universal approximators and form a subset of the APINN model class, APINN is a universal approximator.
In APINN, G is pre-trained to mimic the hard and discrete decomposition of XPINN, which will be discussed in the next subsection. If G is trainable, our model can fine-tune the pre-trained domain decomposition to further discover a better decomposition through optimization. If not, APINN is exactly the soft version of XPINN with the corresponding hard decomposition. APINN improves on PINN thanks to its adaptive domain decomposition and parameter efficiency.

Explanation of the Gating Network
In this section, we show how the gating network G can be trained to mimic XPINNs for soft domain decomposition. Specifically, in Figure 2 (left), XPINN decomposes the entire (x, t) domain into an upper subdomain (t > 0) and a lower one (t ≤ 0), based on the interface t = 0. The soft domain decomposition in APINN is shown in Figure 2 (middle and right), which depict the pretrained gating networks for the two sub-nets corresponding to the upper and lower subdomains. Here, (G(x, t))_1 is pretrained on exp(t − 1) and (G(x, t))_2 on 1 − exp(t − 1). Intuitively, the first sub-PINN focuses on where t is larger, corresponding to the upper part, while the second sub-PINN focuses on where t is smaller, corresponding to the lower part.
The gating network can also be adapted for complex domains like the L-shape domain or even high-dimensional domains by properly choosing the corresponding gating function.
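The pretraining step can be sketched with a toy gate. This is a hedged example, not the paper's procedure: the gate here is a single logistic unit G_1(t) = sigmoid(a·t + b) fitted by plain gradient descent to the target exp(t − 1) on t ∈ [0, 1], with (G)_2 = 1 − (G)_1 by partition of unity; the learning rate and iteration count are arbitrary.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
target = np.exp(t - 1.0)        # pretraining target for (G)_1; (G)_2 = 1 - (G)_1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
a, b = 0.0, 0.0                 # logistic gate (G)_1(t) = sigmoid(a*t + b)
lr = 0.5
for _ in range(5000):
    g = sigmoid(a * t + b)
    err = g - target
    grad_z = 2.0 * err * g * (1.0 - g) / t.size  # d(MSE)/d(logit) per point
    a -= lr * np.sum(grad_z * t)                 # chain rule: dz/da = t
    b -= lr * np.sum(grad_z)                     # chain rule: dz/db = 1

g = sigmoid(a * t + b)
mse = np.mean((g - target) ** 2)
```

After pretraining, the fitted gate is increasing in t, so the first sub-net is weighted more heavily where t is large, which is the soft analogue of assigning it the upper subdomain.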

Difference in the position of h
We have three options for building the APINN model. First, the simplest option omits the parameter sharing, in which case the model becomes

$$u_\theta(x) = \sum_{i=1}^{m} (G(x))_i \, E_i(x), \qquad (9)$$

where each E_i : R^d → R is a full sub-net. The model proposed in this paper is

$$u_\theta(x) = \sum_{i=1}^{m} (G(x))_i \, E_i(h(x)). \qquad (10)$$

Another method of parameter sharing is to place h outside the weighted average of the sub-nets:

$$u_\theta(x) = h\left(\sum_{i=1}^{m} (G(x))_i \, E_i(x)\right), \qquad (11)$$

where each E_i : R^d → R^H and h : R^H → R.
Compared to the first model, our model given in equation (10) adopts parameter sharing across the sub-PINNs to improve parameter efficiency. Equation (10) generalizes equation (9), which is recovered by using the same E_i networks and selecting the shared network as the identity mapping. Intuitively, the functions learned by the sub-PINNs should exhibit some similarity, since they are parts of the same target function. The prior of network sharing in our model explicitly utilizes this intuition and is therefore more parameter efficient.
Compared to the model given in equation (11), our model is more interpretable. In particular, our model in equation (10) is a weighted average of m sub-PINNs E_i ∘ h, so we can visualize each E_i ∘ h to observe what function it is learning. For equation (11), by contrast, there is no clear function decomposition because h sits outside the weighted average, so visualizing the individual function components is not possible.
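The parameter-efficiency comparison among the three placements of h can be made concrete by counting parameters. The layer configurations below are illustrative assumptions rather than the paper's exact architectures; as a point of reference, two unshared 6-layer width-20 sub-nets give 3522 parameters, the same total quoted for the XPINN sub-nets in the experiments.

```python
def mlp_params(widths):
    """Parameter count of a dense net: weights + biases per layer."""
    return sum(widths[i] * widths[i + 1] + widths[i + 1]
               for i in range(len(widths) - 1))

d, H, w, m = 2, 20, 20, 2  # input dim, shared feature dim, width, sub-net count

# Eq. (9): no sharing -- m independent 6-layer sub-nets E_i : R^d -> R
no_sharing = m * mlp_params([d] + [w] * 5 + [1])
# Eq. (10): proposed -- one shared body h : R^d -> R^H plus m small heads E_i
proposed = mlp_params([d, w, w, H]) + m * mlp_params([H, w, 1])
# Eq. (11): h outside -- m encoders E_i : R^d -> R^H plus one shared head h
h_outside = m * mlp_params([d, w, w, H]) + mlp_params([H, w, 1])
```

Under these assumed depths, the proposed placement (10) is the cheapest because the deep body is paid for once while only the small heads are duplicated, whereas (11) duplicates the encoders and (9) duplicates everything.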

Preliminaries
To facilitate the statement of our main generalization bounds, we first define several quantities related to the network parameters. For a network u_θ with fixed reference matrices A_l (which can vary for different networks), we denote its complexity quantities by M(l), N(l), and R_i, where i signifies the order of derivative, i.e., R_i denotes the complexity of the i-th derivative of the network. We further denote the corresponding M(l), N(l), and R_i quantities of the sub-PINN E_j ∘ h by M_j(l), N_j(l), and R_i(E_j ∘ h).
We also denote those of the gate network G as M G (l), N G (l) and R i (G).
The training loss and test loss of a model u_θ(x) are defined in the same way as those of PINN. Since the following assumption holds for a vast variety of PDEs, we can bound the test L2 error by the test boundary and residual losses.

Assumption 5.1. Assume that the PDE satisfies the following norm constraint:

$$\|u\|_{L^2(\Omega)} \leq C_1\left(\|u\|_{L^2(\partial\Omega)} + \|\mathcal{L}u\|_{L^2(\Omega)}\right) \quad \text{for all } u \in \mathcal{NN}_L,$$

where the positive constant C_1 does not depend on u but on the domain and the coefficients of the operator L, and the function class NN_L contains all L-layer neural networks.
The following assumption is widely adopted in related works Luo and Yang [2020], Hu et al. [2021].
Assumption 5.2. (Symmetry and boundedness of L.) Throughout the analysis in this paper, we assume the differential operator L in the PDE satisfies the following conditions. The operator L is a linear second-order differential operator in non-divergence form, i.e.,

$$(\mathcal{L}u)(x) = \sum_{\alpha,\beta=1}^{d} A_{\alpha\beta}(x)\, u_{x_\alpha x_\beta}(x) + \sum_{\alpha=1}^{d} b_\alpha(x)\, u_{x_\alpha}(x) + c(x)\, u(x),$$

where all A_{αβ}, b_α, c : Ω → R are given coefficient functions, u_{x_α} denotes the first-order partial derivative of u with respect to its α-th argument (the variable x_α), and u_{x_α x_β} the second-order partial derivative with respect to its α-th and β-th arguments (the variables x_α and x_β). Furthermore, there exists a constant K > 0 such that for all x ∈ Ω = [−1, 1]^d and α, β ∈ [d], we have A_{αβ} = A_{βα}, and A_{αβ}(x), b_α(x), c(x) are all K-Lipschitz with absolute values not larger than K.

A Tradeoff in XPINN Generalization
In this subsection, we review the tradeoff in XPINN generalization introduced in Hu et al. [2021]. Two factors counterbalance each other to affect XPINN generalization: the simplicity of the decomposed target function within each subdomain, thanks to the domain decomposition, and the complexity and negative overfitting effect due to the smaller amount of available training data. When the former effect dominates, XPINN outperforms PINN; otherwise, PINN outperforms XPINN. When the two factors strike a balance, XPINN and PINN perform similarly.

APINN with Non-Trainable Gate Network
In this section, we state the generalization bound for APINN with a non-trainable gate network. Since the gate network is fixed, the only complexity comes from the sub-PINNs. The following theorem holds for any gate function G.
Theorem 5.1. Assume that Assumption 5.2 holds. For any δ ∈ (0, 1), with probability at least 1 − δ over the choice of random samples S = {x_i}_{i=1}^{n_b+n_r} ⊂ Ω with n_b boundary points and n_r residual points, we have a generalization bound for an APINN model consisting of three terms. Intuition: The first term is the training loss, and the third is the probability term, in which we divide the probability δ into δ(E) for a union bound over all parameters in the E_j ∘ h. The second term is the Rademacher complexity of the model. For the boundary loss, the network output is the gate-weighted sum of the sub-nets, so the complexity term is the correspondingly weighted sum of the sub-net complexities. For the residual loss, the treatment of the second term is similar: the second-order derivative of APINN involves derivatives of both the gate G and the sub-nets E_j ∘ h, and each factor involving G contributes only a constant to the bound since G is fixed.

Explain the Effectiveness of APINN via Theorem 5.1
In this section, we explain the effectiveness of APINN using Theorem 5.1, which shows that the benefit of APINN comes from (1) soft domain decomposition, (2) getting rid of interface losses, (3) general target function decomposition, and (4) the fact that each sub-PINN of APINN is provided with all the training data, which prevents overfitting.
For the boundary loss of APINN, we can apply Theorem 5.1 to each of APINN's soft subdomains. Specifically, for the k-th sub-net in the k-th soft subdomain Ω_k, k ∈ {1, 2, ..., m}, the bound involves n_{b,k}, the number of training boundary points in the k-th subdomain.
If the gate net mimics the hard decomposition of XPINN, then we may assume that the k-th sub-PINN E_k focuses on Ω_k; in particular, ‖(G(x))_j‖_∞ ≤ c for j ≠ k, where c approaches zero. Note that Theorem 5.1 does not impose any requirement on the quantity c; we make this assumption only for illustration. Then the bound reduces to that of XPINN under the corresponding hard domain decomposition. Therefore, APINN enjoys the benefit of XPINN, i.e., it can decompose the target function into several simpler parts on the sub-domains. Furthermore, since APINN does not require the complex interface losses, its training loss R_S(θ_S) is usually smaller than that of XPINN, and it is free from errors near the interface.
In addition to soft domain decomposition, even if the output of G does not concentrate on certain sub-domains, i.e., does not mimic XPINN, APINN still enjoys the benefit of general function decomposition, and each sub-PINN of APINN is provided with all the training data, which prevents overfitting. Concretely, for the boundary loss of APINN, the complexity term of the model is a weighted average of the complexities of all sub-PINNs. Note that, similar to PINN, if we view APINN on the entire domain, all sub-PINNs are able to take advantage of all training samples, thus preventing overfitting. Ideally, the weighted sum of the complexities of the parts is smaller than the complexity of the whole. To be more specific, if we train a PINN u_θ, the complexity term is R_0(u_θ). If APINN is able to decompose the target function into several simpler parts such that the weighted sum of their complexities is smaller than the complexity of the PINN, then APINN can outperform PINN.

APINN with Trainable Gate Network
In this section, we state the generalization bound for APINN with a trainable gate network. In this case, both the gate network and the m sub-PINNs contribute to the complexity of the APINN model and influence generalization simultaneously.
Theorem 5.2. Let Assumption 5.2 hold. For any δ ∈ (0, 1), with probability at least 1 − δ over the choice of random samples S = {x_i}_{i=1}^{n_b+n_r} ⊂ Ω with n_b boundary points and n_r residual points, we have a generalization bound for an APINN model, where the probability δ is divided into terms δ(G, E) for a union bound over the parameters of both the gate network and the sub-nets. Intuition: The argument is similar to that of Theorem 5.1, but here we treat the APINN model as a whole. Now G(x) contributes its complexity R_i(G), rather than its infinity norm, since it is trainable rather than fixed.

Computational Experiments

The Burgers Equation
The one-dimensional viscous Burgers equation is given by $u_t + u u_x = \nu u_{xx}$, where ν is the viscosity coefficient. The difficulty of the Burgers equation lies in the steep region near x = 0, where the solution changes rapidly and is hard for PINNs to capture. The ground truth solution is visualized in Figure 4 (left). In this case, XPINN performs badly near the interface. Thus, APINN improves on XPINN, especially in the accuracy near the interface, both by removing the interface losses and by improving parameter efficiency.

PINN and Hard XPINN
For the PINN, we use a 10-layer tanh network of width 20 with 3441 parameters, and provide 300 boundary points and 20000 residual points. We use a weight of 20 on the boundary loss and 1 on the residual loss. We train the PINN with the Adam optimizer at an 8e-4 learning rate for 100k epochs. XPINNv1 decomposes the domain based on whether x > 0. Its weights for the boundary loss, residual loss, interface boundary loss, and interface residual loss are 20, 1, 20, and 1, respectively. XPINNv2 shares the same decomposition as XPINNv1, but its weights for the boundary loss, residual loss, interface boundary loss, and interface first-order derivative continuity loss are 20, 1, 20, and 1, respectively. The sub-nets are 6-layer tanh networks of width 20 with 3522 parameters in total, and we provide 150 boundary points and 10000 residual points for each sub-net in XPINN. The number of interface points is 1000. The training points of the XPINNs are visualized in Figure 4 (right). We train the XPINNs with the Adam optimizer at an 8e-4 learning rate for 100k epochs. All models are finetuned by the L-BFGS optimizer until convergence after Adam optimization.

APINN
To mimic the hard decomposition based on whether x > 0, we pretrain the gate net G on the function (G(x, t))_1 = 1 − (G(x, t))_2 = exp(x − 1), so that the first sub-PINN focuses on where x is larger and the second on where x is smaller. The corresponding model is named APINN-X. In addition, we pretrain the gate net G on (G(x, t))_1 = 1 − (G(x, t))_2 = 0.8 to mimic the multi-level PINN (MPINN) Anonymous [2022], where the first sub-net is responsible for the majority part of the solution and the second for the minority part. The corresponding model is named APINN-M. All networks have a width of 20. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, with 3462/3543 parameters depending on whether the gate network is trainable. All models are finetuned by the L-BFGS optimizer until convergence after Adam optimization.

Results
The results for the Burgers equation are shown in Table 1. The reported relative L2 errors are averaged over 10 independent runs and are the best L2 errors over the whole optimization process. The error plots of XPINNv1 and APINN-X are visualized in Figure 6 (left and right, respectively).
• XPINN performs much worse than PINN, due to the large error near the interface, where the steep region is located.
• APINN-X performs the best because its parameters are more flexible than those of PINN, and it does not require interface conditions like in XPINN, so it can model the steep region well.
• APINN-M performs worse than APINN-X, which means that MPINN initialization is worse than the XPINN one in this Burgers problem.
• APINN-X-F with a fixed gate function performs slightly worse than PINN and APINN, which demonstrates the benefit of a trainable domain decomposition. However, even without fine-tuning the domain decomposition, APINN-X-F still outperforms XPINN significantly, which shows the effectiveness of soft domain partitioning.

Visualization of Gating Networks
Some representative optimized gating networks after convergence are visualized in Figure 7. In the first row, we visualize two gate nets of APINN-X. Although their optimized gates differ, they retain the original left-and-right decomposition with a change in the interface position; thus, their L2 errors are similar. In the second row, we show two gate nets of APINN-M. Their performances differ a lot, and they weight the two subnets differently: the third figure uses a weight of ≈ 0.9 for subnet-1 and ≈ 0.1 for subnet-2, while the fourth figure uses ≈ 0.6 for subnet-1 and ≈ 0.4 for subnet-2. This means that the training of the MPINN-type decomposition is unstable, that APINN-M is worse than its XPINN-initialized counterpart on the Burgers problem, and that the weight in the MPINN-type decomposition is crucial to the final performance. From these examples, we see that initialization is crucial for APINN's success: despite the optimization, the trained gate remains similar to its initialization. Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Burgers equation in Figure 8, where the snapshots show the gating net at epochs 0, 1E4, 2E4, and 3E4. That of the second subnet, G_2, can easily be computed using the partition-of-unity property G_1 + G_2 = 1. The trajectory is smooth, and the gating net gradually converges by shifting the interface from left to right.

Helmholtz Equation
Problems in physics including seismology, electromagnetic radiation, and acoustics are solved using the Helmholtz equation, which is given by $\Delta u + k^2 u = q(x, y)$. The analytic solution is u(x, y) = sin(a_1 πx) sin(a_2 πy), shown in Figure 9 (left). In this case, XPINNv1 performs worse than PINN due to large errors near the interface. With additional regularization, XPINNv2 reduces the relative L2 error by 47% compared to PINN, but it still performs worse than our APINN due to the overfitting caused by the smaller amount of training data available in each sub-domain.

PINN and Hard XPINN
For PINN, we provide 400 boundary points and 10000 residual points. The XPINN decomposes the domain based on whether y > 0; its training points are shown in Figure 9 (right). We provide 200 boundary points, 5000 residual points, and 400 interface points for the two sub-nets in XPINN. The other settings of PINN and XPINN are the same as those for the Burgers equation.

Results
The results for the Helmholtz equation are shown in Table 2. The reported relative L2 errors are averaged over 10 independent runs, selected as the lowest errors during optimization. The error plots of XPINNv1, APINN-X, and XPINNv2 are visualized in Figure 11.
• XPINNv1 performs the worst, since its interface loss cannot enforce the interface continuity satisfactorily.
• XPINNv2 performs significantly better than PINN, but it is worse than APINN-X, because it slightly overfits in the two sub-domains due to the smaller number of available training samples compared with APINN-X.
• APINN-M performs worse than APINN-X due to bad initialization of the gating network.
• The errors of APINN, XPINNv2, and PINN concentrate near the boundary, which is due to the gradient pathology described in Wang et al. [2021].

Visualization of Optimized Gating Networks
The randomness in this problem is smaller, so the final relative L2 errors of different runs are similar. Some representative optimized gating networks of APINN-X after convergence are visualized in Figure 12. Specifically, every gating network approximately maintains the original decomposition into an upper and a lower domain, although the interfaces change slightly in each run. From these observations, the XPINN-type decomposition into an upper and a lower domain is already satisfactory for this problem. We also notice that XPINNv2 outperforms PINN, which is consistent with this observation. Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Helmholtz equation in Figure 13, where the snapshots show the gating net from epoch 0 to 5E2, with 6 snapshots in all. That of the second subnet, G_2, can easily be computed using the partition-of-unity property of gating networks, i.e., Σ_i G_i = 1. The trajectory is similar to that of the Burgers equation, but here the gating net converges much faster.

Klein-Gordon Equation
In modern physics, the Klein-Gordon equation is used in a wide variety of fields, such as particle physics, astrophysics, cosmology, and classical mechanics. Its boundary and initial conditions are prescribed by the ground truth solution, which is shown in Figure 14 (left). In this case, XPINNv1 performs worse than PINN due to large errors near the interface induced by unsatisfactory continuity between sub-nets, while XPINNv2 performs similarly to PINN. APINN performs much better than XPINNv1 and better than PINN and XPINNv2.

PINN and Hard XPINN
The experimental settings of PINN and XPINN are identical to those of the previous Helmholtz equation, except that XPINN now decomposes the domain based on whether x > 0.5, and Adam optimization is performed for 200k epochs.

Results
The results for the Klein-Gordon equation are shown in Table 3. The reported relative L2 errors are averaged over 10 independent runs. The error plots of XPINNv1, APINN-X, and XPINNv2 are visualized in Figure 16 (left, middle, and right, respectively).
• XPINNv1 performs the worst, since the interface loss of XPINNv1 cannot enforce the interface continuity well, while XPINNv2 performs similarly to PINN, since the two factors in XPINN generalization reach a balance.
• APINN performs better than all XPINNs and PINNs, and APINN-M is slightly better than APINN-X.

Wave Equation
We consider a wave problem of the form $u_{tt} = c^2 u_{xx}$. The boundary and initial conditions are given by the ground truth solution, which is shown in Figure 17 (left). In this example, XPINN is already significantly better than PINN, reducing the relative L2 error of PINN by 27%. However, APINN still performs slightly better than XPINN, even though XPINN is already good.

PINN and Hard XPINN
For PINN, we use a 10-layer tanh network with 3441 parameters, with 400 boundary points and 10,000 residual points. We use a weight of 20 on the boundary loss and unity weight on the residual loss. We train PINN using the Adam optimizer for 100k epochs at an 8E-4 learning rate. XPINN decomposes the domain based on whether t > 0.5. The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. The sub-nets are 6-layer tanh networks of width 20 with 3522 parameters in total, and we provide 200 boundary points, 5000 residual points, and 400 interface points for all sub-nets in XPINN. The training points of XPINN are visualized in Figure 17 (right). We train XPINN using the Adam optimizer for 100k epochs at a learning rate of 1e-4.

Results
The results for the wave equation are shown in Table 4. The reported relative L2 errors are averaged over 10 independent runs, selected as the error at the epoch with the smallest training loss among the last 10% of epochs. The error plots of PINN, XPINNv2, and APINN-X are visualized in Figure 19 (left, middle, and right, respectively).
• Although XPINN is already much better than PINN, reducing the relative L2 error of PINN by 27%, APINN can still slightly improve over XPINN and performs the best among all models. In particular, APINN-M outperforms APINN-X.

Visualization of Optimized Gating Networks
Some representative optimized gating networks after convergence are visualized in Figure 20. The first row shows the gate networks of optimized APINN-X, while the second row shows those of APINN-M. In this case, the variance is much smaller, and the optimized gate nets maintain the characteristics of their initialization, i.e., those of APINN-X retain an upper-and-lower decomposition and those of APINN-M retain a multi-level partition. Gate nets under the same initialization are also similar across independent runs, which is consistent with their similar performances.

Boussinesq-Burger Equation
Here we consider the Boussinesq-Burger system, which is a nonlinear water wave model consisting of two unknowns.
A thorough understanding of such a model's solutions is important for applying it to harbor and coastal designs. The Boussinesq-Burger system under consideration, together with its Dirichlet boundary condition and ground truth solution, is given in Lin and Chen [2022]; the solution is shown in Figure 21 (left and middle) for the unknowns u and v, respectively. In this experiment, we consider a system of PDEs and try XPINN and APINN with more than two subdomains.

PINN and Hard XPINN
For PINN, we use a 10-layer tanh network, and provide 400 boundary points and 10,000 residual points. We use a weight of 20 on the boundary loss and unity weight on the residual loss. It is trained by Adam Kingma and Ba [2014] with an 8E-4 learning rate for 100k epochs. For the domain decomposition of (hard) XPINN, we design two different strategies. First, an XPINN with two subdomains decomposes the domain based on whether t > −0.5. Its sub-nets are 6-layer tanh networks of width 20, and we provide 200 boundary points and 5000 residual points for every sub-net. Second, an XPINN4 with four subdomains decomposes the domain at t = −1.75, −0.5, and 0.75; its training points are visualized in Figure 21 (right). The sub-nets in XPINN4 are 4-layer tanh networks of width 20, and we provide 100 boundary points and 2500 residual points for every sub-net. The number of interface points is 400. The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. We use the Adam optimizer to train XPINN and XPINN4 for 100k epochs with an 8E-4 learning rate. For a fair comparison, the parameter counts of PINN, XPINN, and XPINN4 are 6882, 7044, and 7368, respectively.

APINN
For APINN with two subdomains, we pretrain the gate net G of APINN-X on the function (G(x, t))_1 = 1 − (G(x, t))_2 = exp(0.35(t − 2)) to mimic XPINN, and pretrain that of APINN-M on the constant function (G(x, t))_1 = 1 − (G(x, t))_2 = 0.8 to mimic MPINN. In APINN-X and APINN-M, all networks have a width of 20. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 5, respectively, with 6945 parameters in total. For APINN with four subdomains, we pretrain the gate net G of APINN4-X on a four-part decomposition function to mimic XPINN; the pretrained gate functions of APINN4-X are visualized in Figure 23. Furthermore, we pretrain that of APINN4-M on the function (G(x, t))_1 = 0.8 and (G(x, t))_{2,3,4} = 1/15 to mimic MPINN. In APINN4-X and APINN4-M, h and G have width 20, while each E_i has width 18. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, with 7046 parameters in total.
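The gate-pretraining step amounts to fitting the gate to a target weighting function by minimizing a mean-squared error. The sketch below uses a single sigmoid gate G_1 = sigmoid(a·t + b) fit by plain gradient descent to the APINN-X target above, whereas the paper pretrains a 2-layer gate network of width 20; everything except the target function is illustrative.

```python
import numpy as np

# Pretraining target for the APINN-X gate (from the text):
# G1(x, t) = exp(0.35 * (t - 2)),  G2 = 1 - G1.
t = np.linspace(-2.0, 2.0, 200)
target = np.exp(0.35 * (t - 2.0))

# Minimal sketch: a sigmoid gate G1 = sigmoid(a*t + b), fit by
# gradient descent on the MSE.
a, b, lr = 0.0, 0.0, 0.5
for _ in range(5000):
    g = 1.0 / (1.0 + np.exp(-(a * t + b)))   # G1(t)
    err = g - target
    dlogit = err * g * (1.0 - g)             # chain rule through the sigmoid
    a -= lr * np.mean(dlogit * t)
    b -= lr * np.mean(dlogit)

g = 1.0 / (1.0 + np.exp(-(a * t + b)))       # final fitted gate
mse = np.mean((g - target) ** 2)
```

By the partition-of-unity property, the second gate output G_2 = 1 − G_1 needs no separate fit.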

Results
The results for the Boussinesq-Burger equation are shown in Tables 5 and 6. The reported relative L2 errors are averaged over 10 independent runs and are selected as the error at the epoch with the smallest training loss among the last 10% of epochs. The key observations are as follows.
• APINN-M performs the best.
• APINN and XPINN with four sub-nets do not perform as well as their two-sub-net counterparts, which may be due to the trade-off between target function complexity and the number of training samples in XPINN generalization. Also, more subdomains do not necessarily contribute to parameter efficiency.
• The error of the best-performing APINN-M is visualized in Figure 24; it is concentrated near the steep regions, where the solution changes rapidly.
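The epoch-selection rule used for reporting can be sketched as follows. The helper name is hypothetical, not the authors' code.

```python
import numpy as np

def report_epoch(train_losses, frac=0.1):
    """Among the last `frac` of epochs, pick the epoch with the smallest
    training loss; the error at that epoch is the one reported."""
    n = len(train_losses)
    start = n - max(1, int(frac * n))
    return start + int(np.argmin(train_losses[start:]))
```

This avoids cherry-picking the best test error directly: the checkpoint is chosen by training loss alone.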

Visualization of Optimized Gating Network
We visualize several representative optimized gating networks after convergence, all with similar relative L2 errors, in Figures 25, 26 and 27 for the APINNs with two subnets, APINN4-X, and APINN4-M, respectively. Note that the variance for this Boussinesq-Burger equation is smaller, so these models have similar performances. The key observation is that the gate nets after optimization maintain their characteristics at initialization, especially for APINN-M. Specifically, for APINN-M the optimized gate networks do not change much from initialization. For APINN-X, although the position and slope of the interfaces between subdomains change, the optimized APINN-X still partitions the whole domain into four upper-to-bottom parts. Therefore, we draw the following conclusions.
• Initialization is crucial to the success of APINN, which is reflected in the performance gaps between APINN-M and APINN-X, since the gate networks after optimization maintain their characteristics at initialization.
• An APINN with one kind of initialization can hardly be optimized into another kind. For instance, we seldom see the gate nets of APINN-M optimized into a decomposition resembling that of XPINN.
• These observations are consistent with our Theorem 5.2, which states that a good initialization of the gate net contributes to better generalization, since the gate net then does not need to change significantly from its initialization.
• However, based on our extensive experiments, trainable gate nets still contribute to generalization due to the positive fine-tuning effect, although optimization cannot turn an MPINN-type APINN into an XPINN-type APINN or vice versa.
Furthermore, we visualize the optimization trajectory of the gating network for all subnets in the Boussinesq-Burger equation in Figure 28 in the Appendix, where each snapshot shows the gating net at epochs 0, 10, 20, 30, 40, and 50. The change is fast and continuous.

Summary
In this paper, we propose the Augmented Physics-Informed Neural Network (APINN) method, which employs a gate network for soft domain partition that can mimic the hard eXtended PINN (XPINN) domain decomposition and is trainable and fine-tunable. The gate network, which satisfies the partition-of-unity property, weight-averages several sub-networks as the output of APINN. Moreover, APINN adopts partial parameter sharing among sub-nets. It has the following advantages over the state-of-the-art generalized space-time domain decomposition based XPINN method:
• APINN does not include complicated interface losses to maintain continuity between different sub-networks (sub-PINNs), since the gate network decomposes the entire domain in a soft way; this also contributes to better convergence and lower training loss.
• The gate network can mimic the hard decomposition of XPINN, so that APINN enjoys XPINN's advantage of decomposing a complicated target function into several simpler parts, reducing the complexity and improving the generalizability of each sub-network.
• The trainable gate network enables fine-tuning the domain decomposition to discover a better function and domain decomposition with simpler parts, contributing to better generalization based on Hu et al. [2021].
• The parameter sharing in APINN utilizes the essential idea that each sub-PINN learns one part of the same target function, so that the commonality can be well captured by the shared part.
• Each sub-network in APINN takes advantage of all training samples within the domain to prevent overfitting. By contrast, sub-networks in XPINN can only utilize part of the training samples.
All of these benefits are justified empirically on various PDEs and theoretically in Hu et al. [2021] using the PINN generalization theory. More specifically, we prove generalization bounds for APINNs with fixed and trainable gate networks. Since APINNs with certain gate networks can recover PINN and XPINN, APINN enjoys the advantages of both models due to its trainability and flexibility. It is shown that APINN benefits from general domain and function decomposition, which reduces the complexity of the optimized networks to improve generalization. In terms of parallelization, APINN shares more data points and parameters than XPINN, and thus can be more expensive than the XPINN method.

Figure 1 :
Figure 1: The APINN model structure. The input (x, t) is passed through the blue shared network h, whose output is routed to m distinct subnets E_1, ..., E_m (red), yielding the m subnet outputs E_i(h(x, t)). Subsequently, APINN outputs the weighted average of the m subnet outputs with weights G(x, t)_i (green), where G is also a network mapping (x, t) to the m-dimensional simplex ∆_m. The weights G(x, t)_i satisfy the partition-of-unity property, i.e., Σ_i G(x, t)_i = 1.
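The forward pass described in this caption can be sketched as follows. This is a minimal sketch with toy single-layer components; the paper's h, E_i, and G are deep tanh networks, and all shapes and parameter names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax: maps each row onto the simplex (partition of unity)."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy single-layer parameters (illustrative only).
W_h = rng.normal(size=(2, 8))      # shared trunk h(x, t)
W_E = rng.normal(size=(3, 8))      # m = 3 subnets E_i, one output each
W_G = rng.normal(size=(2, 3))      # gate network G -> simplex

def apinn(xt):
    feat = np.tanh(xt @ W_h)       # shared features h(x, t)
    outs = feat @ W_E.T            # subnet outputs E_i(h(x, t)), shape (N, m)
    w = softmax(xt @ W_G)          # gate weights, each row on the simplex
    return (outs * w).sum(axis=-1), outs, w

xt = rng.uniform(-1.0, 1.0, size=(5, 2))
u, outs, w = apinn(xt)
```

Because the gate rows sum to one, the APINN output at each point is a convex combination of the subnet outputs, so no interface loss is needed to glue subdomains together.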

Figure 2 :
Figure 2: The first example of a gating network in APINN: an upper domain (middle) and a lower domain (right).

Figure 3 :
Figure 3: The second example of a gating network in APINN: an inner domain (middle) and an outer domain (right).

Figure 4 :
Figure 4: The Burgers equation. Left: ground truth solution. Right: training points of XPINN.

Figure 5 :
Figure 5: The Burgers equation. Train loss and relative L2 error. Blue: Adam optimization. Red: L-BFGS fine-tuning. Green: final convergence point. In this case, Adam can already train the model to convergence, so the additional L-BFGS converges fast due to its stopping criterion.

Figure 6 :
Figure 6: The Burgers equation. Left: error plot of APINN. Right: error plot of XPINNv1. Note that APINN and XPINNv1 share the same colorbar. Compared to the error of XPINNv1, that of APINN is negligible.

Figure 7 :
Figure 7: The Burgers equation: APINN gate nets G_1 after convergence at the last epoch. The gate for the second subnet, G_2, can be easily computed using the partition-of-unity property G_1 + G_2 = 1. First row: those of APINN-X with two different random seeds; relative L2 errors = 7.541E-4 and 8.034E-4. Second row: those of APINN-M with two different random seeds; relative L2 errors = 6.936E-4 and 8.284E-4.

Figure 8 :
Figure 8: The Burgers equation: visualization of the gating network G_1 optimization trajectory via four snapshots, at epochs 0, 1E4, 2E4, and 3E4, from left to right and top to bottom. The value for the second subnet, G_2, is easily calculated using the partition-of-unity property of the gating networks, i.e., Σ_i G_i = 1.

Figure 9 :
Figure 9: The Helmholtz equation. Left: ground truth solution. Right: training points of XPINN.

Figure 10 :
Figure 10: The Helmholtz equation. Train loss and relative L2 error during optimization. Blue: Adam optimization. Red: L-BFGS fine-tuning. Green: final convergence point. L-BFGS stops automatically once its convergence criterion is satisfied.

Figure 12 :
Figure 12: The Helmholtz equation: APINN-X gate nets G_1 after convergence at the last epoch. Their relative L2 errors are similar.

Figure 13 :
Figure 13: The Helmholtz equation: visualization of the gating network G_1 optimization trajectory via six snapshots, at epochs 0, 1E2, 2E2, 3E2, 4E2, and 5E2, from left to right and top to bottom. The gate for the second subnet, G_2, can be easily computed using the partition-of-unity property of the gating networks, i.e., Σ_i G_i = 1.

Figure 17 :
Figure 17: The Wave equation. Left: ground truth solution. Right: training points of XPINN.

Figure 18 :
Figure 18: The Wave equation. Train loss and relative L2 error during optimization. Blue: Adam optimization. Red: L-BFGS fine-tuning. Green: final convergence point. L-BFGS stops automatically once its convergence criterion is satisfied.

Figure 19 :
Figure 19: The Wave equation. Left: error plot of PINN. Middle: error plot of XPINNv2. Right: error plot of APINN.

Figure 20 :
Figure 20: The Wave equation: APINN gate nets G_1 after convergence at the last epoch. The gate for the second subnet, G_2, can be easily computed using the partition-of-unity property of the gating networks, i.e., Σ_i G_i = 1. First row: those of APINN-X with two different random seeds; relative L2 errors = 1.477E-3 and 1.527E-3. Second row: those of APINN-M with two different random seeds; relative L2 errors = 1.055E-3 and 1.315E-3.

Figure 21 :
Figure 21: The Boussinesq-Burger equation. Left and middle: ground truth solutions for u and v. Right: training points of XPINN4 with four subdomains.

Figure 22 :
Figure 22: The Boussinesq-Burger equation. Train loss and relative L2 error during optimization. In this case, Adam can already train the model to convergence, so the additional L-BFGS converges fast due to its stopping criterion.

Figure 23 :
Figure 23: The Boussinesq-Burger equation. APINN4-X pretrained gate nets, with a four-dimensional output for weight-averaging the four subnets.

Figure 25 :
Figure 25: The Boussinesq-Burger equation: visualization of the trained gate networks G_1 of APINN-M (first row) and APINN-X (second row) after convergence, with two different random seeds for each model. Their relative L2 errors are similar for the same type of APINN.

Figure 26 :
Figure 26: The Boussinesq-Burger equation: visualization of the four trained gate networks G(x, t)_{1,2,3,4} in APINN4-X in one independent run, corresponding to the four subnets.

Figure 27 :
Figure 27: The Boussinesq-Burger equation: visualization of the four trained gate networks G(x, t)_{1,2,3,4} in APINN4-M in one independent run, corresponding to the four subnets.

Table 1 :
Results for the Burgers' equation.

Table 2 :
Results for the Helmholtz equation.

Table 4 :
Results for the Wave equation.

Table 6 :
Relative L2 error for the function v in the Boussinesq-Burger equation.