An immersed boundary neural network for solving elliptic equations with singular forces on arbitrary domains

Abstract: In this paper, we present a deep learning framework for solving two-dimensional elliptic equations with singular forces on arbitrary domains. This work follows the ideas of physics-informed neural networks to approximate the solutions and of the immersed boundary method to deal with the singularity on an interface. Numerical simulations of elliptic equations with regular solutions are initially analyzed in order to investigate in depth the performance of such methods on rectangular and irregular domains. We study the deep neural network solutions for different numbers of training and collocation points as well as different neural network architectures. The accuracy is also compared with standard schemes based on finite differences. In the case of singular forces, the analytical solution is continuous but the normal derivative has a discontinuity on the interface. This discontinuity is incorporated into the equations as a source term with a delta function, which is approximated using Peskin's approach. The performance of the proposed method is analyzed for different interface shapes and domains. Results demonstrate that the immersed boundary neural network can accurately approximate the analytical solution of elliptic problems with and without singularity.


Introduction
Nowadays, deep neural networks (DNN) are used to solve a wide variety of problems [1]. For instance, there exist several algorithms based on DNNs for weather forecasting [2], detecting defaulter users [3], financial forecasting [4], COVID-19 prediction models [5], among others. These algorithms infer information about the behavior of these phenomena when a set of historical data is given. Recently, deep learning methods have also been applied to solve partial differential equations (PDEs) modeling different phenomena in physics, engineering, and the biological sciences by using residual-minimizing formulations [6][7][8][9][10]. The idea is to replace usual discretization strategies, such as finite-difference, finite-element or finite-volume methods, with neural networks in order to find the numerical solution using the known data coming from boundary conditions, initial conditions, or the differential equation itself.
Deep neural networks are inspired by biological neuron interconnections [11]. Basically, an artificial neural network is a function resulting from the composition of a finite set of elementary operations and coefficients arranged as a computational graph. Usually these elementary operations include the binary arithmetic operations and transcendental functions [12]. The coefficients are the values to be determined using a set of known data, named training data, that the network receives as input. Thus, a deep neural network consists of an input layer, an output layer, and a set of hidden layers (two or more), each one containing an arbitrary number of neurons [13][14][15]. The idea is to update the coefficients of the neurons by minimizing the error between the known values of the training data and the corresponding predicted values [16]. This iterative procedure is known as the training process.
In mathematical models based on PDEs, data mostly comes from the boundary and initial conditions. Moreover, in real applications, the knowledge of this data is generally limited and very costly to obtain. However, PDEs are based on conservation laws, physical principles, and/or phenomenological behaviors that can be incorporated into the training process of the DNN. Raissi et al. [9] propose a class of universal function approximators named physics-informed neural networks to deal with these cases. Their neural network formulation differs from the standard DNN by involving the differential equations used to model the problem. This technique has shown benefits in solving partial differential equations with a lack of information. However, although a series of promising results were presented in [9], the authors remark that their work raises many questions concerning the range of applicability of the DNN and that more collective work is needed to set the foundations of this field. Moreover, results show that the performance of these techniques is directly related to the particular equation being analyzed.
In recent years, physics-informed neural networks have been applied in fluid mechanics [17,18] and biomedical problems [19], among others. For instance, Sun et al. [18] provide a deep learning approach, using a fully-connected neural network, that models a fluid flow without using any labeled data, reducing the computational costs derived from using several simulation data sets. Few works have been presented to solve two-dimensional (2D) elliptic equations with artificial neural networks on arbitrary domains [20][21][22][23][24][25][26]. For example, Koryagin et al. [24] provide PyDENs, an open-source, easy-to-use mechanism to solve PDEs using neural networks. PyDENs allows users to define and configure the solution of several PDE problems, including heat and wave equations. However, the analysis of the performance of their proposed DNN is still very limited. Motivated by this, in this paper we analyze the capacity of a deep learning framework for solving 2D elliptic equations. Extensive numerical simulations are presented in order to investigate in depth the performance of the DNN on rectangular and arbitrary domains. The accuracy is also compared with standard schemes based on finite differences. Furthermore, this paper shows that a classical method such as the immersed boundary method (IBM) can coexist in harmony with deep neural networks to solve elliptic equations with singularities. To the best of the authors' knowledge, no other works propose a similar approach.
Elliptic equations with singular forces arise in problems where discontinuities in the solution and its derivatives are expected across immersed interfaces, such as fluid-structure interactions and two-phase flow systems. A well-known technique developed to solve problems with these characteristics is the immersed boundary method. It was initially introduced by Peskin [27,28] and developed to simulate cardiac mechanics and the associated blood flow. However, the IBM has been extended to simulate a wide range of problems. For instance, some applications are: massive boundaries using penalty IBM [29] or the D'Alembert force [30], porous boundaries [31], a generalized IBM for torque in flexible rods [32], membrane transport and osmosis [33], stochastic IBM [34], variable density and viscosity fluids [35], among others [36,37]. The idea of the IBM is to regularize discontinuous interface problems, which typically produces low-order accurate approximations at the interface.
The present work aims to solve 2D elliptic equations as data-driven solutions of PDEs. Thus, we seek the unknown hidden solution of an equation with fixed parameters. The IBM is also introduced via Peskin's approach to deal with equations with singular forces. One of the main advantages of this method is the low number of known data required at arbitrary locations of the boundaries. This flexibility cannot be obtained using standard algorithms such as finite-difference, finite-element or finite-volume methods. Moreover, applications with arbitrary domains are directly implemented with minor modifications to the code. For rectangular domains, a random subset of points is selected from uniform grids. However, in order to deal with arbitrary geometries, the locations of the points are obtained as the vertex points of unstructured grids, which have more flexibility in fitting complicated boundaries. All code and data sets accompanying this manuscript are available on GitHub at https://github.com/ibmcimatmerida/IBMNN. This paper is organized as follows. The second section presents the governing equations. The third section introduces the numerical method, including a brief description of the physics-informed neural network and the IBM by Peskin's approach. Next, details of the implementation of the DNN algorithm are included in Section 4. The numerical results are presented in Section 5. Finally, the last section includes the conclusions and future work.

Governing equations
Let us consider elliptic equations of the form

∇ · (β(x, y)∇u) + κ(x, y)u = g(x, y), (x, y) ∈ Ω, (2.1)

where Ω is an arbitrary domain and the boundary, ∂Ω, is composed of the union of sections with Dirichlet, Neumann or Robin boundary conditions. There is also an irregular interface Γ contained in Ω across which the source g may have a delta function singularity. We remark that, for discontinuities to arise in the exact solution or its derivatives, there must be discontinuities or singularities present in the coefficients of Eq (2.1). Another possibility is that β and κ are continuous but the source term g has a delta function singularity along the interface [38]. This paper focuses on a deep neural network method for solving a two-dimensional elliptic equation given by

∇ · (β(x, y)∇u) + κ(x, y)u = g(x, y) + J(x, y), (x, y) ∈ Ω, (2.2)

where the function J is defined as

J(x, y) = ∫_Γ C(s) δ(x − X(s)) δ(y − Y(s)) ds, (2.3)

where X(s) and Y(s) are the parametric equations of a rectifiable curve Γ and C(s) is the source strength. By this we mean that J(x, y) is a distribution with the property that

∫∫_Ω J(x, y) φ(x, y) dx dy = ∫_Γ C(s) φ(X(s), Y(s)) ds

for any smooth test function φ(x, y). Thus, the analytical solution u will be continuous but the normal derivative will have a jump at the interface of magnitude C(s); that is,

(∂u/∂n)⁺ − (∂u/∂n)⁻ = C(s),

where the superscript indicates the limit of the normal derivative at each side of the interface. We remark that the interface Γ can be an arbitrary piecewise smooth curve lying in Ω. Figure 1 shows an example of an immersed interface in an arbitrary domain. It is important to remark that Γ is not necessarily closed or even connected. Moreover, we do not require an explicit parametrization of this curve, as a numerical discretization over its points will be applied directly.

DNN approximation
Our goal is to develop a DNN that can be used together with the known boundary data to obtain accurate approximations of Eq (2.2). For equations with singular forces, we follow the ideas of the immersed boundary method together with physics-informed neural networks to reach this goal.

Physics-informed neural networks
In this paper, elliptic equations are solved by employing a DNN based on the physics-informed neural networks proposed by Raissi et al. [9]. It is based on a model of data-driven solutions of partial differential equations in which all parameters are given and u is considered as the unknown hidden solution of the system.
We proceed by approximating u(x, y) by a deep neural network,

ũ(x, y; W, b) ≈ u(x, y), (3.1)

where W and b are two sets of parameters of the neural network to be determined. More details about this DNN will be given in the next section. We also consider f as the residual of Eq (2.2), i.e., its left-hand side minus its right-hand side,

f(x, y) := ∇ · (β(x, y)∇u) + κ(x, y)u − g(x, y) − J(x, y), (3.2)

which, after substituting ũ for u, results in a physics-informed neural network f̃(x, y; W, b). Note that f̃ not only includes the solution ũ but also its partial derivatives. These derivatives are obtained by automatic differentiation [12]. In this procedure, given a set of parameters (W, b), the derivatives of the neural network can be obtained by combining the derivatives of the constituent operations through the chain rule. This gives the derivative of the overall composition. An efficient algorithm to obtain these derivatives is referred to as backpropagation [16].
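To make the chain-rule computation of these derivatives concrete, the following sketch (NumPy, with hypothetical random weights and a single hidden layer rather than the deep network used in the paper) differentiates a tanh network with respect to its input x and checks the result against a centered finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer tanh network u~(x, y) (hypothetical weights; a
# stand-in for the deep network, just to illustrate the chain rule).
W1 = rng.normal(size=(20, 2)); b1 = rng.normal(size=20)
W2 = rng.normal(size=20);      b2 = rng.normal()

def u_tilde(x, y):
    z = W1 @ np.array([x, y]) + b1          # pre-activation
    return W2 @ np.tanh(z) + b2             # scalar network output

def du_dx(x, y):
    # Chain rule: d/dx [W2 . tanh(W1 p + b1)] = W2 . (sech^2(z) * dz/dx)
    z = W1 @ np.array([x, y]) + b1
    return W2 @ ((1.0 - np.tanh(z) ** 2) * W1[:, 0])

# The exact chain-rule derivative agrees with a centered finite difference.
eps = 1e-6
fd = (u_tilde(0.3 + eps, -0.2) - u_tilde(0.3 - eps, -0.2)) / (2 * eps)
assert abs(du_dx(0.3, -0.2) - fd) < 1e-4
```

In the paper's setting these derivatives are produced automatically by TensorFlow rather than coded by hand; the check above only illustrates that reverse-mode differentiation reproduces the analytic chain rule.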
In the absence of any interface, parameters W and b in Eq (3.1) can be obtained by using only two finite sets of points: the training and collocation points.
• First, the training points {(x_u^i, y_u^i)}, i = 1, …, N_u, are located on the boundary ∂Ω_u, where the data is already available for Dirichlet boundary conditions.
• Second, the collocation points {(x_f^j, y_f^j)}, j = 1, …, N_f, are located inside the domain Ω (boundary included), where the differential equation represented by f is known.
Here N_u and N_f are the numbers of training and collocation points, respectively. We remark that these points can be arbitrarily selected, as shown in Figure 1. We denote a particular point as x_D, x_N and x_f.
These parameters are obtained by minimizing the loss function

Loss(W, b) = E_u + E_f, (3.3)

where E_u corresponds to the error of the training boundary data {(x_u^i, y_u^i, u^i)}, i = 1, …, N_u, for Dirichlet boundary conditions, given by

E_u = (1/N_u) Σ_{i=1}^{N_u} |ũ(x_u^i, y_u^i; W, b) − u^i|², (3.4)

and E_f enforces the solution of the elliptic equation by using Eq (3.2) at the collocation points {(x_f^j, y_f^j)}, j = 1, …, N_f, and taking f_j = 0, as follows:

E_f = (1/N_f) Σ_{j=1}^{N_f} |f̃(x_f^j, y_f^j; W, b)|².

It is important to remark that Neumann and Robin boundary conditions can also be considered in our domain by including extra error terms in the loss function Eq (3.3), as done for Dirichlet conditions in Eq (3.4). For instance, with Neumann boundary conditions at the set of training points {(x_{u_n}^i, y_{u_n}^i)}, i = 1, …, N_{u_n}, the loss becomes

Loss(W, b) = E_u + E_{u_n} + E_f,

where E_{u_n} corresponds to the error of the training boundary data {(x_{u_n}^i, y_{u_n}^i, u_n^i)}, i = 1, …, N_{u_n}, given by

E_{u_n} = (1/N_{u_n}) Σ_{i=1}^{N_{u_n}} |∂ũ/∂n(x_{u_n}^i, y_{u_n}^i; W, b) − u_n^i|²,

and where u_n^i corresponds to the known values of the normal derivative at the boundary.
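As an illustration of how the loss terms combine, the following sketch (NumPy; the function names and sample points are hypothetical, not the paper's implementation) assembles the mean-square terms E_u and E_f; for an exact harmonic solution with zero residual the loss vanishes:

```python
import numpy as np

def loss(u_tilde, f_tilde, xu, yu, u_data, xf, yf):
    """Mean-square loss of Eq (3.3): boundary misfit E_u plus PDE residual E_f."""
    E_u = np.mean((u_tilde(xu, yu) - u_data) ** 2)  # training (boundary) points
    E_f = np.mean(f_tilde(xf, yf) ** 2)             # collocation points, f_j = 0
    return E_u + E_f

# Demo: for the exact harmonic function u = e^x cos(y) the Laplace residual
# is identically zero, so the loss vanishes.
u_exact = lambda x, y: np.exp(x) * np.cos(y)
residual = lambda x, y: np.zeros_like(x)
xu = np.array([0.0, 1.0]); yu = np.array([0.0, 0.5])
assert loss(u_exact, residual, xu, yu, u_exact(xu, yu), xu, yu) == 0.0
```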

Immersed boundary method by Peskin's approach
In the case of elliptic equations with singular forces, note that the singularity is already incorporated into the DNN by the definition of f in Eq (3.2). Thus, the corresponding right-hand side values at the collocation points (x_f^j, y_f^j) are evaluated as

g(x_f^j, y_f^j) + J̃(x_f^j, y_f^j),

where J̃ is an approximation of J. In this paper, this approximation follows the IBM by Peskin's approach.
The main idea of the IBM is to replace the delta function by some discrete approximation d_h with support related to a discretization parameter h. In a standard uniform rectangular mesh, h would correspond to the mesh width. In this paper, we employ Peskin's discrete delta function

d_h(r) = (1/(4h))(1 + cos(πr/(2h))) if |r| < 2h, and d_h(r) = 0 otherwise.

Thus, the delta is replaced by a smoothed function which is easier to implement. However, this function becomes sharper as h becomes smaller, making it harder to approximate. This formulation also approximates the interface by a set of N_s points (x_s)_k = (X(s_k), Y(s_k)), k = 1, 2, . . . , N_s, and replaces the integral in Eq (2.3) by a discrete sum over the interface:

J̃(x, y) = Σ_{k=1}^{N_s} C(s_k) d_h(x − X(s_k)) d_h(y − Y(s_k)) ∆s_k.

If an arc-length parametrization is given, then ∆s_k = ∆s can be directly calculated from the discretization interval. Otherwise, ∆s_k, representing the discretization step on the interface, can be calculated as

∆s_k = (‖(x_s)_{k+1} − (x_s)_k‖ + ‖(x_s)_k − (x_s)_{k−1}‖)/2. (3.11)
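A minimal sketch of this regularization, assuming the standard cosine form of Peskin's discrete delta (the function names and the closing sanity checks are illustrative, not the paper's code):

```python
import numpy as np

def d_h(r, h):
    """Cosine-smoothed delta with support |r| < 2h (standard Peskin kernel)."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) < 2 * h,
                    (1.0 + np.cos(np.pi * r / (2 * h))) / (4 * h),
                    0.0)

def J_tilde(x, y, Xs, Ys, C, ds, h):
    """Discrete sum over interface points replacing the line integral in J."""
    return np.sum(C * d_h(x - Xs, h) * d_h(y - Ys, h) * ds)

h = 0.1
# Sanity check: the one-dimensional smoothed delta integrates to one.
r = np.linspace(-0.5, 0.5, 20001)
dr = r[1] - r[0]
assert abs(np.sum(d_h(r, h)) * dr - 1.0) < 1e-3

# Unit-circle interface with uniform strength: far from the interface
# (e.g., at the center) the regularized source vanishes, since the
# distance to every interface point exceeds the support width 2h.
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
assert J_tilde(0.0, 0.0, np.cos(theta), np.sin(theta),
               np.ones(200), 2 * np.pi / 200, h) == 0.0
```

Note how the kernel integrates to one for any h, which is what makes the smoothed source consistent with the distributional definition of J as h shrinks.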

Predicted solution and errors
Once the optimization process to obtain W and b is finished, the resulting pair ũ(x, y; W, b) and f̃(x, y; W, b) must approximate elliptic Eq (2.2) with the corresponding boundary and interface conditions. It is important to remark that during training (the optimization process) the residual of Eq (3.3) will not be exactly zero, and therefore our approximation will have the accuracy of the residual loss.
The DNN methodology incorporates many decisions to obtain an accurate numerical solution. The sources of error come from the DNN configuration, such as the number of layers, neurons and the activation function, the number and distribution of training and collocation points, the optimizer, and the choice of the loss function. The approximation of elliptic equations with singular forces also depends on the number and location of the interface points x_s, as well as on the parameter h. In this work, we take enough interface points such that the integral approximation does not significantly contribute to the global precision of the problem. Thus, the major influence on the precision of the IBM is given by the selection of the parameter h.
Finally, we remark that the predicted solution ũ is given as a function and not as a set of approximated points. Thus, an arbitrary set of predicted points can be selected to test the precision of the proposed method. In this paper, we refer to this set as x_p. These points are usually different from x_u and x_f, and they can be selected at any position of the domain; see Figure 1.
In the next section, we will describe the DNN and the minimization algorithm.

Methodology
We are interested in approximating the solutions to Eq (2.2) with deep neural networks. In terms of implementation, there are several decisions about the DNN formulation that should be taken into account. In this section, we briefly describe two of the most important parts: the neural network and the optimization algorithm.

Deep neural network
The DNN model of L layers consists of an input layer, an output layer, and a set of L − 1 hidden layers, each one containing an arbitrary number N_ℓ of neurons or nodes; the output layer is not counted in L. The received input traverses the network from the input layer to the output layer, through the hidden layers. When the signals arrive at each node, an activation function φ : R → R is used to produce the node output [13][14][15]. Neural networks with many layers (two or more) are called deep neural networks.
In this work, the neural network is described in terms of the input x ∈ R², the output ũ ∈ R, and an input-to-output mapping x → ũ, where x = (x, y) is any selected point in Ω. For any hidden layer ℓ, we consider the pre-activation X^ℓ and the post-activation U^ℓ,

X_j^ℓ = Σ_{k=1}^{N_ℓ} W_{jk}^ℓ U_k^{ℓ−1} + b_j^ℓ and U_j^ℓ = φ(X_j^ℓ),

respectively. Thus, the activation of the j-th neuron in the ℓ-th hidden layer of the network, for j = 1, . . . , N_{ℓ+1}, is given by [10]:

U_j^ℓ = φ( Σ_{k=1}^{N_ℓ} W_{jk}^ℓ U_k^{ℓ−1} + b_j^ℓ ),

where W_{jk}^ℓ and b_j^ℓ are the weight and bias parameters of layer ℓ. Activation functions φ must be chosen such that the differential operators can be readily and robustly evaluated using reverse-mode automatic differentiation [12,39]. Throughout this work, we use relatively simple deep feed-forward neural network architectures with hyperbolic tangent activation functions. Results show that this function is robust for the proposed formulation.
It is important to remark that, as more layers and neurons are incorporated into the DNN, the number of parameters significantly increases; thus the optimization process becomes less efficient. The final sizes of the bias and weight vectors are

Σ_{ℓ=1}^{L} N_{ℓ+1} and Σ_{ℓ=1}^{L} N_ℓ N_{ℓ+1},

respectively, where N_{L+1} = 1. Figure 2 shows an example of the computational graph representing a DNN as described in Eqs (4.1)-(4.4). This graph has the weights or biases labeled on each edge. When one node's value is the input of another node, an arrow goes from one to the other. In this particular example, the total number of layers is L = 6. The first layer (ℓ = 1) has only two elements (dark gray nodes), N_ℓ = 3 for ℓ = 2, 3, . . . , L, and the last layer (not counted in L) has a single neuron (only one solution). Bias is also considered (light grey nodes): there is a bias node in each layer, which has a value equal to one and is only connected to the nodes of the next layer. Although the number of nodes can be different for each layer, the same number has been employed in this paper for simplicity. For the residual network, the derivatives of f̃ are obtained by applying automatic differentiation. The code has been implemented in Python using TensorFlow [40], currently one of the most popular and well-documented open-source libraries for machine learning computations.
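The forward pass of such a feed-forward network can be sketched in a few lines (NumPy; the weight values are random placeholders). As a consistency check, the configuration with three hidden layers of 20 neurons each yields exactly the 921 parameters of the network used in the experiments:

```python
import numpy as np

def forward(x, weights, biases):
    """Feed-forward pass: tanh on hidden layers, linear single-output layer."""
    U = x
    for W, b in zip(weights[:-1], biases[:-1]):
        U = np.tanh(W @ U + b)               # X = W U + b, then U = phi(X)
    return weights[-1] @ U + biases[-1]      # output neuron (no activation)

rng = np.random.default_rng(1)
sizes = [2, 20, 20, 20, 1]                   # 2 inputs, 3 hidden layers of 20, 1 output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

u = forward(np.array([0.5, -0.5]), weights, biases)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
assert u.shape == (1,)
assert n_params == 921   # matches the parameter count of the 4-layer network
```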

Optimization algorithm
Deep neural networks (see Eqs (4.1)-(4.4)) are trained iteratively, updating the coefficients of the neurons by minimizing the error (see Eq (3.3)) between the target values of the inputs and the predicted values. Thus, the network parameters can be obtained by minimizing the loss function as follows:

W_b* = argmin_{W_b} Loss(W_b),

where the vector W_b = [W, b] represents the total set of unknown weight and bias parameters.
There are many algorithms used to solve this minimization problem. Moreover, the accuracy of the final numerical solution significantly depends on the residual loss of the chosen optimization algorithm. In this work, we use the Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bound constraints (L-BFGS-B) method [41]. It is an optimization algorithm in the family of quasi-Newton methods using a limited amount of computer memory. The algorithm starts with an initial estimate of the optimal value, W_b^0 = [W^0, b^0], and proceeds iteratively to refine that estimate with a sequence of better estimates W_b^1, W_b^2, . . . , until a tolerance is reached. We remark that each iteration requires one or more evaluations of the loss function.
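The training call can be sketched with SciPy's L-BFGS-B implementation; here a standard smooth test function (the Rosenbrock function) stands in for the loss of Eq (3.3), since the actual method minimizes the network loss over all weights and biases:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in loss: the Rosenbrock function replaces the network loss of
# Eq (3.3) purely to illustrate the optimizer call.
def loss(Wb):
    return (1.0 - Wb[0]) ** 2 + 100.0 * (Wb[1] - Wb[0] ** 2) ** 2

Wb0 = np.zeros(2)                            # initial estimate W_b^0
res = minimize(loss, Wb0, method="L-BFGS-B")
assert np.allclose(res.x, [1.0, 1.0], atol=1e-3)   # refined estimate W_b*
```

In the actual framework, the gradient of the loss is supplied by backpropagation rather than by the finite differences SciPy falls back to here.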
Similar to the f̃ function, the required gradients for the optimization process are obtained using the backpropagation provided by TensorFlow.

Results
Now, we present several numerical tests to analyze the applicability and accuracy of the proposed neural network on elliptic equations without any singular force. These tests allow us to analyze in detail the DNN predictions without the IBM. In this section, we include the following three examples:
• Example 1 studies the Laplace equation with a smooth solution. We perform an analysis of the number of training and collocation points, as well as the number of hidden layers and neurons required to obtain accurate solutions. The predicted solution is initially calculated on rectangular domains and later applied to arbitrary ones. Finally, the predicted solutions are compared with the numerical solutions of standard methods such as finite differences and finite elements.
• Example 2 is designed to test the DNN using different boundary conditions. This example studies a mixed boundary value problem with Dirichlet and Neumann conditions. The network parameters are also studied here; however, we limit our analysis to rectangular domains.
• Example 3 analyzes the capacity of the DNN for elliptic equations with more complex configurations. This example studies the predicted solution of a Helmholtz equation with variable coefficients β(x, y) and κ(x, y) on a rectangular domain. Here we vary the number of training and collocation points as well as the network architecture.
For each case, analytical solutions are used to quantify the numerical error of the proposed formulation. Except in the particular sections where the DNN configurations are analyzed, in the rest of our experiments we always learn the latent solution u by training all 921 parameters of a 4-layer deep neural network: the first layer contains 2 neurons and each hidden layer contains 20 neurons. The hyperbolic tangent activation function is applied in these simulations. The experiments were executed on a server with an Intel Xeon E5-2690 CPU at 3.0 GHz, Ubuntu 18.04 (64-bit), 20 cores, 256 GB RAM and an NVIDIA Tesla K40 video card. The software was developed using the Python 3.6 programming language and the TensorFlow 1.14 library.

Example 1: Laplace equation
As a first example, we proceed by solving the two-dimensional Laplace equation with a smooth solution and Dirichlet boundary conditions,

∆u = 0 in Ω, u = u_0 on ∂Ω,

where u_0 is calculated from the analytical solution given by

u(x, y) = e^x cos(y).

The predicted solution is initially analyzed in the rectangular domain Ω = [−1, 1] × [−1, 1], and later on arbitrary domains.

Predicted solution on rectangular domains
The predicted solution is initially calculated in the domain Ω = [−1, 1] × [−1, 1] using an (N+1) × (N+1) rectangular grid with different numbers of subdivisions N. The total number of training points (N_u) is randomly chosen from the boundary data, and the collocation points (N_f) are randomly sampled using a space-filling Latin Hypercube Sampling (LHS) strategy inside the domain. Figure 3 shows the analytical and predicted solutions using N_u = 10 and N_f = 25. The results demonstrate that the proposed method accurately approximates the exact solution even for a small number of training and collocation points. This is expected, as previous works have shown that physics-informed neural networks give accurate results for smooth solutions of PDEs [9]. It is important to remark that the results in Figure 3 were obtained using the 36 points of the 6 × 6 rectangular grid, corresponding to N = 5 subdivisions in each direction of the domain. However, the predicted solution and the corresponding error measurements should be independent of this choice once the training has been performed. Table 1 shows the relative L2-norm and L∞-norm errors of the same latent solution ũ(x, y; W, b) using different grid resolutions N. Results demonstrate that there is no significant difference between the norm errors of two different grids, as expected.

Table 1. Example 1: Relative L2-norm and L∞-norm error between the predicted and the exact solution for N_u = 10 and N_f = 25, varying the grid resolution of the predicted solution.

The performance of the deep neural network is further analyzed for different numbers of training and collocation points. These numbers are chosen as N_u = 2N and N_f = N². The absolute errors are displayed in Figure 4 for N = 5, 10, and 20. Stable and accurate results are obtained in each case. As expected, the absolute errors decrease as the total number of training data is increased.
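Latin Hypercube Sampling can be sketched without external dependencies: each coordinate is stratified into N_f equal intervals and exactly one sample is drawn per interval (a minimal NumPy version; the paper's implementation may differ):

```python
import numpy as np

def latin_hypercube(n, rng):
    """n LHS samples in [-1, 1]^2: one point per stratum in each coordinate."""
    u = (rng.permutation(n) + rng.random(n)) / n   # stratified values in [0, 1)
    v = (rng.permutation(n) + rng.random(n)) / n
    return np.column_stack([2.0 * u - 1.0, 2.0 * v - 1.0])

rng = np.random.default_rng(42)
pts = latin_hypercube(25, rng)                     # e.g., N_f = 25 collocation points
assert pts.shape == (25, 2)
# Every one of the 25 x-strata receives exactly one sample:
strata = np.floor((pts[:, 0] + 1.0) / 2.0 * 25).astype(int)
assert len(set(strata)) == 25
```

Compared with uniform random sampling, this stratification spreads the collocation points over the whole domain, which is why it is a common space-filling choice for PINN collocation sets.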
Note that the largest errors are not necessarily located close to the highest values of the exact solution. Moreover, the maximum error is located in regions close to the boundary. Although Dirichlet boundary conditions are applied, non-zero errors can be expected at the boundaries. This is related to the minimization of the loss function and the DNN architecture, whose residual E_u may not be exactly zero. These errors are also related to the low number of training points and their random locations, selected anew for each test case. Table 2 shows the relative L2- and L∞-norm errors for different numbers of training and collocation points using the current 4-layer network architecture. In general, the prediction is more accurate as the total number of training data N_u increases. Note that for many collocation points N_f, such as 400 and 1600, the errors stop decreasing for N_u close to N, as shown in Figure 5. On the other hand, a small number of collocation points requires more training data to reach a similar precision, as expected. We remark that the performance of the neural network also depends on the selected configuration.

Table 2. Example 1: Relative L2-norm and L∞-norm error between the predicted and the exact solution for different boundary training data N_u and collocation points N_f.

Now, the DNN is analyzed for different architectures, including different numbers of hidden layers and neurons. We keep the same hyperbolic tangent activation function for all these simulations. Figure 6 shows the resulting relative L2-norm and L∞-norm for different numbers of hidden layers and different numbers of neurons per layer, while the total numbers of training and collocation points are kept fixed at N_u = 40 and N_f = 1600, respectively. We observe that accurate approximations are obtained for all the network architectures. Moreover, as the number of neurons is increased, the predictive accuracy increases; there is no significant change beyond 20 neurons.
Note that the number of layers seems to stabilize the accuracy in this example (not many fluctuations). The number of layers plays an important role in the computer power required by the algorithm, as shown in Table 3. Note that the solution using two layers and 20, 30 or 40 neurons is not only more accurate than the one using eight layers and 10 neurons; the first case also runs five times faster than the second one.

Predicted solution on arbitrary domains
The following experiments are designed to show the capacity of the DNN for solving elliptic equations on arbitrary domains. Two different domains are selected, as shown in Figure 7. The first domain is a circle of radius one and center (0, 0). The second one considers an arbitrary domain inspired by a realistic configuration in flows such as rivers, fluvial or estuary zones. The boundaries of the square are deformed by cutting the sides using two circles of radius 1 and 1/2 located at the top-right and top-left corners, and an ellipse of major and minor axes equal to 7/4 and 1 at the bottom-right corner. The training and collocation points are located at the vertex points of unstructured meshes. The number of training points is N_u = 40 for all cases, and the number of collocation points is N_f = 1126 and 512 for the circle and the general domain, respectively. We also approximate the solution using the same 4-layer deep neural network used for the rectangular domains. As expected, the results demonstrate that the proposed method accurately approximates the exact solution. Note that the largest errors are located in regions close to the boundary. Although not shown here, more collocation points do not contribute to obtaining smaller errors. However, the behavior of the error is related not only to the number of training points, but also to their specific locations. This is because the arbitrary geometry of the problem plays an important role in the approximation: if there is no boundary information in some particular region, then the error becomes larger there. For instance, this can be observed at the top boundary of the general case. The DNN is also analyzed for different numbers of training points. Table 5 shows the relative L2- and L∞-norm errors for the two cases. The solution is calculated by fixing N_f on Mesh 1 and varying N_u. The errors between the predicted and analytical solutions are calculated using Mesh 2.
Results are more accurate as N_u increases, until reaching the maximum precision. This is expected, as more data at the boundary describes the irregular configuration of the domain more precisely. Note that more data N_u is required to obtain accurate solutions as the complexity of the domain increases. Finally, it is important to remark that the maximum precision for all domains is similar, including the rectangular domain, as shown in Figure 9. This means that if enough information is provided by the training and collocation points, then the DNN can approximate the solution equally well independently of the domain shape.

Accuracy
As previously commented, there is a maximum precision of the predicted solution. Thus, the algorithm does not obtain more accurate solutions either by increasing N_u or N_f or by changing the network architecture. Hence, the main source of error comes from other components of the neural network approximation, such as the loss configuration, the activation function or the optimization algorithm. In the current simulations, the minimum norm errors are close to 10⁻⁴ and 5 × 10⁻⁴ using the L2- and L∞-norm, respectively. These results hold for both rectangular and arbitrary domains; see Tables 5 and 9.

Table 5. Example 1: Relative L2-norm and L∞-norm error between the predicted and the exact solution for different boundary training data N_u.

In order to get an idea of how accurate the DNN approximation is, we compare our results with the numerical solutions of standard schemes such as the finite-difference (FDM) and finite-element (FEM) methods. First, we compare our results on rectangular domains with the numerical solution of the standard second-order central FDM, as shown in Table 6. An estimated order is computed as Order = log(E_{N1}/E_{N2}) / log(N2/N1), where E_{N1} and E_{N2} are the errors for resolutions N1 and N2, respectively. Note that the accuracy of the deep neural network is close to the one given by the FDM using a 15 × 15 mesh grid. The neural network solution does not reach a precision similar to a second-order accurate method. This is a clear disadvantage of this methodology. However, the number of known data required at the boundary is significantly lower. Moreover, this prior data is not fixed at predetermined locations. This is definitely an advantage of this methodology that is not found in standard methods such as the finite-difference method. Furthermore, the deep neural network framework can be applied either to rectangular or arbitrary domains without any extra effort.
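The estimated order can be computed directly from two error measurements, assuming the error scales as h^p with h ∝ 1/N (so that p = log(E_{N1}/E_{N2}) / log(N2/N1)):

```python
import math

def estimated_order(E1, E2, N1, N2):
    # Error ~ h^p with h ~ 1/N, hence p = log(E1/E2) / log(N2/N1).
    return math.log(E1 / E2) / math.log(N2 / N1)

# A second-order scheme halves h (doubles N) and cuts the error by 4x:
assert abs(estimated_order(4.0e-3, 1.0e-3, 10, 20) - 2.0) < 1e-9
```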
We also compare the accuracy of the DNN over the proposed circular domain with the numerical solution of the FEM with linear elements, as implemented by Alberty et al. [44]. The minimum DNN errors for this case are 1.05 × 10⁻⁴ and 6.07 × 10⁻⁴ using the L2- and L∞-norm, respectively (see Table 5). Table 7 shows the same norms for the FEM using triangular meshes of different mean interior lengths (∆). Note that the accuracy of the DNN is close to the one given by the FEM using Mesh 1. As previously commented, the neural network solution does not have the consistency property of a scheme such as the proposed FEM; thus the DNN errors do not continue decreasing as more points are used, which is again a disadvantage of this methodology. However, the prior known data used here is not limited to a well-distributed triangular discretization over the whole domain as in the FEM. This is a notable advantage of the DNN methodology in comparison with standard methods for arbitrary geometries such as the finite-element or finite-volume method. Although the results are not shown here, we reach similar conclusions when comparing errors between the DNN and the FEM for the general domain.
To improve the precision of the algorithm, we analyze the performance of the neural network by optimizing the error using different loss functions during training. Several loss functions were tested; however, most of them do not converge to a feasible solution. Overall, only a few of them give feasible solutions, including the present mean square error (MSE), the error based on the L1-norm, and the logarithm of the hyperbolic cosine (LogCosH) [42]. Although the results are not presented here, there is no significant difference between the results of these loss functions. On the other hand, the optimization algorithm plays a more significant role. Let us consider the case with N_u = 80, N_f = 6400 and the initial 4-layer network. After 7.77 seconds of training with L-BFGS-B, the numerical solution is obtained with errors of 3.16 × 10⁻⁴ and 1.50 × 10⁻³ for the L2- and L∞-norm, respectively. If we change the optimizer to Sequential Least Squares Quadratic Programming (SLSQP) [43], the L2- and L∞-norm errors are significantly reduced to 2.32 × 10⁻⁵ and 1.33 × 10⁻⁴, respectively. However, the training time is around 343.25 seconds, which means that the SLSQP optimizer is almost 45 times slower than the L-BFGS-B method.

Example 2: Laplace equation with mixed boundary conditions

For this example, Neumann conditions are used on the east edge of the square, while Dirichlet conditions are applied at all other boundaries. Both Neumann and Dirichlet training points (N_un and N_u) are randomly selected from the boundary data. The exact solution of the considered problem is given in [45] as:

u(x, y) = sinh(πy/2) sin(πx/2) / sinh(π/2). (5.8)

Results using N_un = 30 (Neumann) and N_u = 10 (Dirichlet) training points are plotted in Figure 10. We apply 1600 collocation points and the same DNN architecture as in the previous examples: 3 hidden layers and 20 neurons per layer. As expected, the proposed method approximates the exact solution accurately. Note that the maximum error is located close to one of the bottom corners, where Dirichlet boundary conditions are applied. Thus, the Neumann boundary approximations are more accurate than the Dirichlet ones. Results also show that the errors do not significantly decrease as N_un increases, as shown in Figure 11. Although the results are not shown here, a similar behavior between Neumann and Dirichlet conditions was also observed using different elliptic equations, such as the Laplace equation with exact solution u(x, y) = e^x cos(y) of Example 1. Errors for different numbers of training points are reported in Table 8. For this example, the minimum L2- and L∞-norm errors are 2.95 × 10⁻⁴ and 6.02 × 10⁻⁴, respectively. They are obtained using N_u = 160 and N_un = 10. We remark that the performance of the DNN is not significantly improved with a larger number of collocation points. Table 9 shows the errors for N_u = 160 but with N_f = 6400 instead of the N_f = 1600 used in Table 8. Note that both tables give similar approximations. Results also confirm that only a few Neumann training points are required to reach the maximum accuracy of the method. For instance, accurate results are obtained using N_u = 160 and only N_un = 5 training points.
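The loss variants compared earlier (MSE, L1-norm, and LogCosH) can be sketched as functions of a residual vector; this is a minimal NumPy sketch, not the paper's training code:

```python
import numpy as np

def mse(r):
    """Mean squared error of the residual vector r."""
    return np.mean(r**2)

def mae(r):
    """L1-type loss: mean absolute error of the residual."""
    return np.mean(np.abs(r))

def log_cosh(r):
    """Logarithm of the hyperbolic cosine; behaves like r**2 / 2 for
    small residuals and like |r| - log(2) for large ones."""
    return np.mean(np.log(np.cosh(r)))

r = np.array([0.1, -0.2, 0.05])  # example residuals
print(mse(r), mae(r), log_cosh(r))
```

The near-quadratic behavior of LogCosH for small residuals is one reason it often trains similarly to MSE, consistent with the observation that the tested losses give comparable results.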
The performance of the DNN is also analyzed in terms of the number of hidden layers and neurons, and the results are shown in Table 10. We keep the same hyperbolic tangent activation function and fix N_un = 10, N_u = 160 and N_f = 1600 for all these simulations. Accurate solutions are obtained for all tested configurations.

Table 8. Example 2: Relative L2-norm and L∞-norm error for different numbers of Dirichlet and Neumann training data using N_f = 1600.

Table 9. Example 2: Relative L2-norm and L∞-norm error for different numbers of Dirichlet and Neumann training data using N_f = 6400.

Example 3: Elliptic equation with variable coefficients
For this test, we consider the elliptic equation with variable coefficients and homogeneous Dirichlet boundary conditions over the region Ω = [−1, 1] × [−1, 1] given by

(β(x, y)u_x)_x + (β(x, y)u_y)_y + κ(x, y)u = g(x, y), (x, y) ∈ Ω,
u(x, y) = 0, (x, y) ∈ ∂Ω, (5.9)

where β(x, y) = sin(x + y) and κ(x, y) = −(x² + y²). In this example, g(x, y) is selected according to the exact solution given by

u(x, y) = sin(πx) sin(πy). (5.10)

Note that κ is always negative; thus this equation corresponds to a monotone Helmholtz-type equation with variable coefficients. The diffusion coefficient was selected such that β varies at each point of the domain. The analytical solution, predicted solution, and absolute errors using N_u = 40, N_f = 1600 and the current 4-layer network architecture are displayed in Figure 12. As expected, accurate predictions are also obtained for elliptic equations with variable coefficients. Note that the maximum error is located at the top-left corner, where there are no training points. In comparison with Example 1, the predicted solution of this example requires more training points at the boundary to obtain accurate solutions. Table 11 shows the relative L2- and L∞-norm error for different numbers of training and collocation points.

Table 11. Example 3: Relative L2-norm and L∞-norm error for different boundary training data and collocation points using 3 hidden layers and 20 neurons per layer.

As expected, the prediction becomes more accurate as the total number of training data and collocation points is increased. In contrast to Example 1, inaccurate approximations are obtained using only 25 collocation points. One of the most accurate solutions is obtained using N_u = 40 and N_f = 1600; the errors are 7.44 × 10⁻⁴ for the L2-norm and 2.6 × 10⁻³ for the L∞-norm. On the other hand, Table 12 shows the performance of the DNN using different numbers of hidden layers and neurons.
We keep the same hyperbolic tangent activation function and fix N_u = 40 and N_f = 1600 for all these simulations. Note that accurate solutions are obtained even with 2 hidden layers and 10 neurons. There is no significant improvement of the solution if we increase these parameters. However, precision may be lost if too many layers are used, as in the results obtained using eight hidden layers and N_e = 30 neurons per layer.

Table 12. Example 3: Relative L2-norm and L∞-norm error for different numbers of hidden layers and neurons per layer using N_u = 40 and N_f = 1600.
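In Example 3 the right-hand side g is manufactured from the exact solution. A quick numerical sanity check of that construction (our own sketch, not the paper's code) applies the operator to u both analytically and by nested central differences:

```python
import numpy as np

# Example 3 data: beta(x, y) = sin(x + y), kappa(x, y) = -(x**2 + y**2),
# exact solution u(x, y) = sin(pi x) sin(pi y).
beta  = lambda x, y: np.sin(x + y)
kappa = lambda x, y: -(x**2 + y**2)
u     = lambda x, y: np.sin(np.pi * x) * np.sin(np.pi * y)

def g_exact(x, y):
    """g = (beta u_x)_x + (beta u_y)_y + kappa u, expanded by hand:
    beta_x u_x + beta_y u_y + beta (u_xx + u_yy) + kappa u."""
    s, c = np.sin(np.pi * x), np.cos(np.pi * x)
    t, d = np.sin(np.pi * y), np.cos(np.pi * y)
    flux = np.cos(x + y) * np.pi * (c * t + s * d)   # beta_x u_x + beta_y u_y
    diff = -2.0 * np.pi**2 * np.sin(x + y) * s * t   # beta (u_xx + u_yy)
    return flux + diff + kappa(x, y) * u(x, y)

def g_fd(x, y, h=1e-4):
    """Second-order central-difference evaluation of the same operator."""
    bux = lambda a, b: beta(a, b) * (u(a + h, b) - u(a - h, b)) / (2 * h)
    buy = lambda a, b: beta(a, b) * (u(a, b + h) - u(a, b - h)) / (2 * h)
    div = ((bux(x + h, y) - bux(x - h, y)) / (2 * h)
           + (buy(x, y + h) - buy(x, y - h)) / (2 * h))
    return div + kappa(x, y) * u(x, y)

print(g_exact(0.3, -0.2), g_fd(0.3, -0.2))  # the two values should agree closely
```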

Results with singular forces
In this section, we present several numerical tests to analyze the applicability and accuracy of the proposed DNN with the IBM on elliptic equations with singular forces. As described in Section 2, this kind of example includes a singular term on the right-hand side of the elliptic equation which allows solutions with discontinuous derivatives. Thus, the solution of the problem is continuous at the interface Γ; however, its derivative is discontinuous in the normal direction. Moreover, the magnitude of this discontinuity is known and is given by the jump condition C(s) = [∂u/∂n]_Γ. Next, we describe the predicted solutions obtained after performing simulations for different numbers of training and collocation points. Discussion of the experiments is presented in the following two examples:

• Example 4 presents a Poisson equation with a circular interface Γ. In this example, the derivative in the normal direction is discontinuous with a constant jump. The predicted solution is analyzed not only in terms of N_u and N_f but also of the number of interface points, N_s. Moreover, an analysis of Peskin's parameter h is performed. Rectangular and arbitrary domains are tested here. Comparisons between predicted solutions and numerical solutions of standard methods are also given at the end of this example.
• Example 5 is designed to test the capacity of the DNN with the IBM in more challenging test cases. Here, the interface has a bubble shape and the jump condition is not constant.

Example 4: Poisson equation with a circular interface
This example deals with an elliptic equation with a singular source, given by Eq. (6.1), where Γ is the circle (x − x0)² + (y − y0)² = r0². Dirichlet boundary conditions are applied according to the exact solution, given by Eq. (6.2), where r = √((x − x0)² + (y − y0)²). Note that the solution is continuous at the interface. However, the derivative in the normal direction is discontinuous, satisfying the constant jump condition [∂u/∂n]_Γ = 1/r0 = 2, where n is the outward unit normal vector at the interface.
An arc-length parametrization is considered for the interface. For the interface discretization, we take N_s points on Γ with ∆s = (2π/N_s) r0. It is important to remark that we do not have a direct reference to set Peskin's parameter h. In the standard IBM based on finite differences, h = ∆x has been shown to be a proper choice [38], where ∆x is the step size of the discretization. However, there is no mesh grid in the DNN; instead, we have a set of points randomly distributed in the domain. To deal with this issue, h is initially chosen according to ∆s, which depends on the number of interface points. Thus, the forcing function becomes sharper as more points on the interface are used. An analysis of the selection of h is given later in this section.
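The regularized singular force can be sketched as follows. We use Peskin's cosine-form smoothed delta with support [−2h, 2h], a common choice in the IBM literature; the paper's exact kernel may differ, so treat this as an assumption:

```python
import numpy as np

def delta_h(x, h):
    """Peskin-style regularized 1D delta with support [-2h, 2h]
    (cosine form; an assumption, the paper may use a different kernel)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 2 * h,
                    (1.0 + np.cos(np.pi * x / (2 * h))) / (4 * h),
                    0.0)

def singular_force(x, y, Xs, Ys, C, ds, h):
    """Approximate  integral over Gamma of C(s) delta(x - X(s)) delta(y - Y(s)) ds
    by a sum over interface markers with arc-length spacing ds."""
    return np.sum(C * delta_h(x - Xs, h) * delta_h(y - Ys, h)) * ds

# Circular interface of Example 4: r0 = 1/2, constant jump C = 1/r0 = 2.
N_s = 80
r0 = 0.5
theta = 2 * np.pi * np.arange(N_s) / N_s
Xs, Ys = r0 * np.cos(theta), r0 * np.sin(theta)
ds = (2 * np.pi / N_s) * r0   # arc-length spacing of the markers
h = ds                        # the choice analyzed in this section
C = 2.0 * np.ones(N_s)

print(singular_force(0.5, 0.0, Xs, Ys, C, ds, h))  # sharp peak on the interface
```

This forcing term is what is added to the PDE residual at each collocation point; far from the interface it vanishes exactly because of the compact support of the kernel.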

Predicted solution on rectangular domains
For the following simulations, we consider Ω = [−1, 1] × [−1, 1] and the circumference with centre (x0, y0) = (0, 0) and radius r0 = 1/2. We select a number N as a reference resolution of the problem. Thus, the number of training points N_u = N is randomly chosen from the boundary data, and the number of collocation points N_f = N² is sampled using a space-filling Latin hypercube strategy inside the domain, as previously done in Example 1. The predicted solution is always evaluated at the points of the (N + 1) × (N + 1) rectangular grid. For the interface points, we take N_s = N. In the numerical experiments, we have found that beyond this point, increasing the number of points on the interface gives little improvement in the solution. For the discrete delta function, we take h = ∆s. The analytical solution, predicted solution, and forcing function for N = 80 are shown in Figure 13, obtained using the same 4-layer DNN of the previous examples and h ≈ 0.04. The results demonstrate that the proposed method approximates the exact solution accurately. Figure 14 shows the results for three sets of points corresponding to N = 40, 80 and 160. Note that the IBM initially dominates the global accuracy of the method because the largest errors are located in regions close to the interface. However, as more points are used, the errors on the interface become smaller and the global error is mainly determined by the number of training points and their random location at the boundary. A comparison of the predicted and exact solutions at y = 0 is presented in Figure 15(a). Note that the DNN approximates the solution close to the interface more accurately as more points are considered. Figure 15(b) presents the norm errors for resolutions ranging from N = 10 to N = 160. The effect of the IBM and the DNN on the global accuracy can be clearly noticed here: the behavior of the error before and after N = 90 is different.
The first section has a smoothly decreasing tendency; meanwhile, the second part becomes stable and oscillating, as reported in the previous examples without an interface. This means that the error will not significantly decrease even if more points are considered. Moreover, results show that the resolutions in the first section follow a first-order approximation, like the standard IBM based on finite differences, as shown in Table 13. More details about these errors can be found in Table 14. Results show that the error stabilizes at values close to 2 × 10⁻³ and 2 × 10⁻² for the relative L2- and L∞-norm, respectively. Note that h = 0.3141 (N = 10) does not produce an accurate approximation to the problem because the forcing function is not sharp enough. On the other hand, if we keep increasing N, inaccurate solutions are obtained because h is too small. Finally, we remark that as the number of points is increased, more computational time is required in the training process. For this example, a single simulation takes around 100 seconds of training; compare this with the 6 seconds of CPU time used for the simulations of Example 1. The performance of the DNN with the IBM is also analyzed for different numbers of training and collocation points. Table 15 shows the relative L2-norm and L∞-norm error for this analysis using the 4-layer network architecture. The latent solution is calculated by fixing all parameters except N_u. Four cases are presented: N = 20, N = 40, N = 80 and N = 160. We remark that the number of interface points, N_s, and h are also fixed according to N. Results show that there is no significant difference in the errors for N_u ≥ N. Moreover, the maximum precision of the DNN is already reached for N = 80 (N_f = 6400, N_u ≥ 80). Note that the results are already very similar for N = 160 (N_f = 25600, N_u ≥ 80). We observe that although the errors are larger, the DNN still computes the solution of the differential equation correctly using only N_u = 10.
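The space-filling sampling of the N_f = N² collocation points used throughout these tests can be sketched with a minimal Latin hypercube sampler (our own plain-NumPy sketch; the paper likely uses a library implementation):

```python
import numpy as np

def latin_hypercube(n, d=2, seed=0):
    """Space-filling Latin hypercube sample of n points in [0, 1]^d:
    each axis is split into n equal bins and every bin is hit exactly once."""
    rng = np.random.default_rng(seed)
    pts = np.empty((n, d))
    for j in range(d):
        # one random point per bin, with the bin order randomly permuted
        pts[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return pts

# N = 80  ->  N_f = N**2 = 6400 collocation points in [-1, 1] x [-1, 1]
N = 80
X_f = 2.0 * latin_hypercube(N * N) - 1.0
print(X_f.shape)  # (6400, 2)
```

Compared with plain uniform sampling, this guarantees that no axis-aligned strip of width 1/n is left empty, which helps cover the domain with relatively few collocation points.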
Figure 16 shows the case for N = 80. Note that accurate results are obtained even though there is no boundary point on the west boundary. This is a clear advantage of this methodology, in particular when limited data are available. Now, we investigate the effect of the number of interface points and the size of h on the approximation. As discussed in Section 3.2, the interface points mainly serve to approximate the integral over the interface. In this case, the interface has a circular shape and few points are enough to obtain accurate approximations. Table 16 shows the error analysis varying the number of interface points. In all simulations, besides the number of training and collocation points, the parameter h is fixed according to the example resolution N by taking h = (2π/N) r0. Numerical results show that beyond N_s = N, increasing the number of points on the interface gives little improvement in the solution. As previously commented, there is no obvious choice for Peskin's parameter h. Figure 17 shows the predicted solution at y = 0 for different h values. It is clear that h = 0.2 is too big to capture the singularity close to the interface, and h = 0.006 is too small to give accurate results. Note that h equal to ∆x = 2/N and ∆s = π/N_s both give accurate results; however, the solution for h = ∆x fails to approximate values inside the interface. For more details, Figure 18 shows the norm error analysis obtained by varying h for different N cases (N_u = N, N_f = N², and N_s = N). Results confirm that the correct selection of h plays an important role in the final approximation. The error decreases as h becomes smaller; however, there is a breaking point where the simulations begin to give inaccurate results. As the DNN accuracy depends on the number of training and collocation points, both ∆x and ∆s look like plausible choices for h. However, once the minimum error of the DNN is reached, the choice h = ∆x is located in the region where the errors begin to increase, whereas h = ∆s is located closer to the minimum error.
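The two candidate choices for h discussed above differ only by a constant factor for this geometry, which is easy to tabulate (a small arithmetic sketch for Example 4, where r0 = 1/2 and N_s = N):

```python
import numpy as np

# Candidate choices for Peskin's parameter h in Example 4.
r0 = 0.5
for N in (40, 80, 160):
    dx = 2.0 / N                  # grid spacing of an N x N mesh on [-1, 1]^2
    ds = (2.0 * np.pi / N) * r0   # arc-length spacing of N interface markers
    print(N, round(dx, 4), round(ds, 4))
```

Since ∆s = (π/2) ∆x ≈ 1.57 ∆x here, taking h = ∆s keeps the regularized delta slightly wider than h = ∆x at the same resolution, which is consistent with h = ∆s sitting closer to the error minimum in Figure 18.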

Predicted solution on arbitrary domains
To demonstrate the capacity of the DNN and the IBM on arbitrary domains, let us consider the interface problem given by Eqs (6.1) and (6.2) with two irregular domains. First, the problem is solved in the circle of radius 1, with Γ a circle of r0 = 1/2 centred at (x0, y0) = (0, 0). Second, we use the general domain given in Example 2, with Γ the circle of r0 = 1/4 centred at (x0, y0) = (0.3, 0.3). In these simulations, N_f is selected according to Mesh 1 and N_u = 32. As in the previous example, we must decide how to choose N_s and h. In this case, they are selected according to the mean interior length of the triangular mesh; for Mesh 1, this value is close to ∆ = 0.0625. Thus, the number of interface points and the Peskin's parameter are chosen as N_s = 2/∆ = 32 and h = ∆, respectively. For the second case, we expect that half the value of h will give more accurate results, since the interface radius is only half that of the first case.
The predicted solutions and absolute errors for the two examples are shown in Figure 19. For h = ∆, large errors accumulate around the interface; thus the maximum error is due to the IBM. For the circular case, inaccurate solutions are obtained for h = ∆/2. On the other hand, for the second case, the absolute errors are larger than in the first case using the same h = ∆; moreover, the numerical solution is still accurate for h = ∆/2. Since the interface radius is reduced to half, these results are expected.
More details about the predicted solution can be found in Figure 20 and Table 17. The maximum precision of the DNN is reached with errors of 2 × 10⁻³ and 2 × 10⁻² for the relative L2- and L∞-norm, respectively. To get an idea of how accurate the approximation of the DNN is, we compare our results with the numerical solution of the standard IBM based on a finite-difference method on a rectangular domain. Table 18 shows the results presented by LeVeque and Li [38] for the same problem. For the discrete delta function method, they consider N_s points on the interface Γ, where N_s = 2/∆x = 2/∆y is also the number of uniform grid points in each direction, and h = ∆x. Note that the standard IBM for N = 20 is not accurate enough to reach the first order of accuracy of the model; however, the DNN is. Both formulations have similar accuracy for N = 40 and N = 80. However, once the limit of the accuracy of the predicted solution is reached, the classical approximation performs more accurately than the proposed method, as expected.

Example 5: Poisson equation with a bubble-shaped interface

Finally, we propose a new and more challenging example to test the proposed immersed boundary neural network. Here, the solution is continuous with a non-constant normal jump condition on the interface. Moreover, non-circular interfaces can be chosen with an available analytical solution. In this example, the interface geometry resembles the bubble-shape transition of a rising air bubble immersed in water [46, 47].
In this example, the Poisson equation is considered with a singular source term along Γ:

u_xx + u_yy = g + ∫_Γ C(s) δ(x − X(s)) δ(y − Y(s)) ds.

The right-hand side g is given piecewise; for r > r0 it takes the form

g = 12(x² + y²) − 4(a + b). (6.5)

The interface, Γ, corresponds to the closed curve

r = r0, (6.6)

where r is defined in (6.7) and

r0 = z0² − 2ab + (a + b)²/2. (6.8)

The right-hand side integral now contains a non-constant function C, which is defined according to the exact solution. We remark that the solution is continuous at the interface; however, the derivative in the normal direction is discontinuous, satisfying the jump condition

C(s) = [∂u/∂n]_Γ = √((4x³ − 2(a + b)x)² + (4y³ − 2(a + b)y)²). (6.9)

In this example, we fix the parameters a = 1/2 and b = 1/4. Note that different interface shapes can be obtained by varying z0. The curve can be parametrized as

(x(θ), y(θ)) = ( √(√r0 cos(θ) + (a + b)/2), √(√r0 sin(θ) + (a + b)/2) ), (6.10)

where 0 ≤ θ < 2π; note that this is not an arc-length parametrization. A square domain is selected with different interface configurations, and Dirichlet boundary conditions are applied according to the exact solution. In the present simulations, three different interfaces are used, corresponding to z0 = 0, 0.15 and 0.3. As the value of z0 increases, the round shape is lost and the interface develops a flat region, as shown in Figure 21. Thus, to correctly approximate the integral over Γ, we consider 160 points on the interface. The numbers of training and collocation points are set to N_u = 160 and N_f = 1600, respectively. The DNN architecture is fixed to 4 layers with 40 neurons per hidden layer. Figure 21 shows the numerical solution and absolute errors for the three different interfaces. In all simulations, the DNN is capable of correctly capturing the solution. Results also show that the numerical solution becomes less accurate as the interface becomes more irregular.
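The interface markers and the non-constant jump of Eq. (6.9) can be generated as sketched below. Two caveats: the constant r0 here follows our reading of the garbled Eq. (6.8) (the sign of the last term is chosen so that r0 stays positive), and the level-set form r = (x² − (a + b)/2)² + (y² − (a + b)/2)² is inferred from the θ-parametrization, so both are assumptions rather than the paper's verbatim formulas:

```python
import numpy as np

a, b = 0.5, 0.25  # fixed parameters of Example 5

def interface_points(z0, Ns=160):
    """Marker points on the bubble-shaped interface via the theta
    parametrization of Eq. (6.10); theta is NOT arc length here.
    r0 follows our reconstruction of Eq. (6.8)."""
    r0 = z0**2 - 2 * a * b + (a + b)**2 / 2   # equals z0**2 + (a - b)**2 / 2
    theta = 2 * np.pi * np.arange(Ns) / Ns
    x = np.sqrt(np.sqrt(r0) * np.cos(theta) + (a + b) / 2)
    y = np.sqrt(np.sqrt(r0) * np.sin(theta) + (a + b) / 2)
    return x, y

def jump_C(x, y):
    """Non-constant normal jump of Eq. (6.9)."""
    return np.sqrt((4 * x**3 - 2 * (a + b) * x)**2
                   + (4 * y**3 - 2 * (a + b) * y)**2)

x, y = interface_points(0.3)
print(jump_C(x, y).min(), jump_C(x, y).max())
```

Under these assumptions, every generated marker satisfies (x² − (a + b)/2)² + (y² − (a + b)/2)² = r0 by construction, and the jump values vary along the curve, illustrating why this case is harder than the constant-jump circle of Example 4.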
The global accuracy of the problem is affected by several parameters of the DNN formulation. First, as analyzed in the previous examples, the choice of h significantly contributes to the precision of the method, and we expect different h values for different interfaces; for instance, the Peskin's parameter for the first two cases is h = 1/40 = 0.025 and for the third case h = 1/80 = 0.0125. Second, the number of training points also plays an important role in the approximation of this example; in particular, larger errors are found for the cases with an interface close to the boundary. Note that more N_u points are used here than in the other examples. In the first two cases, the largest absolute errors are equally distributed between the interface and the boundary, which shows that the DNN recovers the solution of the problem with high precision. However, for the third case, the errors are larger and located in regions close to the boundary.
More details about the precision of the numerical solution can be observed in Figure 22, which shows the exact solution, predicted solution, and absolute error at x = 0.5 and y = 0.5 for z0 = 0.15. Note how the maximum error comes from the interface and boundary approximations, as expected; nevertheless, the numerical approximation fits the exact solution nicely.

Conclusions
In this paper we have analyzed the capacity of a deep learning framework for solving two-dimensional elliptic equations on arbitrary domains. The source term considers singularities with a delta function on an interface. This work follows the ideas of physics-informed neural networks to approximate the solutions on arbitrary domains and of the immersed boundary method to deal with the singularity. The results demonstrate that the proposed method accurately approximates the analytical solutions of elliptic problems with and without singularity. For singular forces, the choice of the Peskin's parameter significantly contributes to obtaining accurate solutions. As expected, the method becomes more accurate as the number of training and collocation points is increased. However, there is a limit to the precision of the method even if more points are included; this remaining error comes from other components of the approximation, such as the optimization algorithm. Although the numerical results are not as precise as those of classical methods, such as finite-difference schemes, one of the main advantages of this method is the low number of known data required at the boundaries and their arbitrary location. This flexibility cannot be obtained using finite-difference, finite-element or finite-volume methods. Furthermore, the deep neural network framework can be applied to arbitrary domains without any extra effort, as we analyzed in this paper. Future work involves further improvement of the present algorithm. We also plan to study the behavior of this methodology on elliptic equations with discontinuous solutions arising from discontinuities in the coefficients of the equation.