Artificial Neural Network Methods for the Solution of Second Order Boundary Value Problems

: We present a method for solving partial differential equations using artificial neural networks and an adaptive collocation strategy. In this procedure, a coarse grid of training points is used at the initial training stages, while more points are added at later stages based on the value of the residual at a larger set of evaluation points. This method increases the robustness of the neural network approximation and can result in significant computational savings, particularly when the solution is non-smooth. Numerical results are presented for benchmark problems for scalar-valued PDEs, namely Poisson and Helmholtz equations, as well as for an inverse acoustics problem.


Introduction
Artificial neural networks (ANNs) have been a topic of great interest in the machine learning community due to their ability to solve very difficult problems, particularly in the fields of image processing and object recognition, speech recognition, medical diagnosis, etc. More recently, applications have been found in engineering, especially where large data sets are involved. From a mathematical point of view, neural networks are also interesting due to their ability to efficiently approximate arbitrary functions [Cybenko (1989)]. A natural question is to determine whether ANNs can be used to approximate the solution of partial differential equations which commonly appear in physics, engineering and mathematical problems. Several articles and even a book [Yadav (2015)] have been very recently devoted to this topic. In most of the approaches considered, a collocation-type method is employed which attempts to fit the governing equations and the boundary conditions at randomly selected points in the domain and on the boundary. Among these methods we mention the Deep Galerkin Method [Sirignano and Spiliopoulos (2018)], Physics Informed Neural Networks [Raissi, Perdikaris, and Karniadakis (2019)], as well as the earlier works in Lagaris et al. [Lagaris, Likas and Fotiadis (1998) ;Lagaris, Likas and Papageorgiou (2000); van Milligen, Tribaldos and Jiménez (1995); Kumar and Yadav (2011); McFall and Mahan (2009)]. This method appears to produce reasonably accurate results, particularly for high-dimensional domains [Han, Jentzen and Weinan (2018)] and domains with complex geometries [Berg and Nyström (2018)], where the meshfree character of these methods makes them competitive with established discretization methods. Another related approach is to use an energy minimization formulation of the governing equation as in Weinan et al. [Weinan and Yu (2018); Wang and Zhang (2019)]. This formulation has the advantage that only the first derivatives need to be computed for a 2 nd order problem, however it requires a more precise integration procedure and not all governing equations can be cast in an energy-minimization framework. In this work, we employ a collocation formulation for solving 2 nd order boundary value problems such as Poisson's equation and Helmholtz equation. Different from existing methods which typically use a randomly scattered set of collocation points, we present an adaptive approach for selecting the collocation points based on the value of the residual at previous training steps. This method can improve the robustness of collocation method, particularly in cases when the solution has a non-smooth region where increasing the number of training points is beneficial. The paper is structured as follows: in Section 2 we give an overview of artificial neural networks and briefly discuss their approximation properties. The application of ANNs to forward and inverse boundary-value problems is discussed in Section 3. Detailed numerical results are presented in Section 4, followed by concluding remarks.

Structure of neural network
In this section, we briefly describe the anatomy of a neural network and its properties for approximating arbitrary functions. We focus in particular on simple feed-forward neural networks which are used in the subsequent sections.

Feed-forward networks
A feed-forward network can be seen as a computational graph consisting of an input layer, an output layer and an arbitrary number of intermediary hidden layers, where all the neurons (units) in adjacent layers are connected with each other. It can be used to represent a function : ℝ → ℝ by using n neurons in input layer and m neurons in the output layer, see Fig. 1. We index the layers, starting with the input layer at 0, and the output layer as , and we denote the number of neurons in each layer by 0 = , 1 , … , = . To each connection between the i-th neuron in layer − 1 and the j-th neuron in layer , with 0 < ≤ , we associate a weight and to each neuron in the layers 0 < ≤ we associate a bias , = 1, … , . Moreover, we define an activation function : ℝ → ℝ between the layers and − 1. Then the values at each neuron can be written in terms of the activation function applied to a linear combination of the neurons in the previous layer given by the corresponding weights and biases, i.e., This can be written more compactly in matrix form as: where is a matrix of weights corresponding to the connections between layers − 1 and , = [ ] and = [ ] are column vectors and the activation function is applied element-wise. Existing computational frameworks such as Tensorflow or Pytorch perform very efficient parallel execution, also on GPUs if available, of computational graphs like the ones defined by (2). Moreover, the input values can be defined as multi-dimensional arrays (tensors) and the computation of the corresponding outputs is efficiently vectorized and distributed across the available computational resources.
A key observation is that, for a given a neural network, the partial derivatives of the outputs with respect to the weights and biases can also be efficiently computed by the backpropagation algorithm. The idea is to perform the chain rule starting with the last layer and store the intermediary values in a computational graph where the order of the layers is reversed. The back-propagation algorithm enables the application of gradient-based minimization algorithms were a loss function based on the output of the neural network is to be minimized. Moreover, the partial derivatives of the outputs with respect to the inputs or with respect to some other prescribed parameters can be calculated in a similar way. In typical applications of deep learning, a neural network is trained on a set of matching inputs and outputs by seeking to minimize the difference between the predicted values and some known correct outputs. This often requires large data sets that often must be manually processed and are themselves subject to different types of errors. We avoid this by defining a loss function which minimizes the residuals of the governing equations at a chosen set of training points. The training points can be simply generated as randomly scattered points in the domain as in Raissi et al. [Raissi, Perdikaris and Karniadakis (2019)]. In this work, we adopt instead an adaptive procedure which iteratively adds more points to the training set where the residual values are higher than some prescribed threshold, as detailed in Section 3.
Aside from the choice of the size of the neural networks (the number of hidden layers and the number of neurons in each layers) and that of the training points, other important parameters are related to the selection of the activation function and the choice of the minimization algorithm. Typical activation functions used are ramp functions like ReLU, sigmoid (logistic) function, and the hyperbolic tangent function (tanh). In this work, we use the tanh activation function which is preferable due to its smoothness. For optimization, we use the Adam (adaptive momentum) optimizer which based on stochastic gradient descent followed by a quasi-Newton method (L-BFGS) which builds an approximated Hessian at each gradient-descent step.

Theoretical approximation properties of neural networks
It is well-known that artificial neural networks have very good approximation properties when the function to be approximated is continuous. This has been established since the 1980s, where it was shown (Hornik, Stinchcombe and White 1989) that under mild assumptions on the activation function, a given function ( ) can be approximated within any chosen tolerance by a network with single hidden layer and a finite number of neurons. It was later shown, e.g., [Lu, Pu, Wang et. al. (2017)] that by choosing an non-linear activation function and forming deeper networks, the number of neurons can be significantly reduced. Various estimates of the approximation properties of neural networks for approximating PDEs have been more recently derived [Sirignano and Spiliopoulos (2018)]. We note however, that while the neural networks are in theory capable to represent complex functions in a very compact way, finding the actual parameters (weights and biases) which solve a given differential equation within a given approximation tolerance can be quite difficult. In the following, we present a method for selecting the training data which gives some control over the performance of the solver and optimization algorithms.

Collocation solver for PDEs
In the collocation method, the main idea is to define a loss function which is based on the strong form of the governing equations and the boundary conditions. The loss function is evaluated at chosen sets of points in the interior on the domain as well as on the boundary. More specifically, let us assume that a general boundary value problem can be written as: where Ω ∈ ℝ is the problem domain with boundary Ω, ℒ, are interior and boundary differential operators, and , are prescribed functions (e.g., loading data).

Poisson equation
We consider equations of the form: where ( ) is the unknown solution, Ω ⊂ ℝ is the computational domain, ≥ 0 is a given constant, is the given source term, � is the prescribed Dirichlet data on the Dirichlet boundary Ω , is the outer normal vector, and is the Neumann data. When = 0, the problem is a standard Poisson equation; we also consider a more general setting when e.g., = 1. We define a loss function, ℂ( ; , , �, ) using a mean squared error (MSE) for the interior and boundary governing equations as: where , and represent the number of points ( * , * ), � , �, and ( , ) in the interior domain, on the Dirichlet boundary and the Neumann boundary respectively. Moreover, and represent penalty terms for the Dirichlet and Neumann boundaries. While in some cases it is enough to set = = 1, the convergence of the loss function can be improved by increasing one or both of these values to ensure that the boundary conditions are satisfied. During the minimization process, the terms in the cost function can be monitored and usually more accurate results can be obtained when the loss for the interior points is of the same order as the losses corresponding to the Dirichlet and Neumann boundaries.
(8) Here ( , ) can be complex-valued function. This equation is a time-independent form of the wave equation and it has applications in the study of various physical phenomena, such as acoustics, seismology and electromagnetic radiation. For many problems, the domain Ω is not bounded and the solution ( , ) can be highly oscillatory, which creates difficulties in standard finite element analysis. Different types of boundary conditions, usually of the Neumann type can be imposed depending on the problem. As for Poisson's equation, we construct a loss function which seeks to minimize the residual of the governing equation at collocation points. While in theory neural networks are defined in ℝ , where n is the number of neurons in the input layer and can capture unbounded domains, we limit ourselves here to finite domains.

Inverse problems
In this class of problems, one is given a particular solution * ( ), which satisfies the governing equations of the form: ℒ( * ( ); * ) = ( ) for ∈ Ω, (9) ( * ( ); * ) = ( ) for ∈ Ω, (10) where * is unknown. The problem is then to determine the unknown parameter or vector of parameters * . This can be reformulated as an optimization problem, where we start with an initial guess , and we approximate the solution which satisfies the governing equations. In the framework of neural networks, we can use gradient descent to minimize ‖ * − ‖ under some suitable norm. Formally, we define a cost function of the form: where * are interior collocation points, are boundary collocation points, is a penalty term for enforcing boundary conditions. This method seeks to find both and λ simultaneously which approach * and * respectively. In this work, we investigate applying this method to the Helmholtz equation where the wave number k is unknown.

Adaptive collocation
Several methods can be proposed to select the collocation points in the interior and the boundary of the domain. Here we propose a method where we start with a coarse grid and then select additional points based on the evaluation of the residual. We apply this method for the selection of the interior points, for which evaluating the governing equations requires the most computational effort, although in principle it can also be applied to the boundary points. The main idea of the method is described in Fig. 2. The blue dots represent the training (collocation points) in the interior, the green dots represent the model evaluation points after the first training is complete, and the purple dots represent the additional training points. We note that evaluating the model at a larger number of points is quite inexpensive computationally, while the number of training points impacts the performance much more significantly as the governing equations need to be evaluated at the training points at each gradient descent step. Therefore, this method provides a criterion for selecting the collocation points in an efficient manner. Once the training is completed for one step, the network weights and biases can be carried over to the subsequent step, resulting in faster convergence. The exact solution of this equation is ( , ) = sin(2 ) cos(2 ). We consider a loss function of the form: where ( * . * ) are interior collocation points, and ( , The results obtained for a shallow network with one hidden layer of 10 neurons are shown in Figure 3. The blue dots represent the interior collocation points, while the red and green dots represent the points corresponding to the Dirichlet and Neumann boundary conditions respectively. We note that even for this simple network with 41 parameters (30 weights and 11 biases), an accurate solution can be obtained. Because the solution is smooth throughout the domain, the training points are generally evenly distributed. However, more points are selected near the corners and boundaries since the residuals and the actual errors are higher there. The relative 2 errors obtained by increasing the number of layers while keeping the number of neurons per layer fixed and using the same refinement strategy are shown in Tab. 1. It can be observed that except for the single-layer network, the error decreases significantly as more training points are used. Moreover, the error for deeper networks is greatly reduced compared to the single-layer network, although the number of parameters and computational cost increases as well.

Source problem on a quarter-annulus domain
We now consider a 2 nd order problem with pure Dirichlet boundary conditions on a nonrectangular domain. The governing equation is given by: −Δ ( , ) + ( , ) = ( , ), for ( , ) ∈ Ω ( , ) = 0, for ( , ) ∈ Ω, where Ω is quarter of an annulus located in the first quadrant and centered at the origin, with inner radius = 1 and outer radius = 4. Here ( , ) is chosen such that the exact solution is ( , ) = ( 2 + 2 − 1)( 2 + 2 − 16) sin( ) sin( ), i.e., ( , ) = (3 4 − 67 2 − 67 2 + 3 4 + 6 2 2 + 116) sin( ) sin( ) + (68 − 8 3 − 8 2 ) cos( ) sin( ) + (68 − 8 3 − 8 2 ) cos( ) sin( ). As before, we define a cost function consisting of term corresponding to the interior governing equation and another term corresponding to the boundary conditions. To ensure that the boundary conditions are satisfied during the minimization process, we apply a penalty factor = 100, so that the cost function becomes: We first choose a neural network with 2 hidden layers with 10 neurons each and the tanh activation function. The initial set of collocation points consists of = 19 2 points in the interior and = 168 points on the boundary, spaced uniformly as shown in Fig. 4. Subsequent refinements are done according to the same procedure as in the first example. The relative 2 error is calculated as 0.00053738 for the initial training and decreases to 0.00036766 and 0.00043207 as the number of collocation points is increased.

Poisson equation with a corner singularity
To investigate the ability of the proposed refinement scheme to approximate solutions with sharp gradients, we next consider the Poisson equation on a domain with an internal boundary which results in a corner-type singularity appearing in the solution. The domain considered is given by Ω ≔ (−1,1) 2 − [0,1) in Cartesian coordinates and we seek an unknown solution which satisfies: −Δ ( , ) = 0, for ( , ) ∈ Ω ( , ) = 1/2 sin ( /2) for ( , ) = (� 2 + 2 , arctan ( / )) ∈ Ω. The exact solution in polar coordinates is ( , ) = 1/2 sin ( /2) , which has the singular term 1/2 creating approximation difficulties near the origin. In finite elements methods, a more refined mesh is typically required to obtain a good approximation. This problem was also investigated in Weinan et al. [Weinan and Yu (2018)] using an energy minimization method. The geometry is modelled by considering 3 rectangular subdomains (−1,0) × (−1,1), (0,1) × (−1,0), and (0,1) × (0,1). We define a loss function of the form: In the initial grid we choose equally spaced points with a distance of 0.05 in the x and y directions. For the points on the boundary, we choose more densely spaced points, with a distance of 0.025 in Cartesian coordinates and we set a penalty factor of = 500 to ensure that the boundary conditions are respected. As before, we evaluate the model on grids with more points and append the points where the residual value is large to the training set in the next step. The results obtained by the adaptive collocation scheme using a network with 3 hidden layers and 30 neurons each are shown in Fig. 5. In general, the residual values in a narrow region around the singularity are much larger than in the rest of the domain and they are selected in the subsequent training step. Also, larger residuals are observed along the line = 0, with < 0 as the neural network with a coarser training grid has difficulties in capturing correctly the end of the internal boundary. However, as can be seen from the plots, the error diminishes as the number of training points increases. The accuracy can be further improved by choosing larger networks although the number of training points needs to be increased as well. Here we select = 12 as the wave number and = 2 as the mode number. This problem admits an analytical solution which can be written as: ( , ) = cos ( )( 1 exp(− ) + 2 exp( ), where = � 2 − ( ) 2 , and 1 and 2 are the solution of the 2 × 2 linear system: In the following, we compute only the real part of the solution ( , ) as the imaginary part can be computed by a similar procedure. As before, we define a loss functions which minimizes the residual of the governing equation at interior and boundary points: The results of the adaptive collocation method are shown in Fig. 6. We have used a neural network with 3 hidden layers of 30 neurons each and a grid of 99 × 49 uniformly spaced points in the interior of the domain in the initial step. For the boundary, we have used = 400 uniformly spaced points and a penalty parameter of = 100. As before, the size of the training set is increased based on the residual value on a finer grid (with double the points in each direction) in subsequent steps. Due to the oscillatory nature of the solution, the additional training points are also generally evenly distributed in the domain with higher concentration in the areas where the residual value was initially larger than average.

Inverse acoustic problem
Here we consider the same governing equation and boundary conditions as in the previous example, but we seek instead to solve for the wave number while the value of the solution (for the correct ) is given. In this case we build a loss function that minimizes the difference between the given solution and the residual of the governing equations simultaneously: Here has the same form as in the previous section but with = 4 and = 1. We start with = 1 as an initial guess and seek to minimize the loss function with as a free parameter. For this problem, we choose a grid of 149 × 29 equally spaced points in the interior of the domain and = 800 boundary collocation points and = 100. The results for this example are presented in Fig. 7. We can observe that the solution has been represented with reasonable accuracy both in terms of ( , ) as well . The relative 2 error for ( , ) in this example is 0.084, while the computed is 3.882 as compared to 4 in the reference solution. As in the other examples, we have used the Adam optimizer followed by a quasi-Newton method (L-BFGS). It can be noted that the latter converges significantly faster, however in many cases performing a stochastic gradient-descent like Adam helps the solver to avoid being trapped in early local minima.

Conclusions
We have presented a collocation method for solving boundary values problem using artificial neural networks. The method is completely mesh-free as only scattered sets of points are used in the training and evaluation sets. Although uniform grids of training points have been used in the initial training step, the method could be easily adapted to scattered data obtained e.g. by Latin hypercube sampling methods. The method was shown to produce results with good accuracy for the parameters chosen, although as common in deep learning methods, parameter selection may require some manual tuning. A more detailed study of the convergence and approximation properties of neural networks, as well as selecting robust minimization procedures remain open as possible research topics. Moreover, the applicability of these methods to energy minimization formulations, for the differential equations which allow it, can be investigated in future work.