A Mesh-Free Algorithm to Solve an Inverse Source Problem for a Degenerate Two-Dimensional Parabolic Equation from Final Observations

Abstract. The main purpose of this work is to propose a new network architecture model for deep learning applied to solving an inverse source problem for a two-dimensional degenerate parabolic equation from final observations, with degeneracy occurring anywhere in the spatial domain.


Introduction
Deep neural networks are machine learning models that have achieved remarkable success in a number of domains, from visual and speech recognition to natural language processing and robotics (see [6]).
Recently, Sirignano and Spiliopoulos [7] proposed to solve PDEs using a mesh-free deep learning algorithm. The method is similar in spirit to the Galerkin method but with several key changes using ideas from machine learning. The Galerkin method is a widely-used computational method which seeks a reduced-form solution to a PDE as a linear combination of basis functions. The deep learning algorithm, or the deep Galerkin method (DGM), uses a deep neural network instead of a linear combination of basis functions. The deep neural network is trained to satisfy the differential operator, initial condition, and boundary conditions using stochastic gradient descent at randomly sampled spatial points. By randomly sampling spatial points, the authors avoid the need to form a mesh and instead convert the PDE problem into a machine-learning problem.
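To fix ideas, for a generic evolution problem $\partial_t u + \mathcal{L}u = 0$ on $\Omega \times [0, T]$ with boundary data $g$ and initial data $u_0$, the DGM objective of [7] can be summarized, in slightly simplified notation, as
$$J(\theta) = \big\| \partial_t \tilde{u} + \mathcal{L}\tilde{u} \big\|^2_{\Omega \times [0, T],\, \nu_1} + \big\| \tilde{u} - g \big\|^2_{\partial\Omega \times [0, T],\, \nu_2} + \big\| \tilde{u}(\cdot, 0; \theta) - u_0 \big\|^2_{\Omega,\, \nu_3},$$
where each squared norm is taken with respect to a sampling measure $\nu_i$ and is estimated, at every descent step, on a fresh batch of randomly drawn points. The same structure is adapted below to the inverse source problem.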
Let us introduce the weighted functional space
$$\mathcal{H}^1_a(\Omega) = \left\{ u \in L^2(\Omega) \, : \, \sqrt{a}\,\nabla u \in L^2(\Omega) \ \text{and} \ u(x, y) = 0 \ \text{for any } (x, y) \in \partial\Omega \right\}.$$
Then the weak formulation of problem (1) is obtained by testing the equation against functions in $\mathcal{H}^1_a(\Omega)$ and integrating by parts over $\Omega$. Let us define the bilinear form $B$ associated with the degenerate principal part of (1). The bilinear form $B$ is noncoercive, because the diffusion coefficient $a$ is allowed to vanish inside $\Omega$.
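For orientation only, suppose that problem (1) has the model form $\partial_t u - \operatorname{div}(a \nabla u) = h(t) R(x, y, t)$ with homogeneous Dirichlet boundary data and initial state $u_0$; this is an assumption made here purely for illustration, the precise statement of (1) being given earlier in the paper. The weak formulation would then read: find $u$ with $u(\cdot, \cdot, 0) = u_0$ such that, for every $v \in \mathcal{H}^1_a(\Omega)$ and almost every $t \in (0, T)$,
$$\frac{d}{dt} \int_\Omega u\, v \, dx\, dy + \int_\Omega a\, \nabla u \cdot \nabla v \, dx\, dy = h(t) \int_\Omega R(x, y, t)\, v \, dx\, dy .$$
The middle term plays the role of the bilinear form $B(u, v)$; its coercivity fails precisely where $a$ vanishes inside $\Omega$.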
where $r$ is a strictly positive real constant. Clearly, the set $\mathcal{U}$ is a compact subset of $L^2(0, T)$.
Inverse Source Problem (ISP). Let $u$ be the solution to (1). Assuming $R$ is known, determine the time-dependent source part $h$ from the measured final-time data $u_{\mathrm{obs}} = u(\cdot, \cdot, T)$.
Remark 1. It should be mentioned that we do not need supplementary distributed measurements to obtain the numerical solution of the inverse problem.
We treat the ISP by interpreting its solution as a minimizer of the following optimization problem:
$$\min_{h \in \mathcal{U}} J(h) = \frac{1}{2} \left\| u(\cdot, \cdot, T) - u_{\mathrm{obs}} \right\|^2_{L^2(\Omega)} + \frac{\varepsilon}{2} \left\| h - \bar{h} \right\|^2_{L^2(0, T)}, \qquad (2)$$
subject to $u$ being the solution to (1), where $\bar{h}$ is an a priori (background state) knowledge of the exact source $h_{\mathrm{exact}}$ and $\varepsilon > 0$ is the regularization parameter. Minimization problem (2) can be equivalently formulated in terms of the full source $f(x, y, t) = h(t) R(x, y, t)$. We then approximate the pair $(u, h)$ by $z = \big(\tilde{u}(x, y, t; \theta_u), \tilde{h}(t; \theta_h)\big)$, where $\theta_u, \theta_h \in \mathbb{R}^k$ are the neural networks' parameters. The goal is to find a set of parameters $\theta = (\theta_u, \theta_h)$ such that the function $z$ minimizes the error $\mathcal{J}(z)$. If the error $\mathcal{J}(z)$ is small, then $\tilde{u}(x, y, t; \theta_u)$ will closely satisfy the PDE differential operator, boundary conditions, and initial condition. Therefore, a $\theta_u$ that minimizes $\mathcal{J}$ produces a reduced-form model $\tilde{u}(x, y, t; \theta_u)$ that approximates the PDE solution $u(x, y, t)$.

Well-posedness of the problem
We recall the following result.
Theorem 1. For admissible data of problem (1), there exists a unique weak solution $u$ of problem (1), and it satisfies an a priori (stability) estimate in which the constant $C$ depends only on $\Omega$ and $T$.
Lemma 1. Let $u$ be the weak solution of (1) corresponding to a given initial state $u_0$. Then the input-output operator is Lipschitz continuous.
Proof. Let $\delta h \in L^2(0, T)$ be a small variation such that $h + \delta h \in \mathcal{U}$. Consider $\delta u = u_\delta - u$, where $u$ is the weak solution of (1) with initial state $u_0$ and $u_\delta$ is the weak solution of (1) with source term $h_\delta = h + \delta h$. By linearity, $\delta u$ solves problem (1) with zero initial state and source part $\delta h(t) R(x, y, t)$; hence $\delta u$ is itself a weak solution of (1). By Theorem 1, there exists a constant $C$, depending only on $\Omega$ and $T$, such that
$$\| \delta u \| \leq C\, \| \delta h \|_{L^2(0, T)}$$
in the norm of the solution space from Theorem 1. This implies the Lipschitz continuity of the input-output operator (4). $\square$
An immediate consequence of Lemma 1 is the following result.

Proposition 1
The functional $J$ is continuous on $\mathcal{U}$, and there exists a unique minimizer $h^* \in \mathcal{U}$ of $J$, i.e., $J(h^*) = \min_{h \in \mathcal{U}} J(h)$. The differentiability of the functional $J$ is deduced from the differentiability of the input-output operator (4), where $u$ is the weak solution of (1) with time-dependent source part $h$.
We have the following result.
Proposition 2. Let $u$ be the weak solution of (1) with time-dependent source part $h$. Then the input-output operator (4) is Gâteaux differentiable (G-derivable).
Proof. Let $h \in \mathcal{U}$ and let $\delta h \in L^2(0, T)$ be a small variation such that $h + \delta h \in \mathcal{U}$. Define the function
$$\phi = u(h + \delta h) - u(h) - \delta u,$$
where $\delta u$ is the solution of the linearized (sensitivity) variational problem associated with the direction $\delta h$. We want to show that $\phi = o(\|\delta h\|)$. It can easily be verified that $\phi$ solves a variational problem of the same type as (1). Arguing as in the proof of Lemma 1, we deduce that
$$\| \phi \| \leq C\, \| \delta h \|^2_{L^2(0, T)}.$$
Hence, the input-output operator $\varphi : h \longmapsto u$ is Gâteaux differentiable. $\square$

Model and algorithm
To approximate $(u, h_{\mathrm{exact}})$, we present a new model, described in Fig. 1, in which $\tilde{u}$ is the output of a stack of fully connected layers and $\tilde{h}$ is the output of a multi-layered composition that yields a Taylor-series approximation of the time-dependent source part.

Figure 1: The proposed model architecture.
The Taylor cell represented in Fig. 1 produces a network output in the form of a Taylor-series expansion of order $n$ in the time variable, with coefficients generated by the network. The deep learning method for the inverse source problem (DL-ISP) approximates $h$ and $u$ by two deep neural networks with common inputs and a common loss function. Fig. 1 shows a visualization of the overall architecture, which maps $(x, y, t)$ to the pair $\big(\tilde{u}(x, y, t; \theta_u), \tilde{h}(t; \theta_h)\big)$, where $\theta_u$ and $\theta_h$ are the parameters of the two neural networks.
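To illustrate how such a Taylor cell can be realized in code, the following minimal sketch, written in TensorFlow (the framework used for the implementation), assumes that the cell outputs $\tilde{h}(t; \theta_h) = \sum_{i=0}^{n} c_i\, t^i$ with trainable coefficients $c_i$. The exact internal wiring of the cell in Fig. 1 may differ, so this is an illustration rather than a reproduction of the authors' architecture.

import tensorflow as tf

# Sketch of a "Taylor cell": a degree-n polynomial in t with trainable
# coefficients, standing in for the Taylor-series part of Fig. 1.
# (Illustrative assumption; the paper's exact cell may be wired differently.)
class TaylorCell(tf.keras.layers.Layer):
    def __init__(self, order=6, **kwargs):
        super().__init__(**kwargs)
        self.order = order
        self.coeffs = self.add_weight(name="taylor_coeffs",
                                      shape=(order + 1,),
                                      initializer="zeros",
                                      trainable=True)

    def call(self, t):
        # Stack t^0, ..., t^n and contract with the trainable coefficients.
        powers = tf.concat([t ** i for i in range(self.order + 1)], axis=1)
        return tf.reduce_sum(powers * self.coeffs, axis=1, keepdims=True)

h_net = TaylorCell(order=6)   # realizes h~(t; theta_h)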
Our goal now is to find $\theta = (\theta_u, \theta_h)$ minimizing the cost functional below, where $u_{\mathrm{obs}}$ is the observation associated with the network $N_u$ and $\bar{h}$ is the a priori knowledge associated with the network $N_h$.
The cost function $\mathcal{J}(\theta)$ penalizes the residual of the differential operator $A(\tilde{u}) - \tilde{h}(t) R(x, y, t)$ at sampled interior points, the misfit of the boundary and initial conditions, the discrepancy between $\tilde{u}(\cdot, \cdot, T; \theta_u)$ and the final observation $u_{\mathrm{obs}}$, and the regularization term involving $\tilde{h} - \bar{h}$. The derivatives of $\tilde{u}$ can be evaluated using automatic differentiation (see [2]), since $\tilde{u}$ is parametrized as a neural network.
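To make this construction concrete, the following sketch assembles a loss of this shape in TensorFlow for the networks u_net and h_net of the previous sketch. It is only an illustration, not the authors' cost function: the operator is taken here to be the model form $A(u) = \partial_t u - \operatorname{div}(a \nabla u)$, the domain is assumed to be the unit square, the sampling is uniform, and the helper names a_coef, R, u_obs_fn, h_bar are hypothetical callables introduced for this example.

import tensorflow as tf

# Illustrative DL-ISP-style loss. Assumed helpers: u_net(x,y,t) -> u~,
# h_net(t) -> h~, a_coef(x,y) (degenerate diffusion coefficient),
# R(x,y,t) (known spatial part), u_obs_fn(x,y) (final observation),
# h_bar(t) (background source), eps (regularization parameter).
def dl_isp_loss(u_net, h_net, a_coef, R, u_obs_fn, h_bar, eps,
                T=1.0, n_int=256, n_obs=256):
    # 1) PDE residual A(u~) - h~ R at random interior space-time points,
    #    with A(u) = u_t - div(a grad u) taken as a model operator.
    #    Sampling assumes the unit square (0,1)^2 as spatial domain.
    x = tf.random.uniform((n_int, 1)); y = tf.random.uniform((n_int, 1))
    t = tf.random.uniform((n_int, 1), 0.0, T)
    with tf.GradientTape(persistent=True) as g2:
        g2.watch([x, y])
        with tf.GradientTape(persistent=True) as g1:
            g1.watch([x, y, t])
            u = u_net(tf.concat([x, y, t], axis=1))
        u_x, u_y, u_t = g1.gradient(u, x), g1.gradient(u, y), g1.gradient(u, t)
        flux_x, flux_y = a_coef(x, y) * u_x, a_coef(x, y) * u_y
    div_flux = g2.gradient(flux_x, x) + g2.gradient(flux_y, y)
    res = u_t - div_flux - h_net(t) * R(x, y, t)          # A(u~) - h~ R
    loss_pde = tf.reduce_mean(tf.square(res))

    # 2) Final-time observation misfit u~(x, y, T) - u_obs(x, y).
    xo = tf.random.uniform((n_obs, 1)); yo = tf.random.uniform((n_obs, 1))
    tT = T * tf.ones((n_obs, 1))
    u_T = u_net(tf.concat([xo, yo, tT], axis=1))
    loss_obs = tf.reduce_mean(tf.square(u_T - u_obs_fn(xo, yo)))

    # 3) Tikhonov term keeping h~ close to the background h_bar.
    tr = tf.random.uniform((n_int, 1), 0.0, T)
    loss_reg = eps * tf.reduce_mean(tf.square(h_net(tr) - h_bar(tr)))

    # Boundary- and initial-condition penalties would be added analogously.
    return loss_pde + loss_obs + loss_reg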
Remark 2. During the optimization we observe that, for every given $\tilde{h}$, one searches for a $\tilde{u}$ that minimizes the difference $A(\tilde{u}) - \tilde{h}(t) R(x, y, t)$. For that reason, the approximation $\tilde{h}$ is involved in the computation of $\tilde{u}$.
The training algorithm proceeds as follows; a code sketch of one update is given after the list.
1. Initialize the network parameters $\theta_0 = (\theta_u, \theta_h)$.
2. Generate a random mini-batch $d_m$ of sampled points.
3. Compute the gradient $\nabla_{\theta_n} \mathcal{J}(\theta_n, d_m)$ for the sampled mini-batch $d_m$ using backpropagation.
4. Use the estimated gradient to take a descent step at $d_m$ with learning rate $\alpha_n$ and update $\theta_{n+1} = \theta_n - \alpha_n \nabla_{\theta_n} \mathcal{J}(\theta_n, d_m)$; the parameters are updated using the well-known ADAM algorithm with a decaying learning-rate schedule.
5. Repeat steps 2-4 until a convergence criterion is met.
6. Save the trained model, which can then be evaluated at any $(x, y) \in \Omega$ and $t \in\, ]0, T[$.
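As a concrete illustration of steps 2-5, the following sketch performs one such update in TensorFlow. It reuses the illustrative dl_isp_loss from the previous sketch; the optimizer settings and the decay schedule are assumptions for this example, not the authors' exact configuration.

import tensorflow as tf

# One step of the sampling / gradient / ADAM-update loop described above.
# The decay schedule values are assumptions, not the authors' settings.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

@tf.function
def train_step(u_net, h_net, a_coef, R, u_obs_fn, h_bar, eps):
    variables = u_net.trainable_variables + h_net.trainable_variables
    with tf.GradientTape() as tape:
        # Steps 2-3: a fresh random mini-batch is drawn inside the loss,
        # and the loss gradient is obtained by backpropagation.
        loss = dl_isp_loss(u_net, h_net, a_coef, R, u_obs_fn, h_bar, eps)
    grads = tape.gradient(loss, variables)
    # Step 4: ADAM descent step with the decaying learning rate.
    optimizer.apply_gradients(zip(grads, variables))
    return loss

Iterating train_step for the desired number of epochs and then saving the two trained networks corresponds to steps 5 and 6.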
We implement the algorithm using TensorFlow, a software library for deep learning. TensorFlow provides reverse-mode automatic differentiation, which allows the calculation of derivatives for a broad class of functions. For example, TensorFlow can be used to calculate the gradient of the neural network output with respect to $x$, $t$, or $\theta$. TensorFlow also allows models to be trained on graphics processing units (GPUs).

Numerical results
For the simulations, in all tests below, we take the hyperparameters $L = 3$ (i.e., four hidden layers) and $M = 50$ (number of units in each layer) for $\tilde{u}$, and $L = 6$ for $\tilde{h}$. The neural network parameters are initialized with the keras.initializers.glorot_normal initializer.
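For reference, a network with these hyperparameters can be assembled as follows; this is a minimal sketch, and the tanh activation is an assumption made here because the activation function is not specified in this section.

import tensorflow as tf

# u~ network with four hidden layers of M = 50 units and Glorot-normal
# initialization (keras.initializers.glorot_normal); tanh is assumed.
def build_u_net(n_hidden=4, width=50):
    init = tf.keras.initializers.GlorotNormal()
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(3,)))          # inputs (x, y, t)
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(width, activation="tanh",
                                        kernel_initializer=init))
    model.add(tf.keras.layers.Dense(1, kernel_initializer=init))
    return model

u_net = build_u_net()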
We carry out several tests in order to show the performance of our approach. After training, the trained model is evaluated on a testing set of 3000 uniformly generated pairs $(t, x)$. After 6000 epochs, we obtain the following results.

Conclusion
The deep learning DL-ISP algorithm for solving PDEs presented in this paper is mesh-free, which is a key point since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural networks are trained on batches of randomly sampled time and space points. Moreover, the suggested algorithm does not suffer from the discretization errors that play an important role in mesh-based constructions of the solution.
The ease of implementing the DL-ISP algorithm and its independence of the particular form of the PDE make the method very efficient. It also provides an efficient way to solve nonlinear equations. There remains, however, the problem of parametrizing the algorithm: the choice of the number of layers, the number of units in each layer, and the activation function turns out to be very important for obtaining a good approximation of the solution.