Physics Informed by Deep Learning: Numerical Solutions of Modified Korteweg-de Vries Equation

In this paper, with the aid of the symbolic computation system Python and based on deep neural networks (DNN), automatic differentiation (AD), and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm, we obtain numerical solutions of the modified Korteweg-de Vries (mKdV) equation. The prediction error between the predicted solution and the exact solution reaches 10⁻⁶. The method used in this paper demonstrates the powerful mathematical and physical ability of deep learning to flexibly simulate the physical dynamics represented by differential equations and opens a way toward understanding more physical phenomena.


Introduction
In recent years, nonlinear phenomena have played a central role in fields such as mathematics, physics, chemistry, biology, finance, and engineering technology. A large number of mathematical models of scientific and engineering problems reduce to determining solutions of ordinary differential equations (ODEs) and partial differential equations (PDEs). These problems are complex and computationally demanding, and except for a few special types of differential equations that can be solved analytically, obtaining analytical expressions is extremely difficult in most cases. Therefore, research on numerical methods for PDEs has become a mainstream direction: numerical solutions have attracted the attention of researchers and constitute a large-scale scientific and engineering computation effort.
Numerical methods for PDEs are classified by whether a regular grid or a gridless (mesh-free) discretization is used. Because of the difficulties in constructing numerical schemes and in meshing, grid-based methods are subject to many restrictions in practice. In obtaining high-precision, high-resolution solutions, inexperienced computational mathematicians face difficulty because the construction of the numerical scheme is very complicated.
Artificial neural networks (ANN), simplified models of the biological nervous system, represent a technology with applications in mathematical modeling, text recognition, voice recognition, learning and memory, pattern recognition, signal processing, automatic control, decision-making assistance, time-series analysis, etc. [1]. ANNs were applied to solve ordinary and partial differential equations more than 20 years ago. Solving differential equations by neural networks can be regarded as a mesh-free numerical method, and due to the importance of differential equations, many such methods have been developed in the literature [2]. Rosenblatt introduced the first model of supervised learning based on a single-layer neural network with a single neuron [3]. McFall studied boundary value problems with arbitrary irregular boundaries by an artificial neural network method in 2006 [4]. Mall and Chakraverty solved ordinary differential equations with the Legendre neural network in 2016 [5].
However, due to the limitations of computing methods and resources at the time, this technology did not receive enough attention. With the development of deep learning in recent years, Professor Karniadakis of the Department of Applied Mathematics at Brown University and his collaborators reexamined the technology and developed a deep learning algorithmic framework based on the original ideas. It was named "physics-informed neural networks (PINN)" and was first used to solve forward and inverse problems of partial differential equations. This triggered a great deal of follow-up research and has gradually become a hotspot in the emerging interdisciplinary field of scientific machine learning (SciML). From the point of view of approximation theory, a neural network can be regarded as a universal nonlinear function approximator, while modeling a partial differential equation likewise amounts to finding a nonlinear function satisfying constraint conditions, so the two have something in common. Thanks to the AD technology widely used in deep learning, the differential constraints of the equation can be integrated into the design of the network's loss function, yielding a neural network constrained by the physical model; this is the most basic design idea of PINN.
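To illustrate the AD mechanism that makes this possible (the paper relies on the AD built into deep learning frameworks; the following is only a minimal sketch using forward-mode dual numbers, with all names ours):

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0; carries a value and its derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        # product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def tanh(z):
    # chain rule for the tanh activation
    t = math.tanh(z.val)
    return Dual(t, (1.0 - t * t) * z.der)

def derivative(f, x):
    """Evaluate df/dx at x via a dual seed (exact to machine precision)."""
    return f(Dual(x, 1.0)).der
```

Unlike finite differences, the derivative obtained this way is exact up to rounding, which is what allows differential-equation residuals to be embedded directly in a loss function.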
Both PINN's network structure and loss function need to be tailored to the form of the differential equation, which differs from work in computational physics that directly applies off-the-shelf machine learning algorithms. Different from the classical supervised learning task, the PINN loss function contains, in addition to the supervised data term, regularization terms for the differential equation and the initial and boundary conditions. These regularization terms differ from problem to problem and must be tailored to achieve an optimal design. Traditional numerical solutions of differential equations are obtained by finite difference, finite element, and other methods, whose disadvantage is that they require explicit initial conditions and are sensitive to the boundary region; if the conditions change even slightly, everything must be recalculated, which makes real-time calculation and prediction difficult. PINN overcomes the traditional numerical methods' sensitivity to initial and boundary conditions [10].
In the present study, we take advantage of fast-developing machine learning and use the PINN method proposed by Raissi et al. [11] to study the mKdV equation. AD and the L-BFGS [12] optimization algorithm are used to train the loss function. First, we introduce the main ideas of the algorithm. Second, we use the method to study two kinds of initial conditions of the mKdV equation; the predicted solitary wave is shown for the first time in this paper. We also report the relative L²-norm error between the predicted and exact solutions u(t, x) for different numbers of initial and boundary training data N_u and different numbers of collocation points N_f. Three-dimensional plots and projected images of the exact and predicted solutions of the mKdV equation with different initial conditions are shown in Figures 1-4. Finally, we conclude the paper. From the experimental results, some novel and important developments in the search for solitary wave solutions of PDEs are investigated.
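The relative L²-norm error reported throughout can be computed over discrete sample values as follows (a minimal sketch; the function name is ours):

```python
import math

def relative_l2_error(u_pred, u_exact):
    """Relative L2-norm error ||u_pred - u_exact||_2 / ||u_exact||_2
    over lists of sampled solution values."""
    num = math.sqrt(sum((p - e) ** 2 for p, e in zip(u_pred, u_exact)))
    den = math.sqrt(sum(e ** 2 for e in u_exact))
    return num / den
```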
The results of this manuscript complement the existing literature as follows: the extended and modified direct algebraic method, the extended mapping method, and Seadawy techniques for finding solutions of nonlinear partial differential equations such as dispersive solitary wave solutions of Kadomtsev-Petviashvili-Burgers dynamical equations [13]; the elliptic function, bright and dark soliton, and solitary wave solutions of the higher-order NLSE [14]; the abundant lump solutions and interaction phenomena of the (3+1)-dimensional generalized Kadomtsev-Petviashvili equation [15]; descriptions of the bidirectional propagation of small-amplitude long capillary-gravity waves on the surface of shallow water [16]; dispersive traveling wave solutions of the equal-width and modified equal-width equations [17]; periodic solitary wave solutions of the (2+1)-dimensional variable-coefficient Caudrey-Dodd-Gibbon-Kotera-Sawada equation [18]; rational solutions and lump solutions of the generalized (3+1)-dimensional shallow water-like equation [19]; new solitary wave solutions of the coupled Maccari system [20]; and lump solutions of a (2+1)-dimensional fourth-order nonlinear PDE possessing a Hirota bilinear form [21]. Therefore, this study is of significance for the later study of soliton solutions.

Main Ideas of the Algorithm
2.1. Illustration of the Algorithm. Deep learning is a new field in machine learning research. Its motivation lies in establishing and simulating neural networks that analyze and learn like the human brain; it mimics the mechanism of the human brain to interpret data. The concept of deep learning comes from research on artificial neural networks: a multilayer perceptron with multiple hidden layers is a kind of deep learning structure. We give the structures of a simple neural network and a deep neural network in Figure 5. In this paper, the network is used as a supervised network, meaning that the multilayer perceptron needs a teacher to tell it what the desired output should be. Deep learning forms more abstract high-level representations (attribute categories or features) by combining low-level features, so as to discover distributed feature representations of the data. It uses a hierarchical structure similar to neural networks: the system consists of a multilayer network with an input layer, hidden layers, and an output layer. Only nodes in adjacent layers are connected; there are no connections within a layer, and each layer can be regarded as a perceptron.

A multilayer perceptron is a mathematical function that maps inputs to output values; this function is composed of many simpler functions. Each layer of a fully connected DNN can be expressed as

z^(k) = σ(W^(k) z^(k−1) + b^(k)), k = 1, 2, ⋯, L,

where σ is the activation function, W^(k) is the weight matrix from layer k − 1 to layer k, w_ij denotes the weight between the j-th input and the i-th neuron of the hidden layer, and b^(k) is the bias vector. N(z_0; ϑ) can be considered an approximate solution for u(x, t) of a PDE. The final approximate solution is obtained by adjusting the parameters ϑ to minimize the error between the approximate solution and the exact solution.
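For concreteness, the layer-by-layer forward pass can be sketched in NumPy (a hypothetical 2-30-30-1 network mapping (t, x) to u(t, x), with tanh hidden activations and a linear output layer, as is common in PINN implementations):

```python
import numpy as np

def mlp_forward(z, weights, biases):
    """Fully connected forward pass: z^(k) = tanh(W^(k) z^(k-1) + b^(k))
    for hidden layers, with a linear output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        z = np.tanh(W @ z + b)
    return weights[-1] @ z + biases[-1]

rng = np.random.default_rng(0)
sizes = [2, 30, 30, 1]  # input (t, x) -> two hidden layers -> scalar u
weights = [0.1 * rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

# evaluate the (untrained) network at one space-time point
u = mlp_forward(np.array([0.1, -0.5]), weights, biases)
```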
A fully connected neural network was proven by Jones [23] and by Carroll and Dickinson [24] to be able to approximate any continuous function defined on a finite domain. In this paper, we introduce the form and construction of the solution of a PDE using the physics-informed neural network method. The schematic of the physics-informed neural network for the mKdV equation is shown in Figure 6. Consider the general form of the PDE

u_t + N[u] = 0, (1)

where N is a nonlinear function of time t, space x, the solution u, and its derivatives, and the subscripts denote partial differentiation in either time t or space x. For example, u_xx is the second derivative of u with respect to x.

2.2. Details of the Algorithm. According to Equation (1), let us define f(t, x) as

f := u_t + N[u].

The objective function of DNN training is the mean squared error on the network outputs,

MSE = MSE_u + MSE_f,

where the mean squared errors are defined, respectively, as

MSE_u = (1/N_u) Σᵢ |u(t_u^i, x_u^i) − u^i|², MSE_f = (1/N_f) Σᵢ |f(t_f^i, x_f^i)|².

The weights and biases of the neural networks u(t, x) and f(t, x) are learned by minimizing this mean squared error loss. Here {t_f^i, x_f^i} are the domain (collocation) data, N_u is the number of sampling points on the boundary, N_f is the number of sampling points in the region, {t_u^i, x_u^i, u^i} are the initial and boundary training data on u(t, x), and u(t_u^i, x_u^i) is the predicted solution at those points.
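The composite loss MSE = MSE_u + MSE_f described above can be sketched in a few lines (a pure-Python illustration; the function names are ours):

```python
def mse(values):
    """Mean of squared entries."""
    return sum(v * v for v in values) / len(values)

def pinn_loss(u_pred_boundary, u_true_boundary, f_residuals):
    """MSE_u: data misfit at the N_u initial/boundary points;
    MSE_f: PDE residual f at the N_f collocation points."""
    mse_u = mse([p - t for p, t in zip(u_pred_boundary, u_true_boundary)])
    mse_f = mse(f_residuals)
    return mse_u + mse_f
```

Minimizing this sum drives the network both to match the known initial/boundary data and to satisfy the differential equation in the interior of the domain.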

Example for Modified Korteweg-de Vries Equation
The modified Korteweg-de Vries (mKdV) equation may be written as [26]

u_t + 6u²u_x + u_xxx = 0, −4 ≤ x ≤ 4, 0 ≤ t ≤ 2.

We obtained the training and test data using conventional spectral methods: the Chebfun package [27] with a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step size 10⁻⁴.
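The time-stepping component of this data-generation pipeline is the classical fourth-order Runge-Kutta step, which can be illustrated on a scalar ODE (this is only a sketch of the RK4 step itself, not of Chebfun's spectral solver):

```python
import math

def rk4_step(f, t, u, dt):
    """One classical fourth-order Runge-Kutta step for du/dt = f(t, u)."""
    k1 = f(t, u)
    k2 = f(t + dt / 2, u + dt * k1 / 2)
    k3 = f(t + dt / 2, u + dt * k2 / 2)
    k4 = f(t + dt, u + dt * k3)
    return u + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# integrate du/dt = -u from u(0) = 1 up to t = 1; the result approximates e^-1
u, t, dt = 1.0, 0.0, 1e-2
for _ in range(100):
    u = rk4_step(lambda t, u: -u, t, u, dt)
    t += dt
```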
The collocation points for training f(x, t) consist of two parts: one part uses the Latin hypercube sampling strategy to generate 10000 data points, and the other part uses random sampling to generate 456 data points. We randomly extract N_u = 100 points from the initial and boundary data as training points and learn the latent solution u(t, x) by using the L-BFGS algorithm to optimize the parameters so as to minimize the error function, Equation (7). Figure 7 shows the predicted solution u(x, t) obtained by an 11-layer deep neural network in which each hidden layer contains 30 neurons. The relative L²-norm error for this case is given in Table 1. We also studied the effect of the DNN architecture, constructed with 9 layers and 20 neurons per hidden layer, for different numbers of training points N_u and collocation points N_f on the relative L²-norm error; the results are shown in Table 2. The three-dimensional plots and projected images of the exact and predicted solutions of the mKdV equation with initial state u(x, 0) = 2 exp(−x)/(exp(−2x) + 1) are shown in Figures 1 and 2.
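Latin hypercube sampling over the domain 0 ≤ t ≤ 2, −4 ≤ x ≤ 4 can be sketched as follows (a minimal pure-Python version; production codes typically call a library implementation such as scipy.stats.qmc):

```python
import random

def latin_hypercube(n, bounds, seed=0):
    """Latin hypercube sample of n points; bounds is a list of (low, high)
    pairs, one per dimension. Each dimension is split into n equal strata
    and each stratum receives exactly one sample."""
    rng = random.Random(seed)
    dims = []
    for low, high in bounds:
        strata = list(range(n))
        rng.shuffle(strata)
        dims.append([low + (high - low) * (s + rng.random()) / n
                     for s in strata])
    return list(zip(*dims))

# e.g. 100 collocation points over 0 <= t <= 2, -4 <= x <= 4
pts = latin_hypercube(100, [(0.0, 2.0), (-4.0, 4.0)])
```

Compared with plain random sampling, the stratification guarantees that every sub-interval of each coordinate is covered, which spreads the collocation points evenly over the space-time domain.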
To further study the effectiveness of the algorithm in approximating exact solutions of the mKdV equation, we change the initial condition to u(x, 0) = 2 sech(2x) [28]. The training and test data were again obtained by conventional spectral methods, using the Chebfun package with a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step size 10⁻⁴. The data points used to obtain f(x, t) are divided into two parts: one part uses the Latin hypercube sampling strategy to generate 10000 data points, and the other uses random sampling to generate 456 data points. We randomly extract N_u = 100 points from the initial and boundary data as training points and learn the latent solution u(t, x) by using the L-BFGS algorithm to optimize the parameters so as to minimize the error function, Equation (7). Figure 9 shows the predicted solution u(x, t) obtained by an 11-layer deep neural network in which each hidden layer contains 15 neurons. The running time of the code is 439.9942 seconds, and the relative L²-norm error for this case is 1.0333972 × 10⁻⁵. Judging from the physical propagation diagrams of the exact solution (a soliton solution obtained by the Chebfun package) and the predicted solution in Figure 9, the waveform of the single soliton does not change over time. The exact and learned dynamics of u(x, t) are shown in Figure 10. Choosing the number of training points as N_u = 100 and collocation points as N_f = 10000, we study the influence of different numbers of layers and neurons on the relative L²-norm error; the error tends to decrease as the numbers of layers and neurons increase, as shown in Table 3.
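The role of the L-BFGS optimizer in this training loop can be illustrated on a toy least-squares loss via SciPy's L-BFGS-B implementation (a hypothetical stand-in problem, not the PINN loss itself):

```python
import numpy as np
from scipy.optimize import minimize

# stand-in loss: fit parameters (a, b) so that a*x + b matches targets
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x - 1.0

def loss(theta):
    a, b = theta
    return np.mean((a * x + b - y) ** 2)

# L-BFGS builds a limited-memory quasi-Newton approximation of the Hessian,
# which is why it is popular for the smooth, full-batch PINN loss
res = minimize(loss, x0=np.zeros(2), method="L-BFGS-B")
a_opt, b_opt = res.x
```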
We also studied the effect of the DNN architecture, constructed with 9 layers and 15 neurons per hidden layer, for different numbers of training points N_u and collocation points N_f on the relative L²-norm error; the results are shown in Table 4. The three-dimensional plots and projected images of the exact and predicted solutions of the mKdV equation with initial state u(x, 0) = 2 sech(2x) are shown in Figures 3 and 4.
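The initial state u(x, 0) = 2 sech(2x) matches the standard one-soliton solution of the mKdV equation. Assuming the travelling-wave form u(x, t) = k sech(k(x − k²t)) with k = 2 (so the waveform moves at speed 4 without changing shape, consistent with Figure 9), one can check numerically that it satisfies u_t + 6u²u_x + u_xxx = 0:

```python
import math

def u(x, t, k=2.0):
    # assumed one-soliton form: u = k * sech(k * (x - k**2 * t))
    return k / math.cosh(k * (x - k**2 * t))

def residual(x, t, h=1e-2):
    """u_t + 6 u^2 u_x + u_xxx via central finite differences;
    vanishes up to O(h^2) truncation error for the soliton above."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t + 6 * u(x, t) ** 2 * u_x + u_xxx

# largest residual over a grid of spatial points at t = 0.3
r = max(abs(residual(0.5 * i, 0.3)) for i in range(-8, 9))
```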

Conclusions
With the increase in data volume, the improvement of computing power, and the emergence of new machine learning algorithms (deep learning), artificial intelligence has become a field with many practical applications and active research topics. Deep learning is one route to artificial intelligence: it is a type of machine learning, a technology that enables computer systems to improve from experience and data.
In this paper, we briefly describe the details of the DNN algorithm. Figures 5-10 show the basic structures of the simple neural network and the deep neural network, the schematic of the physics-informed neural network, and comparisons of the exact and predicted dynamical systems of the mKdV equation. Tables 1 and 3 show the relative L²-norm error between the predicted and exact solutions u(t, x) for different numbers of hidden layers and different numbers of neurons per layer. Tables 2 and 4 show the relative L² error between the predicted and exact solutions u(t, x) for different numbers of training points N_u and collocation points N_f. Tables 1-4 illustrate that the relative L² error tends to decrease as the numbers of layers and neurons increase. This method demonstrates the strong mathematical and physical ability of deep learning to simulate the physical dynamics represented by differential equations and opens a way toward understanding more physical phenomena.

Data Availability
The data in the manuscript can be generated by MATLAB software. The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflict of interest.