Bayesian Deep Learning for Partial Differential Equation Parameter Discovery with Sparse and Noisy Data

Scientific machine learning has been successfully applied to inverse problems and PDE discovery in computational physics. One caveat concerning current methods is the need for large amounts of ("clean") data in order to characterize the full system response and discover underlying physical models. Bayesian methods may be particularly promising for overcoming these challenges, as they are naturally less sensitive to the negative effects of sparse and noisy data. In this paper, we propose to use Bayesian neural networks (BNN) in order to: 1) Recover the full system states from measurement data (e.g. temperature, velocity field, etc.). We use Hamiltonian Monte-Carlo to sample the posterior distribution of a deep and dense BNN, and show that it is possible to accurately capture physics of varying complexity without overfitting. 2) Recover the parameters instantiating the underlying partial differential equation (PDE) governing the physical system. Using the trained BNN as a surrogate of the system response, we generate datasets of derivatives potentially comprising the latent PDE governing the observed system, and then perform a sequential threshold Bayesian linear regression (STBLR) between the successive derivatives in space and time to recover the original PDE parameters. We take advantage of the confidence intervals within the BNN outputs, and introduce the spatial derivatives' cumulative variance into the STBLR likelihood, to mitigate the influence of highly uncertain derivative data points; thus allowing for more accurate parameter discovery. We demonstrate our approach on a handful of examples in applied physics and non-linear dynamics.


Introduction
In recent years, pioneering research has been conducted into the application of machine learning to computational physics and engineering contexts: example works include [1,2,3,4,5,6]. As a result, new sub-fields within the computational sciences have emerged, including, but not limited to, physics-informed machine learning [7] and scientific machine learning (SciML) [8]. Within these sub-fields, the development of machine learning based methods to infer the parameters of dynamical system governing equations, and/or to discover new partial differential equations (PDE), has attracted significant attention and achieved important early successes. In such early work, different inference strategies have been proposed. A popular method, known as SINDy [4], has its foundations in building a dataset of spatial derivatives that are potentially involved in the governing equation of the observed physics, in order to perform a sparse linear regression aimed at estimating the coefficients of each derivative term forming some latent PDE. This seminal work was later extended in [5], where the PDE derivative terms were computed either through polynomial interpolations or finite differences, and in [9,10,11] using neural networks. A major advantage of these methods is their interpretability: the spatial derivatives involved in the PDE, along with their coefficients, are discovered explicitly. Other methods directly approximate the differential operator using a physics-informed neural network (PINN) [3]. While these methods generally yield highly accurate forward models, they lack interpretability [12], or only allow for recovering the coefficients of a PDE whose functional form is already known [3].
The sparse regression based methods outlined earlier have provided promising results, but the regression system is often poorly conditioned, and the spatial derivative estimates must be highly accurate in order to yield satisfactory results. In [9,10,11], measured data from a physical system are interpolated using a neural network, and then differentiated to create the spatial derivative dataset. This requires abundant data, with as little noise as possible. In practical engineering and scientific applications, however, acquiring enough data (typically experimental measurements) may be a very expensive and time consuming process. While deep and dense neural networks are able to capture complex physics described in snapshots of physical system dynamics (e.g. shocks or sharp gradients), they are also more likely to overfit noisy measurements. Unfortunately, methods for fine-tuning the network's parameters (e.g. cross-validation) may be inapplicable due to the lack of abundant data in many scientific applications. Consequently, there is a trade-off: use more data for training, and potentially obtain better derivative estimates but with a significant risk of overfitting, or use more data for testing and limit overfitting, but incur a greater risk of missing complex underlying physics.
Bayesian methods, in particular Bayesian neural networks (BNN) [13,14,15], are promising for avoiding this trade-off, as they are naturally less prone to overfitting, even with very noisy and sparse data. Furthermore, BNNs provide confidence intervals on their predictions, which can be used for improving the accuracy of PDE discovery techniques. In the field of PDE discovery, Bayesian machine learning methods have been used to infer parameters of governing equations with known functional form. For example, [1] used Gaussian processes, and [16,17] used BNNs combined with PINNs to infer PDE parameters, but these do not permit the discovery of unknown PDEs. In this paper, we extend the original approach outlined in [4,5,9,11,18] in two important ways. Firstly, we propose to use a BNN to interpolate the snapshot measurements from the physical system. Relying on BNNs offers two major advantages: it makes the network more robust to noise and sparse data (allowing for minimal tuning of the network's hyperparameters), and it provides valuable confidence intervals over the interpolation predictions. Secondly, we propose to use these BNN confidence intervals to quantify the uncertainty over the spatial derivative dataset, and introduce this uncertainty into a sequential threshold Bayesian linear regression model (STBLR) to recover the PDE coefficients.
In the following sections, we first provide elements of background on neural networks and Bayesian inference (section 2). We then introduce our framework for PDE discovery (section 3) and present our results on three application examples (section 4).

Standard Deep Neural Networks
A neural network [14,19,20] is a non-linear parametric function, able to learn and approximate any other continuous function under weak conditions, provided that the network is sufficiently complex [21]. Figure 1 shows a standard fully connected neural network architecture with n_l = 4 hidden layers and n_u = 6 units per layer. Analytically, the output of a neural network for regression with a time-space input coordinate X_i = (t_i, x_i) can be written as a function composition:

f(X_i | w) = W_{n_l+1} ϕ(W_{n_l} ϕ(· · · ϕ(W_1 X_i + b_1) · · ·) + b_{n_l}) + b_{n_l+1}    (1)

where ϕ is a non-linear activation function, and the set of parameters (weights) is w = {W_k, b_k}, k ∈ [1, n_l + 1]. In a neural network regression framework, we may further assume that the known data have been generated from the network itself, and subsequently corrupted with Gaussian noise [14,19]. That is, for input X_i = (t_i, x_i), we have:

y_i = f(t_i, x_i | w) + ε_i,    ε_i ∼ N(0, σ²)    (2)

Figure 1: Representation of a neural network with 4 hidden layers and 6 hidden units

leading to the well-known likelihood function:

p(y | X, w) = ∏_{i=1}^{n} N(y_i | f(t_i, x_i | w), σ²)    (3)

In standard non-Bayesian deep learning applications, we are generally interested in maximizing equation 3 (or variations of it) with respect to the weights w, using numerical optimization methods [20].
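As an illustration, the following minimal numpy sketch builds the function composition and the Gaussian log-likelihood described above (the function names `init_weights` and `forward` are our own; the paper's actual implementation uses PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch of the function composition in equation 1: each hidden
# layer applies an affine map followed by the tanh activation phi, matching
# the n_l = 4 layer, n_u = 6 unit architecture of figure 1.
def init_weights(n_in=2, n_hidden=6, n_layers=4, n_out=1):
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(w, X):
    h = X
    for W, b in w[:-1]:
        h = np.tanh(h @ W + b)        # phi = tanh on every hidden layer
    W, b = w[-1]
    return h @ W + b                  # linear output layer for regression

w = init_weights()
X = np.column_stack([np.linspace(0, 1, 8), np.linspace(-1, 1, 8)])  # (t_i, x_i)
y = forward(w, X) + rng.normal(0, 0.01, (8, 1))  # Gaussian-corrupted targets

# Gaussian log-likelihood of equation 3 (up to an additive constant).
sigma = 0.01
log_lik = -np.sum((forward(w, X) - y) ** 2) / (2 * sigma ** 2)
```

Maximizing `log_lik` with respect to `w` is exactly the least-squares training of a standard (non-Bayesian) network.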

Bayesian Inference for Neural Networks
In a BNN, the weights are assumed to be sampled from a prior probability distribution, for example a standard Gaussian distribution with zero mean and unit standard deviation:

p(w) = N(0, I)    (4)

Using equation 3 and Bayes' rule, the posterior distribution over the weights can be written as:

p(w | D) = p(y | X, w) p(w) / p(y | X)    (5)

Due to the highly non-linear nature of f(t, x | w), the likelihood is a very complicated function of w; thus sampling from the posterior is analytically intractable. We can either approximate the posterior with, e.g., a Gaussian distribution and find the parameters of that Gaussian by minimizing the KL divergence between the true and the approximate posterior (variational inference) [22,15,23,24], or use Monte-Carlo sampling [13,25,23,15]. Variational inference for BNNs generally offers poor accuracy and often fails to provide meaningful confidence intervals [15,24,26,27]. Here, we rely on Hamiltonian Monte-Carlo (HMC) instead [13,28,29].

HMC flips the log-posterior probability distribution "up-side-down", so that the regions of high probability become minima. If a virtual particle (with momentum v) is placed on the flipped distribution and starts rolling freely, it will naturally move towards the regions of lower potential energy (minima), i.e. the regions of higher probability. HMC has two steps. First, it simulates the motion of such a particle using Hamiltonian physics, and records its position over time, providing a set of sample candidates. It then uses Metropolis-Hastings [30] to either accept or reject these samples. The Hamiltonian step generally provides suitable samples, and allows for a higher acceptance rate than other Markov-Chain-Monte-Carlo (MCMC) methods [13].
The Hamiltonian defines a joint probability between w and v, p(w, v) ∝ exp(−H(w, v)), with:

H(w, v) = T(v) + V(w),    T(v) = vᵀv / 2,    V(w) = −log p(w | D)    (6)

where T and V are the kinetic and potential energy, respectively. Using Hamilton's equations, and assuming that the momentum is independent of w, this leads to:

dw/dτ = ∂H/∂v = v,    dv/dτ = −∂H/∂w = −∇_w V(w)    (7)

Equation 7 can then be integrated using standard integration algorithms (Euler, Leap-Frog, etc. [29]) to find sample candidates.
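The two HMC steps can be sketched as follows. This is a minimal, self-contained illustration on a toy quadratic potential (a stand-in for the BNN log-posterior, which is what V would be in the paper); the step size and trajectory length are arbitrary choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy potential V(w) = -log p(w|D) and its gradient; here a standard
# Gaussian posterior stands in for the (intractable) BNN posterior.
def V(w):       return 0.5 * np.sum(w ** 2)
def grad_V(w):  return w

def hmc_step(w, step=0.1, n_leap=20):
    v = rng.standard_normal(w.shape)           # resample the momentum
    w_new, v_new = w.copy(), v.copy()
    # Leap-frog integration of Hamilton's equations (equation 7).
    v_new -= 0.5 * step * grad_V(w_new)
    for _ in range(n_leap - 1):
        w_new += step * v_new
        v_new -= step * grad_V(w_new)
    w_new += step * v_new
    v_new -= 0.5 * step * grad_V(w_new)
    # Metropolis-Hastings accept/reject on the Hamiltonian H = T + V.
    H_old = 0.5 * np.sum(v ** 2) + V(w)
    H_new = 0.5 * np.sum(v_new ** 2) + V(w_new)
    return w_new if np.log(rng.uniform()) < H_old - H_new else w

samples = []
w = np.zeros(3)
for _ in range(500):
    w = hmc_step(w)
    samples.append(w.copy())
```

Because the leap-frog integrator nearly conserves H, most proposals are accepted, which is the advantage over random-walk MCMC noted above.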

Inversion Framework for PDE Discovery
We consider n measurements from a given physical system, whose response is in the form of input-output pairs, D = {X_i, y_i}, i ∈ [1, n]; where the input is a time-space coordinate, and the output is a corresponding measured physical quantity (potentially noisy). In this work, all the response data are generated from numerical solutions of PDEs, using Chebyshev polynomials with a Runge-Kutta time integration scheme, via the Chebfun Matlab package [31].
Unless specified otherwise, we assume 16 measurement sensors placed at random space locations within the problem domain, which record the physical quantities of interest over time with increment ∆t. The input data have the form X_i = (t_i, x_i), where t_i is time and x_i is the sensor Cartesian coordinate. The output has the form y_i = u(t_i, x_i) + ε, where u is the PDE solution, and ε is noise corrupting the data. We consider three cases: ε = 0 (noiseless), ε ∼ N(0, 0.01²) and ε ∼ N(0, 0.05²).
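The measurement setup above can be sketched in a few lines of numpy. The field `u` below is a placeholder analytic function (the paper uses Chebfun PDE solutions), and the domain bounds and recording interval are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder field standing in for the numerical PDE solution u(t, x).
def u(t, x):
    return np.exp(-t) * np.sin(np.pi * x)

sensors = rng.uniform(-1.0, 1.0, 16)        # 16 random sensor locations
times = np.arange(0.0, 10.0, 0.2)           # recordings every dt = 0.2 s
T, Xs = np.meshgrid(times, sensors, indexing="ij")
X = np.column_stack([T.ravel(), Xs.ravel()])     # inputs X_i = (t_i, x_i)

# The three noise cases: noiseless, sigma = 0.01, and sigma = 0.05.
for noise_std in (0.0, 0.01, 0.05):
    y = u(X[:, 0], X[:, 1]) + rng.normal(0, noise_std, X.shape[0])
```

With 16 sensors recorded at 50 time steps, this produces the n = 800 training points used in the examples of section 4.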
The general framework for PDE discovery using these system response data is outlined in the following subsections, and summarized in figure 2.

Bayesian Neural Network Fitting
We assume that u can be approximated with a Bayesian neural network f, trained on the measurement data using HMC:

u(t, x) ≈ f(t, x | w),    w ∼ p(w | D)    (8)
In the following sections, the BNN employs a single architecture throughout: n_l = 4 fully connected hidden layers of n_u = 50 hidden units each, with ϕ = tanh activation functions. Unless specified otherwise, the prior over each neural network weight is a standard Gaussian distribution (zero mean and unit standard deviation). The noise standard deviation in the likelihood function is assumed as σ = 0.01. In the HMC sampling we take n_s = 6000 samples (200 burn-in), with an integration time step ∆τ = 5 · 10⁻⁴ (∆τ = 1 · 10⁻⁴ in section 4.2).

Surrogate Derivative Dataset
Once the BNN is properly trained, we sample a set of d time-space coordinates randomly: (t_j, x_j) ∼ U[Ω], where Ω is the time-space domain and U the uniform distribution (in the following sections, d = 10000). The BNN is differentiated multiple times with respect to space and time and evaluated at each input (t_j, x_j), j ∈ [1, d]. The derivatives are then averaged over the set of network weight samples. For example, the expected value and variance of the first order time derivative are:

E[∂f/∂t] ≈ (1/n_s) Σ_{s=1}^{n_s} ∂f(t_j, x_j | w_s)/∂t    (9)

V[∂f/∂t] ≈ (1/n_s) Σ_{s=1}^{n_s} ( ∂f(t_j, x_j | w_s)/∂t − E[∂f/∂t] )²    (10)

Using this method, and following the approach introduced by [4,5], we build a library X̃ of successive expected derivatives in space, potentially comprising the underlying governing PDE. The spatial derivative orders included in X̃ are arbitrary, and we may include non-linear terms as well. The total number of derivative candidates is defined as n_c, and in the following sections we have n_c = 11 (the specific list of derivatives used is outlined in table 2). Similarly, we compute the corresponding expected time derivative output vector ỹ, and the variance of each spatial derivative (matrix Z̃).
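The posterior averaging above can be sketched as follows. A family of simple analytic surrogates stands in for the BNN posterior samples, and central finite differences stand in for the automatic differentiation a real implementation would use; `surrogate` and the perturbation `eps` are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

n_samples, d, h = 50, 200, 1e-4
# Random time-space evaluation points (t_j, x_j) ~ U[Omega].
coords = np.column_stack([rng.uniform(0, 1, d), rng.uniform(-1, 1, d)])

def surrogate(t, x, eps):                  # one "posterior weight sample"
    return np.exp(-(1 + eps) * t) * np.sin(np.pi * x)

u_t = np.empty((n_samples, d))
u_x = np.empty((n_samples, d))
for s in range(n_samples):
    eps = 0.05 * rng.standard_normal()     # posterior variability stand-in
    t, x = coords[:, 0], coords[:, 1]
    # Central finite differences in place of autodiff.
    u_t[s] = (surrogate(t + h, x, eps) - surrogate(t - h, x, eps)) / (2 * h)
    u_x[s] = (surrogate(t, x + h, eps) - surrogate(t, x - h, eps)) / (2 * h)

# Expected value and variance over the posterior samples (equations 9-10).
y_tilde = u_t.mean(axis=0)     # expected time derivative output vector
X_col = u_x.mean(axis=0)       # one column of the candidate library X
var_x = u_x.var(axis=0)        # per-point derivative variance for Z
```

Repeating the last step for each candidate derivative fills the columns of X̃ and of the variance matrix Z̃.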
Higher order differentiation tends to exhibit higher variance. To ensure that the uncertainty of each data point is not made overly unbalanced by the higher order terms, each term in the variance matrix Z̃ is normalized, by dividing each column by its maximum value. Then, every normalized term within each row is summed, thus providing a vector γ(Z̃) quantifying the uncertainty of each derivative data point:

γ_j(Z̃) = Σ_{k=1}^{n_c} Z̃_{jk} / max_l(Z̃_{lk}),    j ∈ [1, d]    (11)
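The column-wise normalization and row-wise sum can be written in two lines of numpy (the small matrix below is made-up example data, not values from the paper):

```python
import numpy as np

# Toy d x n_c matrix of per-point derivative variances: three data points,
# three candidate derivatives whose variances live on very different scales.
Z = np.array([[0.1, 10.0, 2.0],
              [0.2, 40.0, 1.0],
              [0.4, 20.0, 4.0]])

Z_norm = Z / Z.max(axis=0)     # divide each column by its maximum value
gamma = Z_norm.sum(axis=1)     # one uncertainty weight per data point
```

Without the normalization, the second column would dominate `gamma` purely because higher order derivatives have larger raw variances.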

Sequential Threshold Bayesian Linear Regression Fitting
The library of derivative terms is then fitted with a sequential threshold Bayesian linear regression (STBLR) [18], in order to approximate the coefficient associated with each derivative. This is a sparse regression problem: the set of derivatives actually involved in the ground truth PDE is expected to be much smaller than the entire set of candidate derivatives, hence most coefficients should be 0. A STBLR with Gaussian priors is therefore well suited, as it naturally skews most coefficients towards 0; it also provides confidence intervals over the recovered PDE coefficients. Using a STBLR was first proposed in [18], but here we go further, by introducing the uncertainty over each derivative term within the library (quantified through γ(Z̃)) directly into the BLR model, in order to automatically minimize the influence of highly uncertain derivatives. The regression model can be written as follows:

ỹ = X̃ c + ε,    ε ∼ N(0, ζ⁻¹ I)    (12)

The system of equations written explicitly, with each row scaled by its uncertainty weight, is:

ỹ_j / γ_j = (X̃_j c) / γ_j + ε_j,    j ∈ [1, d]    (13)

Each input-output pair {X̃_j, ỹ_j}, j ∈ [1, d], is scaled by 1/γ_j; thus allowing for discounting the influence of highly uncertain derivative data points. In matrix form, with Γ = diag(γ(Z̃)):

Γ⁻¹ ỹ = Γ⁻¹ X̃ c + ε    (14)

Equation 14 leads to the posterior distribution over the PDE coefficients:

p(c | X̃, ỹ, γ(Z̃)) = N(c | E[c], V[c]),    V[c] = (θ I + ζ X̃ᵀ Γ⁻² X̃)⁻¹,    E[c] = ζ V[c] X̃ᵀ Γ⁻² ỹ    (15)

where ζ and θ are hyperparameters that maximize the marginal likelihood p(ỹ | X̃, γ(Z̃)). The vector of coefficients associated with each derivative can be taken as the mean value E[c] with respect to the posterior (with variance V[c]).
Here, Bayesian inference is analytically tractable, and the expected values of interest can be computed exactly.
In a STBLR, the regression is repeated iteratively. Initially, a wide range of candidate derivatives is assumed, so that an initial Bayesian linear regression may be performed. Coefficients whose absolute expected value falls below an arbitrary threshold δ are assumed to be null in reality; the corresponding derivative candidates are thus removed from the derivative library. The process is repeated until all the remaining derivative candidates have absolute expected coefficient values greater than δ. Note that here, we use a dynamic threshold: δ is doubled at each iteration. The STBLR is further detailed in algorithm 1.
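The iterative procedure can be sketched as follows. The closed-form Bayesian linear regression uses fixed precisions `zeta` and `theta` rather than maximizing the marginal likelihood as the paper does, and the synthetic data, threshold value, and helper names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Closed-form Bayesian linear regression: zeta is the noise precision,
# theta the Gaussian prior precision (fixed here, not marginal-likelihood fit).
def blr_posterior(X, y, zeta=100.0, theta=1.0):
    A = zeta * X.T @ X + theta * np.eye(X.shape[1])   # posterior precision
    mean = zeta * np.linalg.solve(A, X.T @ y)         # posterior mean E[c]
    return mean, np.linalg.inv(A)                     # and covariance V[c]

def stblr(X, y, gamma, delta=0.05):
    Xw, yw = X / gamma[:, None], y / gamma            # scale rows by 1/gamma_j
    active = np.arange(X.shape[1])
    while True:
        c, _ = blr_posterior(Xw[:, active], yw)
        keep = np.abs(c) >= delta
        if keep.all():
            break
        active, delta = active[keep], 2 * delta       # prune, double threshold
    coeffs = np.zeros(X.shape[1])
    coeffs[active] = c
    return coeffs

# Synthetic sparse recovery check: y = 1.5*x1 - 0.8*x3 + noise.
X = rng.standard_normal((400, 5))
y = 1.5 * X[:, 1] - 0.8 * X[:, 3] + rng.normal(0, 0.05, 400)
c_hat = stblr(X, y, np.ones(400))
```

Only the two active columns survive the thresholding; all other coefficients are returned exactly as zero, which is the sparsity the PDE library regression relies on.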

Error Metric and Computer Implementation
For assessing the accuracy of our models, we consider two error metrics. First, the ℓ² norm of the difference between the ground truth and the discovered vector of PDE coefficients:

e_c = ||c_true − c_discovered||₂

Secondly, we consider the ℓ² norm of the difference between the ground truth PDE solution and the numerical solution of the discovered PDE (which we solve using the Chebfun package [31]):

e_u = ||u_true − u_discovered||₂

For baseline comparison, a standard deep neural network (DNN) is also trained with the same architecture (i.e. same hyperparameters) as the BNN, using a mean-squared error loss with a learning rate α = 2 · 10⁻⁴, over 30000 iterations (we use the Adam optimizer for gradient descent [32]). We also compare the STBLR with a sequential threshold ordinary least squares regression (STOLS). STOLS follows the same sequential idea as the STBLR, but the Bayesian linear regression is replaced with ordinary least squares, and each term in γ(Z̃) carries the same weight (∀j, γ_j = 1).
To implement our neural network models, we use PyTorch [33]. HMC sampling is performed using the Hamiltorch add-on library [25], and the linear regressions for discovering PDE coefficients are performed using scikit-learn [34]. Our code and data are available at github.com/CBonneville45/BayesianDeepLearningPDE.

Burgers Equation
Let us consider the Burgers equation example as originally proposed in [12]:

∂u/∂t = −u ∂u/∂x + ν ∂²u/∂x²

The dictionary of spatial derivative candidates, with the discovered coefficients using the STBLR trained on the BNN derivative data, is shown in table 2. Table 3 shows the discovered coefficients obtained with the STOLS trained on the DNN derivative data (no uncertainty weighting). In the first two noise cases, although the DNN fits the measurement data better (the prediction RMSE on the test set is lower, as shown in table 1), the BNN is able to better discover the underlying PDE, thanks to the well quantified uncertainty. For a larger amount of noise (ε ∼ N(0, 0.05²)), the STBLR trained on the BNN derivative data slightly underestimates the influence of both the non-linear term and the diffusion term in the Burgers equation, and also slightly overestimates other derivative terms. However, it still outperforms the baseline comparison: the STOLS trained on the DNN data completely misses the diffusion term, and dramatically underestimates the non-linear term.
As shown in table 1, the BNN is able to limit overfitting better than the standard DNN in cases with a larger amount of noise. In figure 5, we notice how the standard DNN (green line) becomes oscillatory around noisy data points, while the BNN captures the underlying pattern correctly. The latter also provides meaningful confidence intervals: as seen in figures 4 and 5, the confidence intervals become wider (i.e. higher uncertainty) in space-time regions far from any measurement data, and the prediction errors are concentrated in areas of high uncertainty. Consequently, the BNN is able to produce better derivative and uncertainty estimates, ultimately yielding higher PDE discovery accuracy than the DNN coupled with STOLS.

Korteweg-De Vries Equation
We now consider the Korteweg-De Vries (KdV) equation example as originally proposed in [12]:

∂u/∂t = −6u ∂u/∂x − ∂³u/∂x³

Measurements are recorded every ∆t = 0.8 s for each of the 16 sensors (n = 800 training points); the ground truth solution is shown in figure 7, and the predictions of the BNN and DNN are shown in figures 8 and 9. Table 4 shows the prediction RMSE for both the BNN and DNN. The DNN predictions outperform the BNN, but the DNN then fails to recover the ground truth PDE, as shown in tables 5 and 6. Even in the cases with little to no noise, the DNN does a poor job: while it correctly identifies the third order spatial derivative and the non-linear term, it dramatically underestimates their influence. Conversely, the STBLR trained on the BNN derivative data not only identifies the derivatives correctly, but also predicts their coefficients much more accurately. This example shows that the accuracy of the BNN predictions is independent of its capability to discover the governing PDE. Indeed, the BNN does not have to be accurate everywhere; it only takes a few correct derivative estimates, with well quantified uncertainty, to find the ground truth PDE.
In the case with a large amount of noise (ε ∼ N(0, 0.05²)), the STBLR trained on the BNN derivative data still discovers the third order spatial derivative and the non-linear term with satisfactory accuracy, clearly outperforming the DNN. Indeed, the STOLS trained on the DNN derivative data misses the third order spatial derivative and dramatically underestimates the non-linear term.

Heat Equation
Finally, we consider the following 1D heat equation:

∂u/∂t = k ∂²u/∂x²

The temperature is recorded every ∆t = 0.2 s for each of the 16 sensors (n = 800 training points); the ground truth solution is shown in figure 11, and the predictions of the BNN and DNN are shown in figures 12 and 13. Table 7 shows the prediction RMSE for both the BNN and DNN, and figure 13 represents the BNN and DNN predictions at different times. Similarly to the previous examples, the standard DNN yields lower errors on the test set. However, as shown in tables 8 and 9, the STBLR trained on the BNN derivative data is able to recover the ground truth diffusion equation much better than the DNN/STOLS model. It accurately detects the second order spatial derivative (though with a slight underestimation of the diffusivity coefficient), while the DNN/STOLS model fails to find the diffusion term, and instead surprisingly discovers a squared first order derivative term that is not remotely present in the ground truth heat equation.
Figure 14 shows the dynamics of the discovered PDE. Since the STBLR trained on the BNN data correctly identified the heat equation, the dynamics for each noise case are very consistent with the ground truth, although the heat diffusion is a little slower due to the underestimated diffusivity coefficient. For comparison, the non-linear PDE discovered using the baseline DNN/STOLS model leads to dynamics completely different from the ground truth. Note that in this latter case, the numerical solutions of the discovered PDE are likely becoming unstable for t > 3, but the dynamics are already wrong before this.

Noise              | BNN    | DNN
ε = 0              | 0.0593 | 0.0203
ε ∼ N(0, 0.01²)    | 0.0395 | 0.0243
ε ∼ N(0, 0.05²)    | 0.0399 | 0.0310

Our proposed BNN/STBLR model is able to discover both linear and non-linear PDEs with excellent accuracy, even in the noisiest cases and when assuming a large set of candidate derivatives. In noisy cases, the BNN is able to limit overfitting without having to rely on data-demanding validation methods or specific tuning of the network hyperparameters. The BNN also provides valuable uncertainty quantification, which allows for discarding inaccurate derivative estimates; thus furnishing much better PDE discovery performance than its frequentist counterpart.
While relying on BNNs to approximate the physical quantities of interest limits overfitting and helps quantify the uncertainty over the set of candidate derivatives, a potential caveat of this method is the need for approximate inference when training the neural network. First, Monte-Carlo methods, here used for sampling the posterior, are notoriously inefficient, and may be expensive when the data are more abundant (although in this case, fine-tuning a standard deep neural network may be feasible and preferable). Secondly, generating the dataset of candidate derivatives is computationally intensive: for each set of weights sampled from the posterior, the weights are loaded into the neural network, a forward pass is performed, and the network is differentiated multiple times with respect to the inputs. Then, the derivatives are averaged over all the posterior samples. This process may be time consuming, especially if a large number of weight samples is necessary. Approximating the posterior with a simpler (tractable) distribution along with variational inference may alleviate this burden, but it also results in less informative derivative uncertainty quantification, and thus less accurate PDE coefficient discovery.
In this paper, we presented a framework for PDE discovery fully based on Bayesian inference, combining Bayesian neural networks with Bayesian linear regression. The use of BNNs for interpolating sparse physical measurements allows for accurately approximating the full physical system response, with well quantified uncertainty in regions where the data are sparse and noisy; ultimately allowing for more accurate PDE discovery. We believe that Bayesian methods can play a significant role in the field of PDE and dynamical system discovery, and this paper is a new contribution in that direction.

Figure 2 :
Figure 2: General PDE discovery framework. (a) Measurement data representing noisy snapshots of physical system dynamics, y_i = u(t_i, x_i) + ε_i, in time and space. (b) Fitting of the measurement data with a BNN. (c) Differentiation of the trained BNN with respect to time and space, for each set of weights sampled from the posterior. (d) Construction of a dataset of derivatives, potentially comprising the underlying PDE governing the system, obtained from the BNN. The derivative values are stochastic, and we take their expected value with respect to the BNN posterior. A sequential threshold Bayesian linear regression is performed on the derivative dataset, weighted by the derivative variance, to obtain the value of each PDE coefficient. (e) Discovery of the coefficients and derivatives involved in the underlying PDE (here the Burgers equation with noisy measurement data).

Measurements are recorded every ∆t = 0.2 s for each of the 16 sensors (n = 800 training points); the ground truth solution is shown in figure 3. The predictions of the BNN and DNN are shown in figures 4 and 5, and table 1 shows the RMSE between the BNN predictive mean, the DNN prediction, and the ground truth over 1000 random input points.

Figure 6 shows the learned dynamics (i.e. the solution of the discovered PDE) for the STBLR trained on the BNN data, compared with the STOLS trained on the DNN data. For every noise case, the BNN/STBLR does a remarkable job at learning the PDE dynamics accurately. The DNN/STOLS is able to capture the dynamics fairly accurately for cases with little noise, but it clearly fails to capture the shock that occurs over time around x = 0 in the noisiest case.

Figure 6 :
Figure 6: Learned solutions for the Burgers equation. First and third rows show the solutions of the discovered PDE with the BNN/STBLR and the DNN/STOLS, respectively. Second and fourth rows show the absolute error with respect to the ground truth for the BNN/STBLR and the DNN/STOLS, respectively.

Figure 7 :
Figure 7: Ground truth solution for KdV equation (black dots represent measured time-space points)

Figure 10 shows the learned dynamics for the STBLR trained on the BNN data, compared with the STOLS trained on the DNN data. For every noise case, the BNN/STBLR is able to learn the PDE dynamics with high accuracy (despite some errors localized around the wave front). Conversely, in each case (and particularly the noisiest case), the DNN/STOLS dramatically fails to capture the dynamics accurately.

Figure 10 :
Figure 10: Learned solutions for the KdV equation. First and third rows show the solutions of the discovered PDE with the BNN/STBLR and the DNN/STOLS, respectively. Second and fourth rows show the absolute error with respect to the ground truth for the BNN/STBLR and the DNN/STOLS, respectively.

Figure 11 :
Figure 11: Ground truth solution for the heat equation (black dots represent measured time-space points)

Figure 13 :
Figure 13: Predictions of the BNN (solid blue line, with 95% confidence intervals) and DNN (solid green line) compared to the ground truth (dashed red line) at t = 1 s, 3 s and 6 s. The black circles represent sensor measurement data - Heat equation

Figure 14 :
Figure 14: Learned solutions for the heat equation. First and third rows show the solutions of the discovered PDE with the BNN/STBLR and the DNN/STOLS, respectively. Second and fourth rows show the absolute error with respect to the ground truth for the BNN/STBLR and the DNN/STOLS, respectively.

Table 1 :
RMSE of the Bayesian and Standard Deep Neural Network -Burgers Equation

Table 3 :
Dictionary of candidate derivatives and discovered coefficients using STOLS and DNN derivative data - Burgers Equation

Table 4 :
RMSE of the Bayesian and Standard Deep Neural Network -KdV Equation

Table 5 :
Dictionary of candidate derivatives and discovered coefficients using STBLR and BNN derivative data - KdV Equation

Table 6 :
Dictionary of candidate derivatives and discovered coefficients using STOLS and DNN derivative data - KdV Equation

Table 7 :
RMSE in the predicted heat equation response of the Bayesian and Standard Deep Neural Network