Deep neural network enabled corrective source term approach to hybrid analysis and modeling

In this work, we introduce, justify and demonstrate the Corrective Source Term Approach (CoSTA) -- a novel approach to Hybrid Analysis and Modeling (HAM). The objective of HAM is to combine physics-based modeling (PBM) and data-driven modeling (DDM) to create generalizable, trustworthy, accurate, computationally efficient and self-evolving models. CoSTA achieves this objective by augmenting the governing equation of a PBM model with a corrective source term generated using a deep neural network. In a series of numerical experiments on one-dimensional heat diffusion, CoSTA is found to outperform comparable DDM and PBM models in terms of accuracy -- often reducing predictive errors by several orders of magnitude -- while also generalizing better than pure DDM. Due to its flexible but solid theoretical foundation, CoSTA provides a modular framework for leveraging novel developments within both PBM and DDM. Its theoretical foundation also ensures that CoSTA can be used to model any system governed by (deterministic) partial differential equations. Moreover, CoSTA facilitates interpretation of the DNN-generated source term within the context of PBM, which results in improved explainability of the DNN. These factors make CoSTA a potential door-opener for data-driven techniques to enter high-stakes applications previously reserved for pure PBM.


Introduction
The recent wave of digitalization has given a push to emerging technologies like digital twins. A digital twin is defined as a virtual representation of a physical asset, enabled through data and simulators, for real-time prediction, optimization, monitoring, control, and improved decision making. Paramount to digital twins' success is the level of physical realism that can be instilled into them. In this regard, as noted by Rasheed et al. (2020), modeling plays an important role. The exact requirements of the modeling depend on the digital twin's capability level, which is generally categorized on a scale from 0 to 5 (0-standalone, 1-descriptive, 2-diagnostic, 3-predictive, 4-prescriptive, 5-autonomous) (see Figure 1).
Although a digital twin offers huge potential in many industries, adoption of the technology has stagnated since its inception, mainly due to a lack of methodological works. To leverage asset-twin technologies, Kapteyn et al. (2021) proposed a unifying mathematical foundation that draws from probabilistic graphical models and dynamical systems theory. The role of surrogate models in the development of digital twins has also received attention (Wang, 2021). Among the advantages of DDM are its inherent online learning capability, high computational efficiency for inference, and accuracy even for very challenging problems (assuming the training, validation, and test data are prepared properly). However, the acceptability of DDM in high-stakes applications has been fairly limited due to its data-hungry and black-box nature, poor generalizability, inherent bias, and the lack of a robust theory for model stability analysis. In fact, the numerous vulnerabilities of DNNs have been highlighted in several recent works. Another class of methods projects the governing equations onto a reduced basis informed by simulation results, resulting in a set of ordinary differential equations (ODEs). One advantage of this method is that many terms in the resulting ODEs can be computed offline using the data; hence, what remains in the online phase is a simple forward integration of the ODEs in time, which can be very fast. These models, however, do not perform well when physics is either missing or gets lost during the dimensionality reduction. In the PGML approach, features derived from simplified physics-based models are injected into the DDM to improve its accuracy. However, the PGML models still do not generalize well to extrapolation scenarios. Within a PINN framework (Raissi et al., 2019), the commonly used mean squared error cost function of the DNN is regularized with the residual of the equation describing the physical laws that should be satisfied. This kind of regularization can pose a challenge during the optimization step because of the increase in the complexity of the cost function.
Finally, sparse regression based on l1 regularization and symbolic regression based on gene expression programming have been shown to be very effective in discovering hidden or partially known physics directly from data (Vaddireddy et al., 2020). The data-driven discovered physics is then added to the PBM to improve its predictions. Notable work using this approach can be found in Brunton et al. (2016). One of the limitations of this class of methods is that, in the case of sparse regression, additional features must be handcrafted, while in the case of symbolic regression, the resulting models are often unstable and hence might not be fit for interpretation.
From a careful analysis of the work cited here, it is clear that the HAM approach has the potential to fulfil the four desirable modeling characteristics described earlier in this section. However, as discussed above, earlier approaches to HAM have some limitations. In the current article, we present the Corrective Source Term Approach (CoSTA) to HAM. The novelty of the work, from a theoretical perspective, is the development of a sound mathematical foundation for the CoSTA approach, in which the governing equation of a PBM describing partial physics is augmented with a DNN-generated corrective source term that takes into account the remaining unknown or ignored physics. By doing so, the resulting model retains the generalizability and interpretability of the PBM while exploiting DDM to make the predictions more accurate by modeling the unknown physics reflected in the data. Explainable AI commonly refers to processes and methods by which the results of machine learning models can be better understood by humans. In our view, the proposed CoSTA approach can be classified as a new explainable AI approach, as it synthesizes a modular framework combining black-box DDM with PBM. From a practical perspective, the superiority of CoSTA in terms of accuracy and generalizability is quantitatively demonstrated by comparing its results against those of pure PBM and DDM in a series of numerical experiments concerning heat diffusion.
In Section 2, we start with the rationale behind the development of CoSTA. We then continue with a presentation of our chosen PBM (Section 2.2) and DDM (Section 2.3), before we explain how the PBM and a DNN can be combined using CoSTA (Section 2.4). We also provide some background on the method of manufactured solutions (Section 2.5). Section 3 is devoted to explaining our experimental setup, including the manufactured solutions considered, our DNN architecture and hyperparameter choices, and our data generation, training and testing procedures. Our experimental results are presented and discussed in Section 4, before the article is concluded in Section 5 with a brief summary and an outlook on future work.

Theory
In this section we present the rationale behind the CoSTA approach, followed by an overview of the PBM, DDM and HAM models used in our numerical experiments. The section concludes with a brief introduction to the method of manufactured solutions.

The rationale behind the CoSTA approach
In this subsection, we present a mathematical foundation for the proposed CoSTA approach. Assume that we are aiming to solve a problem on a domain Ω (with boundary ∂Ω and outward pointing unit normal n) that can be represented by a linear partial differential equation (PDE) defined as follows:

  Lu = f     in Ω,       (1)
  u = g_d    on ∂Ω_d,    (2)
  N u = g_n  on ∂Ω_n.    (3)

Here L is a linear differential operator (e.g. L = −∇² for Poisson problems), u is the unknown (e.g. temperature for heat diffusion problems), f is the (true) source term (e.g. a heat source/sink in heat diffusion problems), g_d is the prescribed Dirichlet boundary condition along the boundary section ∂Ω_d, and N is the differential operator related to the Neumann boundary condition (e.g. ∂u/∂n for heat diffusion problems) with prescribed value g_n (e.g. heat flux) along the boundary section ∂Ω_n. We assume that ∂Ω_d and ∂Ω_n cover the whole boundary ∂Ω without overlapping each other.
We will now address different cases of uncertainties, errors, and lack of information in the abstract PDE defined above. We will, in general, let ũ denote an analytical solution to a perturbed version of the PDE problem defined in Equations (1)-(3). Furthermore, we denote numerically computed solutions of the original PDE and its perturbation as u_num and ũ_num, respectively. Here the subscript "num" indicates the finite resolution of the numerical method (e.g. the finite difference method (FDM), finite volume method (FVM) or finite element method (FEM)).
Let the error between the two analytical solutions u and ũ be denoted ẽ, i.e. we have

  ẽ := u − ũ.    (4)

We define the corresponding residual r̃ as follows:

  r̃ := f − Lũ    (5)
     = Lu − Lũ   (6)
     = Lẽ.       (7)

Notice that there is a unique relationship between the error ẽ in the analytical solution of the perturbed PDE ũ and the residual obtained by inserting this solution into the (true) original PDE. The CoSTA approach utilizes this relationship, as illustrated for the different Cases 1-4 below. Cases 1 and 2 concern possible sources of error in the governing PDE itself, while Case 3 is the case when the governing PDE is known without error but cannot be solved analytically. Combinations of Cases 1-3 are treated as Case 4.
Case 1: Inaccurate source term or boundary conditions: In many real-world problems, the source term f (e.g. describing internal heat generation) may not be known exactly.
Let the inaccurate source term be denoted f̃, and assume that we are able to solve exactly the corresponding perturbed PDE:

  Lũ = f̃   in Ω.   (8)

Assuming that we know u (e.g. by measurements or an analytical solution), we can add the corrective source term r̃ to compute an improved (analytical) solution denoted ũ_costa:

  Lũ_costa = f̃ + r̃.   (11)

Since r̃ = f − Lũ = f − f̃, the right-hand side of Equation (11) equals the true source term f, and hence ũ_costa = u. If we are able to evaluate Lu for a given solution u, we get the following relationship from the equations above²:

  r̃ = f − Lũ = Lu − f̃.   (12)

Thus, by the herein developed CoSTA, we may retain the true analytical solution without any modeling error caused by the inaccurate source term f̃. The main point to be observed here is that inaccuracies in the source term can be corrected for by computing the residual from measured (or manufactured) solutions.
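The Case 1 mechanism can be sketched numerically. The following toy example (an illustrative discrete 1D Poisson setup of our own choosing, not the paper's solver) shows that solving with an inaccurate source term gives a wrong solution, while adding the corrective source term r̃ = Lu − f̃ recovers the true solution:

```python
import numpy as np

# Discrete 1D Poisson problem  L u = f  with  L = -d^2/dx^2  (homogeneous Dirichlet
# BCs), discretized with second-order central differences on interior nodes.
N = 50                                 # number of interior nodes (arbitrary choice)
x = np.linspace(0, 1, N + 2)[1:-1]     # interior nodes
h = x[1] - x[0]
L = (np.diag(2 * np.ones(N))
     - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2   # matrix form of -d^2/dx^2

u_true = np.sin(np.pi * x)             # "measured" true solution
f_true = L @ u_true                    # consistent true source term
f_bad = 0.5 * f_true                   # inaccurate source term f~

u_bad = np.linalg.solve(L, f_bad)      # perturbed solution u~ (wrong)
r = L @ u_true - f_bad                 # corrective source term r~ = Lu - f~, Eq. (12)
u_costa = np.linalg.solve(L, f_bad + r)  # corrected solve: L u_costa = f~ + r~

err_bad = np.max(np.abs(u_bad - u_true))
err_costa = np.max(np.abs(u_costa - u_true))
```

By construction f̃ + r̃ equals the true source term, so the corrected solve reproduces u_true to machine precision while the uncorrected solve is off by roughly a factor of two.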
Suppose now that, instead of having an inaccurate source term f̃, we have an inaccurate Dirichlet condition g̃_d (e.g. an inaccurate surface temperature) or an inaccurate Neumann condition g̃_n (e.g. an unknown heat flux). We may correct for any error in ũ caused by g̃_d by replacing it with u along the Dirichlet boundary ∂Ω_d; similarly, any error caused by g̃_n is taken care of by replacing it with N u along ∂Ω_n.
Case 2: Inaccurate physical parameters and differential operators: Let the true differential operator L be dependent on some physical parameter k (e.g. heat conductivity) that may be a spatial and/or temporal function.
We indicate this dependency by writing L(k). Assume now that we do not know the exact value of k, but instead only an approximation k̃, such that our perturbed PDE is defined using the operator L̃ = L(k̃). Alternatively, assume that we do not know (or simply neglect) some terms in the true operator L, and denote the resulting inaccurate operator L̃. For these two situations we will typically solve the following problem:

  L̃ũ = f   in Ω.   (13)

Assuming again that we know u (e.g. by measurements or an analytical solution), the residual due to the inaccurate differential operator L̃ ≠ L is given by Equation (7), i.e., r̃ = Lẽ. However, if L is unknown, we cannot compute r̃ from the relations above. Therefore, we introduce an alternative residual r̂ corresponding to using the perturbed differential operator:

  r̂ := L̃ẽ.   (14)

We then add r̂ as a corrective source term to find the CoSTA-improved (analytical) solution u_costa:

  L̃u_costa = f + r̂    (15)
           = L̃ũ + L̃ẽ  (16)
           = L̃u.       (17)

Thus, CoSTA can be looked upon as either solving a "manufactured solution" problem defined by the true solution in Equation (17), or as solving the problem using a perturbed (corrected) source term f + r̂ as given in Equation (15), in both cases using the (inaccurate) perturbed differential operator L̃. Notice that, in the above, we get an analytical solution u_costa that corresponds to a source term L̃u defined by the true solution u on a perturbed PDE defined by L̃. If the perturbed PDE admits a unique analytical solution, then the use of CoSTA will imply that u_costa = u.

² If u is only known in discrete points (e.g. it is measured), we may interpolate it or project it onto a polynomial basis of order p to obtain u_p, which can then be differentiated and used instead of u in Equation (12).
Case 3: Inaccurate differential operator due to discretization errors: Above, we have described inaccuracy in the continuous PDE due to modeling errors. However, when we solve a PDE with FDM/FVM/FEM, we introduce discretization errors, as we are solving the problem with a discrete approximation L_num of the true differential operator L (e.g. FDM using central differences). Following the approach for Case 2 above, we get the same relationships as given in Equations (15)-(17) by substituting L̃ with L_num for problems with only discretization errors and no modeling errors.
Case 4: Combined modeling and discretization errors: In our study herein, we will address problems where we have both modeling and discretization errors. Denote the corresponding differential operator L̃_num and the inaccurate source term f̃. Our approach for retaining the true solution u of the true problem defined by L and f is outlined below.
We first solve the following problem to find a predictor ũ_num:

  L̃_num ũ_num = f̃.   (18)

Then we compute the residual, i.e. the corrective source term, corresponding to the error ẽ_num = u − ũ_num in the predictor:

  r̂_num := L̃_num ẽ_num.   (19)

Finally, we do the following corrector step to compute the CoSTA-improved numerical solution:

  L̃_num u_costa = f̃ + r̂_num                  (20)
               = L̃_num ũ_num + L̃_num ẽ_num   (21)
               = L̃_num u.                     (22)

Notice that if we knew the true solution u(x, t; µ) at every node and every time step, for any choice of the parameter vector µ, a priori, we would not need to do the predictor step or compute the related corrective source term, because we could have solved Equation (22) directly. However, in practice we do not know u for all choices of µ, but we may train a neural network to return a quite accurate corrective source term (formally defined by Equation (19)) given a predictor ũ_num computed from Equation (18). The corrector step in CoSTA then corresponds to solving Equation (20).
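The predictor-corrector structure of Case 4 can be sketched on a toy discrete problem. Here both the operator and the source term are perturbed; all matrices and the "measured" solution are illustrative placeholders of our own choosing, not the paper's FVM system:

```python
import numpy as np

# Toy Case 4 sketch: true operator L_true with conductivity k = 2, perturbed
# operator L_pert with inaccurate k~ = 1, and an inaccurate source term f~.
N = 40
x = np.linspace(0, 1, N + 2)[1:-1]
h = x[1] - x[0]
D2 = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
      - np.diag(np.ones(N - 1), -1)) / h**2   # discrete -d^2/dx^2

L_true = 2.0 * D2        # true (discrete) operator
L_pert = 1.0 * D2        # perturbed operator, analogue of L~_num
u = np.sin(np.pi * x)    # true solution, assumed known (e.g. measured)
f = L_true @ u           # true source term
f_bad = 0.3 * f          # inaccurate source term f~

u_pred = np.linalg.solve(L_pert, f_bad)        # predictor step, Eq. (18)
r = L_pert @ (u - u_pred)                      # r^_num = L~_num e~_num, Eq. (19)
u_costa = np.linalg.solve(L_pert, f_bad + r)   # corrector step, Eq. (20)
```

Even though both the operator and the source term are wrong, the corrector step recovers the true solution exactly (up to round-off), since f̃ + r̂_num = L̃_num u by construction.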
Applications: To test and demonstrate the value of the proposed CoSTA approach, we choose the problem of one-dimensional heat conduction described by a PDE derived from first principles using the first law of thermodynamics. Accurate modeling of heat conduction is vital for a wide array of problems, ranging from modeling heat transfer from the earth to its atmosphere, to modeling the heat transfer characteristics of the built environment, to modeling the accumulation of thermal stresses in heat storage infrastructures. However, the accuracy in most of these applications is compromised due to geometric simplifications, uncertainty associated with the values of the thermophysical properties used in the calculations, neglect of unknown (and even known) phenomena, and numerical approximations. The CoSTA approach, if successful, has the potential to solve these kinds of issues, not only for heat conduction modeling, but also for the modeling of any other steady-state or dynamical system that can be described by a (system of) PDE(s).

Physics-based modeling
In PBM, PDEs are widely used as governing equations, describing various physical phenomena by relating partial derivatives of relevant physical quantities. In this paper, we consider the one-dimensional (1D) unsteady heat diffusion equation, which describes 1D transient heat transfer in a system of volume V and cross-sectional area A. The equation, which can be derived by applying the principle of energy conservation expressed by the 1st law of thermodynamics, reads

  ρ c_V V ∂T/∂t = (k A ∂T/∂x)_e − (k A ∂T/∂x)_w + q̇ V,   (23)

where T, ρ, c_V and k denote temperature, density, heat capacity and thermal conductivity, respectively. The term on the equation's left-hand side represents the momentary change in the system's internal energy. Furthermore, the first two terms on the right-hand side represent the heat flux across the system's right (eastern, denoted by subscript e) boundary and left (western, denoted by subscript w) boundary, respectively, while the last term on the right-hand side (q̇) is a source term which accounts for heat generated within the system. Under certain smoothness requirements, the 1D unsteady heat equation can also be written on the so-called differential form

  ρ c_V ∂T/∂t = ∂/∂x (k ∂T/∂x) + q̇.   (24)

Comparing to Equation (1), the differential operator of the heat equation is given by

  L = ρ c_V ∂/∂t − ∂/∂x (k ∂/∂x),

while the source term is f = q̇.
In the cases where the solution of Equation (23) (or Equation (24)) cannot be expressed analytically, approximate solutions can be obtained using numerical methods such as FDMs, FVMs and FEMs. For Dirichlet boundary conditions (BCs), Equation (23) can be written in the form

  A T^{n+1} = b   (26)

when discretized using the Implicit Euler FVM. Here, T = [T_1, ..., T_{N_j}]^T denotes the temperature at all N_j interior grid nodes x_1, ..., x_{N_j}. Furthermore, the superscripts n and n+1 denote two subsequent time levels, A is a tri-diagonal matrix, and b is a vector depending on T^n, the BCs (T_a and T_b) and q̇. Notice that A is the algebraic matrix representation of the discrete differential operator L_num, and b is the vector representation of the source term f, which also includes the effects of the boundary conditions.
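The structure of the discrete system (26) can be sketched as follows. This is a minimal implicit-Euler assembly for a uniform grid with ρ = c_V = k = 1 (the paper's actual FVM matrices may differ in details such as coefficient scaling):

```python
import numpy as np

# One Implicit Euler step for the 1D heat equation, rho = c_V = k = 1 for brevity.
# A is tridiagonal; b collects T^n, the source q, and the Dirichlet BC contributions.
Nj, dt = 20, 1e-3
x = np.linspace(0, 1, Nj + 2)         # nodes including both boundaries
h = x[1] - x[0]
xi = x[1:-1]                          # interior nodes x_1 .. x_Nj
r = dt / h**2

A = (np.diag((1 + 2 * r) * np.ones(Nj))
     - np.diag(r * np.ones(Nj - 1), 1)
     - np.diag(r * np.ones(Nj - 1), -1))

def step(Tn, Ta, Tb, q):
    """Solve A T^{n+1} = b, with the BC effects folded into b."""
    b = Tn + dt * q
    b[0] += r * Ta                    # western Dirichlet BC (T_a)
    b[-1] += r * Tb                   # eastern Dirichlet BC (T_b)
    return np.linalg.solve(A, b)

# sanity check: the linear profile T(x) = x is steady for q = 0, T(0)=0, T(1)=1
T = step(xi.copy(), 0.0, 1.0, np.zeros(Nj))
```

A linear profile has zero curvature, so one implicit step leaves it unchanged, confirming that the BC terms in b are consistent with the interior stencil.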
Since Equation (26) is an approximation of Equation (23), a solution of one of the equations is generally not a solution of the other. Note also that, in cases where the governing equation (23) is not fully known, Equation (26) has to be based on an approximation of Equation (23), which causes further discrepancies between the solutions of Equation (23) and Equation (26), as discussed in Section 1. To distinguish the two classes of solutions, we use the notation T_ref to denote a solution of the true governing equation (23) (i.e., analogous to u given by Equations (1)-(3) in the general case) and T_p to denote a solution of the discrete system (26). In the context of a prediction problem, T_ref is then the ideal prediction, while T_p is the prediction made by the PBM (i.e., analogous to ũ_num given by Equation (18) in the general case). Thus, the equation that we solve to generate the PBM solution is

  A T_p^{n+1} = b,   (27)

with the prescribed boundary conditions implicitly included in b. It should be stressed that there is no learning involved in PBM; hence, in Figure 3, where we illustrate the training and testing processes for PBM, DDM and HAM, there is no mention of PBM in the part concerning training (Figure 3a).

Data-driven modeling
In DDM, physics is learned directly from the observation data. For transient systems, one common DDM approach is to define a mapping from the observed state at one time level to the observed state at the subsequent time level. A DNN is then trained to approximate this mapping. In the context of 1D heat diffusion problems with known Dirichlet BCs, the mapping to be learned by the DNN is

  T_d^n ∈ R^{N_j + 2} ↦ T_d^{n+1} ∈ R^{N_j},   (28)

where T_d^n refers to the temperature profile predicted by the DDM at time level n, and the training data are given by T_ref^n, the solution T_ref of the true governing equation (Equation (23)) sampled at the grid nodes x_1, ..., x_{N_j} and time level n. Note that the dimensionality discrepancy between the DNN's input and output is due to the input containing the boundary temperatures, which the output does not include; since the boundary temperatures are assumed known, there is no reason to have the DNN predict them. However, we do want to include the boundary temperatures in the DNN input, since they represent relevant physical information. To avoid notational complexity, we use T_d^n to denote both the vector containing and the vector not containing the boundary temperatures.
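The input/output shapes of the mapping (28) can be made concrete with a small fully connected network. The weights below are random placeholders and the width is an arbitrary choice; a real model would be trained as described next:

```python
import numpy as np

# Shape sketch of the DDM mapping (28): N_j + 2 inputs (interior profile plus both
# boundary temperatures) -> N_j outputs (interior profile at the next time level).
rng = np.random.default_rng(0)
Nj, width = 20, 80                               # illustrative sizes

W1 = rng.standard_normal((width, Nj + 2)) * 0.1  # untrained placeholder weights
b1 = np.zeros(width)
W2 = rng.standard_normal((Nj, width)) * 0.1
b2 = np.zeros(Nj)

def ddm_step(T_with_bcs):
    """One DDM time step: [T_0^n, ..., T_{Nj+1}^n] -> [T_1^{n+1}, ..., T_Nj^{n+1}]."""
    hidden = np.maximum(W1 @ T_with_bcs + b1, 0.0)   # ReLU hidden layer
    return W2 @ hidden + b2

T_next = ddm_step(np.ones(Nj + 2))
```

The point is purely dimensional: the boundary values enter the input but are not predicted, which is why the output is two components shorter than the input.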
Our reason for choosing DNN-based DDM over other applicable DDMs is that DNNs have the ability to approximate any nonlinear function, as guaranteed by the universal approximation theorem. DNNs (Goodfellow et al., 2016), which are inspired by the biological neural networks found in e.g. human brains, typically consist of multiple layers with one layer's output being passed through a non-linear activation function before being used as the input of the next layer. Each layer typically consists of a number of processing units with their own tunable parameters commonly called weights and biases. The nature of, and relations between, the processing units vary across different layer types. We refer to the specific composition of different layers used to define a DNN as that DNN's 'architecture'.
We say that a neural network is 'trained' when the individual parameters are tuned in an effort to make the network a better approximator of the desired function. In this paper, we train the DNNs using the framework of supervised learning, which requires the preparation of sample DNN inputs and corresponding target outputs. During training, for any sample input, the DNN's output is compared to the corresponding target output using a chosen cost function. Then, the backpropagation algorithm (Goodfellow et al., 2016) is used to calculate the gradients of the computed cost with respect to the individual network parameters. Finally, the network parameters are updated, typically using a gradient descent algorithm, such as to minimize the cost function. The cost function used in this work is the commonly used mean squared error. An overview of the training and testing approach for DDM is illustrated in Figure 3 with the color blue.
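The training loop described above (MSE cost, gradients, gradient-descent updates) can be sketched as follows. For transparency we use a single linear layer with hand-written gradients rather than a deep network with framework-based backpropagation; the data and sizes are arbitrary illustrations:

```python
import numpy as np

# Minimal supervised-learning loop: mean squared error cost, analytic gradient,
# and plain gradient-descent parameter updates.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))          # sample inputs
W_true = rng.standard_normal((3, 5))
Y = X @ W_true.T                           # corresponding target outputs

W = np.zeros((3, 5))                       # trainable parameters
lr = 0.05                                  # learning rate
for _ in range(500):
    P = X @ W.T                            # forward pass: predictions
    grad = 2.0 / len(X) * (P - Y).T @ X    # d(MSE)/dW, the "backpropagated" gradient
    W -= lr * grad                         # gradient-descent update

mse = np.mean((X @ W.T - Y) ** 2)          # final training cost
```

A deep network replaces the analytic gradient with backpropagation through all layers, but the cost/gradient/update cycle is exactly this one.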

Hybrid analysis and modeling with CoSTA
Given a PBM, the principal goal of the corrective source term approach (CoSTA) is to modify the governing equation solved by the PBM using a corrective source term, such as to recover the true solution of the problem at hand. In this section, we demonstrate how CoSTA can be used in practice to correct the Implicit Euler FVM for unsteady heat transfer (Equation (26)).
The first step of applying CoSTA to the Implicit Euler FVM is to add the corrective source term σ̂^{n+1} to the right-hand side of Equation (26), such as to obtain the modified system

  A T_h^{n+1} = b + σ̂^{n+1},   (29)

whose solutions we denote T_h (the subscript h corresponds to HAM). Our goal is now to obtain an explicit expression for σ̂^{n+1} using the framework from Section 2.1. To this end, notice that Equation (29) is analogous to Equation (20), with A playing the role of L̃_num, b playing the role of f̃, and σ̂^{n+1} playing the role of r̂_num. From the definition of r̂_num (cf. Equation (19)), we thus have

  σ̂^{n+1} = A ẽ_num = A u − A ũ_num,   (32)

where we utilized the definition of ẽ_num to transition from the first to the second expression. As in Section 2.3, we let T_ref^{n+1} denote the true solution which we aim to find, i.e., we have u = T_ref^{n+1}. Moreover, as the analogue of the predictor ũ_num, we choose T̃_h^{n+1} given by

  A T̃_h^{n+1} = b,   (33)

where A and b are as defined in Equation (29). By inserting for L̃_num, u and ũ_num in Equation (32), we thus obtain

  σ̂^{n+1} = A T_ref^{n+1} − A T̃_h^{n+1} = A T_ref^{n+1} − b.

If we now insert T_h^n = T_ref^n into the equation above, we get

  σ̂^{n+1} = A T_ref^{n+1} − b |_{T_h^n = T_ref^n},   (36)

which is the definition of the corrective source term that we will use to generate data for our numerical experiments (cf. Section 3.3). Note that σ̂^{n+1} corrects the error of the Implicit Euler FVM over a single time step. Starting from some known initial temperature profile T^0 = T_h^0 = T_ref^0, the combined use of Equations (29) and (36) thus guarantees that T_h^n = T_ref^n at every subsequent time level. However, since T_ref^{n+1} is not known a priori, Equation (36), and hence also Equation (29), cannot be evaluated using pure PBM. On the other hand, pure DDM can (implicitly) take σ̂^{n+1} into account, but it also completely discards what is already known about heat diffusion problems. Instead of going to either of these extremes, we choose a middle ground by training a deep neural network DNN_σ to approximate Equation (36) given the predictor T̃_h^{n+1} defined in Equation (33). The DNN-generated approximation is then inserted in the place of σ̂^{n+1} in Equation (29) to obtain the HAM prediction T_h^{n+1}.
Training data for the DNN can be generated from a known time series describing the system's past, or from time series describing the temporal development of similar systems, using Equation (36). The whole process of training and testing the complete CoSTA-based HAM model is illustrated in Figure 3 using the color green.
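The data-generation step (Equation (36)) and the corrected step (Equation (29)) can be sketched on a toy implicit-Euler system; the matrix, grid, and reference series below are illustrative placeholders, not the paper's setup:

```python
import numpy as np

# Sketch of CoSTA training-data generation and the corrected HAM step.
Nj, dt = 20, 1e-3
h = 1.0 / (Nj + 1)
r = dt / h**2
A = (np.diag((1 + 2 * r) * np.ones(Nj))
     - np.diag(r * np.ones(Nj - 1), 1)
     - np.diag(r * np.ones(Nj - 1), -1))

def b_of(Tn):
    # b with q set to 0 (synthesized modeling error) and zero Dirichlet BCs
    return Tn.copy()

x = np.linspace(h, 1 - h, Nj)
T_ref_n  = np.sin(np.pi * x)                    # reference profile at level n
T_ref_np = np.sin(np.pi * x) * np.exp(-dt)      # reference at level n+1 (made up)

sigma = A @ T_ref_np - b_of(T_ref_n)            # corrective source term, Eq. (36)
T_h = np.linalg.solve(A, b_of(T_ref_n) + sigma) # corrected step, Eq. (29)
```

With the exact σ̂^{n+1}, the corrected step reproduces the reference profile exactly; in practice the DNN supplies an approximation of σ̂^{n+1}, and T_h inherits the (small) approximation error.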

Method of manufactured solutions
A central part of the present study is the method of manufactured solutions (MMS), which has long been a popular tool for verifying the numerical PDE solvers used in PBMs (see e.g. Roache (2002) for an extensive introduction). The key concept of MMS is to choose some explicitly expressible function, and then calculate the source term required for this function to be a solution of the PDE in question. For the 1D unsteady heat diffusion equation, this amounts to deciding upon some temperature function T (x, t) and calculating the source termq(x, t) required for the differential form of the equation (Equation (24)) to be satisfied. Thus, the use of MMS allows us to obtain exact reference solutions T ref of the 1D unsteady heat equation without running expensive high-fidelity simulations. We can then use these reference solutions to evaluate the accuracy of the temperature profiles predicted using PBM, DDM and HAM, as well as for generating DNN training and validation data.
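The MMS workflow can be illustrated end to end. Here we pick a manufactured solution of our own (not one of the paper's Table 1 entries) with ρ = c_V = k = 1, derive its source term by hand from the differential form (24), and verify the pair numerically with finite differences:

```python
import numpy as np

# MMS sketch: choose T(x, t) = sin(pi x) exp(-t); then, with rho = c_V = k = 1,
#   q(x, t) = dT/dt - d2T/dx2 = (pi**2 - 1) sin(pi x) exp(-t)
# makes T satisfy the differential form of the heat equation exactly.
T = lambda x, t: np.sin(np.pi * x) * np.exp(-t)
q = lambda x, t: (np.pi**2 - 1) * np.sin(np.pi * x) * np.exp(-t)

# Check the PDE residual dT/dt - d2T/dx2 - q at an arbitrary point with
# central finite differences.
x0, t0, eps = 0.3, 0.7, 1e-4
dT_dt   = (T(x0, t0 + eps) - T(x0, t0 - eps)) / (2 * eps)
d2T_dx2 = (T(x0 + eps, t0) - 2 * T(x0, t0) + T(x0 - eps, t0)) / eps**2
residual = dT_dt - d2T_dx2 - q(x0, t0)
```

The residual vanishes up to finite-difference truncation error, confirming that (T, q) is a valid manufactured pair; the same recipe yields the q̇ columns of Table 1.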

Experimental setup and procedures
To evaluate the performance of the PBM, DDM and HAM models described in the previous section, we have designed a series of numerical experiments (the results of which are presented and discussed in Section 4), where each experiment is based on a manufactured solution of the 1D unsteady heat equation (Equation (23)). The experimental setup and procedures used to conduct these experiments are described in the remainder of this section.

Choice of manufactured solutions
All manufactured solutions T(x, t; α) used to conduct the present study are listed in Table 1, along with the corresponding source terms q̇(x, t; α) required for them to satisfy the differential form of the heat equation (Equation (24)). From a real-world application perspective, the problem corresponds to a one-dimensional solid body initialized (at t = 0) with a temperature profile given by the functions in Table 1. The evolution of the temperature across the solid body is then predicted under the influence of differential heating along the length of the body, given by the source term q̇. This heat transfer phenomenon is encountered in a wide variety of real-life applications, such as heat loss/gain through building walls, transfer of heat from the center of the earth to the atmosphere, and the cooling/heating of electronic chips.

⁵ This can be proven by induction.

Table 1 (excerpt):

  Solution | T(x, t; α)                | q̇(x, t; α)
           | 1 + sin(2πt + α) cos(2πx) | 2π (cos(2πt + α) − 2π sin(2πt + α)) cos(2πx)

Parametrization
For each manufactured solution in Table 1, data were generated using the 22 different α-values listed in Table 2, which also indicates how these values are divided between training, validation and testing. Moreover, each data example contains two vectors: an (N_j + 2)-dimensional DNN input vector and an N_j-dimensional DNN target output vector. The precise definitions of these vectors are given in Algorithm 1. Note that, for the experiments described in Section 4.2, we set q̇ = 0 in Equation (26) and Equation (29) to synthesize modeling error. Note also that, for each time level n, all data in the datasets are generated based on T_ref^{n−1}. This means that no error can accumulate across multiple time steps, so DNNs trained using these datasets will necessarily be trained to make corrections across individual time steps. That is, they correct only local, not global, time-stepping errors. This choice was made based on three observations: 1) A successful, generalizable reduction in local error will necessarily also reduce global error. 2) Corrections of global errors are hard to interpret due to error accumulation. 3) Data is more efficiently obtained when one data example corresponds to a single time step rather than a complete time series comprising multiple time steps.
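The single-time-step data-generation idea can be sketched as follows; the reference series and shapes are illustrative stand-ins (the exact vectors and normalization follow Algorithm 1):

```python
import numpy as np

# Each training example is built from the reference profile at the *previous*
# time level, so no error accumulates across examples.
Nt, Nj = 100, 20
x = np.linspace(0, 1, Nj + 2)                            # interior + both boundaries
T_ref = [np.sin(np.pi * x) * np.exp(-0.01 * n)           # toy reference time series
         for n in range(Nt)]

examples = []
for n in range(1, Nt):
    dnn_input  = T_ref[n - 1]        # (N_j + 2)-dimensional input at level n-1
    dnn_target = T_ref[n][1:-1]      # N_j-dimensional interior target at level n
    examples.append((dnn_input, dnn_target))
```

Every example is anchored at a reference profile, so a trained DNN only ever learns one-step corrections, which is exactly the "local, not global" property argued for above.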

DNN training
Both DDM and CoSTA-based HAM require training of DNNs using the datasets described above. At regular intervals during training, the cost on the validation dataset is computed; the validation cost is computed analogously to the training cost, as described above. We utilize the early stopping regularization technique by stopping the DNN training if a new lowest validation cost has not been recorded for a certain number of consecutive validation periods (this number is denoted the "overfit limit" in Table 3).
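The early-stopping rule can be sketched as a small loop; `train_step` and `val_cost` are hypothetical placeholders for the actual training and validation routines:

```python
# Early-stopping sketch (the "overfit limit" of Table 3): stop when no new lowest
# validation cost has been seen for `overfit_limit` consecutive validation checks.
def train_with_early_stopping(train_step, val_cost, overfit_limit=10, max_iters=10_000):
    best, since_best = float("inf"), 0
    for _ in range(max_iters):
        train_step()                      # one training period
        c = val_cost()                    # periodic validation cost
        if c < best:
            best, since_best = c, 0       # new lowest validation cost recorded
        else:
            since_best += 1
            if since_best >= overfit_limit:
                break                     # validation cost has stagnated: stop
    return best
```

In practice one would also restore the parameters from the best-validation checkpoint; that bookkeeping is omitted here for brevity.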

Model evaluation
When deploying a predictive model in a practical application, it is imperative that the model remains accurate over time. The testing procedure is described in its entirety in Algorithm 2 and is illustrated in Figure 3b for a single time step. Since we perform testing using 4 distinct α-values (cf. Table 2), this procedure yields 5000 · 4 = 20000 test predictions (per model) in any given experiment. We use the l2-norm to quantify the accuracy of these predictions, as described in the beginning of Section 4. For a general N-dimensional vector v ∈ R^N, the definition of the l2-norm reads

  ||v||_2 = ( Σ_{i=1}^{N} v_i² )^{1/2},

where v_i are the components of v.
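The error metric is a direct transcription of this norm; the function names below are our own:

```python
import numpy as np

# l2-norm and relative l2 error between a predicted and a reference profile.
def l2_norm(v):
    v = np.asarray(v, dtype=float)
    return np.sqrt(np.sum(v**2))

def relative_l2_error(T_pred, T_ref):
    return l2_norm(np.asarray(T_pred) - np.asarray(T_ref)) / l2_norm(T_ref)

# example: prediction uniformly 10% below a reference profile
err = relative_l2_error(np.array([1.0, 2.0, 2.0]),
                        np.array([1.0, 2.0, 2.0]) * 1.1)
```

For a uniform 10% under-prediction, the relative error is 0.1/1.1 regardless of the profile, which is a convenient sanity check for the implementation.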

Neural network architectures and hyperparameters
For both DDM and CoSTA-based HAM, we use DNNs with the fully connected neural network (FCNN) architecture illustrated in Figure 4, and the hyperparameters listed in Table 3. The optimal set of hyperparameters was found using a grid search for the DDM and then applied to HAM as well. The number of fully connected (FC) layers, the width of each FC layer and the learning rate were chosen such as to minimize the total validation loss obtained for the two manufactured solutions A and B at the bottom of Table 1 in the parameter grid search. The other hyperparameter values were chosen based on prior experience with DNN tuning. We make no claim that our architecture choices are optimal. To the contrary, we expect that using DNN architectures specifically constructed for time series problems (e.g. LSTM networks) would improve DDM and HAM performance. However, in the present work, we have chosen to use a simple DNN such that its technical details would not obscure our main contribution.

Algorithm 1: Data generation procedure for the training and validation datasets. All training data is normalized to have a component-wise mean of 0 and standard deviation of 1. The validation data is normalized using the same normalization coefficients as the training set. For the experiments where modeling error is synthesized, we set q̇ = 0 in Equations (33) and (29).
for T (x, t; α) in Table 1
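The hyperparameter grid search described above can be sketched generically; `validation_loss` is a hypothetical placeholder for training a model with the given hyperparameters and returning its total validation cost:

```python
import itertools

# Grid search over (number of FC layers, layer width, learning rate): evaluate
# every combination and keep the one minimizing the validation loss.
def grid_search(validation_loss, n_layers_opts, width_opts, lr_opts):
    best_cfg, best_loss = None, float("inf")
    for cfg in itertools.product(n_layers_opts, width_opts, lr_opts):
        loss = validation_loss(*cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg
```

Grid search is exhaustive and embarrassingly parallel, which makes it a reasonable default when the grid is small, as it is here with only three hyperparameters searched.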

Results and discussion
In each numerical experiment, we consider one of the manufactured solutions listed in the top section of Table 1.
In any given experiment, we aim to reproduce the time series {T_ref^n}_{n=0}^{N_t−1} for each α ∈ A_test using our PBM, DDM and HAM models. To quantify the models' performance, we present the temporal development of the relative l2 error of their predictions. The boundary conditions and initial conditions corresponding to the manufactured solutions are assumed known and utilized as applicable for all three model types in all experiments. However, in order to synthesize scenarios where some relevant physics is unknown, for the experiments discussed in Section 4.2, it is assumed that we do not know the source terms q̇ required for the manufactured solutions to satisfy Equation (23). No such limitation is imposed on the models discussed in Section 4.1.

Experiments without modeling error
In this section, we compare the performance of the three approaches PBM, DDM and HAM in situations where we have full knowledge of the physics. This implies that the structure of the governing equation, including the source term and the exact values of the PBM parameters, is fully known. The following sections present the results for interpolation and extrapolation scenarios for Solutions 0 and 3 (cf. Table 1). For the current experiment using Solution 3, we used a finer spatial discretization than elsewhere in this work (N_j = 200, rather than N_j = 20) in order to demonstrate the full power of PBM when all physics is known.

Algorithm 2: Testing algorithm used in our numerical experiments. Note that, to match the operating conditions seen during training, all DNN inputs/outputs are normalized/unnormalized using the same normalization coefficients as were used to normalize the training and validation data (cf. Algorithm 1). For the experiments where modeling error is synthesized, we set q̇ = 0 in Equations (27), (33) and (29).

Interpolation scenarios
For both solutions, HAM improves on the accuracy of PBM by compensating for the discretization error, even when the discretization error is small. In conclusion, PBM generalizes better than DDM, while HAM generalizes approximately as well as PBM in the presence of numerical error. This ability of HAM to correct for the numerical error can be exploited to relax the mesh requirements for solving equations at a fixed error tolerance. That is, equations can be solved on a coarse mesh in a computationally efficient manner, and CoSTA-based HAM can be used to correct for the numerical errors resulting from the coarse discretization.

Experiments with modeling error
In this section we present the results corresponding to manufactured solutions 1-4, given in Table 1. The results corresponding to interpolation are presented first, followed by the results corresponding to extrapolation.

Interpolation scenarios
As can be seen from the bottom halves of Figures 9-12, the predictions of CoSTA-based HAM are qualitatively correct for all interpolation scenarios. DDM also provides qualitatively correct predictions for Solutions 1, 2 and 4 (cf. Figures 9, 10 and 12), but its final-time-level predictions for Solution 3 deviate significantly from the reference profiles (cf. Figure 11). Furthermore, the relative errors of the HAM predictions are consistently more than one order of magnitude lower than those of the DDM predictions, as shown in the top halves of Figures 9-12; the difference in accuracy is particularly striking for Solution 2 (cf. Figure 10). Since HAM and DDM both utilize the same DNN setup and training routine, the significant performance difference can only be explained by the utilization of PBM inside HAM. As discussed previously, the combined use of a PBM and a DNN in CoSTA-based HAM allows some relevant physics to be accounted for by the PBM, thereby making the learning task of the DNN easier than in the DDM case, where all relevant physics must be learnt by the DNN.

It is interesting to note that HAM clearly benefits from the PBM in spite of the PBM yielding low-quality predictions on its own; the PBM predictions are qualitatively incorrect for Solutions 2 and 3 (cf. Figures 10 and 11), and their errors are roughly one order of magnitude larger than the corresponding DDM errors for Solutions 1 and 4 (cf. Figures 9 and 12). Another interesting observation is that the DDM and HAM error curves are less smooth than those of PBM, suggesting that the DNNs inject some pseudo-random noise into the DDM and HAM predictions. This noise could possibly be reduced by informing the DNNs of the temporal correlation between data at subsequent time levels through their input (e.g., using two subsequent profiles as input rather than just a single profile), using a different DNN architecture (e.g., an LSTM architecture) and/or a different training regime (e.g., performing adversarial training with a temporal discriminator).

Extrapolation scenarios
Owing to its well-known generalizability, the PBM performs similarly in the extrapolation scenarios as it does in the interpolation scenarios. However, as the accuracy of the PBM was poor in the interpolation scenarios due to significant modeling error, the PBM results shown in Figures 13-16 are also generally poor. DDM performs better than PBM, providing qualitatively correct predictions for Solutions 2 and 4 (cf. Figures 14 and 16). The better performance of DDM in comparison to PBM can be attributed to the fact that DDM, though generally poor at generalizing, still learns about the nature of the unknown source term from the data. However, in terms of accuracy, HAM significantly outperforms DDM for both these manufactured solutions. HAM also outperforms DDM for Solution 1, where the DDM prediction for α = −0.5 is qualitatively incorrect while the HAM prediction is not (cf. Figure 13c).
Furthermore, HAM also outperforms both PBM and DDM for Solution 3 with α = 2.5 (cf. Figure 15b). The remaining scenario, Solution 3 with α = −0.5, proves to be the most difficult scenario considered, as all three models fail to provide qualitatively correct predictions for it (cf. Figure 15c). Thus, while CoSTA-based HAM generally performs well in our experiments, the results for Solution 3 with α = −0.5 show that significant improvements are still possible and desirable. In addition to ensuring temporal coherence, as previously discussed, utilizing problem-specific data augmentation techniques might be one way of boosting HAM's performance. Another possibility is to use a more accurate PBM as the basis for the full CoSTA model, which would result in more relevant physics being accounted for by the PBM, thereby simplifying the learning task of the DNN.

Conclusion and future work
This paper has introduced the Corrective Source Term Approach (CoSTA) to hybrid analysis and modeling (HAM). The method utilizes a deep neural network (DNN) to generate a corrective source term that augments the discretized governing equation of a physics-based model (PBM). The purpose of the corrective source term is to account for physics that is unresolved by the PBM. In a series of numerical experiments on one-dimensional heat diffusion problems, we have compared the performance of CoSTA-based HAM with that of the uncorrected PBM and that of a comparable data-driven model (DDM). The most important conclusions from the study are as follows:
• In the absence of any modeling error, i.e., when the full physics was known, the PBM was more accurate than the DDM. The PBM also generalized well to the extrapolation scenarios, while the DDM did not. On the other hand, HAM improved on the accuracy of the PBM in the interpolation scenarios, while still performing better than or very close to the PBM in the extrapolation scenarios. The accuracy improvements of HAM can be attributed to its ability to compensate for the numerical discretization error. This ability can be exploited to accelerate numerical solvers by enabling the use of coarse meshes while retaining accuracy, as unresolved subgrid-scale physics can be accounted for by the corrective source term. This feature of HAM will be highly useful in digital twin applications, where accurate real-time performance of the models is of utmost importance.
• In the presence of modeling error, i.e., when some relevant physics was unknown, the PBM failed to make reasonable predictions. The DDM generally outperformed the PBM in both interpolation and extrapolation scenarios, since the DDM could learn about the unknown physics from data observations. However, HAM consistently performed better than both PBM and DDM, often decreasing the relative error of the predictions by several orders of magnitude. The improvement over DDM was even more pronounced in the extrapolation scenarios; in other words, HAM was found to generalize better to new, unseen scenarios.
In the context of digital twin applications, it is foreseen that scenarios never witnessed before could occur from time to time, and HAM will be able to deal with such situations better than PBM and DDM.
The work presented herein clearly illustrates the great potential that can be unlocked by combining PBM and DDM. Notable examples of potential applications include improving the dynamic model of a system (like a vehicle or a process), improving the accuracy of building energy simulation models using regular measurements, modeling complex physical phenomena like turbulence whose full physical understanding is still limited, accelerating high-fidelity numerical solvers, and stabilizing ROMs based on significant dimensionality reduction.
One important benefit is that HAM aims to utilize existing knowledge to the greatest extent possible, such that, for any given problem, HAM requires less complex data-driven techniques than pure DDM. This can improve the predictability and interpretability of the DDM, thereby making the model more trustworthy. Another related benefit is that the DNN-generated corrective source term of CoSTA can be explained and analyzed within the framework of a PBM based on known first principles. This allows for an inbuilt sanity check for the data-driven part of the HAM model. For example, in the case of heat diffusion modeling, if there exist known bounds on the energy transferred to a system, then the DNN-generated source term must also be bounded, and any violation of these bounds can be automatically detected. The lack of such automatic DNN-misbehaviour detection in pure DDM is one of the reasons why the black-box nature of DNNs has inhibited the acceptance of data-driven techniques in high-stakes applications.

As for digital twins, CoSTA facilitates the use of coarse meshes in numerical solvers through its ability to correct both modeling errors and discretization errors, thereby improving computational efficiency while retaining accuracy. Furthermore, since the learning of the source term can continue even after model deployment, CoSTA allows for any new phenomena encountered by the digital twin to be learned automatically. Thus, CoSTA has the potential to yield models that are generalizable, trustworthy, accurate, computationally efficient and self-evolving. However, while our results are promising, more research is required with regards to the accuracy, stability and interpretability of the learned source term, so as to ensure that the findings of this paper generalize to other physical systems and processes. Making optimal choices for PBMs, as well as for DNN architectures and training regimes, is another interesting research direction.
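The sanity check described above can be sketched in a few lines. The function below flags misbehaviour when the net power implied by the corrective source term (approximated by a Riemann sum over the spatial grid) exceeds a known physical bound; the name `power_max` denotes a hypothetical, problem-specific bound on the energy transfer rate, not a quantity defined in the paper.

```python
import numpy as np

def source_term_within_bounds(sigma, dx, power_max):
    """Sanity check for a DNN-generated corrective source term.

    sigma: corrective source term sampled on a uniform 1D grid.
    dx: grid spacing.
    power_max: hypothetical known bound on the net energy transfer rate.
    Returns False when the source term violates the bound, which can be
    used to automatically flag DNN misbehaviour.
    """
    net_power = np.sum(sigma) * dx  # Riemann-sum approximation of the integral
    return abs(net_power) <= power_max
```

In a deployed CoSTA model, such a check could be run at every time step, with violations triggering a fallback to the uncorrected PBM or an alert to the operator.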
Some of the outlined improvements and extensions will be the topic of future research by the authors, but we hope that this paper will inspire other scientists to pursue the outlined research directions as well.