Multi-fidelity physics constrained neural networks for dynamical systems

Physics-constrained neural networks are commonly employed to enhance prediction robustness compared with purely data-driven models, achieved through the inclusion of physical constraint losses during the model training process. However, one of the major challenges of physics-constrained neural networks is their training complexity, especially for high-dimensional systems. In fact, conventional physics-constrained models rely on single-fidelity data, necessitating the assessment of physical constraints within high-dimensional fields, which introduces computational difficulties. Furthermore, due to the fixed input size of neural networks, employing multi-fidelity training data can also be cumbersome. In this paper, we propose the Multi-Scale Physics-Constrained Neural Network (MSPCNN), which offers a novel methodology for incorporating data with different levels of fidelity into a unified latent space through a customised multi-fidelity autoencoder. Additionally, multiple decoders are concurrently trained to map latent representations of inputs into physical spaces of various fidelities. As a result, during the training of predictive models, physical constraints can be evaluated within low-fidelity spaces, yielding a trade-off between training efficiency and accuracy. In addition, unlike conventional methods, MSPCNN also manages to employ multi-fidelity data to train the predictive model. We assess the performance of MSPCNN on two fluid dynamics problems, namely a two-dimensional Burgers' system and a shallow water system. Numerical results clearly demonstrate the enhancement of prediction accuracy and noise robustness when introducing physical constraints in low-fidelity fields. On the other hand, as expected, the training complexity can be significantly reduced by computing the physical constraint loss in the low-fidelity field rather than the high-fidelity one.


Introduction
Computational simulations of fluids and other complex physical systems have critical applications in engineering and the physical sciences, such as aerodynamics [1], heat transfer [2] and acoustics [3]. Historically, many of these systems have been effectively described using partial differential equations (PDEs). Traditional discretisation and solution approaches, such as the Finite Difference Method [4,5], the Finite Volume Method [6,7] and the Lattice Boltzmann Method [8,9], have proven reliable for achieving high-fidelity and high-accuracy results. However, their slow computational speed and demand for significant resources [10,11] make them less than ideal for real-time predictions in high-dimensional systems. When conducting simulations of transient smoke or pollutant transport within an enclosed space, such as a hotel lobby, conventional computational fluid dynamics (CFD) techniques can require a full day of computational time on a personal computer for just a 10-minute event [12].
Faced with the high computational demands of traditional fluid dynamics methods [13,14,15], researchers have increasingly turned to Reduced Order Modelling (ROM), encompassing deep learning (DL) and machine learning (ML) technologies [16,17]. Autoencoders (AE) and recurrent neural networks (RNN), such as Long Short-Term Memory (LSTM) networks [18], are especially important in this regard, as they efficiently compress data and predict its evolution in a latent space. For instance, Maulik et al. [19] employed a convolutional autoencoder (CAE) combined with an LSTM to address the shortcomings of proper orthogonal decomposition (POD) in capturing interactions during temporal evolution. Building on this, Nakamura et al. [20] introduced a CAE-LSTM model for high-dimensional turbulent channel flow systems. Meanwhile, Kim et al. [21] adopted a convolutional neural network (CNN) based generative model for parameterised fluid velocity fields, streamlining both fluid simulation and data compression. However, these purely data-driven methods face challenges, particularly in ensuring generalisation capability for new scenarios [22] and guaranteeing physically realistic outputs [23,24,25].
To address these issues, Physics-Constrained Neural Networks (PCNN) [26,27,28] improve model accuracy and generalisation ability by introducing physical constraint losses during the training process. A PCNN integrates physical constraints into the model, reducing dependency on large amounts of high-quality training data, guiding optimisation paths, improving generalisation error, and reducing prediction uncertainty [29,30]. For instance, Fu et al. [31] introduced a Physics-Data Combined Machine Learning (PDCML) approach that employs proper orthogonal decomposition (POD) and physical constraints to enhance parametric reduced-order modelling, particularly in limited-data contexts. Mohan et al. [32] proposed a CNN model that incorporates the incompressibility of a fluid flow and demonstrated its effectiveness. Karbasian et al. [33] developed a new approach for PDE-constrained optimisation of nonlinear systems that transforms the physical equations from physical space to non-physical space. In the prototype problem of fluid flow prediction, Erichson et al. [34] proposed a model that incorporates physical information constraints and maintains Lyapunov stability by training an AE, which not only improves generalisation error but also reduces prediction uncertainty.
Although incorporating physical constraints into machine learning offers numerous advantages over purely data-driven approaches, it comes with its own set of challenges. During the training of ROMs, the direct application of the physical laws is not straightforward, as the evolution transpires in latent space. The latent representations need to be decoded from the latent space back to the full physical space to evaluate these laws [35]. However, due to the fixed input size of ROMs, especially when inputs lie in a high-fidelity field, employing physical constraints consumes considerable computing resources. Therefore, if we can map the latent space derived from a high-fidelity field to a low-fidelity counterpart, the physical constraints can be applied within the low-fidelity space. By doing so, we unlock the potential to leverage the physical constraint losses at a low-fidelity level for model optimisation, effectively alleviating the computational burdens and complexities. Moreover, in real-world scenarios, we often encounter data of varying fidelities, which cannot be fully used due to the fixed neural network input size. Examples can be found in the field of meteorology [36,37,38]. The data is obtained from several sources, including ground stations, satellites, balloons, and aircraft, each offering information with varying degrees of accuracy and reliability. Ground stations provide data that is specific to a particular location, whereas satellites offer a wider coverage area but with a decreased level of detail [39]. As a result of limitations in model input size, it is hard to fully leverage all of the multi-fidelity data. Besides, low-fidelity data is easier and cheaper to obtain, while high-fidelity data is more resource-consuming [40]. If high-fidelity data and its low-fidelity counterpart can achieve the same latent representation, an anticipated method would efficiently leverage all levels of data fidelity for training, and guide and constrain high-fidelity modelling by low-fidelity physical constraints, ensuring a balance between computational efficiency and physical accuracy.
In recent years, multi-fidelity data has been harnessed primarily for several central purposes. Firstly, a surrogate model may be employed to integrate models trained on data of varying fidelity, aiming to construct a comprehensive model that captures the accuracy of high-fidelity data and the computational efficiency of low-fidelity data. Xiong et al. [41] proposed a model fusion technique based on Bayesian-Gaussian process modelling to develop cost-effective surrogate models, integrating data from both high-fidelity and low-fidelity sources and quantifying the surrogate model's interpolation uncertainty. Secondly, low-fidelity data can be used to estimate or generate high-fidelity data, hence circumventing the computational expenses associated with directly obtaining high-fidelity data through simulations. Geneva et al. [42] provide a multi-fidelity deep generative model specifically developed for high-fidelity surrogate modelling of turbulent flow fields utilising data obtained from a low-fidelity solver. In addition, multi-fidelity data is used to fine-tune the varying parameters in multi-scale PDEs to enhance predictive accuracy. Park et al. [43] proposed an approach that adopts a physics-informed neural network leveraging a priori knowledge of the underlying homogenised equations to estimate model parameters based on multi-scale solution data. Finally, there is an emerging practice of utilising low-fidelity data as an additional resource to improve the effectiveness of high-fidelity models. Romor et al. [44] constructed a low-fidelity response surface based on gradient-based reduction, which facilitates the updating of the nonlinear autoregressive multi-fidelity Gaussian process. However, to the best of the authors' knowledge, there is no existing model or method that can leverage physical constraints in a low-fidelity field to both alleviate computational burdens and ensure prediction accuracy.
In response to the above challenges, we introduce a deep learning method designed for multi-scale physical constraints, termed the Multi-Scale Physics-Constrained Neural Network (MSPCNN). Our methodology involves employing two distinct AE models tailored for high- and low-fidelity data, respectively. The first AE is trained exclusively on the high-fidelity data. For the second AE, we separately train its encoder on low-fidelity data to map it into the same latent space as the first AE, and its decoder to reconstruct the low-fidelity data from latent representations derived from high-fidelity counterparts. Subsequently, we formulate an LSTM model embedded with physical constraints that takes the latent representations obtained by the AEs as input and uncovers the evolution laws of the physical system within the latent space. During the training of the LSTM, besides basic metrics such as the MSE, the compressed data is decoded to the low-fidelity field, enabling the computation of the physical constraint loss that guides model refinement. Additionally, because the LSTM accepts latent representations as input, which can be derived from data of various fidelities, the low-fidelity data can contribute to the training of high-fidelity surrogate models, considerably curbing their computational demands [45]. In our study, we selected two numerical tests, a two-dimensional Burgers' system and a Shallow Water system. Both of these cases are frequently employed as benchmarks in scientific machine learning [46,47,48]. Specifically, the Burgers' system is characterised by its relative simplicity and its ability to depict two-dimensional variations in viscous fluids. Conversely, the Shallow Water system captures the two-dimensional horizontal dynamics of a body of water. Moreover, the Shallow Water equations encompass several temporal and spatial scales, rendering them well-suited for the validation of multi-scale models like MSPCNN.
In summary, we make the following contributions in this study:
1. We propose a novel physics-constrained machine learning model, named MSPCNN. It innovatively leverages physical constraints in a low-fidelity field for the training of high-fidelity models, striking a balance between computational efficiency and physical accuracy.
2. By integrating and unifying data of varying fidelity, multi-fidelity data can be used to train MSPCNN. This integration also ensures that the trained models can be flexibly adapted to yield results across different fidelity levels.
The rest of this paper is organised as follows. In Section 2, we introduce the state-of-the-art PCNNs for high-dimensional dynamical systems. Section 3 presents the structure of MSPCNN and details its training methodology. Two numerical experiments, specifically a two-dimensional Burgers' system and a Shallow Water system, are discussed in Section 4 and Section 5, respectively. Finally, we conclude and summarise our findings in Section 6.

Physics constrained reduced order modelling: state of the art
This section focuses on the structure of state-of-the-art PCNNs for high-dimensional dynamical systems. These models combine reduced-order modelling (AE), surrogate models based on recurrent neural networks (LSTM), and the incorporation of physical constraints; they are integrated as shown in Fig. 2 [32,40].

Reduced Order Modelling: AE
An AE is a specialised form of neural network designed to reduce the dimensionality of input data while preserving its key features.
AE operates through an encoder-decoder architecture, as shown in the Encoder-Decoder Training part of Fig. 2. The encoder F_e compresses the input data x_t = [x_1, x_2, ..., x_n] ∈ R^n at time t by applying hidden layers and down-sampling, capturing essential features in a compressed latent representation:

η_t = F_e(x_t).

In contrast, the decoder F_d reconstructs the state vector x_t^r = [x_1^r, x_2^r, ..., x_n^r] ∈ R^n from this latent form η_t, employing up-sampling and hidden layers:

x_t^r = F_d(η_t).

The encoder and decoder are trained jointly. The training objective is to minimise the reconstruction error, i.e., the mismatch between the original input and the decoded output. For instance, if we employ the MSE as our loss function J(·):

J(θ_{F_e}, θ_{F_d}) = (1/N_step) Σ_{t=1}^{N_step} ∥x_t − F_d(F_e(x_t))∥_2^2,

where θ_{F_e} and θ_{F_d} are the parameters of the encoder and decoder, {x_1, x_2, ..., x_{N_step}} represents the total evolution process from the initial state to the final state, N_step is the total number of time steps (i.e., training samples), and ∥·∥_2 denotes the Euclidean norm.
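As an illustration, the encoder-decoder pair and its reconstruction loss can be sketched in PyTorch. The convolutional layer sizes, the 32×32 input grid, and the latent dimension are illustrative assumptions, not the architecture used in the paper:

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    # Minimal convolutional autoencoder sketch (hypothetical sizes).
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 8 * 8),
            nn.Unflatten(1, (16, 8, 8)),
            nn.ConvTranspose2d(16, 8, 2, stride=2),    # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 2, stride=2),     # 16 -> 32
        )

    def forward(self, x):
        eta = self.encoder(x)                          # latent eta_t
        return self.decoder(eta), eta                  # reconstruction x_t^r

x = torch.randn(4, 1, 32, 32)                          # batch of snapshots x_t
model = CAE()
x_rec, eta = model(x)
loss = nn.functional.mse_loss(x_rec, x)                # reconstruction loss J
```

Training would then minimise `loss` over all N_step snapshots with a standard optimiser such as Adam.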

RNN-based Surrogate Model: LSTM
After processing the original data x_t through the AE, the compressed data η_t in the latent space is obtained. As the next step, it is crucial to understand the dynamics and evolution patterns within these latent representations to make accurate predictions. Since our aim is to predict the physical system's behaviour in the long term, it is essential to choose a model that can efficiently capture temporal dependencies spanning lengthy sequences. In light of this, researchers have opted for LSTM networks [49]. Unlike traditional RNNs, which often struggle with the vanishing gradient problem [18], LSTMs are specifically designed to remember long-range dependencies in sequential data, making them an optimal choice for our requirements. LSTM also delivers a way to perform sequence-to-sequence (seq2seq) prediction (the LSTM accepts k_in time steps as input and gives k_out time steps as output), which can decrease the online computation time and, more importantly, reduce the accumulated prediction error. For a time series encoding latent representations [η_1, η_2, ..., η_{N_step}], LSTMs can be trained by shifting the starting time step:

η̃_{t+k_in : t+k_in+k_out−1} = LSTM(η_{t : t+k_in−1}),

where η̃_t is the predicted result. During the training phase, various loss functions, such as MSE or mean absolute error (MAE), can be employed to quantify the difference between the predicted latent representations and the true latent representations.
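A minimal seq2seq latent predictor in this spirit might look as follows; the hidden size and the linear read-out head that emits all k_out steps at once are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class LatentLSTM(nn.Module):
    # Sketch: read k_in latent steps, predict k_out latent steps.
    def __init__(self, latent_dim=16, hidden=32, k_out=3):
        super().__init__()
        self.k_out, self.latent_dim = k_out, latent_dim
        self.lstm = nn.LSTM(latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim * k_out)

    def forward(self, eta_seq):                 # (batch, k_in, latent_dim)
        _, (h, _) = self.lstm(eta_seq)          # final hidden state
        out = self.head(h[-1])                  # (batch, latent_dim * k_out)
        return out.view(-1, self.k_out, self.latent_dim)

model = LatentLSTM()
eta_in = torch.randn(4, 3, 16)                  # k_in = 3 latent snapshots
eta_pred = model(eta_in)                        # k_out = 3 predicted snapshots
loss = nn.functional.mse_loss(eta_pred, torch.randn(4, 3, 16))
```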
When making predictions, we employ the LSTM in a recurrent fashion to achieve long-time forecasting, as presented in Fig. 2 and Eq. 4:

η̃_{t+k_in : t+k_in+k_out−1} = LSTM(η_{t : t+k_in−1}),
η̃_{t+k_in+k_out : t+k_in+2k_out−1} = LSTM(η̃_{t+k_in : t+k_in+k_out−1}),
...

i.e., each predicted output sequence is fed back as the next input sequence.
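The recurrent forecasting loop itself is model-agnostic and can be sketched as follows; here a toy shift function stands in for the trained LSTM, and we assume k_in = k_out as in the experiments below:

```python
import numpy as np

def recurrent_forecast(predict, eta_init, n_cycles):
    # Circular seq2seq forecasting: each k_out-step output is fed back
    # as the next k_in-step input (Eq. 4).
    seq, chunks = eta_init, []
    for _ in range(n_cycles):
        seq = predict(seq)                 # (k, latent_dim) -> (k, latent_dim)
        chunks.append(seq)
    return np.concatenate(chunks, axis=0)

shift = lambda s: s + 1.0                  # toy stand-in for the trained LSTM
traj = recurrent_forecast(shift, np.zeros((3, 2)), n_cycles=4)
# traj stacks 4 cycles of 3 predicted steps each -> shape (12, 2)
```

Note that prediction errors made in one cycle are carried into the next input, which is why the accumulated error grows with the number of cycles.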

Physical Constraints
As pointed out by [50], reducing the accumulated prediction error becomes especially critical when we use recurrent forecasting to achieve long-time predictions.
The adoption of physical constraints helps to enhance the accuracy and reliability of predictions, making them an important tool for optimising long-time forecasts [51]. Specifically, ML or DL models can integrate physical constraints by establishing learning biases, which are enforced during the learning process by imposing suitable penalties. Traditionally, the physical constraints can only be applied in the full physical space. Therefore, the latent representations need to be decoded to physical space to evaluate the physical loss during the training procedure, as shown in the predictive model training part of Fig. 2. In a seq2seq prediction model, the composite physics-constrained loss function for a single prediction step, J (referred to as Specific Loss in Fig. 2), is given by:

J = l_data(η̃_{t+k_in : t+k_in+k_out−1}, η_{t+k_in : t+k_in+k_out−1}) + Σ_{i=1}^{c} α_i l_physics,i,

where [η_{t : t+k_in−1}] is the sequence input of the LSTM and [η_{t+k_in : t+k_in+k_out−1}] is the sequence output of the LSTM, l_data denotes the loss function used to measure the discrepancy between the predicted and true latent representations, l_physics,i represents the i-th physics-based regularisation term, c is the number of physical constraints applied, and α_i is the associated coefficient. In our practice, the coefficients are determined using Optuna, a hyperparameter optimisation framework, in which values are randomly selected within specified ranges in each iteration to identify optimal parameters efficiently and refine model performance.
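Schematically, the composite loss of Eq. 5 reduces to a weighted sum; a minimal sketch follows, in which the residual values and coefficients are arbitrary illustrations:

```python
import numpy as np

def composite_loss(eta_pred, eta_true, physics_residuals, alphas):
    """Data loss plus weighted physics penalties (Eq. 5, schematically).

    physics_residuals: per-constraint penalty values, already evaluated
    in the decoded physical space.  alphas: their coefficients alpha_i.
    """
    l_data = np.mean((eta_pred - eta_true) ** 2)
    l_phys = sum(a * r for a, r in zip(alphas, physics_residuals))
    return l_data + l_phys

eta_pred = np.array([1.0, 2.0, 3.0])
eta_true = np.array([1.0, 2.0, 4.0])
# e.g. one energy penalty and one flow-operator penalty
loss = composite_loss(eta_pred, eta_true, [0.6, 0.3], alphas=[0.1, 0.2])
```

In practice the coefficients would be tuned (e.g. via Optuna, as described above) rather than fixed by hand.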
Here we introduce two physical constraints, energy conservation and flow operator.

Energy Conservation
Energy conservation is a crucial physical constraint in many applications of physical models, such as flow simulations [52] and heat transfer simulations [53]. This principle dictates that the total energy in a system remains unchanged over time, especially in isolated scenarios where no external forces or energy transfers are present. Therefore, in a data-driven model, the constraint of energy conservation can be integrated into the loss function by defining an appropriate energy conservation regularisation term [54]. We thus define an energy conservation loss function l_energy to measure the gap between the energy of the output data E_out and that of the input data E_in, and add this loss term with a coefficient to the total loss function, as demonstrated in Eq. 5. For a single prediction step, we get:

l_energy = |E_out − E_in|,

where E denotes the function used to compute the total energy, consisting of both potential and kinetic energy, and |·| represents the absolute value.
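A sketch of such a penalty follows, assuming hypothetical velocity components u, v and a height field h so that kinetic and potential contributions can be summed over the grid; the concrete energy function depends on the system at hand:

```python
import numpy as np

def total_energy(u, v, h, g=9.81):
    # Hypothetical total energy of a velocity/height field:
    # kinetic (u^2 + v^2)/2 plus potential g*h, summed over the grid.
    return np.sum(0.5 * (u**2 + v**2) + g * h)

def l_energy(fields_in, fields_out):
    # |E_out - E_in|: penalise any drift in total energy (Eq. 6).
    return abs(total_energy(*fields_out) - total_energy(*fields_in))

u = np.ones((8, 8)); v = np.zeros((8, 8)); h = np.ones((8, 8))
penalty_same = l_energy((u, v, h), (u, v, h))        # 0: energy conserved
penalty_diff = l_energy((u, v, h), (2 * u, v, h))    # > 0: energy drifted
```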

Flow Operator
Flow operators [51], denoted as f, commonly appear in fluid mechanics problems, such as the shallow water equations [55] and the Burgers' equation. In such problems, flow operators can be used to describe the change of properties of the fluid, such as the velocity and pressure fields, over time. In our work, we have adopted a seq2seq prediction framework that simultaneously predicts consecutive time steps, simulating the temporal evolution of fluid behaviour. We expect the relationships between the multiple time steps within a single output to adhere to the underlying flow operator. Therefore, we apply this operator to the last element of the input sequence η_{t+k_in−1} (the single prediction step is demonstrated in Eq. 5), calculating the sequence output that would be derived from solving the associated PDE. The deviation between this physically-driven output and the model's prediction is then incorporated into the loss term l_flow, where x^fp is the flow prediction data. Our model thereby ensures both physical consistency and alignment of its predictions with the underlying physics described by the PDE. When considering physical constraints, it is necessary to decode the hidden representations back into physical space, where the physical laws are applicable, as indicated by Eq. 6 and Eq. 7.
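A minimal sketch of the idea follows, with a 1-D periodic diffusion update standing in for the discretised flow operator f; the paper's operator would be the PDE scheme of the system at hand:

```python
import numpy as np

def flow_step(x, dt=0.1):
    # Stand-in flow operator f: one explicit step of a periodic 1-D
    # diffusion update (not the paper's actual PDE operator).
    return x + dt * (np.roll(x, 1) - 2 * x + np.roll(x, -1))

def l_flow(x_last_in, x_pred_seq, dt=0.1):
    # March f forward from the last input frame and penalise deviation
    # of each predicted frame from this physics-driven rollout (Eq. 7).
    loss, x = 0.0, x_last_in
    for x_pred in x_pred_seq:
        x = flow_step(x, dt)                   # flow prediction x^fp
        loss += np.mean((x_pred - x) ** 2)
    return loss / len(x_pred_seq)

x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False))
consistent = [flow_step(x0), flow_step(flow_step(x0))]
penalty = l_flow(x0, consistent)               # ~0: rollout matches f
```

Predictions consistent with the operator incur no penalty, while any deviation from the physics-driven rollout is penalised.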
In this process, due to the high dimension of the original data, the implementation of physical constraints requires a substantial amount of computational resources [56,57]. If an interaction between high-fidelity and low-fidelity data were established, the physical constraints could be applied in the low-fidelity physical space, which would markedly decrease the cost of utilising physical constraints. The establishment of such an interaction presents the potential to unlock substantial efficiency improvements in computational modelling. With this motivation, we present our methodology in the subsequent sections, which aims to establish a connection between high-fidelity and low-fidelity data while leveraging the advantages offered by each domain.

Multi-Scale Physics-Constrained Neural Network
Now, we introduce our newly proposed MSPCNN in detail. To clarify the main innovative design of MSPCNN, its flowchart is shown in Fig. 3. It can be seen that the main differences between MSPCNN and PCNN are the training process of the CAEs and the implementation of physical constraints.

Multi-Fidelity CAE
Conventional models commonly employ a CAE to handle a single level of data fidelity. This paper presents a multi-fidelity CAE architecture, as demonstrated in the Encoder-Decoder Training part of Fig. 3, that comprises two separate CAEs, each specifically tailored to process high-fidelity or low-fidelity input, respectively. The fundamental aspect of this design lies in the fact that, despite the distinct levels of fidelity at which the two CAEs operate, they both transform data into a latent space that is shared between them. Consequently, this shared latent space enables identical representations for high- and low-fidelity data of the same phenomenon.
Explicitly, a CAE is first developed specifically for handling the high-fidelity data. In this context, the encoder F_{h,e} is responsible for compressing the original high-fidelity data x_{h,t} into the latent space, resulting in the latent representation η_t. Afterwards, the decoder F_{h,d} employs the latent representation to recover the initial data, resulting in x^r_{h,t}. To train this CAE, a loss function J(θ_{F_{h,e}}, θ_{F_{h,d}}) based on the MSE is employed. The objective of this loss function is to minimise the discrepancy between the reconstructed data and the original data, as seen in Eq. 8.
The peculiarity of the CAE for the low-fidelity data lies in its objective to align with the latent space of the CAE for the high-fidelity data. In other words, these CAEs compress data of different levels of fidelity into a shared latent space. To achieve this objective, the training process initially focuses solely on the encoder F_{l,e}, which is responsible for compressing the low-fidelity data x_{l,t} into the latent space obtained using the high-fidelity data. The loss function is distinctive in that it strives to minimise the discrepancy between the low-fidelity data representation in the latent space and the corresponding representation of the high-fidelity data.
Subsequently, the decoder F_{l,d} is trained separately for the low-fidelity data. The objective is to restore the low-fidelity data from the shared latent space. Once again, the MSE is utilised to minimise the discrepancy between the reconstructed data and the original data, as demonstrated in Eq. 9.
Encoder training: η_{l,t} = F_{l,e}(x_{l,t}), with loss J(θ_{F_{l,e}}) = (1/N_step) Σ_{t=1}^{N_step} ∥F_{l,e}(x_{l,t}) − η_t∥_2^2, where η_t = F_{h,e}(x_{h,t}) is the shared latent target produced by the high-fidelity encoder. The algorithm of the multi-fidelity CAE is summarised in Algorithm 1. In summary, the approach commences by training the first CAE using high-fidelity data. Subsequently, the encoder of the second CAE is trained using low-fidelity data, while its decoder is trained using high-fidelity data, which is first encoded into the shared latent space through the high-fidelity encoder.
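The three training stages of the multi-fidelity CAE can be sketched as follows; the linear encoders/decoders and field sizes are stand-ins for the paper's convolutional networks, chosen only to make the staging explicit:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the four networks; the paper's encoders and
# decoders are convolutional, but linear maps suffice to show the stages.
latent_dim = 8
F_he, F_hd = nn.Linear(64, latent_dim), nn.Linear(latent_dim, 64)  # high-fidelity CAE
F_le, F_ld = nn.Linear(16, latent_dim), nn.Linear(latent_dim, 16)  # low-fidelity CAE

x_h = torch.randn(32, 64)   # paired high-fidelity snapshots
x_l = torch.randn(32, 16)   # ... and their low-fidelity counterparts
mse = nn.functional.mse_loss

# Stage 1: train the high-fidelity CAE on reconstruction alone.
loss_h = mse(F_hd(F_he(x_h)), x_h)

# Stage 2: freeze eta_t = F_he(x_h) as the shared-latent target; train the
# low-fidelity encoder to land on it, and the low-fidelity decoder to
# reconstruct x_l from it (Algorithm 1).
eta_t = F_he(x_h).detach()
loss_le = mse(F_le(x_l), eta_t)    # encoder alignment loss
loss_ld = mse(F_ld(eta_t), x_l)    # decoder reconstruction loss
```

Each loss would be minimised in its own training loop, in the order shown, so that the shared latent space is fixed before the low-fidelity networks are fitted to it.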

LSTM in the shared latent space
The LSTM plays a pivotal role in processing the sequential data mapped into the fixed latent space by the two CAEs trained in the previous stage, serving as the primary structure for predicting evolution. When applying physical constraints, the latent representation outputs are decoded to low-fidelity predictions via the low-fidelity decoder. This allows the physical constraint errors to be evaluated at the low-fidelity level, as shown in Eq. 10:

J = l_data + α_1 l_energy + α_2 l_flow,

where α_1 and α_2 are the associated coefficients of l_energy and l_flow.
For the energy conservation regularisation, the low-fidelity constraint is derived from Eq. 6 as manifested in Eq. 11:

l_energy = |E(F_{l,d}(η̃_out)) − E(F_{l,d}(η_in))|,

where η̃_out and η_in denote the predicted output and the input latent sequences, respectively. Furthermore, for the flow operator regularisation, the low-fidelity constraint can be obtained from Eq. 7 as manifested in Eq. 12:

l_flow = l(F_{l,d}(η̃_out), x^fp_l),

where f_l represents the flow operator in the low-fidelity field and x^fp_l is the flow prediction data in the low-fidelity field. The training process of the LSTM is summarised in Algorithm 2.
Additionally, the output from the predictive model (LSTM) remains in the form of latent representations. To obtain the final predictions in the full physical space, these representations must be passed through a decoder, as illustrated in Fig. 3. The overall loss of the LSTM combines the latent-space data loss with the low-fidelity constraint terms of Eq. 11 and Eq. 12, weighted by their coefficients as in Eq. 10.
Overall, compared with the PCNN, central to our proposed method is the strategic use of a shared latent space, achieved by leveraging the multi-fidelity CAE. This shared latent space is essential as it facilitates the smooth mapping of data across different fidelities. In other words, data of various fidelities can be mapped to the same latent representation by their respective encoders, and the compressed data can be decoded into either a low-fidelity or high-fidelity space as desired. With such a characteristic, the predictive model can leverage both high- and low-fidelity data for training simultaneously, and the physical constraints can be applied at the low-fidelity level for high-fidelity surrogate model training. By applying physical constraints at the low-fidelity level, significant training costs can be saved compared to imposing them at the high-fidelity level. Furthermore, MSPCNN keeps the LSTM's structure intact throughout the optimisation process, ensuring that the online prediction phase remains computationally efficient and aligned with conventional predictive models in terms of resource usage.
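A sketch of one such training step follows, with random tensors standing in for the LSTM output, a linear map for the low-fidelity decoder, and a toy quadratic "energy" as the constraint; all names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Stand-ins: random latent output for the trained LSTM, a linear map for
# the low-fidelity decoder F_ld, and a toy quadratic "energy".
latent_dim, low_dim = 8, 16
eta_pred = torch.randn(4, 3, latent_dim)       # LSTM output (batch, k_out, latent)
eta_true = torch.randn(4, 3, latent_dim)
F_ld = nn.Linear(latent_dim, low_dim)          # low-fidelity decoder stand-in

x_low = F_ld(eta_pred)                          # decode ONLY to low fidelity
# Constraint evaluated on the cheap low-fidelity field: energy drift
# between the first and last predicted frames (cf. Eq. 11).
energy = lambda frame: 0.5 * frame.pow(2).sum(dim=-1)
l_energy = (energy(x_low[:, -1]) - energy(x_low[:, 0])).abs().mean()

# Total loss: latent-space data loss plus weighted low-fidelity penalty.
alpha_1 = 0.1
l_data = nn.functional.mse_loss(eta_pred, eta_true)
loss = l_data + alpha_1 * l_energy
```

The key point is that the high-fidelity decoder is never invoked during constraint evaluation, which is where the training cost saving comes from.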

Numerical example: Burgers' Equation
Burgers' equation is a fundamental PDE occurring in various areas, such as fluid mechanics, nonlinear acoustics, and gas dynamics. The numerical results for the Burgers' system in this paper are derived by solving the equations using spatial discretisation with backward and central difference schemes for the convection and diffusion terms, respectively, and time integration using the Euler method. In our evaluation of the MSPCNN, we employ high-fidelity and low-fidelity simulations of the 2D Burgers' equation problem. Both simulations, albeit at different resolutions, depict the same physical phenomenon, with time appropriately scaled for consistency. The domain for the high-fidelity simulation is defined as a 129×129 grid, while it is 33×33 for the low-fidelity simulation. The boundaries of these squares are configured with Dirichlet boundary conditions. The viscosity is 0.01 N·s·m⁻² and the initial velocity ranges from 1.5 m·s⁻¹ to 5 m·s⁻¹. The equations are presented as:

∂u/∂t + u ∂u/∂x + v ∂u/∂y = (1/Re)(∂²u/∂x² + ∂²u/∂y²),
∂v/∂t + u ∂v/∂x + v ∂v/∂y = (1/Re)(∂²v/∂x² + ∂²v/∂y²),

where u and v represent the velocity components, t is time, and x and y represent the coordinate system. Re is the Reynolds number, which can be calculated by Re = VL/υ, where V is the flow speed, specified as the initial velocity, L is the characteristic linear dimension, and υ is the viscosity.
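For reference, one explicit Euler step of this scheme (backward differences for convection, central differences for diffusion, Dirichlet edge values left untouched) can be sketched as follows; the domain length, time step and initial square pulse are illustrative assumptions:

```python
import numpy as np

def burgers_step(u, v, dt, dx, dy, Re):
    # One explicit Euler step of the 2-D Burgers' system: backward
    # differences for convection, central differences for diffusion;
    # edge values (Dirichlet boundaries) are left untouched.
    un, vn = u.copy(), v.copy()
    c = (slice(1, -1), slice(1, -1))           # interior points
    for f, fn in ((u, un), (v, vn)):
        conv = (un[c] * (fn[1:-1, 1:-1] - fn[1:-1, :-2]) / dx
                + vn[c] * (fn[1:-1, 1:-1] - fn[:-2, 1:-1]) / dy)
        diff = ((fn[1:-1, 2:] - 2.0 * fn[1:-1, 1:-1] + fn[1:-1, :-2]) / dx**2
                + (fn[2:, 1:-1] - 2.0 * fn[1:-1, 1:-1] + fn[:-2, 1:-1]) / dy**2)
        f[c] = fn[c] + dt * (diff / Re - conv)
    return u, v

n = 33                                          # low-fidelity grid
dx = dy = 2.0 / (n - 1)                         # assumed unit-square-like domain
u = np.ones((n, n)); v = np.ones((n, n))
u[8:16, 8:16] = 2.0; v[8:16, 8:16] = 2.0        # illustrative initial pulse
u, v = burgers_step(u, v, dt=1e-4, dx=dx, dy=dy, Re=100.0)
```

Repeating this step marches the system forward in time; the same routine at 129×129 resolution would produce the high-fidelity counterpart.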
Specifically, we use the recurrent prediction method, as shown in Eq. 4, with k_in = k_out = 3, to predict the Burgers' equation. To explore the model's performance and the impact of various constraints in depth, we designed the following sets of controlled experiments:
(a) High-fidelity constraint: we use a single physical constraint, either energy conservation (EC) or the flow operator (FO), and apply it only in the high-fidelity field to explore its effect.
(b) Low-fidelity constraint: under the same physical constraints, we apply the physical constraint in the low-fidelity field to constrain high-fidelity surrogate models.

Effect of multiple physical constraints:
(a) High-fidelity multiple constraints: we use multiple physical constraints, including energy conservation (EC) and the flow operator (FO), and apply them in the high-fidelity field to explore their effect.
(b) Low-fidelity multiple constraints: under the same physical constraints, we apply the multiple physical constraints in the low-fidelity field to constrain high-fidelity surrogate models, and compare the effect with multiple physical constraints in the high-fidelity field.
These experiments aim to gain insight into the role and performance of low-fidelity data in model training and constraints.

Validation of Multi-Fidelity CAE in Burgers' Equation
First, we showcase the efficacy of our multi-fidelity CAE in efficiently handling both high-fidelity and low-fidelity Burgers' equation data. Fig. 4 underscores the adeptness of our multi-fidelity CAE in transforming data between fidelity levels. The first two rows in Fig. 4 illustrate a comparison between the original high-fidelity data and its reconstructed version derived from low-fidelity data. Similarly, the third and fourth rows display the original low-fidelity data alongside its reconstructed version obtained from high-fidelity data. The reconstructions exhibit high precision, laying a solid foundation for subsequent utilisation. These findings demonstrate that the shared latent space is capable of capturing high-fidelity field details while encoding low-fidelity data.

Training on high-fidelity data versus multi-fidelity data
As illustrated in Fig. 5, we compare a pure LSTM model using 300 high-fidelity samples against one trained with an additional 300 low-fidelity samples using the multi-scale encoder and decoder explained in Section 3. The difference in Fig. 5 is calculated at each point as the absolute value of the direct subtraction of the predicted value from the actual value, which represents the absolute error at each point. Turning to Fig. 6, this graph details how the MSE and standard deviation change cumulatively as the time step increases. From Fig. 6, we can clearly see that supplementing with low-fidelity data brings a significant improvement in prediction accuracy while reducing the uncertainties represented by the transparent zones. It is important to note that our model employs a seq2seq approach for computations, meaning the output is a sequence. However, when calculating the loss and standard deviation (std), we disaggregate this sequence, comparing each time step individually with the ground truth. For the loss and std, we compute the mean squared error for each predicted time step and then calculate the std across all cycles, reflecting model performance variability over time. This method is consistently applied across all performance figures and encompasses the entire test dataset. However, in light of the statistical results, the predictions in Fig. 5 show an opposite trend in the error maps. Our observations suggest that utilising multiple datasets centralises the errors, which results in the amplification of the error peak. This phenomenon is further analysed in subsequent sections.

Effects of a Single Physical Constraint on the Model
In Fig. 7, we showcase the predictions of the MSPCNN and PCNN with the energy conservation constraint applied in the low-fidelity (LF-EC) and high-fidelity (HF-EC) fields, respectively, compared with the basic LSTM, and highlight the difference from the ground truth. Furthermore, Fig. 8 shows the performance of these three models in long-time prediction. Compared to the basic LSTM approach, these results show that both HF-EC and LF-EC can significantly reduce the MSE and the range of standard deviations, visibly evident from the shaded part in Fig. 8, underscoring that physical constraints not only diminish prediction error but also augment the model's robustness when applied during training. Referring to Table 1, when applying the energy conservation constraint in the high-fidelity field, the MSE is reduced by nearly 85% compared to the basic LSTM model, while the low-fidelity model demonstrates an improvement of 52% relative to the basic model. However, by leveraging the energy conservation constraint in the low-fidelity field, our model can achieve around 60% of the high-fidelity model's performance with only 50% of its training time.
Transitioning to Fig. 9, the prediction performances of the MSPCNN and PCNN under the low-fidelity (LF-FO) and high-fidelity (HF-FO) flow operator constraints, and their deviations from the ground truth, are showcased respectively. Fig. 10 and Table 1 complement this with the cumulative trend of the performance metrics and the training time. Upon implementing the flow operator constraint, the MSE for LF-FO is reduced by approximately 66% compared with the basic LSTM, while HF-FO sees an even more substantial reduction. It is worth noting that, comparing Fig. 7 and Fig. 9 with Fig. 8 and Fig. 10, the predictions under high-fidelity physical constraints demonstrate a higher error peak despite a lower overall MSE, which also appears in Section 4.2. To clarify this point, we plot the histogram of prediction errors for the last step of the recurrent prediction (Fig. 11). From Fig. 11, we observe that while the upper bound of the error (i.e., the maximum error) does increase when high-fidelity data is introduced, the frequency of low errors increases accordingly, leading to a reduction in the overall MSE. In contrast, the low-fidelity constraint strategy demonstrates superior performance in this respect. As illustrated in Fig. 11, applying physical constraints in the low-fidelity field via MSPCNN not only increases the proportion of low errors but also does not amplify the error peak. Compared to the basic LSTM model, introducing either the energy conservation constraint or the flow operator constraint in the low-fidelity field lowers the upper bound of errors from 0.175 to around 0.11. Furthermore, the histogram reveals that the number of errors within the 0-0.1 range is greater than for the basic model.
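The flow operator constraint compares the network prediction against the state x_fp obtained by advancing the input with the numerical flow operator f. A hedged sketch, with f supplied externally and a mean-squared mismatch as the penalty (the naming and exact norm are assumptions consistent with the nomenclature):

```python
def flow_operator_loss(x_pred, x_fp, alpha_fo):
    """l_flow: mean squared mismatch between the network prediction x_pred
    and x_fp = f(x_t), the state advanced by the numerical flow operator,
    weighted by the constraint coefficient alpha_fo."""
    n = len(x_pred)
    return alpha_fo * sum((a - b) ** 2 for a, b in zip(x_pred, x_fp)) / n
```

Evaluating f on the 32×32 low-fidelity field rather than the 64×64 high-fidelity one is what yields the training-time savings reported for LF-FO in Table 1.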
The amplification of the error peak in the Burgers' system can be attributed to several factors. Although the equation describes a relatively simple process, during backpropagation the model tends to prioritise the surrounding regions of the Burgers' system because of their similar physical characteristics. This dominance causes the model to focus overly on the surrounding regions, often neglecting the central evolution area and leading to increased errors there. For example, when the flow operator is used as a physical constraint, a moderate error in the central evolution area does not strongly affect the evolution of the whole domain, because the velocity in that area is itself relatively large. In the surrounding regions, however, which are characterised by consistently low and stable velocities, a substantial error can initiate a propagating disturbance and cause the prediction to deviate substantially from the ground truth across the whole surrounding region. The model's backpropagation is therefore more accurate in these surrounding regions, yielding lower errors there, while the error increases in the central regions. Additionally, as seen in Table 1 and Figs. 5, 7 and 9, this phenomenon is alleviated as the overall error decreases: when the error diminishes, the accuracy of predictions in the central region improves.

Effect of Multiple Physical Constraints
The application of a single physical constraint has been shown to improve the long-time predictive accuracy of the model. To explore the effect of applying multiple physical constraints, we further build models incorporating both the energy conservation and flow operator constraints, and test them in the high-fidelity (HF-MulCons) and low-fidelity (LF-MulCons) fields. The corresponding prediction results are shown in Fig. 12, while Fig. 13 details the cumulative change of MSE and standard deviation with increasing time steps. Notably, comparing Fig. 13 with Figs. 8 and 10, it becomes evident that with multiple physical constraints the disparity between the low-fidelity and high-fidelity models is markedly smaller than in the single-constraint scenarios. Table 1 provides further statistical details. The LF-MulCons model reaches about 80% of the HF-MulCons model's accuracy while requiring only 33.5% of its training time per epoch. Comparing LF-MulCons to LF-EC and LF-FO shows that LF-MulCons delivers superior MSE performance with only a slight rise in computational requirements. This finding shows that our model can provide a compromise between accuracy and computational demand in multi-constraint scenarios.
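The multi-constraint objective above amounts to a weighted sum of the data loss and the two physical losses, with coefficients α_EC and α_FO (tuned values listed in Table 1). An illustrative sketch under the same assumed energy proxy and mean-squared penalties as before:

```python
def total_loss(x_pred, x_true, x_in, x_fp, alpha_ec, alpha_fo):
    """Multi-constraint objective: data MSE plus weighted physical losses.
    x_fp is the state advanced by the numerical flow operator f (assumed given)."""
    n = len(x_pred)
    data = sum((p - t) ** 2 for p, t in zip(x_pred, x_true)) / n
    energy = lambda x: sum(v * v for v in x)          # assumed energy proxy E
    l_energy = abs(energy(x_in) - energy(x_pred))     # energy conservation term
    l_flow = sum((p - f) ** 2 for p, f in zip(x_pred, x_fp)) / n  # flow operator term
    return data + alpha_ec * l_energy + alpha_fo * l_flow
```

With both penalties evaluated on the low-fidelity decoder output, LF-MulCons pays roughly the cost of one low-fidelity constraint evaluation per term, which is why its per-epoch time stays close to the single-constraint low-fidelity models.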
Interestingly, in Fig. 12, the amplification of the error peak previously observed in the high-fidelity single-constraint PCNN now appears in the low-fidelity multi-constraint MSPCNN. It is noteworthy that the low-fidelity multi-constraint MSPCNN achieves the predictive performance of the high-fidelity single-constraint PCNN. This suggests that multiple constraints in a low-fidelity field can potentially substitute for a single constraint, or fewer constraints, in a high-fidelity field. Overall, MSPCNN showcases its ability to integrate data across different fidelities to train a high-fidelity predictive model, thereby enhancing its accuracy. Furthermore, when implementing MSPCNN with low-fidelity physical constraints in the Burgers' system, it becomes evident that the model effectively strikes a balance between accuracy and computational efficiency.

Numerical example: Shallow Water
In the previous section, we showed that MSPCNN can efficiently fuse data of different fidelities for prediction, and confirmed on the Burgers' equation that it is entirely feasible to optimise a high-fidelity model using low-fidelity physical constraints. To gain a deeper understanding of MSPCNN's ability to deal with complex phenomena in this setting, we further conduct experimental verification on a shallow water system. The shallow water equations are a set of hyperbolic partial differential equations that describe the flow below a pressure surface in a fluid, typically water. The governing equations are

∂h/∂t + ∂(hu)/∂x + ∂(hv)/∂y = 0,
∂u/∂t + u ∂u/∂x + v ∂u/∂y + g ∂h/∂x = 0,
∂v/∂t + u ∂v/∂x + v ∂v/∂y + g ∂h/∂y = 0,

where h is the total water depth (including the undisturbed water depth) in metres (m), u and v are the velocity components in the x (horizontal) and y (vertical) directions in metres per second (m/s), respectively, and g is the gravitational acceleration in metres per second squared (m/s²). For our simulations, the numerical results are obtained by solving the shallow water equations using the finite difference method for spatial discretisation and the explicit Euler method for time integration. The high-fidelity data domain is a 64×64 grid and the low-fidelity data domain is a 32×32 grid, each containing three channels corresponding to the velocity components u, v and the water height h. Initial conditions involve a cylindrical disturbance in the water height, with the central cylinder's height ranging over [0.2, 1] metres and its radius varying between [4, 16] grid units, allowing for a comprehensive study of wave dynamics and fluid behaviour. The undisturbed water depth is 1 metre.
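As a rough illustration of the solver described above, the following sketch performs one explicit Euler step of the inviscid 2-D shallow water equations with centred finite differences. Periodic boundaries, grid spacing and time step are simplifying assumptions for the sketch, not the paper's exact implementation:

```python
def shallow_water_step(h, u, v, g=9.81, dx=1.0, dy=1.0, dt=0.01):
    """One explicit Euler step of the 2-D shallow water equations on a
    periodic grid using centred finite differences (illustrative sketch)."""
    ny, nx = len(h), len(h[0])
    # mass fluxes for the continuity equation
    hu = [[h[i][j] * u[i][j] for j in range(nx)] for i in range(ny)]
    hv = [[h[i][j] * v[i][j] for j in range(nx)] for i in range(ny)]

    def ddx(f, i, j):  # centred difference in x, periodic wrap
        return (f[i][(j + 1) % nx] - f[i][(j - 1) % nx]) / (2 * dx)

    def ddy(f, i, j):  # centred difference in y, periodic wrap
        return (f[(i + 1) % ny][j] - f[(i - 1) % ny][j]) / (2 * dy)

    # continuity: dh/dt = -(d(hu)/dx + d(hv)/dy)
    hn = [[h[i][j] - dt * (ddx(hu, i, j) + ddy(hv, i, j))
           for j in range(nx)] for i in range(ny)]
    # momentum: du/dt = -(u du/dx + v du/dy + g dh/dx), likewise for v
    un = [[u[i][j] - dt * (u[i][j] * ddx(u, i, j) + v[i][j] * ddy(u, i, j)
                           + g * ddx(h, i, j))
           for j in range(nx)] for i in range(ny)]
    vn = [[v[i][j] - dt * (u[i][j] * ddx(v, i, j) + v[i][j] * ddy(v, i, j)
                           + g * ddy(h, i, j))
           for j in range(nx)] for i in range(ny)]
    return hn, un, vn
```

A still, flat water surface is a fixed point of this update, which is a convenient sanity check for any such solver.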

Validation of Multi-Fidelity CAE in Shallow Water Systems
Similarly, we first showcase the effectiveness of our multi-fidelity CAE in efficiently compressing and then decompressing both high-fidelity and low-fidelity data in Fig. 14. We trained the multi-fidelity CAE using 300 corresponding sets of high-fidelity and low-fidelity data. From Fig. 14, it is evident that our architecture successfully reconstructs the foundational data for subsequent predictions, demonstrating robust performance across a diverse array of data samples.

Effects of Physical Constraints in Shallow Water Systems
Building upon the validation of our multi-fidelity CAE, we further examine the role of physical constraints in the low-fidelity field in predictions by MSPCNN. For this analysis, we used 300 sets of high-fidelity data as the training set and 30 sets as the test set.
First, we conduct a comparative analysis of various LSTM models on the shallow water system, as shown in Fig. 15. In particular, Fig. 15(1) and (2) compare the prediction results and errors of the basic LSTM and of MSPCNN with various physical constraints in the low-fidelity field at t = 25 and t = 120, respectively. We observe that the basic LSTM model incorrectly captures the evolutionary relationships, resulting in erroneous waveform predictions. Specifically, the model prematurely predicts later-stage waveforms in the early evolution phase (t = 25), while still incorporating early-stage waveforms during the later evolution phase (t = 120). This peculiar behaviour is highlighted with a pink box in Fig. 15. The issue persists in the MSPCNN with the EC constraint and is alleviated by embedding the FO constraint. However, the FO constraint introduces a new issue: the predicted results fail to capture the detailed waveforms seen in the ground truth, as marked with yellow boxes in Fig. 15. Simultaneously, for the long-time prediction at t = 120, it is evident that the EC constraint causes the MSPCNN predictions to become slightly smoother, as demonstrated in Fig. 15(2).
Furthermore, when we embed both the energy conservation and flow operator constraints in the low-fidelity field in MSPCNN, the merits of both constraints are combined to improve the realism of the predictions. As shown in Fig. 15, this improves the clarity and accuracy of the early predicted waveforms, making them less blurred and easier to identify. In addition, multiple constraints also enhance the stability of the model in long-time predictions, alleviating erroneous waveform predictions. However, the energy constraint still produces smoother predictions, an effect that cannot be completely eliminated. Referring to the metrics in Table 2, the LF-MulCons model achieves an MSE of 53.5% of the basic model's. This not only marks a significant reduction in prediction error compared to LF-EC and LF-FO, but also underscores the benefit of incorporating several constraints rather than a single one, which proves especially valuable in intricate systems. Compared with Table 1, it is readily seen that the flow operator has a larger impact on MSE reduction than the energy constraint. We suppose this is because the flow operator exerts a more direct influence on the fluid behaviour and is effective in capturing complex, nonlinear fluid patterns, leading to more precise and nuanced modelling than a global constraint such as energy conservation. In addition, the stability of predictions is also notably enhanced, as indicated by the decreased range of standard deviation in Fig. 16.
From the above analysis, we conclude that when employing MSPCNN to tackle complex physical problems, relying on a single physical constraint can enhance the authenticity of model predictions to some extent, but it does not genuinely improve the prediction accuracy. Combining multiple physical constraints, such as energy conservation and the flow operator, integrates the advantages of the different constraints and enhances the realism of model predictions at multiple levels.

Robustness Evaluation with Noisy Data
In real-world scenarios, particularly when analysing complex systems, models often encounter data contaminated with noise. This noise can arise from a multitude of sources, including imprecise measurements, intrinsic uncertainties within the system, or external disturbances. Ensuring the robustness and predictive capability of models for complex physical systems in the presence of noise is therefore of utmost importance. To thoroughly evaluate the stability of our MSPCNN in this setting, we conduct a noise experiment on the shallow water system. Using a model trained on noise-free data, we evaluate its capacity to make accurate predictions on a dataset intentionally contaminated with synthetic noise. This simulation aims to replicate the obstacles encountered in real-world scenarios.
In our experiments, to ensure the representativeness of the numerical tests, we utilise spatial correlation patterns that are homogeneous and isotropic with respect to the spatial Euclidean distance r = √(Δx² + Δy²), meaning that they remain unchanged under rotations and translations. We employ these correlation patterns to simulate data errors stemming from various sources. In this context, we consider a Matérn-type correlation function [58], where L is the typical correlation length scale; we set L = 4 for simplicity.
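A common Matérn-3/2-type choice consistent with this description is C(r) = (1 + r/L)·exp(−r/L); since the exact form used in [58] is not reproduced here, this particular expression is an assumption. A minimal sketch:

```python
import math

def matern_correlation(r, L=4.0):
    """Matérn-3/2-type spatial correlation (assumed form): isotropic,
    depends only on the Euclidean distance r, with length scale L."""
    return (1.0 + r / L) * math.exp(-r / L)
```

Evaluating this function on pairwise grid distances yields a correlation matrix from which spatially correlated noise fields can be sampled (e.g. via a Cholesky factorisation) and added to the initial condition.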
In the noise simulation, we introduce noise into the initial data to obtain the noisy data. This noisy data is then fed into both the basic LSTM and the MSPCNN for recurrent predictions. The outcomes are depicted in Fig. 17. Juxtaposed with Fig. 16, it is evident that the basic LSTM model struggles to handle noisy data, leading to a remarkably high MSE and a noticeable expansion in the spread of the standard deviation. In contrast, the MSPCNN fortified with multiple constraints demonstrates resilience against this noise-induced perturbation, registering only a marginal increase in both the MSE and the range of the standard deviation. In summary, the MSPCNN demonstrates robust performance when confronted with noisy data.

Conclusion
Physics-constrained neural networks have emerged as a popular approach for enhancing the reliability of predictions. These networks surpass purely data-driven models by incorporating physical constraint losses into the training process. In this paper, we propose and implement a novel predictive model, MSPCNN, motivated by reducing the cumulative error of long-time prediction while minimising computational cost. Its unique feature is that it can integrate, and freely convert between, data of different fidelities through the multi-fidelity CAE.
We explicitly show that there is significant value in mapping data of various fidelities into a uniform, shared latent space through the multi-fidelity CAE. Firstly, it allows low-fidelity data to play a complementary role to high-fidelity data during the training phase, as the predictive model accepts latent representations as input. In addition, MSPCNN allows us to enforce physical constraints in the low-fidelity field instead of applying them at the high-fidelity level. As a result, there is a significant reduction in offline costs, including the expenses of data acquisition and preprocessing. Meanwhile, this approach ensures that our model maintains a high level of accuracy while avoiding the computational challenges commonly encountered by conventional physics-constrained neural networks. While our tests are on toy models, using this multi-fidelity approach on high-dimensional datasets could offer more significant savings in computation and training costs. Furthermore, the results on the shallow water system emphasise the importance of incorporating multiple constraints when tackling intricate physical problems, since depending exclusively on a solitary constraint may be insufficient. Moreover, the model's adept handling of noisy data highlights its robustness, demonstrating its capacity to provide dependable predictions even in suboptimal circumstances.
The MSPCNN, with its ability to seamlessly encode high- and low-fidelity datasets in a shared latent space and embed physical constraints, offers substantial promise for transforming multiscale simulations in fluid dynamics. Owing to its adaptability and computational efficiency, this technology is well suited to real-time predictive assessment in various areas, including environmental forecasting and industrial fluid operations. Nevertheless, MSPCNN has its limitations. One notable limitation is the error amplification in scenarios with limited spatial correlation, a challenge not unique to MSPCNN but also prevalent in traditional models such as PCNN. We are addressing this through the development of a custom loss function that better balances simulation fidelity with error reduction. Beyond refining loss functions, another significant avenue for future work is extending our methodology to more complex mesh structures. Currently, both test cases in our study employ square-mesh simulations. However, real-world applications often require modelling on unstructured or even adaptive meshes, where the number and arrangement of cells can change dynamically to better capture phenomena or optimise computational resources. Furthermore, there is ongoing exploration of transformer-based models, which could be integrated into the MSPCNN framework as an alternative to traditional CNN and RNN architectures, potentially offering enhanced performance and adaptability.

1. Training on high-fidelity data versus multi-fidelity data:
(a) Basic LSTM (without physical constraints): trained using pure high-fidelity data.
(b) Multi-fidelity basic LSTM training: to verify whether low-fidelity data and high-fidelity data can train the model simultaneously, we use multi-fidelity data (both the high- and low-fidelity data) as the training dataset.
2. Effects of a single physical constraint on the model:

Figure 6 :
Figure 6: Performance of Basic LSTM with Multi-Fidelity Data Compared with Basic LSTM in Burgers' System

Figure 8 :
Figure 8: Performance of MSPCNN with EC Constraint Compared with Basic Predictive model in Burgers' System

Figure 10 :
Figure 10: Performance of MSPCNN with FO Constraint Compared with Basic Predictive model in Burgers' System

Figure 11 :Figure 12 :
Figure 11: Error histogram comparison in Burgers' System (1) Energy conservation constraint: comparison of high-fidelity, low-fidelity and basic model error histograms (2) Flow operator constraint: comparison of high-fidelity, low-fidelity and basic model error histograms

Figure 13 :
Figure 13: Performance of MSPCNN with Multiple Constraints Compared with Basic Predictive model in Burgers' System

Figure 14 :
Figure 14: Results from Multi-Fidelity CAE in shallow water systems (u dimension)

Figure 15 :
Figure 15: Prediction Results (u dimension) and Difference with Ground Truth Comparison of Various LSTM Models for (1) t = 25 and (2) t = 120 in shallow water systems

Figure 17 :
Figure 17: Performance of MSPCNN with Multiple Constraints in Low-Fidelity Field with Noisy Initial Condition in shallow water systems

Nomenclature:
MSPCNN: Multi-Scale Physics-Constrained Neural Network
α: Coefficient of physical loss
l_energy, l_flow: Loss of energy conservation, loss of flow operator
F_LSTM, θ_LSTM: LSTM function, parameters of LSTM
E_in, E_out: Total energy of input and output sequence
E: Function to calculate total energy
f: Flow operator function
x_fp: State vector predicted by the flow operator
X_h,train, X_l,train: High- and low-fidelity datasets
x_h,t, x_r

Table 1: Performance Comparison between Models in Burgers' System
• Predictive model trained by both high- and low-fidelity data.
• HF-EC, LF-EC: Model with the energy conservation constraint in the high- and low-fidelity field.
• HF-FO, LF-FO: Model with the flow operator constraint in the high- and low-fidelity field.
• HF-MulCons, LF-MulCons: Model with multiple constraints in the high- and low-fidelity field.
• MSE: Mean Squared Error, with reference to the basic model set at 100%.
• SSIM: Structural Similarity Index (with data range of 1.0).
• Training Time/Epoch (s): Time taken to run one epoch during training, in seconds.
• The coefficient of the physical constraint α is optimised using the validation set to achieve the best performance for each model. α_EC is the coefficient of the energy conservation constraint and α_FO the coefficient of the flow operator constraint. Specifically, α_EC = 2.0e-6 for HF-EC, α_EC = 2.8e-4 for LF-EC, α_EC = 4.3e-6 for HF-MulCons, α_EC = 1.1e-4 for LF-MulCons; α_FO = 2.5e-3 for HF-FO, α_FO = 8.5e-4 for LF-FO, α_FO = 1.2e-3 for HF-MulCons, α_FO = 5.0e-4 for LF-MulCons.

Table 2: Performance Comparison between Models in Shallow Water Systems
• Predictive model trained by solely high-fidelity data.
• LF-EC: Model with the energy conservation constraint in the low-fidelity field.
• LF-FO: Model with the flow operator constraint in the low-fidelity field.
• LF-MulCons: Model with multiple constraints in the low-fidelity field.
• MSE: Mean Squared Error, with reference to the basic model set at 100%.
• SSIM: Structural Similarity Index (with data range of 1.0).
• Training Time/Epoch (s): Time taken to run one epoch during training, in seconds.
• The coefficient of the physical constraint α is optimised using the validation set to achieve the best performance for each model. α_EC is the coefficient of the energy conservation constraint and α_FO the coefficient of the flow operator constraint. Specifically, α_EC