1 Introduction

1.1 The primitive equations

The study of global weather prediction and climate dynamics depends largely on the dynamics of the atmosphere and oceans. Ocean currents transport warm water from low latitudes to higher latitudes, where the heat can be released into the atmosphere to balance the earth’s temperature. A widely accepted model describing the motion and state of the atmosphere and ocean is the Boussinesq system, a combination of the Navier–Stokes equations (NSE) with rotation and a heat (or salinity) transport equation. As a result of the extraordinary organization and complexity of the flow in the atmosphere and ocean, the full governing equations appear to be difficult to analyze, at least for the foreseeable future. In particular, the global existence and uniqueness of smooth solutions to the 3D NSE is one of the most challenging mathematical problems.

Fortunately, when studying oceanic and atmospheric dynamics at the planetary scale, the vertical scale (a few kilometers for the ocean, 10–20 km for the atmosphere) is much smaller than the horizontal scale (many thousands of kilometers). Accordingly, the large-scale ocean and atmosphere satisfy the hydrostatic balance, as supported by scale analysis, meteorological observations, and historical data. By virtue of this, the primitive equations (PEs, also called the hydrostatic Navier–Stokes equations) are derived from the Boussinesq system as the asymptotic limit of the small aspect ratio between the vertical and horizontal length scales [2, 29, 58, 59]. Because of their impressive accuracy, the following 3D viscous PEs are a widely used model in geophysics (see, e.g., [5, 32, 33, 38, 41, 55, 68, 72] and references therein):

$$\begin{aligned}&\partial _t V + V\cdot \nabla _h V + w\partial _z V -\nu _h \Delta _h V - \nu _z \partial _{zz} V +f_0 V^\perp + \nabla _h p = 0 , \end{aligned}$$
(1.1a)
$$\begin{aligned}&\partial _z p + T= 0, \end{aligned}$$
(1.1b)
$$\begin{aligned}&\nabla _h \cdot V + \partial _z w =0, \end{aligned}$$
(1.1c)
$$\begin{aligned}&\partial _t T + V\cdot \nabla _h T + w\partial _z T -\kappa _h \Delta _h T - \kappa _z \partial _{zz} T = Q . \end{aligned}$$
(1.1d)

Here the horizontal velocity \(V = (u, v)\), the vertical velocity w, the pressure p, and the temperature T are the unknown quantities, which are functions of the time and space variables (t, x, y, z). The 2D horizontal gradient and Laplacian are denoted by \(\nabla _h = (\partial _{x}, \partial _{y})\) and \(\Delta _h = \partial _{xx} + \partial _{yy}\), respectively. The nonnegative constants \(\nu _h, \nu _z, \kappa _h\) and \(\kappa _z\) are the horizontal viscosity, the vertical viscosity, the horizontal diffusivity and the vertical diffusivity coefficients, respectively. The parameter \(f_0 \in \mathbb {R}\) stands for the Coriolis parameter, Q is a given heat source, and the notation \(V^\perp = (-v,u)\) is used.

According to whether the system has horizontal or vertical viscosity, there are mainly four different models considered in the literature (some works also consider the anisotropic diffusivity).

Case 1:

PEs with full viscosity, i.e., \(\nu _h>0, \nu _z>0\): The global well-posedness of strong solutions in Sobolev spaces was first established in [10], and later in [49]; see also the subsequent article [52] for different boundary conditions, as well as [39] for some progress towards relaxing the smoothness requirements on the initial data by using the semigroup method.

Case 2:

PEs with only horizontal viscosity, i.e., \(\nu _h>0, \nu _z=0\): [12,13,14] consider horizontally viscous PEs with anisotropic diffusivity and establish global well-posedness.

Case 3:

PEs with only vertical viscosity, i.e., \(\nu _h=0, \nu _z>0\): Without the horizontal viscosity, the PEs are shown to be ill-posed in Sobolev spaces [71]. In order to obtain well-posedness, one can consider some additional weak dissipation [15], or assume that the initial data have Gevrey regularity and are convex [30], or are analytic in the horizontal direction [60, 66]. It is worth mentioning that whether smooth solutions exist globally or form singularities in finite time is still open.

Case 4:

Inviscid PEs, i.e., \(\nu _h=0, \nu _z=0\): The inviscid PEs are ill-posed in Sobolev spaces [37, 44, 71]. Moreover, smooth solutions of the inviscid PEs can form singularities in finite time [11, 19, 44, 81]. On the other hand, with either some special structure (the local Rayleigh condition) on the initial data in 2D, or real analyticity in all directions for general initial data in both 2D and 3D, local well-posedness can be achieved [7, 8, 31, 36, 53, 54, 63].

Other works consider stochastic PEs, that is, system (1.1) with additional random external forcing terms (usually characterized by generalized Wiener processes on a Hilbert space). For the existence and uniqueness of solutions to those systems, see [9, 22, 23, 34, 35, 40, 42, 43, 73, 77].

In this paper, we focus on Case 1 and Case 2, in which the well-posedness is established in Sobolev spaces. Case 1 is also assumed to have full diffusivity, while Case 2 is considered to have only horizontal diffusivity. The analysis of Case 3 and Case 4 requires rather different techniques, as those models are ill-posed in Sobolev spaces for general initial data, and is left for future work.

System (1.1) has been studied under some proper boundary conditions. For example, as introduced in [12, 31], we consider the domain to be \({\mathcal {D}}:= \mathcal M \times (0,1)\) with \(\mathcal M:= (0,1)\times (0,1)\) and

$$\begin{aligned} \begin{aligned}&V, w, p, T \text { are periodic in } (x,y,z) \text { with period } 1, \\&V \text { and } p \text { are even in } z, \text { and } w \text { and } T \text { are odd in } z. \end{aligned} \end{aligned}$$
(1.2)

Note that the space of periodic functions with such a symmetry condition is invariant under the dynamics of system (1.1), provided that Q is periodic in (x, y, z) and odd in z. When system (1.1) is considered in 2D space, the system is independent of the y variable. In addition to the boundary condition, one needs to impose the initial condition

$$\begin{aligned} (V,T)|_{t=0} = (V_0,T_0). \end{aligned}$$
(1.3)

We point out that there is no initial condition for w since w is a diagnostic variable and it can be written in terms of V (see (2.1)). This is different from the Navier–Stokes equations and Boussinesq system.

1.2 PINNs

Due to the nonlinearity and nonlocality of many PDEs (including the PEs), their numerical study is in general a hard task. A non-exhaustive list of numerical studies of the PEs includes [6, 16,17,18, 50, 51, 61, 67, 74, 75, 78] and references therein. In the past few years, deep neural networks have emerged as a promising alternative, but they require abundant data that cannot always be found in scientific research. Instead, such networks can be trained from additional information obtained by enforcing the physical laws. Physics-informed machine learning seamlessly integrates data and mathematical physics models, even in partially understood, uncertain, and high-dimensional contexts [47].

Recently, physics-informed neural networks (PINNs) have been shown to be an efficient tool in scientific computing and in solving challenging PDEs. PINNs, which approximate solutions to PDEs by training neural networks to minimize the residuals coming from the initial conditions, the boundary conditions, and the PDE operators, have gained a lot of attention and have been studied intensively. The study of PINNs can be traced back to the 1990s [28, 56, 57]. Very recently, [69, 70] introduced and illustrated the PINNs approach for solving nonlinear PDEs, which can handle both forward and inverse problems. For a much more complete list of recent advances in the study of PINNs, we refer the readers to [20, 47] and references therein. We also remark that the deep Galerkin method [76] shares a similar spirit with PINNs.

In addition to investigating the efficiency and accuracy of PINNs in solving PDEs numerically, researchers are also interested in rigorously establishing error estimates. In a series of works [24, 25, 64, 65], the authors studied the error analysis of PINNs for approximating several different types of PDEs. It is worth mentioning that, recently, generic bounds for PINNs were established in [26]. We remark that our work is devoted to establishing higher-order error estimates based on the higher-order regularity of the solutions to the PEs. The analysis requires nontrivial efforts and techniques due to the special characteristics of the PEs, and these results do not follow trivially from [26].

To set up the PINNs framework for our problem, we first review the PINN algorithm [69, 70] for a generic PDE with initial and boundary conditions: for \(x\in {\mathcal {D}}, y\in \partial {\mathcal {D}}, t\in [0,{\mathcal {T}}]\), the solution u satisfies

$$\begin{aligned} \text {PDE operator: }&{\mathcal {D}}[u](x,t)=0,\\ \text {Initial condition: }&u(x,0)=\phi (x),\\ \text {Boundary condition: }&\mathcal Bu(y,t) = \psi (y,t). \end{aligned}$$

The goal is to seek a neural network \(u_\theta \) where \(\theta \) represents all network parameters (see Definition 2.1 for details) so that

$$\begin{aligned} \text {PDE residual: }&\mathcal R_i[\theta ](x,t) = {\mathcal {D}}[u_\theta ](x,t),\\ \text {Initial residual: }&\mathcal R_t[\theta ](x,t) = u_\theta (x,0)-\phi (x),\\ \text {Boundary residual: }&\mathcal R_b[\theta ] = \mathcal Bu_\theta (y,t) - \psi (y,t), \end{aligned}$$

are all small. Based on these residuals, for \(s\in \mathbb N\) we define the generalization error for \(u_\theta \):

$$\begin{aligned} \mathcal {E}_G[s;\theta ]^2 = \int _0^{{\mathcal {T}}} (\Vert \mathcal R_i\Vert _{H^s( {\mathcal {D}})}^2 + \Vert \mathcal R_b\Vert _{H^s(\partial {\mathcal {D}})}^2) dt + \Vert \mathcal R_t\Vert _{H^s({\mathcal {D}})}^2. \end{aligned}$$
(1.4)

Notice that when \(u_\theta =u\) is the exact solution, all the residuals are zero and thus \(\mathcal {E}_G[s;\theta ]=0\). In practice, one uses numerical quadrature to approximate the integral appearing in \(\mathcal {E}_G[s;\theta ]\). We call the corresponding numerical quadrature approximation the training error \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) (see Sect. 4.3 for details), which is also the loss function used during training. The terminology “physics-informed neural networks” is used in the sense that the physical laws coming from the PDE operator and the initial and boundary conditions lead to the residuals, which in turn give the generalization error and the training error (loss function). Finally, the neural network minimizes the loss function during training to obtain the approximation of the PDE solution. Note that in the literature, the analysis for PINN algorithms exists only for \(s = 1\).
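To fix ideas, the following minimal sketch (ours, written in PyTorch; it is not the implementation used for the experiments in Sect. 5) assembles a Monte Carlo analogue of the training error for \(s=1\), with the 1D heat equation \(u_t - u_{xx}=0\) under periodic boundary conditions standing in for the generic operator \({\mathcal {D}}[u]\) and \(\phi (x)=\sin (\pi x)\) as illustrative initial data. All names and hyperparameters are assumptions made for illustration only.

```python
import math
import torch

torch.manual_seed(0)

# u_theta: a small tanh network taking (x, t) -> u
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def grad(f, v):
    # derivative of the batched scalar output f with respect to the input batch v
    return torch.autograd.grad(f, v, torch.ones_like(f), create_graph=True)[0]

def pde_residual(x, t):
    # interior residual R_i = u_t - u_xx for the stand-in heat equation
    u = net(torch.cat([x, t], dim=1))
    return grad(u, t) - grad(grad(u, x), x)

def training_error_sq(n=1024, s=1):
    # Monte Carlo analogue of E_T[s; theta; S]^2 with equal quadrature weights 1/n
    x = torch.rand(n, 1, requires_grad=True)
    t = torch.rand(n, 1, requires_grad=True)
    r_i = pde_residual(x, t)                       # PDE residual at interior points
    loss = (r_i ** 2).mean()
    if s >= 1:                                     # add |D^alpha R_i|^2 for |alpha| = 1
        loss = loss + (grad(r_i, x) ** 2).mean() + (grad(r_i, t) ** 2).mean()
    x0 = torch.rand(n, 1, requires_grad=True)      # initial residual u(x,0) - phi(x)
    r_t = net(torch.cat([x0, torch.zeros_like(x0)], dim=1)) - torch.sin(math.pi * x0)
    loss = loss + (r_t ** 2).mean()
    if s >= 1:
        loss = loss + (grad(r_t, x0) ** 2).mean()
    tb = torch.rand(n, 1)                          # boundary residual: periodic mismatch
    r_b = net(torch.cat([torch.zeros_like(tb), tb], dim=1)) \
        - net(torch.cat([torch.ones_like(tb), tb], dim=1))
    return loss + (r_b ** 2).mean()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):                               # training: minimize the loss function
    opt.zero_grad()
    training_error_sq().backward()
    opt.step()
```

In the setting of this paper, the residuals (2.4)–(2.6) and the quadrature rules of Sect. 4.3 play the role of the sampled residuals and Monte Carlo weights above.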

For our problem, the residuals are defined in (2.4)–(2.6), the generalization error is defined in (2.7), and the training error is defined in (2.8). In order to measure how good the approximation \(u_\theta \) is, we use the total error \({\mathcal {E}}[s;\theta ]^2 = \int _0^{{\mathcal {T}}} \Vert u-u_\theta \Vert _{H^s}^2 dt\), which is defined in (2.10) for our problem. Our analysis applies to any \(s \in \mathbb N\).

In this work, we mainly want to answer two crucial questions concerning the reliability of PINNs:

Q1::

The existence of neural networks \((V_\theta ,w_\theta ,p_\theta ,T_\theta )\) such that the training error (loss function) \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]<\epsilon \) for arbitrary \(\epsilon >0\);

Q2::

The control of total error \({\mathcal {E}}[s;\theta ]\) by the training error with large enough sample set \({\mathcal {S}}\), i.e., \({\mathcal {E}}[s;\theta ]\lesssim \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}] + f(|{\mathcal {S}}|)\) for some function f which is small when \(|{\mathcal {S}}|\) is large.

An affirmative answer to Q1 implies that one is able to train the neural networks to obtain a small enough training error (loss function) at the end. An affirmative answer to Q2 guarantees that \((V_\theta ,w_\theta ,p_\theta ,T_\theta )\) can approximate the true solution arbitrarily well in some Sobolev norms as long as the training error \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) is small enough and the sample set \({\mathcal {S}}\) is large enough. However, \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) is not convenient for the analysis, while \(\mathcal {E}_G[s;\theta ]\) is more tractable as it is in integral form. As discussed in [24], one can, in turn, consider the following three sub-questions:

SubQ1::

The existence of neural networks \((V_\theta ,w_\theta ,p_\theta ,T_\theta )\) such that the generalization error \(\mathcal {E}_G[s;\theta ]<\epsilon \) for arbitrary \(\epsilon >0\);

SubQ2::

The control of total error by generalization error, i.e., \({\mathcal {E}}[s;\theta ]\lesssim \mathcal {E}_G[s;\theta ]\);

SubQ3::

The difference between the generalization error and the training error \(\Big |\mathcal {E}_G[s;\theta ]- \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\Big |< f(|{\mathcal {S}}|)\) for some function f which is small when \(|{\mathcal {S}}|\) is large.

Specifically, the answers to SubQ1 and SubQ3 lead to a positive answer to Q1, and the answers to SubQ2 and SubQ3 give the solution to Q2.

Our main contributions and results in this work are the following:

  • We establish the higher-order regularity result for the solutions to the PEs under Case 1 and Case 2; see Theorem 3.4. To the best of our knowledge, such a result for Case 1 was proven in [46], but it is new for Case 2. It is necessary because the smoothness of the solutions is required in order to perform the higher-order error analysis for PINNs.

  • We answer Q1 and Q2 (and SubQ1–SubQ3) for PINNs approximating the solutions to the PEs, which shows that PINNs are a reliable numerical method for solving the PEs; see Theorems 4.1, 4.2, 4.3, 4.4. Our estimates are all a priori, and the key strategy is to introduce a penalty term in the generalization error (2.7) and the training error (2.8). The introduction of such penalty terms is inspired by [4], where the authors studied PINNs approximating the 2D NSE. By virtue of Theorem 3.4, the solutions to the PEs in Case 1 and Case 2 exist globally, and are therefore bounded on any finite time interval. The penalty terms are introduced to control the growth of the neural network outputs and to make sure they stay in the target bounded set.

  • Rather than just considering the \(L^2\) norm in the errors [24, 65], i.e., \({\mathcal {E}}[s;\theta ], \mathcal {E}_G[s;\theta ], \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) with \(s=0\), we use the higher-order \(H^s\) norm in \({\mathcal {E}}[s;\theta ], \mathcal {E}_G[s;\theta ], \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) for \(s\in \mathbb N\). We prove that using the \(H^s\) norm in \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\) guarantees control of \({\mathcal {E}}[s;\theta ]\) at the same order s. The numerical performance in Sect. 5 further verifies our theory. Such results are crucial, as some problems do require higher-order estimates; for example, the Hamilton–Jacobi–Bellman equation requires an \(L^p\) error estimate with p large enough in order to be stable, see [80]. We refer the readers to [21] for more discussion of higher-order error estimates for neural networks. We believe that the higher-order error estimates developed in this work can be readily applied to other PDEs, for example, the Euler equations, the Navier–Stokes equations, and the Boussinesq system.

The rest of the paper is organized as follows. In Sect. 2, we introduce the notation and collect some preliminary results that will be used in this paper. In Sect. 3, we prove the higher-order regularity of the solutions to the PEs under Case 1 and Case 2. In Sect. 4, we establish the main results of this paper by answering Q1 and Q2 (through SubQ1–SubQ3) discussed above. In the end, we present some numerical experiments in Sect. 5 to support our theoretical results on the accuracy of the approximation under higher-order Sobolev norms.

2 Preliminaries

In this section, we introduce the notation and collect some preliminary results that will be used in the rest of this paper. The universal constant C that appears below may change from step to step, and we use the subscript to emphasize its dependence when necessary, e.g., \(C_r\) is a constant depending only on r.

2.1 Functional settings

We use the notation \(\varvec{x}:= (\varvec{x}',z) = (x, y, z)\in {\mathcal {D}}\), where \(\varvec{x}'\) and z represent the horizontal and vertical variables, respectively, and for a positive time \({\mathcal {T}}>0\) we denote by

$$\begin{aligned} \Omega = [0,{\mathcal {T}}]\times {\mathcal {D}}. \end{aligned}$$

Let \(\nabla =(\partial _x,\partial _y,\partial _z)\) and \(\Delta = \partial _{xx}+\partial _{yy}+\partial _{zz}\) be the three dimensional gradient and Laplacian, and \(\nabla _h = (\partial _{x}, \partial _{y})\) and \(\Delta _h = \partial _{xx} + \partial _{yy}\) be the horizontal ones. Let \(\alpha \in \mathbb {N}^n\) be a multi-index. We say \(\alpha \le \beta \) if and only if \(\alpha _i\le \beta _i\) for each \(i\in \{1,2,...,n\}\). The notation

$$\begin{aligned} |\alpha | = \sum \limits _{j=1}^n \alpha _j, \hspace{0.2in} \alpha ! = \prod \limits _{j=1}^n \alpha _j!, \hspace{0.2in} \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) = \frac{\alpha !}{\beta !(\alpha -\beta )!} \end{aligned}$$

will be used throughout the paper. Let \(P_{m,n}=\{\alpha \in \mathbb N^n, |\alpha |=m \}\), and denote by \(|P_{m,n}|\) the cardinality of \(P_{m,n}\), which is given by \(|P_{m,n}| = \left( {\begin{array}{c}m+n-1\\ m\end{array}}\right) \). For a function f defined on an open subset \(U\subseteq \mathbb R^n\) and \(x=(x_1,x_2,...,x_n)\in U\), we denote the partial derivative of f with multi-index \(\alpha \) by \( D^\alpha f= \frac{\partial ^{|\alpha |}f}{\partial _{x_1}^{\alpha _1}\cdots \partial _{x_n}^{\alpha _n}}. \) Let

$$\begin{aligned} L^2(U)=\left\{ f: \int _U |f(x)|^2 dx < \infty \right\} \end{aligned}$$

be the usual \(L^2\) space associated with the Lebesgue measure restricted on U, endowed with the norm \( \Vert f\Vert _{L^2(U)} = (\int _{U} |f|^2 dx)^{\frac{1}{2}}, \) coming from the inner product \( \langle f,g\rangle = \int _{U} f(x)g(x) \;dx \) for \(f,g \in L^2(U)\). For \(r\in \mathbb N\), denote by \(H^r(U)\) the Sobolev spaces:

$$\begin{aligned} H^r(U) = \{f\in L^2(U): \Vert D^\alpha f\Vert _{L^2(U)}<\infty \text { for } |\alpha |\le r\}, \end{aligned}$$

endowed with the norm \( \Vert f\Vert _{H^r(U)} = (\sum \limits _{|\alpha |\le r}\int _{U} |D^\alpha f|^2 dx)^{\frac{1}{2}}. \) For more details about the Sobolev spaces, we refer the readers to [1]. Define

$$\begin{aligned} \begin{aligned}&\widetilde{V}_e:= \left\{ \varphi \in C^\infty ({\mathcal {D}}): \varphi \text { is periodic in } (x,y,z) \text { and even in } z, \, \int _0^1 \nabla _h\cdot \varphi (\varvec{x}', z)dz = 0 \right\} ,\\&\widetilde{V}_o:= \left\{ \varphi \in C^\infty ({\mathcal {D}}): \varphi \text { is periodic in } (x,y,z) \text { and odd in }z \right\} , \end{aligned} \end{aligned}$$

and denote by \(H^r_e({\mathcal {D}})\) and \(H^r_o({\mathcal {D}})\) the closure spaces of \(\widetilde{V}_e\) and \(\widetilde{V}_o\), respectively, under the \(H^r\)-topology. When \(r=0\), \(H^r_e({\mathcal {D}}) = L^2_e({\mathcal {D}})\) and \(H^r_o({\mathcal {D}}) = L^2_o({\mathcal {D}}).\) Note that in the 2D case, all notations need to be adapted accordingly by letting \(\mathcal M = (0,1)\). When the functional space and the norm are defined in the spatial domain \({\mathcal {D}}\), we frequently write \(L^2\), \(H^r\), \(\Vert \cdot \Vert _{L^2}\), and \(\Vert \cdot \Vert _{H^r}\) by omitting \({\mathcal {D}}\) when there is no confusion.

By virtue of (1.1c) and the boundary condition (1.2), one can rewrite w as

$$\begin{aligned} w(\varvec{x}',z) = -\int _0^z \nabla _h\cdot V(\varvec{x}', \tilde{z}) d\tilde{z}. \end{aligned}$$
(2.1)

Notice that since \(w(z=1)=0\), V satisfies the compatibility condition

$$\begin{aligned} \int _0^1 \nabla _h\cdot V(\varvec{x}', \tilde{z}) d\tilde{z} =0. \end{aligned}$$
(2.2)

By Cauchy–Schwarz inequality,

$$\begin{aligned} \begin{aligned} \Vert w\Vert ^2_{H^r({\mathcal {D}})}&= \sum \limits _{|\alpha |\le r} \int _{{\mathcal {D}}} \left| D^\alpha \int _0^z \nabla _h\cdot V(\varvec{x}', \tilde{z}) d\tilde{z} \right| ^2 d\varvec{x} \le C \Vert \nabla _h V\Vert ^2_{H^r({\mathcal {D}})}. \end{aligned} \end{aligned}$$
(2.3)
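As a quick numerical illustration of (2.1)–(2.2) (a sketch of ours, not part of the PINN scheme below, which trains a separate network \(w_\theta \) and penalizes the divergence residual in (2.4)), w can be diagnosed from samples of \(\nabla _h\cdot V\) on a vertical grid by cumulative trapezoidal integration; for a column whose vertical mean of \(\nabla _h\cdot V\) vanishes, the recovered w also vanishes at \(z=1\), in agreement with the compatibility condition. All names below are illustrative.

```python
import numpy as np

def diagnose_w(div_h_V, z):
    # w(x', z) = -int_0^z (nabla_h . V)(x', z') dz', via cumulative trapezoids on the
    # z-grid; div_h_V has shape (..., len(z)) with the last axis indexing the z levels
    dz = np.diff(z)
    panels = 0.5 * (div_h_V[..., 1:] + div_h_V[..., :-1]) * dz
    zeros = np.zeros(div_h_V.shape[:-1] + (1,))
    return -np.concatenate([zeros, np.cumsum(panels, axis=-1)], axis=-1)

# test column: div_h V = cos(2*pi*z) has zero vertical mean, so (2.2) holds and the
# diagnosed w satisfies w(z=0) = w(z=1) = 0 up to quadrature error
z = np.linspace(0.0, 1.0, 101)
w = diagnose_w(np.cos(2 * np.pi * z), z)
print(w[0], w[-1])   # both approximately 0
```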

2.2 Neural networks

We will work with the following class of neural networks introduced in [24].

Definition 2.1

Suppose \(R \in (0, \infty ]\), \(L, W \in \mathbb {N}\), and \(l_0, \ldots , l_L \in \mathbb {N}\). Let \(\sigma : \mathbb {R} \rightarrow \mathbb {R}\) be a twice differentiable activation function, and define

$$\begin{aligned} \Theta =\Theta _{L, W, R}:=\bigcup _{L^{\prime } \in \mathbb {N}, L^{\prime } \le L} \bigcup _{l_0, \ldots , l_L \in \{1, \ldots , W\}} X_{k=1}^{L^{\prime }}\left( [-R, R]^{l_k \times l_{k-1}} \times [-R, R]^{l_k}\right) . \end{aligned}$$

For \(\theta \in \Theta _{L, W, R}\), we define \(\theta _k:=\left( \mathcal {W}_k, b_k\right) \) and \(\mathcal {A}_k: \mathbb {R}^{l_{k-1}} \rightarrow \mathbb {R}^{l_k}: x \mapsto \mathcal {W}_k x+b_k\) for \(1 \le k \le L\) and define \(f_k^\theta : \mathbb {R}^{l_{k-1}} \rightarrow \mathbb {R}^{l_k}\) by

$$\begin{aligned} f_k^\theta (\eta )= {\left\{ \begin{array}{ll}\mathcal {A}_L^\theta (\eta ) &{} k=L \\ \left( \sigma \circ \mathcal {A}_k^\theta \right) (\eta ) &{} 1 \le k<L\end{array}\right. } \end{aligned}$$

Denote by \(u_\theta : \mathbb {R}^{l_0} \rightarrow \mathbb {R}^{l_L}\) the function that satisfies for all \(\eta \in \mathbb {R}^{l_0}\) that

$$\begin{aligned} u_\theta (\eta )=\left( f_L^\theta \circ f_{L-1}^\theta \circ \cdots \circ f_1^\theta \right) (\eta ). \end{aligned}$$

In our approach to approximating the system (1.1), we assign \(l_0= d+1\) and \(\eta =(\varvec{x}, t)\). The neural network that corresponds to parameter \(\theta \) and consists of L layers and widths \((l_0,l_1,...,l_L)\) is denoted by \(u_\theta \). The first \(L-1\) layers are considered hidden layers, where \(l_k\) refers to the width of layer k, and \(\mathcal W_k\) and \(b_k\) denote the weights and biases of layer k, respectively. The width of \(u_\theta \) is defined as the maximum value among \({l_0,\dots ,l_L}\).
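For concreteness, the following short sketch (our own illustration; the names are not taken from [24]) instantiates Definition 2.1 with \(\sigma =\tanh \): the parameters \(\theta =((\mathcal W_1,b_1),\ldots ,(\mathcal W_L,b_L))\) are drawn from \([-R,R]\), the hidden layers apply \(\sigma \circ \mathcal A_k\), and the output layer applies the affine map \(\mathcal A_L\) only.

```python
import numpy as np

def init_params(widths, R, rng):
    # theta = ((W_1, b_1), ..., (W_L, b_L)) with all entries drawn from [-R, R]
    return [(rng.uniform(-R, R, (widths[k], widths[k - 1])),
             rng.uniform(-R, R, widths[k]))
            for k in range(1, len(widths))]

def u_theta(theta, eta, sigma=np.tanh):
    # u_theta(eta) = (f_L o ... o f_1)(eta): sigma is applied after every layer except the last
    for k, (W, b) in enumerate(theta):
        eta = W @ eta + b                  # affine map A_k(eta) = W_k eta + b_k
        if k < len(theta) - 1:             # hidden layers only
            eta = sigma(eta)
    return eta

rng = np.random.default_rng(0)
# d = 3 space dimensions plus time gives l_0 = 4; two hidden layers of width 20; scalar output
theta = init_params([4, 20, 20, 1], R=1.0, rng=rng)
print(u_theta(theta, np.array([0.1, 0.2, 0.3, 0.0])))
```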

2.3 PINNs settings

We define the following residuals from the PDE system (1.1):

$$\begin{aligned}&\text {(PDE residuals)} \nonumber \\&\quad {\left\{ \begin{array}{ll} \mathcal R_{i,V}[\theta ]:= \partial _t V_\theta + V_\theta \cdot \nabla _h V_\theta + w_\theta \partial _z V_\theta +f_0 V_\theta ^\perp - \nu _h \Delta _h V_\theta - \nu _z \partial _{zz} V_\theta + \nabla _h p_\theta ,\\ \mathcal R_{i,p}[\theta ]:= \partial _z p_\theta + T_\theta ,\\ \mathcal R_{i,T}[\theta ] := \partial _t T_\theta + V_\theta \cdot \nabla _h T_\theta + w_\theta \partial _z T_\theta - \kappa _h \Delta _h T_\theta - \kappa _z \partial _{zz} T_\theta - Q,\\ \mathcal R_{i,div}[\theta ]:= \nabla _h\cdot V_\theta + \partial _z w_\theta , \end{array}\right. } \end{aligned}$$
(2.4)

the residuals from the initial conditions (1.3):

$$\begin{aligned}&\text {(initial residuals)} \nonumber \\&{\left\{ \begin{array}{ll} \mathcal R_{t,V}[\theta ] := V_\theta (t=0) - V_0,\\ \mathcal R_{t,T}[\theta ] := T_\theta (t=0) - T_0, \end{array}\right. } \end{aligned}$$
(2.5)

and for \(s\in \mathbb N\), the residuals from the boundary conditions:

$$\begin{aligned} \begin{aligned}&\text {(boundary residuals) }\mathcal R_{b}[s;\theta ]\\&:= \Big \{\sum \limits _{\varphi \in \{V_\theta ,w_\theta ,p_\theta ,T_\theta \}}\sum \limits _{|\alpha |\le s} \Big [\Big (D^\alpha \varphi (x=1,y,z)- D^\alpha \varphi (x=0,y,z)\Big )^2\\&+ \Big (D^\alpha \varphi (x,y=1,z)- D^\alpha \varphi (x,y=0,z)\Big )^2\\&+ \Big (D^\alpha \varphi (x,y,z=1)- D^\alpha \varphi (x,y,z=0)\Big )^2\Big ]\\&+\sum \limits _{|\alpha |\le s, \alpha _3=0} \Big [\Big (D^\alpha w_\theta (x,y,z=1)\Big )^2 + \Big (D^\alpha w_\theta (x,y,z=0)\Big )^2\Big ] \Big \}^{\frac{1}{2}}, \end{aligned}\nonumber \\ \end{aligned}$$
(2.6)

For \(s\in \mathbb N\), the generalization error is defined by

$$\begin{aligned} \begin{aligned}&\text {(generalization error) } \mathcal {E}_G[s;\theta ]\\&\qquad \qquad := \left( \mathcal {E}_G^i[s;\theta ]^2 + \mathcal {E}_G^t[s;\theta ]^2 +\mathcal {E}_G^b[s;\theta ]^2 + \lambda \mathcal {E}_G^p[s;\theta ]^2\right) ^{\frac{1}{2}}, \end{aligned} \end{aligned}$$
(2.7)

where

$$\begin{aligned} \begin{aligned}&\mathcal {E}_G^i[s;\theta ]^2:= \int _0^{{\mathcal {T}}}\left( \Vert \mathcal R_{i,V}[\theta ]\Vert _{H^s({\mathcal {D}})}^2 + \Vert \mathcal R_{i,p}[\theta ]\Vert _{H^s({\mathcal {D}})}^2 + \Vert \mathcal R_{i,T}[\theta ]\Vert _{H^s({\mathcal {D}})}^2 \right. \\&\quad \qquad \quad \left. + \Vert \mathcal R_{i,div}[\theta ]\Vert _{H^s({\mathcal {D}})}^2 \right) dt,\\&\mathcal {E}_G^t[s;\theta ]^2: =\Vert \mathcal R_{t,V}[\theta ]\Vert _{H^s({\mathcal {D}})}^2 + \Vert \mathcal R_{t,T}[\theta ]\Vert _{H^s({\mathcal {D}})}^2,\\&\mathcal {E}_G^b[s;\theta ]^2:= \int _0^{{\mathcal {T}}}\Vert \mathcal R_{b}[s;\theta ]\Vert _{L^2(\partial {\mathcal {D}})}^2 dt,\\&\mathcal {E}_G^p[s;\theta ]^2:= \int _0^{{\mathcal {T}}} \left( \Vert V_\theta \Vert _{H^{s+3}({\mathcal {D}})}^2 + \Vert p_\theta \Vert _{H^{s+3}({\mathcal {D}})}^2 + \Vert w_\theta \Vert _{H^{s+3}({\mathcal {D}})}^2 + \Vert T_\theta \Vert _{H^{s+3}({\mathcal {D}})}^2 \right) dt \end{aligned} \end{aligned}$$

and the training error is defined by

$$\begin{aligned}{} & {} \text {(training error) } \qquad \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\nonumber \\{} & {} := \left( \mathcal {E}_{T}^i[s;\theta ;{\mathcal {S}}_i]^2 + \mathcal {E}_{T}^t[s;\theta ;{\mathcal {S}}_t]^2 +\mathcal {E}_{T}^b[s;\theta ;{\mathcal {S}}_b]^2 + \lambda \mathcal {E}_{T}^p[s;\theta ;{\mathcal {S}}_i]^2\right) ^{\frac{1}{2}}, \end{aligned}$$
(2.8)

where

$$\begin{aligned} \begin{aligned}&\mathcal {E}_{T}^i[s;\theta ;{\mathcal {S}}_i]^2:= \sum \limits _{(t_n,x_n,y_n,z_n)\in {\mathcal {S}}_i}\sum \limits _{|\alpha |\le s}w_n^i\Big [\Big (D^\alpha \mathcal R_{i,V}[\theta ](t_n,x_n,y_n,z_n)\Big )^2\\&\qquad \qquad \qquad \qquad + \Big (D^\alpha \mathcal R_{i,p}[\theta ](t_n,x_n,y_n,z_n)\Big )^2\\&\qquad \qquad \qquad \qquad +\Big (D^\alpha \mathcal R_{i,T}[\theta ](t_n,x_n,y_n,z_n)\Big )^2 +\Big (D^\alpha \mathcal R_{i,div}[\theta ](t_n,x_n,y_n,z_n)\Big )^2 \Big ],\\&\mathcal {E}_{T}^t[s;\theta ;{\mathcal {S}}_t]^2: =\sum \limits _{(x_n,y_n,z_n)\in {\mathcal {S}}_t}\sum \limits _{|\alpha |\le s}w_n^t\Big [\Big (D^\alpha \mathcal R_{t,V}[\theta ](x_n,y_n,z_n)\Big )^2 \\&\qquad \qquad \qquad \qquad + \Big (D^\alpha \mathcal R_{t,T}[\theta ](x_n,y_n,z_n)\Big )^2 \Big ],\\&\mathcal {E}_{T}^b[s;\theta ;{\mathcal {S}}_b]^2: =\sum \limits _{(t_n,x_n,y_n,z_n)\in {\mathcal {S}}_b}w_n^b \Big ( R_{b}[s;\theta ](t_n,x_n,y_n,z_n)\Big )^2,\\&\mathcal {E}_{T}^p[s;\theta ;{\mathcal {S}}_i]^2:= \sum \limits _{(t_n,x_n,y_n,z_n)\in {\mathcal {S}}_i}\sum \limits _{|\alpha |\le s+3}w_n^i\Big [\Big (D^\alpha V_\theta (t_n,x_n,y_n,z_n)\Big )^2 \\&\qquad \qquad \qquad \qquad + \Big (D^\alpha p_\theta (t_n,x_n,y_n,z_n)\Big )^2+\Big (D^\alpha w_\theta (t_n,x_n,y_n,z_n)\Big )^2\\&\qquad \qquad \qquad \qquad +\Big (D^\alpha T_\theta (t_n,x_n,y_n,z_n)\Big )^2 \Big ]. \end{aligned} \end{aligned}$$
(2.9)

Here the quadrature points in space-time constitute the data sets \({\mathcal {S}}=({\mathcal {S}}_i, {\mathcal {S}}_t, {\mathcal {S}}_b)\), with \({\mathcal {S}}_i \subseteq [0,{\mathcal {T}}]\times {\mathcal {D}}\), \({\mathcal {S}}_t \subseteq {\mathcal {D}}\), \({\mathcal {S}}_b \subseteq [0,{\mathcal {T}}]\times \partial {\mathcal {D}}\), and \((w_n^i,w_n^t,w_n^b)\) are the quadrature weights, defined in (4.31) and (4.32), respectively. The terms \(\mathcal {E}_G^p\) and \(\mathcal {E}_{T}^p\) stand for the penalty terms. Finally, for \(s\in \mathbb N\), the total error is defined as

$$\begin{aligned} \text {(total error) } \qquad \qquad \qquad {\mathcal {E}}[s;\theta ]:= \Big (\int _0^{{\mathcal {T}}}\big (\Vert V-V_\theta \Vert ^2_{H^s} + \Vert T-T_\theta \Vert ^2_{H^s}\big )dt\Big )^{\frac{1}{2}}. \end{aligned}$$
(2.10)
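To illustrate how the interior residuals (2.4) can be evaluated in practice, the sketch below computes them by automatic differentiation at a batch of collocation points. It is a schematic example under our own naming conventions (PyTorch, unit physical constants, \(Q=0\)) and only assembles the \(s=0\) part of \(\mathcal {E}_{T}^i\) with equal weights; the higher-order derivatives, the initial and boundary residuals (2.5)–(2.6), and the penalty term \(\mathcal {E}_{T}^p\) are handled analogously.

```python
import torch

def make_net(n_out):
    return torch.nn.Sequential(
        torch.nn.Linear(4, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, n_out),
    )

# networks (V_theta, w_theta, p_theta, T_theta); inputs are (t, x, y, z)
V_net, w_net, p_net, T_net = make_net(2), make_net(1), make_net(1), make_net(1)

def d(f, v):
    # partial derivative of the batched output f with respect to the input batch v
    return torch.autograd.grad(f, v, torch.ones_like(f), create_graph=True)[0]

def pde_residuals(t, x, y, z, nu_h=1.0, nu_z=1.0, kap_h=1.0, kap_z=1.0, f0=1.0, Q=0.0):
    # interior residuals (2.4) evaluated at the collocation points (t, x, y, z)
    inp = torch.cat([t, x, y, z], dim=1)
    V, w, p, T = V_net(inp), w_net(inp), p_net(inp), T_net(inp)
    u, v = V[:, :1], V[:, 1:]
    mom = []
    for q in (u, v):                                      # horizontal momentum (1.1a)
        lap_h = d(d(q, x), x) + d(d(q, y), y)
        coriolis = -f0 * v if q is u else f0 * u          # f0 V^perp with V^perp = (-v, u)
        grad_p = d(p, x) if q is u else d(p, y)
        mom.append(d(q, t) + u * d(q, x) + v * d(q, y) + w * d(q, z)
                   - nu_h * lap_h - nu_z * d(d(q, z), z) + coriolis + grad_p)
    R_iV = torch.cat(mom, dim=1)
    R_ip = d(p, z) + T                                    # hydrostatic balance (1.1b)
    R_idiv = d(u, x) + d(v, y) + d(w, z)                  # divergence condition (1.1c)
    R_iT = (d(T, t) + u * d(T, x) + v * d(T, y) + w * d(T, z)
            - kap_h * (d(d(T, x), x) + d(d(T, y), y)) - kap_z * d(d(T, z), z) - Q)
    return R_iV, R_ip, R_iT, R_idiv

n = 256
t, x, y, z = (torch.rand(n, 1, requires_grad=True) for _ in range(4))
loss = sum((r ** 2).mean() for r in pde_residuals(t, x, y, z))   # s = 0 part of (2.8)
```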

Remark 1

  1. (i)

    In our setting, we impose periodic boundary conditions, require (V, p) to be even in z, and require (w, T) to be odd in z. The assumption of evenness and oddness allows us to perform the periodic extension in the z direction; it can be ignored when one only wants to control the total error by the generalization error. Note also that the boundary conditions \(w|_{z=0,1}=0\) and \(D^\alpha w|_{z=0,1}=0\) with \(\alpha _3=0\) have physical meaning and are essential in the error estimate. Therefore they are included in the boundary residuals \(\mathcal R_b[s;\theta ]\).

  2. (ii)

    The total error is defined only for V and T, as for the primitive equations they are the prognostic variables, while w and p are diagnostic variables that can be recovered from V and T.

  3. (iii)

    If the original PINN framework (1.4) were followed, one could first obtain a posteriori estimates for SubQ1–SubQ2, meaning that the constants would depend on certain norms of the outputs of the neural networks, and then make them a priori by requiring high regularity of the solution. For this approach, see, for instance, [24, Theorems 3.1 and 3.4, and Corollary 3.14]. We proceed in an alternative way, inspired by the approach proposed in [4]. That is, we include the additional terms \(\mathcal {E}_G^p\) and \(\mathcal {E}_{T}^p\) in the generalization and training errors, which allow these constants to be bounded directly by the PDE solution, and therefore achieve an a priori estimate for the total error.

3 Regularity of solutions to the primitive equations

We first give the definition of strong solutions to system (1.1) under Case 1. The following definition is similar to the ones appearing in [10, 12].

Definition 3.1

Let \({\mathcal {T}}>0\) and let \(V_0\in H_e^2({\mathcal {D}})\) and \(T_0\in H_o^2({\mathcal {D}})\). A couple (V, T) is called a strong solution to system (1.1) on \(\Omega = [0,{\mathcal {T}}]\times {\mathcal {D}}\) if

  1. (i)

    V and T have the regularities

    $$\begin{aligned}&V\in L^\infty (0,{\mathcal {T}}; H_e^2({\mathcal {D}}))\cap C([0,{\mathcal {T}}];H_e^1({\mathcal {D}})),{} & {} T\in L^\infty (0,{\mathcal {T}}; H_o^2({\mathcal {D}}))\cap C([0,{\mathcal {T}}];H_o^1({\mathcal {D}}))\\&(\nabla _h V,\nu _z V_z)\in L^2(0,{\mathcal {T}}; H^2({\mathcal {D}})), \quad{} & {} (\nabla _h T,\kappa _z T_z)\in L^2(0,{\mathcal {T}}; H^2({\mathcal {D}}))\\&\partial _tV\in L^2(0,{\mathcal {T}}; H^1({\mathcal {D}})), \quad{} & {} \partial _tT \in L^2(0,{\mathcal {T}}; H^1({\mathcal {D}})) ; \end{aligned}$$
  2. (ii)

    V and T satisfy system (1.1) a.e. in \(\Omega = [0,{\mathcal {T}}]\times {\mathcal {D}}\) and the initial condition (1.3).

Definition 3.2

A couple (V, T) is called a global strong solution to system (1.1) if it is a strong solution on \(\Omega =[0,{\mathcal {T}}]\times {\mathcal {D}}\) for any \({\mathcal {T}}>0\).

The theorem below is from [12] and concerns the global well-posedness of system (1.1) under Case 2.

Theorem 3.3

[12, Theorem 1.3] Suppose that \(Q=0\), \(V_0\in H_e^2({\mathcal {D}})\) and \(T_0\in H_o^2({\mathcal {D}})\). Then system (1.1) has a unique global strong solution (V, T), which depends continuously on the initial data.

Remark 2

Theorem 3.3 applies to Case 2. It can easily be extended to Case 1, i.e., \(\nu _h, \kappa _h, \nu _z,\kappa _z>0\) (see [12, Proposition 2.6]). Moreover, under Case 1, the solution (V, T) satisfies \(V\in L^2(0,{\mathcal {T}}; H_e^3({\mathcal {D}}))\) and \(T\in L^2(0,{\mathcal {T}}; H_o^3({\mathcal {D}}))\), as indicated in Definition 3.1. Theorem 3.3 is proved in [12] with \(d=3\), but it also holds when \(d=2\). The requirement \(Q=0\) can be replaced with Q being regular enough, for example, \(Q\in L^\infty (0,{\mathcal {T}}; H_o^2({\mathcal {D}}))\) for arbitrary \({\mathcal {T}}>0\).

In order to perform the error analysis for PINNs, we need to establish higher-order regularity for the solution (V, T), in particular, continuity in both the spatial and temporal variables. This is summarized in the theorem below.

Theorem 3.4

Let \(k, r\in \mathbb N\), \(d\in \{2,3\}\), \(r>\frac{d}{2}+2k\), \({\mathcal {T}}>0\), and denote \(\Omega = [0,{\mathcal {T}}]\times {\mathcal {D}}.\) Suppose that \(V_0\in H_e^r({\mathcal {D}})\), \(T_0\in H_o^r({\mathcal {D}})\), and \(Q\in C^{k-1}([0,{\mathcal {T}}]; H_o^r({\mathcal {D}}))\). Then system (1.1) has a unique global strong solution (V, T), which depends continuously on the initial data. Moreover, we have

$$\begin{aligned}&(V,T)\in L^\infty (0,{\mathcal {T}}; H^r({\mathcal {D}}))\cap C([0,{\mathcal {T}}];H^{r-1}({\mathcal {D}})), \end{aligned}$$
(3.1a)
$$\begin{aligned}&(\nabla _h V,\nu _z V_z, \nabla _h T,\kappa _z T_z)\in L^2(0,{\mathcal {T}}; H^r({\mathcal {D}}))\cap C([0,{\mathcal {T}}];H^{r-1}({\mathcal {D}})), \end{aligned}$$
(3.1b)
$$\begin{aligned}&(\partial _tV, \partial _t T)\in L^2(0,{\mathcal {T}}; H^{r-1}({\mathcal {D}})), \end{aligned}$$
(3.1c)

and

$$\begin{aligned}&(V,T)\in C^k\left( \Omega \right) ,\quad (w,\nabla p)\in C^{k-1}\left( \Omega \right) . \end{aligned}$$
(3.2)

To prove Theorem 3.4, we shall need the following lemma.

Lemma 3.5

[48, Lemma A.1]; see also [3]. Let \(s\ge 1\), and suppose that \(f, g\in H^s({\mathcal {D}})\). Let \(\alpha \) be a multi-index such that \(|\alpha |\le s\). Then

$$\begin{aligned} \left\| D^\alpha (fg) - f D^\alpha g \right\| _{L^2} \le C_s\left( \Vert f\Vert _{H^s} \Vert g\Vert _{L^\infty } + \Vert \nabla f\Vert _{L^\infty } \Vert g\Vert _{H^{s-1}} \right) . \end{aligned}$$

Proof of Theorem 3.4

We present the proof for \(d=3\); the case \(d=2\) follows similarly. Notice that in our setting \(r>3\). Let us first consider Case 1.

Case 1: full viscosity.

We start by showing that, for arbitrary fixed \({\mathcal {T}}>0\), we have

$$\begin{aligned} (V, T)\in C([0,{\mathcal {T}}]; H^3({\mathcal {D}}))\cap L^2(0,{\mathcal {T}}; H^4({\mathcal {D}})), \quad \text {and} \quad (\partial _tV, \partial _t T)\in L^2(0,{\mathcal {T}}; H^{2}({\mathcal {D}})). \end{aligned}$$

Let \(\alpha \) be a multi-index with \(|\alpha |\le 3\). Taking the \(D^\alpha \) derivative of system (1.1), then taking the inner product of (1.1a) with \(D^\alpha V\) and of (1.1d) with \(D^\alpha T\), and summing over all \(|\alpha |\le 3\), one has

$$\begin{aligned}&\frac{1}{2} \frac{d}{dt} \left( \Vert V\Vert _{H^3}^2 + \Vert T\Vert _{H^3}^2\right) + \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + \nu _z \Vert \partial _z V\Vert _{H^3}^2 + \kappa _h \Vert \nabla _h T\Vert _{H^3}^2 + \kappa _z \Vert \partial _z T\Vert _{H^3}^2\nonumber \\&\quad = \sum \limits _{|\alpha |\le 3} \Big (-\left\langle D^\alpha (V\cdot \nabla _h V+wV_z), D^\alpha V \right\rangle - \left\langle D^\alpha (V\cdot \nabla _h T+wT_z), D^\alpha T \right\rangle \nonumber \\&\qquad - \left\langle D^\alpha \nabla _h p, D^\alpha V \right\rangle + \left\langle D^\alpha Q, D^\alpha T \right\rangle \Big ). \end{aligned}$$
(3.3)

By integration by parts, thanks to (1.2), (1.1b) and (1.1c), using the Cauchy–Schwarz inequality, Young’s inequality and (2.3), one arrives at the following:

$$\begin{aligned} \left\langle D^\alpha \nabla _h p, D^\alpha V \right\rangle= & {} -\left\langle D^\alpha p, D^\alpha \nabla _h \cdot V \right\rangle = \left\langle D^\alpha p, D^\alpha \partial _z w \right\rangle \nonumber \\= & {} -\left\langle D^\alpha \partial _z p, D^\alpha w \right\rangle = \left\langle D^\alpha T, D^\alpha w \right\rangle \nonumber \\\le & {} ~ \Vert T\Vert _{H^3} \Vert w\Vert _{H^3} \le \Vert T\Vert _{H^3} \Vert \nabla _h V\Vert _{H^3} \le \frac{1}{4} \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + C_{\nu _h} \Vert T\Vert _{H^3}^2.\nonumber \\ \end{aligned}$$
(3.4)

By the Cauchy–Schwarz inequality and Young’s inequality, one deduces

$$\begin{aligned} \left\langle D^\alpha Q, D^\alpha T \right\rangle \le \Vert Q\Vert _{H^3} \Vert T\Vert _{H^3} \le \frac{1}{2} \Vert Q\Vert _{H^3}^2 + \frac{1}{2} \Vert T\Vert _{H^3}^2. \end{aligned}$$
(3.5)

Using Lemma 3.5, integration by parts, the boundary conditions and (1.1c), together with the Cauchy–Schwarz inequality, Young’s inequality and the Sobolev inequality, for all \(|\alpha |\le 3\) one has

$$\begin{aligned} \begin{aligned}&\left\langle D^\alpha (V\cdot \nabla _h V+wV_z), D^\alpha V \right\rangle + \left\langle D^\alpha (V\cdot \nabla _h T+wT_z), D^\alpha T \right\rangle \\ =&\left\langle D^\alpha (V\cdot \nabla _h V) - V\cdot D^\alpha \nabla _h V + D^\alpha (wV_z) - w D^\alpha V_z, D^\alpha V \right\rangle \\&+ \left\langle D^\alpha (V\cdot \nabla _h T) - V\cdot D^\alpha \nabla _h T + D^\alpha (wT_z) - w D^\alpha T_z, D^\alpha T \right\rangle \\&+ \underbrace{\left\langle V\cdot D^\alpha \nabla _h V + w D^\alpha V_z, D^\alpha V \right\rangle }_{=0} + \underbrace{\left\langle V\cdot D^\alpha \nabla _h T + w D^\alpha T_z, D^\alpha T \right\rangle }_{=0}\\ \le&C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h V\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h V\Vert _{H^{2}} \\&\quad + \Vert w\Vert _{H^3} \Vert V_z\Vert _{L^\infty } + \Vert \nabla w\Vert _{L^\infty } \Vert V_z\Vert _{H^{2}}\Big )\Vert V\Vert _{H^3}\\&+ C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h T\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h T\Vert _{H^{2}} + \Vert w\Vert _{H^3} \Vert T_z\Vert _{L^\infty } \\&\quad + \Vert \nabla w\Vert _{L^\infty } \Vert T_z\Vert _{H^{2}}\Big ) \Vert T\Vert _{H^3}\\ \le&C \Vert V\Vert _{H^3}^2 \Vert \nabla _h V\Vert _{H^3} + C \Vert V\Vert _{H^3} \Vert T\Vert _{H^3}^2 + C \Vert T\Vert _{H^3}^2 \Vert \nabla _h V\Vert _{H^3}\\ \le&\frac{1}{4} \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + C_{\nu _h} (1+\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 ) (\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 ). \end{aligned} \end{aligned}$$
(3.6)

Note that we have applied Lemma 3.5 in the first inequality. Combining the estimates (3.3)–(3.6), we obtain

$$\begin{aligned} \begin{aligned}&\frac{d}{dt} \left( \Vert V\Vert _{H^3}^2 + \Vert T\Vert _{H^3}^2\right) + \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + \nu _z \Vert \partial _z V\Vert _{H^3}^2 + \kappa _h \Vert \nabla _h T\Vert _{H^3}^2 + \kappa _z \Vert \partial _z T\Vert _{H^3}^2\\&\quad \le C_{\nu _h} (1+\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 )(\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 ) + C \Vert Q\Vert _{H^3}^2. \end{aligned} \end{aligned}$$

From Theorem 3.3 we know that \(V,T\in L^2(0,{\mathcal {T}}; H^3({\mathcal {D}}))\) for arbitrary \({\mathcal {T}}>0\). By the Gronwall inequality, for any \(t\in [0,{\mathcal {T}}]\),

$$\begin{aligned} \begin{aligned}&\Vert V(t)\Vert _{H^3}^2 + \Vert T(t)\Vert _{H^3}^2 + \int _0^t \left( \nu _h \Vert \nabla _h V(\tilde{t})\Vert _{H^3}^2 \right. \\&\left. \quad + \nu _z \Vert \partial _z V(\tilde{t})\Vert _{H^3}^2 + \kappa _h \Vert \nabla _h T(\tilde{t})\Vert _{H^3}^2 + \kappa _z \Vert \partial _z T(\tilde{t})\Vert _{H^3}^2 \right) d\tilde{t}\\&\le \left( \Vert V_0\Vert _{H^3}^2 + \Vert T_0\Vert _{H^3}^2 + C \int _0^{{\mathcal {T}}} \Vert Q(\tilde{t})\Vert _{H^3}^2 d\tilde{t} \right) \\&\quad \exp \left( \int _0^{{\mathcal {T}}} C_{\nu _h} (1+\Vert V(\tilde{t})\Vert _{H^3}^2+ \Vert T(\tilde{t})\Vert _{H^3}^2 ) d\tilde{t} \right) <\infty . \end{aligned} \end{aligned}$$

Therefore, we get

$$\begin{aligned}&(V, T)\in L^\infty (0,{\mathcal {T}}; H^3({\mathcal {D}}))\cap L^2(0,{\mathcal {T}}; H^4({\mathcal {D}})). \end{aligned}$$

Now for any \(|\alpha |\le 2\), taking the \(D^\alpha \) derivative of system (1.1) and then taking the inner product of (1.1a) and (1.1b) with \(\varphi \in \{f\in L^2({\mathcal {D}}): \nabla \cdot f =0\}\), one has

$$\begin{aligned} \begin{aligned} \left\langle D^\alpha \partial _t V, \varphi \right\rangle =&-\left\langle D^\alpha (V\cdot \nabla _h V), \varphi \right\rangle -\left\langle D^\alpha (w \partial _z V), \varphi \right\rangle + \nu _h \left\langle D^\alpha \Delta _h V, \varphi \right\rangle + \nu _z \left\langle D^\alpha \partial _{zz} V, \varphi \right\rangle \\&\quad - f_0 \left\langle D^\alpha V^\perp , \varphi \right\rangle - \left\langle D^\alpha \nabla _h p, \varphi \right\rangle - \left\langle D^\alpha p_z, \varphi \right\rangle - \left\langle D^\alpha T, \varphi \right\rangle . \end{aligned} \end{aligned}$$

By the Cauchy–Schwarz inequality, recalling that \(H^s\) is a Banach algebra when \(s>\frac{d}{2}\), we have

$$\begin{aligned} |\left\langle D^\alpha \partial _t V, \varphi \right\rangle | \le \left( C \Vert V\Vert _{H^3}^2 + C_{\nu _h,\nu _z,f_0} \Vert V\Vert _{H^4} + \Vert T\Vert _{H^2} \right) \Vert \varphi \Vert _{L^2}, \end{aligned}$$

where we have consecutively used integration by parts and \(\nabla \cdot \varphi =0\) to get \( \left\langle D^\alpha \nabla _h p, \varphi \right\rangle + \left\langle D^\alpha p_z, \varphi \right\rangle =0. \) Since the inequality above is true for any \(|\alpha |\le 2\), and the space \(\{f\in L^2({\mathcal {D}}): \nabla \cdot f =0\}\) is dense in \(L^2({\mathcal {D}})\), from the regularity of V and T, one deduces

$$\partial _t V\in L^2(0,{\mathcal {T}}; H^2({\mathcal {D}})).$$

A similar argument yields

$$\partial _t T\in L^2(0,{\mathcal {T}}; H^2({\mathcal {D}})).$$

Applying the Lions–Magenes theorem (see e.g. [79, Chapter 3, Lemma 1.2]), together with the regularity of V, T, \(\partial _t V\), \(\partial _t T\), we obtain

$$\begin{aligned} (V, T)\in C([0,{\mathcal {T}}];H^3({\mathcal {D}})). \end{aligned}$$

This completes the proof of (3.1) for \(r = 3\). The cases \(r = 4\) and higher are then obtained by repeating the same argument. Therefore, under Case 1, we achieve (3.1) and, moreover, \((V, T)\in C([0,{\mathcal {T}}];H^r({\mathcal {D}}))\).

Next, we show (3.2). Taking the horizontal divergence on equation (1.1a), integrating with respect to z from 0 to 1, and taking the vertical derivative on equation (1.1b) gives

$$\begin{aligned} \begin{aligned} \Delta p(\varvec{x}) =&-\int _0^1 \nabla _h \cdot \left( V\cdot \nabla _h V + w\partial _z V -\nu _h \Delta _h V - \nu _z \partial _{zz} V +f_0 V^\perp \right) (\varvec{x}',z) dz - T_z(\varvec{x})\\ =&-\int _0^1 \nabla _h \cdot \left( V\cdot \nabla _h V + w\partial _z V +f_0 V^\perp \right) (\varvec{x}',z) dz - T_z(\varvec{x}), \end{aligned} \end{aligned}$$
(3.7)

where the viscosity terms disappear due to (1.2) and (2.2). We first consider \(k=1\). Since \(r> \frac{d}{2}+2k\), we know \(H^{r-2}\) is a Banach algebra. Since \((V,T)\in C([0,{\mathcal {T}}];H^r({\mathcal {D}}))\), one has \( \Delta p \in C([0,{\mathcal {T}}];H^{r-2}({\mathcal {D}}))\), and thus \(\nabla p \in C([0,{\mathcal {T}}];H^{r-1}({\mathcal {D}}))\). This implies that \(\partial _t V\in C([0,{\mathcal {T}}];H^{r-2}({\mathcal {D}}))\) and therefore, \(V\in C^1([0,{\mathcal {T}}];H^{r-2}({\mathcal {D}}))\) and \(w\in C^1([0,{\mathcal {T}}];H^{r-3}({\mathcal {D}}))\). Moreover, since \(Q\in C^{k-1}([0,{\mathcal {T}}]; H^r({\mathcal {D}}))\), one has \(\partial _t T\in C([0,{\mathcal {T}}];H^{r-2}({\mathcal {D}}))\), consequently \(T\in C^1([0,{\mathcal {T}}];H^{r-2}({\mathcal {D}}))\).

When \(k= 2\), since \(H^{r-4}\) is a Banach algebra, we can take the time derivative on equation (3.7) and get that \(\Delta p_t \in C([0,{\mathcal {T}}];H^{r-4}({\mathcal {D}}))\), and therefore \(\nabla p \in C^1([0,{\mathcal {T}}];H^{r-3}({\mathcal {D}}))\). This implies \(\partial _t V \in C^1([0,{\mathcal {T}}];H^{r-4}({\mathcal {D}}))\) and therefore \(V\in C^2([0,{\mathcal {T}}];H^{r-4}({\mathcal {D}}))\). One can also get \(w\in C^2([0,{\mathcal {T}}];H^{r-5}({\mathcal {D}}))\) and \(T\in C^2([0,{\mathcal {T}}];H^{r-4}({\mathcal {D}})).\) By repeating the above procedure, one will obtain

$$\begin{aligned} \begin{aligned}&(V, T)\in \cap _{l=0}^k C^l([0,{\mathcal {T}}];H^{r-2l}({\mathcal {D}})),\\&w \in \cap _{l=0}^k C^l([0,{\mathcal {T}}];H^{r-2l-1}({\mathcal {D}})),\\&\nabla p \in \cap _{l=0}^{k-1} C^l([0,{\mathcal {T}}];H^{r-2l-1}({\mathcal {D}})). \end{aligned} \end{aligned}$$

Then by the Sobolev embedding theorem and \(r> \frac{d}{2} + 2k\), we know that \(H^{r-2l}({\mathcal {D}}) \subset C^{2k-2l}({\mathcal {D}})\) for \(0\le l \le k\) and \(H^{r-2l-1}({\mathcal {D}}) \subset C^{2k-2l-1}({\mathcal {D}})\) for \(0\le l \le k-1\). Therefore,

$$\begin{aligned} (V,T)\in C^k\left( \Omega \right) ,\quad (w,\nabla p)\in C^{k-1}\left( \Omega \right) . \end{aligned}$$

Case 2: only horizontal viscosity.

Under Case 2, the proof of (3.1) when \(r=3\) is more technically involved. The key difference is in the estimate of the nonlinear term.

When \(D^\alpha = \partial _{z}^3\), integration by parts yields

$$\begin{aligned} \begin{aligned}&\left\langle \partial _{z}^3(V\cdot \nabla _h V+wV_z), \partial _{z}^3 V \right\rangle + \left\langle \partial _{z}^3(V\cdot \nabla _h T+wT_z), \partial _{z}^3 T \right\rangle \\&=\left\langle \partial _{z}^3(V\cdot \nabla _h V) - V\cdot \partial _{z}^3\nabla _h V + \partial _{z}^3(wV_z) - w \partial _{z}^3 V_z, \partial _{z}^3 V \right\rangle \\&\quad + \left\langle \partial _{z}^3(V\cdot \nabla _h T) - V\cdot \partial _{z}^3\nabla _h T + \partial _{z}^3(wT_z) - w \partial _{z}^3 T_z, \partial _{z}^3 T \right\rangle \\&\quad + \underbrace{\left\langle V\cdot \partial _{z}^3\nabla _h V + w \partial _{z}^3 V_z, \partial _{z}^3 V \right\rangle }_{=0} + \underbrace{\left\langle V\cdot \partial _{z}^3\nabla _h T + w \partial _{z}^3 T_z, \partial _{z}^3 T \right\rangle }_{=0}\\&= \left\langle \partial _{z}^3(V\cdot \nabla _h V) - V\cdot \partial _{z}^3\nabla _h V, \partial _{z}^3 V \right\rangle + \left\langle \partial _{z}^3(V\cdot \nabla _h T) - V\cdot \partial _{z}^3\nabla _h T, \partial _{z}^3 T \right\rangle \\&\quad + \left\langle \partial _{z}^3(wV_z) - w \partial _{z}^3 V_z, \partial _{z}^3 V \right\rangle + \left\langle \partial _{z}^3(wT_z) - w \partial _{z}^3 T_z, \partial _{z}^3 T \right\rangle \\: =&I_1 + I_2 + I_3 + I_4. \end{aligned} \end{aligned}$$
(3.8)

Using Lemma 3.5, together with the Cauchy-Schwarz inequality, Young’s inequality, and the Sobolev inequality, one obtains

$$\begin{aligned} \begin{aligned} I_1 + I_2&= \left\langle \partial _{z}^3(V\cdot \nabla _h V) - V\cdot \partial _{z}^3\nabla _h V, \partial _{z}^3 V \right\rangle + \left\langle \partial _{z}^3(V\cdot \nabla _h T) - V\cdot \partial _{z}^3\nabla _h T, \partial _{z}^3 T \right\rangle \\&\le C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h V\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h V\Vert _{H^{2}} \Big )\Vert V\Vert _{H^3}\\&\qquad + C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h T\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h T\Vert _{H^{2}} \Big ) \Vert T\Vert _{H^3}\\&\le C \left( \Vert \nabla _h V\Vert _{H^{2}} + \Vert \nabla _h T\Vert _{H^{2}}\right) \left( \Vert V\Vert _{H^3}^2 + \Vert T\Vert _{H^3}^2\right) . \end{aligned} \end{aligned}$$
(3.9)

For the estimates of \(I_3\), we use (1.1c), Young’s inequality, the Hölder inequality and the Sobolev inequality, to achieve

$$\begin{aligned} \begin{aligned} I_3 =&\left\langle \partial _{z}^3(wV_z) - w \partial _{z}^3 V_z, \partial _{z}^3 V \right\rangle \\&\le ~ C \left| \left\langle \underbrace{(\nabla _h \cdot V)}_{L^\infty } \underbrace{\partial _{z}^3 V}_{L^2}, \underbrace{\partial _{z}^3 V}_{L^2} \right\rangle \right| + C \left| \left\langle \underbrace{(\nabla _h \cdot V_z)}_{L^6} \underbrace{\partial _{z}^2 V}_{L^3}, \underbrace{\partial _{z}^3 V}_{L^2} \right\rangle \right| \\&\quad + C \left| \left\langle \underbrace{(\nabla _h \cdot \partial _{z}^2 V)}_{L^2} \underbrace{V_z}_{L^\infty }, \underbrace{\partial _{z}^3 V}_{L^2} \right\rangle \right| \\ \le&C\Vert \nabla _h V\Vert _{H^2} \Vert V\Vert _{H^3}^2. \end{aligned} \end{aligned}$$
(3.10)

Similarly, one can deduce that

$$\begin{aligned} I_4 =&\left\langle \partial _{z}^3(wT_z) - w \partial _{z}^3 T_z, \partial _{z}^3 T \right\rangle \le C\Vert \nabla _h V\Vert _{H^2} \Vert T\Vert _{H^3}^2. \end{aligned}$$
(3.11)

Combining (3.8)–(3.11) yields

$$\begin{aligned}{} & {} \left\langle \partial _{z}^3(V\cdot \nabla _h V+wV_z), \partial _{z}^3 V \right\rangle + \left\langle \partial _{z}^3(V\cdot \nabla _h T+wT_z), \partial _{z}^3 T \right\rangle \nonumber \\{} & {} \quad \le C \left( \Vert \nabla _h V\Vert _{H^{2}} + \Vert \nabla _h T\Vert _{H^{2}}\right) \left( \Vert V\Vert _{H^3}^2 + \Vert T\Vert _{H^3}^2\right) . \end{aligned}$$
(3.12)

When \(|\alpha |\le 3\) and \(\alpha \ne (0,0,3)\), one has \( \Vert D^\alpha V\Vert _{L^2} \le \Vert \nabla _h V\Vert _{H^2} + \Vert V\Vert _{H^2}\) and \(\Vert D^\alpha T\Vert _{L^2} \le \Vert \nabla _h T\Vert _{H^2} + \Vert T\Vert _{H^2}. \) Therefore, an estimate replacing (3.6) for Case 2 is

$$\begin{aligned} \begin{aligned}&\left\langle D^\alpha (V\cdot \nabla _h V+wV_z), D^\alpha V \right\rangle + \left\langle D^\alpha (V\cdot \nabla _h T+wT_z), D^\alpha T \right\rangle \\ \le&C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h V\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h V\Vert _{H^{2}} + \Vert w\Vert _{H^3} \Vert V_z\Vert _{L^\infty } \\&\quad + \Vert \nabla w\Vert _{L^\infty } \Vert V_z\Vert _{H^{2}}\Big )(\Vert \nabla _h V\Vert _{H^2} + \Vert V\Vert _{H^2})\\&+ C\Big (\Vert V\Vert _{H^3} \Vert \nabla _h T\Vert _{L^\infty } + \Vert \nabla V\Vert _{L^\infty } \Vert \nabla _h T\Vert _{H^{2}} + \Vert w\Vert _{H^3} \Vert T_z\Vert _{L^\infty } \\&\quad + \Vert \nabla w\Vert _{L^\infty } \Vert T_z\Vert _{H^{2}}\Big ) (\Vert \nabla _h T\Vert _{H^2} + \Vert T\Vert _{H^2})\\ \le&C \Vert V\Vert _{H^3}^2 \Vert \nabla _h V\Vert _{H^2} + C\Vert \nabla _h V\Vert _{H^3}\Vert V\Vert _{H^3}(\Vert \nabla _h V\Vert _{H^2} + \Vert V\Vert _{H^2})\\&\qquad \qquad + C \Vert V\Vert _{H^3} \Vert T\Vert _{H^3} \Vert \nabla _h T\Vert _{H^2} + C\Vert \nabla _h V\Vert _{H^3}\Vert T\Vert _{H^3}(\Vert \nabla _h T\Vert _{H^2} + \Vert T\Vert _{H^2})\\ \le&\frac{1}{4} \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + C_{\nu _h} (1+\Vert V\Vert _{H^2}^2+ \Vert T\Vert _{H^2}^2+\Vert \nabla _h V\Vert _{H^2}^2\\&\quad + \Vert \nabla _h T\Vert _{H^2}^2 ) (\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 ). \end{aligned}\nonumber \\ \end{aligned}$$
(3.13)

Combining (3.12), (3.13), (3.3)–(3.5) yields

$$\begin{aligned} \begin{aligned}&\frac{d}{dt} \left( \Vert V\Vert _{H^3}^2 + \Vert T\Vert _{H^3}^2\right) + \nu _h \Vert \nabla _h V\Vert _{H^3}^2 + \kappa _h \Vert \nabla _h T\Vert _{H^3}^2\\&\qquad \le C_{\nu _h} (1+\Vert V\Vert _{H^2}^2+ \Vert T\Vert _{H^2}^2+\Vert \nabla _h V\Vert _{H^2}^2+ \Vert \nabla _h T\Vert _{H^2}^2 )(\Vert V\Vert _{H^3}^2+ \Vert T\Vert _{H^3}^2 ) + C \Vert Q\Vert _{H^3}^2. \end{aligned} \end{aligned}$$

From Theorem 3.3, we know that \((V,T)\in L^\infty (0,{\mathcal {T}}; H^2({\mathcal {D}}))\) and \((\nabla _h V, \nabla _h T) \in L^2(0,{\mathcal {T}}; H^2({\mathcal {D}}))\) for arbitrary \({\mathcal {T}}>0\). Thanks to the Gronwall inequality, for any \(t\in [0,{\mathcal {T}}]\),

$$\begin{aligned}&\Vert V(t)\Vert _{H^3}^2 + \Vert T(t)\Vert _{H^3}^2 + \int _0^t \left( \nu _h \Vert \nabla _h V(\tilde{t})\Vert _{H^3}^2 + \kappa _h \Vert \nabla _h T(\tilde{t})\Vert _{H^3}^2 \right) d\tilde{t}\\&\le \left( \Vert V_0\Vert _{H^3}^2 + \Vert T_0\Vert _{H^3}^2 + C \int _0^{{\mathcal {T}}} \Vert Q(\tilde{t})\Vert _{H^3}^2 d\tilde{t} \right) \\&\qquad \times \exp \left( \int _0^{{\mathcal {T}}} C_{\nu _h} (1+\Vert V(\tilde{t})\Vert _{H^2}^2+ \Vert T(\tilde{t})\Vert _{H^2}^2 + \Vert \nabla _h V(\tilde{t})\Vert _{H^2}^2+ \Vert \nabla _h T(\tilde{t})\Vert _{H^2}^2) d\tilde{t} \right) <\infty . \end{aligned}$$

Therefore, we obtain

$$\begin{aligned}&(V, T)\in L^\infty (0,{\mathcal {T}}; H^3({\mathcal {D}})), \qquad (\nabla _h V, \nabla _h T) \in L^2(0,{\mathcal {T}}; H^3({\mathcal {D}})). \end{aligned}$$

Now, following a similar argument as in Case 1 and using the Lions–Magenes theorem, we obtain

$$\begin{aligned} (V,T, \nabla _hV, \nabla _h T)\in C([0,{\mathcal {T}}];H^{2}({\mathcal {D}})), \qquad (\partial _t V, \partial _t T)\in L^2(0,{\mathcal {T}}; H^{2}({\mathcal {D}})). \end{aligned}$$

Notice that we cannot achieve \((V,T) \in C([0,{\mathcal {T}}];H^{3}({\mathcal {D}}))\) since \((V,T) \not \in L^2(0,{\mathcal {T}}; H^{4}({\mathcal {D}}))\). The proof of (3.1) for \(r=4\) follows the same argument as in Case 1. By repeating this procedure, one eventually obtains (3.1) for Case 2.

Finally, as \((V,\nabla _h V, T, \nabla _h T)\in C([0,{\mathcal {T}}];H^{r-1}({\mathcal {D}}))\), one can repeat the argument from Case 1 and get

$$\begin{aligned} \begin{aligned}&(V, T)\in C([0,{\mathcal {T}}];H^{r-1}({\mathcal {D}})) \cap _{l=1}^k C^l([0,{\mathcal {T}}];H^{r-2l}({\mathcal {D}})),\\&w \in \cap _{l=0}^k C^l([0,{\mathcal {T}}];H^{r-2l-1}({\mathcal {D}})),\\&\nabla p \in \cap _{l=0}^{k-1} C^l([0,{\mathcal {T}}];H^{r-2l-1}({\mathcal {D}})). \end{aligned} \end{aligned}$$

This concludes the proof of (3.2).

4 Error estimates for PINNs

4.1 Generalization error estimates

In this section, we answer the question SubQ1 raised in the introduction: given \(\varepsilon > 0\), does there exist a neural network \((V_\theta ,w_\theta ,p_\theta ,T_\theta )\) such that the corresponding generalization error \(\mathcal {E}_G[s;\theta ]\) defined in (2.7) satisfies \(\mathcal {E}_G[s;\theta ]< \varepsilon \)? We have the following result.

Theorem 4.1

(Answer of SubQ1) Let \(d\in \{2,3\}\), \(n\ge 2\), \(k\ge 5\), \(r> \frac{d}{2} + 2k\), and \({\mathcal {T}}>0\). Suppose that \(V_0\in H_e^r({\mathcal {D}})\), \(T_0\in H_o^r({\mathcal {D}})\), and \(Q\in C^{k-1}([0,{\mathcal {T}}]; H_o^r({\mathcal {D}}))\). Then for any given \(\varepsilon >0\) and for any \(0\le \ell \le k-5\), there exist \(\lambda = \mathcal O(\varepsilon ^2)\) small enough and \(N>5\) large enough depending on \(\varepsilon \) and \(\ell \), and tanh neural networks \(\widehat{V}_i:= ( V_i^N)_\theta \), \(\widehat{w}:= w^N_\theta \), \(\widehat{p}:= p^N_\theta \), and \(\widehat{T}:= T^N_\theta \), with \(i=1,...,d-1\), each with two hidden layers, of widths at most and , such that the generalization error satisfies

$$\begin{aligned} \mathcal {E}_G[\ell ;\theta ] \le \varepsilon . \end{aligned}$$

Proof

For simplicity, we treat the case of \(d=3\). The proof of \(d=2\) follows in a similar way.

From Theorem 3.4, we know that system (1.1) has a unique global strong solution (V, T) and

$$\begin{aligned}&(V,T)\in C^k\left( \Omega \right) \subset H^{k}\left( \Omega \right) ,\\&(w,p,\nabla p)\in C^{k-1}\left( \Omega \right) \subset H^{k-1}\left( \Omega \right) . \end{aligned}$$

By applying Lemma A.1, for fixed \(N>5\), there exist tanh neural networks \(\widehat{V}_1\), \(\widehat{V}_2\), \(\widehat{w}\), \(\widehat{p}\), and \(\widehat{T}\), each with two hidden layers, of widths at most and such that for every \(1\le s \le k-1\), one has

$$\begin{aligned}&\Vert V_i - \widehat{V}_i\Vert _{H^{s}(\Omega )} \le C_{V} \frac{1+\ln ^{s}N}{N^{k-s}},\qquad \Vert w - \widehat{w}\Vert _{H^{s-1}(\Omega )} \le C_{w} \frac{1+\ln ^{s-1}N}{N^{k-s}},\nonumber \\&\Vert p - \widehat{p}\Vert _{H^{s-1}(\Omega )} \le C_{p} \frac{1+\ln ^{s-1}N}{N^{k-s}},\qquad \Vert T - \widehat{T}\Vert _{H^{s}(\Omega )} \le C_{T} \frac{1+\ln ^{s}N}{N^{k-s}}, \end{aligned}$$
(4.1)

where the constants \(C_{V}, C_{w}, C_{p}\) and \(C_{T}\) are defined according to Lemma A.1. Denote by \(\widehat{V} = (\widehat{V}_1,\widehat{V}_2)\) and \((\partial _1, \partial _2)=(\partial _x, \partial _y)\). For \(0\le \ell \le k-5\) and \(i=1,2\), one has

$$\begin{aligned}&\Vert \partial _t V_i - \partial _t \widehat{V}_i\Vert _{H^\ell (\Omega )} \le \Vert V_i - \widehat{V}_i\Vert _{H^{\ell +1}(\Omega )} \le C_{V} \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}}, \nonumber \\&\Vert \partial _i p - \partial _i \widehat{p}\Vert _{H^{\ell }(\Omega )}, \, \Vert \partial _z p - \partial _z \widehat{p}\Vert _{H^{\ell }(\Omega )} \le \Vert p - \widehat{p}\Vert _{H^{\ell +1}(\Omega )} \le C_{p} \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -2}}, \nonumber \\&\Vert \Delta _h V_i - \Delta _h \widehat{V}_i\Vert _{H^\ell (\Omega )}, \, \Vert \partial _{zz} V_i - \partial _{zz} \widehat{V}_i\Vert _{H^\ell (\Omega )} \le \Vert V_i - \widehat{V}_i\Vert _{H^{\ell +2}(\Omega )} \le C_{V} \frac{1+\ln ^{\ell +2}N}{N^{k-\ell -2}}, \nonumber \\&\Vert \nabla _h \cdot V+ \partial _z w - (\nabla _h \cdot \widehat{V} + \partial _z \widehat{w})\Vert _{H^{\ell }(\Omega )} \le \sum \limits _{i=1}^2\Vert V_i - \widehat{V}_i\Vert _{H^{\ell +1}(\Omega )} + \Vert w - \widehat{w}\Vert _{H^{\ell +1}(\Omega )} \nonumber \\&\hspace{150pt}\le (C_{V} +C_w) \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -2}}, \nonumber \\&\Vert \partial _t T - \partial _t \widehat{T}\Vert _{H^\ell (\Omega )} \le \Vert T - \widehat{T}\Vert _{H^{\ell +1}(\Omega )} \le C_{T} \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}},\nonumber \\&\Vert \Delta _h T - \Delta _h \widehat{T}\Vert _{H^\ell (\Omega )}, \, \Vert \partial _{zz} T - \partial _{zz} \widehat{T}\Vert _{H^\ell (\Omega )} \le \Vert T - \widehat{T}\Vert _{H^{\ell +2}(\Omega )} \le C_{T} \frac{1+\ln ^{\ell +2}N}{N^{k-\ell -2}}. \end{aligned}$$
(4.2)

For the nonlinear terms, we use the Sobolev inequality and (4.1); since \(0\le \ell \le k-5\), one has

$$\begin{aligned}&\Vert V\cdot \nabla _h V_i - \widehat{V} \cdot \nabla _h \widehat{V}_i\Vert _{H^\ell (\Omega )} \nonumber \\&\le \Vert (V- \widehat{V})\cdot \nabla _h V_i\Vert _{H^\ell (\Omega )} + \Vert \widehat{V} \cdot \nabla _h (V_i-\widehat{V}_i)\Vert _{H^\ell (\Omega )}\nonumber \\&\le C \Vert V_i\Vert _{W^{k-4,\infty }(\Omega )} \max _i\Vert V_i - \widehat{V}_i\Vert _{H^\ell (\Omega )} + C \Vert \widehat{V}\Vert _{W^{k-5,\infty }(\Omega )} \Vert V_i - \widehat{V}_i\Vert _{H^{\ell +1}(\Omega )}\nonumber \\&\le C\left( \Vert V\Vert _{W^{k-4,\infty }(\Omega )}+ \Vert \widehat{V}\Vert _{W^{k-5,\infty }(\Omega )}\right) C_{V} \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}}\nonumber \\&\le C(\Vert V\Vert _{H^{k-2}(\Omega )} + C_V \frac{1+\ln ^{k-3}N}{N^3})C_V\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}},\nonumber \\&\Vert w\partial _z V_i - \widehat{w} \partial _z \widehat{V}_i\Vert _{H^\ell (\Omega )} \le \Vert (w - \widehat{w})\partial _z V_i\Vert _{H^\ell (\Omega )} + \Vert \widehat{w} \partial _z (V_i-\widehat{V}_i)\Vert _{H^\ell (\Omega )}\nonumber \\&\le C \Vert V_i\Vert _{W^{k-4,\infty }(\Omega )} \Vert w - \widehat{w}\Vert _{H^\ell (\Omega )} + C \Vert \widehat{w}\Vert _{W^{k-5,\infty }(\Omega )} \Vert V_i - \widehat{V}_i\Vert _{H^{\ell +1}(\Omega )}\nonumber \\&\le C\left( \Vert V\Vert _{W^{k-4,\infty }(\Omega )}+\Vert \hat{w}\Vert _{W^{k-5,\infty }(\Omega )} \right) (C_w + C_{V})\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}}\nonumber \\&\le C\left( \Vert V\Vert _{H^{k-2}(\Omega )}+\Vert w\Vert _{H^{k-3}(\Omega )} + C_w\frac{1+\ln ^{k-3}N}{N^{2}} \right) (C_w + C_{V})\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}}. \end{aligned}$$
(4.3)

Here we have used (4.1) to obtain

$$\begin{aligned} C\Vert \widehat{V}\Vert _{W^{k-5,\infty }(\Omega )}&\le C\Vert \widehat{V}\Vert _{H^{k-3}(\Omega )} \le C(\Vert V - \widehat{V}\Vert _{H^{k-3}(\Omega )} + \Vert V\Vert _{H^{k-3}(\Omega )}) \\&\le C\left( \Vert V\Vert _{H^{k-2}(\Omega )} + C_{V} \frac{1+\ln ^{k-3} N}{N^3}\right) ,\\ C\Vert \widehat{w}\Vert _{W^{k-5,\infty }(\Omega )}&\le C\Vert \widehat{w}\Vert _{H^{k-3}(\Omega )}\le C(\Vert w - \widehat{w}\Vert _{H^{k-3}(\Omega )} + \Vert w\Vert _{H^{k-3}(\Omega )}) \\&\le C\left( \Vert w\Vert _{H^{k-3}(\Omega )} + C_{w} \frac{1+\ln ^{k-3} N}{N^2}\right) . \end{aligned}$$

A calculation similar to (4.3) yields

$$\begin{aligned} \begin{aligned}&\Vert V\cdot \nabla _h T - \widehat{V} \cdot \nabla _h \widehat{T}\Vert _{H^\ell (\Omega )}\\ \le&C\left( \Vert T\Vert _{H^{k-2}(\Omega )} + \Vert V\Vert _{H^{k-2}(\Omega )}+ C_V \frac{1+\ln ^{k-3}N}{N^3}\right) (C_V+C_T)\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}},\\&\Vert w\partial _z T - \widehat{w} \partial _z \widehat{T}\Vert _{H^\ell (\Omega )}\\ \le&C\left( \Vert T\Vert _{H^{k-2}(\Omega )} + \Vert w\Vert _{H^{k-3}(\Omega )}+ C_w \frac{1+\ln ^{k-3} N}{N^2}\right) (C_w + C_{T})\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -1}}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.4)

Combining (4.2)–(4.4) gives

$$\begin{aligned}&\Big \Vert \partial _t \widehat{V} + \widehat{V} \cdot \nabla _h \widehat{V} + \widehat{w} \partial _z \widehat{V} + \nabla _h \widehat{p} + f_0 \widehat{V}^\perp - \nu _h \Delta _h \widehat{V} - \nu _z \partial _{zz} \widehat{V}\Big \Vert _{H^\ell (\Omega )} \nonumber \\&=\Big \Vert \left( \partial _t \widehat{V} + \widehat{V} \cdot \nabla _h \widehat{V} + \widehat{w} \partial _z \widehat{V} + \nabla _h \widehat{p} + f_0 \widehat{V}^\perp - \nu _h \Delta _h \widehat{V} - \nu _z \partial _{zz} \widehat{V}\right) \nonumber \\&\quad - \underbrace{\left( \partial _t V+ V \cdot \nabla _h V + w \partial _z V + \nabla _h p + f_0 V^\perp - \nu _h \Delta _h V - \nu _z \partial _{zz} V\right) }_{=0}\Big \Vert _{H^\ell (\Omega )}\nonumber \\&\le \Vert \partial _t \widehat{V} - \partial _t V\Vert _{H^\ell (\Omega )} \nonumber \\&\quad + \Vert \widehat{V} \cdot \nabla _h \widehat{V} - V \cdot \nabla _h V\Vert _{H^\ell (\Omega )} + \Vert \widehat{w} \partial _z \widehat{V} - w \partial _z V\Vert _{H^\ell (\Omega )} + f_0 \Vert \widehat{V} - V\Vert _{H^\ell (\Omega )}\nonumber \\&\quad + \Vert \nabla _h \widehat{p} - \nabla _h p\Vert _{H^\ell (\Omega )} + \nu _h \Vert \Delta _h \widehat{V} - \Delta _h V\Vert _{H^\ell (\Omega )} + \nu _z \Vert \partial _{zz} \widehat{V} - \partial _{zz} V\Vert _{H^\ell (\Omega )}\nonumber \\&\le C_{\nu _h,\nu _z,C_{V}, C_{p}, C_{w} , C_{T}}\left( 1+ \Vert V\Vert _{C^k(\Omega )} + \Vert T\Vert _{C^k(\Omega )} + \Vert w\Vert _{C^{k-1}(\Omega )} + \frac{\ln ^{k-3}N}{N^2}\right) \nonumber \\&\quad \frac{1+\ln ^{\ell +2}N}{N^{k-\ell -2}}. \end{aligned}$$
(4.5)

Similarly to (4.5), one can calculate that

$$\begin{aligned} \begin{aligned}&\Vert \partial _z \widehat{p} + \widehat{T}\Vert _{H^\ell (\Omega )} \le \Vert \partial _z \widehat{p} - \partial _z p\Vert _{H^\ell (\Omega )} + \Vert T-\widehat{T}\Vert _{H^\ell (\Omega )} \le (C_p+ C_T)\frac{1+\ln ^{\ell +1}N}{N^{k-\ell -2}},\\&\Vert \nabla _h \cdot \widehat{V} + \partial _z \widehat{w}\Vert _{H^\ell (\Omega )} \le \Vert V-\widehat{V}\Vert _{H^{\ell +1}(\Omega )} + \Vert w-\widehat{w}\Vert _{H^{\ell +1}(\Omega )} \le (C_{V} + C_w) \frac{1+\ln ^{\ell +1}N}{N^{k-\ell -2}}, \end{aligned}\nonumber \\ \end{aligned}$$
(4.6)

and

$$\begin{aligned} \begin{aligned} \Big \Vert \partial _t \widehat{T}&+ \widehat{V} \cdot \nabla _h \widehat{T} + \widehat{w} \partial _z \widehat{T} - \kappa _h \Delta _h \widehat{T} - \kappa _z \partial _{zz} \widehat{T} - Q\Big \Vert _{H^\ell (\Omega )}\\ \le&\Vert \partial _t \widehat{T} - \partial _t T\Vert _{H^\ell (\Omega )} + \Vert \widehat{V} \cdot \nabla _h \widehat{T} - V \cdot \nabla _h T\Vert _{H^\ell (\Omega )} + \Vert \widehat{w} \partial _z \widehat{T} - w \partial _z T\Vert _{H^\ell (\Omega )}\\&+ \kappa _h \Vert \Delta _h \widehat{T} - \Delta _h T\Vert _{H^\ell (\Omega )} + \kappa _z \Vert \partial _{zz} \widehat{T} - \partial _{zz} T\Vert _{H^\ell (\Omega )}\\ \le&C_{\kappa _h,\kappa _z,C_{V}, C_{w}, C_{T}}\left( 1+ \Vert V\Vert _{C^k(\Omega )} + \Vert T\Vert _{C^k(\Omega )} + \Vert w\Vert _{C^{k-1}(\Omega )} + \frac{\ln ^{k-3}N}{N^2}\right) \frac{1+\ln ^{\ell +2}N}{N^{k-\ell -2}}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.7)

The combination of (4.5)–(4.7) implies that

$$\begin{aligned} \mathcal {E}_G^i[\ell ;\theta ]&= \Bigg [\int _0^{{\mathcal {T}}}\Big (\Vert \mathcal R_{i,V}[\theta ]\Vert _{H^{\ell }({\mathcal {D}})}^2 + \Vert \mathcal R_{i,P}[\theta ]\Vert _{H^{\ell }({\mathcal {D}})}^2\nonumber \\&\quad +\Vert \mathcal R_{i,T}[\theta ]\Vert _{H^{\ell }({\mathcal {D}})}^2+\Vert \mathcal R_{i,div}[\theta ]\Vert _{H^{\ell }({\mathcal {D}})}^2\Big )dt\Bigg ]^{\frac{1}{2}}\nonumber \\&\le C_{\nu _h,\nu _z,\kappa _h,\kappa _z,{\mathcal {T}},C_{V}, C_{p}, C_{w} , C_{T}}\nonumber \\&\quad \left( 1+ \Vert V\Vert _{C^k(\Omega )} + \Vert T\Vert _{C^k(\Omega )} + \Vert w\Vert _{C^{k-1}(\Omega )} + \frac{\ln ^{k-3}N}{N^2}\right) \frac{1+\ln ^{\ell +2}N}{N^{k-\ell -2}}. \end{aligned}$$
(4.8)

Next we derive the estimate for \(\mathcal {E}_G^t[\ell ; \theta ]\). Note that \((V,T, \widehat{V}, \widehat{T})\in C^k\left( \Omega \right) \) and \((w,p, \widehat{w}, \widehat{p})\in C^{k-1}\left( \Omega \right) \). By applying the trace theorem and the fact that \(\Omega \) is a Lipschitz domain, for \(0\le \ell \le k-3\) and \(|\alpha |\le \ell \), we have for \(\varphi \in \{V_1, V_2, w, T, p\}\):

$$\begin{aligned} \begin{aligned}&\Vert D^\alpha \varphi - D^\alpha \widehat{\varphi }\Vert _{L^{\infty }(\partial \Omega )}\le C\Vert D^\alpha \varphi - D^\alpha \widehat{\varphi }\Vert _{W^{1,\infty }(\Omega )} \le C\Vert \varphi - \widehat{\varphi }\Vert _{W^{\ell +1,\infty }(\Omega )}. \end{aligned} \end{aligned}$$

Therefore, for \(0\le \ell \le k-5\), one has

$$\begin{aligned} \mathcal {E}_G^t[\ell ;\theta ]&\le \sum \limits _{i=1}^2\Vert (V_0)_i - \widehat{V}_i(t=0)\Vert _{H^\ell ({\mathcal {D}})} + \Vert T_0 - \widehat{T}(t=0)\Vert _{H^\ell ({\mathcal {D}})}\nonumber \\&\le C\sum \limits _{|\alpha |\le \ell }\left( \sum \limits _{i=1}^2\Vert D^\alpha V_i - D^\alpha \widehat{V}_i\Vert _{L^{\infty }(\partial \Omega )} + \Vert D^\alpha T - D^\alpha \widehat{T}\Vert _{L^{\infty }(\partial \Omega )}\right) \nonumber \\&\le C\left( \sum \limits _{i=1}^2\Vert V_i - \widehat{V}_i\Vert _{H^{\ell +3}( \Omega )} + \Vert T - \widehat{T}\Vert _{H^{\ell +3}( \Omega )}\right) \le (C_{V}+C_{T}) \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -3}}. \end{aligned}$$
(4.9)

Regarding the estimate for \(\mathcal {E}_G^b[\ell ; \theta ]\), for \(|\alpha |\le \ell \), thanks to the boundary condition (1.2),

$$\begin{aligned}&\Big |D^\alpha \widehat{V}_i(x=1,y,z)- D^\alpha \widehat{V}_i(x=0,y,z)\Big | \\ \le&\Big |D^\alpha \widehat{V}_i(x=1,y,z)- D^\alpha V_i(x=1,y,z)\Big | + \Big |D^\alpha \widehat{V}_i(x=0,y,z)- D^\alpha V_i(x=0,y,z)\Big |\\&+ \underbrace{\Big |D^\alpha V_i(x=1,y,z)- D^\alpha V_i(x=0,y,z)\Big |}_{=0}, \end{aligned}$$

and therefore,

$$\begin{aligned}&\left( \int _0^{{\mathcal {T}}} \Big |D^\alpha \widehat{V}_i(x=1,y,z)- D^\alpha \widehat{V}_i(x=0,y,z)\Big |^2 dt \right) ^{\frac{1}{2}} \\&\le C\Vert D^\alpha V_i -D^\alpha \widehat{V}_i\Vert _{L^2(0,{\mathcal {T}}; L^{\infty }(\partial {\mathcal {D}}))}\\&\le C_{{\mathcal {T}}}\Vert V_i - \widehat{V}_i\Vert _{H^{\ell +3}(\Omega )} \le C_{V} \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -3}}. \end{aligned}$$

One can bound the other terms similarly, and finally get

$$\begin{aligned} \begin{aligned} \mathcal {E}_G^b[\ell ;\theta ]&= \left( \int _0^{{\mathcal {T}}}\Vert \mathcal R_{b}[\ell ; \theta ]\Vert ^2_{L^2(\partial {\mathcal {D}})} dt\right) ^{\frac{1}{2}} \le (C_{V}+ C_w + C_p + C_{T}) \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -4}}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.10)

Finally, we recall that

$$\begin{aligned} \mathcal {E}_G^p[\ell ;\theta ]^2 = \int _0^{{\mathcal {T}}}\left( \Vert \widehat{V}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 + \Vert \widehat{p}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 + \Vert \widehat{w}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 + \Vert \widehat{T}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 \right) dt. \end{aligned}$$

Thanks to (4.1) we compute, for example,

$$\begin{aligned} \int _0^{{\mathcal {T}}} \Vert \widehat{w}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 dt&\le C \int _0^{{\mathcal {T}}} (\Vert w\Vert _{H^{\ell + 3}({\mathcal {D}})}^2 + \Vert w-\widehat{w}\Vert _{H^{\ell + 3}({\mathcal {D}})}^2) dt \\&\le C_{{\mathcal {T}}}\left( \Vert w\Vert _{C^{\ell + 3}(\Omega )}^2 + C_{w} \left( \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -4}}\right) ^2\right) . \end{aligned}$$

As the true solution \((V,p,w,T)\) exists globally and

$$\begin{aligned} (V,T)\in C^k\left( \Omega \right) , \quad (w,p)\in C^{k-1}\left( \Omega \right) , \end{aligned}$$

for any given \({\mathcal {T}}>0\) and \(m\le k\), there exists some constant \(C_{{\mathcal {T}}, m}\) depending only on the time \({\mathcal {T}}\) and m such that

$$\begin{aligned} \Vert V\Vert _{C^m(\Omega )}^2 + \Vert w\Vert _{C^{m-1}(\Omega )}^2 + \Vert p\Vert _{C^{m-1}(\Omega )}^2 + \Vert T\Vert _{C^m(\Omega )}^2 \le C_{{\mathcal {T}}, m}. \end{aligned}$$
(4.11)

Indeed, by [45, 46] one has that \(C_{{\mathcal {T}}, m} = C_m\) is independent of the time \({\mathcal {T}}\) for Case 1, while the analogous result for Case 2 remains open. Since \(\ell +3\le k-2\), (4.11) implies that

$$\begin{aligned} \mathcal {E}_G^p[\ell ;\theta ]^2 \le C_{{\mathcal {T}}, k-1} + C_{{\mathcal {T}}} (C_V+C_p+C_w+C_T) \left( \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -4}}\right) ^2. \end{aligned}$$

Now by taking \(\lambda \le \frac{\varepsilon ^2}{2C_{{\mathcal {T}}, k-1}}\), we obtain that

$$\begin{aligned} \lambda \mathcal {E}_G^p[\ell ;\theta ]^2 \le \frac{\varepsilon ^2}{2} + \lambda C_{{\mathcal {T}}} (C_V+C_p+C_w+C_T) \left( \frac{1+\ln ^{\ell +3}N}{N^{k-\ell -4}}\right) ^2. \end{aligned}$$
(4.12)

Noticing that the bounds (4.8), (4.9) and (4.10) and the second part of (4.12) are independent of the neural networks’ parameterized solution \((\widehat{V}_i, \widehat{w}, \widehat{p}, \widehat{T})\), one can pick N large enough such that each of the bounds (4.8), (4.9) and (4.10) is at most \(\frac{\varepsilon }{4}\) and the second part of (4.12) is at most \(\frac{\varepsilon ^2}{4}\), to eventually obtain that

$$\begin{aligned} \mathcal {E}_G[\ell ;\theta ] \le \varepsilon . \end{aligned}$$

4.2 Total error estimates

This section is dedicated to answering SubQ2 raised in the introduction: given PINNs \((\widehat{V}, \widehat{w}, \widehat{p}, \widehat{T}):=( V_\theta , w_\theta , p_\theta , T_\theta )\) with a small generalization error \(\mathcal {E}_G[s;\theta ]\), is the corresponding total error \({\mathcal {E}}[s;\theta ]\) also small?

In the sequel, we only consider Case 2: \(\nu _z=0\) and \(\kappa _z=0\). The proof of Case 1 is similar and simpler. To simplify the notation, we drop the \(\theta \) dependence in the residuals, for example, writing \(\mathcal R_{i,V}\) instead of \(\mathcal R_{i,V}[\theta ]\), when there is no confusion.

Theorem 4.2

(Answer of SubQ2) Let \(d\in \{2,3\}\), \({\mathcal {T}}>0\), \(k\ge 1\) and \(r=k+2\). Suppose that \(V_0\in H_e^r({\mathcal {D}})\), \(T_0\in H_o^r({\mathcal {D}})\), and \(Q\in C^{k-1}([0,{\mathcal {T}}]; H_o^r({\mathcal {D}}))\), and let \((V,w,p,T)\) be the unique strong solution to system (1.1). Let \((\widehat{V}, \widehat{w}, \widehat{p}, \widehat{T})\) be tanh neural networks that approximate \((V,w,p,T)\). Denote by \((V^*, w^*, p^*, T^*)= (V-\widehat{V}, w-\widehat{w}, p-\widehat{p}, T-\widehat{T})\). Then for \(0\le s\le k-1\), the total error satisfies

$$\begin{aligned} {\mathcal {E}}[s;\theta ]^2 \le&\ C_{\nu _h,{\mathcal {T}}} \left( \mathcal {E}_G^2[s;\theta ] \left( 1+\frac{1}{\sqrt{\lambda }}\right) + \mathcal {E}_G[s;\theta ] C_{{\mathcal {T}},s+1}^{\frac{1}{2}}\right) \nonumber \\&\exp \left( C_{\nu _h,\kappa _h} \left( C_{{\mathcal {T}},s+1} + \frac{C\mathcal {E}_G[s;\theta ]^2}{\lambda }\right) \right) , \end{aligned}$$
(4.13)

where \(\lambda \) appears in the definition of \(\mathcal {E}_G\) given in (2.7), and \(C_{{\mathcal {T}},s+1}\) is a constant bound defined in (4.28).

Remark 3

  1. (i)

Thanks to the global existence of strong solutions to the PEs with horizontal viscosity (Theorem 3.4), the constant \(C_{{\mathcal {T}},s+1}\) is finite for any \({\mathcal {T}}>0\). This is in contrast to the analysis for the 3D Navier–Stokes equations [24], where the existence of global strong solutions is still unknown, and therefore the total error may become very large at some finite time \({\mathcal {T}}>0\) even if the generalization error is arbitrarily small.

  2. (ii)

The constants appearing on the right-hand side of (4.13) do not depend on the neural network solution \((\widehat{V}, \widehat{w}, \widehat{p}, \widehat{T})\). Thus, if, in line with Theorem 4.1, one sets \(\lambda \) such that \(\frac{\mathcal {E}_G[s;\theta ]}{\sqrt{\lambda }} = \mathcal O(1)\), the total error \({\mathcal {E}}[s;\theta ]\) is guaranteed to be small as long as \(\mathcal {E}_G[s;\theta ]\) is sufficiently small. In other words, this estimate is independent of the training process, and is therefore a priori.

  3. (iii)

In contrast with some previous works [24, 65] that only considered \(L^2\) total error estimates, our approach provides the higher-order total error estimate \(\int _0^{{\mathcal {T}}} \big (\Vert V^*(t)\Vert _{H^s}^2 + \Vert T^*(t)\Vert _{H^s}^2\big ) dt\). To obtain this estimate, it is necessary to control the residuals in the corresponding higher-order Sobolev norms. This is a significant improvement over previous approaches, as the higher-order total error estimate provides more accurate and detailed information about the behavior of the solution, which can be crucial for certain applications such as fluid dynamics or solid mechanics.

Proof of Theorem 4.2

Let \({\mathcal {T}}>0\) be arbitrary but fixed. Since \((V,w,p,T)\) is the unique strong solution to system (1.1), from (2.4) one has

$$\begin{aligned}&\partial _t V^* + V^*\cdot \nabla _h V^* + V^*\cdot \nabla _h \widehat{V} + \widehat{V}\cdot \nabla _h V^* + w^*\partial _z V^* + w^*\partial _z \widehat{V} + \widehat{w}\partial _z V^* \nonumber \\&\qquad + f_0 V^{*\perp } - \nu _h \Delta _h V^* + \nabla _h p^* = -\mathcal R_{i,V}, \end{aligned}$$
(4.14a)
$$\begin{aligned}&\partial _z p^* + T^* = - \mathcal R_{i,p}, \end{aligned}$$
(4.14b)
$$\begin{aligned}&\nabla _h \cdot V^* + \partial _z w^* = -\mathcal R_{i,div},\end{aligned}$$
(4.14c)
$$\begin{aligned}&\partial _t T^* + V^*\cdot \nabla _h T^* + V^*\cdot \nabla _h \widehat{T} + \widehat{V}\cdot \nabla _h T^* + w^*\partial _z T^* + w^*\partial _z \widehat{T} + \widehat{w}\partial _z T^* = -\mathcal R_{i,T}. \end{aligned}$$
(4.14d)

Moreover, we know that \((\widehat{V}, \widehat{p}, \widehat{w}, \widehat{T})\) are smooth and

$$\begin{aligned} \begin{aligned}&(V^*, V, T^*, T)\in L^\infty (0,{\mathcal {T}}; H^r({\mathcal {D}})) \subset L^\infty (0,{\mathcal {T}}; W^{k,\infty }({\mathcal {D}})),\\&(\nabla _h V^*, \nabla _h V, \nabla _h T^*,\nabla _h T, w^*, w)\in L^2(0,{\mathcal {T}}; H^r({\mathcal {D}})) \subset L^2(0,{\mathcal {T}}; W^{k,\infty }({\mathcal {D}})), \end{aligned}\nonumber \\ \end{aligned}$$
(4.15)

and from (3.7),

$$\begin{aligned} (p^*, p) \in L^\infty (0,{\mathcal {T}}; H^r({\mathcal {D}}))\subset L^\infty (0,{\mathcal {T}}; W^{k,\infty }({\mathcal {D}})). \end{aligned}$$

Using equation (2.1), we can rewrite \(w^*\) as,

$$\begin{aligned} \begin{aligned} w^* = -\widehat{w}(z=0) - \int _0^z \nabla _h \cdot V^*(\varvec{x}', \widetilde{z}) d\widetilde{z} - \int _0^z (\nabla _h \cdot \widehat{V} + \partial _z \widehat{w})(\varvec{x}', \widetilde{z}) d\widetilde{z}. \end{aligned} \end{aligned}$$

Since \(\widehat{w}\) is smooth enough, one has

$$\begin{aligned} \Vert w^*\Vert _{H^s} \le C(\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \nabla _h V^*\Vert _{H^s} + \Vert \mathcal {R}_{i,div}\Vert _{H^s}). \end{aligned}$$
(4.16)

For each fixed \(0\le s \le k-1\), let \(\alpha \) be a multi-index with \(|\alpha |\le s\). Applying \(D^\alpha \) to system (4.14), taking the inner product of (4.14a) with \(D^\alpha V^*\) and of (4.14d) with \(D^\alpha T^*\), and then summing over all \(|\alpha |\le s\) gives

$$\begin{aligned} \begin{aligned}&\frac{1}{2} \frac{d}{dt} \left( \Vert V^*\Vert _{H^s}^2 + \Vert T^*\Vert _{H^s}^2\right) + \nu _h \Vert \nabla _h V^*\Vert _{H^s}^2 + \kappa _h \Vert \nabla _h T^*\Vert _{H^s}^2\\ =&- \sum \limits _{|\alpha |\le s} \Big (\left\langle D^\alpha (V^*\cdot \nabla _h V^* + V^*\cdot \nabla _h \widehat{V} + \widehat{V}\cdot \nabla _h V^* \right. \\&\quad \left. + w^*\partial _z V^* + w^*\partial _z \widehat{V} + \widehat{w}\partial _z V^*), D^\alpha V^* \right\rangle \\&\quad - \left\langle D^\alpha (V^*\cdot \nabla _h T^* + V^*\cdot \nabla _h \widehat{T} + \widehat{V}\cdot \nabla _h T^* + w^*\partial _z T^* + w^*\partial _z \widehat{T} + \widehat{w}\partial _z T^*), D^\alpha T^* \right\rangle \\&\quad + \left\langle D^\alpha \nabla _h p^*, D^\alpha V^* \right\rangle + \left\langle D^\alpha \mathcal R_{i,V}, D^\alpha V^* \right\rangle + \left\langle D^\alpha \mathcal R_{i,T}, D^\alpha T^* \right\rangle \Big ). \end{aligned}\nonumber \\ \end{aligned}$$
(4.17)

We first estimate the nonlinear terms. Applying the Sobolev inequality, the Hölder inequality, and Young’s inequality, and combining with (4.16), one has

$$\begin{aligned} \begin{aligned}&\left| \left\langle D^\alpha (V^*\cdot \nabla _h V^* + w^* \partial _z \widehat{V} + V^*\cdot \nabla _h \widehat{V} + \widehat{V}\cdot \nabla _h V^*), D^\alpha V^* \right\rangle \right| \\ \le&C\Big (\Vert V^*\Vert _{H^s}\Vert \nabla _h V^*\Vert _{W^{s,\infty }} + \Vert w^*\Vert _{H^s}\Vert \partial _z \widehat{V}\Vert _{W^{s,\infty }} \\&\quad + \Vert V^*\Vert _{H^s}\Vert \nabla _h \widehat{V}\Vert _{W^{s,\infty }} + \Vert \nabla _h V^*\Vert _{H^s} \Vert \widehat{V}\Vert _{W^{s,\infty }}\Big )\Vert V^*\Vert _{H^s}\\ \le&C\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})}^2 + C\Vert \mathcal {R}_{i,div}\Vert _{H^s}^2 + C(\Vert V^*\Vert _{W^{s+1,\infty }} \\&\quad + \Vert \widehat{V}\Vert _{W^{s+1,\infty }})\Vert V^*\Vert _{H^s}^2 + C \Vert \widehat{V}\Vert _{W^{s+1,\infty }} \Vert \nabla _h V^*\Vert _{H^s} \Vert V^*\Vert _{H^s}\\ \le&C_{\nu _h}\left( 1+\Vert V^*\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{V}\Vert ^2_{W^{s+1,\infty }} \right) \Vert V^*\Vert _{H^s}^2 + \frac{\nu _h}{6} \Vert \nabla _h V^*\Vert _{H^s}^2 \\&\quad + C\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})}^2 + C\Vert \mathcal {R}_{i,div}\Vert _{H^s}^2. \end{aligned}\nonumber \\ \end{aligned}$$
(4.18)

Similarly for T, one can deduce

$$\begin{aligned} \begin{aligned}&\left| \left\langle D^\alpha (V^*\cdot \nabla _h T^* + w^* \partial _z \widehat{T} + V^*\cdot \nabla _h \widehat{T} + \widehat{V} \cdot \nabla _h T^* ), D^\alpha T^* \right\rangle \right| \\ \le&C\Big (\Vert V^*\Vert _{H^s}\Vert \nabla _h T^*\Vert _{W^{s,\infty }} \\&\quad + \Vert w^*\Vert _{H^s}\Vert \partial _z \widehat{T}\Vert _{W^{s,\infty }} + \Vert V^*\Vert _{H^s}\Vert \nabla _h \widehat{T}\Vert _{W^{s,\infty }} + \Vert \nabla _h T^*\Vert _{H^s} \Vert \widehat{V}\Vert _{W^{s,\infty }}\Big )\Vert T^*\Vert _{H^s}\\ \le&C_{\nu _h,\kappa _h}\left( 1+\Vert T^*\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{T}\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{V}\Vert ^2_{W^{s+1,\infty }} \right) (\Vert V^*\Vert _{H^s}^2 + \Vert T^*\Vert _{H^s}^2)\\&+ \frac{\nu _h}{6} \Vert \nabla _h V^*\Vert _{H^s}^2+ \frac{\kappa _h}{2} \Vert \nabla _h T^*\Vert _{H^s}^2 + C\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})}^2 + C\Vert \mathcal {R}_{i,div}\Vert _{H^s}^2. \end{aligned}\nonumber \\ \end{aligned}$$
(4.19)

For the rest of the nonlinear terms, we provide the details for \(s\ge 1\). The case of \(s=0\) follows easily. By the triangle inequality, one has

$$\begin{aligned} \begin{aligned}&\left| \left\langle D^\alpha (\widehat{w} \partial _z V^* + w^* \partial _z V^*), D^\alpha V^* \right\rangle \right| = \left| \left\langle D^\alpha (w \partial _z V^*), D^\alpha V^* \right\rangle \right| \\&\le \left| \left\langle D^\alpha ( w \partial _z V^*) - w \partial _z D^\alpha V^*, D^\alpha V^* \right\rangle \right| + \left| \left\langle w \partial _z D^\alpha V^*, D^\alpha V^* \right\rangle \right| := |I_1| + |I_2|. \end{aligned}\nonumber \\ \end{aligned}$$
(4.20)

Using the Hölder inequality, one has

$$\begin{aligned} |I_1| = \left| \int _{{\mathcal {D}}} \sum \limits _{|\beta |=0,\beta < \alpha }^{s-1} \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) D^{\alpha -\beta } w D^\beta \partial _z V^* \cdot D^\alpha V^* d\varvec{x}\right| \le C\Vert w\Vert _{W^{s+1,\infty }}\Vert V^*\Vert _{H^s}^2.\nonumber \\ \end{aligned}$$
(4.21)

By integration by parts and thanks to the boundary condition of w, we have

$$\begin{aligned} I_2 = - \frac{1}{2} \int _{{\mathcal {D}}} |D^\alpha V^*|^2 \partial _z w d\varvec{x} \le C \Vert w\Vert _{W^{s+1,\infty }} \Vert V^*\Vert _{H^s}^2. \end{aligned}$$
(4.22)

A similar calculation for T yields

$$\begin{aligned} \left| \left\langle D^\alpha (\widehat{w} \partial _z T^* + w^* \partial _z T^*), D^\alpha T^* \right\rangle \right| \le C\Vert w\Vert _{W^{s+1,\infty }}\Vert T^*\Vert _{H^s}^2. \end{aligned}$$
(4.23)

Next, by integration by parts, and using (4.14b) and (4.14c), we have

$$\begin{aligned}&\left\langle D^\alpha \nabla _h p^*, D^\alpha V^* \right\rangle = \int _0^1 \int _{\partial \mathcal M} D^\alpha p^* D^\alpha V^*\cdot \textbf{n} d\sigma (\partial \mathcal M) dz - \int _{\mathcal {D}} D^\alpha p^* D^\alpha \nabla _h \cdot V^* d\varvec{x}\\&= \int _0^1 \int _{\partial \mathcal M} D^\alpha p^* D^\alpha V^*\cdot \textbf{n} d\sigma (\partial \mathcal M) dz + \int _{\mathcal {D}} D^\alpha p^* D^\alpha ( \partial _z w^* + \mathcal R_{i,div}) d\varvec{x}\\&= \int _{\partial {\mathcal {D}}} D^\alpha p^* (D^\alpha V^*, D^\alpha w^*)^\top \cdot \textbf{n} d\sigma (\partial {\mathcal {D}}) + \int _{\mathcal {D}} D^\alpha p^* D^\alpha \mathcal R_{i,div} d\varvec{x} \\&\quad -\int _{\mathcal {D}} D^\alpha \partial _z p^* D^\alpha w^* d\varvec{x}\\&= \int _{\partial {\mathcal {D}}} D^\alpha p^* (D^\alpha V^*, D^\alpha w^*)^\top \cdot \textbf{n} d\sigma (\partial {\mathcal {D}}) + \int _{\mathcal {D}} D^\alpha p^* D^\alpha \mathcal R_{i,div} d\varvec{x} \\&\quad + \int _{\mathcal {D}} D^\alpha (T^* + \mathcal R_{i,p}) D^\alpha w^* d\varvec{x}\\&:= I_1 + I_2 + I_3. \end{aligned}$$

Since the domain \({\mathcal {D}} = \mathcal M\times (0,1)\) is Lipschitz and \((D^\alpha p^*, D^\alpha V^*, D^\alpha w^*) \in W^{1,\infty }({\mathcal {D}})\), the trace theorem and the Hölder inequality yield

$$\begin{aligned} \begin{aligned} I_1&\le C\left( \Vert D^\alpha p^*\Vert _{L^\infty ( \partial {\mathcal {D}})} + \Vert D^\alpha V^*\Vert _{L^\infty ( \partial {\mathcal {D}})} + \Vert D^\alpha w^*\Vert _{L^\infty ( \partial {\mathcal {D}})}\right) \Vert \mathcal R_{b}[s]\Vert _{L^1(\partial {\mathcal {D}})}\\&\le C \left( \Vert D^\alpha p^*\Vert _{W^{1,\infty }} + \Vert D^\alpha V^*\Vert _{W^{1,\infty }} + \Vert D^\alpha w^*\Vert _{W^{1,\infty }}\right) \Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})}\\&\le C \left( \Vert p^*\Vert _{W^{s+1,\infty }} + \Vert V^*\Vert _{W^{s+1,\infty }} + \Vert w^*\Vert _{W^{s+1,\infty }}\right) \Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.24)

By the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \begin{aligned} I_2 \le C \Vert p^*\Vert _{H^{s}} \Vert \mathcal R_{i,div}\Vert _{H^{s}}. \end{aligned} \end{aligned}$$
(4.25)

Thanks to (4.16), by the Cauchy–Schwarz inequality and Young’s inequality,

$$\begin{aligned} \begin{aligned} I_3&\le \Vert w^*\Vert _{H^s}\Vert \mathcal {R}_{i,p}\Vert _{H^s} + \Vert T^*\Vert _{H^s}(\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \nabla _h V^*\Vert _{H^s} + \Vert \mathcal {R}_{i,div}\Vert _{H^s})\\&\le \Vert w^*\Vert _{H^s}\Vert \mathcal {R}_{i,p}\Vert _{H^s}+ \Vert T^*\Vert _{H^s}(\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \mathcal {R}_{i,div}\Vert _{H^s}) + C_{\nu _h} \Vert T^*\Vert ^2_{H^s} \\&\quad +\frac{\nu _h}{6} \Vert \nabla _h V^*\Vert _{H^s}^2. \end{aligned}\nonumber \\ \end{aligned}$$
(4.26)

For the remaining two terms in (4.17), applying the Cauchy–Schwarz inequality gives

$$\begin{aligned} \begin{aligned} \left\langle D^\alpha \mathcal R_{i,V}, D^\alpha V^* \right\rangle + \left\langle D^\alpha \mathcal R_{i,T}, D^\alpha T^* \right\rangle&\le \Vert \mathcal R_{i,V}\Vert _{H^s} \Vert V^*\Vert _{H^s} + \Vert \mathcal R_{i,T}\Vert _{H^s} \Vert T^*\Vert _{H^s}\\&\le \Vert \mathcal R_{i,V}\Vert _{H^s} \Vert V^*\Vert _{W^{k,\infty }} + \Vert \mathcal R_{i,T}\Vert _{H^s} \Vert T^*\Vert _{W^{k,\infty }}. \end{aligned}\nonumber \\ \end{aligned}$$
(4.27)

Combining the estimates (4.17)–(4.27), one gets

$$\begin{aligned} \begin{aligned}&\frac{d}{dt} \left( \Vert V^*\Vert _{H^s}^2 + \Vert T^*\Vert _{H^s}^2\right) + \nu _h \Vert \nabla _h V^*\Vert _{H^s}^2 + \kappa _h \Vert \nabla _h T^*\Vert _{H^s}^2\\&\le C_{\nu _h,\kappa _h}\big (1+\Vert V^*\Vert ^2_{W^{s+1,\infty }}+\Vert T^*\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{V}\Vert ^2_{W^{s+1,\infty }} \\&\quad + \Vert \widehat{T}\Vert ^2_{W^{s+1,\infty }} + \Vert w\Vert ^2_{W^{s+1,\infty }} \big )(\Vert V^*\Vert _{H^s}^2 + \Vert T^*\Vert _{H^s}^2)\\&\quad + C_{\nu _h}\big (\Vert \mathcal R_{i,div}\Vert _{H^s}+ \Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \mathcal R_{i,p}\Vert _{H^s} \\ {}&\quad + \Vert p^*\Vert _{W^{s+1,\infty }} + \Vert V^*\Vert _{W^{s+1,\infty }} + \Vert w^*\Vert _{W^{s+1,\infty }} + \Vert T^*\Vert _{W^{s+1,\infty }} \big )\\&\quad \times \big (\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \mathcal R_{i,div}\Vert _{H^s} \\&\quad + \Vert \mathcal R_{i,V}\Vert _{H^s} + \Vert \mathcal R_{i,T}\Vert _{H^s} + \Vert \mathcal R_{i,p}\Vert _{H^s} \big )\\&:= C_1(\Vert V^*\Vert _{H^s}^2 + \Vert T^*\Vert _{H^s}^2) + C_2 \big (\Vert \mathcal R_{b}[s]\Vert _{L^2(\partial {\mathcal {D}})} + \Vert \mathcal R_{i,div}\Vert _{H^s} \\ {}&\quad + \Vert \mathcal R_{i,V}\Vert _{H^s} + \Vert \mathcal R_{i,T}\Vert _{H^s} + \Vert \mathcal R_{i,p}\Vert _{H^s} \big ). \end{aligned} \end{aligned}$$

Thanks to Gronwall’s inequality, we obtain that, for any \(t\in [0,{\mathcal {T}}]\),

$$\begin{aligned} \Vert V^*(t)\Vert _{H^s}^2 + \Vert T^*(t)\Vert _{H^s}^2 \le&\left( {\mathcal {E}}_G^t[s;\theta ](t)^2 + \int _0^{t} C_2(\tilde{t})({\mathcal {E}}_G^i[s;\theta ] + {\mathcal {E}}_G^b[s;\theta ])(\tilde{t}) d \tilde{t}\right) \\&\exp \left( \int _0^{t} C_1(\tilde{t})d\tilde{t}\right) . \end{aligned}$$

Integrating in t over \([0,{\mathcal {T}}]\),

$$\begin{aligned} \int _0^{{\mathcal {T}}} \left( \Vert V^*(t)\Vert _{H^s}^2 + \Vert T^*(t)\Vert _{H^s}^2\right) dt&\le C_{{\mathcal {T}}}\left( {\mathcal {E}}_G[s;\theta ]^2 + {\mathcal {E}}_G[s;\theta ] \int _0^{{\mathcal {T}}} C_2(t) dt\right) \\&\exp \left( \int _0^{{\mathcal {T}}} C_1(t)d t\right) . \end{aligned}$$

The derivation so far yields constants \(C_1\) and \(C_2\) that depend on the PINN approximation \((\widehat{V}, \widehat{w}, \widehat{p}, \widehat{T})\); if these quantities are large, the total error may not be under control. To overcome this issue, we next derive upper bounds for \(C_1\) and \(C_2\) that are independent of the outputs of the neural networks.

First, we apply the triangle inequality to bound

$$\begin{aligned} \Vert \varphi ^*\Vert _{W^{s+1,\infty }} \le \Vert \varphi \Vert _{W^{s+1,\infty }} + \Vert \widehat{\varphi }\Vert _{W^{s+1,\infty }}, \quad \varphi \in \{V,p,w,T\}. \end{aligned}$$

This implies

$$\begin{aligned} C_1 \le C_{\nu _h,\kappa _h}\left( 1+\Vert V\Vert ^2_{W^{s+1,\infty }}+\Vert T\Vert ^2_{W^{s+1,\infty }} + \Vert w\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{V}\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{T}\Vert ^2_{W^{s+1,\infty }} \right) . \end{aligned}$$

As the true solutions V, T and w satisfy (4.15), there exists a bound \(C_{{\mathcal {T}},s+1}\) such that

$$\begin{aligned} \int _0^{{\mathcal {T}}} \left( \Vert V\Vert ^2_{W^{s+1,\infty }}+\Vert T\Vert ^2_{W^{s+1,\infty }}+ \Vert w\Vert ^2_{W^{s+1,\infty }} \right) dt\le C_{{\mathcal {T}},s+1} . \end{aligned}$$
(4.28)

For the PINN approximations \(\widehat{V}\) and \(\widehat{T}\), by the Sobolev inequality we have

$$\begin{aligned} \int _0^{{\mathcal {T}}} \left( \Vert \widehat{V}\Vert ^2_{W^{s+1,\infty }} + \Vert \widehat{T}\Vert ^2_{W^{s+1,\infty }}\right) dt&\le C \int _0^{{\mathcal {T}}} \left( \Vert \widehat{V}\Vert ^2_{H^{s+3}} + \Vert \widehat{T}\Vert ^2_{H^{s+3}}\right) dt \le C\mathcal {E}_G^p[s;\theta ]^2 \\&\le \frac{C\mathcal {E}_G[s;\theta ]^2}{\lambda }. \end{aligned}$$

Therefore,

$$\begin{aligned} \int _0^{{\mathcal {T}}} C_1(t) dt \le C_{\nu _h,\kappa _h} ( C_{{\mathcal {T}},s+1} + \frac{C\mathcal {E}_G[s;\theta ]^2}{\lambda }). \end{aligned}$$

Similarly, \(C_2\) can be bounded as

$$\begin{aligned} \int _0^{{\mathcal {T}}} C_2(t) dt \le C_{{\mathcal {T}}} \left( \int _0^{{\mathcal {T}}} C^2_2(t) dt\right) ^{\frac{1}{2}} \le C_{\nu _h,{\mathcal {T}}} \left( \mathcal {E}_G[s;\theta ] + C_{{\mathcal {T}}, s+1}^{\frac{1}{2}} + \frac{C\mathcal {E}_G[s;\theta ]}{\sqrt{\lambda }}\right) . \end{aligned}$$

This implies that

$$\begin{aligned} {\mathcal {E}}[s;\theta ]^2 \le C_{\nu _h,{\mathcal {T}}}&\left( \mathcal {E}_G^2[s;\theta ] (1+\frac{1}{\sqrt{\lambda }}) + \mathcal {E}_G[s;\theta ] C_{{\mathcal {T}},s+1}^{\frac{1}{2}}\right) \nonumber \\&\exp \left( C_{\nu _h,\kappa _h} \left( C_{{\mathcal {T}},s+1} + \frac{C\mathcal {E}_G[s;\theta ]^2}{\lambda }\right) \right) . \end{aligned}$$
(4.29)

4.3 Controlling the difference between generalization error and training error

In this section, we will answer question SubQ3: given PINNs \((\widehat{V}, \widehat{w}, \widehat{p}, \widehat{T}):=( V_\theta , w_\theta , p_\theta , T_\theta )\), can the difference between the corresponding generalization error and the training error be made arbitrarily small?

This question can be answered using well-established results on numerical quadrature rules. Given \(\Lambda \subset \mathbb {R}^d\) and \(f\in L^1(\Lambda )\), choose quadrature points \(y_m\in \Lambda \) and quadrature weights \(w_m>0\) for \(1\le m\le M\), and consider the approximation \(\frac{1}{M}\sum _{m=1}^M w_m f(y_m) \approx \int _\Lambda f(y)dy. \) The accuracy of this approximation depends on the chosen quadrature rule, the number of quadrature points M, and the regularity of f. In the PEs case, the problem is low-dimensional (\(d\le 4\)), which allows using standard deterministic numerical quadrature points. As in [24], we consider the midpoint rule: for \(l>0\), we partition \(\Lambda \) into \(M \sim (\frac{1}{l})^d\) cubes of edge length l and denote by \(\{y_m\}_{m=1}^M\) the midpoints of these cubes. The formula and accuracy of the midpoint rule \(\mathcal Q_M^\Lambda \) are then given by

$$\begin{aligned} \mathcal Q_M^\Lambda [f]:= \frac{1}{M}\sum _{m=1}^Mf(y_m), \qquad \left| \int _\Lambda f(y) dy - \mathcal Q_M^\Lambda [f]\right| \le C_f M^{-2/d} \le C_f l^2, \end{aligned}$$
(4.30)

where \(C_f \le C\Vert f\Vert _{C^2}\). We remark that for high-dimensional PDEs, one needs mesh-free methods, such as Monte Carlo sampling, to choose the points in \(\Lambda \) and avoid the curse of dimensionality; see, for instance, [76].
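As a concrete illustration of (4.30), the following minimal NumPy sketch (our own, not taken from the reference implementation) evaluates the composite midpoint rule on the unit cube \([0,1]^d\) and exhibits the \(\mathcal O(M^{-2/d})\) decay of the quadrature error for a smooth integrand.

```python
import numpy as np
from itertools import product

def midpoint_rule(f, d, n_per_dim):
    """Composite midpoint rule Q_M[f] for the integral of f over [0,1]^d, cf. (4.30).

    f          : callable mapping an array of shape (M, d) to shape (M,)
    d          : dimension of the integration domain
    n_per_dim  : sub-cubes per coordinate direction, so M = n_per_dim ** d
    """
    l = 1.0 / n_per_dim                                   # edge length of each sub-cube
    mids_1d = (np.arange(n_per_dim) + 0.5) * l            # midpoints along one coordinate
    pts = np.array(list(product(mids_1d, repeat=d)))      # all M midpoints y_m, shape (M, d)
    return np.mean(f(pts))                                # (1/M) * sum_m f(y_m)

# Error decays like M^{-2/d} = l^2: test with f(y) = exp(y_1 + y_2) on [0,1]^2.
f = lambda y: np.exp(y[:, 0] + y[:, 1])
exact = (np.e - 1.0) ** 2
for n in (4, 8, 16, 32):
    print(n ** 2, abs(midpoint_rule(f, 2, n) - exact))
```

Doubling the number of sub-intervals per coordinate approximately quarters the error, consistent with the \(\mathcal O(l^2)\) rate in (4.30).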

Let us fix the sample sets \({\mathcal {S}}=({\mathcal {S}}_i,{\mathcal {S}}_t,{\mathcal {S}}_b)\) appearing in (2.9) as the midpoints in each corresponding domain, that is,

$$\begin{aligned} \begin{aligned}&{\mathcal {S}}_i:= \{(t_n,x_n,y_n,z_n): (t_n,x_n,y_n,z_n) \text { are midpoints of cubes of edge length}\,l_i \\&\qquad \qquad \text { in } [0,{\mathcal {T}}]\times {\mathcal {D}} \},\\&{\mathcal {S}}_t:= \{(x_n,y_n,z_n): (x_n,y_n,z_n) \text { are midpoints of cubes of edge length}\,l_t \\&\qquad \qquad \text { in } {\mathcal {D}} \},\\&{\mathcal {S}}_b:= \{(t_n,x_n,y_n,z_n): (t_n,x_n,y_n,z_n) \text { are midpoints of cubes of edge length}\, l_b \\&\qquad \qquad \text { in } [0,{\mathcal {T}}]\times \partial {\mathcal {D}} \}, \end{aligned}\nonumber \\ \end{aligned}$$
(4.31)

and the edge lengths and quadrature weights satisfy

$$\begin{aligned} l_i \sim | {\mathcal {S}}_i|^{-\frac{1}{d+1}}, \quad l_t \sim | {\mathcal {S}}_t|^{-\frac{1}{d}}, \quad l_b \sim | {\mathcal {S}}_b|^{-\frac{1}{d}}, \quad w_n^i = \frac{1}{| {\mathcal {S}}_i|}, \quad w_n^t = \frac{1}{| {\mathcal {S}}_t|}, \quad w_n^b = \frac{1}{| {\mathcal {S}}_b|}.\nonumber \\ \end{aligned}$$
(4.32)
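For concreteness, the interior and initial sample sets in (4.31), with the weights (4.32), can be generated as in the sketch below (our own illustration for \(d=2\), \({\mathcal {D}}=[0,1]^2\), and \({\mathcal {T}}=1\)); the boundary set \({\mathcal {S}}_b\) is built analogously on each face of \([0,{\mathcal {T}}]\times \partial {\mathcal {D}}\).

```python
import numpy as np
from itertools import product

def midpoints(bounds, n_per_dim):
    """Midpoints of a uniform partition of the box `bounds` into n_per_dim
    sub-intervals per coordinate (equal-edge cubes when the box is a cube).

    bounds : list of (lo, hi) pairs, one per coordinate
    Returns an array of shape (n_per_dim ** len(bounds), len(bounds)).
    """
    axes = [lo + (hi - lo) * (np.arange(n_per_dim) + 0.5) / n_per_dim
            for lo, hi in bounds]
    return np.array(list(product(*axes)))

T_final, n = 1.0, 10
S_i = midpoints([(0.0, T_final), (0.0, 1.0), (0.0, 1.0)], n)   # interior points (t, x, z)
S_t = midpoints([(0.0, 1.0), (0.0, 1.0)], n)                   # initial-time points (x, z)
w_i, w_t = 1.0 / len(S_i), 1.0 / len(S_t)                      # quadrature weights as in (4.32)
```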

Based on the estimate (4.30), we have the following theorem concerning the control of the generalization error by the training error.

Theorem 4.3

(Answer of SubQ3) Suppose that \(k\ge 5\) and \(0\le s\le k-5\), and let \(n=\max \{k^2,d(s^2-s+k+d)\}\). Consider the PINNs \(( V_\theta , w_\theta , p_\theta , T_\theta )\) with generalization error \(\mathcal {E}_G[s;\theta ]\) and training error \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]\). Then their difference is controlled by the size of \({\mathcal {S}}\):

$$\begin{aligned} \Big |\mathcal {E}_G[s;\theta ]^2 - \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]^2\Big |\le C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}}, \end{aligned}$$

where

$$\begin{aligned} C_{i}&= C_s N^{(3d+s+7)(6s+24)}(\ln N)^{6(s+4)^2} +C_Q + \lambda C_s N^{(3d+s+8)(6s+30)}(\ln N)^{6(s+5)^2},\\ C_{t}&= C_{b} = C_s N^{(3d+s+5)(6s+12)}(\ln N)^{6(s+2)^2}. \end{aligned}$$

Proof

Following from (4.30) and the fact that \(( V_\theta , w_\theta , p_\theta , T_\theta )\) are smooth functions, one obtains the estimate

$$\begin{aligned}&\left| \int _0^{{\mathcal {T}}}\Vert \mathcal R_{i,V}[\theta ]\Vert _{H^s}^2 dt - \sum \limits _{(t_n,x_n,y_n,z_n)\in {\mathcal {S}}_i}\sum \limits _{|\alpha |\le s}w_n^i|D^\alpha \mathcal R_{i,V}[\theta ](t_n,x_n,y_n,z_n)|^2 \right| \\ \le&\sum \limits _{|\alpha |\le s} \left| \Vert D^\alpha \mathcal R_{i,V}[\theta ]\Vert ^2_{L^2(\Omega )} - \sum \limits _{(t_n,x_n,y_n,z_n)\in {\mathcal {S}}_i}w_n^i|D^\alpha \mathcal R_{i,V}[\theta ](t_n,x_n,y_n,z_n)|^2 \right| \\ \le&C \Vert \mathcal R_{i,V}[\theta ]\Vert _{C^{2}([0,{\mathcal {T}}]; C^{s+2}({\mathcal {D}}))}^2 |{\mathcal {S}}_i|^{-\frac{2}{d+1}}. \end{aligned}$$

Let \(\theta \in \Theta _{L, W, R}\) with \(L=3\). Since \(n=\max \{k^2,d(s^2-s+k+d)\}\), by Theorem 4.1 and Lemma A.1 we know that \(R=\mathcal O(N\ln (N))\) and \(W=\mathcal O(N^{d+1})\). Now, by virtue of Lemma A.2, we can compute that

$$\begin{aligned}&C \Vert \mathcal R_{i,V}[\theta ]\Vert _{C^{2}([0,{\mathcal {T}}]; C^{s+2}({\mathcal {D}}))}^2 |{\mathcal {S}}_i|^{-\frac{2}{d+1}} \\&\quad \le C_{s} (W^3 R^{s+4})^{6(s+4)} |{\mathcal {S}}_i|^{-\frac{2}{d+1}}\\&\quad \le C_s (N^{3d+3} N^{s+4}\ln ^{s+4}(N))^{6s+24} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} \le C_s N^{(3d+s+7)(6s+24)}(\ln N)^{6(s+4)^2} |{\mathcal {S}}_i|^{-\frac{2}{d+1}}. \end{aligned}$$

Similarly, we obtain the following bounds:

$$\begin{aligned}&\Big |\mathcal {E}_G^i[s;\theta ]^2 - \mathcal {E}_{T}^i[s;\theta ;{\mathcal {S}}_i]^2 \Big |\le (C_s N^{(3d+s+7)(6s+24)}(\ln N)^{6(s+4)^2} +C_Q) |{\mathcal {S}}_i|^{-\frac{2}{d+1}},\\&\Big |\mathcal {E}_G^t[s;\theta ]^2 - \mathcal {E}_{T}^t[s;\theta ;{\mathcal {S}}_t]^2 \Big |\le C_s N^{(3d+s+5)(6s+12)}(\ln N)^{6(s+2)^2} |{\mathcal {S}}_t|^{-\frac{2}{d}},\\&\Big |\mathcal {E}_G^b[s;\theta ]^2 - \mathcal {E}_{T}^b[s;\theta ;{\mathcal {S}}_b]^2 \Big |\le C_s N^{(3d+s+5)(6s+12)}(\ln N)^{6(s+2)^2} |{\mathcal {S}}_b|^{-\frac{2}{d}}. \end{aligned}$$

Finally for the penalty term we have

$$\begin{aligned} \Big |\lambda \mathcal {E}_G^p[s;\theta ]^2 - \lambda \mathcal {E}_{T}^p[s;\theta ;{\mathcal {S}}_i]^2 \Big |\le \lambda C_s N^{(3d+s+8)(6s+30)}(\ln N)^{6(s+5)^2} |{\mathcal {S}}_i|^{-\frac{2}{d+1}}. \end{aligned}$$

Remark 4

The above result is an a priori estimate in the sense that \(\Big |\mathcal {E}_G[s;\theta ]^2 - \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]^2\Big |\) is controlled by a quantity purely depending on the neural network architectures before training.

From the result above, we see that, in order to have a small difference between \(\mathcal {E}_G[s;\theta ]\) and \(\mathcal {E}_{T}[s;\theta ]\), one needs to pick sufficiently large sample sets \({\mathcal {S}}=({\mathcal {S}}_i,{\mathcal {S}}_t, {\mathcal {S}}_b)\), of the order of:

$$\begin{aligned} |{\mathcal {S}}_i|&= \mathcal O\Big (\max \{\lambda N^{(3d+s+8)(6s+30)(d+1)} (\ln N)^{6(d+1)(s+5)^2},\nonumber \\&N^{(3d+s+7)(6s+24)(d+1)} (\ln N)^{6(d+1)(s+4)^2} \}\Big ), \nonumber \\ |{\mathcal {S}}_b|&= |{\mathcal {S}}_t| = \mathcal O\left( N^{(3d+s+5)(6s+12)d} (\ln N)^{6d(s+2)^2}\right) . \end{aligned}$$
(4.33)

4.4 Answers to Q1 and Q2

Combining the answers to SubQ1–SubQ3, i.e., Theorems 4.1–4.3, we have the following theorem, which answers Q1 and Q2.

Theorem 4.4

Let \(d\in \{2,3\}\), \({\mathcal {T}}>0\), \(k\ge 5\), \(r> \frac{d}{2} + 2k\), \(0\le s\le k-5\), \(n=\max \{k^2,d(s^2-s+k+d)\}\), and assume that \(V_0\in H_e^r({\mathcal {D}})\), \(T_0\in H_o^r({\mathcal {D}})\), and \(Q\in C^{k-1}([0,{\mathcal {T}}]; H_o^r({\mathcal {D}}))\). Let (VwpT) be the unique strong solution to system (1.1).

  1. (i)

    (answer of Q1) For every \(N>5\), there exist tanh neural networks \((V_\theta ,w_\theta ,p_\theta ,T_\theta )\) with two hidden layers, of widths at most and such that for every \(0\le s\le k-5\),

    $$\begin{aligned} \mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]^2 \le C\left( \frac{(\lambda +1)(1+\ln ^{2s+6} N)}{N^{2k-2s-8}} + \lambda \right) + C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}}, \end{aligned}$$

where the constants \(C, C_{i}, C_b, C_t\) are defined based on the constants appearing in Theorems 4.1 and 4.3. For arbitrary \(\varepsilon >0\), one can first choose \(\lambda \) small enough, and then either take N small and k large enough so that \(2k-2s-8\) is large enough, or take N large enough and \(| {\mathcal {S}}_i|, | {\mathcal {S}}_b|, | {\mathcal {S}}_t|\) large enough according to (4.33), so that \(\mathcal {E}_{T}[s;\theta ;{\mathcal {S}}]^2 < \varepsilon \).

  2. (ii)

    (answer of Q2) For \(0\le s \le 2k-1\), the total error satisfies

    $$\begin{aligned} {\mathcal {E}}[s;\theta ]^2 \le&C_{\nu _h,{\mathcal {T}}} \Big ((\mathcal {E}_{T}^2[s;\theta ] + C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}}) (1+\frac{1}{\sqrt{\lambda }})\\&\hspace{1.5cm}+ \left( \mathcal {E}_{T}[s;\theta ] + (C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}})^{\frac{1}{2}}\right) C_{{\mathcal {T}},s+1}^{\frac{1}{2}}\Big )\\&\times \exp \left( C_{\nu _h,\kappa _h} \left( C_{{\mathcal {T}},s+1} + \frac{C(\mathcal {E}_{T}[s;\theta ]^2+ C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}})}{\lambda }\right) \right) , \end{aligned}$$

    where the constants are defined based on the constants appearing in Theorems 4.2 and 4.3. In particular, when the training error is small enough and \(C_{i} |{\mathcal {S}}_i|^{-\frac{2}{d+1}} + C_{t} |{\mathcal {S}}_t|^{-\frac{2}{d}} + C_{b} |{\mathcal {S}}_b|^{-\frac{2}{d}}\) is small enough, the total error will be small.

Proof

The proof follows from the proofs of Theorems 4.1–4.3.

Remark 5

Theorem 4.4 implies that setting \(s>0\) in the training error (2.8) guarantees control of the total error (2.10) in the \(H^s\) norm. However, it may be unnecessary to use \(s>0\) in (2.8) if one only needs to bound (2.10) in the \(L^2\) norm. See the next section on the numerical experiments for more details.

4.5 Conclusion and discussion

In this paper, we investigate the error analysis for PINNs approximating the viscous PEs. We answer Q1 and Q2 by giving positive results for SubQ1–SubQ3. In particular, all the estimates we obtain are a priori estimates, meaning that they depend only on the PDE solution, the choice of neural network architectures, and the sample size in the quadrature approximation, but not on the actual trained network parameters. Such estimates are crucial when designing a model architecture and selecting hyperparameters prior to training.

Our key step in obtaining such a priori estimates is to include a penalty term in the generalization error \(\mathcal {E}_G[s;\theta ]\) and the training error \(\mathcal {E}_{T}[s;\theta ]\). This idea is inspired by [4] in the study of PINNs for the 2D NSE. In fact, such a priori estimates hold for both the 2D NSE and the 3D viscous PEs for any time \({\mathcal {T}}>0\), but are only true on a short time interval for the 3D NSE. The main reason is that the global well-posedness of the 3D NSE remains open and is one of the most challenging mathematical problems. To bypass this issue and still get a priori estimates for the 3D NSE, [24] instead requires high regularity \(H^k\) with \(k>6(3d+8)\) for the initial condition.

Our result is the first to provide higher-order error estimates under the mild regularity assumption \(H^k\) with \(k \ge 5\) on the initial condition. In particular, we prove that one can control the total error \({\mathcal {E}}[s;\theta ]\) for \(s\ge 0\), provided that \(\mathcal {E}_{T}[s;\theta ]\) with the same order s is used as the loss function during training.

5 Numerical experiments

For the numerical experiments, we consider the 2D case and set \(f_0 = 0\). System (1.1) reduces to

$$\begin{aligned}&u_t + uu_x + wu_z -\nu _h u_{xx} - \nu _z u_{zz} + p_x = 0 , \end{aligned}$$
(5.1a)
$$\begin{aligned}&\partial _z p + T= 0, \end{aligned}$$
(5.1b)
$$\begin{aligned}&u_x + w_z =0, \end{aligned}$$
(5.1c)
$$\begin{aligned}&T_t + uT_x + wT_z -\kappa _h T_{xx} - \kappa _z T_{zz} = Q. \end{aligned}$$
(5.1d)
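To illustrate how the interior residuals of (5.1) enter the training error, the following is a minimal PyTorch-style sketch, written under our own assumptions rather than taken from the implementation used below (which is based on DeepXDE): the module `net`, mapping \((t,x,z)\) to \((u,w,p,T)\), and the source callable `Q_fn` are hypothetical placeholders.

```python
import torch

def deriv(out, var):
    """First derivative of the field `out` with respect to the coordinate tensor `var`."""
    return torch.autograd.grad(out, var, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

def interior_residuals(net, Q_fn, t, x, z, nu_h, nu_z, kappa_h, kappa_z):
    """Pointwise residuals of (5.1a)-(5.1d) at collocation points (t, x, z)."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    z = z.clone().requires_grad_(True)
    u, w, p, T = net(t, x, z)                      # hypothetical network outputs

    u_t, u_x, u_z = deriv(u, t), deriv(u, x), deriv(u, z)
    u_xx, u_zz = deriv(u_x, x), deriv(u_z, z)
    T_t, T_x, T_z = deriv(T, t), deriv(T, x), deriv(T, z)
    T_xx, T_zz = deriv(T_x, x), deriv(T_z, z)
    p_x, p_z = deriv(p, x), deriv(p, z)
    w_z = deriv(w, z)

    r_V = u_t + u * u_x + w * u_z - nu_h * u_xx - nu_z * u_zz + p_x                     # (5.1a)
    r_p = p_z + T                                                                       # (5.1b)
    r_div = u_x + w_z                                                                   # (5.1c)
    r_T = T_t + u * T_x + w * T_z - kappa_h * T_xx - kappa_z * T_zz - Q_fn(t, x, z)     # (5.1d)
    return r_V, r_p, r_div, r_T
```

The \(L^2\) (\(s=0\)) or \(H^1\) (\(s=1\)) training error is then obtained by evaluating these residuals (and, for \(s=1\), their first derivatives) at the quadrature points and weights described in Sect. 4.3.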

We consider the following Taylor-Green vortex as the benchmark on the domain \((x,z) \in [0, 1]^2\) and \(t \in [0, 1]\):

$$\begin{aligned} {\left\{ \begin{array}{ll} u = -\sin (2\pi x) \cos (2\pi z) \exp (-4\pi ^2(\nu _h+\nu _z)t)\\ w= \cos (2\pi x) \sin (2\pi z) \exp (-4\pi ^2(\nu _h+\nu _z)t)\\ p = \frac{1}{4} \cos (4\pi x) \exp (-8\pi ^2(\nu _h+\nu _z)t) + \frac{1}{2\pi } \cos (2\pi z) \exp (-4\pi ^2\kappa _zt)\\ T = \sin (2\pi z) \exp (-4\pi ^2 \kappa _zt)\\ Q = \pi \cos (2\pi x) \sin (4\pi z) \exp (-4\pi ^2 (\nu _h+\nu _z + \kappa _z ) t). \end{array}\right. } \end{aligned}$$
(5.2)

In this section, we show the performance of PINNs approximating system (5.1) using the benchmark solution (5.2) with \(\nu _z = \nu _h = \kappa _z = \kappa _h = 0.01\) (Case 1) and \(\nu _z = \kappa _z = 0\), \(\nu _h = \kappa _h = 0.01\) (Case 2). We report results obtained by setting \(s=0\) (\(L^2\) residuals) or \(s=1\) (\(H^1\) residuals) in the training error (2.8) with \(\lambda = 0\) during training, and compare the resulting \(L^2\) and \(H^1\) total errors (2.10).
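For reproducibility, the benchmark fields (5.2) and the corresponding source term can be evaluated directly; the sketch below is our own NumPy transcription of (5.2), together with the two parameter settings used in the experiments (the value of \(\kappa _h\) does not enter the formulas in (5.2)).

```python
import numpy as np

def taylor_green(t, x, z, nu_h, nu_z, kappa_z):
    """Exact benchmark fields (5.2) on (x, z) in [0,1]^2, t in [0,1]."""
    e_v = np.exp(-4 * np.pi**2 * (nu_h + nu_z) * t)          # decay factor of the velocity
    e_T = np.exp(-4 * np.pi**2 * kappa_z * t)                # decay factor of the temperature
    u = -np.sin(2 * np.pi * x) * np.cos(2 * np.pi * z) * e_v
    w = np.cos(2 * np.pi * x) * np.sin(2 * np.pi * z) * e_v
    p = 0.25 * np.cos(4 * np.pi * x) * e_v**2 + np.cos(2 * np.pi * z) * e_T / (2 * np.pi)
    T = np.sin(2 * np.pi * z) * e_T
    Q = (np.pi * np.cos(2 * np.pi * x) * np.sin(4 * np.pi * z)
         * np.exp(-4 * np.pi**2 * (nu_h + nu_z + kappa_z) * t))
    return u, w, p, T, Q

# Case 1: full viscosity/diffusivity; Case 2: horizontal viscosity/diffusivity only.
case1 = dict(nu_h=0.01, nu_z=0.01, kappa_z=0.01)   # kappa_h = 0.01 (not needed in (5.2))
case2 = dict(nu_h=0.01, nu_z=0.0,  kappa_z=0.0)    # kappa_h = 0.01
```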

Remark 6

The reason to set \(\lambda = 0\) is twofold: evaluating the \(H^{s+3}\) norms would slow down the algorithm, since many higher-order derivatives must be computed, and we observe in the experiments that the total error is already sufficiently small without including the \(\mathcal {E}_{T}^p\) term in (2.8). In other words, the inclusion of \(\mathcal {E}_{T}^p\) is rather technical and mainly serves to provide a priori error estimates rather than a posteriori ones.

Fig. 1

\(L^2\) and \(H^1\) errors as a function of t between the PINN solutions and the Taylor-Green vortex benchmark (5.2) with \(\nu _z = \nu _h = \kappa _z = \kappa _h = 0.01\) (Case 1). The notations \(s=0\) and \(s=1\) represent the \(L^2\) and \(H^1\) residuals, respectively, used in the training.

For the PINN architecture, we make use of four fully connected multilayer perceptrons (MLPs), one for each of the unknown functions \((u,w,p,T)\), where each MLP consists of 2 hidden layers with 32 neurons per layer. In all cases, the quadrature for minimizing the training error is computed at equally spaced points using the midpoint rule, taking 5751 points in the interior of the spatial domain, 1024 points at the initial time \(t=0\), and 544 points on the spatial boundary. Following the midpoint rule, the quadrature weights in (2.9) are \(w^i_n = 1/5751\), \(w^t_n = 1/1024\), and \(w^b_n = 1/544\) for all n. We use the Adam algorithm with learning rate \(10^{-4}\) to minimize the training residuals, and the activation function for all networks is the hyperbolic tangent.
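As a rough sketch of this setup (in plain PyTorch, under our own assumptions; the actual experiments use the DeepXDE library, see the end of this section), the four tanh MLPs and the optimizer can be assembled as follows; the helper `training_error` in the commented loop is a placeholder for the loss (2.8).

```python
import torch
import torch.nn as nn

def make_mlp(in_dim=3, width=32, out_dim=1):
    """One fully connected tanh MLP with two hidden layers of `width` neurons."""
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, out_dim),
    )

# One network per unknown function (u, w, p, T); inputs are (t, x, z).
nets = {name: make_mlp() for name in ("u", "w", "p", "T")}
params = [q for net in nets.values() for q in net.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-4)

# Schematic training loop: evaluate the interior, initial and boundary residuals at the
# midpoint quadrature points with weights 1/5751, 1/1024 and 1/544, assemble the training
# error (2.8) for the chosen order s, and take an Adam step.
# for step in range(num_steps):
#     loss = training_error(nets, interior_pts, initial_pts, boundary_pts, s=s)  # placeholder
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```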

Figures 1 and 2 depict the \(L^2\) and \(H^1\) errors of the PINNs after training with the residuals for \(s = 0\) and \(s = 1\). Since the pressure p is only defined up to a constant, we plot the \(L^2\) errors of \(\partial _x p_\theta \) and \(\partial _z p_\theta \) rather than the \(L^2\) or \(H^1\) error of \(p_\theta \) itself. Tables 1 and 2 give the absolute and relative total errors \({\mathcal {E}}[s;\theta ]\) defined in (2.10) with \(s=0\) or \(s=1\), for PINNs trained using the \(L^2\) or \(H^1\) residuals.

Fig. 2

\(L^2\) and \(H^1\) errors as a function of t between the PINN solutions and the Taylor-Green vortex benchmark (5.2) with \(\nu _z = \kappa _z = 0\), \(\nu _h = \kappa _h = 0.01\) (Case 2). The notations \(s=0\) and \(s=1\) represent the \(L^2\) and \(H^1\) residuals, respectively, used in the training.

As shown in the figures and tables, the \(L^2\) errors of the PINNs trained with the \(H^1\) residuals are not significantly better than those trained with the \(L^2\) residuals. We attribute this to the fact that the solutions are already learned well under the \(L^2\) residuals, which are sufficient to control the \(L^2\) error. However, as expected from the analysis in Sect. 4, we observe a noticeably smaller error in the \(H^1\) norm (and in the \(L^2\) norm for \(\partial _x p_\theta \) and \(\partial _z p_\theta \)) when training with the \(s=1\) residuals, indicating that the derivatives of the unknown functions are learned better under the \(H^1\) residuals.

Table 1 Case 1. Absolute and relative total error \({\mathcal {E}}[s;\theta ]\) for \(s = 0, 1\) for the PINNs trained by minimizing the training error \({\mathcal {E}}_T[r; \theta ;{\mathcal {S}}]\) for \(r=0, 1\)
Table 2 Case 2. Absolute and relative total error \({\mathcal {E}}[s;\theta ]\) for \(s = 0, 1\) for the PINNs trained by minimizing the training error \({\mathcal {E}}_T[r; \theta ;{\mathcal {S}}]\) for \(r=0, 1\)

The PINNs and loss functions for the training error were all implemented using the DeepXDE library [62]. The code for the results in this paper can be found at https://github.com/alanraydan/PINN-PE.