1 Introduction

Algebraic flux correction (AFC) schemes were proposed in [25] and have since then become an active research area [3, 5, 18, 21, 28]. These methods provide a robust framework guaranteeing that discrete maximum principles hold [6, 27] and/or that entropy conditions are satisfied in the case of nonlinear equations [22, 23]. Nonlinear AFC approaches combine a high order baseline scheme, such as the Galerkin finite element discretization, with a provably bound-preserving low order approximation. In this manner, global and/or local constraints can be imposed on the values of AFC solutions resulting from discretizations of various partial differential equations.

The focus of most efforts dealing with AFC schemes was on the development of numerical algorithms, while theoretical aspects have only recently started to attract significant interest. Barrenechea et al. [5] were the first to show solvability of a nonlinear AFC system arising from discretization of stationary convection-diffusion-reaction equations. Moreover, they prove that the scheme convergences with a rate of at least \(\frac{1}{2}\) in the AFC energy norm. In their subsequent work [6], they derived a sharper, first order error estimate under the assumption that the limiter is linearity preserving. Unfortunately, the proof technique that was used to obtain this superconvergence result relies on the presence of diffusive terms. Lohmann [27] extended the analysis of AFC schemes to the case without diffusive terms and obtained similar theoretical results for linear hyperbolic problems, again with a provable rate of \(\frac{1}{2}\). Other theoretically investigated aspects of AFC procedures include their connection to edge-based diffusion [4], proofs of invariant domain preservation for the low order method [13], and a study of a posteriori error estimators [17]. The recent work of Jha and Ahmed [18] presents the first theoretical foundation of AFC schemes for parabolic convection-diffusion-reaction equations. The AFC schemes analyzed therein are based on flux-corrected transport algorithms that are fully discrete and employ implicit time stepping.

In contrast to [18], this manuscript presents semi-discrete stability and a priori error analysis of AFC schemes for finite element discretizations of the time-dependent linear advection equation. To cure the oscillatory behavior of continuous Galerkin methods, we stabilize the antidiffusive fluxes using low order time derivatives (defined by (2.14) below). Flux correction is performed using Kuzmin’s [21] monolithic convex limiting (MCL) scheme. For analytical purposes, we make an assumption regarding compatibility of the semi-discrete approximations and corresponding time derivatives. As shown in [15, Sec. 3.3], it is possible to enforce this condition by adapting the limiting procedure of the standard MCL approach. The results obtained in this manner are slightly more diffusive but exhibit the same second-order convergence rates in practice. However, based on our experience, the additional fix only rarely needs to be activated because compatibility seems to be automatically satisfied for the standard MCL scheme in most cases. Evidence for this claim is provided in Sect. 5.4. Therefore, we do not discuss enforcement of the compatibility condition in this work and instead refer the interested reader to [15, Sec. 3.3].

We prove that the nonlinear semi-discrete scheme is stable and that for finite times its spatial accuracy w. r. t. the \({\textrm{L}}^2\) norm is at least of order \(\frac{1}{2}\) for linear finite elements and generally unstructured meshes. In practice, second order superconvergence can be expected for smooth solutions and uniform meshes, as evidenced by the numerical examples of this paper and, for instance [3, 18, 28].

The structure of this article is as follows. We begin with the formulation of the continuous problem, and review the construction of the monolithic AFC schemes under discussion. Subsequently, in individual sections, we present the main theoretical outcomes of our work, an energy estimate and an a priori error analysis. Finally, we report the results of numerical experiments and draw conclusions. The contents of this paper are to a large degree based on [14, Ch. 5], which improves upon the analysis presented in our preprint [15]. In particular, we improved both the theoretical parts and the numerical examples by properly addressing the treatment of boundary conditions and presenting numerical results not just for simple 1D problems but also in the 2D case.

2 Discretization of the advection equation

In this section, we summarize the MCL strategy for linear transport problems [21]. Our presentation includes a brief discussion of the continuous model problem, a summary of the design principles of the low order method, as well as the formulation of the corresponding monolithic flux-correction schemes. Algebraic limiters of this kind have only recently been applied to different target discretizations such as high-order discontinuous Galerkin methods, e. g., [30]. These algorithms exploit ideas originally proposed in the context of continuous finite elements. It is therefore natural to perform our analysis in this framework as well.

2.1 Continuous model problem

Let \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\) be a polyhedral domain, \(\varvec{v}\in \varvec{{\textrm{C}}}(\overline{\varOmega }\times {\mathbb {R}}_+)^d\) a known velocity field, and \(\varvec{n}\in {\mathbb {R}}^d\) the unit outward normal to \(\partial \varOmega \). We define the time-dependent in- and outflow boundaries of \(\varOmega \) as

$$\begin{aligned} \varGamma _-(t) \, {:=}\, \{\varvec{x}\in \partial \varOmega : \varvec{v}(\varvec{x},t)\cdot \varvec{n}(\varvec{x}) < 0 \}, \qquad \varGamma _+(t) \, {:=}\, \{\varvec{x}\in \partial \varOmega : \varvec{v}(\varvec{x},t)\cdot \varvec{n}(\varvec{x}) > 0 \}. \end{aligned}$$

In what follows, we suppress the dependence of \(\varGamma _\pm (t)\) on time t. The initial-boundary value problem for the linear advection equation reads

$$\begin{aligned} \partial _t u + \varvec{v}\cdot \nabla u ={}&0 \ \quad \text {in } \varOmega \times {\mathbb {R}}_+, \end{aligned}$$
(2.1a)
$$\begin{aligned} u ={}&\hat{u}\ \quad \text {on } \varGamma _- \times {\mathbb {R}}_+, \end{aligned}$$
(2.1b)
$$\begin{aligned} u ={}&u_0 \quad \text {in } \varOmega , \end{aligned}$$
(2.1c)

where \(\hat{u}\) is a given inflow boundary profile and \(u_0\) is an initial datum. For analytical purposes, we assume that the velocity field is solenoidal, i. e., \(\nabla \cdot \varvec{v}=0\) in \(\varOmega \), which allows us to interpret (2.1a) as a hyperbolic conservation law with flux function \(\varvec{f}(u,\varvec{x},t) = \varvec{v}(\varvec{x},t) u\). Let us remark that the flux correction tools discussed in this section can also be applied to problem (2.1) in the case of more general velocities.

To derive the weak formulation of (2.1), we multiply (2.1a) by a test function w and perform integration by parts. Replacing the consistent flux \(\varvec{v}\cdot \varvec{n}\, u\) appearing in the resulting boundary integral with the upwind flux

$$\begin{aligned} f_{\varvec{n}}(u,\hat{u}) \, {:=}\, {\left\{ \begin{array}{ll} \varvec{v}\cdot \varvec{n}\,u &{} \text {on } \varGamma _+, \\ \varvec{v}\cdot \varvec{n}\,\hat{u}&{} \text {on } \varGamma _-, \end{array}\right. } \end{aligned}$$
(2.2)

to incorporate the boundary data \(\hat{u}\), we obtain

$$\begin{aligned} \int _\varOmega w \partial _t u \,\textrm{d}\varvec{x}- \int _\varOmega \nabla \cdot (w\,\varvec{v}) u \,\textrm{d}\varvec{x}+ \int _{\partial \varOmega } w\, f_{\varvec{n}}(u,\hat{u}) \,\textrm{d} s= 0. \end{aligned}$$
(2.3)

With regard to the continuous weak formulation of (2.1), we follow Di Pietro and Ern [8, Chs. 2–3]. In particular, we introduce the graph space [8, Def. 2.1]

$$\begin{aligned} {\textrm{V}} \, {:=}\, \{w\in {\textrm{L}}^2(\varOmega ):\varvec{v}\cdot \nabla w \in {\textrm{L}}^2(\varOmega )\} \end{aligned}$$

and define a weak solution to (2.1) as follows.

Definition 1

(Weak solutions to the linear advection equation) A function \(u\in {\textrm{C}}({\mathbb {R}}_+; {\textrm{V}}) \cap {\textrm{C}}^1({\mathbb {R}}_+;{\textrm{L}}^2(\varOmega ))\) is a weak solution to (2.1) if \(u(\cdot ,0)=u_0\) almost everywhere in \(\varOmega \) and

$$\begin{aligned} \int _\varOmega w\,\partial _t u \,\textrm{d}\varvec{x}+ a(u,w) ={}&b(w) \qquad \forall w\in {\textrm{V}},~t \in {\mathbb {R}}_+, \end{aligned}$$
(2.4)

where

$$\begin{aligned} a(\cdot ,\cdot ):{}&{\textrm{V}} \times {\textrm{V}} \rightarrow {\mathbb {R}},&\qquad a(u,w) \, {:=}\,{}&\int _\varOmega w\, \varvec{v}\cdot \nabla u \,\textrm{d}\varvec{x}- \int _{\varGamma _-} w\, u\, \varvec{v}\cdot \varvec{n}\,\textrm{d} s, \end{aligned}$$
(2.5)
$$\begin{aligned} b(\cdot ):{}&{\textrm{V}} \rightarrow {\mathbb {R}},&\qquad b(w) \, {:=}\,{}&-\int _{\varGamma _-} w\, \hat{u}\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s. \end{aligned}$$
(2.6)

Formulation (2.4)–(2.6) is derived from (2.3) by performing integration by parts and using the definition of the upwind flux (2.2). Thus, only boundary integrals over the inlet \(\varGamma _-\) appear in (2.5) and (2.6).

Remark 1

In this work, we assume that a unique solution u in the sense of Definition 1 and [8] exists. For settings similar to ours, the validity of this assumption can be rigorously proven (see for instance [7]) but for general velocities this is not a trivial task. In principle, one can invoke the method of characteristics and use an energy estimate to show well-posedness. However, rigorous existence and uniqueness results regarding solutions of (2.4) are typically obtained under additional assumptions. For details on these issues, we refer the reader to [8, Sec. 3.1.1] and the references therein.

Remark 2

For the integrals in the weak formulation below to be well-defined, Di Pietro and Ern [8, Sec. 2.1.3] require that in- and outflow boundaries are well-separated, i. e., 

$$\begin{aligned} \inf _{(\varvec{x},\varvec{y}) \in \varGamma _- \times \varGamma _+} \Vert \varvec{x} - \varvec{y}\Vert >0. \end{aligned}$$
(2.7)

Since our analysis is based on the same variational expression, we admit that the theory dictates this assumption on the model. However, we remark that some classical benchmarks for advection problems, such as LeVeque’s solid body rotation [26] (see Sect. 5.3) do not satisfy (2.7).

2.2 Finite element discretization and low order method

The low order method that is employed in this work is the algebraic Lax–Friedrichs scheme [1, 13, 29, 33] adapted to linear advection problems. In the AFC literature, this linear version is called the discrete upwinding method because of its equivalence to the node-centered upwind finite volume scheme [25, Sec. 6]. Let us now review the main steps of deriving this low order method. First, we discretize (2.4) in space using continuous linear finite elements.

Let \({\mathcal {K}}_h = \{K^1,\ldots K^E\}\) be a simplicial mesh of \(E=E(h) \in {\mathbb {N}}\) disjoint elements such that \(\varOmega =\bigcup _{e=1}^E K^e\). Furthermore, let \(\varvec{x}_1,\ldots ,\varvec{x}_N \in \overline{\varOmega }\), \(N=N(h) \in {\mathbb {N}}\), be the vertices of the mesh and \(\varphi _1,\ldots ,\varphi _N\) be the corresponding piecewise linear Lagrange basis polynomials, satisfying \(\varphi _i(\varvec{x}_j) = \delta _{ij}\). For simplicity, we assume that the mesh has no hanging nodes. The corresponding finite element space shall be denoted as \({\textrm{V}}_h \, {:=}\, \{w_h \in {\textrm{C}}(\varOmega ):{ \left. w_h \right| _{K} } \in {\mathbb {P}}_1(K) ~\forall K \in {\mathcal {K}}_h\}\) and the semi-discrete numerical solution is expanded as follows

$$\begin{aligned} u_h(\varvec{x},t) \, {:=}\, \sum _{i=1}^N u_i(t) \varphi _i(\varvec{x}), \qquad u_i(t) = u_h(\varvec{x}_i,t). \end{aligned}$$

Testing (2.4) with \(\varphi _i\), \(i \in \{1,\ldots ,N\}\), we obtain the spatial semi-discretization

$$\begin{aligned} \sum _{j=1}^N m_{ij}\frac{\,\textrm{d}u_j}{\,\textrm{d}t} = -\sum _{j=1}^N a_{ij}\,u_j -\int _{\varGamma _-} \varphi _i \, \big (\hat{u} - \sum _{j=1}^N u_j\varphi _j\big )\, \varvec{v}\cdot \varvec{n}\,\textrm{d} s, \end{aligned}$$
(2.8)

where \(m_{ij}\) are scalar-valued entries of the consistent mass matrix

$$\begin{aligned} M=(m_{ij})_{i,j=1}^N, \qquad m_{ij}= \int _\varOmega \varphi _i\,\varphi _j \,\textrm{d}\varvec{x}, \qquad i,j \in \{1,\ldots ,N\}, \end{aligned}$$

and

$$\begin{aligned} A=(a_{ij})_{i,j=1}^N, \qquad a_{ij}= \int _\varOmega \varphi _i \,\varvec{v}\cdot \nabla \varphi _j \,\textrm{d}\varvec{x},\qquad i,j \in \{1,\ldots ,N\}. \end{aligned}$$

Let us now briefly summarize the steps to construct the low order method used in [13, 21], among others. We perform row sum mass lumping, i. e., replace the entries of M in the left hand side of (2.8) with those of

$$\begin{aligned} M_{{\textrm{L}}} = {{\,\textrm{diag}\,}}(m_1,\ldots ,m_N), \qquad m_i\, {:=}\,\sum _{j=1}^N m_{ij}= \int _\varOmega \varphi _i\,\textrm{d}\varvec{x},\qquad i\in \{1,\ldots ,N\}. \end{aligned}$$

In addition, we use the partition of unity property of basis functions, i. e., the fact that they sum to one everywhere in \(\varOmega \) to rewrite

$$\begin{aligned} \sum _{j=1}^N a_{ij}u_j = \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}a_{ij}(u_j - u_i). \end{aligned}$$

Here \({\mathcal {N}}_i\) is the nodal stencil defined by

$$\begin{aligned} {\mathcal {N}}_i \, {:=}\, \{j\in \{1,\ldots ,N\}:\text {int}({{\,\textrm{supp}\,}}\varphi _i)\cap \text {int}({{\,\textrm{supp}\,}}\varphi _j) \ne \emptyset \}, \end{aligned}$$

where int(\(\cdot \)) denotes the interior of a set and supp is the support of a function. Moreover, we add diffusive fluxes of the form \(d_{ij}(u_j-u_i)\), where

$$\begin{aligned} d_{ij}= \max \{|a_{ij}|,|a_{ji}|\}, \qquad i\in \{1,\ldots ,N\},~j\in {{\mathcal {N}}_i\setminus \{i\}}. \end{aligned}$$
(2.9)

As a final modification to (2.8), we employ a lumped approximation of boundary terms. This step involves a localization of boundary integrals to individual faces on the domain boundary.

Definition 2

(Nodal boundary faces, [14]) Let \({\mathcal {F}}_{\partial \varOmega }\) denote the set of \((d-1)\)-dimensional boundary faces of \({\mathcal {K}}_h\). Then the set \({\mathcal {F}}_i\) contains all boundary faces that meet at node \(\varvec{x}_i \in \partial \varOmega \), \(i \in \{1,\ldots ,N\}\). For \(\varGamma _k \in {\mathcal {F}}_i\), we define \(\hat{u}_i^k \, {:=}\, \hat{u}(\varvec{x}_i)\) as the value of \(\hat{u}\) corresponding to \(\varGamma _k \subseteq \varGamma _-\).

The above modifications made to (2.8) yield the low order method

$$\begin{aligned} m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} = \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}(d_{ij}- a_{ij})(u_j - u_i) + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\}, \end{aligned}$$
(2.10)

where

$$\begin{aligned} b_i^k \, {:=}\,{}&-\int _{\varGamma _k} \varphi _i\,\min \{0, \varvec{v}\cdot \varvec{n}\}\,\textrm{d} s. \end{aligned}$$

Note that \(b_i^k\ge 0\). We may also write (2.10) in the bar state form [13, 16, 21]

$$\begin{aligned} m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} = \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}2d_{ij}(\bar{u}_{ij}- u_i) + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\}, \end{aligned}$$
(2.11)

where the bar states \(\bar{u}_{ij}\) are defined by

$$\begin{aligned} \bar{u}_{ij}= {\left\{ \begin{array}{ll}\displaystyle \frac{u_i+u_j}{2} - \frac{a_{ij}(u_j-u_i)}{2d_{ij}} &{}\text {if } a_{ij}\ne 0,\\ \displaystyle \frac{u_i+u_j}{2} &{}\text {if } a_{ij}= 0, \end{array}\right. }\qquad i\in \{1,\ldots ,N\},~j \in {\mathcal {N}}_i \setminus \{i\}. \end{aligned}$$
(2.12)

Remark 3

Definition (2.9) ensures that \(\bar{u}_{ij}\) is a convex combination of \(u_i\) and \(u_j\) because

$$\begin{aligned} \min \{u_i,u_j\} \le \bar{u}_{ij}\le \max \{u_i,u_j\} \qquad \Leftrightarrow \qquad |a_{ij}| \le d_{ij}. \end{aligned}$$

Instead of (2.9), the classical version of the discrete upwinding method uses [25]

$$\begin{aligned} d_{ij}= \max \{a_{ij}, 0, a_{ji}\}, \qquad i\in \{1,\ldots ,N\},~j\in {{\mathcal {N}}_i\setminus \{i\}}. \end{aligned}$$
(2.13)

If \(\nabla \cdot \varvec{v}=0\), this definition is equivalent to (2.9), unless both nodes \(\varvec{x}_i\) and \(\varvec{x}_j\) lie on \(\partial \varOmega \). This fact follows from integration by parts and omission of the resulting boundary integral. The validity of discrete maximum principles for nodal values can be shown for (2.13) using alternative proof techniques [27, Sec. 4.3.2]. However, individual bar states \(\bar{u}_{ij}\) of the discrete upwinding method based on (2.13) may violate the local maximum principle \(\min \{u_i,u_j\} \le \bar{u}_{ij}\le \max \{u_i,u_j\}\).

2.3 Monolithic convex limiting

The low order method (2.10) produces very diffusive approximations. To recover the accuracy of the standard finite element discretization (2.8), we perform algebraic flux correction. First, we define raw antidiffusive fluxes \(f_{ij}=m_{ij}(\dot{u}_i - \dot{u}_j) + d_{ij}(u_i-u_j)\) for \(i\in \{1,\ldots ,N\}\), \(j \in {\mathcal {N}}_i \setminus \{i\}\) and their limited counterparts \(f_{ij}^*\), which are specified below. Here \(\dot{u}_h =\sum _{i=1}^N\dot{u}_i \varphi _i\) is a suitable approximation to the time derivative \(\frac{\,\textrm{d}u_h}{\,\textrm{d}t}\). Following [21], we employ the low order nodal values

$$\begin{aligned} \dot{u}_i = \frac{1}{m_i}\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}(d_{ij}- a_{ij})(u_j - u_i) + \frac{1}{m_i}\sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\}, \end{aligned}$$
(2.14)

to compute \(\dot{u}_h\) in practice. This approach can be interpreted as a modification of the target scheme corresponding to the standard continuous Galerkin discretization that otherwise exhibits a suboptimal first order convergence rate [31, Sec. 14.3.1]. As illustrated in Sect. 5.2, the use of low order time derivatives \(\dot{u}_i\) (instead of their consistent Galerkin counterparts defined by (5.2) below) also has a stabilizing effect on the overall approximation. This approach was proposed in the original publication on MCL schemes [21] and corresponds to a cheap and effective target scheme. Advanced stabilization techniques for higher-order methods applied to linear and nonlinear hyperbolic problems can be found in [28] and [23], respectively. By the above definition of raw antidiffusive fluxes, we have \(f_{ij}=-f_{ji}\). To preserve the conservation property of the limited scheme, we enforce the corresponding constraint \(f_{ij}^*=-f_{ji}^*\) for the limited antidiffusive fluxes as is common in the AFC methodology [25]. Using the equivalence of formulations (2.10) and (2.11), we obtain a similar bar state form for the semi-discrete flux correction scheme [21]

$$\begin{aligned} m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} ={}&\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}[(d_{ij}- a_{ij})(u_j - u_i) + f_{ij}^*] + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i) \end{aligned}$$
(2.15a)
$$\begin{aligned} ={}&\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}2d_{ij}(\bar{u}_{ij}^* - u_i) + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\}, \end{aligned}$$
(2.15b)

where the limited bar states are defined by [21]

$$\begin{aligned} \bar{u}_{ij}^* \, {:=}\, \bar{u}_{ij}+ \frac{f_{ij}^*}{2d_{ij}}. \end{aligned}$$

Thus, a forward Euler update for (2.15) reads

$$\begin{aligned} \tilde{u}_i = \left[ 1 - \frac{\varDelta t}{m_i} \left( \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}2d_{ij}+ \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k \right) \right] u_i + \frac{\varDelta t}{m_i}\left( \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}2d_{ij}\bar{u}_{ij}^* + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k \hat{u}_i^k\right) , \end{aligned}$$

where \(\varDelta t\) is the time step. Hence, the updated solution \(\tilde{u}_i\) is a convex combination of \(u_i\), the \(\bar{u}_{ij}^*\), and the \(\hat{u}_i^k\), provided that the Courant–Friedrichs–Lewy (CFL) condition

$$\begin{aligned} \varDelta t \le \min _{i\in \{1,\ldots ,N\}}\frac{m_i}{\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}2d_{ij}+ \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k} \end{aligned}$$
(2.16)

is satisfied. In other words, if (2.16) holds, the forward-Euler updated solution \(\tilde{u}_i\) preserves all local bounds that these states are constrained by. This argument made here for a forward Euler step directly carries over to p-stage, pth-order accurate strong-stability-preserving Runge–Kutta (SSPp-RK) methods [12, 34], where \(p\in \{1,2,3\}\).

In the process of flux correction, we enforce the local maximum principles

$$\begin{aligned} u_i^{\min } \le \bar{u}_{ij}^*\le u_i^{\max },\qquad u_i^{\min }\, {:=}\,\min _{j\in {\mathcal {N}}_i} u_j, \qquad u_i^{\max } \, {:=}\,\max _{j\in {\mathcal {N}}_i} u_j \end{aligned}$$
(2.17)

in addition to skew symmetry of antidiffusive fluxes. Rearranging these constraints, we obtain Kuzmin’s formula for the limited antidiffusive fluxes of his monolithic convex limiter [21]

$$\begin{aligned} f_{ij}^* = {\left\{ \begin{array}{ll} \min \{f_{ij}, 2d_{ij}u_i^{\max } - \bar{w}_{ij}, \bar{w}_{ji} - 2d_{ij}u_j^{\min }\} &{} \text {if } f_{ij}\ge 0,\\ \max \{f_{ij}, 2d_{ij}u_i^{\min } - \bar{w}_{ij}, \bar{w}_{ji} - 2d_{ij}u_j^{\max }\} &{} \text {if } f_{ij}\le 0, \end{array}\right. } \end{aligned}$$
(2.18)

where \(\bar{w}_{ij}\, {:=}\,2d_{ij}\bar{u}_{ij}\).

Lemma 1

(Conservation property of the MCL scheme, [21, 25, 33]) The semi-discrete scheme (2.15) in which \(f_{ij}^* = -f_{ji}^*\) is conservative in the following sense

$$\begin{aligned} \frac{\,\textrm{d}}{\,\textrm{d}t} \int _\varOmega u_h \,\textrm{d}\varvec{x}= - \int _\varOmega \varvec{v}\cdot \nabla u_h \,\textrm{d}\varvec{x}- \sum _{i=1}^N \sum _{\varGamma _k \in {\mathcal {F}}_i} \int _{\varGamma _k} \varphi _i \,(\hat{u}_i^k - u_i) \min \{0, \varvec{v}\cdot \varvec{n}\} \,\textrm{d} s. \end{aligned}$$
(2.19)

Proof

Summing over all degrees of freedom, we exploit the symmetry of diffusion coefficients \(d_{ij}\), skew symmetry of antidiffusive fluxes \(f_{ij}^*\), and the zero row sum property of matrix A. \(\square \)

Remark 4

Continuous weak solutions u defined by (2.4) satisfy the conservation relation

$$\begin{aligned} \frac{\,\textrm{d}}{\,\textrm{d}t} \int _\varOmega u \,\textrm{d}\varvec{x}= - \int _\varOmega \varvec{v}\cdot \nabla u \,\textrm{d}\varvec{x}-\int _{\varGamma _-} (\hat{u} - u) \varvec{v}\cdot \varvec{n}\,\textrm{d} s. \end{aligned}$$
(2.20)

Thus, (2.19) is a semi-discrete counterpart of (2.20) that accounts for the flux-lumped quadrature rule used in the AFC setting.

Let us now rewrite the bar state form (2.15b) of the semi-discrete MCL scheme in a formulation that is more amenable to theoretical investigations. Despite the fact that using MCL, the fluxes \(f_{ij}^*\) can be calculated directly via (2.18), we introduce correction factors \(\alpha _{ij}(u_h)=\alpha _{ji}(u_h)\in [0,1]\) defined by \(\alpha _{ij}(u_h)=f_{ij}^*/f_{ij}\) if \(f_{ij}\ne 0\) and \(\alpha _{ij}(u_h)=1\) otherwise. The dependence of correction factors on the discrete solution makes AFC schemes nonlinear. Using the above definition of \(f_{ij}\), the semi-discrete MCL scheme (2.15) reads

$$\begin{aligned} m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} ={}&\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}[(1-\alpha _{ij}(u_h))\,d_{ij}(u_j-u_i) - a_{ij}(u_j-u_i) + \alpha _{ij}(u_h)\,m_{ij}(\dot{u}_i - \dot{u}_j)] \nonumber \\&+ \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k (\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\}, \end{aligned}$$
(2.21)

and can equivalently be written as

$$\begin{aligned} \sum _{i=1}^N w_i m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} + a_h(u_h,w_h) + d_h(u_h;u_h,w_h) - m_h(u_h;\dot{u}_h,w_h) = b_h(w_h) \end{aligned}$$
(2.22)

for all \(w_h \in {\textrm{V}}_h\) given by \(w_h=\sum _{j=1}^N w_j\varphi _j\). The bilinear and linear forms

$$\begin{aligned} a_h(u_h,w_h) \, {:=}\,{}&\int _\varOmega w_h\,\varvec{v}\cdot \nabla u_h \,\textrm{d}\varvec{x}- \sum _{i=1}^N w_i\, u_i \int _{\varGamma _-} \varphi _i \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s,\\ b_h(w_h) \, {:=}\,{}&-\sum _{i=1}^N w_i \sum _{\varGamma _k\in \mathcal F_i}\hat{u}_i^k \int _{\varGamma _k} \varphi _i \,\min \{0, \varvec{v}\cdot \varvec{n}\}\,\textrm{d} s\end{aligned}$$

are associated with the (stabilized) Galerkin finite element discretization corresponding to \(\alpha _{ij}= 1\) for all \(i\in \{1,\ldots ,N\}\), \(j \in {\mathcal {N}}_i \setminus \{i\}\). The nonlinear forms [5, 18, 27]

$$\begin{aligned} d_h(u_h;v_h,w_h) ={}&\sum _{i=1}^N w_i\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}(1-\alpha _{ij}(u_h))\,d_{ij}(v_i - v_j), \end{aligned}$$
(2.23)
$$\begin{aligned} m_h(u_h;v_h,w_h) ={}&\sum _{i=1}^N w_i\sum _{j\in {\mathcal {N}}_i\setminus \{i\}}\alpha _{ij}(u_h)\,m_{ij}(v_i - v_j) \end{aligned}$$
(2.24)

in (2.22) are due to algebraic flux correction.

Lemma 2

(Scalar product properties of nonlinear forms, [5]) For arbitrary \(u_h, v_h, w_h \in {\textrm{V}}_h\), the nonlinear forms (2.23) and (2.24) satisfy

$$\begin{aligned}&d_h(u_h; v_h,v_h) \ge 0,\qquad d_h(u_h; v_h,w_h)^2 \le {} d_h(u_h; v_h,v_h)\,d_h(u_h; w_h,w_h), \\&\quad m_h(u_h; v_h,v_h) \ge 0,\qquad m_h(u_h; v_h,w_h)^2 \le {} m_h(u_h; v_h,v_h)\,m_h(u_h; w_h,w_h). \end{aligned}$$

Proof

Proofs of these statements for \(d_h(\cdot ;\cdot ,\cdot )\) can be found in [27, p. 113], see also [5, Lem. 3.1 and Sec. 6]. The same arguments apply to \(m_h(\cdot ;\cdot ,\cdot )\). \(\square \)

3 Energy estimate

Let us now derive an energy estimate for approximations obtained via (2.22). In the proof of this stability result, we rely on the assumption that the following requirement is satisfied.

Definition 3

(Compatibility condition, [15, Ineq. (3.16)]) Let \(\dot{u}_h,u_h\in {\textrm{V}}_h\) be given functions and \(\lambda \, {:=}\, \Vert \varvec{v}\Vert _{\varvec{{\textrm{L}}}^{\infty }(\varOmega \times {\mathbb {R}}_+)^d}\) be the maximum velocity. Define the nonlinear forms \(d_h(\cdot ; \cdot , \cdot )\) and \(m_h(\cdot ;\cdot ,\cdot )\) as in (2.23) and (2.24), respectively. Suppose that there exists a constant \(\gamma \in (0,1)\) such that

$$\begin{aligned} \frac{\gamma h}{\lambda } m_h(u_h;\dot{u}_h,\dot{u}_h) \le (1- \gamma ) d_h(u_h; u_h, u_h)-m_h(u_h;\dot{u}_h,u_h). \end{aligned}$$
(3.1)

Then we say that \(\dot{u}_h\in {\textrm{V}}_h\) is compatible with \(u_h\in {\textrm{V}}_h\).

The ratio \(h/\lambda \) has physical units \([h]/[\lambda ]=\textrm{m}/(\textrm{m}\textrm{s}^{-1})=\textrm{s}\). It is used in inequality 3.1 to ensure that all terms have the same units for \([\dot{u}_h]=\textrm{s}^{-1}[u_h]\). Note that if we set \(w_h=u_h\) in (2.22), the two nonlinear forms contained therein coincide with the right hand side of (3.1) plus the nonnegative remainder \(\gamma \, d_h(u_h; u_h, u_h)\). Due to Lemma 2, we can bound these terms below by a positive number if \((u_h,\dot{u}_h)\) is a compatible pair. This argument is our main motivation for relying on (3.1) for theoretical purposes. Clearly, the pair \((u_h,0)\) satisfies (3.1). Thus if we do not compensate the mass lumping error in the process of limiting, the scheme automatically satisfies (3.1). As illustrated in Sect. 5.4, the standard MCL scheme using low order time derivatives (2.14) for stabilization purposes is also prone to producing compatible pairs. In [15, Sec. 3.3] we present a modified MCL procedure with which (3.1) can be guaranteed. Due to the complicated nature of this approach we chose not to discuss it any further in this work.

Before presenting our energy estimate, we need to prove the following technical result.

Lemma 3

Any function \(v_h \in {\textrm{V}}_h\) defined by \(v_h = \sum _{i=1}^N v_i\varphi _i\) satisfies the identity

$$\begin{aligned} v_h^2 - \sum _{i=1}^N v_i^2 \varphi _i = -\sum _{\begin{array}{c} \\ {i,j=1}{i < j} \end{array}}^N (v_i - v_j)^2\varphi _i \,\varphi _j. \end{aligned}$$

Proof

Invoking the partition of unity property of basis functions, we obtain

$$\begin{aligned} v_h^2 - \sum _{i=1}^N v_i^2 \,\varphi _i ={}&\sum _{i=1}^N v_i^2 \varphi _i\,(\varphi _i-1) + \sum _{\begin{array}{c} \\ {i,j=1}{i\ne j} \end{array}}^N v_i\,v_j \, \varphi _i\,\varphi _j = -\sum _{\begin{array}{c} \\ {i,j=1}{i\ne j} \end{array}}^N v_i^2 \varphi _i\, \varphi _j \\&\quad + \sum _{\begin{array}{c} \\ {i,j=1}{i\ne j} \end{array}}^N v_i\,v_j \varphi _i\,\varphi _j \\ ={}&\sum _{\begin{array}{c} \\ {i,j=1}{i< j} \end{array}}^N v_i\,(v_j-v_i)\,\varphi _i \,\varphi _j +\sum _{\begin{array}{c} \\ {i,j=1}{j< i} \end{array}}^N v_i\,(v_j-v_i)\, \varphi _i \,\varphi _j \\ ={}&\sum _{\begin{array}{c} \\ {i,j=1}{i < j} \end{array}}^N (v_i - v_j)(v_j-v_i)\,\varphi _i \,\varphi _j. \end{aligned}$$

\(\square \)

Proposition 1

(Semi-discrete energy estimate) Assume that there is a finite time \(T>0\) such that \(\varvec{v}(\cdot ,t) \in \varvec{\mathrm W}^{1,\infty }(\varOmega )\) and \(\nabla \cdot \varvec{v}(\cdot ,t) = 0\) in \(\varOmega \) for all \(t\in (0,T)\). Let \(u_h(t)\) and \(\dot{u}_h(t)\) satisfy (2.22) and, additionally, the compatibility condition (3.1) with a constant \(\gamma \in (0,1)\) for all \(t\in (0,T)\). Then the following estimate holds for the solution \(u_h(T)\) of the semi-discrete problem (2.22)

$$\begin{aligned} \sum _{i=1}^N m_i \,u_i(T)^2&+\int _0^T\int _{\varGamma _+} u_h^2\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\,\textrm{d} t-\int _0^T \sum _{\begin{array}{c} \\ {i,j=1}{i < j} \end{array}}^N (u_i-u_j)^2 \int _{\varGamma _-} \varphi _i \,\varphi _j\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\nonumber \\ {}- \frac{1}{2} \int _0^T \sum _{i=1}^N&u_i^2 \int _{\varGamma _-} \varphi _i\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ 2\gamma \int _0^T \big [ \frac{h}{\lambda }m_h(u_h;\dot{u}_h,\dot{u}_h) + d_h(u_h;u_h,u_h) \big ] \,\textrm{d} t\nonumber \\&\quad \le \sum _{i=1}^N m_i \, u_i(0)^2 + 2 \int _0^T\sum _{i=1}^N \sum _{\varGamma _k\in {\mathcal {F}}_i} \int _{\varGamma _k} \varphi _i \,(\hat{u}_i^k)^2 \,\max \{0, -\varvec{v}\cdot \varvec{n}\}\,\textrm{d} s\,\textrm{d} t. \end{aligned}$$
(3.2)

Proof

Testing (2.22) with \(w_h=u_h\), we use the compatibility condition (3.1), the identity \(u_h\,\varvec{v}\cdot \nabla u_h = \frac{1}{2}\nabla \cdot (\varvec{v}\,u_h^2)\), the divergence theorem and Young’s inequality to show that

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^N m_i \frac{\,\textrm{d}(u_i)^2}{\,\textrm{d}t}&{}+ \frac{1}{2} \int _{\partial \varOmega } u_h^2\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \sum _{i=1}^N u_i^2 \int _{\varGamma _-} \varphi _i \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&+ \frac{\gamma h}{\lambda }m_h(u_h;\dot{u}_h,\dot{u}_h) + \gamma d_h(u_h;u_h,u_h)\\ \le {}&\sum _{i=1}^N u_i\,m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t} + \int _\varOmega u_h \,\varvec{v}\cdot \nabla u_h\,\textrm{d}\varvec{x}- \sum _{i=1}^N u_i^2 \int _{\varGamma _-} \varphi _i \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&{} +d_h(u_h;u_h,u_h) - m_h(u_h;\dot{u}_h,u_h)\\ ={}&b_h(u_h) = -\sum _{i=1}^N \sum _{\varGamma _k\in {\mathcal {F}}_i}\int _{\varGamma _k} \varphi _i \,u_i\, \hat{u}_i^k \,\min \{0, \varvec{v}\cdot \varvec{n}\}\,\textrm{d} s\\ \le {}&-\sum _{i=1}^N \frac{u_i^2}{4}\int _{\varGamma _-}\varphi _i \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \sum _{i=1}^N \sum _{\varGamma _k\in \mathcal F_i}\int _{\varGamma _k} \varphi _i \,(\hat{u}_i^k)^2 \,\min \{0, \varvec{v}\cdot \varvec{n}\}\,\textrm{d} s. \end{aligned}$$

Multiplying by factor 2 and combining the integrals over \(\varGamma _-\), we write this inequality as

$$\begin{aligned} \sum _{i=1}^N&m_i \frac{\,\textrm{d}(u_i)^2}{\,\textrm{d}t} + \int _{\varGamma _+} u_h^2\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ \int _{\varGamma _-} \varvec{v}\cdot \varvec{n}\,\left( u_h^2 - \sum _{i=1}^N u_i^2 \,\varphi _i \right) \,\textrm{d} s\\&\qquad -\frac{1}{2} \sum _{i=1}^N u_i^2 \int _{\varGamma _-} \varphi _i \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ \frac{2\gamma h}{\lambda }m_h(u_h;\dot{u}_h,\dot{u}_h) + 2\gamma d_h(u_h;u_h,u_h) \\&\quad \le 2\sum _{i=1}^N \sum _{\varGamma _k\in \mathcal F_i}\int _{\varGamma _k} \varphi _i \,(\hat{u}_i^k)^2 \,\max \{0, -\varvec{v}\cdot \varvec{n}\}\,\textrm{d} s. \end{aligned}$$

Employing Lemma 3 and integrating in time produces (3.2). \(\square \)

Note that as a consequence of Lemma 2 and of the nonnegativity of basis functions, all terms appearing on the left hand side of inequality (3.2) are nonnegative. To guarantee that the assumptions of Proposition 1 are satisfied in practice, one can use the scheme proposed in [15, Sec. 3.3], which enforces (3.1) for user defined values of \(\gamma \). In our experience, failure to apply this limiter has no negative practical effects, however.

Remark 5

The reader may wonder what significance is attached to Proposition 1. Since the fully discrete MCL scheme produces locally bound-preserving approximations, it is stable by design. Preservation of global bounds in the semi-discrete setting can be shown as in [24] under the assumption that a solution exists. The semi-discrete MCL scheme represents a nonlinear system of ordinary differential equations. Well-posedness of such initial value problems can be shown by invoking the Picard–Lindelöf theorem, which guarantees the existence of solutions on finite time intervals. Once local existence is established, we exploit a global existence and uniqueness result for ordinary differential equations [2, Thm. 7.6]. According to this theorem, solutions that cannot be extended to arbitrary times must in fact blow up, which, in our case, is prevented by Proposition 1. It follows that the semi-discrete MCL scheme (2.22) possesses a unique solution that exists for all times \(t\ge 0\).

4 Error analysis

Compared to the energy estimate derived in the previous section, our error analysis is rather involved. In particular, we need to make additional assumptions on the data of the continuous problem (2.4) as well as on the mesh sequences. These aspects are discussed in Sect. 4.1. Subsequently, in Sect. 4.2, we recall some auxiliary results from the literature on numerical analysis of finite element methods including AFC schemes. Finally, in Sect. 4.3, we state, prove, and discuss the main result of this work, which is a semi-discrete a priori error estimate for MCL approximations.

Throughout this section, the letter C (possibly with a subscript) denotes a generic positive constant that is independent of the mesh size h. Moreover, we assume that \(h\le 1\) and therefore \(h^p \le h^q\) for \(p\ge q\).

4.1 Preliminaries

Recall that we only consider meshes that are affine and geometrically conforming triangulations of \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\). Additionally, we restricted ourselves to simplicial meshes, which allows us to exploit the linearity of finite element approximations inside mesh cells.

The a priori error estimate that we present in Sect. 4.3 is valid only for quasi-uniform families of meshes, i. e., there has to exist \(C>0\) such that [8, Sec. 3.1.2]

$$\begin{aligned} h\, {:=}\,\max _{K\in {\mathcal {K}}_h} h_K \le C \min _{K\in {\mathcal {K}}_h} h_K, \end{aligned}$$

where \(h_K={{\,\textrm{diam}\,}}(K)\). As is standard in finite element analysis, we also assume shape-regularity of \((\mathcal K_h)_{h>0}\). For this requirement to be satisfied, there has to exist \(C>0\) such that \(C h_K \le r_K\), where \(r_K\) is the radius of the largest open ball that fits into K [8, Sec. 1.4.1]. Additionally, we assume that the mesh faces, which are simplices in \({\mathbb {R}}^{d-1}\), are also shape regular in this sense. Our final assumption regarding the mesh sequence is that there exists \(C>0\) such that \(h \le C \tilde{h}\), where \(\tilde{h}=\min _{\varGamma \in {\mathcal {F}}_{\partial \varOmega }} {{\,\textrm{diam}\,}}(\varGamma )\) and \({\mathcal {F}}_{\partial \varOmega }\) is the set of boundary faces (cf.  Definition 2). We do not need to assume contact regularity of the mesh sequence [8, Def. 1.38] as Di Pietro and Ern do because this requirement is automatically satisfied for simplicial triangulations.

Following [5, 27], we assume \(\mathrm H^2(\varOmega )\) regularity of the exact solution \(u(\cdot ,t)\) for all \(t\ge 0\). We also require the time derivative \(\partial _t u\) to have this regularity. Specifically, we restrict our investigations to exact solutions of (2.4) that satisfy

$$\begin{aligned} u \in {\textrm{W}}^{1,\infty }({\mathbb {R}}_+;\mathrm H^2(\varOmega )), \qquad { \left. u \right| _{\varGamma _-} } \in {\textrm{L}}^\infty ({\mathbb {R}}_+;\mathrm H^2(\varGamma _-)). \end{aligned}$$

For simplicity, we set \(u_h(\cdot ,0)\) equal to the continuous interpolant \(\mathrm I_hu_0 \in {\textrm{V}}_h\) of \(u_0\in {\textrm{C}}(\overline{\varOmega })\). The interpolation operator \(\mathrm I_h: {\textrm{C}}(\overline{\varOmega }) \rightarrow {\textrm{V}}_h\) is defined by

$$\begin{aligned} w \mapsto {}&w_h \, {:=}\, \sum _{i=1}^N w(\varvec{x}_i)\,\varphi _i. \end{aligned}$$

Also for simplicity, we assume that the boundary data \(\hat{u}\) is linear on every boundary face \(\varGamma \in {\mathcal {F}}_{-}\), where \({\mathcal {F}}_{-}= {\mathcal {F}}_{-}(t) \, {:=}\, \{\varGamma \in {\mathcal {F}}_{\partial \varOmega }:\varGamma \cap \varGamma _- \ne \emptyset \}\). This assumption corresponds to a particular choice of the quadrature rule for boundary integrals.

4.2 Auxiliary statements

To prepare the ground for the derivation of our error estimate, we first summarize a few important ingredients of its proof, beginning with some standard inequalities. Then we discuss aspects that are peculiar to algebraic flux correction schemes. Most of the AFC results were originally proven by Barrenechea et al. [5].

Lemma 4

(Interpolation error estimate for volume integrals) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\). Then there exists \(C>0\) such that

$$\begin{aligned} \Vert w - \mathrm I_hw\Vert _{{\textrm{L}}^2(\varOmega )} + h |w - \mathrm I_hw|_{\mathrm H^1(\varOmega )} \le Ch^2|w|_{\mathrm H^2(\varOmega )} \qquad \forall w\in \mathrm H^2(\varOmega ). \end{aligned}$$

Proof

See [10, Sec. 1.5.1, in particular Ex. 1.111]. \(\square \)

Lemma 5

(Interpolation error estimate for face integrals) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\) and let \(\varGamma \subset \partial K\) be a face of \(K\in {\mathcal {K}}_h\). Then there exists \(C>0\) such that

$$\begin{aligned} \Vert w - \mathrm I_hw\Vert _{{\textrm{L}}^2(\varGamma )} \le C h_K^{3/2} |w|_{\mathrm H^2(K)} \qquad \forall w \in {\textrm{H}}^2(K). \end{aligned}$$

Proof

The claim follows from the continuous trace inequality [8, Lem. 1.49] in combination with Lemma 4. \(\square \)

Lemma 6

(Discrete trace inequality, [8] Lem. 1.46) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\) and let \(\varGamma \subset \partial K\) be a face of \(K\in {\mathcal {K}}_h\). Then there exists \(C>0\) such that

$$\begin{aligned} \Vert v_h\Vert _{{\textrm{L}}^2(\varGamma )} \le C h_K^{-1/2} \Vert v_h\Vert _{{\textrm{L}}^2(K)} \qquad \forall v_h \in {\mathbb {P}}_1(K). \end{aligned}$$

Lemma 7

(Inverse inequality, [8] Lem. 1.44) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\) and let \(K\in {\mathcal {K}}_h\). Then there exists \(C>0\) such that

$$\begin{aligned} |v_h|_{{\textrm{H}}^1(K)} \le Ch_K^{-1} \Vert v_h\Vert _{{\textrm{L}}^2(K)} \qquad \forall v_h \in {\mathbb {P}}_1(K). \end{aligned}$$

Lemma 8

([5]) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\). Define \(\varGamma _{ij} \, {:=}\, \{\mu \varvec{x}_i + (1-\mu )\varvec{x}_j:\mu \in [0,1]\}\) for a pair of mesh vertices \((\varvec{x}_i,\varvec{x}_j)\) \(i\in \{1,\ldots ,N\}\), \(j \in {\mathcal {N}}_i \setminus \{i\}\). Let \(K\in {\mathcal {K}}_h\) with \(\varGamma _{ij} \subset \partial K\). Then there exists \(C>0\) such that

$$\begin{aligned} |v_h(\varvec{x}_i) - v_h(\varvec{x}_j)| \le C h_K^{1 - d/2} |v_h|_{{\textrm{H}}^1(K)} \qquad \forall v_h \in {\mathbb {P}}_1(K). \end{aligned}$$

Proof

The claim follows from a Taylor expansion, linearity, and shape regularity, see [5, Pf. of Lem. 7.3] or [27, Ineq. (4.90)] for details. \(\square \)

Lemma 9

([5, 18]) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\). Then there exist constants \(C_1=C_1(d)>0\) and \(C_2=C_2(d,\varvec{v})>0\) such that

$$\begin{aligned} m_{ij}\le C_1 h^d,\qquad d_{ij}\le C_2 h^{d-1}, \qquad i\in \{1,\ldots ,N\},~j \in {\mathcal {N}}_i \setminus \{i\}. \end{aligned}$$

Proof

Clearly, \({{\,\textrm{supp}\,}}(\varphi _i\varphi _j) \subseteq \varOmega _{ij} \, {:=}\, \{\varvec{x}\in \overline{\varOmega }: \exists \mu \in [0,1]: |\varvec{x}-(\mu \varvec{x}_i + (1-\mu )\varvec{x}_j)| \le h\}\), and due to shape regularity, there exists \(C=C(d)>0\) such that \(|\varOmega _{ij}| \le Ch^d\). Therefore

$$\begin{aligned} m_{ij}= \int _{\varOmega _{ij}} \varphi _i \,\varphi _j \,\textrm{d}\varvec{x}\le \Vert \varphi _i\Vert _{{\textrm{L}}^2(\varOmega _{ij})} \Vert \varphi _j\Vert _{{\textrm{L}}^2(\varOmega _{ij})} \le \Vert 1\Vert _{{\textrm{L}}^2(\varOmega _{ij})}^2 = |\varOmega _{ij}|\le C h^d. \end{aligned}$$

The estimate for \(d_{ij}\) is obtained similarly by invoking (2.9), factoring out the maximum velocity \(\lambda \) and using the inverse inequality, i. e., Lemma 7, see [5, Pf. of Lem. 7.3] or [27, Pf. of Thm. 4.72] for details. \(\square \)

Lemma 10

([5]) Let \(({\mathcal {K}}_h)_{h>0}\) be a shape-regular family of meshes over \(\varOmega \subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\). Then there exist constants \(C_1=C_1(d)>0\) and \(C_2=C_2(d,\varvec{v})>0\) such that

$$\begin{aligned} m_h(v_h;\mathrm I_hw,\mathrm I_hw) \le C_1 h^2 \Vert w\Vert _{{\textrm{H}}^2(\varOmega )}^2, \qquad d_h(v_h;\mathrm I_hw,\mathrm I_hw) \le C_2 h \Vert w\Vert _{{\textrm{H}}^2(\varOmega )}^2 \end{aligned}$$

for all \(v_h \in {\textrm{V}}_h\), \(w\in {\textrm{H}}^2(\varOmega )\).

Proof

The estimate for \(d_h(\cdot ;\cdot ,\cdot )\) is derived in [27, Ineq. (4.122)] by invoking Lemma 4, 8, and 9, see also [5, Lem. 3.1]. The estimate for \(m_h(\cdot ;\cdot ,\cdot )\) is obtained similarly.

\(\square \)

4.3 A priori error estimate

To state our main result, we need to define some auxiliary quantities. For \(t\ge 0\), let \(\vartheta _h(t)=\sum _{i=1}^N \vartheta _i(t)\varphi _i\in {\textrm{V}}_h\) be the discrete error \(\vartheta _h(t) \, {:=}\, \mathrm I_hu(t) - u_h(t)\) and define

$$\begin{aligned} q(T) \, {:=}\,{}&\sum _{\begin{array}{c} \\ {i,j=1}{i<j} \end{array}}^N m_{ij} (\vartheta _i(T) - \vartheta _j(T))^2 + \int _0^T \bigg [ \int _{\varGamma _+} \vartheta _h^2 \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \sum _{i=1}^N \vartheta _i^2 \int _{\varGamma _-}\varphi _i\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&{} - \sum _{\begin{array}{c} \\ {i,j=1}{i < j} \end{array}}^N(\vartheta _i - \vartheta _j)^2 \int _{\varGamma _-} \varphi _i\,\varphi _j\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ \gamma d_h(u_h;u_h,u_h) + \frac{\gamma h}{\lambda } m_h(u_h;\dot{u}_h, \dot{u}_h) \bigg ]\,\textrm{d} t, \\ z(T) \, {:=}\,{}&\int _0^T \left[ \Vert \partial _t u\Vert ^2_{{\textrm{H}}^2(\varOmega )} + \Vert u\Vert ^2_{{\textrm{H}}^2(\varOmega )} + \Vert u\Vert _{{\textrm{H}}^2(\varGamma _-)}^2 + |\hat{u}|^2_{{\textrm{H}}^1(\varGamma _-)} \right] \,\textrm{d} t. \end{aligned}$$

Proposition 2

(Semi-discrete a priori error estimate) Let the assumptions made in Sect. 4.1 be satisfied. Assume that there is a finite time \(T>0\) such that \(\varvec{v}(\cdot ,t) \in \varvec{{\textrm{W}}}^{1,\infty }(\varOmega )\) and \(\nabla \cdot \varvec{v}(\cdot ,t) = 0\) in \(\varOmega \) for all \(t\in (0,T)\). Let \(u_h(t)\) and \(\dot{u}_h(t)\) satisfy (2.22) and, additionally, the compatibility condition (3.1) with a constant \(\gamma \in (0,1)\) independent of h for all \(t\in (0,T)\). Then there exist positive constants \(C_1=C_1(d,\varvec{v})\), \(C_2=C_2(d,\varvec{v},\gamma )\), and \(C_3=C_3(d)\) such that the estimate

$$\begin{aligned} \Vert u(T) - u_h(T)\Vert _{{\textrm{L}}^2(\varOmega )} \le C_3 h^2 |u(T)|_{{\textrm{H}}^2(\varOmega )} + \sqrt{y(T) + C_1\int _0^T {\textrm{e}}^{C_1(T-t)}\,y(t) \,\textrm{d} t} \end{aligned}$$
(4.1)

holds for the exact solution u(T) of the continuous problem (2.4), the exact solution \(u_h(T)\) of the semi-discrete problem (2.22), and \(y(T) \, {:=}\, C_2 h\, z(T) - q(T)\).

Remark 6

We do not see any practical problems if (3.1) is invalid, only the theoretical results would no longer apply.

Corollary 1

(Convergence order of the semi-discrete MCL scheme) Under the assumptions of Proposition 2, the a priori error estimate

$$\begin{aligned} \Vert u(T) - u_h(T)\Vert _{{\textrm{L}}^2(\varOmega )} \le C_3h^2|u(T)|_{{\textrm{H}}^2(\varOmega )} + \sqrt{{\textrm{e}}^{C_1T}\,C_2 h\, \Vert z\Vert _{{\textrm{L}}^\infty (0,T)}} \le C_4\,h^{\frac{1}{2}} \end{aligned}$$
(4.2)

holds with a constant \(C_4=C_4(C_1,C_2,C_3,T,u,\hat{u})>0\), which behaves as e\(^{C_1T/2}\).

Proof

(of Corollary 1) Since q and z are nonnegative functions, we may use the estimate \(y(T) \le C_2 h\,z(T)\) in (4.1). The claim follows by calculating the integral of the exponential function. \(\square \)

Proof

(of Proposition 2) This proof combines recent results on AFC schemes [5, 27] with a new way of proving a priori error estimates for nonconforming discretizations of the advection equation [32]. A particular similarity of the approach developed in [32] to our theory is that both apply to semi-discrete formulations.

We introduce the interpolation error \(\varTheta (t)= \varTheta (u,h;t) \, {:=}\, u(t) -\mathrm I_hu(t)\) and subtract (2.22) from (2.4). Setting \(w=w_h=\vartheta _h\), we obtain the error equation

$$\begin{aligned} \overbrace{\int _\varOmega \vartheta _h \frac{\partial u}{\partial t} \,\textrm{d}\varvec{x}- \sum _{i=1}^N \vartheta _i\,m_i \frac{\,\textrm{d}u_i}{\,\textrm{d}t}}^{\varXi _1}&{}+ \overbrace{a(u,\vartheta _h) - a_h(u_h,\vartheta _h)}^{\varXi _2}\\ \\ ={}&\underbrace{b(\vartheta _h) - b_h(\vartheta _h)}_{\varXi _3} + \underbrace{d_h(u_h;u_h,\vartheta _h) - m_h(u_h;\dot{u}_h,\vartheta _h)}_{\varXi _4}. \end{aligned}$$

Recall that the identity \(m_i=\sum _{j=1}^N m_{ij}\) holds for row sum mass lumping. Using this decomposition of \(m_i\) and the identities \(u = \varTheta + \vartheta _h + \mathrm I_hu - \vartheta _h,\ u_h = \mathrm I_hu - \vartheta _h\), we find that

$$\begin{aligned} \varXi _1 ={}&\int _{\varOmega } \vartheta _h \frac{\partial \varTheta }{\partial t} \,\textrm{d}\varvec{x}+ \int _{\varOmega } \vartheta _h \frac{\,\textrm{d}\vartheta _h}{\,\textrm{d}t}\,\textrm{d}\varvec{x}+\sum _{i=1}^N \vartheta _i \frac{\,\textrm{d}}{\,\textrm{d}t}\big (\sum _{j=1}^Nm_{ij}[ (\mathrm I_hu)_j - \vartheta _j] - m_i\left[ (\mathrm I_hu)_i - \vartheta _i\right] \big )\\ =&\int _{\varOmega } \vartheta _h \frac{\partial \varTheta }{\partial t}\,\textrm{d}\varvec{x}+ \frac{1}{2} \frac{\,\textrm{d}}{\,\textrm{d}t} \Vert \vartheta _h\Vert ^2_{{\textrm{L}}^2(\varOmega )} + \sum _{i,j=1}^N \vartheta _i\, m_{ij} \frac{\,\textrm{d}}{\,\textrm{d}t}\left[ (\mathrm I_hu)_j - (\mathrm I_hu)_i - (\vartheta _j - \vartheta _i) \right] \\ =&\int _{\varOmega } \vartheta _h \frac{\partial \varTheta }{\partial t}\,\textrm{d}\varvec{x}+ \frac{1}{2} \frac{\,\textrm{d}}{\,\textrm{d}t} \Vert \vartheta _h\Vert ^2_{{\textrm{L}}^2(\varOmega )}\\&\quad + \sum _{\begin{array}{c} \\ {i,j=1}{i<j} \end{array}}^N (\vartheta _i - \vartheta _j)\, m_{ij} \frac{\,\textrm{d}}{\,\textrm{d}t}\left[ (\mathrm I_hu)_j - (\mathrm I_hu)_i - (\vartheta _j - \vartheta _i) \right] . \end{aligned}$$

Arguing as in the proof of Proposition 1, we invoke the divergence theorem, Lemma 3 as well as the identities \(u= \varTheta +\mathrm I_hu\) and \(u_h = \mathrm I_hu - \vartheta _h\), which yields

$$\begin{aligned} \varXi _2 ={}&\int _\varOmega \vartheta _h \,\varvec{v}\cdot \nabla \varTheta \,\textrm{d}\varvec{x}+ \frac{1}{2}\int _{\partial \varOmega } \vartheta _h^2 \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \int _{\varGamma _-} \vartheta _h \,\varTheta \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&{}- \int _{\varGamma _-} \vartheta _h \,\mathrm I_hu \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ \sum _{i=1}^N\vartheta _i\,(u(\varvec{x}_i)-\vartheta _i) \int _{\varGamma _-}\varphi _i\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\={}&\int _\varOmega \vartheta _h \,\varvec{v}\cdot \nabla \varTheta \,\textrm{d}\varvec{x}+ \frac{1}{2}\int _{\varGamma _+} \vartheta _h^2 \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \int _{\varGamma _-} \vartheta _h \,\varTheta \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&{}- \frac{1}{2}\sum _{\begin{array}{c} \\ {i,j=1}{i < j} \end{array}}^N(\vartheta _i - \vartheta _j)^2 \int _{\varGamma _-} \varphi _i\,\varphi _j\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \frac{1}{2} \sum _{i=1}^N \vartheta _i^2 \int _{\varGamma _-}\varphi _i\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\\&{}+ \int _{\varGamma _-} \big ( \sum _{i=1}^N \vartheta _i\,\varphi _i \, (u(\varvec{x}_i) - \,\mathrm I_hu) \big )\, \varvec{v}\cdot \varvec{n}\,\textrm{d} s. \end{aligned}$$

As in [19, Thm. 3.43], we exploit transformation to the reference element, shape-regularity, and the equivalence of norms in finite dimensional spaces to show that

$$\begin{aligned} \sum _{i=1}^N \int _\varGamma (v_i\, \varphi _i)^2 \,\textrm{d} s\le \sum _{i=1}^N \int _\varGamma v_i^2\, \varphi _i \,\textrm{d} s\le C \Vert v_h\Vert _{{\textrm{L}}^2(\varGamma )}^2 \qquad \forall v_h \in {\textrm{V}}_h,~\varGamma \in {\mathcal {F}}_{\partial \varOmega }. \end{aligned}$$
(4.3)

To derive an estimate for \(\varXi _3\), we rewrite the boundary integrals as a sum of integrals over faces. On each face \(\varGamma \in {\mathcal {F}}_{-}\), we use the estimate \(|\hat{u}_i^k - \hat{u}| \le Ch_\varGamma |\nabla \hat{u}|\), where \(h_\varGamma = {{\,\textrm{diam}\,}}(\varGamma )\). In addition, we invoke Young’s inequality, estimate (4.3), Lemma 6, and incorporate \(\lambda = \Vert \varvec{v}\Vert _{\varvec{{\textrm{L}}}^{\infty }(\varOmega \times {\mathbb {R}}_+)^d}\) into the constant C, which yields

$$\begin{aligned} \varXi _3 ={}&\sum _{i=1}^N \sum _{\varGamma _k\in \mathcal F_i}\int _{\varGamma _k} \vartheta _i\varphi _i\,(\hat{u}_i^k - \hat{u}) \, \min \{0, \varvec{v}\cdot \varvec{n}\} \,\textrm{d} s\nonumber \\ \le {}&C \sum _{\varGamma \in \mathcal F_{-}} \sum _{i=1}^N \int _{\varGamma } \big [ h_\varGamma (\vartheta _i \varphi _i)^2 + \frac{1}{h_\varGamma }|\hat{u}_i^k - \hat{u}|^2 \big ] \,\textrm{d} s\nonumber \\ \le {}&C \sum _{\varGamma \in {\mathcal {F}}_{-}} h_\varGamma \left( \Vert \vartheta _h\Vert ^2_{{\textrm{L}}^2(\varGamma )} + |\hat{u}|^2_{{\textrm{H}}^1(\varGamma )}\right) \le C \Vert \vartheta _h\Vert _{{\textrm{L}}^2(\varOmega )}^2 + C h |\hat{u}|_{{\textrm{H}}^1(\varGamma _-)}^2. \end{aligned}$$
(4.4)

For the nonlinear terms in \(\varXi _4\), we use Lemma 2, Young’s inequality, the compatibility condition (3.1) with constant \(\gamma \in (0,1)\), and Lemma 10 to deduce

$$\begin{aligned} \varXi _4 ={}&d_h(u_h;u_h,\mathrm I_hu) - d_h(u_h;u_h,u_h) + m_h(u_h; \dot{u}_h,u_h) - m_h(u_h; \dot{u}_h,\mathrm I_hu) \\ \le {}&\frac{\gamma }{2} d_h(u_h;u_h,u_h) +\frac{1}{2\gamma } d_h(u_h; \mathrm I_hu,\mathrm I_hu) - \gamma d_h(u_h;u_h,u_h)\\&{}- \frac{\gamma h}{\lambda } m_h(u_h,\dot{u}_h, \dot{u}_h) + \frac{\gamma h}{2 \lambda } m_h(u_h; \dot{u}_h, \dot{u}_h) + \frac{\lambda }{2\gamma h} m_h(u_h; \mathrm I_hu, \mathrm I_hu) \\ \le {}&-\frac{\gamma }{2} d_h(u_h;u_h,u_h) - \frac{\gamma h}{2\lambda } m_h(u_h,\dot{u}_h, \dot{u}_h) + C h \Vert u\Vert _{{\textrm{H}}^2(\varOmega )}^2, \end{aligned}$$

where the factor \(1/\gamma \) was incorporated into the constant C. Combining the above identities for \(\varXi _1\) and \(\varXi _2\) with the inequalities for \(\varXi _3\) and \(\varXi _4\) produces the estimate

$$\begin{aligned} \frac{\,\textrm{d}}{\,\textrm{d}t}\Vert \vartheta _h\Vert _{{\textrm{L}}^2(\varOmega )}^2&{}+\sum _{\begin{array}{c} \\ {i,j=1}{i<j} \end{array}}^N m_{ij} \frac{\,\textrm{d}}{\,\textrm{d}t}(\vartheta _i - \vartheta _j)^2 + \int _{\varGamma _+} \vartheta _h^2 \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s- \sum _{i=1}^N \vartheta _i^2 \int _{\varGamma _-}\varphi _i\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s\nonumber \\&- \sum _{\begin{array}{c} \\ {i,j=1}{i< j} \end{array}}^N(\vartheta _i - \vartheta _j)^2 \int _{\varGamma _-} \varphi _i\,\varphi _j\,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ \gamma d_h(u_h;u_h,u_h)\nonumber \\&\quad + \frac{\gamma h}{\lambda } m_h(u_h,\dot{u}_h, \dot{u}_h) \nonumber \\ \le {}&-2 \int _{\varOmega } \vartheta _h \frac{\partial \varTheta }{\partial t}\,\textrm{d}\varvec{x}+ 2 \sum _{\begin{array}{c} \\ {i,j=1}{i<j} \end{array}}^N (\vartheta _i - \vartheta _j)\, m_{ij} \left[ (\mathrm I_h\partial _t u)_i - (\mathrm I_h\partial _t u)_j \right] \nonumber \\&{}- 2\int _\varOmega \vartheta _h \,\varvec{v}\cdot \nabla \varTheta \,\textrm{d}\varvec{x}+ 2\int _{\varGamma _-} \vartheta _h \,\varTheta \,\varvec{v}\cdot \varvec{n}\,\textrm{d} s+ C \Vert \vartheta _h\Vert ^2_{{\textrm{L}}^2(\varOmega )} + Ch |\hat{u}|^2_{{\textrm{H}}^1(\varGamma _-)} \nonumber \\&{} - 2 \int _{\varGamma _-} \big ( \sum _{i=1}^N \vartheta _i\varphi _i \, (u(\varvec{x}_i) - \,\mathrm I_hu) \big )\, \varvec{v}\cdot \varvec{n}\,\textrm{d} s+ C h \Vert u\Vert _{{\textrm{H}}^2(\varOmega )}^2 {=:}(\star ). \end{aligned}$$
(4.5)

The terms on the right hand side of inequality (4.5) are now bounded using standard arguments. Specifically, we make use of the assumptions on the mesh and of Young’s inequality, apply Lemma 4 to \(\varTheta =u-\mathrm I_hu\) and \(\partial _t \varTheta \), invoke Lemma 5 through 9 and argue as in the derivation of (4.4) to obtain

$$\begin{aligned} (\star ) \le {}&\Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + Ch^4 |\partial _t u |_{{\text {H}}^2(\varOmega )}^2 + C h^2\sum _{K \in \mathcal K_h} |\vartheta _h|_{{\text {H}}^1(K)}|\mathrm I_h\partial _t u - \partial _t u + \partial _t u|_{{\text {H}}^1(K)} \\ {}&\quad + C \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + C h^2 |u|^2_{{\text {H}}^2(\varOmega )} + \lambda \sum _{\varGamma \in {\mathcal {F}}_{-}} \big ( h_\varGamma \Vert \vartheta _h\Vert _{{\text {L}}^2(\varGamma )}^2 + \frac{1}{h_\varGamma } \Vert \varTheta \Vert _{{\text {L}}^2(\varGamma )}^2 \big )\\ {}&\quad +C \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + Ch |\hat{u}|^2_{{\text {H}}^1(\varGamma _-)} \\ {}&\quad + C \sum _{\varGamma \in {\mathcal {F}}_{-}} h_\varGamma \int _\varGamma \big [\sum _{i=1}^N (\vartheta _i\varphi _i)^2 + |\nabla (\mathrm I_hu - u + u)|^2 \big ]\,\text {d} s + C h \Vert u\Vert ^2_{{\text {H}}^2(\varOmega )}\\ \le {}&\Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + Ch^4 |\partial _t u |_{{\text {H}}^2(\varOmega )}^2 + C h \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + C h^3 |\partial _t u|^2_{{\text {H}}^2(\varOmega )} + C h |\partial _t u|^2_{{\text {H}}^1(\varOmega )} \\ {}&{} \quad + C \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + C h^2 |u|^2_{{\text {H}}^2(\varOmega )} + C \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + C h^2 |u|^2_{{\text {H}}^2(\varOmega )} + C \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 \\ {}&{} \quad + C h |\hat{u}|^2_{{\text {H}}^1(\varGamma _-)} + C \sum _{\varGamma \in {\mathcal {F}}_{-}} h_\varGamma \left( \Vert \vartheta _h\Vert _{{\text {L}}^2(\varGamma )}^2 + \Vert u\Vert _{{\text {H}}^2(\varGamma )}^2 \right) + C h \Vert u\Vert ^2_{{\text {H}}^2(\varOmega )} \\ \le {}&C_1 \Vert \vartheta _h\Vert _{{\text {L}}^2(\varOmega )}^2 + C_2 h \left( \Vert \partial _t u\Vert ^2_{{\text {H}}^2(\varOmega )} + \Vert u\Vert ^2_{{\text {H}}^2(\varOmega )} + \Vert u\Vert _{{\text {H}}^2(\varGamma _-)}^2 + |\hat{u}|^2_{{\text {H}}^1(\varGamma _-)} \right) . \end{aligned}$$

We now integrate in time observing that, by our definition of the discrete initial data, we have \(\vartheta _h(0)\equiv 0\). At this stage, we recall the previously given definitions of q and z, which enables us to write the resulting inequality as

$$\begin{aligned} \Vert \vartheta _h(T)\Vert _{{\textrm{L}}^2(\varOmega )}^2 \le C_1 \int _0^T \Vert \vartheta _h(t)\Vert _{{\textrm{L}}^2(\varOmega )}^2 \,\textrm{d} t+ C_2h\, z(T) - q(T). \end{aligned}$$

Using Grönwall’s Lemma as in [9, Lem. 1.9], we obtain

$$\begin{aligned} \Vert \vartheta _h(T)\Vert _{{\textrm{L}}^2(\varOmega )}^2 \le C_1\int _0^T {\textrm{e}}^{C_1(T-t)}\,(C_2 h\,z(t)-q(t)) \,\textrm{d}t + C_2 h\,z(T) - q(T). \end{aligned}$$

The triangle inequality applied to \(u-u_h = \varTheta + \vartheta _h\) then yields the error estimate (4.1) by Lemma 4. \(\square \)

We conclude the theoretical discussion with a few remarks regarding the derived error estimate. Let us first point out that in the general setting with unspecified correction factors \(\alpha _{ij}\) our result is indeed optimal (cf.  Sect. 5.1). If we set all correction factors equal to zero, we obtain the low order method, which cannot be expected to be more than \(\frac{1}{2}\) order accurate in general.

A drawback of our current approach is that the constant on the right hand side of the a priori error estimate (4.2) depends exponentially on the time T. Kučera and Shu [20] demonstrate that exponentially increasing constants can be avoided in some situations. They discretize the advection equation using discontinuous Galerkin methods and derive an error estimate without invoking Grönwall’s inequality. It would be interesting to investigate the merit of their approach for the purposes of our analysis.

Let us briefly remark that we assumed all integrals appearing in the bilinear and linear forms \(a_h(\cdot ,\cdot )\) and \(b_h(\cdot )\) to be evaluated exactly. In fact, even the energy estimate stated in Proposition 1 was derived under this assumption. For polynomial velocities one can indeed employ a quadrature rule of sufficiently high order to accurately compute all integrals. For general velocities, the theory we present needs to be adapted to include quadrature errors. As is common for linear finite elements [10, Thm. 8.5], we recommend to employ quadrature rules that are exact for polynomials in \({\mathbb {P}}_2\) and \({\mathbb {P}}_3\) for volume and boundary integrals, respectively.

Admittedly, a major limitation of Proposition 2 is the fact that the estimate is valid only for problems with exact solutions of very high regularity. In particular, the assumption that \(\partial _t u\) is \({\textrm{H}}^2\) in space is restrictive. In our opinion, the adaptation of the proofs in [5, 27] to the time-dependent setting necessitates this regularity. One can argue that if the exact solution is smooth enough for Proposition 2 to be applicable, a limiter may not even be needed and we could instead employ a stabilized Galerkin method. Since this strategy does not guarantee the validity of discrete maximum principles, AFC schemes provide an appealing alternative. Therefore, theoretical investigations of these methods should be undertaken. It is hoped that our results may serve as a stepping stone for further efforts in this direction.

5 Numerical examples

Let us now corroborate the theoretical results of this work with numerical experiments. The following acronyms are used to distinguish the numerical methods under investigation

  • LOW: low order method (2.10),

  • MCL: monolithic convex limiting scheme (2.15),

  • target: target scheme, i. e., (2.15) with \(f_{ij}^*=f_{ij}\) for all \(i\in \{1,\ldots ,N\}\), \(j \in {\mathcal {N}}_i \setminus \{i\}\).

Other methods used for comparative purposes are specified below. Recall that our stability and convergence proofs rely on the compatibility condition (3.1). In general, this condition is not fulfilled by the standard MCL approach. However, if \(\dot{u}_h\) is set to zero, i. e., if the mass lumping error is not compensated, (3.1) holds due to Lemma 2. To distinguish between the standard MCL scheme employing low order time derivatives \(\dot{u}_i\) given by (2.14) and the lumped-mass version, in which \(\dot{u}_h \equiv 0\), we employ the acronyms MCL-L and MCL-0, respectively. Here the letter L stands for low order time derivatives, while 0 stands for zero time derivatives.

In all simulations, we choose the time step according to (2.16) by setting \(\varDelta t = \nu \varDelta t_{\max }\), where \(\nu \in (0,1]\) is a user-defined value and \(\varDelta t_{\max }\) is the right hand side of inequality (2.16). Discrete initial conditions are obtained by interpolating the continuous initial datum in the discrete space.

In the following sections, we verify that approximations converge at least as fast as the provable rate of \(\frac{1}{2}\). Moreover, we stress the need for stabilization by the use of low order time derivatives (2.14) and present a comparison of results obtained with various definitions of antidiffusive fluxes in the MCL scheme. Finally, we perform an a posteriori check to see for which values of the parameter \(\gamma \) the compatibility condition (3.1) is satisfied by the MCL-L scheme.

5.1 Experimental orders of convergence

In this section, we solve the one-dimensional advection equation with constant velocity \(v=1\). The spatial domain \(\varOmega =(0,1)\) has periodic boundaries. Thus, at each time instant \(T\in {\mathbb {N}}_0\), the exact solution coincides with the initial condition. In this example, we use \(u_0(x) = \exp (-100 (x-0.5)^2)\).

We study the experimental orders of convergence for discrete upwinding (LOW), MCL-L, and MCL-0 schemes using SSP2-RK time stepping and CFL parameter \(\nu =0.5\). While values as large as \(\nu =1\) can safely be employed without causing violations of maximum principles, smaller values may be necessary to observe certain rates of convergence. Alternatively, SSP3-RK time stepping can be used to improve the temporal accuracy. In this study, we employ sequences of nested meshes with generally nonuniform mesh size h obtained by randomly perturbing the positions of the interior mesh vertices of the coarsest grid. The relative mesh sizes \(\min _{K\in {\mathcal {K}}_h} h_K / h\) of the three sequences are 1 (uniform), \(\approx 0.69\) (mildly perturbed), and \(\approx {}\)0.087 (severely perturbed), respectively. We present the \({\textrm{L}}^2(\varOmega )\) errors at the final time \(T=1\) and the corresponding experimental orders of convergence (EOC) in Tables 1, 2, 3.

Table 1 Convergence history for the one-dimensional advection equation on a sequence of uniform periodic meshes. The \(\Vert \cdot \Vert _{{\textrm{L}}^2(\varOmega )}\) errors at \(T=1\) and the corresponding EOC for \(u_0(x) = \exp (-100 (x-0.5)^2)\)
Table 2 Convergence history for the one-dimensional advection equation on a sequence of mildly perturbed periodic meshes. The \(\Vert \cdot \Vert _{{\textrm{L}}^2(\varOmega )}\) errors at \(T=1\) and the corresponding EOC for \(u_0(x) = \exp (-100 (x-0.5)^2)\)
Table 3 Convergence history for the one-dimensional advection equation on a sequence of severely perturbed periodic meshes. The \(\Vert \cdot \Vert _{{\textrm{L}}^2(\varOmega )}\) errors at \(T=1\) and the corresponding EOC for \(u_0(x) = \exp (-100 (x-0.5)^2)\)

The observed rates are in accordance with our expectations. As suggested by Corollary 1, discrete upwinding converges at least with the rate of \(\frac{1}{2}\). Actually, the low order method becomes first order accurate on very fine uniform meshes. Our preferred MCL-L scheme produces second order accurate results in this test. If no correction of the mass lumping error is performed, the order of accuracy deteriorates, while still exceeding the provable rate of \(\frac{1}{2}\). In this example, the influence of mesh perturbations on the results is insignificant. A decay in the convergence rate of the standard MCL scheme for a steady problem was observed on perturbed 2D meshes in [21, Sec. 6.1].

5.2 On the stabilizing effect of low order time derivatives

Let us now compare the standard Galerkin approach to methods that are stabilized by incorporating low order time derivatives (2.14) via antidiffusive fluxes. We consider the same setup as in the previous section with the exception that the initial condition \(u_0\) is replaced by [15]

$$\begin{aligned} u_0(x) = {\left\{ \begin{array}{ll} 1 &{} \text {if } 0.2 \le x \le 0.4, \\ \exp (10)\exp (\frac{1}{0.5-x})\exp ( \frac{1}{x-0.9}) &{} \text {if } 0.5< x < 0.9, \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(5.1)

This profile features discontinuities as well as a \({\textrm{C}}^{\infty }\) region. In Fig. 1 we display standard continuous Galerkin approximations obtained with four different combinations of time stepping schemes and CFL parameters on a uniform, a mildly perturbed and a severely perturbed mesh with 128 elements in each case. Spurious ripples that are not local to the vicinity of the discontinuities can be observed in all profiles. Although limiters can remove these oscillations, the quality of approximations obtained in this fashion is usually poor compared to solutions obtained with flux limiters applied to a stabilized target discretization (cf.  Sect. 5.3).

Fig. 1
figure 1

One-dimensional advection equation with initial condition (5.1). Consistent Galerkin approximations at \(T=1\) obtained with SSP RK time stepping on periodic meshes consisting of 128 elements

Next, we compute approximations of the stabilized target scheme, i. e., (2.21) with \(\alpha _{ij}= 1\) for all \(i\in \{1,\ldots ,N\}\), \(j \in {\mathcal {N}}_i \setminus \{i\}\). These are compared to the profiles obtained with discrete upwinding (LOW), MCL-L, and MCL-0 schemes. SSP2-RK time stepping with CFL parameter \(\nu =1\) is employed in combination with all spatial semi-discretizations. The results of this study are displayed in Fig. 2.

Fig. 2
figure 2

One-dimensional advection equation with initial condition (5.1). Stabilized Galerkin approximations at \(T=1\) obtained with SSP2-RK time stepping and \(\nu =1\) on periodic meshes consisting of 128 elements

We observe significant improvements in the solution quality for the unlimited target scheme compared to the consistent Galerkin approximations displayed in Fig. 1. Numerical results obtained with LOW, MCL-L and MCL-0 exhibit behavior similar to that observed in Sect. 5.1 In particular, the MCL-0 scheme produces a nonsymmetric profile in the left part of the domain, which can be attributed to dispersive errors that occur commonly if the mass lumping error is not compensated [35].

5.3 Solid body rotation

Let us now apply the MCL scheme to a 2D solid body rotation benchmark [26] in which \(\varOmega =(0,1)^2\), \(\varvec{v}(x,y) = 2\pi \,(0.5-y,x-0.5)^T\), \(\hat{u}= 0\) and

$$\begin{aligned} u_0(x,y)={\left\{ \begin{array}{ll} u_0^{\textrm{cone}}(x,y) &{}\text {if } r(x,y;0.5,0.25)\le r_0, \\ u_0^{\textrm{bump}}(x,y) &{}\text {if } r(x,y;0.25,0.5)\le r_0, \\ 1 &{}\text {if } r(x,y;0.5,0.75)\le r_0 \;\wedge \;\left( |x - 0.5| \ge 0.025 \;\vee \; y\ge 1-r_0\right) ,\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(r(x,y;x_0,y_0) \, {:=}\, \sqrt{(x-x_0)^2 + (y-y_0)^2}\), \(r_0=0.15\), and

$$\begin{aligned} u_0^{\textrm{cone}}(x,y)={}&1-\frac{r(x,y;0.5,0.25)}{r_0},\\ u_0^{\textrm{bump}}(x,y)={}&\frac{1}{4} \left( 1 + \cos \left( \frac{\pi \,r(x,y;0.25,0.5)}{r_0}\right) \right) . \end{aligned}$$

In this example, a cone, a smooth bump and a slotted cylinder rotate around the domain center. At each time instant \(T\in {\mathbb {N}}_0\), the exact solution is equal to the initial condition, which is shown in Fig. 3. The numerical results displayed in this section are visualized with the open source C++ software GLVis [11].

Fig. 3
figure 3

Exact initial condition of the solid body rotation [26] interpolated in \({\textrm{V}}_h\) for a uniform triangular mesh consisting of \(2\cdot 128^2\) elements

We solve this problem numerically using triangular meshes and \(h = c/128\), where \(c=\sqrt{2}\) for uniform grids and \(c=1\) for unstructured ones. For time stepping we employ the SSP2-RK method with constant time steps \(\varDelta t= 5\cdot 10^{-4}\) and \(\varDelta t= 3.125\cdot 10^{-4}\), respectively. In addition to MCL-L and MCL-0 schemes, we test the MCL-G approach, in which the consistent Galerkin time derivative \(\dot{u}_h^{\text{ G }}=\sum _{j=1}^N \dot{u}_j^{\mathrm G} \varphi _j\) defined by

$$\begin{aligned} \sum _{j=1}^N m_{ij}\dot{u}_j^{\text{ G }} = - \sum _{j\in {\mathcal {N}}_i\setminus \{i\}}a_{ij}(u_j -u_i) + \sum _{\varGamma _k \in {\mathcal {F}}_i} b_i^k(\hat{u}_i^k - u_i), \qquad i\in \{1,\ldots ,N\} \end{aligned}$$
(5.2)

is employed to correct the mass lumping error through the antidiffusive fluxes \(f_{ij}= m_{ij}(\dot{u}_i - \dot{u}_j) + d_{ij}(u_i - u_j)\). The numerical results of this study are displayed in Fig. 4 and the approximate \({\textrm{L}}^2(\varOmega )\) errors \(e_2\) at the final time \(T=1\) are presented in the captions along with the maximum solution value \(u_h^{\max }\) for each approximation. The minimum value of each approximation is zero up to machine precision.

Fig. 4
figure 4

Solid body rotation for the 2D advection equation [26]. MCL-L (left), MCL-0 (center), and MCL-G (right) approximations at \(T=1\). Solutions obtained on uniform (top row) and unstructured (bottom row) triangular meshes with SSP2-RK time stepping using \(\varDelta t= 5\cdot 10^{-4}\) and \(\varDelta t= 3.125\cdot 10^{-4}\), respectively

Although all obtained profiles appear to be similar to each other, we can make out some differences by closely examining the results. First, we observe that on the uniform mesh MCL-0 is noticeably more diffusive than MCL-L and MCL-G. In particular, the smooth hump is not well resolved in this approach due to dispersive errors that arise in lumped Galerkin approximations [35]. The unstabilized MCL-G scheme does not suffer from this deficiency and, in fact, produces the smallest approximate error values among the three approaches. However, the lack of stabilization leads to a distortion in the shape of the sharp cone and, on the uniform mesh, a similar feature can be spotted in the region of the slotted cylinder. In the case of the advection equation, this issue does not seem to have a dominating effect on the obtained profiles but, in our experience [14, Sec. 3.4.3.2], the MCL-G scheme produces more pronounced spurious oscillations if applied to more involved problems like the Euler equations of gas dynamics. The quality of approximations obtained in this manner can be improved by employing smaller time steps or higher order SSP-RK methods [21, Fig. 2 (e)]. Nevertheless, some form of stabilization should be used in combination with continuous Galerkin discretizations of hyperbolic problems. Therefore, MCL-L is the preferable option among the three schemes under investigation. Alternative stabilization techniques for standard finite elements can be found in [23, 27] for linear and nonlinear problems, respectively.

5.4 A posteriori compatibility check

The compatibility condition (3.1) turned out to be an invaluable tool for our theoretical investigations. Unfortunately, we are unable to prove that the MCL-L scheme automatically produces compatible pairs \((u_h,\dot{u}_h)\) under suitable assumptions on the mesh. However, it is easy to check for which values of \(\gamma \in (0,1)\) condition (3.1) is fulfilled a posteriori. Indeed, (3.1) is equivalent to

$$\begin{aligned} \gamma \le \frac{d_h(u_h;u_h,u_h) - m_h(u_h;\dot{u}_h,u_h)}{d_h(u_h;u_h,u_h) + \frac{h}{\lambda }m_h(u_h;\dot{u}_h, \dot{u}_h)} \end{aligned}$$
(5.3)

if the denominator in the right hand side of (5.3) is nonzero (it is nonnegative due to Lemma 2). If the numerator in (5.3) is also nonnegative, this criterion yields an upper bound on \(\gamma \). Having calculated these a posteriori bounds via (5.3), one can check how they behave upon mesh refinement. Two issues that lead to a violation of compatibility can occur in practice. First, (5.3) can produce a negative upper bound on \(\gamma \), a case that is not covered by our theory. Secondly, \(\gamma \) may approach zero upon mesh refinement, which would cause the constant in the leading order term of our error estimate to approach infinity. We found the former concern to be valid on perturbed one-dimensional meshes. Using smaller time steps does not resolve incompatibility issues, which seem to be caused by triangulations of bad quality. In such instances, our stability and error estimates are not applicable to MCL-L but remain valid for the LOW and MCL-0 schemes, as well as for the method proposed in [15, Sec. 3.3] that enforces compatibility.

Having performed the described a posteriori check for various test problems, we conjecture that compatibility of \((u_h,\dot{u}_h)\) holds for the MCL-L scheme on uniform meshes. Below we report the results of our experiments in which we compute the values of the right hand side of (5.3). First, we consider four one-dimensional test problems. In each case, the spatial domain \(\varOmega =(0,1)\) is equipped with periodic boundaries and the velocity is \(v=1\). The first and second tests are the same as in Sects. 5.1 and 5.2. In the third and fourth tests, the final time is \(T=0.5\) and the initial conditions read

$$\begin{aligned} u_0(x) = {\left\{ \begin{array}{ll} 0.5\left( 1 + \cos \left( \frac{\pi }{0.15}(x-0.25)\right) \right) &{} \text {if } |x-0.25| < 0.15, \\ 0 &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

and \(u_0(x) = \max \{0, 1-10|x-0.2|\}\), respectively.

We solve each of these problems on a hierarchy of uniform meshes using SSP2-RK time stepping with CFL parameters \(\nu \in \{1, 0.1\}\). The largest value of \(\gamma \) for which (3.1) is satisfied during the whole simulation is presented in Table 4.

Table 4 Maximum values of \(\gamma \) over all time steps for four 1D examples. Results obtained on uniform meshes with SSP2-RK time stepping and CFL parameters \(\nu =1\) (upper half) and \(\nu =0.1\) (lower half)

Additionally, we repeat the solid body rotation test [26] from Sect. 5.3 on sequences of uniform (\(c=\sqrt{2}\)) and unstructured (\(c=1\)) triangular meshes and compute a posteriori values for \(\gamma \) via (5.3). The results of this study are summarized in Table 5, where #TS refers to the total number of employed time steps.

Table 5 Maximum values of \(\gamma \) over all time steps for the solid body rotation [26]. Results obtained on uniform and unstructured meshes with SSP2-RK time stepping and constant time steps

We observe slightly larger maximum values of \(\gamma \) on coarse girds than on fine meshes. The use of smaller time steps seems to have marginal influence on the results. In all cases, we have \(\gamma >0.4\), which is consistent to the value that we used in [15] to enforce (3.1). Contrary to the 1D case, (5.3) does not produce negative values for \(\gamma \) even on unstructured meshes in 2D. This observation leads us to believe that the low order time derivative \(\dot{u}_h\) given by (2.14) is compatible to \(u_h\) even for a certain class of unstructured meshes. Further theoretical and numerical studies are required to pinpoint, exactly which conditions a sequence of unstructured meshes has to satisfy in order to produce compatible pairs \((u_h,\dot{u}_h)\) via the standard MCL approach. The opposite point of view is that the compatibility condition (3.1) can actually be used to determine the mesh quality for mesh optimization purposes. Feasibility and benefits of this approach are yet to be determined.

6 Conclusions

We performed numerical analysis for a discretization of the linear advection equation based on an algebraic flux correction scheme. The employed monolithic convex limiting technique is a semi-discrete approach that enforces discrete maximum principles in fully discrete discretizations based on strong stability preserving Runge–Kutta methods. Outcomes of the conducted research include a stability and an a priori error estimate in the semi-discrete setting. To prove that the scheme converges with a rate of at least \(\frac{1}{2}\), we formulated a compatibility condition for the discrete solution and corresponding approximate time derivatives. It is possible to enforce such constraints via additional limiting. However, our numerical examples indicate that the original MCL scheme produces essentially compatible functions if the antidiffusive fluxes are stabilized, e. g., using a low order approximation (2.14) to the nodal time derivatives.

It is hoped that the ideas presented in this work can be used for analysis of fully discrete problems and extended to nonlinear conservation laws, hopefully even systems like the Euler equations of gas dynamics. Other interesting avenues to explore in future studies include analysis of AFC schemes for other target discretizations, such as discontinuous Galerkin methods and/or higher order finite elements. Moreover, the aspects of inexact numerical integration may need to be taken into account. We invite the interested reader to participate in these research endeavors.