UvA-DARE (Digital Academic Stability of Galerkin discretizations of a mixed space–time variational formulation of parabolic evolution equations

A BSTRACT . Weanalyze Galerkin discretizations of anew well-posed mixedspace- time variational formulation of parabolic PDEs. For suitable pairs of ﬁnite element trial spaces, theresulting Galerkin operators are shown to be uniformly stable. The method is compared to two related space-time discretization methods introduced in [ IMA J. Numer. Anal. , 33(1) (2013), pp. 242-260] by R. Andreev and in [ Comput. Methods Appl. Math. , 15(4) (2015), pp. 551-566] by O. Steinbach.


INTRODUCTION
In recent years one witnesses a rapidly growing interest in simultaneous spacetime methods for solving parabolic evolution equations originally introduced in [BJ89,BJ90], see e.g. [GK11, And13, UP14, Ste15, GN16, LMN16, SS17, DS18, NS19, RS18, VR18, SZ18, FK19]. Compared to classical time marching methods, spacetime methods are much better suited for a massively parallel implementation, and have the potential to drive adaptivity simultaneously in space and time.
Apart from the first order system least squares formulation recently introduced in [FK19], the known well-posed simultaneous space-time variational formulations of parabolic equations in terms of partial differential operators only, so not involving non-local operators, are not coercive. As a consequence, it is non-trivial to find families of pairs of discrete trial-and test-spaces for which the resulting Petrov-Galerkin discretizations are uniformly stable. The latter is a sufficient and, as we will see, necessary condition for the Petrov-Galerkin approximations to be quasi-optimal, i.e., to yield an up to a constant factor best approximation to the solution from the trial space. This concept has to be contrasted to rate optimality that, for quasi-uniform temporal and spatial partitions, has been shown for any reasonable numerical scheme under the assumption of sufficient regularity of the solution.
If one allows different spatial meshes at different times, then for the classical time marching schemes quasi-optimality of the numerical approximations is known not to be guaranteed as demonstrated in [Dup82,Sect. 4].
In view of the difficulty in constructing stable pairs of trial-and test-spaces, in [And13] Andreev considered minimal residual Petrov-Galerkin discretizations.
They have an equivalent interpretation as Galerkin discretizations of an extended self-adjoint mixed system, with the Riesz lift of the residual of the primal variable being the secondary variable. This is the point of view we will take.
A different path was followed by Steinbach in [Ste15]. Assuming a homogenous initial condition, for equal test and trial finite element spaces w.r.t. fully general finite element meshes, there stability was shown w.r.t. a weaker mesh-dependent norm on the trial space. As we will see, however, this has the consequence that for some solutions of the parabolic problem these Galerkin approximations are far from being quasi-optimal w.r.t. the natural mesh-independent norm on the trial space.
In the current work, we modify Andreev's approach by considering an equivalent but simpler mixed system that we construct from a space-time variational formulation that follows from applying the Brézis-Ekeland-Nayroles principle [BE76,Nay76]. With the same trial space for the primal variable, we show stability of the Galerkin discretization of this mixed system whilst utilizing a smaller trial space for the secondary variable. In addition, the stiffness matrix resulting from this mixed system is more sparse. In our numerical experiments the errors in the Galerkin solutions are nevertheless very comparable.
1.1. Organization. In Sect. 2 we derive the two self-adjoint mixed system formulations of the parabolic problem that are central in this work. In Sect. 3 we give sufficient conditions for stability of Galerkin discretizations for both systems. We provide an a priori error bound for the Galerkin discretization of the newly introduced system, and improved a priori error bounds for the methods from [And13] and [Ste15]. In Sect 4, we show that the crucial condition for stability (being the only condition for the newly introduced mixed system) is satisfied for prismatic space-time finite elements whenever the generally non-uniform partition in time is independent of the spatial location, and the generally non-uniform spatial mesh in each time slab is such that the corresponding L 2 -orthogonal projection is uniformly H 1 -stable. In Sect. 5 we present some first simple numerical experiments for a one-dimensional spatial domain and uniform meshes. Conclusions are presented in Sect. 6.
1.2. Notations. In this work, by C D we will mean that C can be bounded by a multiple of D, independently of parameters which C and D may depend on. Obviously, C D is defined as D C, and C D as C D and C D.
For normed linear spaces E and F, by L(E, F) we will denote the normed linear space of bounded linear mappings E → F, and by Lis(E, F) its subset of boundedly invertible linear mappings E → F. We write E ֒→ F to denote that E is continuously embedded into F. For simplicity only, we exclusively consider linear spaces over the scalar field R.

SPACE-TIME FORMULATIONS OF THE PARABOLIC EVOLUTION PROBLEM
Let V, H be separable Hilbert spaces of functions on some "spatial domain" such that V ֒→ H with dense and compact embedding. Identifying H with its dual, we obtain the Gelfand triple V ֒→ H ≃ H ′ ֒→ V ′ .
We use the notation ·, · to denote both the scalar product on H × H, and its unique extension by continuity to the duality pairing on V ′ × V. Correspondingly, the norm on H will be denoted by .
For a.e. t ∈ I := (0, T), let a(t; ·, ·) denote a bilinear form on V × V such that for any η, ζ ∈ V, t → a(t; η, ζ) is measurable on I, and such that for a.e. t ∈ I, we are interested in solving the parabolic initial value problem to finding u such that is not coercive but only satisfies a Gårding inequality a(t; η, η) + ̺ η, η η 2 V (η ∈ V), then one can consider a transformed problem such that (2.2) is valid.
In a simultaneous space-time variational formulation, the parabolic PDE reads as finding u from a suitable space of functions of time and space such that for all v from another suitable space of functions of time and space. One possibility to enforce the initial condition is by testing it against additional test functions. A proof of the following result can be found in [SS09] where for t ∈Ī, γ t : u → u(t, ·) denotes the trace map. That is, assuming g ∈ Y ′ and u 0 ∈ H, finding u ∈ X such that is a well-posed variational formulation of (2.3).
One ingredient of the proof of this theorem is the continuity of the embedding X ֒→ C(Ī, H), in particular implying that for any t ∈Ī, γ t ∈ L(X, H).
an equivalent well-posed variational formulation of the parabolic PDE is obtained by applying the so-called Brézis-Ekeland-Nayroles variational principle [BE76,Nay76], cf. also [And12,§3.2.4]. It reads as where the operator at the left hand side is in Lis(X, X ′ ), is self-adjoint and coercive.
We provide a direct proof of these facts. Since an equivalent formulation of (2.5) as a self-adjoint saddle point equation reads as finding (µ, σ, u) ∈ Y × H × X (where µ and σ will be zero) such that Thanks to (2.5), this Schur complement B ′ A −1 s B + γ ′ 0 γ 0 is in Lis(X, X ′ ), is selfadjoint and coercive.
We show that (2.9) and (2.7) are equal. Recalling the definitions of C and ∂ t , note that the right-hand sides of both equations are the same, and that The proof of our claim is completed by noting that for w, v ∈ X, As (2.9) was obtained as the Schur complement equation of (2.8), in its form (2.7) it is naturally obtained as the Schur complement of the problem of finding (λ, u) ∈ Y × X such that Knowing that its Schur complement is in Lis(X, X ′ ), A s ∈ Lis(Y, Y ′ ), and C ∈ L(X, Y ′ ), we infer that the self-adjoint operator at the left hand side of (2.10) is in Lis(Y × X, Y ′ × X ′ ). Substituting C = B − A s and Bu = g, we find that the secondary variable satisfies λ = u.
Remark 2.3. When reading γ ′ T γ T as ∂ t + ∂ ′ t + γ ′ 0 γ 0 , the system (2.10) has remarkable similarities to a certain preconditioned version presented in [NS19] of a discretized parabolic PDE using the implicit Euler method in time. Ideas concerning optimal preconditioning developed in that paper, as well as those in [And16], can be expected to be applicable to Galerkin discretizations of (2.10).
Remark 2.4. In equations (2.8) and (2.9), the operator A s can be replaced by a general self-adjointÃ s ∈ Lis(Y, Y ′ ). WithC := B −Ã s , the equivalent equation (2.7) then reads as and (2.10) as In the next section, we study Galerkin discretizations of equations (2.8) and (2.10), which then are no longer equivalent.
Since the secondary variables µ and σ in (2.8) are zero, the subspaces for their approximation do not have to satisfy any approximation properties. Since the secondary variable λ in (2.10) is non-zero, the subspace of Y for its approximation has to satisfy approximation properties, and the error in its best approximation enters the upper bound for the error in the primal variable u.
On the other hand, (uniform) stability will be easier to realize with equation (2.10) and will also be proven to hold true for A a = 0; the system matrix will be more sparse; and the number of unknowns will be smaller.
In order to facilitate the derivation of some quantitative results, we will equip the spaces Y and X with the 'energy-norms' defined by which are equivalent to the standard norms on these spaces. Correspondingly, orthogonality in Y will be interpreted w.r.t. the 'energy scalar product' (A s ·)(·).

STABLE DISCRETIZATIONS OF THE PARABOLIC PROBLEM
3.1. Uniformly stable (Petrov-) Galerkin discretizations and quasi-optimal approximations. This subsection is devoted to proving the following theorem. [Kat60,XZ03]). We obtain that

Theorem 3.1. Let W and Z be Hilbert spaces, and F
(3.1) It remains to show uniform boundedness of P δ L(Z,Z) if and only if uniform stability is valid.
The definition of P δ shows that Further, we have that where the last equality follows from E δ W ′ L(W ′ ,W δ ′ ) ≤ 1 and, for the other direction, from the fact that for given The proof is completed by Remark 3.2. In particular above analysis provides a short self-contained proof of the quantitative results that were established earlier in [TV16, §2.1, in particular (2.12)].
3.2. Uniformly stable Galerkin discretizations of (2.10). Let Y δ × X δ be a closed subspace of Y × X, and let E δ We conclude that this Galerkin operator is invertible if and only if the Schur complement and let Then with λ = u and (λ δ , u δ ) denoting the solutions of (2.10) and its Galerkin discretization, respectively, it holds that Proof. In view of the second inequality presented in Remark 3.2, we start with bounding the norm of the continuous operator. Using Young's inequality, for 1 Here and in the following, inf {u∈X δ : should be read as 1 in the case To bound, in view of (3.2), the norm of the inverse of the Galerkin operator, we use the block-LDU factorization (3.3). With r : -norm of the inverse of the first factor at the right-hand side of (3.3) satisfies the same bound.
Moving to the second factor, we consider the Schur complement operator.
where we used that 1 ∆ by definition of ρ ∆ . One easily verifies (3.7) also in the case that A a = 0 i.e. ρ ∆ = 0.
∆ . In view of the second inequality presented in Remark 3.2 in combination with (3.2), the proof is completed by collecting the bounds that were derived.
3.3. Galerkin discretizations of (2.8). Although it is likely possible to generalize results to the case of A a = 0, as in [And13,Ste15] in this section we operate under the condition that Following [Ste15], for a given closed subspace Y δ ⊆ Y we define the 'meshdependent' norm on X by Note that X,Y = X . The following result generalizes the 'inf-sup identity', known for Y δ = Y, see e.g. [ESV17], to mesh-dependent norms.
If additionally γ 0 u ∈ H δ , then The second statement follows from The next theorem gives sufficient conditions for existence and uniqueness of solutions of the Galerkin discretization of (2.8), and provides a suboptimal error estimate.
Theorem 3.5 can be used to demonstrate optimal rates for the error in u δ in the X,Y δ -norm, and hence also in the Y-norm. Yet, for doing so one needs to control the error of best approximation in the generally strictly stronger X -norm, which requires regularity conditions on the solution u that exceeds those that are needed to guarantee optimal rates of the best approximation in the X,Y δ -norm. In other words, this theorem does not show that u δ is a quasi-best approximation to u from X δ in the X,Y δ -norm, or in any other norm. Remark 3.6. Theorem 3.5 provides a generalization, with an improved constant, of Steinbach's result [Ste15, Theorem 3.2]. There the case was considered that the initial value u 0 = 0, ran γ 0 | X δ = {0}, H δ = {0}, and Y δ = X δ . In that case the Galerkin discretization of (2.8) means solving u δ ∈ X δ from (Bu δ )(v) = g(v) (v ∈ X δ ) (indeed, Z δ in the proof of Theorem 3.5 is X δ × {0}). So with this approach the forming of 'normal equations' as in (2.9) is avoided.
In case of an inhomogeneous initial value u 0 ∈ H, one may approximate the solution asū + w δ , whereū ∈ X is such that γ 0ū = u 0 , and w δ ∈ X δ solves 2 In the (discontinuous) Petrov-Galerkin community, Y δ × H δ and Z δ are known under the names test search space (or search test space), and projected optimal test space (or approximate optimal test space), respectively.
. Although such aū ∈ X always exists, its practical construction becomes inconvenient for u 0 ∈ V. For u 0 ∈ V,ū can be taken as its constant extension in time.
To investigate in the setting of [Ste15] the relation between the X,X δ -and Xnorms, we consider X δ of the form X δ t ⊗ X δ x , where X δ t is the space of continuous piecewise linears, zero at t = 0, w.r.t. a uniform partition of I with mesh-size For some arbitrary, fixed 0 By substituting these estimates in the right-hand side of (3.12), we find that its As follows from the first inequality in Remark 3.2, this means that there exist solutions u ∈ X of the parabolic problem for which the errors in X-norm in these Galerkin approximations from X δ are a factor h − 1 2 δ larger than these errors in the best approximations from X δ .
Numerical evidence provided by [Ste15,Table 6] indicate that in general these Galerkin approximations are not quasi-optimal in the Y-norm either.
Returning to the general setting of Theorem 3.5, in the following theorem it will be shown that under an additional assumption quasi-optimal error estimates are valid.
Proof. As we have seen in the proof of Theorem 3.5, thanks to the assumptions X δ ⊆ Y δ and ran γ 0 | X δ ⊆ H δ , the component u δ ∈ X δ of the Galerkin solution of (2.8) is the Petrov-Galerkin solution of (2.6) with test space Z δ ⊂ Y δ × H δ . Equation (3.11) shows that the projector P δ : u → u δ satisfies P δ u X,Y δ ≤ u X . The proof is completed by X ≤ γ −1 ∆ X,Y δ on X δ by assumption (3.5), in combination with (3.1).
In [And13], Andreev studied minimal residual Petrov-Galerkin discretizations Remark 3.8. As was pointed out earlier in [And13], for practical computations it can be attractive to modify the Galerkin discretization of (2.8) by replacing E δ whose inverse can be determined cheaply (a preconditioner) 3 , such that for some constants 0 < c N ≤ C N < ∞, Indeed, in that case one can solve the then explicitly available Schur complement equation with precondition CG, instead of applying the preconditioned MINRES iteration. By redefining in the proof of Theorem 3.5, and by taking W δ to be its orthogonal complement in Y δ × H δ with Y δ 3 For Galerkin discretizations of (2.10), such a replacement of E δ Y ′ A s E δ Y by an equivalent operator will result in an inconsistent discretization. now being equipped with inner product (Ã δ s ·)(·), instead of (3.11) we now estimate for anyū δ ∈ X δ , Consequently, a generalization of the statement of Theorem 3.5 reads as and that of Theorem 3.7 as Remark 3.9. As we have seen in the previous section, under the condition that (3.5) is valid, Galerkin discretizations of (2.10) yield quasi-optimal approximations. Assuming A = A ′ , in the current section we have seen that the same holds true for Galerkin discretizations of (2.8) when in addition X δ ⊆ Y δ and ran γ 0 | X δ ⊆ H δ . For the latter discretization, however, a still suboptimal error bound is valid without assuming (3.5). This raises the question whether this is also true for Galerkin discretizations of (2.10).
As we have seen earlier, the Galerkin operator resulting from of (2.10) is invertible whenever X δ = {0}. Moreover, when equipping X δ with the 'meshdependent' norm X,Y δ , by adapting the proof of Theorem 3.3 one can show that the Galerkin operator is in Lis(Y δ × X δ , Y δ ′ × X δ ′ ) with both the operator and its inverse having a uniformly bounded norm. Despite this result, we could not establish, however, a suboptimal error estimate similar to Theorem 3.5.
Finally in this section we comment on the implementation of the Galerkin discretization of (2.8). This system reads as By eliminating σ δ , it is equivalent to the assumption that ran γ 0 | X δ ⊆ H δ which was made in Theorem 3.7, it can be omitted, or equivalently, it can be pretended that H δ = H, without changing the solution (µ δ , u δ ). The implementation of the resulting system is easier, and it runs more efficiently than (3.13).
Remark 3.10. The system (3.15) can be viewed as a Galerkin discretisation of but for the analysis of the discretization error in (µ δ , u δ ) it is still useful to view (3.15) before elimination of σ δ , as a Galerkin discretization of (2.8) which yielded the sharp bound on this error presented in Theorem 3.7.

REALIZATION OF THE UNIFORM INF-SUP STABILITY (3.5)
In Theorem 3.3 it was shown that Galerkin discretizations of (2.10) are quasioptimal when (3.5) is valid, and in Theorem 3.7 the same was shown for Galerkin discretizations of (2.8) when in addition X δ ⊆ Y δ and ran γ 0 | X δ ⊆ H δ (and A = A s ) are valid.
In this section we realize the condition (3.5) for finite element spaces w.r.t. partitions of the space-time domain into prismatic elements. In §4.1 generally nonuniform partitions are considered for which the partition in time is independent of the spatial location, and the spatial mesh in each time slab is such that the corresponding H-orthogonal projection is uniformly V-stable. In §4.2 we revisit the special case, already studied in [And13], of trial spaces that are tensor products of temporal and spatial trial spaces.
Proof. In [And13, Lemma 6.2] it was shown that inf 0 =u∈X With P n denoting the Legendre polynomial of degree n, extended with zero outside (−1, 1), for any u ∈ X δ , ∂ t u can be written as the L 2 (I; H)-orthogonal ex- which implies the result.
Remark 4.2. In view of Theorem 3.7, note that both X δ ⊂ Y δ and (3.5) are valid by Considering the condition on the collection O of spatial trial spaces X x , let us consider the typical situation that H = L 2 (Ω), V = H 1 0,γ (Ω) = {u ∈ H 1 (Ω) : u = 0 on γ} where Ω ⊂ R d is a bounded polytopal domain, and γ is a measurable, closed, possibly empty subset of ∂Ω. We consider X x ⊂ V to be finite element spaces of some degree w.r.t. a family of uniformly shape regular, and, say, conforming partitions T of Ω into, say, d-simplices, where γ is the union of some (d − 1)-faces of S ∈ T . When the partitions in this family are quasi-uniform, then using e.g. the Scott-Zhang quasi-interpolator ( [SZ90]), it is easy to demonstrate the so-called (uniform) simultaneous approximation property Writing for u ∈ V and any v ∈ X x , The uniform boundedness of Q x L(V,V) is, however, by no means restricted to families of finite element spaces w.r.t. quasi-uniform partitions, and it has been demonstrated for families of locally refined partitions, for d = 2 including those that are generated by the newest vertex bisection algorithm. We refer to [Car02,GHS16]. 4.2. Non-uniform approximation in space global in time, non-uniform approximation in time global in space. If in Theorem 4.1, the spatial trial spaces X i x are independent of the temporal interval (t i , t i+1 ), then X δ is a tensor product of trial spaces in space and time. In that case, one shows inf-sup stability for general temporal trial spaces, e.g. spline spaces with more global smoothness than continuity.
The proof of this result follows from the fact that thanks to the Kronecker product structure of ∂ t ∈ L(X, Y ′ ), for such trial spaces we have (To see this, one may use that for Hilbert spaces U and V, T ∈ L(U, V ′ ), and Riesz mappings R U : being self-adjoint and non-negative. In the above setting, it is a Kronecker product of corresponding operators acting in the 'time' and 'space' direction, respectively.) the space of discontinuous piecewise linears w.r.t. T , we will report on results obtained with this alternative choice for Y t .

NUMERICAL EXPERIMENTS
For the simplest possible case of the heat equation in one space dimension discretized using as 'primal' trial space X δ the space of continuous piecewise bilinears w.r.t. a uniform partition into squares, we compare the accuracy of approximations provided by the newly proposed method (i.e. the Galerkin discretization of (2.10) with trial space here denoted by Y δ new × X δ ) with those obtained with the method from [And13] (i.e. the Galerkin discretization of (2.8)). We implement the latter method in the form (3.15), i.e. after eliminating σ δ . The remaining trial space is denoted here by Y δ Andr. × X δ . So we take T = 1, i.e. I = (0, 1), and with Ω := (0, 1), The total number of nonzeros in the whole system matrix of the new method is asymptotically a factor 2 smaller than this number for Andreev's method.
Prescribing both a smooth exact solution u(t, x) = e −2t sin πx and a singular one u(t, x) = e −2t |t − x| sin πx, Figure 1 shows the errors e δ := u − u δ in X-norm as a function of dim X δ . The norms of the errors in the Galerkin solutions found by the two methods are nearly indistinguishable from one another. Furthermore, the observed convergence rates 1/2 and 1/4, respectively, are the best possible ones that in view of the polynomial degrees of X δ and Y δ (new method) or that of X δ (Andreev's method) and the regularity of the solutions can be expected with the application of uniform meshes. (For any ε > 0, e −2t |t − x| sin πx ∈ H 3 2 −ε (I × Ω) \ H 3 2 (I × Ω)). For both solutions and both numerical methods, the errors e δ (T, ·) measured in L 2 (Ω) converge with the better rate 1, i.e., these errors are asymptotically proportional to h 2 x = h 2 t , see left picture in Figure 2. To illustrate that the two methods yield different Galerkin solutions, we show e δ (0, ·), measured in L 2 (Ω)-norm in the right of Figure 2. The new method actually yields two approximations for u, viz. u δ and λ δ . This secondary approximation is not in X, but it is in Y = L 2 (I; V). For both solutions, the errors in λ δ measured in Y-norm are slightly larger than in those in u δ , see left picture in Figure 3.
Finally, we replaced the symmetric spatial diffusion operator by a nonsymmetric convection-diffusion operator a(t; η, ζ) := Ω η ′ ζ ′ + βη ′ ζdx. Letting β := 100 and again taking the singular solution u(t, x) = e −2t |t − x| sin πx, the errors e δ in X-norm of both Galerkin solutions vs. dim X δ are given in Figure 3. We once again see that the two methods show very comparable convergence behaviour.  Left: e δ Y and u − λ δ Y vs. dim X δ for the symmetric problem. Right: e δ X vs. dim X δ for the nonsymmetric problem.

CONCLUSION
Three related (Petrov-) Galerkin discretizations of space-time variational formulations were analyzed. The Galerkin scheme introduced by Steinbach in [Ste15] has the lowest computational cost, and applies on general space-time meshes, but depending on the exact solution, the numerical solutions can be far from quasioptimal in the natural mesh-independent norm. The minimal residual Petrov-Galerkin discretization introduced by Andreev in [And13] yields for suitable trial and test pairs quasi-optimal approximations from the trial space. For suitable pairs of trial spaces, Galerkin discretizations of a newly introduced mixed space-time variational formulation also yield quasi-optimal approximations, but for the same accuracy at a lower computational cost than with the method from [And13].