Numerical Results for Adaptive (negative norm) Numerical Results for Adaptive (negative norm) Constrained First Order System Least Squares Constrained First Order System Least Squares Formulations Formulations

We perform a followup computational study of the recently proposed space–time first order system least squares ( FOSLS ) method subject to constraints referred to as CFOSLS where we now combine it with the new capability we have developed, namely, parallel adaptive mesh refinement (AMR) in 4D. The AMR is needed to alleviate the high memory demand in the combined space time domain and also allows general (4D) meshes that better follow the physics in space–time. With an extensive set of computational experiments, performed in parallel, we demonstrate the feasibility of the combined space–time AMR approach in both two space plus time and three space plus time dimensions. © 2020TheAuthor


Introduction
This study is a continuation of previous work, [1][2][3] for discretizations and solvers of time-dependent problems discretized in combined space-time domain.We pursue a constrained first order systems least-squares approach, referred to as CFOSLS [4], which in addition to the more classical FOSLS [5][6][7], imposes certain equations (such as divergence) as constraints for better conservation properties of the discretized problem.
There has been a substantial interest for combined space-time approaches, especially in the solvers community motivated by the parallel-in-time approach for solving time-dependent PDEs, which is viewed as a feasible approach for better utilization of the computational power of the next generation (exascale) parallel computers.
We build upon the developed general classes of 4D finite element spaces (in fact for the whole 4D de Rham sequence) available in the highly scalable finite element software library MFEM [8].The results in the present paper are natural followup extension of the results from [3] where now we utilize our newly developed capability of AMR (adaptive mesh refinement) for the same general classes of 4D finite element spaces.The AMR allows the use of general meshes in 4D without having the notion of time-stepping, thus having high resolution only in directions of high variation of the physical quantities modeled by the time dependent PDEs of interest.In this study, we consider parabolic and hyperbolic (transport) PDEs.In the CFOSLS formulation we also exploit the use of weaker, negative Sobolev norms to impose less smoothness requirements on the solution.
Overall, our results indicate that the combined space-time AMR CFOSLS approach is feasible; it does capture the physics for problems with anisotropy and generates meshes with no time-stepping behavior, which in principle is an indication for alleviating the global CFL-condition (in the case of transport equation).
The CFOSLS approach though, as noted in [3], poses the challenge to the solvers since the FOSLS functionals are not uniformly elliptic and standard multigrid (with geometric coarse spaces) is not algorithmically scalable although our tests do show reasonable performance (in timings).This challenge needs to be addressed further, with possibly exploiting ideas from adaptive AMG that can detect problem anisotropies (cf., e.g., [9] and [10]) The remainder of the paper consists of the following sections.In Section 2 we briefly describe the model problems we consider in this paper.In Section 3, we define the space-time CFOSLS problem and derive a variational formulation.In Section 4, we discretize the variational problems, and introduce the negative norm CFOSLS formulation.In Section 5, we present solvers for the linear systems arising from the finite element discretization.Section 6 contains three numerical experiments we conducted, two examples for the parabolic model problem and one for the transport problem.Finally, in Section 7, we draw some conclusions and provide an outlook on future work.

Model time dependent PDEs used in our study
Throughout this paper, we consider the following model evolution problem: Find u such that where Σ 0 = Ω × {0}, Σ = ∂Ω × (0, T ), L x (•) is a spatial differential operator, and tr(•) represents a trace operator corresponding to the boundary condition(s).We rewrite this problem using the space-time divergence div x,t into with the space-time differential operator L(u) = [L x (u) u] ⊤ and a (possibly combined) space-time boundary condition (B.C.).We will consider two applications of the model problem (1), the parabolic heat equation i.e., L x (u) = −∇ x u, and the hyperbolic transport equation i.e., L x (u) = β u, and where Σ in = {x ∈ ∂Ω : β(x) • ⃗ n x (x) < 0} × (0, T ).

Space-time CFOSLS
In this section, we will describe the space-time least squares formulations, following the work of Voronin et al. [3].
In the following, we will make use of the function spaces: the space of square integrable functions Moreover, we need the following spaces with boundary conditions The corresponding FOSLS-functional is given by Then, the constrained first order system least squares (CFOSLS) problem is given by: find Note that, as we have the divergence constraint, an option could be to omit the second term in the FOSLS functional J .We keep though the second term of J for two reasons.
First is that for the remainder of the paper we impose the constraints weakly.Secondly, we use the localized functionals as indicators for adaptive refinement since the information from the data f is beneficial in that case.

Variational CFOSLS formulation
We introduce the notation (σ, u) = σ − L(u).The first order, or Karush-Kuhn-Tucker (KKT), optimality conditions for the constrained optimization problem (6) are given by the saddle point problem: find (σ, u, λ) We assume that the primal formulation (1) is well-posed, hence the above saddle point problem will possess at least one solution in appropriate function spaces for the solution u and σ, as well as for the Lagrange multipliers λ.The finite element discretization problem will also have at least one solution (if we do not overconstrain it), since it is a nonnegative quadratic constrained minimization problem in a finite dimensional space.To ensure uniqueness of the solution of the discrete problems, we can add some regularization, a topic touched upon later in Section 4.2.We note that these important theoretical issues are not addressed any further in the present paper.
We note that in (7), we look for a solution u ∈ V ⊂ H 1 (Q ), which may be too restrictive.For instance in case of the transport equation, the operator L(u) does not include any derivatives, hence it is enough to look also for solutions u ∈ W .Moreover, we can eliminate one variable from the system.In order to do that, we first deduce that we can express u in terms of σ by a simple manipulation of the first equation in (5), i.e., Inserting the above term into the FOSLS-functional J (σ, u) results in a slightly modified CFOSLS problem: find σ ∈ R such that with J (τ) := ∥K τ∥ 2 0 + ∥div x,t (τ) − f ∥ 2 0 and The resulting variational saddle point problem reads: find (σ, λ) ∈ R × W such that Remark.As already remarked in [3], we have to be careful with the above matrix K .The matrix is rank-deficient in any point (x, t) (as K • b ≡ 0), but the corresponding global operator is non-singular on the kernel of the divergence operator.This is due to the assumption that the underlying differential problem is well-posed.However, the rank deficiency will definitely influence the condition number of the resulting linear system.

Finite element discretization
In this section, we will discretize the saddle point systems from the previous section using the finite element method (FEM).We need a conforming subdivision T h of the space-time cylinder Q into (d + 1)-dimensional simplices, i.e., and we define the space-time element diameter h τ = diam(τ ), τ ∈ T h , and h = max τ ∈T h h τ .Moreover, we need the following finite element function spaces: V h ⊂ V is the space of globally continuous and piecewise polynomial functions, R h ⊂ R is the space of H(div)-conforming functions, and W h ⊂ W is the space of piecewise polynomial functions.The space R h is also referred to as a Raviart-Thomas (RT) space.Applying the usual (mixed) finite element discretization to the systems ( 7) and ( 9) results in the linear systems and respectively.Here σ h , u h , λ h are the coefficient vectors associated with the finite element functions σ h , u h , λ h , respectively.The matrices M, G, X , and K result from the discretization of the operators A(σ, u) and K σ , respectively, and D is the discrete space-time divergence operator.

Negative norm CFOSLS
Instead of minimizing the L 2 -norm of the two equations in the first order system (5), we can define an alternative FOSLS-functional, while keeping the constraint, i.e., min (σ,u)∈R×V where ∥.∥ −s denotes the norm of the negative index Sobolev space H −s , 0 < s < 1.The finite element counterpart of (12) reads similarly, i.e., min with J as above.In order to obtain a computationally feasible problem, we introduce two additional constraints, where Q h is the L 2 -projection on the space of piecewise constants w.r.t. the elements of the mesh T h , i.e., Note that the second additional constraint (14b) is automatically satisfied by the already existing divergence constraint, hence we can omit it.We refer to ( 13)-(14a) as negative norm CFOSLS.We will now derive a computationally feasible formulation of ( 13)-(14a).Using (14a) and the well known standard error estimate, we obtain for the first term in the negative norm functional Furthermore, as we are dealing with finite element functions, we can apply the inverse inequality [11,Thm. 4.6] with α = 0, and obtain Summing up over all τ ∈ T h , we can deduce the bound Deriving analogous bounds for the divergence constraint, and combining them with ( 15) and ( 16), we obtain an equivalent, but computationally feasible formulation of (13), i.e.
(σ h , u h ) = arg min with the locally scaled least squares functional ) . ( The saddle point problems corresponding to the negative norm formulations of ( 6) and ( 9) read then respectively.Here, subscript h indicates that the local element matrices and vectors have been scaled by h 2s τ , and E, F originate from the discretization of the additional constraint (14a).

Alternative formulation for the transport problem
We already mentioned in Section 3.1, that in general there is no guarantee that the discretized problem has a unique solution.However using regularization, we can indeed find a unique solution.In particular, we look for a solution with minimal oscillations, i.e., minimal L 2 norm of its gradient, which can also be interpreted as adding artificial viscosity to the least squares functional ∥ div x,t (bv) − f ∥ 2 0 , which leads to the following constrained least squares problem: with the constraint enforced weakly.Later, we will introduce some positive regularization parameter ε > 0. The choice of ε is not much of an issue, as we will show in a moment.We comment, that more sophisticated regularization approaches are of interest, such as for example, the L 1 minimization of the gradient of u, which however leads to a nonlinear method.We do not pursue the L 1 -norm minimization in the present paper.
At any rate based on formulation (21), proceeding as before, we define the bilinear forms, derive the KKT system, and after discretization, we obtain the saddle point system of equations where L is a discrete Laplace operator, which we can locally scale by ε.

Linear solvers for CFOSLS
In contrast to FOSLS-discretizations, which result in a system of linear equations with a positive definite system matrix, we have to deal with an indefinite system matrix in the case of CFOSLS.Hence, in general we cannot use the conjugated gradient method (CG), but we have to use some solver suitable for indefinite problems, e.g. the minimal residual method (MINRES) or the (flexible) generalized minimal residual method (GMRES).In order to speed up the convergence of the iterative solver, we will employ different preconditioning strategies, which we describe in the following subsections.

Block diagonal preconditioners
One straightforward method of preconditioning the linear systems (7) and ( 9) is to use the following block diagonal preconditioners and, , respectively, where The choice for the block preconditioners can be made w.r.t. the discretized problem, i.e., for the heat equation, we choose B M to be a Jacobi-smoother, and B X and B S(M) to be algebraic multigrid (AMG) preconditioners.
In case of negative norm CFOSLS, the saddle point system (19) has an additional block.We again formulate a Schur complement preconditioner for the last block, i.e.

⎡ ⎢ ⎣
We can derive an analogous preconditioner for (20), given by with B S 1 and B S 2 defined correspondingly.

Monolithic geometric multigrid preconditioners
Instead of treating the preconditioning of each block in (7) separately, we can apply a geometric multigrid method to the whole system.Let {P (k) R h } l k=0 , {P (k) V h } l k=0 and {P (k) W h } l k=0 be the sequences of prolongation operators between the hierarchy of corresponding finite element spaces.Then the prolongation operators for the block system (7) are given by Moreover, as we apply the solver in an adaptive scheme, we reuse the already assembled block system matrices from the coarser refinement levels in our multigrid hierarchy.We describe the application of one multigrid V-cycle in the following algorithm.
We still need to choose a smoother, or sequence of smoothers.A suitable choice is to use some least squares method, e.g.MINRES, preconditioned by the BDP-preconditioner from Section 5.1 as a smoother.We stop the iterative smoother either after reducing the relative residual by a factor γ (k) , or after reaching a fixed, but small number of maximum iterations n (k) M .In Section 6 we solve the linear system (7) be means of flexible GMRES with one V-cycle of the multigrid described above, using γ (k) ≡ 0.01 and n (k) M ≡ 100.We can derive analogous preconditioners for the systems ( 9), (19), and (20).

Divergence free solvers
An alternative way of solving systems ( 7) and ( 9) is to reformulate the problem already on the continuous level into a divergence-free setting.We will briefly summarize the method, for details see [3,Section 3].The key idea is to split the function into a function σ satisfying the divergence constraint, i.e., (div x,t (σ), µ) = (f , µ) ∀µ ∈ W , and a divergence free function σ fulfilling tr σ (σ) = 0. Next, we recall the properties of the de Rham sequence, i.e., there exist a differential operator d and a Sobolev space N, such that for any ψ ∈ N with tr d (ψ) = 0, the image is in R, i.e., dψ ∈ R and, in addition, div x,t (dψ) ≡ 0. For example, in 3D space-time, we use the canonical choice d = curl and N = H(curl, Q ), whereas for 4D space-time, we use d = Div and N = H(Div, Q , K), with Div the row-wise matrix divergence, and where K is the vector space of 4-by-4 skew-symmetric matrices; see [1] for details on this space.Now we can formulate the solving procedure in two steps: Step 2 Find the divergence free correction σ = dψ.We use the decomposition (23) to rearrange system (7) into Now we insert σ from Step 1, and replace σ = dψ, which automatically fulfills the second equation.Moreover, it is sufficient to only test with τ = dφ, φ ∈ N. Thus, we have to solve the following problem: Find (ψ, u) Then the solution σ is obtained by summing up σ + σ.
We can again derive an analogous scheme for system (9).

Implementation and experimental setup
We implemented the finite element methods described in Section 4 by means of the finite element library MFEM [8], which has also an implementation for 4D adaptive finite elements in a development branch. 1 In order to realize the solvers described in Section 5, we use the AMG implementation boomerAMG provided by the linear solver library hypre. 2 Both libraries are developed at LLNL.All of our numerical experiments are performed on the Quartz cluster, also located at LLNL.We tested the following combinations of functionals, constraints and spaces in the numerical experiments: • CFOSLS (6) for the heat equation ( 3) • CFOSLS (8) for the transport equation ( 4) • Negative Norm CFOSLS (13) for the transport equation ( 4) • Regularized constrained LS functional (22) for the transport equation ( 4) In all numerical experiments we use the localized least squares functional J τ as an indicator for refinement, e.g., for (P1) The elements then are marked using Dörfler's marking strategy [12], i.e., we mark all elements τ ∈ Th ⊆ T h , where Th is the set with minimal cardinality, which, for given θ ∈ (0, 1), fulfills In particular, we implemented the algorithm recently proposed by Pfeiler et al. [13], which in addition is very easy to parallelize.Once we have a set of marked elements, we apply the bisection algorithm by Arnold et al. [14] in 3D spacetime, and the bisection algorithm by Stevenson [15] in 4D space-time, to subdivide the marked elements.Both algorithms ensure that the family of adaptive refined meshes (T h ) h stays conforming, i.e. we do not introduce any hanging nodes, and that the mesh elements do not degenerate, i.e. there is a lower bound for the interior angles for all h.Unless stated otherwise, we will always use a bulk parameter θ = 0.25.
We also compared the convergence rates for different polynomial degree of the finite element ansatz functions.With p = 0 we denote the lowest order finite elements, i.e., piecewise constants for L 2 -conforming elements, and piecewise linear, globally continuous functions for H 1 -conforming elements.The use of next-to-lowest-order elements is denoted by p = 1.
When we refer to the solvers and preconditioners presented in Section 5, we will use the following abbreviations: BDP for MINRES preconditioned by the block diagonal preconditioner described in Section 5.1, MGMG for flexible GMRES preconditioned by the monolithic geometric multigrid introduced in Section 5.2, and MG for the divergence free solver defined in Section 5.3.

Example 1: Parabolic evolution problem
We consider the computational domain Q = (0, 1) d+1 , d = 2, 3, and use the manufactured solution We consider only the heat equation ( 3), thus we use formulation (P1).The manufactured solution is very smooth and has no distinct features, so adaptive refinement will improve the overall convergence rate only by a constant factor.In Table 1, we present the total number of iterations needed to reduce the initial residual by a factor of 10 −6 , with the elapsed time in parentheses, using 8 nodes (or 288 cores) on Quartz.While we can observe stable iteration counts for the monolithic geometric multigrid, the cost per iteration is significantly higher than for the other solvers.
For the sake of completeness, we also include plots of the convergence rates; see Figs. 1 and 2. As expected, for problems without singularities or without localized (anisotropic) features, adaptive refinement improves the convergence rates only by a constant factor.

Example 2: The transport equation
For our second example, we will consider only the transport equation ( 4).Here, the space-time cylinder is given by ).We choose the vector fields ] , and the initial data , and u 0 (x) = e −100 , for d = 2 and d = 3, respectively.We prescribe no source terms, i.e., f ≡ 0. Hence, we expect that the profile of the initial data should only be transported along the vector field β.Moreover, as we consider a hyperbolic evolution equation, there should be no dissipation.This solution is very localized and has sharp gradients, thus we expect a significant improvement in the convergence rates by using adaptive refinement.We will compare formulations (P2)-(P3), as well as different values of the index s for formulation ( P2).
First we consider the case d = 2. Here, the spatial domain is the unit circle and the flow field is a counter-clockwise, circular motion.In Fig. 3(a), we plot the relative L 2 -error in the variable u.We can observe that, in comparison with uniform refinement, more than one order of magnitude less #dofs is needed to reach a certain threshold with adaptive refinement.Moreover, comparing formulations (P2) and ( P2), we deduce that the additional constraint does not really improve the convergence behavior for this example.In Fig. 4, we present cuts through the space-time mesh at certain times t.The mesh was obtained after 17 adaptive refinements.We can clearly see that the mesh is very fine along the movement of the peak, whereas stays relatively coarse where nothing happens.Next we consider the case d = 3, where the spatial domain is the unit ball, and the flow field is again a circular motion in the x 1 -x 2 -plane, with a varying intensity depending on the x 3 variable.In Fig. 3(b), we also plot the relative L 2 -error in u.Again, the adaptive refinement results in a huge improvement in the convergence rates.However, here, formulations (P2) and ( P2) differ more than previously.The adaptive refinement for negative norm CFOSLS seems to result in worse convergence behavior than adaptivity for standard CFOSLS.In Fig. 5, we present the finite element solution obtained after 13 adaptive refinements.The 4D hyper-cylinder Q was cut at times t, and the finite element solution was then projected on these 3D cuts.

Example 3: Adaptive benchmark
For the third example we consider the NIST adaptive benchmark problem 3 ''Moving Circular Wave Front''.The space-time cylinder is given by Q = (0, 10) × (−5, 5) × (0, 10), i.e., d = 2.We again use a manufactured solution and where x c is the origin of the wavefront, α controls the steepness of the wave, and C is a constant to scale the function value.For this example, we choose α = 20, C = 10000, and x c = 0.The original benchmark is for a parabolic model problem, i.e., the heat equation (3).However, we can use the manufactured solution u also for the transport problem (4), with the flow field ] .
We again compare the convergence rates in the L 2 -norm of u h and σ h for the different formulations (P1)-(P3), as well as different values of s.For lowest order elements, i.e. p = 0, we observe an improvement in the convergence rates for both, the parabolic and the hyperbolic problem.In particular, for the heat equation, we need more than one order of magnitude 3 https://math.nist.gov/amr-benchmark/index.html   less dofs to reach a certain threshold compared to uniform refinement; see Fig. 7(b).For the hyperbolic problem, we can observe a similar behavior, where we also save roughly one order of magnitude of dofs when comparing adaptive and uniform refinement.For next-to-lowest-order elements, i.e. p = 1, we now in addition compare different values of the negative norm index s.The parabolic problem again shows a sharp improvement in the convergence rates for adaptive refinement, e.g., in order to reach an relative L 2 -error of 10% for σ, we need around 5 × 10 6 dofs for adaptive refinement, compared to >10 8 dofs needed in the case of uniform refinement; see Fig. 8(b).In Fig. 6, we present plots of finite element solution u h to formulation (P1) over the whole space-time mesh T h , where cut off parts of the space-time cylinder to visualize the spatial profile of the solution for non-zero times t are shown.The hyperbolic transport problem behaves a little different from in the lowest order case.In order to see an improvement in the convergence rates, we first need to reach certain amount of dofs.Then we again need around one order of magnitude less #dofs to reach a certain threshold in the errors.
When comparing the standard CFOSLS to negative norm CFOSLS, we can further reduce the needed number of degrees of freedom by a factor of 2, depending on the choice of s; see Fig. 8.

Conclusion
The overall conclusion of the presented computational study is that the space-time FOSLS technique poses in addition to the main constraint of high memory demand the challenge of finding scalable solvers which is due to the lack of full ellipticity of the FOSLS functionals.As we have demonstrated, that with the help of AMR, the severe memory constraints can be alleviated and also with the AMR we can benefit in terms of generating meshes (not necessarily exploiting timestepping) that follow the physics and not posing a global CFL constraint on the discretization.Future research is needed in the direction of designing more scalable solvers and one possible venue for that is to exploit recent advances of adaptive AMG methods that can take advantage of specialized coarsening to detect possible anisotropy [9,10] and not necessarily require ellipticity of the bilinear forms.Moreover, in order to take full advantage of the parallel AMR, we need to introduce some kind of load balancing to ensure that the work is equally distributed among the cores.
where Σ σ ess = ∅ and Σ u ess = Σ 0 ∪ Σ for the heat equation (3), and Σ σ ess = Σ in and Σ u ess = Σ 0 , for the transport problem (4).Introducing a new variable σ = L(u), we rewrite the model problem (2) as a first order system, and it also holds for L(u h ) = [βu h , u h ] T for β being a polynomial (as it is in our test Example 2).The derivations in what follows are used as a motivation to introduce the locally weighted least squares functional (18).

Algorithm 5 . 1 .
Assume we have a hierarchy of matrices {A (k) } l k=0 , prolongation operators {P (k) } l k=0 and smoothers {M (k) } l k=0 .At level k for given v, the computation of B (k) GMG v consists of:

Fig. 6 .
Fig. 6.Example 3: Finite element solution u h obtained by formulation (P1) with p = 1, plotted over the whole space-time mesh after 15 adaptive refinements with bulk parameter θ = 0.1, cut off at t.

Table 1
Example 1: Iteration numbers and solving times.Example 1: Convergence rates of uniform and adaptive refinements, for d = 3.