On the choice of finite element for applications in geodynamics

Geodynamical simulations over the past decades have widely been built on quadrilateral and hexahedral finite elements. For the discretisation of the key Stokes equation describing slow, viscous flow, most codes use either the unstable Q1×P0 element, a stabilised version of the equal-order Q1×Q1 element, or more recently the stable Taylor-Hood element with continuous (Q2×Q1) or discontinuous (Q2×P−1) pressure. However, it is not clear which of these choices is actually the best at accurately simulating “typical” geodynamic situations. 5 Herein, we are providing for the first time a systematic comparison of all of these elements. We use a series of benchmarks that illuminate different aspects of the features we consider typical of mantle convection and geodynamical simulations. We will show in particular that the stabilised Q1×Q1 element has great difficulty producing accurate solutions for buoyancydriven flows – the dominant forcing for mantle convection flow – and that theQ1×P0 element is too unstable and inaccurate in practice. As a consequence, we believe that theQ2×Q1 andQ2×P−1 elements provide the most robust and reliable choice for 10 geodynamical simulations, despite the greater complexity in their implementation and the substantially higher computational cost when solving linear systems.

geodynamical simulations, despite the greater complexity in their implementation and the substantially higher computational cost when solving linear systems. A third option is the use of Q 1 × Q 1 elements where both velocity and pressure use bi-or tri-linear shape functions. This combination of elements is not LBB-stable by default, but numerous stabilisation techniques -typically adding a pressure dependent term to the mass conservation equation -have been proposed in the literature (see e.g. Norburn and Silvester, 2001;Elman et al., 2014;Gresho et al., 1995). Herein, we will discuss in particular the variation by Dohrmann and Bochev (2004) that is simple to implement and does not involve any tunable parameter. This approach is used in the Rhea code (Burstedde et al., 2009(Burstedde et al., , 2013 in conjunction with Adaptive Mesh Refinement (AMR), allowing for the numerical solution of whole Earth 60 models at high resolutions (Stadler et al., 2010;Alisic et al., 2012). Another example of the use of this method is the work of Leng and Zhong (2011), also using AMR, to study thermochemical mantle convection. Both the ELEFANT code with an application to the 3D thermal state of curved subduction zones (Plunder et al., 2018), and the GALE code (Moresi et al., 2012) with application to the 3D shapes of metamorphic core complexes (Le Pourhiet et al., 2012) or oceanic plateau subduction (Arrial and Billen, 2013), use the stabilised Q 1 × Q 1 method. Finally the ADELI code was coupled to a stabilised Q 1 × Q 1 flow solver in the context of lithosphere-asthenosphere interaction studies (Cerpa et al., 2014(Cerpa et al., , 2015(Cerpa et al., , 2018. The availability of all of these options leads us to the main question of this paper: Which element should one use in geodynamics computations based on the Stokes equations?, or, in the absence of clear-cut conclusions, Which ones should not be used? On the face of it, this seems like a simple question: The consensus in the computational science community is that using moderately high degree elements (say, k = 3 or k = 4) yields the best accuracy for a given computational effort (measured 70 in CPU cycles), unless one wants to change the solver technology to use matrix-free methods where even higher polynomial degrees become more efficient. This conclusion is based on the higher convergence order of higher-degree methods, but balanced by the rapidly growing cost of matrix assembly and linear solver effort for higher-degree methods. On the other hand, the recommendation to use higher-degree methods is predicated on the assumption that the solution is smooth enough -say, the velocity is in the Sobolev space H k+1 of functions that have, loosely speaking, at least k + 1 derivatives -so that one 75 can actually achieve a convergence rate of O(h k ) in the energy norm and O(h k+1 ) in the L 2 norm where h is the mesh size.
This assumption generally requires that all coefficients, such as density and viscosity, are sufficiently smooth on length scales resolvable by the mesh. This may not be the case in realistic geodynamics problems given that density and viscosity often depend discontinuously on the solution variables (velocity or strain rate, pressure, temperature, and compositional variables); indeed, in many models, the viscosity may vary by orders of magnitude on short length scales.

80
Such considerations put into question whether higher order methods are really worth the effort for actual geodynamics simulations. Given these divergent theoretical thoughts, the only way to resolve the question is by way of numerical comparisons. We have consequently extended ASPECT so that it can use all of the element combinations above, and will use these implementations in the comparisons in this paper.
Goals of this paper. Having outlined the conflict between the expected superiority of higher-degree elements for the Stokes 85 equation on the one hand, and the expected lack of smoothness of solutions in realistic geodynamic cases, our goals in the paper are as follows: 1. To quantitatively compare the solution accuracy of the various options (Q 1 ×P 0 , Q k ×Q k−1 , Q k ×P −(k−1) and stabilised Q 1 × Q 1 ) using a variety of analytical benchmarks for which the exact solution is known. As we will see below, there is little point working with k > 2 in geodynamics applications, and so the only cases we consider for Taylor-Hood-like elements are Q 2 × Q 1 and Q 2 × P −1 .
2. To extend these numerical comparisons to cases where it is known that the stabilised Q 1 × Q 1 demonstrates problematic behaviour that may make it unusable in many practical situations. In particular, we will consider the case of buoyancydriven flows.
3. To conclude our considerations by comparing the available options using a realistic geodynamical application. This will 95 allow us to draw conclusions as to what element one might want to recommend for geodynamics applications.
While we have approached this study with an open mind and without a strong prior idea which element might be the best, let us end this introduction by noting that members of the crustal dynamics and mantle convection communities have occasionally expressed a dislike of the stabilised Q 1 × Q 1 element for its inability to deal with large lithostatic pressures and free surfaces absent special modifications of the formulation. For example, Arrial and Billen (2013) comment on the need to modify the 100 physical description of the problem due to the stabilisation (with references replaced by ones listed at the end of this paper): All the models were run with the open source code Gale. [...] Gale uses Q1-Q1 elements to describe the pressure and the velocity. However, this formulation is unstable and a slight compressible term is added in the divergence equation to stabilise it (Dohrmann and Bochev, 2004). Ideally, this term should be applied on the dynamic pressure and not on the full pressure. To fix this, a hydrostatic term corresponding to the reference density and temperature 105 profile, is subtracted from the full pressure and the body force vector.
Few other negative comments concerning the Q 1 × Q 1 element appear on record in the published literature, although one can find the following quote in Lehmann et al. (2015) : We do not consider the Q 1 ×Q 1 /stab element (Dohrmann and Bochev, 2004;Bochev et al., 2006;Burstedde et al., 2009), as stabilisation of this element is achieved by introducing an artificial compressibility that dominates for 110 flows mainly driven by buoyancy variations (May et al., 2015). In geophysical flow models this yields unphysical pressure artifacts for cases where both the free surface of the Earth and mantle flow are considered, because the driving density contrast between cold sinking plates and the warmer surrounding Earth's mantle is much smaller than the density difference between rocks and air (Kaus et al., 2010;Popov and Sobolev, 2008;Mishin, 2011). In our experience, this results in artificial "compaction" of the Earth's mantle if Q 1 × Q 1 /stab element is used, which 115 makes them unsuitable for these purposes.
Indeed, our numerical experiments will encounter a similar issue, see Section 6.
We are not aware of any other significant publications in the geodynamics literature that specifically discuss the relative tradeoffs between the elements we consider herein, specifically between the Q 1 × P 0 and Taylor-Hood elements, and consequently believe that our discussions here are useful for the community.

120
For the purpose of this paper, we are concerned with the accurate numerical solution of the incompressible Stokes equations: where η is the viscosity, ρ the density, g the gravity vector, ε(·) denotes the symmetric gradient operator defined by ε(u) = 125 1 2 (∇u + ∇u T ), and Ω ⊂ R d , d = 2 or 3 is the domain of interest. Both the viscosity η and the density ρ will, in general, be spatially variable; in applications, this is often through nonlinear dependencies on the strain rate ε(u) or the pressure, but the exact reasons for the spatial variability are not of importance to us here: What matters is that these coefficients may vary strongly and on short length scales.
In applications, the equations above will be augmented by appropriate boundary conditions and will be coupled to additional 130 and often time dependent equations, such as ones that describe the evolution of the temperature field or of the composition of rocks (see, for example, Schubert et al. (2001); Turcotte and Schubert (2012)). This coupling is also not of interest to us here.
3 Discretisation using finite element methods

Formulation and basic error estimates
For the comparisons we intend to make in this paper, equations (1)-(2) are discretised using the finite element method. A 135 straightforward application of the Galerkin method yields the following finite-dimensional variational problem: For simplicity, we have omitted terms introduced through the treatment of boundary conditions. The finite-dimensional, piecewise polynomial spaces U h and P h can be chosen 140 in a variety of ways, as discussed in the introduction. In particular, if they are chosen as U h = Q k and P h = Q k−1 -i.e., the Taylor-Hood element -then the discrete problem is known to satisfy the LBB condition and the solution is stable (Elman et al., 2014). Here, Q s is the space of continuous functions that are obtained on each cell K of a mesh T by mapping polynomials of degree at most s in each variable from the reference cell [0, 1] d . Likewise, the problem is stable if one chooses U h = Q k and P h = P −(k−1) where now P −s is the space of discontinuous functions obtained by mapping polynomials of total degree 145 at most s from the reference cell. In both of these cases, we expect from fundamental theorems of the finite element method (see, for example, Elman et al. (2014)) that the convergence rates are optimal, i.e., that the errors satisfy the relationships where h is the maximal diameter over all cells in the mesh T.

150
On the other hand, if one chooses U h = Q 1 and P h = P 0 , i.e., the unstable Q 1 ×P 0 element with piecewise linear, continuous velocities and piecewise constant, discontinuous pressure, then the best convergence rates one can hope for would satisfy the following relationships, based solely on interpolation error estimates:

155
In practice, if the numerical solution shows pressure oscillations (see for instance Sani et al., 1981a, b), one will not even observe the rates shown above, but might in fact obtain a worse pressure convergence rate, for example p−p h L2 = O(h 1/2 ).
Finally, if one uses U h = Q 1 and P h = Q 1 , then this unstable element combination can be made stable if one replaces the discrete formulation (3) by the following, stabilised version due to Dohrmann and Bochev (2004): Here, I is the identity operator and π 0 is the projection onto piecewise constant functions -i.e., π 0 f is the function that on each cell is equal to the mean value of f on that cell. For this element, the rates one might hope for are as follows (see again Dohrmann and Bochev (2004)): 165 Dohrmann and Bochev (2004) report that for some test cases, one might in fact obtain p−p h L2 = O(h t ) with t ≈ 1.5, though it is not clear whether this rate can be obtained for all possible applications. We also observe this improved rate in one of our benchmarks in Section 5.

A closer look at the error estimates
A comparison of (4) with (5) and (7) would suggest that the Taylor-Hood element can obtain substantially better rates of 170 convergence if one only chooses the polynomial degree k large enough.
However, this is an incomplete understanding because the O(h m ) notation hides the fact that the constants in this behaviour depend on the solution. More specifically, a complete description of the error behaviour would replace (4) by the following statement: There exist constants C 1 , C 2 , C 3 < ∞ so that The validity of these statements clearly depends on the solution to be regular enough so that ∇ k+1 u and ∇ k p actually exist and are square integrable -in other words, that u ∈ H k+1 and p ∈ H k . On the other hand, all that is guaranteed by the existence theory for partial differential equations is that u ∈ H 1 and p ∈ L 2 = H 0 ; any further smoothness should only be expected if, for example, the domain Ω is convex, and if viscosity η and right hand side ρg are also smooth. Indeed, this is the case for many 180 artificial benchmarks where these functions are chosen a priori; on the other hand, in "realistic" geodynamics applications, one might expect η and ρ to be discontinuous at phase boundaries, and potentially vary widely. In such cases, one needs to accept that the solutions only satisfy u ∈ H q and p ∈ H q−1 with q ≥ 1 but possibly q < k + 1. Numerical analysis predicts that in such cases, the best case rates in (8) will be replaced by the following: Similar considerations apply for the Q 1 × P 0 and the stabilised Q 1 × Q 1 combinations, where a closer examination yields the following rates that would replace (5) and (7): 190 In other words, we will only benefit from the added expense of the Taylor-Hood element with k ≥ 2 if the solution is sufficiently smooth, namely if at least q > k ≥ 2. The question whether indeed q > 2 for a given situation is one of PDE theory and difficult to answer in general without using particular knowledge of η, ρg, and Ω. On the other hand, one can observe convergence rates experimentally for a number of cases of interest, and so in some sense, it would be a legitimate question to ask "What is the regularity index q of typical solutions in geodynamics applications?" At the same time, this requires careful 195 convergence studies on problems that are typically quite challenging to solve already on any reasonable mesh, let alone several further refined ones. As a consequence, we can not answer this question in the generality stated above. Instead, we will below approach it by considering a number of benchmarks that illustrate typical features of geodynamic settings in an abstracted way (in Section 5), followed by a model application (in Section 6).
Before delving into the details of numerical experiments, let us consider one other theoretical aspect. An interesting complication of geodynamics simulations compared to many other applications of the Stokes equations is that the hydrostatic component of the pressure is often vastly larger than the dynamic pressure, even though only the dynamic component is responsible for driving the flow. As we will discuss in the following, this has no importance when using the Q 1 × P 0 or the Taylor-Hood elements, but turns out to be rather inconvenient when using a stabilised formulation that contains an artificial compressibility 205 term. This issue is also mentioned in the quote from Arrial and Billen (2013) reproduced in the introduction, and in May et al. (2015).
To illustrate the issue, consider the force balance equation (1). We can split the pressure into hydrostatic and dynamic components, p = p s + p d where we define the hydrostatic pressure via the relationship 210 coupled with the normalisation that p s = 0 at the top of the domain. In defining p s this way, we have made the assumption that the vertical component g z of the gravity vector dominates its other components. Furthermore, we have introduced a reference density ρ ref that somehow reflects a depth-dependent profile. As we will discuss below, there is really no unique or accepted way to define this profile, though one should generally think of it as capturing the bulk of the three-dimensional variation in the density via a one-dimensional function.

215
By splitting the pressure in this way, (1) can then be rewritten as follows: in Ω.
Since this is the only equation in which the pressure appears, it is obvious that the velocity field so computed is the same whether or not one uses the original formulation solving for u, p, or the one solving for u, p d . More concisely, the observation shows that the velocity field so computed does not depend on how one chooses the reference density ρ ref . The original formulation 220 is recovered by using the simplest choice, ρ ref = 0. As a consequence, many geodynamics codes use formulations that only compute the dynamic pressure p d , using a reference density ρ ref (z). Importantly however, there is no canonical way for this definition: one might choose a constant reference density, a depth-dependent adiabatic profile, or one computed at each time step by laterally averaging the current three-dimensional density field ρ(x, y, z, t); each of these options -and likely morehave been used in numerical simulations one can find in the literature. In any case, pressure-dependent coefficients such as the 225 density or viscosity are then evaluated by using p s + p d where p d is computed as part of the solution of the Stokes problem and p s is the hydrostatic pressure defined by (11) using the particular choice of reference density used by a code. On the other hand, the ASPECT code notably always computes the full pressure instead of splitting it in hydrostatic and dynamic components (see the discussion in Kronbichler et al. (2012)), corresponding to the particular choice ρ ref = 0.
The problem with the stabilised Q 1 × Q 1 formulation -different from the use of the other element choices -is that the 230 velocity field computed from the Stokes solution is not independent of the choice of the reference density. This is because the mass conservation equation is modified by the stabilisation term and -in the simple case of a constant viscosity -reads Here, Π = (I − π 0 ) is the operator that corresponds to the stabilisation term in (6). 1 The point of these considerations is that different choices of ρ ref (including the choice ρ ref = 0 that leads to the original 235 formulation) do have an effect here because they lead to different p d = p − p s for which Πp d is different: that is, the amount of artificial compressibility depends on the splitting of the pressure into static and dynamic pressures. In other words, the discretisation errors u − u h L2 and ∇(u − u h ) L2 discussed in the previous section will in general depend on the choice of the reference density profile, and the latter will need to be carefully defined in order to lead to acceptable error levels. As we will show in the benchmarking section, the specific choice of ρ ref in fact has a rather large effect. This is in line with the 240 previously quoted comments in Arrial and Billen (2013).
Let us end this section by commenting on two aspects why this issue may not be as relevant in other contexts in which stabilised formulations have been used. First, in many important applications of the Stokes equations, the flow is not driven by buoyancy effects but by inflow and outflow boundary conditions (e.g. Turek, 1999;Zienkiewicz and Taylor, 2002). Indeed, in those conditions both the density and the gravity vector is generally considered spatially constant, and the choice of reference 245 density and hydrostatic pressure is then obvious and unambiguous. In these cases, computations are always performed with only the dynamic pressure because the hydrostatic pressure does not enter the problem at all except in the rare cases of fluids with pressure-dependent viscosities.
Second, while we have here considered the stabilisation first introduced in Dohrmann and Bochev (2004), earlier stabilised formulations used a pressure Laplacian in place of the operator Π above. (See, for example, Brezzi and Pitkäranta (1984) or 250 the variation in Silvester and Kechkar (1990), as well as the analysis in Bochev et al. (2006).) That is, instead of (12) they used a formulation of the form where c is a tuning parameter that also incorporates the viscosity. If one uses this formulation for cases in which the reference density is chosen as a function that is constant in depth -as was often done in earlier mantle convection codes considering  Of course, whether one uses the Dohrmann-Bochev formulation (12) or the addition of a pressure Laplace as in (13), the formulation is consistent. That is, as the mesh size h goes to zero, the added stabilisation term also goes to zero. In the limit, the numerical solution therefore satisfies the original mass conservation equation. In other words, the limit is independent of the choice of ρ ref , even though the solutions on a finite mesh are not.

Numerical results for artificial benchmarks 265
In this section, let us present computational results for three analytical problems and a buoyancy-driven flow community benchmark. While the first of these (Section 5.1) is simply used to establish the best convergence rates one can hope for in the case of smooth solutions, the remaining test cases were chosen because they illustrate aspects of what we think how "typical" solutions of geodynamic applications look like in an abstracted, controlled way. In particular, the "SolCx" benchmark in Section 5.2 demonstrates features of solutions in which the mesh can be aligned with sharp features in the viscosity, and the 270 "SolVi" benchmark in Section 5.3 does so in the more common case where the mesh can not be aligned. Finally, the "sinking block" case in Section 5.4 shows a buoyancy-driven situation in which all of the discussions of the previous section on the choice of a reference density will come to play. All of these cases are simple enough that we know (quantitative or qualitative features of) the solution to sufficient accuracy to investigate convergence rigorously.
While these benchmarks provide us with insight that allows us to conjecture which elements may or may not work in practical 275 application, they still are just abstract benchmarks. As a consequence, we will consider an actual geodynamic application in Section 6.
All models are run with the ASPECT code. We have limited ourselves to two-dimensional cases as we do not expect that three-dimensional models would shed any more light on the reached conclusions. Although ASPECT is built for adaptive mesh refinement (AMR), we have chosen not to use this feature in order to reflect that the majority of existing codes use structured 280 meshes.

The Donea & Huerta benchmark
Let us start our numerical experiments with the simple 2D benchmark presented in Donea and Huerta (2003). The exact definition involves lengthy formulas not worth repeating here, but in short it consists of the following ingredients: (i) The domain is a unit square; (ii) the viscosity and density are set to 1; (iii) velocity and pressure fields are chosen to correspond to 285 smooth polynomials describing circular flow with no-slip boundary conditions. We then choose an (unphysical) gravity vector field that produces these velocity and pressure fields. This set up produces the smooth solution shown in Fig. 1 for which we would expect that the higher-order Taylor-Hood element is highly accurate.
We verify this in Fig. 2 for the four element choices of interest in this work: Looking at the velocity error, we recover a cubic convergence rate (q = 3) for the Q 2 × Q 1 and Q 2 × P −1 elements, and a 290 quadratic convergence rate (q = 2) for those choices using the Q 1 elements for the velocity. The pressure error is of linear rate for the Q 1 × P 0 element and of quadratic rate for the Q 2 × Q 1 and Q 2 × P −1 elements. All of these are as expected. For the stabilised Q 1 ×Q 1 , we obtain the better-than-expected rate of 1.5 already mentioned in Dohrmann and Bochev (2004), see also Section 3.

295
Again, the second-order elements are more accurate.
These results are not surprising: The solution is smooth, and consequently one would expect to obtain optimal order convergence in all cases. One can carry out similar experiments for the SolKz benchmark (Zhong, 1996), which also has a smooth solution; we have obtained identical error convergence rates.
Finally, we also investigate the cost associated with solving this problem using the various elements. Fig. 3 shows the number 300 of outer iteration of the Stokes solver as a function of the mesh size. This number is nearly constant with increasing resolution for the stable or stabilised elements while it becomes exceedingly large for the unstable Q 1 × P 0 element, reflecting the fact that lack of LBB stability corresponds to the smallest eigenvalue of the system matrix tending to zero -and thereby driving the condition number to infinity. Indeed, our linear solver does not converge in the 1000 iterations we chose as a limit for the smallest mesh sizes.

The SolCx benchmark
The SolCx benchmark is a common benchmark found in many geodynamical papers (e.g. Zhong, 1996;Duretz et al., 2011;Kronbichler et al., 2012;Thielmann et al., 2014). It uses a discontinuous viscosity profile with a large jump in the viscosity value along the middle of the domain, resulting in a discontinuous pressure field. The domain is a unit square, boundary conditions are free-slip on all edges, and the gravity vector points downwards with |g| = 1. The density for SolCx is given by 310 ρ(x, y) = sin(πy) cos(πx) and the viscosity field is such that 10 6 if 0.5 < x ≤ 1.   We show the velocity and pressure fields in Fig. 4. The discontinuous jump of the viscosity field by a factor of 10 6 results in separate convective cells on the left and right sides of the domain, though with vastly different strengths. The pressure also reflects this disjoint behaviour.

315
As in the Donea & Huerta benchmark, we compute the velocity and pressure error convergence for all four elements.
Those are shown in Fig. 5. As documented in Kronbichler et al. (2012), the second order element with discontinuous pressure Q 2 ×P −1 performs better (pressure error convergence is O(h 2 )) than its continuous pressure counterpart Q 2 ×Q 1 (convergence is only O(h 1/2 ), but the better convergence order with the discontinuous pressure can only be obtained if the discontinuity in the viscosity is aligned with cell boundaries -which is the case here. Also of interest here is the fact that the Q 1 × P 0 320 outperforms the Q 1 × Q 1 element for both velocity and pressure. All of these observations are readily explained by the fact  that a discontinuous pressure can only be approximated well when using discontinuous pressure elements with cell interfaces aligned with the discontinuity in the viscosity.

The SolVi (circular inclusion) benchmark
The SolCx benchmark in the previous section allows for aligning mesh interfaces with the discontinuity in the viscosity. This 325 is an artificial situation that will, in general, not happen in actual geodynamics applications where the interfaces between materials may be at arbitrary locations and orientations in the domain, and may also move with time. An example is the simulation of a cold, subducting slab (with correspondingly large viscosity) surrounded by hot, low-viscosity mantle material.
Consequently, it is worth considering a situation in which it is impractical to align mesh and viscosity interfaces. This is done by the SolVi inclusion benchmark which solves a problem with a viscosity that is discontinuous along a circle. This in turns 330 leads to a discontinuous pressure along the interface which is difficult to represent accurately. Using the regular meshes used by A characteristic of the analytic solution is that the pressure is zero inside the inclusion, while outside it follows the relation 340 where η i = 10 3 is the viscosity of the inclusion and η m = 1 is the viscosity of the background medium, r = x 2 + y 2 and θ = arctan(y/x), and˙ = 1 is the applied strain rate if one were to extend the domain to infinity. The formula above makes it clear that the pressure is discontinuous along the perimeter of the disk, with the jump largest at θ = 0, ± π 2 , π. Deubelbeiss and Kaus (2008) thoroughly investigated this problem with various numerical methods (FEM, FDM), with and without tracers, and conclusively showed how various schemes of averaging the density and viscosity lead to different results. element). The observation that the none of the elements reach their optimal convergence rate also supports our decision, briefly mentioned in the "Goals of this paper" part of the Introduction, to not further investigate higher-order Taylor-Hood elements We know from experiments such as the current one that these elements will not yield 355 better convergence orders despite their additional cost.
Since harmonic averaging yields the lowest errors we select this averaging and now turn to the pressure field for all elements as shown in Fig. 8. We find that the recovered pressures on the line y = 1 follow the analytical solution outside of the inclusion but are less accurate inside the inclusion where it should be identically zero (Fig. 9).

The sinking block 360
As discussed in Section 4, the stabilised Q 1 × Q 1 element is sensitive to the choice of a reference density profile as not only the computed pressure, but also the computed velocity field depends on this choice. This is only relevant for buoyancy-driven flows, but because none of the benchmarks shown previously are driven by buoyancy effects in the presence of a background In a geodynamical context, the block could be interpreted as a detached slab (δρ > 0) or a plume head (δρ < 0). As such its viscosity and density can vary (a cold slab has a higher effective viscosity than the surrounding mantle while it is the other way 375 around for a plume head). The block density difference δρ can then vary from a few to several hundred kg m −3 to represent a wide array of scenarios. As shown in Appendix A.2 of Thieulot (2011), one can independently vary η 1 , ρ 2 , η 2 , and measure |v z | for each combination: the quantity ν = |v z |η 1 /δρ is then found to be a simple function of the ratio η = η 2 /η 1 : at high enough mesh resolution all data points collapse onto a single line.
In the following, we will denote by "Method #1" the approach where we do calculations with the density field as specified 380 above. "Method #2" consists of a 'reduced' density field from which the quantity ρ 1 has been uniformly removed so that the block has a density δρ while the surrounding fluid has zero density. As discussed above, the two choices will result in different pressure, but the same velocity fields.
When using the full density, we see that all elements, with the exception of the stabilised Q 1 × Q 1 element, yield results which align on a single curve on the plots once sufficient resolution is reached. We find that measurements pertaining to a given resolution but different δρ are always collapsed onto a single line. Worth noticing is the Q 2 × P −1 element whose results seem to be the least resolution dependent. On the other hand the stabilised Q 1 × Q 1 element yields very anomalous results which are 390 orders of magnitude off at all resolutions, especially for η 1 /η 2 1. In addition, we find that for this element, the value of δρ strongly affects the measurements, as expected based on the discussions in Section 4; as a result, the curves for the same mesh resolution but different δρ 2 no longer coincide (see Fig. 10b).
When reduced densities are used results are unchanged for the stable elements (only Q 2 × Q 1 results are shown in Fig. 10e), and the results for the stabilised Q 1 × Q 1 results are substantially improved. For values η 1 /η 2 < 1 we see that all results align 395 on the expected curve but this is far from true for η 1 /η 2 1 even at high resolution.
In Fig. 11 we show the velocity field in the case η = 10 −4 (i.e. the viscosity of the block is 10,000 times smaller than the surrounding mantle) and δρ = 8kg m −3 . When the Q 2 × Q 1 element is employed in conjunction with Method 1 we see in c) Velocity field obtained with stabilised Q1 × Q1 with full density (left) and stabilised Q1 × Q1 with reduced density (right). Fig. 11a that the velocity field is strongest inside the block with a maximum value of about 5mm yr −1 in its center. We see that the Q 2 × Q 1 and Q 2 × P −1 elements yield near identical results (Fig. 11b) so we consider this to be the correct solution 400 of the physical experiment. The same setup with the stabilised Q 1 × Q 1 (left half of Fig. 11c) yields a velocity field that is also maximal in the middle of the block but nearly 1000 times larger in amplitude. If we now switch to Method 2 (right half of Fig. 11c) the amplitude of the velocity is reduced by two orders of magnitude but it is still much too large compared to the true solution.
These observations illustrate the unreliable nature of the results obtained with stabilised Q 1 × Q 1 elements in the context of 405 buoyancy-driven flows. Looking at Fig. 10f we see that increasing the resolution to 512 × 512 or 1024 × 1024 would probably yield the expected curve but such resolutions are intractable in three dimensions and better results can be obtained at much lower resolutions with other elements.
Finally, in Fig. 12 we plot the normalised pressure p = p/(δρ g L b ) at the center of the block (where L b is the size of the block), as a function of the viscosity ratio η in the case where a reduced density field is used. For the Q 2 ×Q 1 and the stabilised 410 Q 1 × Q 1 elements, the pressure at this point is uniquely defined since the elements have continuous pressures. For the other two elements the pressure is discontinuous across element edges and it is therefore not uniquely defined at our measurement point. We have then chosen to measure it at four locations corresponding to (x c ± δx, y c ± δy) where δx = δy = 0.1m, and show the normalized pressures at all four of these locations in the figure. For the Q 2 × P −1 element, the difference between these values is negligible, but not so for the Q 1 × P 0 for which the pressure is a stair-step function with very different values 415 depending on which step an evaluation point is on. The distance between the two lines for the Q 1 × P 0 element decreases with mesh refinement (indicating convergence of the pressure to the true value), but only slowly and, matching the observation in Section 5.1, at the cost of not only a fine mesh but also very large numbers of linear solver iterations.
In addition to the slow convergence of the Q 1 × P 0 element, the most striking conclusion of this benchmark is that for buoyancy-driven flows, the solution obtained using the stabilised Q 1 × Q 1 element on typical meshes does not only strongly Figure 12. Sinking block benchmark. Normalised pressure p/(δρ g L b ) in the center of the block as a function of the viscosity ratio η . These computations use a 256×256 mesh and the reduced density. For the Q1×P0 and Q2×P−1 elements with their discontinuous pressure spaces, we show the normalized pressures at several slightly displaced points (xc ± δx, yc ± δy). For the Q2 × P−1 element, the difference is not visible, but for the Q1 × P0 this yields the two very different red curves; this is due to the fact that the pressure for this element forms a stair-step function for which two of the evaluation points are on a lower and two on a higher step.
depend on the choice of the otherwise arbitrary reference density, but is also almost entirely unreliable even on meshes that are already quite fine.

Numerical results for a model application
While the previous sections have built our intuition for which element may actually work in the context of geodynamics applications, they have only done so through abstract and idealised benchmarks. It is therefore interesting to investigate what one 425 would find in more realistic setups, and consequently we have also investigated convergence for a situation still sufficiently simple that numerical simulations can reach reasonably high accuracy, but that has more of the complexity one would generally find in "real" simulations. Given that the previous examples have highlighted that the stabilised Q 1 × Q 1 element has difficulties with the pressure approximation, we are specifically interested in a situation where the material behaviour is pressure dependent.

430
To this end, we here consider an example of continental extension. The setup is similar to ones that can be found in Huismans and Beaumont (2002); Jammes and Huismans (2012); Naliboff and Buiter (2015); Brune et al. (2017), and we specifically use the one that can be found in the "continental extension" cookbook of the manual of the ASPECT code (Bangerth et al., 2021).
where A is a material constant, n is an index typically between 3 and 4, Q is the activation energy, V is the activation volume, R 440 the gas constant, T the temperature andε is the effective strain rate (the square root of the second invariant of the corresponding tensor). Stresses are limited plastically at a yield stress σ y = C cos(φ) + P sin(φ) via a Drucker-Prager criterion where C is the cohesion and φ the angle of friction. We use distinct values for some of these parameters in the initially 20 km-thick upper crust (wet quartzite), an initially 10 km-thick lower crust (wet anorthite), and the mantle (dry olivine) which initially occupies the remaining 70 km in depth. Deformation is seeded by a weak area within the mantle lithosphere. We only carry out a single 445 time step as obtained with a CFL number of 0.5.
A complete and concise description of this setup has more parameters than are worth spelling out in detail here. For a detailed description, see Naliboff and Buiter (2015) and the section of the ASPECT manual along with the corresponding input files. For the purposes of this paper, the important part is that both the yield stress and the dislocation creep rheology depend on the pressure; as a consequence, we can anticipate that elements that result in poor pressure accuracy may not yield accurate 450 simulations in general.
This setup produces localised shear zones that accommodate the majority of the deformation. Fig. 13 illustrates the structure of the resulting solution. Each panel of the figure shows in its left half the solution produced by the stabilised Q 1 × Q 1 element and its right half that produced by the Taylor-Hood Q 2 ×Q 1 element. Because the solution is symmetric, the two halves should be mirror images. It is, however, clear from several of the panels that this is not the case: the Q 1 × Q 1 element produces large 455 artifacts at depth where the pressure is large and the pressure-dependence of the material strong.
This effect is also demonstrated in a different way in Fig. 14 where we show laterally averaged quantities for the different elements and different mesh resolutions. Even though it is clear from Fig. 13 that lateral averaging should result in a better approximation (than pointwise evaluations) of the correct quantities for a given depth, Fig. 14 shows that even the average is far from correct. On the other hand, the figure shows that with increasing mesh resolution, the solutions produced by the 460 Q 1 × Q 1 seem to converge to the solutions generated by the other elements -albeit very slowly and at what one might consider an unacceptable cost.
To investigate the origin of these convergence problems of the Q 1 ×Q1 element, one should recall that the model is nonlinear.
As a consequence, the artifacts may be related to the discretisation, or to a failure of the nonlinear iteration -and the two may be connected. All of the solutions we show were taken after 100 Picard iterations to resolve the nonlinearity of the model, with 465 nonlinear convergence shown in Fig. 15. (One could accelerate convergence by using a Newton solver (Fraters et al., 2019), but this is not relevant for the work herein.) Looking at the evolution of the nonlinear residual during these iterations, we see that it decreases quickly and for most element choices then plateaus at about 10 −5 relative to the starting residual. In contrast, for the stabilised Q 1 × Q 1 element, increasing the mesh resolution yields lower nonlinear residuals -but even on the finest mesh, the nonlinear residuals are still substantially worse than for any of the other elements, with no apparent progress after     Our interpretation of this experiment is that the inability of the Q 1 × Q1 element to generate accurate pressure fields leads to values for the pressure-dependent rheology that are so far away from their correct values -and, indeed, from the values on nearby cells -that they greatly increase the condition number of the linear systems that have to be solved in each nonlinear iteration. The resulting difficulty of solving these Picard steps accurately then affects the speed with which the nonlinear 475 residual is reduced by the Picard iteration, to the point where the condition number is so large that no convergence can be achieved any more. Only mesh refinement, with the attendant increased accuracy of the pressure solution (and, consequently, a more accurate viscosity) helps in restoring the ability to actually solve this problem to small nonlinear residuals.

Conclusions
In this contribution, we have provided a side-by-side comparison of the most widely used quadrilateral finite elements. As 480 outlined in the introduction, most finite element solvers used in the geodynamics community rely on one or the other of these.
At the same time, we are not aware of a comprehensive comparison of their relative strengths -or their weaknesses, as they may be.
Using the artificial, linear benchmarks discussed in Section 5, we can infer that when the solution is smooth, the Taylor-Hood variations Q 2 × Q 1 and Q 2 × P −1 provide far better accuracy than the lower-order elements Q 1 × P 0 and the stabilised Q 1 × Q 1 . This advantage is largely lost when one considers problems in which the viscosity is discontinuous. Since we believe that the real Earth has relatively narrow phase transition zones where the viscosity may jump by large factors, benchmarks like the SolVi one in Section 5.3 are relevant and illuminate important aspects.
From these considerations, one may conclude that the Taylor-Hood variations are too expensive -in terms of their number of degrees of freedom and the attendant memory and CPU time cost. However, we believe that this is not so:

490
-For buoyancy-driven flows such as the sinking block benchmark in Section 5.4, the stabilised Q 1 ×Q 1 element is largely unable to reproduce the correct solution and, furthermore, depends on using a formulation in which one subtracts a reference density from the actual density; this is equivalent to defining a hydrostatic pressure profile and only attempting to solve for the "dynamic" component of the pressure. Crucially, however, there are many ways of defining such a reference density, neither of which is canonical and "obviously right" in complex mantle convection simulations. Since 495 the solution obtained with the stabilised Q 1 ×Q 1 element strongly depends on the specific choice of reference density, we conclude that the element cannot be made robust for the kinds of flows we encounter in real mantle convection situations.
We have also verified this assertion using an application where we consider continental extension (Section 6), and where the inability to produce accurate pressure solutions also greatly affects the convergence of the nonlinear solver, to the point where the computed solution must be considered unusable. We have shown that these errors can be reduced when 500 choosing very fine meshes, but the attendant cost is unacceptable when compared with that of using other elements on far coarser meshes.
There are other considerations to believe that the procedure of trying to subtract a reference density (or a hydrostatic pressure) can not be a successful strategy. For example, simulations of free or deformable surfaces (at the Earth surface as well as at the core-mantle boundary) require accurate knowledge of the total pressure. This is true for coupled for-505 mulations of flow and surface deformation (Rose et al., 2017) and approaches such as the "sticky air" method (Crameri et al., 2012). But similar considerations also apply to nonlinear material laws in which the pressure enters the viscosity or, more commonly, phase computations that determine the density and other thermodynamic material properties from the pressure and the temperature. Indeed, one could conjecture that the stabilised Q 1 × Q 1 element would also fail for compressible Stokes simulations, though we have not verified this here. 510 We conclude from these thoughts that the stabilised Q 1 × Q 1 element is not a viable choice for mantle convection simulations. It is important to point out that the cases we consider as crucial here -buoyancy-driven flows, large hydrostatic pressures, and pressure-dependent rheologies -are uncommon on most of the engineering applications for which the Q 1 × Q 1 was originally developed; as a consequence, it is not surprising that what we find here contradicts substantial parts of the engineering literature where the element remains widely used.

515
-We believe that the Q 1 × P 0 element is also not a viable choice. As shown by several of the analytical benchmarks, the errors that result from using this element can be orders of magnitude larger than the corresponding errors that result from the Taylor-Hood-type elements. This is no longer the case once we consider discontinuous viscosity profiles (see Section 5.3), but this element is also unable to accurately solve the buoyancy-driven case discussed in Section 5.4. has limited its use in combination with iterative methods: because of the corresponding condition number increase, the number of iterations is found to grow in a somewhat unpredictable manner with an increase in resolution. This may explain why, despite the Citcom codes' success over two decades with studies based on models counting up to ∼100M elements on several hundreds of processors (e.g. Jadamec and Billen, 2012), the current generation of massively parallel codes relies on either stable (Kronbichler et al., 2012;May et al., 2015) or stabilised elements (Burstedde et al., 2013), 525 or uses the finite difference method (Kaus et al., 2016).
In summary, we think that the Taylor-Hood variations Q 2 ×Q 1 and Q 2 ×P −1 present the best compromise for robust mantle convection and crustal dynamics simulations. This is not because these elements are "obviously better" than the others, but due more to a "last man standing" argument: The other choices simply disqualified themselves by failing to provide adequate accuracy in one situation or another. At the same time, the lack of regularity one expects of typical scenarios also implies 530 that we should not expect higher-order Taylor-Hood elements Q k+1 × Q k or Q k+1 × P −k with k > 2 to provide substantially better accuracy compared to their much higher computational cost. Although we have only shown results for two-dimensional simulations, experience -including the experience with the ASPECT code used here that solves two-and three-dimensional problems within the same framework -suggests that all of these considerations would also apply to the three-dimensional (hexahedral) analogs of the ones we have used.

535
Of course, the choices we have considered here are not the only ones. One could, for example, consider "simplicial" (triangular and tetrahedral) elements instead of the quadrilateral and hexahedral ones we have used here. Indeed, some existing mantle convection codes use this strategy. One successful example is the TERRA-NEO code that uses equal-order linear tetrahedra Weismüller et al., 2015) stabilised by means of a pressure-stabilisation approach based on the addition of linear least-squares terms (the "PSPG" approach, see Brezzi and Douglas (1988); Elman et al. (2014)); other examples include 540 Fluidity (Davies et al., 2011) andLaCoDe (de Montserrat et al., 2019). While we have not evaluated simplicial elements, one might conjecture that many of the same conclusions would also hold: The unstable P 1 × P 0 provides low accuracy and is unstable, the stabilised P 1 × P 1 has difficulties with buoyancy-driven flows and large hydrostatic pressures, and the Taylor-Hood element P 2 × P 1 is expensive but robust.
Finally, there are other, more exotic elements one could work with. Examples include the Rannacher-Turek element (Ran-545 nacher and Turek, 1992), the Crouzeix-Raviart element (Crouzeix and Raviart, 1973), or the DSSY element (Douglas et al., 1999). We have not investigated these kinds of choices for four reasons: (i) The manuscript at hand is long enough as it stands, (ii) these elements are not widely used, both within and outside our community, and (iii) many of these elements are difficult to implement in one regard or another, including complications with boundary conditions and with dealing with unstructured and possibly curvilinear cells; finally, (iv) the elements mentioned above are not as widely available or completely implemented in 550 common software frameworks, and their use thus requires substantial additional implementation work.
While we have not investigated these two possible directions for alternatives to the elements we have considered, we think that such studies would be interesting. We hope that our careful choice of test cases might also be useful to such studies.