Convergence and Quasi-Optimality of Adaptive FEM with Inhomogeneous Dirichlet Data

We consider the solution of a second-order elliptic PDE with inhomogeneous Dirichlet data by means of adaptive lowest-order FEM. As is usually done in practice, the given Dirichlet data are discretized by nodal interpolation. The Poisson equation with mixed Dirichlet-Neumann boundary conditions serves as model example. For error estimation, we use an edge-based residual error estimator which replaces the volume residual contributions by edge oscillations. In 2D, we prove convergence of the adaptive algorithm, even with quasi-optimal convergence rate. In 2D and 3D, we show convergence if the nodal interpolation operator is replaced by the $L^2$-projection or the Scott-Zhang quasi-interpolation operator. As a byproduct of the proof, we show that the Scott-Zhang operator converges pointwise to a limiting operator as the mesh is locally refined. This property might be of independent interest beyond the current application. Finally, numerical experiments conclude the work.

1. Introduction

1.1. Model problem. By now, the thorough mathematical understanding of convergence and quasi-optimality of $h$-adaptive FEM for second-order elliptic PDEs has matured. However, the numerical analysis usually focuses on model problems with homogeneous Dirichlet conditions, i.e. $-\Delta u = f$ in $\Omega$ with $u = 0$ on $\Gamma = \partial\Omega$, see e.g. [13,14,21,23,29]. On a bounded Lipschitz domain $\Omega \subset \mathbb{R}^2$ with polygonal boundary $\Gamma = \partial\Omega$, we consider the Poisson problem

$$-\Delta u = f \quad \text{in } \Omega, \qquad u = g \quad \text{on } \Gamma_D, \qquad \partial_n u = \phi \quad \text{on } \Gamma_N \tag{1}$$

with mixed Dirichlet-Neumann boundary conditions. The boundary $\Gamma$ is split into two relatively open boundary parts, namely the Dirichlet boundary $\Gamma_D$ and the Neumann boundary $\Gamma_N$, i.e. $\Gamma_D \cap \Gamma_N = \emptyset$ and $\overline{\Gamma}_D \cup \overline{\Gamma}_N = \Gamma$. We assume that the Dirichlet boundary has positive surface measure, $|\Gamma_D| > 0$, whereas $\Gamma_N$ is allowed to be empty. The given data formally satisfy $f \in H^{-1}(\Omega)$, $g \in H^{1/2}(\Gamma_D)$, and $\phi \in H^{-1/2}(\Gamma_N)$. As is usually required to derive (localized) a posteriori error estimators, we assume additional regularity of the given data, namely $f \in L^2(\Omega)$, $g \in H^1(\Gamma_D)$, and $\phi \in L^2(\Gamma_N)$.
Whereas a posteriori error estimation for (1) has been studied, cf. [4,27], none of the proposed adaptive algorithms has been proven to converge. While the inclusion of inhomogeneous Neumann conditions $\phi$ into the convergence analysis is fairly straightforward, incorporating inhomogeneous Dirichlet conditions $g$ is technically more demanding and requires novel ideas. First, discrete finite element functions cannot satisfy general inhomogeneous Dirichlet conditions. Therefore, the adaptive algorithm has to deal with an additional discretization $g_\ell$ of $g$. Second, this additional error has to be controlled in the natural trace space, which is the fractional-order Sobolev space $H^{1/2}(\Gamma_D)$. Since the $H^{1/2}$-norm is non-local, the a posteriori error analysis requires appropriate localization techniques. These have recently been developed in the context of adaptive boundary element methods [2,10,11,15,16,20]: Under certain orthogonality properties of $g - g_\ell \in H^1(\Gamma_D)$, the natural trace norm $\|g - g_\ell\|_{H^{1/2}(\Gamma_D)}$ is bounded, up to a multiplicative constant, by the locally weighted $H^1$-seminorm $\|h_\ell^{1/2}(g - g_\ell)'\|_{L^2(\Gamma_D)}$. Here, $h_\ell$ is the local mesh-width, and $(\cdot)'$ denotes the arclength derivative. Finally, in contrast to homogeneous Dirichlet conditions $g = 0$, we lose the Galerkin orthogonality in the energy norm. This leads to certain technicalities in deriving a contractive quasi-error which is equivalent to the overall Galerkin error in $H^1(\Omega)$. In conclusion, quasi-optimality and even plain convergence of adaptive FEM with inhomogeneous Dirichlet data is a nontrivial task. To the best of our knowledge, only [24] analyzes convergence of adaptive FEM with inhomogeneous Dirichlet data. While the authors also consider the 2D model problem (1) with $\Gamma_D = \Gamma$ and lowest-order elements, their analysis relies on an artificial, non-standard marking criterion. Quasi-optimal convergence rates are not analyzed there and can hardly be expected in general [13].
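It is well known that the Poisson problem (1) admits a unique weak solution $u \in H^1(\Omega)$ with $u = g$ on $\Gamma_D$ in the sense of traces, which solves the variational formulation

$$\langle \nabla u, \nabla v \rangle_\Omega = \langle f, v \rangle_\Omega + \langle \phi, v \rangle_{\Gamma_N} \quad \text{for all } v \in H^1_D(\Omega). \tag{2}$$

Here, the test space reads $H^1_D(\Omega) = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \text{ in the sense of traces}\}$, and $\langle \cdot, \cdot \rangle$ denotes the respective $L^2$-scalar products.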
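1.2. Discretization. For the Galerkin discretization, let $\mathcal{T}_\ell$ be a regular triangulation of $\Omega$ into triangles $T \in \mathcal{T}_\ell$. We use lowest-order conforming elements, where the ansatz space reads

$$S^1(\mathcal{T}_\ell) = \{V \in C(\overline{\Omega}) : V|_T \text{ is affine for all } T \in \mathcal{T}_\ell\}. \tag{3}$$

Since a discrete function $U_\ell \in S^1(\mathcal{T}_\ell)$ cannot satisfy general continuous Dirichlet conditions, we have to discretize the given data $g \in H^1(\Gamma_D)$. According to the Sobolev embedding $H^1(\Gamma_D) \hookrightarrow C(\overline{\Gamma}_D)$ on the 1D manifold $\Gamma_D$, the given Dirichlet data are continuous on $\Gamma_D$. Therefore, the nodal interpoland $g_\ell$ of $g$ is well-defined. As is usually done in practice, we approximate $g \approx g_\ell$. Again, it is well known that there is a unique $U_\ell \in S^1(\mathcal{T}_\ell)$ with $U_\ell = g_\ell$ on $\Gamma_D$ which solves the Galerkin formulation

$$\langle \nabla U_\ell, \nabla V_\ell \rangle_\Omega = \langle f, V_\ell \rangle_\Omega + \langle \phi, V_\ell \rangle_{\Gamma_N} \quad \text{for all } V_\ell \in S^1_D(\mathcal{T}_\ell). \tag{4}$$

Here, the test space is given by $S^1_D(\mathcal{T}_\ell) = S^1(\mathcal{T}_\ell) \cap H^1_D(\Omega)$.

1.3. A posteriori error estimation. An element-based residual error estimator for this discretization is given by $\rho_\ell^2 = \sum_{T \in \mathcal{T}_\ell} \rho_\ell(T)^2$, see (5), with corresponding refinement indicators $\rho_\ell(T)$ from (6), where $[\cdot]$ denotes the jump across edges. We prove reliability and efficiency of $\rho_\ell$ (Proposition 2) and discrete local reliability (Proposition 3). Inspired by [26], we introduce an edge-based error estimator $\varrho_\ell^2 = \sum_{E \in \mathcal{E}_\ell} \varrho_\ell(E)^2$, see (7), whose local contributions $\varrho_\ell(E)$ for an edge $E \in \mathcal{E}_\ell$ are given in (8). Here, $\omega_{\ell,E} \subset \Omega$ denotes the edge patch, and $f_{\omega_{\ell,E}}$ denotes the corresponding integral mean. The advantage of $\varrho_\ell$ is that the volume residual terms $|T|^{1/2}\|f\|_{L^2(T)}$ in (6) are replaced by the edge oscillations $|\omega_{\ell,E}|^{1/2}\|f - f_{\omega_{\ell,E}}\|_{L^2(\omega_{\ell,E})}$, which are generically of higher order. The choice of $|E|\,\|(g - g_\ell)'\|_{L^2(E)}^2$ to measure the contribution of the Dirichlet data approximation is motivated by the Dirichlet data oscillations, cf. Section 3.1 below. We prove that $\rho_\ell$ and $\varrho_\ell$ are locally equivalent (Lemma 4) and thus obtain reliability and efficiency of $\varrho_\ell$ (Proposition 5) as well as discrete local reliability (Proposition 6).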
1.4. Adaptive algorithm. We use the local contributions of $\varrho_\ell$ to mark edges for refinement in a realization (Algorithm 7) of the standard adaptive loop (AFEM)

$$\mathrm{SOLVE} \;\to\; \mathrm{ESTIMATE} \;\to\; \mathrm{MARK} \;\to\; \mathrm{REFINE}.$$

Our adaptive algorithm uses variants of the well-studied Dörfler marking [14] to mark certain edges for refinement. Throughout, we use newest vertex bisection, and at least the marked edges are bisected. Given some initial mesh $\mathcal{T}_0$, the algorithm successively generates locally refined meshes $\mathcal{T}_\ell$ with corresponding discrete solutions $U_\ell \in S^1(\mathcal{T}_\ell)$ of (4).

1.5. Main results.
The first main result (Theorem 14) states that the adaptive algorithm leads to a contraction

$$\Delta_{\ell+1} \le \kappa\,\Delta_\ell \quad \text{for all } \ell \in \mathbb{N}_0 \text{ and some constant } 0 < \kappa < 1 \tag{10}$$

for some quasi-error quantity $\Delta_\ell \simeq \varrho_\ell^2$ which is equivalent to the error estimator. In particular, this proves linear convergence of the adaptively generated solutions $U_\ell \in S^1(\mathcal{T}_\ell)$ to the (unknown) weak solution $u \in H^1(\Omega)$ of (2). The main ingredients of the proof are an equivalent error estimator $\widetilde\varrho_\ell \simeq \varrho_\ell$, for which we prove an estimator reduction (11) with some constants $0 < \kappa < 1$ and $C > 0$ for all $\ell \in \mathbb{N}_0$, see Lemma 12, and a quasi-Galerkin orthogonality in Lemma 13, whereas the general concept follows that of [13].
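For the reader's convenience, the step from the contraction (10) to linear convergence reads as follows, where $\lesssim$ hides the constants from the reliability of $\varrho_\ell$ (Proposition 5) and from the equivalence $\Delta_\ell \simeq \varrho_\ell^2$:

```latex
\Delta_\ell \le \kappa\,\Delta_{\ell-1} \le \dots \le \kappa^{\ell}\,\Delta_0
\quad\Longrightarrow\quad
\|u - U_\ell\|_{H^1(\Omega)} \lesssim \varrho_\ell \lesssim \Delta_\ell^{1/2} \le \kappa^{\ell/2}\,\Delta_0^{1/2}.
```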
The second main result is Theorem 18, which states that the outcome of the adaptive algorithm is quasi-optimal in the sense of Stevenson [29]: Provided that the given data $(f, g, \phi) \in L^2(\Omega) \times H^1(\Gamma_D) \times L^2(\Gamma_N)$ and the corresponding weak solution $u \in H^1(\Omega)$ of (2) belong to the approximation class $\mathbb{A}_s$ from (12)-(13), the adaptively generated solutions also attain the convergence order $\mathcal{O}(N^{-s})$ in the sense of (14).

Here, $\mathbb{T}_N$ denotes the set of all triangulations $\mathcal{T}_*$ which can be obtained by local refinement of the initial mesh $\mathcal{T}_0$ such that $\#\mathcal{T}_* - \#\mathcal{T}_0 \le N$. Moreover, $\mathrm{osc}_{\mathcal{T},*}$, $\mathrm{osc}_{D,*}$, and $\mathrm{osc}_{N,*}$ denote the data oscillations of the volume data $f$, the Dirichlet data $g$, and the Neumann data $\phi$, see Section 3.1.
The ingredients of the proof are the observation that the proposed marking strategy is optimal (Proposition 15) and the Céa-type estimate for the Galerkin solution $U_\ell \in S^1(\mathcal{T}_\ell)$ in Lemma 17.

In 3D, nodal interpolation of the Dirichlet data $g \in H^1(\Gamma)$ is not well-defined. In the literature, it is proposed to discretize $g$ by use of the $L^2$-projection [4] or the Scott-Zhang projection [27]. Our third main result (Theorem 21) states convergence of the adaptive algorithm for either choice, in 2D as well as in 3D. The proof relies on the analytical observation that, under adaptive mesh-refinement, the Scott-Zhang projection converges pointwise to a limiting operator (Lemma 19), which might be of independent interest. Finally, we stress that the same results (Theorems 14, 18, and 21) hold if the element-based estimator $\rho_\ell$ from (5)-(6) is used instead of the edge-based estimator $\varrho_\ell$ and if Algorithm 7 marks certain elements for refinement.
1.6. Outline. The remainder of this paper is organized as follows: We first collect some necessary preliminaries on, e.g., newest vertex bisection (Section 2.2) and the Scott-Zhang quasi-interpolation operator (Section 2.3). Section 3 contains the analysis of the a posteriori error estimators $\rho_\ell$ from (5)-(6) and $\varrho_\ell$ from (7)-(8). Moreover, we state the adaptive algorithm in Section 3.4. Convergence is shown in Section 4, while the quasi-optimality results are found in Section 5. Whereas the major part of the paper is concerned with the 2D model problem, Section 6 considers convergence of AFEM in 3D. Finally, some numerical experiments conclude the work.

2. Preliminaries
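2.1. Notation. Throughout, $\mathcal{T}_\ell$ denotes a regular triangulation which is obtained by $\ell$ steps of (local) newest vertex bisection from a given initial triangulation $\mathcal{T}_0$. By $\mathcal{K}_\ell := \mathcal{K}^\Omega_\ell \cup \mathcal{K}^\Gamma_\ell$, we denote the set of all nodes of $\mathcal{T}_\ell$, which is split into the set $\mathcal{K}^\Omega_\ell$ of interior nodes and the set $\mathcal{K}^\Gamma_\ell$ of boundary nodes. By $\mathcal{E}_\ell$, we denote the set of all edges of $\mathcal{T}_\ell$, which is split into the interior edges $\mathcal{E}^\Omega_\ell$, the Dirichlet edges $\mathcal{E}^D_\ell$, and the Neumann edges $\mathcal{E}^N_\ell$, i.e. $\mathcal{E}_\ell = \mathcal{E}^\Omega_\ell \cup \mathcal{E}^D_\ell \cup \mathcal{E}^N_\ell$. We restrict ourselves to meshes $\mathcal{T}_\ell$ such that each $T \in \mathcal{T}_\ell$ has an interior node, i.e. $\partial T \cap \mathcal{K}^\Omega_\ell \neq \emptyset$. Note that this is only an assumption on the initial mesh $\mathcal{T}_0$. We assume that the partition of $\Gamma$ into Dirichlet boundary $\Gamma_D$ and Neumann boundary $\Gamma_N$ is resolved by the triangulation, i.e. $\mathcal{E}^D_\ell$ (resp. $\mathcal{E}^N_\ell$) provides a partition of $\Gamma_D$ (resp. $\Gamma_N$). For a node $z \in \mathcal{K}_\ell$, the corresponding patch is defined by

$$\omega_{\ell,z} := \bigcup\{T \in \mathcal{T}_\ell : z \in T\}. \tag{16}$$

For an edge $E \in \mathcal{E}_\ell$, the edge patch is defined by

$$\omega_{\ell,E} := \bigcup\{T \in \mathcal{T}_\ell : E \subseteq \partial T\}. \tag{17}$$

Moreover, for a given node $z \in \mathcal{K}_\ell$,

$$\mathcal{E}_{\ell,z} := \{E \in \mathcal{E}_\ell : z \in E\}$$

denotes the star of edges originating at $z$.

Figure 1. For each triangle $T \in \mathcal{T}_\ell$, there is one fixed reference edge, indicated by the double line (left, top). Refinement of $T$ is done by bisecting the reference edge, where its midpoint becomes a new node. The reference edges of the son triangles $T' \in \mathcal{T}_{\ell+1}$ are opposite to this newest vertex (left, bottom). To avoid hanging nodes, one proceeds as follows: We assume that certain edges of $T$, but at least the reference edge, are marked for refinement (top). Using iterated newest vertex bisection, the element is then split into 2, 3, or 4 son triangles (bottom).

2.2. Newest vertex bisection. Throughout, we assume that newest vertex bisection is used for mesh-refinement, see Figure 1. Let $\mathcal{T}_\ell$ be a given mesh and $\mathcal{M}_\ell \subseteq \mathcal{E}_\ell$ an arbitrary set of marked edges. Then,

$$\mathcal{T}_{\ell+1} := \mathrm{refine}(\mathcal{T}_\ell, \mathcal{M}_\ell)$$

denotes the coarsest regular triangulation such that all marked edges $E \in \mathcal{M}_\ell$ have been bisected. Moreover, we write

$$\mathcal{T}_* = \mathrm{refine}(\mathcal{T}_\ell)$$

if $\mathcal{T}_*$ is a finite refinement of $\mathcal{T}_\ell$, i.e., there are finitely many triangulations $\mathcal{T}_{\ell+1}, \dots, \mathcal{T}_n$ and sets of marked edges $\mathcal{M}_\ell \subseteq \mathcal{E}_\ell, \dots, \mathcal{M}_{n-1} \subseteq \mathcal{E}_{n-1}$ such that $\mathcal{T}_* = \mathcal{T}_n$ and $\mathcal{T}_{j+1} = \mathrm{refine}(\mathcal{T}_j, \mathcal{M}_j)$ for all $j = \ell, \dots, n-1$.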
We stress that, for a fixed initial mesh $\mathcal{T}_0$, only finitely many shapes of triangles $T \in \mathcal{T}_\ell$ appear. In particular, only finitely many shapes of patches (16)-(17) appear. This observation will be used below. Moreover, newest vertex bisection guarantees that any sequence $\mathcal{T}_\ell$ of generated meshes with $\mathcal{T}_{\ell+1} = \mathrm{refine}(\mathcal{T}_\ell)$ is uniformly shape regular in the sense of (18), i.e. the shape regularity constants $\sigma(\mathcal{T}_\ell)$ are uniformly bounded in terms of $\mathcal{T}_0$. Further details are found in [31, Chapter 4].
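The bisection rule of Figure 1 and the finiteness of element shapes can be illustrated in a few lines. The following Python sketch is ours and not taken from [31]: it assumes that a triangle is stored as a vertex triple $(a, b, c)$ with reference edge $(a, b)$, bisects every element repeatedly (conformity is not enforced), and uses exact rational arithmetic so that congruent elements yield identical values of $\mathrm{diam}(T)^2/|T|$, which we use here as one common measure of shape regularity. All function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical convention: a triangle is a vertex triple (a, b, c) whose
# reference edge is (a, b); c is the vertex opposite to the reference edge.


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def bisect(tri):
    """One newest vertex bisection step: bisect the reference edge (a, b)."""
    a, b, c = tri
    m = midpoint(a, b)  # m is the newest vertex of both sons
    # The reference edge of each son is the edge opposite to m.
    return [(c, a, m), (b, c, m)]


def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def shape_constant(tri):
    """diam(T)^2 / |T|, one common way to quantify shape regularity."""
    def sq_len(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    a, b, c = tri
    return max(sq_len(a, b), sq_len(b, c), sq_len(c, a)) / area(tri)


def uniform_bisection(tris, levels):
    """Bisect every triangle `levels` times (conformity is not checked)."""
    for _ in range(levels):
        tris = [son for tri in tris for son in bisect(tri)]
    return tris


F = Fraction  # exact arithmetic: congruent sons give identical constants
T0 = [((F(0), F(0)), (F(1), F(0)), (F(3, 10), F(4, 5)))]
mesh = uniform_bisection(T0, 8)
shapes = {shape_constant(t) for t in mesh}
```

After eight levels, the 256 sons have equal areas, and the set `shapes` contains at most four distinct values, in line with the classical result that newest vertex bisection generates at most four similarity classes from each initial triangle.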
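2.3. Scott-Zhang quasi-interpolation and discrete lifting operator. Our analysis below makes heavy use of the Scott-Zhang projection $P_\ell: H^1(\Omega) \to S^1(\mathcal{T}_\ell)$ from [28]: For all nodes $z \in \mathcal{K}_\ell$, one chooses an edge $E_z \in \mathcal{E}_\ell$ with $z \in E_z$. For $z \in \Gamma$, this choice is restricted to $E_z \subset \Gamma$. Moreover, for $z \in \Gamma_D$, we even enforce $E_z \subset \Gamma_D$. For $w \in H^1(\Omega)$, $P_\ell w$ is then defined by $(P_\ell w)(z) := \langle \psi_z, w \rangle_{E_z}$ for a node $z \in \mathcal{K}_\ell$. Here, $\psi_z \in L^2(E_z)$ denotes the dual basis function defined by $\langle \psi_z, \varphi_{z'} \rangle_{E_z} = \delta_{zz'}$, and $\varphi_z \in S^1(\mathcal{T}_\ell)$ denotes the hat function associated with $z \in \mathcal{K}_\ell$. By definition, we then have the following projection properties: $P_\ell V_\ell = V_\ell$ for all $V_\ell \in S^1(\mathcal{T}_\ell)$, $(P_\ell w)|_\Gamma = w|_\Gamma$ whenever $w|_\Gamma \in S^1(\mathcal{E}^\Gamma_\ell)$, and $(P_\ell w)|_{\Gamma_D} = w|_{\Gamma_D}$ whenever $w|_{\Gamma_D} \in S^1(\mathcal{E}^D_\ell)$, i.e. the projection $P_\ell$ preserves discrete (Dirichlet) boundary data. Moreover, $P_\ell$ satisfies the stability property (22) and the approximation property (23), where $C_{sz} > 0$ depends only on $\sigma(\mathcal{T}_\ell)$. Together with the projection property onto $S^1(\mathcal{T}_\ell)$, the stability (22) easily implies that $P_\ell$ is quasi-optimal in the sense of the Céa lemma with respect to $\|\cdot\|_{H^1(\Omega)}$ and $\|\nabla(\cdot)\|_{L^2(\Omega)}$: For all $w \in H^1(\Omega)$ and $V_\ell \in S^1(\mathcal{T}_\ell)$, there holds $w - P_\ell w = (1 - P_\ell)(w - V_\ell)$, whence, e.g., $\|w - P_\ell w\|_{H^1(\Omega)} \le (1 + C_{sz}) \min_{V_\ell \in S^1(\mathcal{T}_\ell)} \|w - V_\ell\|_{H^1(\Omega)}$. Moreover, $P_\ell$ allows one to define the discrete lifting operator

$$\mathcal{L}_\ell := P_\ell \mathcal{L} : H^{1/2}(\Gamma) \to S^1(\mathcal{T}_\ell), \tag{26}$$

whose operator norm is uniformly bounded in terms of $\sigma(\mathcal{T}_\ell)$. Here, $\mathcal{L} \in L(H^{1/2}(\Gamma); H^1(\Omega))$ denotes an arbitrary lifting operator, i.e. $(\mathcal{L}w)|_\Gamma = w$ for all $w \in H^{1/2}(\Gamma)$, see e.g. [22].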
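For a single straight edge, the dual basis function can be written down explicitly: the $L^2(E_z)$ mass matrix of the two hat functions $\varphi_z, \varphi_{z'}$ of the endpoints of $E_z$ has entries $|E_z|/3$ and $|E_z|/6$, so that $\psi_z = (4\varphi_z - 2\varphi_{z'})/|E_z|$ satisfies $\langle \psi_z, \varphi_{z''} \rangle_{E_z} = \delta_{zz''}$ for both endpoints $z''$. The following minimal Python sketch evaluates one nodal value $(P_\ell w)(z) = \langle \psi_z, w \rangle_{E_z}$ by Gauss quadrature; the function name is hypothetical.

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1] (exact for polynomials of degree <= 5).
GAUSS_NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
GAUSS_WEIGHTS = (5 / 9, 8 / 9, 5 / 9)


def scott_zhang_nodal_value(w, z, other):
    """(P w)(z) = <psi_z, w>_{E_z} for the straight edge E_z = [z, other] in R^2.

    psi_z is the L2(E_z)-dual basis function of the hat function phi_z, i.e.
    psi_z = (4 * phi_z - 2 * phi_other) / |E_z| on E_z.
    """
    length = math.dist(z, other)
    total = 0.0
    for xi, weight in zip(GAUSS_NODES, GAUSS_WEIGHTS):
        t = (1 + xi) / 2  # edge parameter in [0, 1]; t = 0 at z
        point = (z[0] + t * (other[0] - z[0]), z[1] + t * (other[1] - z[1]))
        phi_z, phi_other = 1 - t, t
        psi = (4 * phi_z - 2 * phi_other) / length
        total += weight * (length / 2) * psi * w(point)
    return total
```

Affine functions are reproduced exactly, e.g. `scott_zhang_nodal_value(lambda p: 2 + 3 * p[0] - p[1], (0.0, 0.0), (1.0, 2.0))` gives $2$ up to rounding. In contrast, $P_\ell$ is not an interpolation operator: for $w(x, y) = x^2$ and $E_z$ the edge from $z = (0, 0)$ to $(1, 0)$, the computed value is $-1/6$, whereas $w(z) = 0$.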
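Finally, we emphasize that our definition of $P_\ell$ also provides an operator $P_\ell = P^\Gamma_\ell: H^{1/2}(\Gamma) \to S^1(\mathcal{E}^\Gamma_\ell)$, defined by $P^\Gamma_\ell g := (P_\ell \widehat g)|_\Gamma$ for any $\widehat g \in H^1(\Omega)$ with $\widehat g|_\Gamma = g$; this is well-defined, since the boundary values of $P_\ell \widehat g$ depend only on $\widehat g|_\Gamma$. Using the definition of $H^{1/2}(\Gamma)$ as the trace space of $H^1(\Omega)$ and the stability (22), we see

$$\|P^\Gamma_\ell g\|_{H^{1/2}(\Gamma)} \le \|P_\ell \widehat g\|_{H^1(\Omega)} \le C_{sz}\,\|\widehat g\|_{H^1(\Omega)}, \quad \text{hence} \quad \|P^\Gamma_\ell g\|_{H^{1/2}(\Gamma)} \le C_{sz}\,\|g\|_{H^{1/2}(\Gamma)}$$

for all $g \in H^{1/2}(\Gamma)$, i.e. $P_\ell: H^{1/2}(\Gamma) \to S^1(\mathcal{E}^\Gamma_\ell)$ is a continuous projection with respect to the $H^{1/2}$-norm. In particular, $P_\ell$ also provides a continuous projection $P_\ell = P^D_\ell: H^{1/2}(\Gamma_D) \to S^1(\mathcal{E}^D_\ell)$ with $\|P^D_\ell g\|_{H^{1/2}(\Gamma_D)} \le C_{sz}\,\|g\|_{H^{1/2}(\Gamma_D)}$ for all $g \in H^{1/2}(\Gamma_D)$. As before, this definition is consistent with the previous notation of $P_\ell$, since $(P^\Gamma_\ell g)|_{\Gamma_D} = P^D_\ell(g|_{\Gamma_D})$ for all $g \in H^{1/2}(\Gamma)$.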

3. A Posteriori Error Estimation and Adaptive Mesh-Refinement
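3.1. Data oscillations. We start with the element data oscillations

$$\mathrm{osc}_{\mathcal{T},\ell}^2 := \sum_{T \in \mathcal{T}_\ell} \mathrm{osc}_{\mathcal{T},\ell}(T)^2, \qquad \mathrm{osc}_{\mathcal{T},\ell}(T)^2 := |T|\,\|f - f_T\|_{L^2(T)}^2 \quad \text{for all } T \in \mathcal{T}_\ell, \tag{27}$$

where $f_T := |T|^{-1}\int_T f\,dx \in \mathbb{R}$ denotes the integral mean over an element $T \in \mathcal{T}_\ell$. These arise in the efficiency estimate for residual error estimators.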
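Our residual error estimator will involve the edge data oscillations

$$\mathrm{osc}_{E,\ell}^2 := \sum_{E \in \mathcal{E}_\ell} \mathrm{osc}_{E,\ell}(E)^2, \qquad \mathrm{osc}_{E,\ell}(E)^2 := |\omega_{\ell,E}|\,\|f - f_{\omega_{\ell,E}}\|_{L^2(\omega_{\ell,E})}^2. \tag{28}$$

Here, $\omega_{\ell,E} \subset \Omega$ is the edge patch from (17), and $f_{\omega_{\ell,E}} \in \mathbb{R}$ is the corresponding integral mean of $f$.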
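For the analysis, we shall additionally need the node data oscillations

$$\mathrm{osc}_{\mathcal{K},\ell}^2 := \sum_{z \in \mathcal{K}_\ell} \mathrm{osc}_{\mathcal{K},\ell}(z)^2, \qquad \mathrm{osc}_{\mathcal{K},\ell}(z)^2 := |\omega_{\ell,z}|\,\|f - f_{\omega_{\ell,z}}\|_{L^2(\omega_{\ell,z})}^2. \tag{29}$$

Here, $\omega_{\ell,z} \subset \Omega$ is the node patch from (16), and $f_{\omega_{\ell,z}} \in \mathbb{R}$ is the corresponding integral mean of $f$. Moreover, the efficiency estimate needs the Neumann data oscillations

$$\mathrm{osc}_{N,\ell}^2 := \sum_{E \in \mathcal{E}^N_\ell} \mathrm{osc}_{N,\ell}(E)^2, \qquad \mathrm{osc}_{N,\ell}(E)^2 := |E|\,\|\phi - \phi_E\|_{L^2(E)}^2, \tag{30}$$

where $\phi_E := |E|^{-1}\int_E \phi\,ds$ denotes the integral mean over an edge $E \in \mathcal{E}^N_\ell$. Finally, the approximation of the Dirichlet data $g \approx g_\ell$ is controlled by the Dirichlet data oscillations

$$\mathrm{osc}_{D,\ell}^2 := \sum_{E \in \mathcal{E}^D_\ell} \mathrm{osc}_{D,\ell}(E)^2, \qquad \mathrm{osc}_{D,\ell}(E)^2 := |E|\,\|g' - (g')_E\|_{L^2(E)}^2, \tag{31}$$

where $(g')_E := |E|^{-1}\int_E g'\,ds$. Recall that, on the 1D manifold $\Gamma_D$, the derivative of the nodal interpoland is the elementwise best approximation of the derivative by piecewise constants, i.e.,

$$(g_\ell)'|_E = (g')_E \quad \text{for all } E \in \mathcal{E}^D_\ell. \tag{32}$$

According to the elementwise Pythagoras theorem, this implies

$$\|(g - V_\ell)'\|_{L^2(E)}^2 = \|(g - g_\ell)'\|_{L^2(E)}^2 + \|(g_\ell - V_\ell)'\|_{L^2(E)}^2 \tag{33}$$

for all $V_\ell \in S^1(\mathcal{E}^D_\ell)$ and all Dirichlet edges $E \in \mathcal{E}^D_\ell$. This observation will be crucial in the analysis below. Moreover, (32) yields

$$\mathrm{osc}_{D,\ell}(E) = |E|^{1/2}\,\|(g - g_\ell)'\|_{L^2(E)} = \min_{c \in \mathbb{R}} |E|^{1/2}\,\|g' - c\|_{L^2(E)}. \tag{34}$$

The following result is found in [16, Lemma 2.2].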
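The identity (32) and the splitting (33) are easy to check numerically on a single edge. The following is a minimal Python sketch, assuming that $\Gamma_D$ is parametrized by arclength, so that an edge is an interval $[a, b]$ and $(\cdot)'$ is the ordinary derivative; we take $g = \sin$ for illustration, and the helper names are hypothetical.

```python
import math


def gauss5(func, a, b):
    """Five-point Gauss-Legendre quadrature of func over [a, b]."""
    r = math.sqrt(10 / 7)
    x1, x2 = math.sqrt(5 - 2 * r) / 3, math.sqrt(5 + 2 * r) / 3
    s = 13 * math.sqrt(70)
    w1, w2 = (322 + s) / 900, (322 - s) / 900
    nodes = (0.0, -x1, x1, -x2, x2)
    weights = (128 / 225, w1, w1, w2, w2)
    mid, half = (a + b) / 2, (b - a) / 2
    return half * sum(w * func(mid + half * x) for x, w in zip(nodes, weights))


def dirichlet_edge_data(g, dg, a, b):
    """Data on one Dirichlet edge E = [a, b] (arclength parameters).

    Returns the slope of the nodal interpoland g_l on E, the integral mean
    of g' over E, and osc_D(E)^2 = |E| * ||g' - (g_l)'||_{L2(E)}^2.
    """
    h = b - a
    slope = (g(b) - g(a)) / h         # (g_l)' on E
    mean_dg = gauss5(dg, a, b) / h    # integral mean (g')_E
    osc2 = h * gauss5(lambda s: (dg(s) - slope) ** 2, a, b)
    return slope, mean_dg, osc2
```

For instance, `dirichlet_edge_data(math.sin, math.cos, 0.2, 0.45)` returns a slope which agrees with the integral mean of $\cos$ on $[0.2, 0.45]$ up to rounding, and any other constant $c$ yields a larger value of $\|g' - c\|_{L^2(E)}$.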
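Lemma 1. Let $g \in H^1(\Gamma_D)$ and let $g_\ell$ denote the nodal interpoland of $g$ on $\Gamma_D$. Then,

$$\|g - g_\ell\|_{H^{1/2}(\Gamma_D)} \le C_1\,\|h_\ell^{1/2}(g - g_\ell)'\|_{L^2(\Gamma_D)},$$

where $h_\ell|_E := |E|$ denotes the local mesh-width, and the constant $C_1 > 0$ depends only on the shape regularity constant $\sigma(\mathcal{T}_\ell)$ and on $\Omega$.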
To abbreviate the notation, we extend the Dirichlet and the Neumann data oscillations from (30)-(31) by zero to all edges $E \in \mathcal{E}_\ell$.

3.2. Element-based residual error estimator.
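Our first proposition states reliability and efficiency of the error estimator $\rho_\ell$ from (5)-(6).
Proposition 2 (reliability and efficiency of $\rho_\ell$). The error estimator $\rho_\ell$ is reliable and efficient, i.e.

$$\|u - U_\ell\|_{H^1(\Omega)} \le C_2\,\rho_\ell, \tag{37}$$

$$\rho_\ell \le C_3\,\big(\|u - U_\ell\|_{H^1(\Omega)} + \mathrm{osc}_{\mathcal{T},\ell} + \mathrm{osc}_{D,\ell} + \mathrm{osc}_{N,\ell}\big). \tag{38}$$

The constants $C_2, C_3 > 0$ depend only on the shape regularity constant $\sigma(\mathcal{T}_\ell)$ and on $\Omega$.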

Sketch of proof. We consider a continuous auxiliary problem
with unique solution $w \in H^1(\Omega)$. We then have a norm equivalence in which the Galerkin error splits into two contributions. Whereas the second term is controlled by Lemma 1, the first can be handled as for homogeneous Dirichlet data, namely by use of the Galerkin orthogonality combined with approximation estimates for a Clément-type quasi-interpolation operator. Details are found e.g. in [4]. This proves reliability (37).
By use of bubble functions and local scaling arguments, one obtains local efficiency estimates for the refinement indicators, where $\omega_{\ell,E}$ denotes the edge patch of $E \in \mathcal{E}_\ell$. Details are found e.g. in [3,31]. Summing these estimates over all elements, one obtains the efficiency estimate (38).
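Proposition 3 (discrete local reliability of $\rho_\ell$). Let $\mathcal{T}_* = \mathrm{refine}(\mathcal{T}_\ell)$ be an arbitrary refinement of $\mathcal{T}_\ell$ with associated Galerkin solution $U_* \in S^1(\mathcal{T}_*)$. Let $R_\ell(\mathcal{T}_*) := \mathcal{T}_\ell \setminus \mathcal{T}_*$ be the set of all elements $T \in \mathcal{T}_\ell$ which are refined to generate $\mathcal{T}_*$. Then, the discrete local reliability estimate (40) holds, i.e. $U_* - U_\ell$ is controlled by the refinement indicators $\rho_\ell(T)$ of the refined elements $T \in R_\ell(\mathcal{T}_*)$ only, with some constant $C_4 > 0$ which depends only on $\sigma(\mathcal{T}_\ell)$ and $\Omega$.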
Proof. We consider a discrete auxiliary problem with solution $W_* \in S^1(\mathcal{T}_*)$. To estimate the $H^1$-norm of $W_*$ in terms of the boundary data, let $\mathcal{L}_*: H^{1/2}(\Gamma) \to S^1(\mathcal{T}_*)$ denote the discrete lifting operator from (26). Let $\widehat g_*, \widehat g_\ell \in H^{1/2}(\Gamma)$ be arbitrary extensions of $g_*$ and $g_\ell$, respectively. We then apply the triangle inequality and a Poincaré inequality. Moreover, the variational formulation for $W_* \in S^1(\mathcal{T}_*)$ and the Cauchy-Schwarz inequality control the remaining term. Altogether, this bounds $\|W_*\|_{H^1(\Omega)}$ in terms of $\widehat g_* - \widehat g_\ell$. Since the extensions $\widehat g_*, \widehat g_\ell$ were arbitrary, the definition of the $H^{1/2}(\Gamma_D)$-norm yields the corresponding bound in terms of $g_* - g_\ell$, where we have finally used that $g_\ell$ is also the nodal interpoland of $g_*$, so that Lemma 1 applies. The remaining contributions are then estimated elementwise.

With the orthogonality relation (33) applied to $g_* \in S^1(\mathcal{T}_*|_{\Gamma_D})$, we bound the Dirichlet contributions. Arguing as in [13, Lemma 3.6], we bound the remaining discrete contribution. Finally, we again use the triangle inequality and the Poincaré inequality and thus obtain the discrete local reliability (40). The constant $C_4 > 0$ depends only on $C_1 > 0$ and on local estimates for the Scott-Zhang projection, which are controlled by the boundedness of $\sigma(\mathcal{T}_\ell)$.
3.3. Edge-based residual error estimator. In the following, we show that the edge-based estimator $\varrho_\ell$ from (7)-(8) is locally equivalent to the element-based error estimator $\rho_\ell$ from the previous section. The main advantage is that $\varrho_\ell$ replaces the volume residuals by the edge oscillations $\mathrm{osc}_{E,\ell}$. We define the edge jump contributions $\eta_\ell(E)$ for $E \in \mathcal{E}_\ell$, where $[\cdot]$ denotes the jump across an interior edge. Together with the edge oscillations from (28) and the Dirichlet oscillations from (31), this yields our version of the residual error estimator from (7)-(8). Note that $\mathrm{osc}_{E,\ell}(\mathcal{E}_{\ell,z})$, $\eta_\ell(\mathcal{E}_{\ell,z})$, and $\mathrm{res}_\ell(\omega_{\ell,E})$ are defined analogously to (36). The following lemma implies local equivalence of the estimators $\rho_\ell$ and $\varrho_\ell$.

Lemma 4. The local estimates (i)-(iii) hold, where (iii) holds for all $z \in \mathcal{K}^\Omega_\ell$. The constants $C_5, C_6, C_7 > 0$ depend only on the shape regularity constant $\sigma(\mathcal{T}_\ell)$, whereas $C_8 > 0$ depends on the use of newest vertex bisection and the initial mesh $\mathcal{T}_0$.

Sketch of proof. The proof of (i) follows from the fact that the integral mean $f_\omega$ is the $L^2$-best approximation by a constant, i.e. $\|f - f_\omega\|_{L^2(\omega)} = \min_{c \in \mathbb{R}} \|f - c\|_{L^2(\omega)}$,
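and that the areas of neighboring elements are comparable up to constants which depend only on $\sigma(\mathcal{T}_\ell)$. The estimate (ii) is well known and found, e.g., in [21, Section 2.2.4]. Note that (ii) essentially needs the condition that each element $T \in \mathcal{T}_\ell$ has an interior node, cf. Section 2.1. The lower estimate in (iii) follows from the same arguments as (i), namely $\|f - f_{\omega_{\ell,E}}\|_{L^2(\omega_{\ell,E})} \le \|f - f_{\omega_{\ell,z}}\|_{L^2(\omega_{\ell,z})}$ for all $E \in \mathcal{E}_{\ell,z}$, since $\omega_{\ell,E} \subseteq \omega_{\ell,z}$, and the fact that, up to shape regularity, only finitely many edges belong to $\mathcal{E}_{\ell,z}$. For piecewise polynomial $f$, the upper estimate in (iii) follows from a scaling argument, since both terms $\mathrm{osc}_{E,\ell}(\mathcal{E}_{\ell,z})$ and $\mathrm{osc}_{\mathcal{K},\ell}(z)$ define seminorms on $\mathcal{P}^p(\{T \in \mathcal{T}_\ell : z \in T\})$ whose kernel consists of the constant functions. Note that the equivalence constants depend on the shape of the node patch $\omega_{\ell,z}$, but newest vertex bisection leads only to finitely many shapes of patches. For arbitrary $f \in L^2(\Omega)$, we consider the $\mathcal{T}_\ell$-piecewise integral mean $f_\ell \in \mathcal{P}^0(\mathcal{T}_\ell)$, defined by $f_\ell|_T := f_T$ for all $T \in \mathcal{T}_\ell$, to which the previous argument applies. Since $f - f_\ell$ is $L^2$-orthogonal to all $\mathcal{T}_\ell$-piecewise constants, the Pythagoras theorem gives $\|f - f_\omega\|_{L^2(\omega)}^2 = \|f - f_\ell\|_{L^2(\omega)}^2 + \|f_\ell - f_\omega\|_{L^2(\omega)}^2$ for every patch $\omega \in \{\omega_{\ell,z}, \omega_{\ell,E}\}$. Scaling with $|\omega_{\ell,z}| \simeq |\omega_{\ell,E}|$ concludes the proof.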
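Proposition 5 (reliability and efficiency of $\varrho_\ell$). The error estimator $\varrho_\ell$ is reliable and efficient, i.e. estimates analogous to (37)-(38) hold with $\rho_\ell$ replaced by $\varrho_\ell$ and $\mathrm{osc}_{\mathcal{T},\ell}$ replaced by $\mathrm{osc}_{E,\ell}$. The corresponding constants $C_{rel}, C_{eff} > 0$ depend only on $\Omega$, the use of newest vertex bisection, and the initial mesh $\mathcal{T}_0$.
Proof. With the help of the preceding lemma, we obtain the equivalence $\varrho_\ell \simeq \rho_\ell$. Consequently, reliability and efficiency of $\varrho_\ell$ follow from the respective properties of the element-based estimator $\rho_\ell$, see Proposition 2.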
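Proposition 6 (discrete local reliability of $\varrho_\ell$). Let $\mathcal{T}_* = \mathrm{refine}(\mathcal{T}_\ell)$ be an arbitrary refinement of $\mathcal{T}_\ell$ with associated Galerkin solution $U_* \in S^1(\mathcal{T}_*)$. Let $R_\ell(\mathcal{T}_*) := \mathcal{T}_\ell \setminus \mathcal{T}_*$ be the set of all elements $T \in \mathcal{T}_\ell$ which are refined to generate $\mathcal{T}_*$, and let

$$R_\ell(\mathcal{E}_*) := \{E \in \mathcal{E}_\ell : E \cap T \neq \emptyset \text{ for some } T \in R_\ell(\mathcal{T}_*)\}$$

be the set of all edges which touch a refined element. Then, $\#R_\ell(\mathcal{E}_*) \le C_{ref}\,\#R_\ell(\mathcal{T}_*)$, and the discrete local reliability (49) holds, i.e. $U_* - U_\ell$ is controlled by the indicators $\varrho_\ell(E)$ of the edges $E \in R_\ell(\mathcal{E}_*)$ only. The constants $C_{ref}, C_{dlr} > 0$ depend only on $\Omega$, the use of newest vertex bisection, and the initial mesh $\mathcal{T}_0$.
Proof. According to shape regularity, the number of elements which share a node $z \in \mathcal{K}_\ell$ is uniformly bounded. Consequently, so is the number of edges which touch an element $T \in R_\ell(\mathcal{T}_*)$ which will be refined. This proves the estimate $\#R_\ell(\mathcal{E}_*) \le C_{ref}\,\#R_\ell(\mathcal{T}_*)$. To prove (49), we use the discrete local reliability of $\rho_\ell$ from Proposition 3. With the help of Lemma 4, each refinement indicator $\rho_\ell(T)$ for $T \in R_\ell(\mathcal{T}_*)$ is dominated by finitely many indicators $\varrho_\ell(E)$ for $E \in R_\ell(\mathcal{E}_*)$, where the number depends only on the shape regularity constant $\sigma(\mathcal{T}_\ell)$.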

3.4. Adaptive algorithm based on Dörfler marking. Our version of the adaptive algorithm has been well studied in the literature, mainly for element-based estimators, cf. e.g. [13].
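The marking step can be sketched as follows, assuming that the Dörfler criterion (50) takes the standard form $\theta\,\varrho_\ell^2 \le \sum_{E \in \mathcal{M}_\ell} \varrho_\ell(E)^2$ and that $\mathcal{M}_\ell$ is chosen with minimal cardinality, as is used in the proof of Theorem 18. The function name and the dictionary interface are hypothetical; only the selection logic is illustrated.

```python
def doerfler_marking(indicators, theta):
    """Return a minimal-cardinality list M of edges with
    theta * sum_E indicators[E]**2 <= sum_{E in M} indicators[E]**2.

    `indicators` maps (hypothetical) edge labels to nonnegative refinement
    indicators varrho_l(E); `theta` is the marking parameter 0 < theta < 1.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie in (0, 1)")
    total = sum(eta ** 2 for eta in indicators.values())
    marked, accumulated = [], 0.0
    # Taking the largest indicators first yields a set of minimal cardinality.
    for edge, eta in sorted(indicators.items(), key=lambda item: item[1], reverse=True):
        if accumulated >= theta * total:
            break
        marked.append(edge)
        accumulated += eta ** 2
    return marked
```

For indicators `{"E1": 3.0, "E2": 1.0, "E3": 2.0, "E4": 0.5}` and $\theta = 0.7$, the function returns `["E1", "E3"]`, since the squared indicators $9 + 4 = 13$ exceed $0.7 \cdot 14.25 = 9.975$, whereas $9$ alone does not. Sorting costs $\mathcal{O}(n \log n)$; we use it here for clarity rather than efficiency.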
3.5. Adaptive algorithm based on modified Dörfler marking. For (piecewise) smooth data $f \in H^1$ and $g \in H^2$, uniform mesh-refinement guarantees $\mathrm{osc}_{E,\ell} = \mathcal{O}(h^2)$ as well as $\mathrm{osc}_{D,\ell} = \mathcal{O}(h^{3/2})$, whereas the error and hence the error estimator $\varrho_\ell$ decay at most like $\mathcal{O}(h)$. Consequently, we may expect that the normal jump terms dominate the error estimator [12]. This observation led to the following version of the marking strategy, which has essentially been proposed in [7]. We stress, however, that the algorithm in [6,7] is stated with node oscillations $\mathrm{osc}_{\mathcal{K},\ell}$ instead of edge oscillations $\mathrm{osc}_{E,\ell}$. Moreover, certain details in the proofs of [6] seem to be dubious.

Algorithm 8. Let adaptivity parameters $0 < \theta_1, \theta_2 < 1$ and $\vartheta > 0$ and an initial triangulation $\mathcal{T}_0$ be given. For each $\ell = 0, 1, 2, \dots$ do:
(iv) Generate the new mesh $\mathcal{T}_{\ell+1} := \mathrm{refine}(\mathcal{T}_\ell, \mathcal{M}_\ell)$.
(v) Update the counter $\ell \mapsto \ell + 1$ and go to (i).
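The claimed rate $\mathrm{osc}_{D,\ell} = \mathcal{O}(h^{3/2})$ under uniform refinement can be observed in a simple setting. The following Python sketch assumes that $\Gamma_D$ is a straight segment parametrized by arclength over $[0, 1]$ and takes $g = \sin$; it computes $\mathrm{osc}_{D,\ell}^2 = \sum_E |E|\,\|g' - (g_\ell)'\|_{L^2(E)}^2$ for uniform partitions and the resulting experimental rates. The helper names are hypothetical.

```python
import math


def gauss3(func, a, b):
    """Three-point Gauss-Legendre quadrature over [a, b]."""
    x = math.sqrt(3 / 5)
    mid, half = (a + b) / 2, (b - a) / 2
    return half * (5 / 9 * func(mid - half * x) + 8 / 9 * func(mid) + 5 / 9 * func(mid + half * x))


def osc_dirichlet(g, dg, n):
    """osc_D for the uniform partition of [0, 1] into n edges, where g_l is
    the nodal interpoland and osc_D(E)^2 = |E| * ||g' - (g_l)'||^2_{L2(E)}."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        a, b = k * h, (k + 1) * h
        slope = (g(b) - g(a)) / h  # derivative of the nodal interpoland on E
        total += h * gauss3(lambda s: (dg(s) - slope) ** 2, a, b)
    return math.sqrt(total)


values = [osc_dirichlet(math.sin, math.cos, n) for n in (8, 16, 32, 64)]
# Experimental rates log2(osc_D(h) / osc_D(h/2)) with respect to h.
rates = [math.log2(values[i] / values[i + 1]) for i in range(len(values) - 1)]
```

The computed rates are close to $3/2$, which is faster than the rate $1$ that is the best possible for the $H^1$-error of lowest-order elements under uniform refinement.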
Lemma 12 (estimator reduction). Assume that the set $\mathcal{M}_\ell \subseteq \mathcal{E}_\ell$ of marked edges satisfies the Dörfler marking (50) with $\varrho_\ell$ and some fixed parameter $0 < \theta < 1$, and that $\mathcal{T}_{\ell+1} = \mathrm{refine}(\mathcal{T}_\ell, \mathcal{M}_\ell)$ is obtained by local newest vertex bisection of $\mathcal{T}_\ell$. Then, the estimator reduction estimate (57) holds with some contraction constant $q \in (0,1)$ which depends only on $\theta \in (0,1)$. The constant $C_{11} > 0$ depends only on $\theta$ and on the initial mesh $\mathcal{T}_0$.
Sketch of proof. For the sake of completeness, we include the idea of the proof of (57). To keep the notation simple, we introduce auxiliary notation so that all contributions of $\varrho_\ell$ are defined on the entire set of edges $\mathcal{E}_\ell$. First, we employ a triangle inequality and the Young inequality with arbitrary $\delta > 0$. Second, a scaling argument provides a local bound, where the constant $C > 0$ depends only on $\sigma(\mathcal{T}_\ell)$. Third, we argue as in [13, Corollary 3.4]. Fourth, it is part of the proof of [2, Theorem 5.4] that a corresponding estimate holds, which essentially follows from the orthogonality relation (33). Fifth, a further estimate is proven in [26, Lemma 6]. Plugging everything together, we obtain the estimator reduction, where we have used that Lemma 11 guarantees the Dörfler marking for $\varrho_\ell$ in the second estimate. Finally, it only remains to choose $\delta > 0$ sufficiently small so that $q := (1 + \delta)(1 - \theta/4) < 1$.
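The admissible range for $\delta$ in the last step is explicit. As an illustration (this particular choice is ours), taking half of the admissible range gives a simple formula for the contraction constant $q$:

```latex
(1+\delta)\Bigl(1-\frac{\theta}{4}\Bigr) < 1
\iff 1+\delta < \frac{4}{4-\theta}
\iff \delta < \frac{\theta}{4-\theta};
\qquad
\delta := \frac{\theta}{2(4-\theta)}
\;\Longrightarrow\;
q = \frac{8-\theta}{2(4-\theta)} \cdot \frac{4-\theta}{4} = 1 - \frac{\theta}{8}.
```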
The following lemma states a quasi-Galerkin orthogonality property which allows us to overcome the lack of the Galerkin orthogonality used in [13].
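Lemma 13 (quasi-Galerkin orthogonality). Let $\mathcal{T}_* = \mathrm{refine}(\mathcal{T}_\ell)$ be an arbitrary refinement of $\mathcal{T}_\ell$ with the associated Galerkin solution $U_* \in S^1(\mathcal{T}_*)$. Then, the quasi-Galerkin orthogonality (59) holds for all $\alpha > 0$, and consequently the estimates (60) and (61) hold. The constant $C_{orth} > 0$ depends only on the shape regularity constants $\sigma(\mathcal{T}_\ell)$ and $\sigma(\mathcal{T}_*)$ and on $\Omega$.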
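Proof. We recall the Galerkin orthogonality $\langle \nabla(u - U_*), \nabla V_* \rangle_\Omega = 0$ for all $V_* \in S^1_D(\mathcal{T}_*)$. This and the Young inequality allow us to estimate the $L^2$-scalar product for all $\alpha > 0$. To estimate the second contribution on the right-hand side, we proceed as in the proof of Proposition 3 and choose arbitrary extensions $\widehat g_*, \widehat g_\ell \in H^{1/2}(\Gamma)$ of the nodal interpolands $g_*, g_\ell$ from $\Gamma_D$ to $\Gamma$. Then, we use a suitable test function in the Galerkin orthogonality and, arguing as above, obtain a bound for the second contribution. This concludes the proof of (59).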
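To verify (60)-(61), we use the identity

$$\|\nabla(u - U_\ell)\|_{L^2(\Omega)}^2 = \|\nabla(u - U_*)\|_{L^2(\Omega)}^2 + \|\nabla(U_* - U_\ell)\|_{L^2(\Omega)}^2 + 2\,\langle \nabla(u - U_*), \nabla(U_* - U_\ell) \rangle_\Omega.$$

Rearranging the terms accordingly and using the quasi-Galerkin orthogonality (59) to estimate the scalar product concludes the proof.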

5. Quasi-Optimality of Adaptive Algorithm

5.1. Optimality of marking strategy. With Theorem 14, we have seen that the Dörfler marking (50) yields a contraction of $\Delta_\ell \simeq \varrho_\ell^2$. In the following, we first observe that the Dörfler marking (50) is not only sufficient but, in some sense, also necessary to obtain contraction of the estimator.
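Proof. We start with the elementary observation that $q \le q_\star$ can be rewritten equivalently as an explicit bound. Using the discrete local reliability (49) and the quasi-Galerkin orthogonality (61), we bound the estimator on $\mathcal{T}_*$, where we have finally used Assumption (66). As in the proof of Proposition 3, we obtain a further local estimate. Moreover, the oscillations satisfy certain identities, see (69). Note that (69) led to the definition of $R_\ell(\mathcal{E}_*)$ given above. Together with the efficiency (46), we may now conclude that $R_\ell(\mathcal{E}_*)$ satisfies the Dörfler marking (50); the constants which arise in this final step led to the definition of $q_\star$.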

5.2. Optimality of newest vertex bisection.
The quasi-optimality analysis for adaptive FEM involves two properties of the mesh-refinement which are, so far, only mathematically guaranteed for newest vertex bisection [5,19,21,30] and for local red-refinement with hanging nodes up to some fixed order [8]. First, it was originally proven in [5] and later improved in [30,21,19] that the sequence of meshes defined inductively by $\mathcal{T}_{\ell+1} := \mathrm{refine}(\mathcal{T}_\ell, \mathcal{M}_\ell)$ with arbitrary $\mathcal{M}_\ell \subseteq \mathcal{E}_\ell$ satisfies

$$\#\mathcal{T}_\ell - \#\mathcal{T}_0 \le C_{nvb} \sum_{j=0}^{\ell-1} \#\mathcal{M}_j \quad \text{for all } \ell \in \mathbb{N} \tag{75}$$

with some constant $C_{nvb} > 0$ which depends only on $\mathcal{T}_0$. This proves that the closure step of newest vertex bisection, which avoids hanging nodes and may lead to bisections of edges $E \in \mathcal{E}_\ell \setminus \mathcal{M}_\ell$, cannot lead to arbitrarily many refinements. For newest vertex bisection, the original analysis of [5] as well as that of the successors [21,30] required that the reference edges of the initial mesh $\mathcal{T}_0$ be chosen such that an interior edge $E = T_+ \cap T_- \in \mathcal{E}^\Omega_0$ is either the reference edge of both elements $T_+, T_- \in \mathcal{T}_0$ or of none. For the particular 2D situation, the recent work [19] removes any assumption on $\mathcal{T}_0$.
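Second, for two meshes $\mathcal{T}' = \mathrm{refine}(\mathcal{T}_0)$ and $\mathcal{T}'' = \mathrm{refine}(\mathcal{T}_0)$ obtained by newest vertex bisection of the initial mesh $\mathcal{T}_0$, there is a unique coarsest common refinement $\mathcal{T}' \oplus \mathcal{T}'' = \mathrm{refine}(\mathcal{T}_0)$ which is a refinement of both $\mathcal{T}'$ and $\mathcal{T}''$. It is shown in [29,13] that $\mathcal{T}' \oplus \mathcal{T}''$ is, in fact, the overlay of these meshes. Moreover, it holds that

$$\#(\mathcal{T}' \oplus \mathcal{T}'') \le \#\mathcal{T}' + \#\mathcal{T}'' - \#\mathcal{T}_0. \tag{76}$$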

5.3. Definition of approximation class.
To state the optimality result, we have to introduce the appropriate approximation class. Let

$$\mathbb{T} := \{\mathcal{T} : \mathcal{T} = \mathrm{refine}(\mathcal{T}_0)\} \tag{77}$$

be the set of all triangulations which can be obtained from $\mathcal{T}_0$ by newest vertex bisection. Moreover, let

$$\mathbb{T}_N := \{\mathcal{T} \in \mathbb{T} : \#\mathcal{T} - \#\mathcal{T}_0 \le N\}$$

be the set of triangulations which have at most $N \in \mathbb{N}$ elements more than the initial mesh $\mathcal{T}_0$. For $s > 0$, the approximation class $\mathbb{A}_s$ has already been defined in (12)-(13). The first step is to prove that, up to constants, nodal interpolation of the boundary data yields the best possible approximation of the exact solution.
where $\mathcal{L}_\ell$ denotes the discrete lifting operator from (26). For $V_\ell \in S^1_D(\mathcal{T}_\ell)$, the Galerkin orthogonality $\langle \nabla(u - U_\ell), \nabla V_\ell \rangle_\Omega = 0$ thus applies. Therefore, the Cauchy-Schwarz inequality provides the Céa-type quasi-optimality. We now plug in suitable extensions $\widehat g, \widehat g_\ell$ of $g, g_\ell$. Since these extensions were arbitrary, we obtain the claimed estimate, where we have used the quasi-optimality of the Scott-Zhang projection, see Section 2.3, and Lemma 1. Adding $\mathrm{osc}_{D,\ell}$ to this estimate, we conclude the proof.
5.4.Quasi-optimality result.Finally, we may formally state the optimality result (14) described in the introduction.
Theorem 18. Suppose that the adaptivity parameter $0 < \theta < 1$ in Algorithm 7 satisfies (65), so that the marking strategy is optimal in the sense of Proposition 15. Let $U_\ell \in S^1(\mathcal{T}_\ell)$ denote the sequence of discrete solutions generated by Algorithm 7. If the given data and the corresponding weak solution of (2) satisfy $(u, f, g, \phi) \in \mathbb{A}_s$, then the convergence estimate (14) holds with some constant $C_{opt} > 0$, i.e. each possible convergence rate $s > 0$ is asymptotically achieved by AFEM. The constant $C_{opt} > 0$ depends only on $\|(u, f, g, \phi)\|_{\mathbb{A}_s}$, the initial mesh $\mathcal{T}_0$, and the adaptivity parameters.
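Proof. Since the proof essentially follows the lines of [29,13], we leave the elaborate details to the reader. For any $\varepsilon > 0$, the definition of the approximation class $\mathbb{A}_s$ guarantees some triangulation $\mathcal{T}_\varepsilon \in \mathbb{T}$ with $\#\mathcal{T}_\varepsilon - \#\mathcal{T}_0 \lesssim \varepsilon^{-1/s}$ whose approximation error and data oscillations are at most $\varepsilon$, where the constant depends only on $\|(u, f, g, \phi)\|_{\mathbb{A}_s}$. We now consider the overlay $\mathcal{T}_* := \mathcal{T}_\varepsilon \oplus \mathcal{T}_\ell$. With the help of Lemma 17 as well as the elementary estimates $\mathrm{osc}_{\mathcal{T},*} \le \mathrm{osc}_{\mathcal{T},\varepsilon}$ and $\mathrm{osc}_{N,*} \le \mathrm{osc}_{N,\varepsilon}$, we see that the error and the data oscillations on $\mathcal{T}_*$ are bounded by $\varepsilon$ up to a constant. Note that Lemma 4 together with reliability and efficiency of $\varrho_*$ yields the same type of bound, where $\mathrm{osc}_{\mathcal{T},*}$ is replaced by $\mathrm{osc}_{E,*}$. Choosing $\varepsilon$ as $\lambda$ times the current error quantity, with $\lambda > 0$ sufficiently small, we enforce the reduction (66) and derive that $R_\ell(\mathcal{E}_*) \subseteq \mathcal{E}_\ell$ satisfies the Dörfler marking criterion, cf. Proposition 15. Minimality of $\mathcal{M}_\ell$ thus gives $\#\mathcal{M}_\ell \le \#R_\ell(\mathcal{E}_*)$. We next note that the estimator together with the data oscillations is equivalent to $\Delta_\ell$, according to reliability and efficiency of $\varrho_\ell$ and the definition of the contraction quantity $\Delta_\ell$ in Theorem 14. Combining the last two observations, we bound $\#\mathcal{M}_\ell$ in terms of $\Delta_\ell^{-1/(2s)}$. By use of the closure estimate (75) of newest vertex bisection, we then bound $\#\mathcal{T}_\ell - \#\mathcal{T}_0$ in terms of $\sum_{j=0}^{\ell-1} \Delta_j^{-1/(2s)}$. Note that the contraction property (64) of $\Delta_j$ implies $\Delta_\ell \le \kappa^{\ell-j}\Delta_j$, whence $\Delta_j^{-1/(2s)} \le \kappa^{(\ell-j)/(2s)}\,\Delta_\ell^{-1/(2s)}$. According to $0 < \kappa < 1$ and the geometric series, this gives

$$\sum_{j=0}^{\ell-1} \Delta_j^{-1/(2s)} \le \Delta_\ell^{-1/(2s)} \sum_{k=1}^{\infty} \kappa^{k/(2s)} = \frac{\kappa^{1/(2s)}}{1 - \kappa^{1/(2s)}}\,\Delta_\ell^{-1/(2s)},$$

which yields the claim.

Remark. All convergence and optimality results in this paper are stated for the edge-based error estimator $\varrho_\ell$. Nevertheless, it is only a notational modification to see that the element-based error estimator $\rho_\ell$ from (5)-(6) also leads to quasi-optimally convergent versions of AFEM. To that end, Algorithm 7 is slightly modified, and one seeks minimal sets of marked elements $\mathcal{M}_\ell \subseteq \mathcal{T}_\ell$ instead. For each marked element $T \in \mathcal{M}_\ell$, we mark its reference edge. The convergence result in Theorem 14 and the optimality result in Theorem 18 hold accordingly.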
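The geometric-series step at the end of the proof of Theorem 18 can be checked numerically: for any sequence with $\Delta_{j+1} \le \kappa\,\Delta_j$, one has $\sum_{j<\ell} \Delta_j^{-1/(2s)} \le \frac{r}{1-r}\,\Delta_\ell^{-1/(2s)}$ with $r = \kappa^{1/(2s)}$. The Python sketch below tests this bound on synthetic, randomly generated contracting sequences (not data from any AFEM run); all names are hypothetical.

```python
import random


def geometric_bound_holds(deltas, kappa, s):
    """Check sum_{j<l} Delta_j^(-1/(2s)) <= r/(1-r) * Delta_l^(-1/(2s)) for all l,
    where r = kappa^(1/(2s)); a small relative slack absorbs rounding."""
    p = 1 / (2 * s)
    r = kappa ** p
    partial = 0.0
    for l in range(1, len(deltas)):
        partial += deltas[l - 1] ** (-p)
        if partial > r / (1 - r) * deltas[l] ** (-p) * (1 + 1e-12):
            return False
    return True


def contracting_sequence(delta0, kappa, steps, seed=0):
    """Synthetic quasi-errors with Delta_{j+1} = kappa * u_j * Delta_j, u_j in [0.5, 1]."""
    rng = random.Random(seed)
    deltas = [delta0]
    for _ in range(steps):
        deltas.append(kappa * rng.uniform(0.5, 1.0) * deltas[-1])
    return deltas
```

Without contraction the bound fails: for the constant sequence `[1.0] * 10` with $\kappa = 0.8$ and $s = 1$, the partial sum reaches $9$, whereas $r/(1-r) \approx 8.47$.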

6. Some Remarks on the 3D Case
So far, we have only considered the 2D model problem (1). In 3D, one additional difficulty is that the regularity assumption $g \in H^1(\Gamma_D)$ is not sufficient to guarantee continuity of $g$. Therefore, one cannot use nodal interpolation to discretize $g \approx g_\ell$ and to define the Dirichlet data oscillations $\mathrm{osc}_{D,\ell}$.
If we do not use nodal interpolation to approximate $g \approx g_\ell$, the estimator reduction estimate (57) becomes (81), where $C_{11} > 0$ additionally depends on $\Omega$. The reason for this is that the analysis provides an additional term $\|g_{\ell+1} - g_\ell\|_{H^{1/2}(\Gamma_D)}^2$ on the right-hand side of (57), since we lose the orthogonality relation (33), which is used in the proof of Lemma 12. Instead, an inverse estimate and the Rellich compactness theorem are used to control this term, which proves (81). Note that this estimate holds for any discretization $g \approx g_\ell \in S^1(\mathcal{E}^D_\ell)$ and even in 3D, where the arclength derivative $(\cdot)'$ is replaced by the surface gradient $\nabla_\Gamma(\cdot)$; we refer to [18] for the inverse estimate.
A possible choice for $g_\ell$ is $g_\ell = \Pi_\ell g$, where $\Pi_\ell: L^2(\Gamma_D) \to S^1(\mathcal{E}^D_\ell)$ is the $L^2$-orthogonal projection [4]. Alternatively, [27] chooses $g_\ell = P_\ell g$ with the Scott-Zhang projection $P_\ell: H^{1/2}(\Gamma_D) \to S^1(\mathcal{E}^D_\ell)$. Note that newest vertex bisection of $\mathcal{T}_\ell$, and hence of $\mathcal{E}^D_\ell$, ensures that $\Pi_\ell$ is a stable projection with respect to the $H^1(\Gamma_D)$-norm [19]. In [20], we prove the approximation estimate (82) for either choice. Moreover, we show that, for $g_\ell = \Pi_\ell g$, the a priori limit $g_\infty := \lim_\ell g_\ell$ exists strongly in $H^\alpha(\Gamma_D)$ for $0 \le \alpha < 1$ and even weakly in $H^1(\Gamma_D)$, provided that the discrete spaces are nested, i.e.

$$S^1(\mathcal{E}^D_\ell) \subseteq S^1(\mathcal{T}_{\ell+1}|_{\Gamma_D}) \quad \text{for all } \ell \in \mathbb{N}_0. \tag{83}$$

Note that this is always satisfied for adaptive mesh-refining algorithms. In particular, we have $\lim_{\ell\to\infty}\|g_\infty - g_\ell\|_{H^{1/2}(\Gamma_D)} = 0$. In the following, we even aim to prove that nestedness (83) implies the existence of the a priori limit $\lim_\ell U_\ell$ in $H^1(\Omega)$. To that end, we need the following lemma.
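For illustration, the $L^2$-projection $\Pi_\ell$ onto $S^1(\mathcal{E}^D_\ell)$ amounts to solving a tridiagonal mass-matrix system. The following Python sketch assumes that $\Gamma_D$ is a single arc parametrized by arclength over an interval, so that $S^1(\mathcal{E}^D_\ell)$ consists of continuous piecewise affine functions on a partition of that interval; the element mass matrices are exact, while $\langle g, \varphi_j \rangle$ is approximated by three-point Gauss quadrature. All names are hypothetical; this is not the implementation of [4].

```python
import math


def gauss3_points(a, b):
    """Three-point Gauss-Legendre points and weights on [a, b] (exact up to degree 5)."""
    x = math.sqrt(3 / 5)
    mid, half = (a + b) / 2, (b - a) / 2
    return [(mid - half * x, half * 5 / 9), (mid, half * 8 / 9), (mid + half * x, half * 5 / 9)]


def local_hats(s, a, b):
    """Values of the two P1 shape functions on [a, b] at the point s."""
    t = (s - a) / (b - a)
    return 1 - t, t


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm without pivoting (adequate for the SPD P1 mass matrix)."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / m
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def l2_projection_p1(g, nodes):
    """Nodal values of the L2-orthogonal projection of g onto continuous
    piecewise affine functions on the partition `nodes` of an interval."""
    n = len(nodes)
    lower, diag, upper, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n - 1):
        a, b = nodes[k], nodes[k + 1]
        h = b - a
        # Exact element mass matrix: h/3 on the diagonal, h/6 off the diagonal.
        diag[k] += h / 3
        diag[k + 1] += h / 3
        upper[k] += h / 6
        lower[k + 1] += h / 6
        for s, w in gauss3_points(a, b):
            phi_a, phi_b = local_hats(s, a, b)
            rhs[k] += w * g(s) * phi_a
            rhs[k + 1] += w * g(s) * phi_b
    return solve_tridiagonal(lower, diag, upper, rhs)


def squared_l2_error(g, coeffs, nodes):
    """Gauss-quadrature approximation of ||g - V||_{L2}^2 for the P1 function V."""
    total = 0.0
    for k in range(len(nodes) - 1):
        a, b = nodes[k], nodes[k + 1]
        for s, w in gauss3_points(a, b):
            phi_a, phi_b = local_hats(s, a, b)
            total += w * (g(s) - coeffs[k] * phi_a - coeffs[k + 1] * phi_b) ** 2
    return total
```

Affine data are reproduced exactly, and since the projection minimizes the $L^2$-error over $S^1(\mathcal{E}^D_\ell)$, for $g = \exp$ on a graded partition of $[0, 1]$ its error is smaller than that of the nodal interpoland.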
Lemma 19 (a priori convergence of Scott-Zhang projection). We recall the Scott-Zhang projection $P_\ell$ onto $\mathcal{S}^1(\mathcal{T}_\ell)$ and make the additional assumption that the edges $E_z$ are chosen appropriately, i.e. whenever $\omega_{\ell,z} \subseteq \mathcal{T}_\ell \cap \mathcal{T}_{\ell+1}$, we ensure that the same edge $E_z$ is chosen for both operators $P_\ell$ and $P_{\ell+1}$. Then, the Scott-Zhang interpolands $v_\ell := P_\ell v \in \mathcal{S}^1(\mathcal{T}_\ell)$ of arbitrary $v \in H^1(\Omega)$ converge to some a priori limit in $H^1(\Omega)$, i.e. there exists $v_\infty \in H^1(\Omega)$ with $\lim_{\ell\to\infty}\|v_\infty - v_\ell\|_{H^1(\Omega)} = 0$.

Proof. We follow the ideas from [25] and define suitable subsets of $\Omega$, among them $\Omega^0_\ell$ and $\Omega^*_\ell$. Let $\varepsilon > 0$ be arbitrary. Since $H^2(\Omega)$ is dense in $H^1(\Omega)$, we find $v_\varepsilon \in H^2(\Omega)$ such that $\|v - v_\varepsilon\|_{H^1(\Omega)} \le \varepsilon$. Due to local approximation and stability properties of $P_\ell$, we obtain the estimate (85), cf. [28]. By use of (85), we may choose $\ell_0 \in \mathbb{N}$ sufficiently large to guarantee (86). There holds $\lim_{\ell\to\infty}|\Omega^*_\ell| = 0$, cf. [25, Proposition 4.2], and this provides the existence of some $\ell_1 \in \mathbb{N}$ such that (87) holds, due to the non-concentration of Lebesgue integrable functions, i.e. the absolute continuity of the Lebesgue integral. With these preparations, we finally prove that $(P_\ell v)$ is a Cauchy sequence in $H^1(\Omega)$. To that end, let $\ell \ge \max\{\ell_0, \ell_1\}$ and $k \ge 0$ be arbitrary. First, we use that, for any $T \in \mathcal{T}_\ell$, $(P_\ell v)|_T$ depends only on $v|_{\omega_\ell(T)}$. Then, by definition of $\Omega^0_\ell$ and our assumption on the definition of $P_\ell$ and $P_{\ell+k}$ on $\mathcal{T}_\ell \cap \mathcal{T}_{\ell+k}$, we obtain the estimate (88). Second, due to the local stability of $P_\ell$ and (87), there holds (89). Third, we exploit (86), which yields (90). Combining the estimates (88)-(90), we conclude $\|P_\ell v - P_{\ell+k} v\|_{H^1(\Omega)} \lesssim \varepsilon$, i.e. $(P_\ell v)$ is a Cauchy sequence in $H^1(\Omega)$ and hence convergent.

Now, we are able to prove a priori convergence of $U_\ell$ towards some a priori limit $u_\infty$.
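The locality that drives Lemma 19 can be illustrated in 1D, which is also the setting of $g_\ell = P_\ell g$ on a polygonal Dirichlet arc in 2D. The following Python sketch uses a Scott-Zhang-type variant that averages over an element $\sigma_z \ni z$ with the $L^2$-dual basis function of the hat function of $z$. The rule "take the element to the right of $z$" is our own illustrative assumption: it depends only on the patch of $z$, so a node whose patch is unchanged keeps the same averaging set, mirroring the assumption on $E_z$ in Lemma 19. The function name is hypothetical.

```python
import numpy as np


def scott_zhang_p1(nodes: np.ndarray, v, n_quad: int = 4) -> np.ndarray:
    """Nodal values of a 1D Scott-Zhang-type projection onto continuous P1.

    For each node z, sigma_z is the element to the right of z (to the left for
    the last node), and (P v)(z) = int_{sigma_z} psi_z * v, where psi_z is the
    L2-dual basis function of the hat function of z on sigma_z.
    """
    xq, wq = np.polynomial.legendre.leggauss(n_quad)
    values = np.empty(len(nodes))
    for i in range(len(nodes)):
        if i < len(nodes) - 1:
            a, c, z_is_left = nodes[i], nodes[i + 1], True
        else:
            a, c, z_is_left = nodes[i - 1], nodes[i], False
        h = c - a
        x = a + (xq + 1.0) * h / 2.0
        w = wq * h / 2.0
        phi_z = (c - x) / h if z_is_left else (x - a) / h
        phi_other = 1.0 - phi_z
        # Dual basis on sigma_z: int psi_z * phi_y = 1 if y == z, else 0.
        psi_z = (2.0 / h) * (2.0 * phi_z - phi_other)
        values[i] = np.sum(w * psi_z * v(x))
    return values
```

Refining only the last element of the mesh `[0, 0.25, 0.5, 0.75, 1]` leaves the values at the first three nodes unchanged, while the value at the node 0.75, whose patch changed, differs in general. This is the 1D counterpart of the locality used to derive (88).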
Remark. Note that Proposition 20 also holds if the Scott-Zhang projection is used to discretize $g \approx g_\ell = P_\ell g$. This immediately follows from Lemma 19, which provides the a priori convergence of the Scott-Zhang interpolands.

Theorem 21. Suppose that either the $L^2$-projection $g_\ell = \Pi_\ell g$ or the Scott-Zhang operator $g_\ell = P_\ell g$ is used to discretize the Dirichlet data $g \in H^1(\Gamma_D)$. Then, Algorithm 7 guarantees $\lim_\ell \|u - U_\ell\|_{H^1(\Omega)} = 0$ for both 2D and 3D.
Proof. With Proposition 20 and the estimator reduction (81), we obtain an estimator reduction estimate whose perturbation terms tend to zero as $\ell \to \infty$. From this and elementary calculus, we deduce estimator convergence $\lim_\ell \varrho_\ell = 0$, cf. [1] for the concept of estimator reduction. By reliability of $\varrho_\ell$, this yields convergence of the adaptive algorithm.
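The "elementary calculus" step can be made explicit. This sketch assumes that the omitted display has the perturbed contraction form $\varrho_{\ell+1}^2 \le q\,\varrho_\ell^2 + \alpha_\ell$ with some $0 < q < 1$ and perturbations $\alpha_\ell \ge 0$ satisfying $\alpha_\ell \to 0$; in the situation above, $\alpha_\ell$ would collect the terms of (81) that vanish by Proposition 20. This is our reading of the estimator reduction framework of [1], not a verbatim reproduction.

```latex
\begin{align*}
&\text{Boundedness (induction):}\quad
 \varrho_\ell^2 \le q^\ell \varrho_0^2 + \sum_{j=0}^{\ell-1} q^{\ell-1-j}\alpha_j
 \le \varrho_0^2 + \frac{\sup_j \alpha_j}{1-q} < \infty, \\
&\text{Limit superior:}\quad
 L := \limsup_{\ell\to\infty} \varrho_\ell^2
 = \limsup_{\ell\to\infty} \varrho_{\ell+1}^2
 \le q\,L + \lim_{\ell\to\infty} \alpha_\ell = q\,L, \\
&\text{hence}\quad (1-q)\,L \le 0
 \;\Longrightarrow\; L = 0
 \;\Longrightarrow\; \lim_{\ell\to\infty} \varrho_\ell = 0 .
\end{align*}
```

Boundedness is needed first so that $L$ is finite; only then does $L \le qL$ force $L = 0$.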
Note, however, that this convergence result is much weaker than the contraction result of Theorem 14. With the techniques of the present paper, it is unclear how to prove a contraction result if the additional orthogonality relation (33) fails to hold.

In this example, $f = -\Delta u \equiv 0$, and the solution $u$ as well as its Dirichlet data $g = u|_{\Gamma_D}$ exhibit a generic singularity at the reentrant corner $r = 0$.
Figure 3 shows a comparison between uniform and adaptive mesh refinement. For the algorithm based on the modified Dörfler marking, we use $\theta := \vartheta = \theta_1 = \theta_2$. For both adaptive algorithms, we then vary the adaptivity parameter $\theta$ between 0.2 and 0.8. We observe that both adaptive algorithms lead to the optimal convergence rate $\mathcal{O}(N^{-1/2})$ for all choices of $\theta$, whereas uniform refinement leads only to a suboptimal convergence behaviour of approximately $\mathcal{O}(N^{-2/7})$.
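Observed rates such as $\mathcal{O}(N^{-1/2})$ or $\mathcal{O}(N^{-2/7})$ correspond to slopes in a log-log plot of the error quantity against $N$. A minimal Python sketch for extracting them from computed data follows; the function names are hypothetical, $N$ stands for the number of elements, and the numbers in the usage note are synthetic, not results of this paper.

```python
import numpy as np


def empirical_rate(n_elements, errors) -> float:
    """Least-squares slope of log(error) versus log(N); a rate O(N^{-s}) gives -s."""
    log_n = np.log(np.asarray(n_elements, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_e, 1)
    return float(slope)


def successive_rates(n_elements, errors) -> np.ndarray:
    """Rates between consecutive levels: log(e_{l+1}/e_l) / log(N_{l+1}/N_l)."""
    n = np.asarray(n_elements, dtype=float)
    e = np.asarray(errors, dtype=float)
    return np.log(e[1:] / e[:-1]) / np.log(n[1:] / n[:-1])
```

For instance, `empirical_rate([100, 400, 1600], [0.2, 0.1, 0.05])` returns $-0.5$ up to rounding. The global fit smooths out pre-asymptotic levels, whereas the successive rates reveal when the asymptotic regime is reached.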
Note that $f \equiv 0$ implies $\mathrm{osc}_{\mathcal{E},\ell} \equiv 0$ in this example. In Figure 4, we compare the jump terms $\eta_{\Omega,\ell}$, the Dirichlet data oscillations $\mathrm{osc}_{D,\ell}$, and the Neumann jump terms $\eta_{N,\ell}$ for uniform and adaptive refinement. Due to the corner singularity at $r = 0$, uniform refinement leads to suboptimal convergence behaviour for $\eta_{\Omega,\ell}$ and even for $\mathrm{osc}_{D,\ell}$ and $\eta_{N,\ell}$, i.e. all contributions of $\varrho_\ell^2 = \eta_{\Omega,\ell}^2 + \eta_{N,\ell}^2 + \mathrm{osc}_{D,\ell}^2$ show the same poor convergence rate of approximately $\mathcal{O}(N^{-2/7})$. For adaptive mesh refinement, we observe that the optimal order of convergence is retained, namely $\varrho_\ell \simeq \eta_\ell = \mathcal{O}(N^{-1/2})$. Moreover, we even observe the optimal convergence behaviour $\mathrm{osc}_{D,\ell} \simeq \eta_{N,\ell} = \mathcal{O}(N^{-3/4})$ for the boundary contributions of $\varrho_\ell$.
Finally, Figure 2 shows the initial mesh $\mathcal{T}_0$ and the adaptively generated mesh $\mathcal{T}_9$ with $N = 10966$ elements. As expected, adaptive refinement is essentially concentrated around the reentrant corner $r = 0$.
There holds $g \in H^1(\Gamma_D)$, $\varphi \in L^2(\Gamma_N)$, and $f \in L^2(\Omega)$. Note that the Dirichlet data $g$ have a singularity at the reentrant corner $(0,0)$, whereas the volume force $f$ is singular along the circle around $(0,0)$ with radius $r = 1$. Again, we compare the standard Dörfler marking strategy as well as the modified Dörfler marking with uniform refinement.

Figure 6 shows a comparison between uniform and adaptive mesh refinement. The parameters $\theta = \vartheta = \theta_1 = \theta_2$ are varied between 0.2 and 0.8. Both adaptive algorithms lead to the optimal convergence rate $\mathcal{O}(N^{-1/2})$ for all choices of $\theta$, whereas uniform refinement leads only to the suboptimal rate $\mathcal{O}(N^{-1/3})$.

In Figure 7, we compare the estimator contributions, which (in contrast to the previous example) include additional volume oscillations $\mathrm{osc}_{\mathcal{E},\ell}$. Due to the data singularities, as well as the singularity introduced by the change of boundary conditions, uniform refinement leads only to suboptimal convergence rates for all estimator contributions. For adaptive mesh refinement, we observe that the optimal order of convergence is retained, i.e. $\varrho_\ell \simeq \eta_\ell = \mathcal{O}(N^{-1/2})$, and even the boundary contributions of $\varrho_\ell$ show the optimal convergence behaviour $\mathrm{osc}_{D,\ell} \simeq \eta_{N,\ell} = \mathcal{O}(N^{-3/4})$. In Figure 5, one observes the adaptive refinement towards the singularity at the reentrant corner, towards the circular singularity of $f$, and towards the singularities that stem from the change of boundary conditions.

Figure 2. Z-shaped domain with initial mesh $\mathcal{T}_0$ and adaptively generated mesh $\mathcal{T}_9$ with $N = 10966$ elements for $\theta = 0.5$ in Algorithm 7. The Dirichlet boundary $\Gamma_D$ is marked with a solid line, whereas the dashed line denotes the Neumann boundary $\Gamma \setminus \Gamma_D$.

Figure 5. L-shaped domain with initial mesh $\mathcal{T}_0$ and adaptively generated mesh $\mathcal{T}_9$ with $N = 12177$ elements for $\theta = 0.5$ in Algorithm 7. The Dirichlet boundary $\Gamma_D$ is marked with a solid line, whereas the dashed line denotes the Neumann boundary $\Gamma \setminus \Gamma_D$.