Global convergence of Newton's method for the regularized $p$-Stokes equations

The motion of glaciers can be simulated with the $p$-Stokes equations. Up to now, Newton's method to solve these equations has been analyzed in finite-dimensional settings only. We analyze the problem in infinite dimensions to gain a new viewpoint. We do that by proving global convergence of the infinite-dimensional Newton's method with Armijo step sizes to the solution of these equations. We only have to add an arbitrarily small diffusion term for this convergence result. We prove that the additional diffusion term only causes minor differences in the solution compared to the original $p$-Stokes equations under the assumption of some regularity. Finally, we test our algorithms on two experiments: A reformulation of the experiment ISMIP-HOM $B$ without sliding and a block with sliding. For the former, the approximation of exact step sizes for the Picard iteration and exact step sizes and Armijo step sizes for Newton's method are superior in the experiment compared to the Picard iteration. For the latter experiment, Newton's method with Armijo step sizes needs many iterations until it converges fast to the solution. Thus, Newton's method with approximately exact step sizes is better than Armijo step sizes in this experiment.


Introduction
Ice-sheet models spend most of the computation time solving the momentum equations, [1].These equations are nonlinear partial differential equations named p-Stokes equations.
We prove that Newton's method with Armijo step sizes converges to the solution of the p-Stokes equations if we add a small diffusion term.We control the step size with a convex functional, which is the anti-derivative of the p-Stokes equations.Evaluating this functional has nearly no computational cost.Moreover, the convexity of the functional allows us to approximate the exact step sizes.We slightly modify the numerical experiment ISMIP-HOM B to test Newton's method with Armijo step sizes.Moreover, we test this algorithm with a sliding block.We test both experiments with the Picard iteration and Newton's method with approximations of exact step sizes.
The small diffusion term is necessary to imply Gâteaux differentiability of the p-Stokes equations in the variational formulation.Furthermore, the shear-thinning viscosity term has a negative exponent.Thus, we need for differentiability a positive constant in this term.To conclude the theory, we show that the regularized solution converges to the solution of the p-Stokes equations under a slight regularity assumption.
Regarding the well-posedness of the equations, there are different types of literature results: nonlinear friction boundary conditions [2], an implicitly given viscosity [3] with a differentiable shear-thinning term, or a differentiable shear-thinning term with Dirichlet boundary conditions, [4].One recent publication with more general boundary conditions than we consider in two dimensions uses Newton's method in a finite-dimensional setting, see [5].However, we need a combination of a differentiable, explicitly given shear-thinning viscosity, and nonlinear friction boundary conditions.Glaciologists use such a formulation to simulate glaciers, [6].Gâteaux differentiability results are mainly devoted to optimal control and the necessary control-to-state mapping, see for example, [7] for the p-Laplace equations or [8] for the Navier-Stokes equations with a nonlinear p-Stokes term.In [8], the Gâteaux differentiability is shown for p ≥ 2. The idea of how to prove differentiability for 1 < p < 2 is briefly mentioned in [9, section 6].That paper uses the additional diffusion term for optimal control.It proves convergence in Sobolev spaces for vanishing regularization of the diffusion term.However, that paper has different requirements: It considers optimal control in a slightly different formulation, has Dirichlet boundary conditions, and has a convective term.It has more restrictions on the exponent of the shear-thinning fluids than we have.In [3], local quadratic convergence of Newton's method is shown for the finite-dimensional case.Glaciologists already consider Newton's method, [10], but we consider a different approach by adding a small diffusive term and using a convex functional for step size control.There is also a multigrid approach with Newton's method and a reformulation of the partial differential equation with first derivatives available, [11].
The paper is structured as follows: In section 2, we derive the variational formulation, project to divergence-free spaces, introduce a minimization problem that is equivalent to solving the full-Stokes equations, and verify existence and uniqueness of a solution.In section 3, we consider Gâteaux differentiability.In section 4, we verify that Newton steps can be calculated.In section 5, we use the functional to verify global convergence.Additionally, we prove that the solution of the regularized p-Stokes equations converges to the solution of the p-Stokes equations for vanishing regularization under some regularity assumptions.In section 6, we consider one experiment without sliding and one with.We summarize our results in the final section 7.

The p-Stokes equations
In this section, we formulate the p-Stokes equations in both the classical and the variational formulations.The p-Stokes equations can be used to, e.g., model the motion of glaciers.Their complexity results from nonlinear viscosity, also described as shearthinning, [6].Mass conservation and incompressibility lead to a divergence-free velocity v. Let N ∈ {2, 3}, Ω ⊆ R N be a Lipschitz domain.Let σ be the stress tensor, v the velocity, ρ the density, and g the gravitational acceleration.The p-Stokes equations are: To relate the stress tensor σ, the velocity v, and the pressure π, we introduce the identity matrix and the following definition: Let p ∈ (1, 2).The stress tensor σ is given by The case p = 2 reduces to the Stokes problem.In glaciological applications p = 4/3, see [6, section 1.4.3], or in more recent approaches p = 5/4, see [12,Abstract], is used.The following result states information about the integrability of S p : Proof.With the dual exponent p ′ = p/(p − 1) and p ∈ (1, 2) follows The Dirichlet boundary condition v = 0 is satisfied for those parts of the glacier frozen to the ground Γ d .The interaction of the glacier with the air is given by σ • n = 0 on Γ a with the matrix-vector multiplication of the matrix σ and the unit normal vector n.We assume nonlinear sliding at parts of the bedrock Γ b that are not frozen to the ground.This sliding is represented by tangential sliding.For these parts of the bedrock, we assume that the normal component of the velocity is 0 because we neglect the melting of ice or freezing of water at the bedrock.We remind of the definition of the tangential components We formulate the boundary conditions as follows: For the choice of the boundary conditions see also [2].The boundary conditions imply as a natural choice of the space Definition 2.3 (Function space).Let |Γ d | > 0. We define for all p > 1 with norm dx.
The Poincaré inequality implies that the W p -norm is equivalent to the W 1,p (Ω) N -norm.For example, the Poincaré inequality was proved in [14, Theorem 1.5] for p = 2.But the proof is identical for p ∈ (1, 2).
2.1.Variational formulation.We derive the variational formulation in the space W p .We define the double scalar product and obtain the following equation for σ, τ ∈ R N ×N immediately: Let p ′ be the dual exponent, ϕ ∈ W p ′ , σ ∈ W 1,p (Ω) N ×N .We conclude by multiplication with the test function ϕ and partial integration for the left-hand side of equation ( 1) Let v ∈ W p , π ∈ L p ′ (Ω).We use σ = −πI + BS p (Dv) and I : ∇ϕ = divϕ to obtain for the first summand on the right-hand side of equation ( 11) The second summand on the right-hand side of equation (11) vanishes on Γ d because the test function ϕ ∈ W p ′ vanishes on Γ d , see the definition of W p ′ in Definition 2.3.Moreover, σ • n = 0 is valid on Γ a , see equation (6).Thus, the integral over this domain disappears, too.It follows The relation between tangential and normal components, see (4), implies We split the boundary integral on Γ b into normal and tangential components and use the boundary conditions on Γ b , see equation (7), and the definition of ϕ ∈ W p ′ , namely Due to equation (4) and equation ( 8), we have v = v t on Γ b , which yields Then a weak solution of the problem is given by (v, π) with the operator A : The operator A is well-defined due to Lemma 2.2.
Definition 2.4 (Divergence free space).Let |Γ d | > 0. We define for all p > 1 the divergence-free velocity space with W p as in Definition (2.3).

Now, we examine surjectivity of the restricted operator
By solving the divergence-free problem, we solve the original problem: The inf-sup condition is proved by [15,Corollary 3.2] for Dirichlet boundary conditions: The inf-sup condition is also fulfilled for W p ⊇ W 1,p 0 (Ω) N .Thus, we can apply [16,Theorem IV.1.4].This yields for each solution in the divergence-free formulation a unique solution for the original problem.This theorem is stated in Hilbert spaces, but the proof is identical for operators on Banach spaces with an existing dual operator.
2.2.Regularization of the equation and the divergence-free space.In this subsection, we introduce a regularization and the divergence-free space.We add a small diffusion term that allows us to obtain a solution in V 2 .We need solutions in V 2 and not only in V p to obtain Gâteaux differentiability in section 3. We define for µ 0 > 0 the operator We define a solution of the p-Stokes equations: is called weak solution of the p-Stokes equations.
For vanishing µ 0 and δ, the weak solution of the p-Stokes equations converges to the solution of µ 0 = 0 and δ = 0 under slight regularity assumptions, see section 5.3.

Equivalent minimization problem.
In this subsection, we introduce a minimization problem, which is equivalent to finding a solution to the regularized p-Stokes equations.In [4], the convex functional was introduced for µ 0 = 0 and Dirichlet boundary conditions.In [2], it was introduced for µ 0 = 0 and δ = 0 for more general boundary conditions.We use those formulations for our convex functional with µ 0 , δ ∈ [0, ∞).
For δ > 0, the Gâteaux differentiability of the first and fourth summand are discussed in [4] on W 1,p 0 (Ω) N .Because the boundary conditions do not influence this integral, the Gâteaux differentiability is also clear on V 2 .The second summand can be handled identically to the first summand.Also the term v → µ 0 (∇v, ∇v) is Gâteaux differentiable.
Thus, J µ 0 ,δ is Gâteaux differentiable.Because A and (v, w) → (ρg, w) are continuous, J µ 0 ,δ is Fréchet differentiable.□ 2.4.Existence and uniqueness of weak solutions in the divergence-free space.Differentiability and uniqueness were proven for δ = 0 = µ 0 in [2] by using the strictly convex functionals as in Definition 2.6.For Dirichlet zero boundary conditions existence and uniqueness of solutions were stated without detailed proof via the Browder-Minty Theorem, e.g., in [17].For completeness, we prove existence and uniqueness of a solution with the Browder-Minty Theorem as our boundary conditions, see Equations ( 5) to ( 8) are more complicated than the standard Dirichlet boundary conditions and with δ > 0 and µ 0 > 0 different to the problem in [2].Thus, we prove a well-known result for a slightly different formulation.
Before we can state the existence and uniqueness result, we have to introduce the following definitions: The conditions for existence and uniqueness are formulated in the following theorem.
Theorem 2.9 (Browder-Minty Theorem).Let X be a reflexive separable Banach space and A : X → X * a strictly monotone, coercive, and continuous operator.Let f ∈ X * .Then, there exists a unique solution u ∈ X for the equation Proof.The existence of u ∈ X is proved in [18] Theorem 2 in a more general version.Uniqueness follows immediately with the strict monotonicity of A.

□
We analyze S p and S s to verify that A is strictly monotone, coercive, and continuous.For that purpose, we need the following result: Proof.See [19,Lemma 6.3].□ We calculate the derivative of S r , r ∈ (1, 2) and verify the properties stated above to apply the Browder-Minty Theorem 2.9: Lemma 2.11.Let r ∈ (1, 2).The function S r kl is continuously differentiable.Furthermore, the inequalities (18) and ( 19) are fulfilled for S r .
Proof.We verify the conditions in Lemma 2.10.Let P ∈ R N ×N , i, j ∈ {1, ..., N }.We calculate Let k, ℓ ∈ {1, .., N }.We infer We can verify inequality (19) with c1 ∈ R now: We verify inequality (18 (r − 2) We conclude with r − 2 ≤ 0, (P : We set P := I ij p and Q := I ij q with p, q ∈ R N for the vector-valued situation.□ We verify the three properties for the Browder-Minty Theorem, see 2.9 in the three following lemmata.We verify Lipschitz continuity of A instead of continuity: We obtain for the second summand of the operator A with Lemma 2.11 and C ∈ R We infer with the trace operator, [20, Theorem 1.12], and The same arguments are valid for the first summand of the operator A with Ω instead of Γ b , B instead of τ , and s = p.Thus, we obtain Lipschitz continuity with C 2 ∈ R and sup . The operator A is coercive.
Proof.We know Trivially, we have . The operator A is strictly monotone.
Proof.The strict monotonicity of S p and S s , proved in Lemma 2.11, yield monotonicity of The operator A is strict monotone because Hence, we can prove our existence result: Theorem 2.15.Let |Γ d | > 0, δ > 0, µ 0 > 0, and p ∈ (1, 2).There exists exactly one weak solution of the divergence-free regularized p-Stokes equations, see Definition (2.5).
Proof.(In [4] the case Γ d = ∂Ω is analyzed.)Lemma 2.12 yields Lipschitz continuity of the operator A and thus continuity.We verified the coercivity of A in Lemma 2.13.The operator A is also strictly monotone, see Lemma 2.14.We conclude the existence and uniqueness of a solution with the Browder-Minty Theorem, see Theorem 2.9.□

Differentiability
We calculate the Gâteaux derivative of the operator A to apply Newton's method.At first, we only consider the first summand of the operator A.
The operator G is Gâteaux differentiable: The directional derivative has the form Furthermore, the operator G : V 2 → V * 2 is Gâteaux differentiable.Proof.First, we consider the first summand on the right-hand side of equation (21).
The o-notation is the limit to zero in the following calculations.We prove pointwise convergence.For all i, j ∈ {1, ..., N }, we calculate the Taylor expansion of S p in Dv with the continuous derivative, see equation (20).This yields almost everywhere This implies for all i, j ∈ {1, ..., N } almost everywhere.We obtained the pointwise convergence.To apply the dominated convergence theorem, we calculate a majorant.Using the Lipschitz continuity of S p , see Lemma 2.11, and C ∈ R, we conclude The majorant is integrable as The directional derivative for the second summand on the right-hand side of equation ( 21) and the boundedness follow in the same ways if we use as the integration area Γ b instead of Ω and V, Φ with V ij := I ij v j and Φ ij := I ij ϕ j instead of Dv and Dϕ.With this notation, we relate the vector-valued expressions to the matrix-valued expressions.The last summand in equation (22) follows trivially due to the linearity of v → µ 0 (∇v, ∇ϕ), ϕ ∈ V 2 .
Thus, w → G ′ (v; w) is a bounded linear operator.Hence, G is Gâteaux differentiable.□ The operator G has only a directional derivative on V 2 not on V p : The second summand of G ′ (v; w) in Theorem 3.2 is not welldefined for all w, ϕ ∈ V p : Set B ≡ 1, v ≡ 0. Then we have for all w, ϕ ∈ Thus, the integral is not defined for suitable w, ϕ ∈ V p .

Infinite-dimensional Newton's Method
In this section, we state Newton's method in infinite dimensions and prove that we can calculate the Newton iterations.To our knowledge, this result is new as the Gâteaux derivative exists only in all directions for µ 0 > 0 and the combination of Newton's method and µ 0 > 0 was not considered before.Newton's method is: Choose and set v k+1 := v k + w k .
The problem to calculate a Newton step, see equation (24), is linear because we handle the linear problem (w, ϕ) → ⟨G ′ (v)w, ϕ⟩ V * 2 ,V 2 for all w, ϕ ∈ V 2 .Hence, we can apply Lax-Milgram's Lemma.Lemma 4.1.Let µ 0 > 0, δ > 0, and v ∈ V 2 .There exists exactly one solution w = w(v) ∈ V 2 such that equation (24) is fulfilled.Moreover, we have Proof.Let v ∈ V 2 .We show the continuity of (w, ϕ) → ⟨G ′ (v)w, ϕ⟩ V * 2 ,V 2 for all w, ϕ ∈ V 2 .Since the arguments in the calculation of relation ( 23) are also valid for the boundary term, we find C ∈ R with the trace operator, see [20, Theorem 1.12], such that We show the coercivity next.Let w ∈ V 2 .The last summand of the directional derivative of G ′ , see equation (22), is just We use Dw : ∇v = Dw : Dv for arbitrary v, w ∈ V 2 , see equation (10), to conclude The same arguments for the boundary term yield Hence, the conditions for applying Lax-Milgram's Lemma are validated, and a unique solution for each Newton step, see equation (24), exists.The coercivity constant is µ 0 .This implies the uniform bound

Globalized Newton method
In this section, we prove global convergence of Newton's method with a step size control.To our knowledge, an analysis of Newton's method with a step size control was only performed in finite dimensions; for details of this analysis, see, e.g., [4].Note that our minimization functional J µ 0 ,δ is not two times continuously differentiable as the second derivative is only a Gâteaux derivative.From the applied perspective, the usage of approximations of exact step sizes, see Definition 5.3, could be interesting as they perform well in our numerical examples, see chapter 6.
To prove the convergence, we use the convex functional J µ 0 ,δ introduced in section 2.3 for a step size control.We verify the equivalence of minimizing the convex functional and finding a weak solution of the p-Stokes equations: Lemma 5.1.Let µ 0 , δ ∈ [0, ∞).Then, we have with r = 2 for µ 0 > 0 and r = p for µ 0 = 0.
Because the necessary first-order optimality condition is sufficient for strict convex functions, the claim follows.□

5.1.
Step size controls.In this subsection, we remind of step size controls.
Step size controls guarantee convergence under certain conditions by setting The choice of α k should be such that the new functional value J µ 0 ,δ (v k + α k w k ) reduces compared to the old functional value J µ 0 ,δ (v k ) and a small reduction already implies convergence, see Lemma 5.4 condition c) for the mathematical details.A commonly used step size control that guarantees these properties under some conditions on J µ 0 ,δ is the Armijo step size, see for example [20, section 2.2.1.1]: Definition 5.2 (Armijo step sizes).Let J µ 0 ,δ : V 2 → R be continuously differentiable, γ ∈ (0, 1), v k ∈ V 2 the point, and w k ∈ V 2 the direction.Determine the biggest α k ∈ {1, 1/2, 1/2 2 , ...} such that For convex functions, we can try to find α k that minimizes J µ 0 ,δ (v k + α k w k ): In our case, we can only approximately solve this problem, which we discuss in the numerical experiment.

Global convergence.
In this subsection, we verify global convergence.We obtain global convergence by employing a step size control.The step sizes are calculated by reducing the functional J µ 0 ,δ , see Definition 2.6, in each iteration.We verify the conditions for the following convergence result: Lemma 5.4.Let J µ 0 ,δ : V 2 → R be continuously Fréchet differentiable, bounded from below, v 0 ∈ V 2 , v k+1 := v k + α k w k with the direction w k ∈ V 2 , and the step sizes α k ∈ (0, ∞).We additionally assume that we have a) descent directions: Then, we have [20].□ The first two statements are easy to verify: Lemma 5.5.Let µ 0 > 0, δ > 0. Newton steps fulfill the angle condition for the functional J µ 0 ,δ , see Definition 2.6, and are descent directions.
Proof.We calculated in Lemma 4.1 the coercivity constant µ 0 and the continuity constant C independent of k such that In [21,Theorem 1.2], the proof of the angle condition is done with η ≥ µ 0 /C for the finite-dimensional case.The proof for the infinite-dimensional case is identical.The angle condition implies that the Newton steps are descent directions.□ To verify large enough step sizes, we need the following lemma: Lemma 5.6.Let J ′ µ 0 ,δ be uniformly continuous.Let the step sizes (α k ) k be Armijo step sizes and let the direction w k fulfill with some φ : [0, ∞) → [0, ∞) monotonically increasing and satisfying φ(t) > 0 for all t > 0. Then the step sizes (α k ) k are admissible.
Proof.We want to apply Lemma 5.6.We verify the conditions.The function J ′ µ 0 ,δ is Lipschitz continuous, because the operator A is Lipschitz continuous, see Lemma 2.12, and (v, w) → µ 0 (v, w) is trivially Lipschitz continuous.Therefore, J ′ µ 0 ,δ is uniformly continuous.We already proved that the directions w k are descent directions.It remains to show that the descent directions are not too short.We know with the Newton steps and the right inequality in relation ( 27) Therefore, the step size is bounded from below by the monotonically increasing function Proof.We already explained why the step sizes are admissible, see Lemma 5.7.Moreover, the directions w k are descent directions.Furthermore, we validated the angle condition.
Trivially, the functional J µ 0 ,δ is bounded from below.Thus, all conditions for Lemma 5.4 are fulfilled.Hence, we conclude Convergence to the non regularized problem.In this subsection, we argue that the solution v = v µ 0 ,δ ∈ V 2 converges to v 0,0 ∈ V 2 .We only have to assume v 0,0 ∈ V 2 .
Theorem 5.9 (Convergence for smooth solutions).Let |Γ d | > 0 and v µ 0 ,δ be the solution of G µ 0 ,δ (v) = 0 with the variables µ 0 , δ ∈ (0, ∞) or (µ 0 , δ) = (0, 0) as in the definition of the operator G. Assume v 0,0 ∈ V 2 .Then there exists c ∈ R with Proof.We follow the idea in [4, Theorem 4.1].We set ṽ := v 0,0 −v µ 0 ,δ to shorten the notation.We use in the upcoming calculation the monotonicity of S p and S s , J ′ µ 0 ,δ (v µ 0 ,δ ) = 0, and the main theorem of calculus: In the last step, we inserted ṽ.We use to bound the functional J µ 0 ,δ by J µ 0 ,0 for all v ∈ V 2 with Hence, we obtain with the minimizer v 0,0 of J 0,0 and the definitions of J 0,0 and J µ 0 ,δ Thereby, we obtain with c ∈ R In this section, we describe two numerical experiments: ISMIP-HOM B formulated in [22] and a sliding block.We test these experiments with Newton's method with Armijo step sizes and compare them with approximately exact step sizes that we describe in the next subsection.We also use the approximately exact step sizes to modify the Picard iteration.We implemented the experiments in FEniCS, [23], version 2019.1.0.FEniCS is a tool that solves partial differential equations with finite elements.We use the Taylor-Hood element P 2 −P 1 for the velocity v and the pressure π.In all experiments, we calculate the initial guess by replacing |Dv| 2 + δ 2 (p−2)/2 with 10 6 and solving the resulting Stokes problem.
To evaluate this norm, we use the Riesz isomorphism: Definition 6.1 (Norm of the Riesz isomorphism).Let |Γ d | > 0, µ 0 > 0, δ > 0, and v ∈ V 2 .Let ṽ ∈ V 2 be the solution of Then the norm of the Riesz isomorphism is ries ∈ [0, ∞) is If the norm of the Riesz isomorphism is small for v ∈ V 2 , it follows G(v) ≈ 0. 6.1.Numerical solvers and computational effort.The Picard iteration was suggested for modeling glaciers in [24] and is used in ice models like ISSM, see [25].The Picard iteration is: For given for all ϕ ∈ V 2 .This algorithm is said to be often globally convergent in practice but slow, [10].It is, for example, used for the Navier-Stokes equations with proved convergence theory, [16].We will compare the following algorithms: • The Picard iteration.
• The Picard iteration with approximations of exact step sizes, see Definition 5.
We use a simple bisection and set b := (a + b)/2 for J ′ help ((a + b)/2) ≥ 0 and a := (a + b)/2 for J ′ help ((a + b)/2) < 0 until we obtain the wanted accuracy.Finally, we set t := (a + b)/2.The chain rule yields for t ∈ (a, b) Furthermore, we can do these calculations with mixed elements: Remark 6.2.The step-size control is identical with mixed elements.Only the calculation of the direction changes.
Proof.We calculate the initial guess (v 0 , π 0 ) ∈ (W 2 , L 2 0 (Ω)) by replacing |Dv| 2 +δ 2 (p−2)/2 with 10 6 and solving the resulting Stokes problem.This initial guess is divergence-free.Then, we calculate a divergence-free direction w 0 for the velocity and a direction ψ 0 for the pressure.Hence, we have for the convex functional . Additionally, our iteratives v k are divergence-free as a linear combination of the divergence-free v k−1 and w k−1 .
Thus, the step size control is identical with mixed elements.□ Now, we consider the computational effort.The computation consists of three parts: Assembling the matrices, solving the linear system of equations, and calculating the step size.
In [26], multigrid methods are used to solve the p-Stokes equations, even on real-world examples like the Antarctic ice sheet.In [27], the computational effort of multigrid method is discussed.They measured that evaluating the residual takes less computation time than, e.g., assembling the Jacobi matrix or necessary matrix-vector multiplications.For their results, they used a simplified version of the p-Stokes equations.As calculating the residual is not the computationally expensive part also calculating the step size control is computationally affordable compared to the other parts.6.2.Computational domain.A unit square is too simple to represent real-world glaciers.Instead, in the ISMIP-HOM experiments, see [22], a sinusoidal bedrock, see Fig. 1, is suggested, and Dirichlet boundary conditions are imposed at the bedrock.We set α := 0.5 • , L := 5000 m, and define the upper and lower boundary of the domain by x, and z b (x) = z s (x) − 1000 + 500 sin(ωx) with ω = 2π/L.The experiment needs periodic boundary conditions on the left and the right side of the domain and σ • n = 0 at the surface.6.3.Parameters.In this subsection, we discuss setting constants and forcing boundary conditions.We set the constants for the experiment corresponding to the ISMIP-HOM experiments, see [22]: The ice parameter B := 0.5 • (10 −16 ) −1/3 Pa −3 a −1 , the density ρ := 910 kg m −3 , the gravitational acceleration g := (0, −9.81) m s −2 , and the seconds per year 31 556 926.The nonlinear viscosity is (0.5|Dv| 2 + δ 2 ) (p−2)/2 with p := 4/3.The factor 0.5 within the nonlinear term is given by [22].However, we could modify B and δ to formulate the equations as in our convergence analysis before.
Periodic boundary conditions have some difficulties in implementation for unstructured grids.Moreover, they are not necessary for real-world applications.Thus, we extend our domain by three copies to the left and the right.We apply Dirichlet boundary conditions at these boundaries.This approach is suggested in the Supplement of [22].The resulting domain is shown in Fig. 1.
We rotate the gravity to obtain a horizontal upper surface.Formally, we should rotate the left and right boundaries, too.However, this only changes the position in x-direction by a maximum of 1500 m•sin(0.5 • )≈ 13 m.This is neglectable compared to the horizontal extent of 35 000 m. Thus, we simplify the domain with vertical left and right boundaries and a horizontal surface boundary.Moreover, this approach is more flexible because we can simply change the angle α without changing the domain to generate different experiments.Furthermore, changes between using periodic boundary conditions and the variant with copies of the glacier are easier.
The value δ 2 should be small compared to typical values of 0.
On the Fig. 2 are the surface velocities for all used algorithms restricted to the original glacier x ∈ [0, 5000].The surface velocities are approximately the same for all our methods.Moreover, the velocities are similar to [22,Figure 6].In All methods produce nearly the same surface velocity as the result.
compared to the initial norm of the Riesz isomorphism by approximately 5 • 10 3 .Newton's method with Armijo step sizes starts with a quadratic convergence.After a few iterations, this tends to a linear convergence.Then, the norm of the Riesz isomorphism is constant.The norm of the Riesz isomorphism of the Picard iteration decreases linearly but at a slower rate than Newton's method with Armijo step sizes.Newton's method and the Picard iteration with approximately exact step sizes converge similarly fast as Newton's method with Armijo step sizes.
In [4], it was discussed that a higher value of δ leads to higher accuracy and a lower number of necessary iterations.We reproduce this result in Fig. 3. Additionally, we see that the velocity field at the surface is nearly identical for δ := 10 −4 .With δ := 10 −4 , Newton's method is much better than the Picard iteration for both step size controls, see Fig. 4. To verify our claim regarding the computation time, see subsection 6.1, we calculated the mean and the standard deviation for 100 iterations for each method, see Table 1.In our experiment, the computation time with a step size control is within the standard deviation of the computation time for the Picard iteration.Moreover, the step size control needs only 1 % of the whole computation time for each iteration.Another important result is that the Picard iteration and Newton's method need comparable computation times.We fixed the number of integral evaluations for the Armijo step size to a maximum of 20 integral evaluations for each iteration and for the exact step sizes to 25. Newton's method  with a minimum step size of 0.5 needs one or two iterations.Thus, Table 1 implies that both step size controls need a similar computation time for each iteration.We did not try to reduce the number of iterations for the step size because they are not important compared to solving the linear system of equations.until it reaches quadratic convergence.The other initial guess does not resolve this issue.Newton's method with approximately exact step sizes reaches quadratic convergence much earlier.Also, the Picard iteration with approximately exact step sizes is good.For a low friction coefficient, all algorithms behave similarly regarding convergence to experiment ISMIP-HOM B, see Fig. 6, independent of the initial guess.

Conclusion
In conclusion, it is possible to obtain a global convergent Newton method with step size control with a convex functional.Moreover, the approximation of exact step sizes is also good.For a nonlinear sliding boundary, the convergence rate of Newton's method depends on the friction coefficient.Overall, the approximately exact step sizes seem slightly better because the convergence rate was in all experiments good.However, the convergence rate depends on the size of δ.More experiments should be done in three dimensions, with physically motivated sliding coefficients, and with different values of δ.Furthermore, an implementation in ice sheet models could be interesting.
3, with w k := v k+1 − v k .• Newton's method with approximations of exact step sizes, see Definition 5.3.• Newton's method with Armijo step sizes, see Definition 5.2.We can calculate the exact step size arbitrarily precise because we have a convex functional.Let 0 ≤ a < b < ∞.For the actual velocity v k and the direction w k , we can calculate min t∈[a,b] J help (t) := min t∈[a,b]

Figure 1 .
Figure 1.Considered domain in our experiment.

Figure 2 .
Figure 2. The velocity norms at the surface are shown in the left figure.All methods produce nearly the same surface velocity as the result.

Figure 3 .
Figure 3. Left: The relative norms of the Riesz isomorphism are visualized for different values of δ.Right: The velocity norms at the surface are shown in the right figure.

Figure 5 .
Figure 5.The friction coefficient is τ := 10 7 .Left: The initial guess is the same as in experiment ISMIP-HOM B. Right: The initial guess has the additional term Γ b τ v 0 • ϕ ds.

Figure 6 .
Figure 6.The friction coefficient is τ := 10 3 .Left: The initial guess is the same as in experiment ISMIP-HOM B. Right: The initial guess has the additional term Γ b τ v 0 • ϕ ds.

Table 1 .
Computation time each iteration Computation time step size Computation time in seconds.