An adapted deflated conjugate gradient solver for robust extended/generalised finite element solutions of large scale, 3D crack propagation problems

An adapted deflation preconditioner is employed to accelerate the solution of linear systems resulting from the discretization of fracture mechanics problems with well-conditioned extended/generalized finite elements. The deflation space typically used for linear elasticity problems is enriched with additional vectors, accounting for the enrichment functions used, thus effectively removing low frequency components of the error. To further improve performance, deflation is combined, in a multiplicative way, with a block-Jacobi preconditioner, which removes high frequency components of the error as well as linear dependencies introduced by enrichment. The resulting scheme is tested on a series of non-planar crack propagation problems and compared to alternative linear solvers in terms of performance.


Introduction
The eXtended and Generalised Finite Element Methods (XFEM and GFEM respectively) [1,2], form a class of multiscale finite element methods that have significantly contributed to the increase of the level of automation and computational efficiency of crack propagation simulations.These approaches substantially reduce, or even eliminate, the need for re-meshing, allowing small-scale features to be captured efficiently and accurately at a coarse, marco-scale representation.This is accomplished by means of Partition of Unity (PoU) enrichment [3], whereby polynomial approximation spaces of the Finite Element Method (FEM) are supplemented with appropriate additional enrichment functions.Importantly, these enrichment functions encompass known features or micro-scale information of the system.In fracture mechanics, features such as discontinuities and singularities can be represented independently of the underlying Finite Element (FE) mesh, offering significant computational advantages.This has contributed to the popularity of the methods, and has led to their use in a vast array of applications, now documented in numerous review papers [4,5,6,7].
The advantages of the method do not come without a price.Among other challenges, the method can produce ill-conditioned systems of equations, as a result of linear dependencies introduced by enrichment.In the case of large scale, three dimensional simulations, for which parallel iterative solvers are a necessity, this ill-conditioning, without special attention, leads to stagnation (often complete failure) of the solver.To mitigate these issues, several approaches have been proposed, seeking to resolve the linear dependencies introduced by the enrichment functions.For the discontinuous functions used to represent solutions along crack faces [1], strategies include the elimination of the pathological enriched degrees of freedom [8], and particular stabilisation techniques either at the element [9] or system level [10].For enrichment with asymptotic functions, typically used to represent singularities occurring at crack tips/fronts, alternative enrichment functions have been proposed [11,12], alongside enrichment modification strategies, such as degree of freedom gathering [13,14,15], the so-called stable GFEM [16,17,18] and orthogonalisation [19,20].Most of the above strategies seek to improve the conditioning of the resultant system of equations by acting directly on the formulation of the XFEM/GFEM methodology and the choice, selection or representation of the enrichment functions.Stepping back from this specific XFEM/GFEM setting, to the more general challenge of parallel iterative solvers for illconditioned systems of equations, a significant amount of work exists on robust preconditioners for parallel iterative solvers.For a general ill-conditioned system Ku = f , the idea of (left) preconditioning is to determine an operator M −1 for which M −1 K is well-conditioned and M −1 is efficient to construct; allowing a parallel iterative solver, e.g.Conjujate Gradient (pCG), to converge quickly.The most widely used preconditioners for iterative solvers in both commercial and scientific FE codes are Algebraic Multigrid (AMG) methods [21].They have demonstrated excellent scalability for a broad class of problems over thousands of processors, and have the advantage of operating exclusively on the matrix equations, so that they can be applied as a 'black-box'.As a preconditioner, AMG constructs the matrix M by repeatedly coarsening the full matrix K through recursive aggregation over the degrees of freedom.The aggregation process is algebraic and based on the fact that the solution at two neighbouring nodes will be similar if they are 'strongly connected'.The success of an AMG preconditioner depends on this aggregation process.In the vicinity of a crack, and with large simulation divided into connected subdomains, this aggregration can be difficult to design properly.A widely used alternative is additive Schwarz based preconditioners.Single level versions utilise independent solves of the local Dirichlet problem.These are computed in parallel, using robust direct solvers, with the aim of resolving the ill-conditioning of the problem at a local length scale.These can be particularly effective, yet are known to lack robustness in the case of large numbers of subdomains or with respect to large contrast in coefficients.A problem ever present in crack propagation problems.The problems in this case are linked to the fact that the localised solves do not resolve the global low energy modes of the system, meaning the ill-conditioning persists.To overcome this issue, preconditioners can be augmented with an additional step, which involves either deflation methods [22], or second level coarse-solvers.In the former case, the deflection vectors/ coarse basis for the problem can be identified through an analytical solution (e.g.ZEM [22]), while in the latter, optimal approximations the global low energy modes are identified via local eigenproblems, see for example methods involving Generalised Eigenproblems in the Overlap GenEO [23,24,25].It is interesting to note that construction of robust coarse spaces, via GenEO type methods bears strong ties to Generalised Finite Element Methods, as studied by Ma et al. [26].In some respect, this optimal basis is an optimal presentation of enrichment functions, which dooes not suffer from linear dependencies.Yet with this robustness comes the additional computational burden associated to solution of the local eigenvalue problems, which in many cases is an overkill.
Several techniques have also been proposed to improve the performance of iterative solvers for crack propagation problems discretised using the XFEM/GFEM.The preconditioners proposed by Béchet et al. [27] and Menk and Bordas [28] employ a Cholesky decomposition to remove linear dependencies introduced by enrichment.The application of AMG preconditioners to systems produced by the XFEM/GFEM has also been proposed in several works.However, as shown, for instance, in Berger-Vergiat et al. [29], it cannot be performed directly since it results in sub-optimal performance.This is attributed to the fact that discontinuities present along crack faces are not accounted for in the construction of prolongation/restriction operators.To overcome this limitation, some works [29,30,31] employ Schwartz preconditioners to decompose the problem domain into a healthy and a cracked part, for which different solvers, such as AMG or LU, can be used.It should be noted that this type of decomposition has also been exploited for other purposes in the XFEM/GFEM literature, for instance in the seminal work of Bordas and Moran [32,33] it is used to enable the combination of commercial and research codes, while in the global-local GFEM [34] it is employed to construct numerically derived enrichment functions.Other works [35,36] modify prolongation/restriction operators to account for discontinuities along crack surfaces.In a more recent work [37], the block structure of linear systems produced by the XFEM/GFEM is exploited to derive a modified system, where AMG and reanalysis techniques can be applied to accelerate the solution.This block structure is also exploited in the work of Fillmore et al. [38] through specialised block-Jacobi and block-Gauss-Seidel preconditioners.
In this work, a preconditioner based on adapted deflation [39] is proposed for crack propagation problems, discretised with the XFEM/GFEM.The deflation space is constructed based on domain partitioning [40], while additional deflation vectors are introduced to account for the discontinuities introduced by the cracks.The approach is combined with a block-Jacobi preconditioner, which offers several advantages such as high efficiency, straightforward parallelisation, and efficient update for crack propagation problems.Whilst the main target application is non-planar crack propagation, the approach would also be suitable for other types of problems, for instance inverse problems [41], where repeated solutions of linear systems with slight variations, introduced by different crack configurations, are required.
The remainder of the paper is structured as follows: after a brief presentation of the elasticity equations and adopted discretisation scheme in section 2, the different components of the proposed preconditioner are described in section 3, and tested in section 4 through a series of examples involving crack propagation in models of components of increasing geometrical complexity and size.Finally, in section 5, the results are summarised and conclusions are drawn.

Weak form
For linear elastic fracture mechanics, we consider the linear elasticity problem for a cracked domain Ω, whose boundary can be decomposed, as illustrated in Figure 1, in a part where free surface conditions apply (Γ 0 ), a part where displacements ū are imposed as Dirichlet conditions (Γ u ), a part where surface tractions t are applied (Γ t ) and the crack faces Γ c , where free surface conditions also apply.The weak form of the problem, assuming a linear elastic constitutive equation, can be written as: with : and In Equation 1, u is the displacement field, v is the virtual displacement field, D is the Hooke tensor, σ is the Cauchy stress tensor and is the linear strain tensor.

Crack representation
As typically done within the XFEM/GFEM framework, cracks are implicitly represented using the level set method.To this end, two signed distance functions are defined: • The normal level set φ, defined as the signed distance from the crack surface: where n + is the outward normal to the crack surface and sign ( ) is the sign function.
• The tangential level set ψ, defined as the signed distance function satisfying the conditions: where Γ f are the crack front/tips.
Based on these functions, a polar coordinate system, with its origin at the crack front, can be defined: In general, the integration of several evolution equations is required to update level set descriptions of propagating cracks [42], a procedure that can be both complex and computationally demanding.Herein, the vector level set method [43,44,45] is employed, which allows to update the crack description using only geometrical operations, thus simplifying the process.

Discretization
The weak form of Equation 1 is discretized using the standard XFEM displacement approximation with shifted jump enrichment functions and quasi-orthogonalized tip enrichment functions: where N I are the FE interpolation functions, u I are FE degrees of freedom (dofs), and b J , c T J are the enriched degrees of freedom.The shifted jump enrichment functions are defined as: where x J are the nodal coordinates of node J and: The quasi-orthogonalized tip enrichment functions are defined as in one of the previous works of the authoring team [19]: where r I , θ I are the nodal values of the radius r and angle θ and: The nodal sets in Equation 7 are defined as: N is the set of all nodes in the FE mesh.
N j is the set of jump enriched nodes.This nodal set includes all nodes whose support is split in two by the crack.
N t is the set of tip enriched nodes.This nodal set includes all nodes whose support includes the crack front or nodes with r I ≤ r e , with r e a user defined enrichment radius.
It should be mentioned that the above choice of enrichment functions, combined with linear tetrahedral elements, was shown to lead to well conditioned system matrices [19].

Linear system of equations
Substituting the displacement approximation of Equation 7 into the weak form of Equation 1 and carrying out the integrations, yields a system of equations with the following block structure: where K ∈ R n×n is the stiffness matrix, u ∈ R n is the vector of unknowns, f ∈ R n is the load vector, and n is the total number of unknowns.Moreover, u F ∈ R n F contains all the standard degrees of freedom, corresponding to u I from Equation 7, u X ∈ R n X contains all the enriched degrees of freedom, corresponding to b J and c T j from Equation 7, are the parts of the stiffness matrix corresponding to the standard and enriched degrees of freedom, as well as their interaction, are the parts of the load vector corresponding to the standard and enriched degrees of freedom, n F is the number of standard degrees of freedom, n X is the number of enriched degrees of freedom.From the above it should be obvious that n = n F + n X .
While Equation 12does not necessarily reflect the order in which equations are stored in practice, it helps gain some insight regarding the structure of the resulting systems.For instance, K F F will typically contain more than 90% of the total entries in the stiffness matrix and will remain constant during a crack propagation simulation, with only the other three blocks changing as cracks propagate and additional nodes are enriched.

The conjugate gradient method
For two dimensional or small three dimensional problems, the linear system of ( 12) can be solved efficiently using direct solvers.However, for larger three dimensional problems, where direct solvers can be inefficient due to excessive memory requirements, iterative solvers become an attractive alternative.For symmetric systems, the most widely used alternative is the conjugate gradient method.As shown in algorithm 1, the method iteratively improves upon an initial solution u 0 by adding vectors p i , which are conjugate with respect to K. Since the number of iterations required to obtain an accurate solution depends on the conditioning of K, a preconditioner M is typically employed, the choice of which can significantly affect the efficiency of the solver.In what follows, a deflation based preconditioner, specifically tailored for systems of the form of ( 12) is presented.
Algorithm 1: The conjugate gradient method.

Deflation
Given a basis for a subspace of R n×n , deflation [46,47] aims at accelerating the iterative solution of linear systems, such as the one of Equation 12, by solving a modified system, of the form: where: with W ∈ R n×k a matrix containing a basis of a subspace of R n , with k n.It can be shown that if W is full rank, and K is symmetric positive definite (SPD), the system of Equation 13can be solved using the conjugate gradient (CG) method, while the solution for the original system can be recovered as: In Equation 13, P projects the subspace defined by W out of the original system.While it can be shown that any full rank matrix will result in some reduction of the number of iterations required by the CG method, typically, columns of W are selected to approximate eigenvectors corresponding to the smallest eigenvalues of K. Since the projection of Equation 13essentially removes the corresponding eigenvalues of K, such a choice effectively reduces its condition number, resulting in a reduced number of CG iterations.More details about the construction of W for the problems considered herein are given in subsection 3.3.
Deflation is also related to multi-grid methods, more specifically, W T and W can be seen as prolongation and restriction operators respectively [48].Then, W T KW can be viewed as a coarse grid matrix and the application of Q to the residual during the iterative solution of the system represents a coarse grid correction, aiming at removing low frequency components of the error.
Deflation, viewed either as a projection or a coarse grid correction, can be combined to preconditioning in an additive or multiplicative way, resulting in multiple alternatives.For instance, applying deflation to a system preconditioned with an arbitrary SPD preconditioner M −1 results in the deflated CG algorithm introduced by Nicolaides [46].
In Tang et al. [48], a comparison of several alternatives was conducted, and it was concluded that a multiplicative combination of a general preconditioner with the coarse grid correction Q, as discussed, for instance, in Smith et al. [39], results in the most robust and efficient method.
The combination, termed "Adapted Deflation Variant 2" (A-DEF2) results in a preconditioner of the form: where M −1 can be an arbitrary preconditioner.The operator is not SPD, however it was shown to be equivalent to other symmetric operators, including the one corresponding to the deflated CG [46], provided that the following initial solution is used: where u −1 is some arbitrary initial solution vector.
In algorithm 2, the efficient computation of this initial solution is described, while in algorithm 3 the steps required to efficiently apply A-DEF2 as a preconditioner, given a residual r are described.It should be mentioned that, both of these can be used in conjunction to the standard CG method, described in algorithm 1.
Algorithm 2: Initial solution to be used with the A-DEF2 preconditioner.

Deflation space
The choice of deflation space is critical to the performance of deflation-based preconditioners.To this end, different approaches have been proposed to efficiently approximate lower 2 Solve: W T KW µ = W T Ky; 3 Solve: W T KW λ = W T r; eigenvalues of linear systems resulting from the discretisation of different problems.Extending the categorisation of Diaz-Cortes et al. [49], the following techniques can be identified for constructing deflation spaces: Recycling deflation In this technique, the most effective vectors of Krylov-subspaces used in previous solutions are identified and re-used as deflation vectors [50].
Subdomain deflation Using this approach, the problem domain is divided into subdomains and vectors corresponding to a constant solution in each subdomain are used to form the deflation space [46].

Multigrid and multilevel deflation
As mentioned in subsection 3.2, deflation spaces can also be seen as prolongation/restriction operators [48].Thus prolongation/restriction operators constructed for multigrid/multilevel methods can also be used as deflation vectors.
POD based deflation Using this approach, solutions from problems similar to the problem to be solved are compressed using the Proper Orthogonal Decomposition (POD) and used as deflation vectors [49].It should be mentioned that the POD can also be used in combination to Krylov solvers in reduced order modelling methods, such as the ones proposed by Kerfriden et al. [51,52], where a limited number of CG iterations are employed to enrich lower dimensional spaces used for fracture problems.
Herein we employ subdomain deflation, while we enrich the deflation space with additional vectors to account for cracks.

Deflation vectors for linear elasticity
For general linear elasticity problems, subdomain-based deflation spaces can be constructed by considering rigid body translations and rotations of each subdomain [40].This approach results in deflation matrices assuming the following values for each node of the subdomain of interest: where x I , y I , z I are the spatial coordinates of node I.
To render deflation effective, the problem domain should be divided evenly and subdomains should also be contiguous.In our implementation, METIS [53] is used for this task, which offers additional features, that, as described in subsection 3.4, can be used to further accelerate the solution.

Enriched deflation vectors for XFEM/GFEM
For enriched approximations, we introduce a set of additional deflation vectors to account for the presence of the crack.These additional modes are only defined in subdomains containing enriched nodes and are derived by considering modes of deformation corresponding to the enrichment functions.For instance, for the modified Heaviside enrichment function the following modes are considered: These modes, in accordance to the modes of Equation 18 correspond to rigid translation and rotation of the two parts of the subdomain, lying above and below the crack, as illustrated in Figure 2.
In order to derive the corresponding nodal values of the resulting deflation vectors, we start with the first mode considered, as well as the displacement approximation of Equation 7.Then, we observe that the following values for the standard and enriched degrees of freedom satisfy  the first of Equation 19: Similarly, the nodal values for the remaining modes can be derived, leading to the following deflation matrices: W enra corresponding to the standard and enriched degrees of freedom respectively.

Preconditioning and mesh partitioning
As shown in subsection 3.2, deflation can be combined with a preconditioner to further increase efficiency.Since lower eigenvalues of the system matrix are removed through deflation, the preconditioner should ideally remove higher eigenvalues to effectively reduce the condition number.If viewed as a multi-grid method, then the preconditioner assumes the role of a smoother, aiming at removing high frequency components of the error.Herein, this task is performed by a block-Jacobi preconditioner, which, additionally, can be computed completely in parallel.Block-Jacobi preconditioners approximate the system matrix, and subsequently its inverse, by a block diagonal matrix, obtained by removing off-diagonal blocks from the original matrix: where blocks K ij can be obtained by grouping together sets of unknowns.Since K is, in practice, sparse, the specific way in which unknowns are grouped can affect the performance of the preconditioner dramatically.More specifically, grouping dofs of neighbouring nodes together results in most of the off-diagonal blocks of K being zero, rendering the approximation of Equation 24 more accurate.Therefore, groups can be selected to minimise the number of non-zero off-diagonal terms.To maximise the effectiveness the block Jacobi preconditioner, METIS [53] is used to partition the mesh into subdomains, while minimising interactions between subdomains.
Regarding subdomains containing enriched nodes, two aternatives would be possible.The first would consist of constructing different blocks for the standard and enriched part of the stiffness matrix, and would resemble the approach used in some existing works dealing with preconditioning for XFEM/GFEM [28,38].This approach would have the advantage of keeping the part of the preconditioner corresponding to the standard part of the approximation unaltered during a crack propagation simulation.The second alternative, which is chosen here, consists of modifying blocks of enriched subdomains to include enriched degrees of freedom, leading to the following structure: where K F F ii , K XXii , K F Xii are the standard, enriched and interaction parts of block ii of the stiffness matrix.In the above, it is assumed that subdomain i, to which block ii corresponds, contains enriched degrees of freedom, while subdomains 1,2 and n b do not.
The above choice can be justified by the fact that including enriched degrees of freedom in each block allows the preconditioner to remove linear dependencies between the standard and enriched part of the approximation, significantly decreasing the condition number of the initial matrix.Furthermore, the computational overhead for updating enriched blocks of the preconditioner in crack propagation simulations is small since, as also mentioned in subsection 2.4, the number of enriched degrees of freedom, and as a result enriched subdomains, is only a small fraction of the total number of unknowns, and subdomains respectively.

Numerical examples
Implementation.The proposed approach is implemented in our in-house C++ code using the Eigen [54] library for linear algebra operations.The direct solutions employed in Algorithms 2 and 3 are carried out with Pardiso [55,56,57], a state of the art parallel direct solver from Intel, while the direct solutions required for the block Jacobi preconditioner are carried out using the Cholesky solver offered by Eigen [54], which, for small systems, as the ones arising within the preconditioner, is more efficient than Pardiso.The only parts of the solver/preconditioner that are implemented in parallel, are the matrix-vector multiplications involved in the CG algorithm, for which the shared memory parallelisation offered by Eigen is employed, and the block Jacobi preconditioner, for which, OpenMP is employed to carry out factorisations and backward substitutions in parallel.The relative tolerance for the CG solver is set to 10 −8 .
It should be noted that adopted meshes are not always the optimal choice for the problem tested, rather they are aimed at illustrating the performance of the proposed approach for systems of varying size.Optimised meshes can be obtained through local refinement at the vicinity of the crack or even through adaptive refinement based on error estimators [61,62,63].
All of the following examples were ran on a workstation equipped with an Intel Xeon E3-1275 quad core processor, running at 3.80GHz, and 32GB of memory.
Crack propagation.Crack propagation lengths are computed using a Paris law, as described in Agathos et al. [45] and crack propagation angles are computed using the maximum circumferential stress criterion.Stress Intensity Factors (SIFs) and energy release rates, are computed using the interaction integral method, as described in Agathos et al. [15].

Notched beam under three point bending
The first example involves crack propagation in a notched beam under three point bending, which has already been the subject of numerical [64,44] and experimental studies [65].As illustrated in Figure 3 Three different levels of mesh refinement are used, the statistics of which are provided in Table 1.Moreover, in Figure 4, the mesh with mesh size h = 1mm is illustrated along with the corresponding subdomains for two different cases.For all cases, a geometrical enrichment strategy is used with an enrichment radius r e = 3mm.

Effectiveness and robustness of the proposed preconditioning scheme and deflation space for enriched approximations
As a first test, in Figures 5 and 6, the effectiveness of both the proposed preconditioner and enriched deflation space are assessed in terms of numbers or iterations and wall time for the second level of mesh refinement (h = 1mm) used and different numbers of subdomains.More  specifically, results of Figure 5, refer to a beam without a notch, while in Figure 6 the proposed approach is tested for the beam with the notch depicted in Figure 3, which introduces enriched degrees of freedom.As a baseline for the comparisons, results were also obtained using a Jacobi preconditioner and Pardiso, which might often be the preferred option for problems of this size.
For the Jacobi preconditioner, the single subdomain case corresponds to no deflation, while for the block Jacobi preconditioner, this case was not included since it would correspond to a solution with a direct solver.
Regarding the numbers of iterations, the Jacobi preconditioner results in the typical behaviour observed for deflated CG solvers, with iterations decreasing as the number of subdomains increases.However, it should be noted that, for the enriched case (Figure 6), this decrease is not as substantial as a result of linear dependences introduced by enrichment.The block Jacobi preconditioner results in a significant reduction in the number of iterations, especially in the enriched case, since it is much more effective in removing high frequency components of the error in each subdomain by accounting for interactions between different dofs.Moreover, the dependence of the required iterations on the number of subdomains is less significant, but more complex, consisting of an initial increase (2-10 subdomains), followed by a decrease (10-1,000 subdomains) and then a second increase (1,000-10,000 subdomains).This can be attributed to the effect of the decreasing block size as the number of subdomains increase.More specifically, for small numbers of subdomains, blocks are large, and the preconditioner is a very accurate approximation of the system matrix, leading to a reduced number of iterations.As a limit case, if only one subdomain was to be used, the solver would converge in a single iteration.On the other hand, as the number of subdomains increases, blocks become smaller and the block preconditioner becomes less and less accurate, the limit case being the Jacobi preconditioner.
This trend is confirmed in both Figure 6 and 5, where the numbers of iterations for the two  cases tend towards the same value as subdomains increase.Between the two aforementioned regimes, the effect of preconditioning is combined to the effect of deflation leading to a decrease in the number of iterations.
In terms of wall time, in all cases it initially decreases as the number of subdomains increase, but tends to increase again after a certain point, as a result of iterations becoming more expensive for very large numbers of subdomains and the number of required iterations increasing in the case of the block Jacobi preconditioner.For the block Jacobi preconditioner, the initially increased computational time is attributed largely to the high cost of factorising blocks of the system matrix and performing backward substitutions.In fact, the peak in performance occurs for a block size, for which the direct solver used for these factorisations is the most efficient.Furthermore, while in the standard FE case (no enrichment, Figure 5) both preconditioners offer comparable performance, in the enriched case (Figure 6) the block Jacobi preconditioner is significantly more efficient, since it requires fewer iterations.This reduction comes as a result of linear dependences between enriched and standard dofs being removed within each block.
The use of an enriched deflation space always provides an additional improvement in terms of both iterations and wall time.When used in combination to the block Jacobi preconditioner, this improvement is more substantial.In terms of computational time, when used with an optimal number of subdomains, the proposed approach can be more efficient even than the direct solver, while compared to the baseline Jacobi preconditioned CG (1 subdomain case), it can be over 35 times as fast.
In Figure 7, the relative residual of the solution is given for different numbers of CG iterations  for all the alternatives considered and 500 subdomains.The proposed combination of enriched deflation vectors and block Jacobi preconditioning achieves the best performance, allowing to efficiently solve the problem even for very low tolerances.
Although our current implementation is not optimised for parallel performance and scalability, in Figure 8 a comparison is performed between the proposed approach and Pardiso, in terms of wall time, for different numbers of CPU threads.For the proposed approach 500 subdomains were used, which were found to yield the best performance.As can be seen, the proposed approach is more efficient, but it does not scale as well as Pardiso.This can be attributed to the fact that there are several parts of the solver, for instance mesh partitioning, that are not implemented in parallel, but are still accounted for in the computational time to provide a fair comparison.

Performance of the proposed scheme for different levels of mesh refinement
Next, the performance of the proposed scheme is tested for different mesh densities to provide some insight regarding the scalability of the approach.The results are given in Figure 9 in terms of iterations for different numbers of unknowns.Each point in the figure corresponds to one of the meshes of Table 1, while different lines correspond to different ratios between the total number of unknowns (n) and the number of subdomains (n sd ).Thus, for each line in Figure 9 the number of dofs per subdomain is kept roughly constant for the different mesh densities.As can be seen, although there are some fluctuations, the number of iterations does not change significantly for different system sizes, indicating that the proposed approach is      scalable.

Performance of the proposed scheme for propagating cracks
As a final test, the proposed approach is tested for a propagating crack.In Figure 10, the deformed mesh is illustrated for the initial crack and after 30 steps of crack propagation, while in Figure 11 the number of iterations and wall time required by the linear solver at each crack propagation step are given for the different alternatives considered using 500 subdomains.In general, similar trends to Figure 6 can be observed, however, crack propagation tends to affect the preconditioners considered in different ways.In particular, in all cases, the number of iterations and wall time tend to increase as the crack propagates, which can be attributed to the increased number of degrees of freedom and its effect on the solution, which, as illustrated in Figure 10, can be quite substantial.The Jacobi preconditioner, exhibits a much more oscillatory behaviour, indicating that its performance is strongly affected by the way in which subdomains are intersected by the crack.While enriching the deflation space improves the situation in terms of overall performance, it does not lead to a less oscillatory behaviour.On the other hand, the block Jacobi preconditioner is much more robust with respect to crack propagation and when combined to an enriched deflation space it leads to a further improvement in performance.
It should be noticed that, the difference between the proposed approach and Pardiso in Figure 11 is more pronounced than in Figure 6, since the timings reported in Figure 6 also include the time needed for domain partitioning, which in the crack propagation case only needs to be performed once.As an indication of the difference in performance, the total time     required for linear solutions for this example using Pardiso is 262s, while using the proposed approach 144s, including the time required for partitioning and all other necessary tasks, such as computation of the standard and enriched deflation vectors, projection of the system matrix in the deflation space, initialisation and update of the preconditioner.Furthermore, this difference could have been even more pronounced if a less conservative tolerance was used for the iterative solver, which based on our experiments might have been possible without affecting the accuracy of the results.

Notched bearing bracket
The next example involves the geometry of a bearing bracket, downloaded from grabcad.com and illustrated in Figure 12.The same material properties as the previous example are used: E = 2.1 × 10 5 N/mm 2 , ν = 0.3.Zero displacements are imposed at the two holes in the bottom (highlighted in blue) and uniform surface tractions are applied in the horizontal direction at the top hole (highlighted in red).The initial location of the notch is shown in Figure 15a.
An unstructured mesh with mesh size h = 1mm, whose statistics are given in Table 2, is used.For the enriched part, a geometrical enrichment strategy is used with an enrichment radius r e = 3mm.4.2.1.Effectiveness and robustness of the proposed preconditioning scheme for enriched approximations For this example, only the enriched deflation space, combined with the Jacobi and block Jacobi preconditioners is considered.The standard deflation space is not considered, since in  to a Jacobi and a block Jacobi preconditioner for the bracket without a notch.
the previous example it was shown to be less effective than the enriched one, while the direct solver would be inefficient due to excessive memory requirements.
In Figures 13 and 14, the performance of the two preconditioners is illustrated for different numbers of subdomains in terms of iterations and computational time for the bracket without and with a notch respectively.In this example, due to the small size of the notch compared to the component, the response is only slightly affected, resulting in identical numbers of iterations for the healthy and cracked case and almost identical computational times.Moreover, the block Jacobi preconditioner reduces the number of required iterations by almost a full order of magnitude, compared to the Jacobi preconditioner.For higher numbers of subdomains, this also translates to a significant decrease in computational time.

Performance of the proposed scheme for propagating cracks
Similar to the previous example, the final test for this example involves a propagating crack.In Figure 15, the deformed mesh is illustrated for the initial crack and after 15 crack propagation steps.In Figure 16 the number of iterations and wall time required at each crack propagation step are given for the two alternatives considered using 5,000 subdomains.
As shown in the previous paragraph, the proposed block Jacobi preconditioner is much more efficient for this example.Furthermore, in both cases the number of iterations, and, as a result, the computational time required, remains almost unaffected as the crack propagates.

Cracked compressor blade
In the final example, the geometry of a high-pressure compressor blade, obtained from grabcad.com, is considered, as illustrated in Figure 17.The material properties used are: E = 2.1×10 5 N/mm 2 , ν = 0.3, while boundary conditions applied are illustrated in Figure 17a, where blue areas correspond to fixed displacements and the red one to uniform tractions in the vertical direction.
The initial crack considered is circular 1mm crack, introduced at the location illustrated in Figure 19a.An unstructured mesh with size in the range h = 0.2 − 2mm is used, as illustrated in Figure 17b, whose statistics are given in Table 3.The smaller mesh size is used in the area around the crack to accurately represent its geometry and resulting stress, strain and displacement fields, as well as at other locations of the blade to represent smaller geometrical features.For the enriched part, the enrichment radius is set to r e = 0.2mm since it was observed that higher enrichment radii led to fluctuations of the crack surface along the boundaries of the domain.

Effectiveness and robustness of the proposed preconditioning scheme for enriched approximations
For this example, only results for the block Jacobi preconditioner are reported in Figure 18, in terms of the number of iterations and wall time required for the blade without and with the crack.The behaviour observed is slightly different from the previous examples, since the number of iterations seems to be decreasing throughout the full range of subdomain numbers tested, while wall time initially decreases and subsequently increases again.This can be attributed to a number of factors.Firstly, the time reported includes the time for partitioning the domain as well as the time for initialising the preconditioner, both of which are quite substantial for this example and can affect the overall timing.In fact, the time required for the solution itself is significantly smaller and keeps decreasing for the 10,000 subdomain case, as will be shown in the following paragraph.Additionally, for the numbers of subdomains required for this example, the size of the linear systems of Algorithms 2 and 3 becomes quite substantial.Since these linear systems have to be solved at each iteration, they can slow down the preconditioner, especially as the number of subdomains increases.
In the present example, the numbers of iterations required for the uncracked and cracked case are identical, since, in a similar fashion to the previous example, the crack is very small compared to the domain considered and affects the solution only locally.The differences in wall time observed, can be attributed to the fact that, in our implementation, the preconditioner is initialised without considering the crack and then updated at each crack propagation step by factorising the blocks of the stiffness matrix corresponding to enriched subdomains.

Performance of the proposed scheme for propagating cracks
As a final test the initial 1mm crack, illustrated in Figure 19a, is propagated for 60 steps, until it almost penetrates the blade, as shown in Figure 19c.The deformed blade is also illustrated for the initial and final state in Figures 19b and 19d respectively, where it can be noticed that the propagated crack substantially alters the response of the blade.For this test, 10,000 subdomains were used requiring between 76 and 77 iterations and 177s to 178s per solution throughout the simulation.Similar to the previous example, both the number of iterations and the wall time required were only minimally affected by crack propagation.

Conclusions and discussion
A preconditioning strategy was proposed to accelerate the iterative solution of linear systems resulting from the discretization of fracture mechanics problems with extended/generalised finite elements.The approach involves deflation and a block Jacobi preconditioner combined in a multiplicative way.The deflation space typically used for linear elasticity problems is enriched to account for discontinuities and singularities introduced in the solution by enrichment, thus effectively removing low frequency components of the error.The block Jacobi preconditioner is used to remove high frequency components of the error, as well as linear dependencies associated with enrichment, by accounting for interactions between different dofs.
A series of numerical studies were conducted to test the performance of the proposed scheme,  where the following trends were observed: • Enriching the deflation space consistently improves the performance of the preconditioner both in terms of iterations required and computational time.
• The proposed block Jacobi preconditioner effectively removes linear dependencies introduced by enrichment, leading to substantial improvements in iteration numbers and computational time when compared to a Jacobi preconditioner, which is a common choice for deflated CG methods.
• For the smaller systems tested (≈500,000 dofs), the approach outperforms state of the art direct solvers.Especially for crack propagation, where the preconditioner can be set up once at the beginning of the simulation, this difference in performance can be quite substantial.
• For the larger systems tested, where cracks are small compared to the size of the components considered, the number of iterations and commputational time required at each crack propagation step remains almost unaffected as cracks propagate.
While all of the methods employed in the current work are very well suited for parallel computing, the current implementation only makes limited use of this potential.Therefore, as a direction of future work, a fully optimised implementation of the methods will be developed, utilising distributed memory parallelism, to improve performance.Moreover, since the current approach is essentially a two level preconditioner, future work will focus on extending to a multi-level method, in order to improve scalability and allow the solution of larger problems.
Finally, the possibility of re-using solutions obtained in previous crack propagation steps to accelerate the solution of subsequent steps will be explored.

Figure 1 :
Figure 1: Cracked body and boundary conditions.

Figure 2 :
Figure 2: Standard and enriched deflation modes for a subdomain.
, the notch is not normal to the axis of the beam, thus resulting in non-planar crack propagation.The geometrical parameters of the problem are L = 260 mm, L 1 = 240 mm, H = 60 mm, d = 10 mm, α = 20 mm, β = 0 • , 45 • , while the material parameters are E = 2.1 × 10 5 N/mm 2 , ν = 0.3 and the applied load P = 2 kN.

Figure 4 :
Figure 4: Notched beam under three point bending.Subdomains used for constructing the deflation space for two different cases for the mesh with h = 1mm.Different colours correspond to different subdomains.

Figure 5 :
Figure 5: Notched beam under three point bending.Performance of deflation in conjunction to a Jacobi and a block Jacobi preconditioner for the beam without a notch.

Figure 6 :
Figure 6: Notched beam under three point bending.Performance of deflation in conjunction to a Jacobi and the proposed block Jacobi preconditioner with the standard and enriched deflation space.

Figure 7 :
Figure 7: Notched beam under three point bending.Relative residual of the CG solver for different numbers of iterations for all considered cases and 500 subdomains.

Figure 8 :
Figure 8: Notched beam under three point bending.Wall time for different number of cpu threads for Pardiso and the proposed approach using 500 subdomains.

Figure 9 :
Figure 9: Notched beam under three point bending.Number of iterations required using the proposed preconditioner for systems of different size and different numbers of subdomains.For each of the lines the ratio between the toal number of unknowns and the number of subdomains (n/n sd ) is kept roughly constant, resulting in subdomains of similar size.

Figure 10 :
Figure 10: Notched beam under three point bending.Deformed mesh for the initial and propagated crack after 30 steps of crack propagation for the mesh with h = 1mm.

Figure 11 :
Figure 11: Notched beam under three point bending.Number of iterations and wall time at each step of crack propagation for all alternatives considered using 500 subdomains.

Figure 12 :
Figure 12: Notched bearing bracket geometry and boundary conditions.Blue colour corresponds to fixed displacements and red to a uniform traction in the vertical direction.

Figure 13 :
Figure 13: Notched bearing bracket.Performance of deflation with an enriched deflation space in conjunction

Figure 14 :
Figure 14: Notched bearing bracket.Performance of deflation with an enriched deflation space in conjunction to a Jacobi and a block Jacobi preconditioner.
After 15 crack propagation steps.

Figure 15 :
Figure 15: Notched bearing bracket.Deformed mesh for the initial and propagated crack after 15 steps of crack propagation.

Figure 16 :
Figure 16: Notched bearing bracket.Number of iterations and wall time at each step of crack propagation for the alternatives considered using 5,000 subdomains.

( a )
Cracked compressor blade geometry and boundary conditions.Blue colour corresponds to fixed displacements and red to a uniform traction in the vertical direction.(b) Cracked compressor blade mesh.

Figure 18 :
Figure 18: Cracked compressor blade.Performance of deflation in conjunction to a block Jacobi preconditioner for the cracked and uncracked blade.

Figure 19 :
Figure 19: Cracked compressor blade.Section with highlighted crack surface and deformed mesh for the initial and propagated crack after 60 steps of crack propagation.

Table 1 :
Notched beam under three point bending.Mesh statistics for three different refinement levels, corresponding to different mesh sizes h.The number of enriched dofs refers to the initial configuration, before any crack propagation occurs.

Table 2 :
Notched bearing bracket.Mesh statistics for the mesh used.The number of enriched dofs refers to the initial configuration, before any crack propagation occurs.

Table 3 :
Cracked compressor blade.Mesh statistics for the mesh used.The number of enriched dofs refers to the initial configuration, before any crack propagation occurs.