A robust algorithm for rate-independent crystal plasticity

A new stable return-mapping algorithm enables crystal-plasticity solutions by using a regularized yield surface with very large exponents, for which the rate-independent limit of the Schmid assumption in practice is reached. Numerical stability is enabled by an improved initial guess for the stress solution and by applying a line search for each Newton iteration. A hypo-elastic–plastic corotational formulation is chosen, where the tensors are contracted in a way that naturally degenerate to the rigid plastic formulation. The consistent algorithmic tangent modulus is derived, and a fast and very stable open-source implicit implementation into a finite element software is explained and demonstrated for simulations of the necking of a single crystal and for deformation of a polycrystalline representative volume element. The simulations run stable allowing large time steps. Hence, the simulation times are significant shorter than for explicit finite element simulations. The framework enables use of arbitrary types of slip systems. As a demonstration, the importance and interpretation of the yield surface exponent and the asymptotic limit of very large exponent are discussed for bcc crystals with { 110 } ⟨ 111 ⟩ , { 121 } ⟨ 111 ⟩ and { 132 } ⟨ 111 ⟩ slip systems.


Introduction
The detailed modeling of metal plasticity, e.g. by finite element codes, requires a crystal-plasticity framework that can efficiently incorporate arbitrary slip systems as well as account for crystal elasticity.In practice, compromises must be made between model complexity and calculation time.This work will be limited to the formulation and demonstration of a stable numerical algorithm for a rate-independent crystal plasticity, without complex latent hardening of the slip systems [1], and without non-Schmid effects, see e.g.Soare [2].However, the framework is not limited to these simplifications.
The starting point for rate-independent crystal-plasticity theories is the Schmid assumption, [3].Mathematically, this can be expressed as a multi-surface formulation with one yield criterion for each shear stress τ α , with a corresponding critical resolved shear stress, τ c α , on each slip system, α.
With infinitely many slip systems, the criteria would correspond exactly to the isotropic Tresca yield criterion.However, slip is restricted to certain crystallographic planes and the densely packed atom directions in the crystal.
Hence, the yield stress of a crystal is highly anisotropic, where the inner envelope of the Schmid criteria defines a crystal yield surface with sharp corners.
At sufficiently low temperatures in densely packed crystal structures, dislocation glide will occur on the densely packed atom planes, due to their low Peierls barrier.However, at elevated temperatures the Peierls-Nabarro contribution to the critical resolved shear stress for glide on non-densely packed slip planes decreases, enabling slip to occur also on these more narrowly spaced slip planes.However, in non-densely packed structures without densely packed slip planes, e.g.ferritic steel with BCC structure, several slip systems with their respective critical resolved shear stress, will compete, even at room temperature.Employing additional slip systems results in a more complex crystal yield surface with more corners and facets.In hcp metals there are only three slip systems available in the densely packed planes, hence also here slip in less densely packed planes must be included in a model [4,5].Furthermore, twinning can be incorporated as pseudo-slip systems [6,7].To handle mathematically all mentioned cases, critical resolved shear stresses for slip to occur on different slip planes will be distinguished and be allowed to have individual work hardening in the model considered in this work.
The rule of normality i.e. associated flow, where the plastic rate-of-deformation is normal to the yield surface, holds for each facet on the crystal yield surface.However, in the rate-independent limit, several solutions meet in a yield-surface corner, and the Taylor ambiguity occurs.In the rate-independent theories, a variety of ambiguity solutions have been suggested, as discussed in the review by Mánik and Holmedal [8].
In general, the critical resolved shear stress depends on the shear rates of the slip systems, but at room temperature the strain-rate sensitivity is low, and the metal is commonly assumed to be rate-independent, which will be assumed in this work.However, the rate-independent models will always be simplifications of rate-sensitive models, hence it is important to understand the simplifications being made.Firstly, it is important to distinguish the instant strainrate sensitivity from the strain-rate sensitivity that influences the work hardening and consequently needs a certain amount of strain to change the critical resolved shear stress.The latter can be captured by a rate-insensitive model.
The origin of the instant strain-rate sensitivity is that the critical shear stress of a slip system depends on the shear rate of the same slip system.The popular viscoplastic power law [9,10] in Eq. ( 2) is an example, for recent CPFEM applications, see e.g.[11][12][13][14][15].
In this model there is no threshold for the critical resolved shear stress, and τ α = τ c α is the resolved shear stress.Furthermore, τ 0 α and γ0 are constants (that can be strain dependent) and m is the instant strain-rate sensitivity.For the CPFEM with the viscoplastic model, efficient implementations [16][17][18][19][20][21][22], comparison studies of different algorithms [23][24][25] and an extensive review [26] have been reported.Even higher numerical efficiency has been achieved by spectral solvers utilizing the fast Fourier transform, but then limited to cyclic boundary conditions [27][28][29][30][31][32][33].However, the viscoplastic model equations are increasingly difficult to solve numerically for small values of the strain rate sensitivity, m.So far, most of the numerical algorithms for solving the viscoplastic equations have not been optimized for dealing with the rate independent limit m → 0. The speed and stability of the CPFEM and spectral implementations worsen as m decreases.One approximation used to deal with small m, is to first perform expensive calculations for the crystal in the crystal coordinate system, and then map these solutions by a spectral representation, as suggested by Knezevic, Al-Harbi and Kalidindi [34] and applied in CPFEM by Zecevic, McCabe and Knezevic [35].Another approach to deal with arbitrarily small m, was suggested by Knezevic, Zecevic, Beyerlein and Lebensohn [36], by first calculating solutions by a relatively large m ≈ 0.05, and then obtain solutions for lower values of m by a scaling relation that applies to the viscoplastic law.As pointed out by Mánik and Holmedal [37] for the case of 12 slip systems in fcc, the slip systems that are most activated, in most cases do not change for m ≤ 0.1.Since the 56 corner solutions in fcc are quite well separated, this method works for most combinations of a given strain path and grain orientation.However, when more slip systems are activated, e.g. in bcc, some of the corners will disappear with lower values of m.Anyhow, even with m = 0.05, the time steps that can be made by implicit finite element integration are limited, and the line-search and choice of the initial guess, as proposed in the current work, will be very beneficial.
The instant strain-rate sensitivity, m, influences the crystal yield surface in two different ways.Firstly, it rounds off the corners of the crystal yield surface, which for rate-dependent models can be defined as the iso-surface with constant internal work and internal work rate.Note, that even when the strain-rate sensitivity is very low, say at room temperature, the round-off of the corners may still be significant.Secondly, the rate dependency causes the crystal yield surface to expand with increasing strain rate.At room temperature this expansion is very weak for most metals and justifies the use of a rate-independent yield surface.
However, the rounded corners must still be accounted for, as argued by Holmedal [38].Eq. ( 2) can be derived from a potential [38,39] proportional to the internal work rate and shape invariant for different internal work rates.As pointed out by Holmedal [38], an iso value of this potential corresponds to the regularized yield surface proposed by Gambin and Arminjon [40][41][42][43][44]. Since the plastic strain rate equals the gradient of the potential, the associated flow rule then applies.Hence, in the limit of m → 0, this model degenerates to a solution of the rate-independent Schmid model.However, since the Schmid model suffers from the Taylor ambiguity, this rate-independent solution is not unique.The physical interpretation of that is that different models for the strain-rate dependency gives different ambiguity solutions.
To avoid the Taylor ambiguity and the corresponding singularities, the regularization by Gambin/Arminjon has been applied in many rate-independent crystal-plasticity finite element model (CPFEM) implementations [45][46][47].Alternatively, a regularized yield surface, based on the approach by Kreisselmeier and Steinhauser [48], has also been commonly applied [49][50][51][52][53][54].Note that in the limit of large yield surface exponents, this approach becomes similar as the yield surface by Gambin and therefor degenerates to the same Taylor ambiguity solution, i.e. the one corresponding to the viscoplastic power law in Eq. (2).
Another rate-dependent model is the viscous over-stress model.
Here τ 0 α is a true athermal yield stress and η −1 is the viscosity of the metal.In the limit η → ∞ this model degenerates to a solution of the rate-independent Schmid model.Note, that this corresponds to another ambiguity solution than the viscoplastic power law in Eq. (2).Implementations have been made by Schmidt-Baldassari [55] and by [55,56], in which a minimization of an augmented Lagrangian is used to approximately obtain the rateindependent limit (η → ∞) of the viscous assumption in Eq. ( 3).Following the arguments by Mánik and Holmedal [8], this ambiguity solution must correspond to the one obtained by either quadratic programming [8,57] or by singular value decomposition.[58,59].
Not all ambiguity solutions have a physical interpretation.Many are simply efficient mathematical means to obtain a well-posed, non-singular mathematical problem.For an overview of various approaches, the reader is referred to recent reviews, [37,56,60].
In models for predicting texture evolution during fabrication, e.g.rolling or extrusion, rate-independent statistical aggregate models are useful, i.e. the classical full-constraint Taylor model [61], and the more recent advanced Taylor type models; the ALamel model [62], the ALamel3 model [8] the GIA/RGC model [63,64] and rateindependent self-consistent models, e.g.[65][66][67][68].In texture-prediction applications, elasticity is not important and plastic formulations without elasticity are commonly applied.A newly proposed way of contracting tensors [69] will be applied in this paper, where a rigid plastic formulation will follow naturally as a special case.
Even with the computer capacity available today, CPFEM simulations are challenging.The coupling between the elements leads to a large system of non-linear coupled equations to be numerically integrated one time-step at a time.This can be obtained, using either explicit or implicit finite element methods.Due to the stiff nature of the involved system of equations, explicit time stepping is restricted to very small time-steps, even with carefully use of mass scaling.In explicit CPFEM solvers, a major part of the computer time required for the calculations is related to calculating locally for each integration point the stress tensor and the lattice rotation for a given time step, i.e. as prescribed in a user-defined subroutine.However, with implicit time stepping, most of the computer time is spent solving the global finite element equations.To compete with explicit numerical integration schemes, the implicit schemes must be sufficiently stable to allow order of magnitude larger time steps.A stable return mapping algorithm is the key.
At room temperature, a realistic exponent for this yield surface is orders of magnitude larger than for highexponent yield surfaces applied in continuum plasticity.Numerical convergence of the return mapping algorithm is more and more difficult with increasing yield surface exponents and has until now been a major numerical challenge.However, recent progress has been reported within continuum plasticity, reporting stable return mapping algorithms, using a line-search approach [69][70][71][72][73].
In the current work, these algorithms are further developed and applied for the Gambin/Arminjon regularized crystal yield surface, enabling for the first time an implicit return mapping algorithm, allowing stable, effective calculations of the rate-independent limit with yield surface exponents as large as one million and strain steps as large as unity.Physical mechanisms, modeled by various types of slip systems, such as twinning, phase transformations, latent and reverse hardening etc., have not been included here.At the current stage, the paper presents a proof-of-concept of a rate-independent framework that enables this, without compromising numerical robustness.
The UMAT developed in this work can be freely downloaded from the following link: https://gitlab.com/ntnu-physmet/crystal-plasticity

Regularized single crystal plasticity model
Two coordinate systems will be considered for expressing vectors and tensors of the crystal plasticity model.The global (sample) system has basis vectors e i , while the co-rotated (crystal) coordinate system has basis vectors êi , i = 1, 2, 3, that coincides with the crystal lattice after deformation.The orthogonal transformation tensor from the global to the crystal coordinate system is denoted Q = R T .The transformation rules read where v is a vector and T is a second-order tensor.The initial orientation of the crystal coordinate system is given by the initial transformation matrix Q 0 = R T 0 , which can be calculated for a given set of Euler angles.The imposed velocity gradient L is given in the global system, while the constitutive equations are formulated in the co-rotated crystal system.The hypoelastic approach is employed, with additive decomposition of the rate-ofdeformation tensor.The hypoelastic-plastic models are typically used when elastic strains are small compared to the plastic strains.Except for some cases of complex elastic-dominated closed-loop cyclic loading, the non-conserved energy is negligible and the hypoelastic description is adequate.
In the co-rotated system D = sym ( L) is split into its elastic and plastic parts The rate of the co-rotated Cauchy stress is given by Hooke's law σ = Ĉ : De = Ĉ : where Ĉ is the fourth-order elastic stiffness tensor, given in the co-rotated system.The unity slip direction vector bα and the unity slip plane normal vector nα for each considered slip system α define the Schmid tensor Mα in the co-rotated system.
The plastic rate-of-deformation tensor, Dp , is related to the symmetric part of the Schmid matrix, Pα = sym ( Mα ) as where γα is the slip rate on slip system α.
Following Holmedal [38], the strain-rate independent regularized crystal yield surface is employed here, where ϕ ( σ ) is the yield function.The plastic rate-of-deformation tensor obeys the normality rule where ϕ σ = ∂ϕ/∂σ is the gradient of the yield function.When the exponent n is large, the parameters ξ α may be set to unity.It follows from Eqs. ( 8) and (10), that on the yield surface, i.e. for f = 0, Due to that f is a homogeneous function of the first order, it follows that where Ẇ is the plastic work rate.
To determine the single crystal rotation, the total spin tensor Ŵ = skw ( L) is additively decomposed into its elastic, plastic, and lattice-rotation parts.
Here, Ŵ p is the plastic spin tensor.The elastic deformations that contribute to the elastic spin tensor Ŵ e , are very small and neglected here.The constitutive lattice spin, Ŵ c , generates the lattice rotation.The origin of the plastic spin is the contribution to the spin from the shape-change caused by the slip activity, i.e.
The skew-symmetric part of the Schmid tensor Ωα = skw( Mα ) and γα are the slip rates.Hence, the constitutive spin can be estimated as The constitutive spin tensor dictates the crystal rotation, according to The work hardening of the critical resolved shear stresses is assumed to be functions, τ c α (Γ ), that depend on the accumulated slip Γ , which is defined by the differential equation In this paper, a simple model for the work hardening [74] is applied for demonstration where h α (Γ ) is the hardening moduli for each slip system.In this case, it can be integrated as the Voce equation Note, that replacing || by ⟨⟩, where ⟨⟩ denotes the Macaulay brackets, allows for distinguishing forward and backward slip activities as explained by Holmedal [38].This is necessary to model Bauschinger effect on the slip system level [75][76][77].

Rotation update
The update of the rotation tensor R is given by the differential equation ( 15).An analytical solution exists for the case of a constant W and can be written using the Euler-Rodriguez formula (for details see Brannon [78]), as where R 0 = R(0).In general, the spin W changes with time.Eq. ( 19) can then be applied as an approximation during each time increment ∆t, as where W n is kept constant during the time increment.Another alternative, is the symmetric numerical second-order update scheme by Hughes and Winget (1980), which is often employed (e.g. for calculating rotation increment matrix in UMAT in Abaqus/Standard), It assumes that the spin W is known at time t n +∆t/2, and the Cayley-Hamilton theorem [79] can be applied to avoid inverting matrices [78].
When using the corotated constitutive spin Ŵ c , the integration of Eq. ( 15) reads

Vector/matrix notation
Symmetric second-order tensors and fourth-order tensors with minor symmetry can be mapped into a vector and matrix representation, respectively.The most widely used vector/matrix representations are the Voigt and Mandel notations.The main purpose of a vector/matrix representation is to exploit the tensor symmetries, allowing symmetric stress and strain tensors to be stored as vectors, and fourth-order elastic modulus or plastic anisotropy tensors to be stored as matrices.In implementations of computational mechanics, this type of vector-matrix representation significantly reduces the number of operations and the computation cost.In rigid plastic crystal plasticity, other notations have been used (see Mánik [69] for a recent overview).
In this paper, the natural vector/matrix notation, originally suggested by Kocks, Tomé and Wenk [80] for use in crystal elasticity, is applied.This notation was recently adapted for use in continuum plasticity in a return mapping algorithm by Mánik [69].Due to its explicit representation of the deviator, this notation advantageously separates the deviatoric plasticity from the elasticity, i.e. the plastic part is equivalent to the notation proposed by Lequeu, Gilormini, Montheillet, Bacroix and Jonas [81].This notation enables a more concise algorithm formulation and, due to separation of pressure dependency, it reduces dimension of equation system and makes numerical computation more effective.Like the Mandel-but unlike the Voigt notation, it represents both stress and strain tensors equally, see Eq. ( 23).The brief description and the essentials of the natural notation is given in Appendix A. For an exhaustive description, see Mánik [69].

Return mapping algorithm
At time t (n) , we consider the Cauchy stress σ (n) and the internal variables q(n) expressed in the corotational crystal coordinate system.In the backward Euler integration scheme, the total rate-of-deformation D(n+1) is required at t (n+1) in the corotational crystal coordinate system.However, the orientation of the crystal coordinate system at t (n+1) is not known, hence D(n+1) needs to be calculated.What is known in the finite element code is D (n+1/2) , which is assumed to be constant in the reference system throughout the time step.To limit the complexity of the algorithmic modulus, D(n+1) is extrapolated within the accuracy of the numerical scheme (see Appendix B).The basic problem to be solved by a return mapping algorithm, is to find the Cauchy stress σ (n+1) and the internal variables q(n+1) at time t (n+1) = t (n) + ∆t, which satisfy the Kuhn-Tucker complementarity conditions The fully implicit backward Euler return mapping algorithm is employed in this paper.In the literature, returnmapping algorithms are generally formulated in their tensorial form, while numerical implementations employ either Voigt or Mandel vector/matrix notation.In the following, the return-mapping algorithm will be expressed directly in the natural notation [69].For the sake of clarity, the hat ( ˆ) designating the corotational aspect, will be omitted in the following, for all the vectorial and tensorial quantities.Given a total rate-of-deformation vector ⃗ d (n+1) , time increment ∆t and Cauchy stress ⃗ σ (n) , the trial stress is obtained by applying an elastic predictor.
Here, in the natural notation, C is a 6 × 6 diagonal matrix representing the elastic moduli with cubic symmetry.
For large trial stresses, this first guess is the key to numerically stable and robust return-mapping algorithm.The parameter k controls the distance of the initial stress guess from the yield surface, for k = 1, ⃗ σ (0) lies on the yield surface, as ϕ For a given ⃗ σ (0) , the initial guess for the plastic multiplier, λ(0) is given by Eq. ( 12) The return mapping is solved using a Newton-Raphson algorithm with a line search.The solution is sought in an iterative manner as where (k) is the iteration, ∆ ⃗ σ , ∆ λ and ∆Γ are the increment of the Cauchy stress, the plastic multiplier and the accumulated slip, respectively, and α (k) is a constant to be determined by the line search algorithm.By linearizing Eqs. ( 26), ( 27) and ( 28), the increment ∆ λ is calculated as The latter is then used for calculating the increment of the accumulated slip and both ∆ λ and ∆Γ are finally used for obtaining the increment of the Cauchy stress In Eqs. ( 32), ( 33) and ( 34), the matrix L and Y are given as For the calculations above, the following partial derivatives need to be calculated (note that the resolved shear stress For convergence, measures of the three residuals ⃗ r (k) , f (k) and q (k) for iteration (k) are defined as and If for an iteration (k) then the convergence is achieved.Recommended error tolerances, used throughout this work, are ε r = 10 −20 , ε f = 10 −8 , ε q = 10 −16 .

Line search
In each Newton-Raphson iteration, the line search is used to determine the step size α (k) .For this, ψ (k) serves as the merit function [72].For the search direction given by the increments ∆ ⃗ σ , ∆ λ and ∆Γ , the step length α (k) needs to be found such that 0 < α (k)  ≤ 1 and for which the merit function ψ (k) is minimized.In this paper, two methods for line search are employed and tested.A line search, calculating the minimum of the quadratic approximation to the merit function, is adopted here, similarly as previously applied in return mapping algorithms for continuum plasticity [69,71,72].This is very efficient method for the exponents of the regularized yield function up to ∼100.For large exponents up to 1 000 000, the quadratic approximation of the merit function becomes too poor, leading to α (k) 's far from the optimal ones.More effective line search is developed in this paper, employing new minimization algorithm which solves more efficiently crystal plasticity models with large exponents up to 1 000 000.

Quadratic line search
For each iteration, the Newton step is performed as the first attempt, i.e. α (k) = 1 in Eq. ( 31).This step will be accepted only if ψ (k+1) is lower than some fraction of the merit function ψ (k) , achieved in the previous Newton-Raphson iteration.

Line search by minimization
An alternative approach for finding the step length α (k) for which ψ (k) ( α (k) ) < ψ (k) (0), is to find the minimum within some tolerance ε.For this, the standard and general 1D minimization Brent's method [83] can be applied.The way the function is constructed by Eqs. ( 37) and (38) gives rise to some known properties e.g. the derivative at α (k)  = 0, reads ψ (k) ′ (0) = −2ψ (k) (0).A new minimization algorithm was tailor-made to utilize the known characteristics of ψ (k) ( α (k) ) making it faster than Brent's method.See Appendix C for the detailed description of this line search algorithm.

Consistent algorithmic modulus
The consistent algorithmic modulus is essential to calculate when the return mapping algorithm is employed as a part of an outer iteration.It must be provided as part of the user subroutine in the implicit FE solver.For the implicit backward Euler return mapping algorithm of the regularized single crystal plasticity model described above, it is calculated as in which matrix M is calculated as

Convergency results
Convergency behavior of the implicit, backward-Euler return-mapping algorithm with line search is examined as a function of the yield-surface exponent n.Note that according to Eq. ( 25), an arbitrary strain increment occurring during a time increment ∆t, is uniquely prescribed by ⃗ σ tr − ⃗ σ (n) .Hence, to effectively cover the space of possible strain-increment directions, an evenly distributed set of 10 000 trial stresses { ⃗ σ tr i } i=1,10000 was generated from a set of 5-dimensional vectors, approximately uniformly distributed on the 5-dimensional hypersphere, being generated by the algorithm described in Appendix D. The stress state before the strain increment to be iterated, ⃗ σ (n) can, without loss of generality, be set equal to ⃗ 0, i.e. strain path changes are also covered by this set.Each trial stress was chosen so that f ( ⃗ σ tr i ) = s, where s is a chosen constant.The magnitude of s represents the magnitude of the trial stress, which implicitly corresponds to the magnitude of the total strain increment ∆⃗ ε = ⃗ d∆t, for a given elastic modulus matrix C and a set of critical resolved shear stresses τ c α .In this manner, a large set of possible strain paths can be probed, independently of what the previous stress solution was, and arbitrary strain-path changes are included as well.To test different strain-increment magnitudes, four selected values of s = 2τ C , 10τ C , 100τ C and 1000τ C were included in the set of trial stresses.The largest amongst these trial stresses corresponds to a strain increment of order unity and in practice provides an ultimate stability challenge for the algorithm.In total, the 40 000 tested strain increments provide a large set that effectively covers the space of realistic strain increments in an implicit FE simulation.
The efficiency and stability of the return-mapping algorithm will in principle depend on the chosen slip systems and their corresponding work hardening.A BCC structure with ⟨111⟩ slip systems, and an FCC structure with { 111 } ⟨011⟩ slip systems were tested.Without loss of generality, the Euler angles used were (ϕ 1 , Φ, ϕ 2 ) = (0, 0, 0).The elastic constants and the critical resolved shear stress applied for the FCC and BCC case are listed in Table 1.Both a case without work hardening and a case of strong work hardening (linear hardening with h = 1000 MPa) were tested.For each yield surface exponent, the number of Newton iterations and the number of line-search iterations were counted for the 40 000 probed strain increments.
It turned out that the algorithm gave similar iteration statistics for all cases tested.Examples of the average number of Newton iterations and the average number of line-search iterations required for each Newton iteration to reach convergency, are shown in Fig. 1 for two of the cases tested.The largest system, for the BCC case with 48 slip systems, requires only slightly more iterations to converge.The results with and without work hardening are very similar.Note that for yield-surface exponents up to about 100, the quadratic line search is always faster, with less or equal number of Newton iterations and significantly fewer line-search iterations each.The simpler quadratic linesearch algorithm is competitive up to a yield-surface exponent of the order of 1000, which in practice is sufficient for a very good rate-independent approximation by the regularized yield surface.It is a remarkable result, that both line-search algorithms converge for all cases tested, up to an exponent of one million, keeping in mind that the largest strain steps included in the test set is of order unity.For these extremely high exponents, the full minimization requires significantly fewer iterations than the quadratic algorithm.
For low exponents, most cases converge within a few iterations.For larger exponents, some of the tested strain steps converge fast, while other require more iterations.In Fig. 2, examples of iteration statistics for the case of the quadratic line-search algorithm are shown, for cases where n = 10, 100 and 10 6 .The FCC crystal without hardening and the BCC crystal with 48 slip systems and hardening show very similar distributions.The average number of Newton iterations, as well as the spread of the distribution, increase with increasing exponent.For the cases of n = 10 and 100, the peak is at zero line-search iteration, i.e. a full Newton step, while for n = 10 6 several repeated quadratic line-search iterations are required for most cases.
Fig. 3 shows iteration statistics for examples where the full minimization is applied during the line search.Again, cases of n = 10 and 100 are compared for an FCC crystal without hardening and a BCC crystal with 48 slip systems and hardening.Unlike the cases with quadratic line search, the mean value and spread of the number of Newton iterations saturate at large exponents and the curves are similar for n = 100 and 10 6 .However, the number of line search iterations per Newton iteration slowly increases with increasing n, showing a broad peak at the largest number of iterations, as well as a narrow peak, for which convergency is accepted by one Newton step.

Application to CPFEM
The return-mapping algorithm was implemented into the user-material subroutine (UMAT) in the FE software Abaqus/Standard and tested for two cases covering simulation of single and polycrystal behavior of an FCC material  with { 111 } ⟨011⟩ slip systems.Elastic constants for aluminum were used as given in Table 1.The initial critical resolved shear stress was 10 MPa.The hardening law by Eq. ( 17) was applied.The exponent n = 100 of the yield function was used.The tolerances ε r , ε f and ε q in Eq. ( 39), for the return-mapping algorithm convergence, were

A Goss-oriented single crystal
Uniaxial tension of a notched single crystal specimen with an initial Goss orientation was simulated, i.e. with the crystal cube rotated 45 • around the tensile axis.The diameter of the specimen was 10 mm, the diameter inside the notch was 6.4 mm and the notch radius was 3.6 mm.Note, that due to the single crystal's material model symmetries, only the one-eighth of the specimen was computed.The FE mesh is shown in Fig. 4a.The smooth specimen was meshed with ∼18 000 linear three-dimensional eight-nodes elements with selective reduced integration (C3D8).A smaller element size was used close to the mid-section of the specimen to ensure an accurate description of the necking.Kinematic boundary conditions were imposed to the nodes located at the end of the specimen by prescribing an axial displacement of 0.8 mm.The average time increment used was ∼0.01 s.On average 7 iterations were needed for the return-mapping algorithm to converge.Hardening constants used for this case, were R sat α = 20 MPa and ∆γ sat α = 0.15 for all α.

A polycrystalline representative volume element (RVE)
The second case simulated by the CPFEM, was uniaxial tension of a RVE for a polycrystalline material.The RVE was modeled as a 1mm 3 cube consisting of 30 grains and was generated in the open source software DREAM.3D,see Groeber and Jackson [86].It was discretized by 50 × 50 × 50 reduced-integration elements (C3D8R).The deformed FE model of the RVE with grain morphology is shown in Fig. 6a.Periodic boundary conditions were applied to the nodes on the exterior boundaries to ensure periodicity [15,87].The uniaxial tensile mode for the RVE is defined by prescribing boundary conditions to the nodes A, B, C and D (Fig. 5).The nominal strain reached at the end of the simulation was 40%.The von Mises stress in the RVE at the end of the simulation is shown in Fig. 6b.The average time increment used was ∼0.004 s.On average 10 Newton iterations and 7 line-search iterations for each Newton iteration were needed for the return-mapping algorithm to converge.Hardening constants used for this case were R sat α = 63 MPa and ∆γ sat α = 0.1 for all α.The simulation took 24 h.

Comparison of computing times with explicit and implicit FE solvers
The computational efficiency of the implicit CPFEM calculations with the new algorithms presented here, is assessed by comparing to an explicit rate-dependent CPFEM implementation, which is efficient due the use of mass scaling in combination with an adaptive sub-stepping integration scheme using the modified Euler method [22].This explicit integration scheme is extremely robust and efficient, allowing using an instantaneous strain rate sensitivity m as low as 10 −5 to explore the very nearly rate-independent stress-strain response.The case chosen for the comparison is a simulation of uniaxial tension up to 1% strain, using an RVE consisting of 60 × 60 × 60 linear elements with full integration.The details of the explicit CPFEM simulation are given in [15].Note, that the explicit CPFEM is a rate-dependent formulation, and the FE solver used for this one was LS-DYNA.The purpose here is a coarse comparison of the total computational time.When m ≪ 1, the rate sensitivity, m, corresponds to the exponent n ≈ 1/m.In both cases, the simulations were run on the same PC using all 8 cores (see Section 7).The timing results are shown in Table 2.

Natural notation
Numerical implementations of return-mapping algorithms involve fourth-order tensors for the elastic stiffness and for the consistent modulus.However, the implementations and numerical schemes at the end of the day must be expressed in terms of matrices and vectors.To make the tensor contractions as simple as possible to handle, the natural vector/matrix notation is applied to represent the tensors involved in the considered crystal-plasticity model.This notation has an orthonormal basis, providing all the convenient properties of the Mandel notation and overcoming the cumbersome distinguishing of stress-and strain-like tensors, as opposed to the Voigt notation.Moreover, it allows more concise algorithm formulations with higher computational efficiency [69].In this matrix representation notation, the elastic stiffness tensor for crystals with cubic symmetry has a diagonal form.Hence the double contraction in the tensorial version of Hooke's law is reduced to a simple scalar multiplication.Furthermore, it results in an explicit split of the deviatoric and volumetric parts of symmetric second-order tensors, which is advantageous when applied to classical pressure-independent crystal plasticity.Consequently, only the deviatoric part of the constitutive equations enters the plastic corrector of the return mapping algorithm, which reduces by one the system of equations to be solved by the Newton-Raphson iterative method.For the volumetric part, only the elastic predictor, and no plastic corrector, is needed.The natural notation thus enables use of the same algorithm and numerical set-up for cases with only rigid plasticity (e.g.texture calculations), as for cases requiring full elasto-plastic calculations (e.g.CPFEM).

Line-search algorithms
Similar as for continuum plasticity, [71], a limited convergence of the pure Newton-Raphson method is observed with the regularized crystal yield surface, even with low exponents.For the relatively large strain increments, relevant for implicit FEM as tested in Section 6, the algorithm diverged for ∼ 10% of the strain paths with an exponent n = 5, without line-search.For a given exponent n, a certain maximum strain increment, |∆ε| max exists, allowing convergence for all strain paths.Applying the parameters of the two materials in Table 1, it was found that n ∝ 1/ |∆ε| max , both for the fcc aluminum and the bcc iron.To obtain convergence for n = 100, |∆ε| max ≈ 2.5•10 −5 and 10 −4 for fcc aluminum and bcc iron, respectively.When running implicit FEM simulations, the strain increments required for the desired accuracy, would be considerable larger than that.The overall efficiency of an implicit FEM implementation relies on that the strain increments can be sufficiently large, being controlled by global accuracy rather than the stability of the local iterations in the user subroutine, since most of the computer time is spent between the user subroutine calls.
To ensure stable convergence of the return-mapping algorithm presented here, the line-search algorithm plays a crucial role.Its purpose is to find a scaling α of the increment, ∆x, suggested by the Newton algorithm, i.e. x (n+1) = x (n) + α∆x.It makes sure that the scalar merit function, ψ, is always reduced compared to the previous step, i.e. ψ ( . In continuum plasticity, line search has been used in several works, see e.g.[69][70][71][72][73].In this work, two different line-search algorithms were employed and tested: the quadratic line search and line search by a minimization algorithm.The quadratic one returns the α that minimizes a quadratic polynomial, interpolating ψ(α), so that ψ(0), ψ (1) and ψ ′ (0) are exactly matched.As this approximation becomes less and less precise for large values of exponent n, the number of Newton iteration is increased (see Figs. 1 and 2).
The line search by minimization returns, within the prescribed numerical accuracy, the value of α that minimizes ψ.This new minimization algorithm is a tailor-made version of Brent's algorithm, (see Appendix C).The relative and absolute tolerance, ϵ and ϵ a , control the precision of finding the minimum of ψ.In general, tight tolerances require a larger number of line-search iterations but lead to a lower number of Newton iterations.This strongly depends on the exponent n.For large n, i.e. n > 10000, the number of Newton iterations is greatly reduced, when small tolerances are prescribed.For n < 100, the number of Newton iterations is less sensitive to how precisely the minimum of ψ is estimated, and coarser tolerances are more beneficial, reducing the number of line-search iterations to be performed.The relationships ϵ = min (0.3, 1/n) and ϵ a = 10 −2 ϵ are found to work optimally.Using this, the number of Newton iterations remains almost constant for all n > 100, while the number of line-search iterations still increases gradually (see Fig. 1 and Fig. 2).
A Newton iteration involves computing the Jacobian and solving a 6 × 6 linear system and is therefore about 4 times more computationally expensive as a line-search iteration.Hence, the monotonically increasing number of Newton iterations as a function of n makes the quadratic algorithm less competitive.The overall timing reveals that the quadratic line search converges faster than the line search by minimization for n up to about 500.

Approaching the limit of a rate-independent solution
Solutions of the rate-independent crystal plasticity, obeying Schmid's law, suffers from non-uniqueness, i.e. the Taylor ambiguity.Several attempts have been made to obtain a unique solution [37].As discussed by Holmedal [38] an equivalence exist between the rate-sensitive unique solution, using the viscoplastic law (Eq.( 2)) with strain rate sensitivity m, and the unique solution provided by the regularized yield surface defined by Eq. ( 9) with exponent n, i.e.
The proposed return-mapping algorithm enables, for the first time, calculations of extreme solutions a yield surface with exponent n up to a million, corresponding to a strain-rate sensitivity m = 10 −6 .Slip solutions for 10 000 different crystals with a uniformly distribution if their orientations (same as in Section 6) were calculated for each exponent n.For each crystal and each exponent, the relative error of the solutions compared to a limit solution γ lim α can be quantified by The limit γ lim α was calculated using n = 1000 000, which in practice is equivalent to the ambiguity limit within the numerical precision.This was done for an FCC structure with { 111 } ⟨011⟩ slip systems (Fig. 7a), a BCC structure with { 110 } ⟨111⟩ slip systems (Fig. 7b), to which { 121 } ⟨111⟩ slips (Fig. 7c) and { 132 } ⟨111⟩ slips (Fig. 7d) were added.The color map in Fig. 7 shows the distribution of the relative errors for all crystals for each exponent n.Fig. 7 demonstrates the existence of a smallest exponent, for which the solutions still practically are equal to the rate-independent Taylor ambiguity limit, i.e. the same set of slip systems is activated.This is in line with previous findings [37], who showed for an FCC polycrystal, that the texture change is not sensitive to the strain rate sensitivity, m, up to a value less than ∼0.1 (or correspondingly n larger than ∼10).
The results for BCC crystals (Fig. 7b, c and d) show that the smallest exponent giving this limit solution, increases with the number of slip systems.For crystals with 48 slip systems it may be as high as ∼400.This may influence texture calculations.As an example, solutions for n = 50 and 400 were calculated by the Taylor model for a rolling reduction of 90%.The ϕ 2 = 45 • section of the ODF is shown in Fig. 7e.As expected from Fig. 7d, the texture for the limit case, i.e. with n = 1000 000, was identical to the case of n = 400.However, the texture intensity distribution in this section with n = 50 is significantly different.According to Larour, Baumer, Dahmen and Bleck [88], the strain-rate sensitivities for various steel grades at room temperature may vary from m = 0.001 to 0.02 corresponding to n between 50 and 1000.If a rate-independent simulation of steel is desired, the exponent n should be chosen larger than ∼400 (m less than ∼0.0025) when accounting for the 48 slip systems.However, in many cases it is better to account for a realistic strain-rate sensitivity by choosing an appropriate exponent n in the range below 400 (m above 0.0025).Note that the scaling technique to speed up the viscoplastic calculations [36,89,90], would for bcc miss the correct corner solutions in these cases.This illustrates the need for line-search in the more general cases, to obtain an exact solution.

Comparison of computing times with explicit and implicit FE solvers
The model implemented in CPFEM, either as part of an implicit or an explicit solver for handling the stress balance and compatibility between the finite elements, is solved incrementally.For each increment, in the implicit approach, iterations are made to find a solution of the set of non-linear finite element equations.In the explicit approach, the dynamic inertial forces applied to the finite elements allow explicit time discretization without iterations at each time increment.Regardless of the choice of an explicit or implicit FE framework, the crystalplasticity equations are solved for each node, one time-step ahead, as prescribed inside the user subroutine (UMAT).In there, the equations may be solved without iterations in an explicit form, in a fully implicit form or by some type of semi-implicit scheme, in which only some of the variables are solved implicitly.The implicit solution is considerably more expensive, which amounts to the major part of the computational time by explicit FE solvers, whereas for implicit FE solvers the major part of the calculations is related to iterations on the balance between the elements.To compete, the implicit solvers must make significantly larger time steps.For the rate-dependent CPFEM, a thorough comparison of merits of both explicit and implicit CPFEM solvers were reported by Harewood and McHugh [24].They concluded that when the material deformation is the main part of the simulation, the implicit solver is several times faster.However, in problems with complex contact and sliding conditions, the implicit solver becomes significantly slower.Furthermore, they concluded that their implicit solver struggled to converge and required decreasing time steps with lower strain-rate sensitivities.The latter would not be an issue with similar line-search algorithms as reported here.
The limited conditional stability of the explicit time integration forces the explicit FE solver to use very short time increments, leading to large number of increments to be calculated.For quasi-static problems, a careful, proper mass scaling can help increasing the time step.Then the instability locally related to the integration of the equations for each node in the UMAT subroutine, becomes the bottle neck for reducing the time step.Stability can efficiently be gained by sub-stepping in the UMAT, as reported by Zhang, Hopperstad, Holmedal and Dumoulin [22].However, with decreasingly small strain-rate sensitivity, increasingly many sub-steps will be required.Hence, it will be beneficial at some small strain-rate sensitivity, to replace the sub-stepping approach by an implicit scheme with line-search, like the one suggested here for the rate-insensitive case.Further investigations are required but are beyond the scope of this work.
When using an implicit solver, the extra computing time required for the return mapping algorithm to solve the material model constitutive equations, is small compared to the time spent by the FE solver to solve the global finite element equations.The stability of the implicit scheme suggested here allows almost arbitrary large time steps and makes this the fastest alternative for e.g. the calculation of the RVE.The time increment is usually controlled by an automatic incrementation routine in the FE software and is limited by the desired accuracy rather than stability requirements.However, in some cases, contact or sliding conditions might limit the allowed time steps significantly.
In non-linear crystal plasticity analysis, Abaqus/Standard uses some variant of Newton's iterative solution method.For each iteration, it is necessary to solve a set of linear equations, which for 3D problems corresponds to a matrix with dimensions proportional to the number of nodes in the power of two.The direct matrix solver in Abaqus/Standard uses a sparse Gauss elimination method for each solution of these linear matrix problems involved.This is the most time-consuming part of the implicit analysis, especially for large models.Unlike the explicit solver, the storage of this matrix during each iteration requires a lot of computer memory, which is the limiting factor for large models with current computers.However, the available computer memory has increased rapidly during the last decades, providing increasing amounts of internal RAM memory and fast serial buffer memory on solid-state disks, even on regular PCs.

Conclusion
A numerically stable and efficient fully implicit return-mapping algorithm for rate-independent crystal plasticity was obtained by including a line-search algorithm as part of the Newton iterations and utilizing an improved first guess for the iterations.Fast convergence of the algorithm was proved for any realistic strain step and for very high exponents of the regularized yield surface.Full stability was maintained for an exponent of one million, allowing the Schmid limit to be approached.
It was found that with 12 slip systems, either in BCC or FCC, the set of active slip systems corresponding to the ambiguity solution is obtained whenever the yield surface exponent is larger than ∼10.However, for BCC with 48 slip systems, an exponent larger than 1000 will be required.The choice of the exponent is equivalent to prescribing an instant strain-rate sensitivity.To choose the correct exponent for the simulations, or alternatively to run a rate-dependent implementation with the correct strain-rate sensitivity, can therefore be an important issue in BCC texture simulations with 48 slip systems.
A co-rotational hypo-elastic-plastic implementation was made into the user material subroutine of Abaqus/Standard (made available as open source).Efficient computations were demonstrated for uniaxial tension of a polycrystalline representative volume element deformed up to large strains.It is concluded, based on timing of crystal plasticity finite element simulations, that such simulations are significantly faster with the new algorithm in an implicit FE solver than with a state-of-the-art explicit formulation in an explicit FE solver.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Bjorn Holmedal reports financial support was provided by Research Council of Norway.

Appendix A. Natural vector/matrix notation
The six independent components of a symmetric 2nd order stress or strain tensor can be represented by a 6 × 1 vector.Correspondingly, a 4th order symmetric tensor can be expressed as a 3 × 3 matrix.The Voigt stress vector contains a one-to-one list of the stress components, ⃗ σ V = (σ 11 , σ 22 , σ 33 , σ 23 , σ 13 , σ 12 ) T .However, in the corresponding Voigt strain tensor, ⃗ ε V = (ε 11 , ε 22 , ε 33 , 2ε 23 , 2ε 13 , 2ε 12 ) T , the shear strains must be multiplied by a factor two to preserve the inner product of stress and strain.Since the Voigt notation is frequently used in the literature, the natural notation as defined by Eq. ( 23), will be related to the Voigt notation.However, the natural notation has an orthonormal basis, which makes the notation the same for both strain and stress tensors, like the Mandel notation but unlike the Voigt notation.Furthermore, the natural notation has the advantage that it decouples the hydrostatic pressure, and it diagonalizes the stiffness matrix for cubic crystals, as shown below.
The transformations matrices T σ and T ε , transform the stress vector ⃗ σ V and the strain vector ⃗ ε V from the Voigt notation to the vectors ⃗ where ⃗ σ and ⃗ ε are vector representations of stress and strain in the natural notation, respectively.These transformations are related, as T σ = T −T ε .The transformation matrices T σ and T ε read Fourth order elasticity stiffness tensors are linear mappings of a second order strain tensor to a second order stress tensor, which in the contracted notations is a 6 × 6 elastic stiffness matrix, relating the strain vector to the stress vector.From the Voigt notation, they transform into the natural notation as The elastic stiffness matrix, for both isotropic and cubic symmetry, are given in the Voigt notation as where K = λ+ 2 3 µ is the bulk modulus.This is convenient for numerical computation, e.g.computing of the matrix inversion and for a matrix storage.
Assume an orthonormal transformation tensor R in the cartesian orthonormal basis as R = R i j e i ⊗ e j .The coefficients R i j can be arranged in a 3 × 3 matrix R as A 2nd-and 4th order tensor, A and A, respectively, transform as ) and For expressing 2nd order tensors A and Â and 4th order tensors A and Â in the natural notation as ⃗ a and ⃗ â (6 × 1 vectors) and A and Â (6 × 6 matrix), respectively, there exist a 6 × 6 matrix R so that The matrix R and its transpose R T read (DROT) stored as a 3 × 3 matrix, and the deformation gradient tensor with respect to the initial configuration computed both at the beginning and at the end of the increment, F n (DFGRD0) and F n+1 (DFGRD1), respectively, both stored as 3 × 3 matrices.At the end of this increment, i.e. at t n+1 , the Cauchy stress vector ⃗ σ V n+1 (STRESS) needs to be updated and the Jacobian matrix of the constitutive model, i.e. the algorithmic modulus ( ∂∆σ ∂∆ε ) n+1 (DDSDDE) needs to be computed as a 6 × 6 matrix in Voigt notation.
When a user defined material model is used for continuum elements Abaqus/Standard employs Jaumann stress rate (note that Green-Naghdi stress rate is employed for structural elements in Abaqus/Standard and for any type of elements in VUMAT in Abaqus/Explicit).A local orientation is not applied (*ORIENTATION keyword is not used), hence the components of all tensors are given in the reference (global) coordinate system.Note, that with use of a local orientation, Abaqus/Standard provides the components of all tensors in the local corotated coordinate system at time t n+1 rotated from t n to t n+1 by ∆R n+ .To account for the rigid body rotations, Abaqus/Standard applies the rotation ∆R n+ 1 2 of the Cauchy stress before passed to the UMAT as σ n for the next time increment.
However, the crystal constitutive equations and the stress update are calculated in the corotated crystal lattice coordinate system.Hence, the Jaumann rotation of the Cauchy stress by ∆R n+ 1 2 done by Abaqus must be undone, followed by a rotation R n into the crystal lattice system at time t n as

Fig. 1 .
Fig. 1.Average number of Newton iterations for FCC in (a) and BCC in (c).Average number of line-search iterations for each Newton iteration for FCC in (b) and BCC in (d).Cases with and without hardening are compared for iterations by quadratic or full minimization during the line search.

Fig. 2 .
Fig. 2. Line search with quadratic approximation.Distributions of Newton and line-search iterations for the test case of an FCC crystal without hardening (a), (b) and a BCC crystal with hardening (c), (d).

Fig. 3 .
Fig. 3. Line search with new minimizing algorithm.Distributions of Newton and line-search iterations for the test case of an FCC crystal without hardening (a), (b) and a BCC crystal with hardening (c), (d).

Fig. 4 .
Fig. 4. Uniaxial tension of a notched single crystal in a Goss orientation, shown half of the sample.

C 11 C 12
λ is the Lamé's first parameter, µ the shear modulus and C 11 , C 12 , C 22 are cubic elastic constants.In the natural notation, both transform into a diagonal matrixC iso =

Table 1
Constitutive parameters used in the convergence study.

Table 2
Comparison of computing times with explicit and implicit CPFEM solvers for near-faceted (Schmid) crystal plasticity.