An efficient and robust approach to determine material parameters of crystal plasticity constitutive laws from macro-scale stress–strain curves

Abstract A severe obstacle for the routine use of crystal plasticity models is the effort associated with determining their constitutive parameters. Obtaining these parameters usually requires time-consuming micromechanical tests that allow probing of individual grains. In this study, a novel, computationally efficient, and fully automated approach is introduced which allows the identification of constitutive parameters from macroscopic tests. The approach presented here uses the response surface methodology together with a genetic algorithm to determine an optimal set of parameters. It is especially suited for complex models with a large number of parameters. The proposed approach also helps to develop a quantitative and thorough understanding of the relative influence of the different constitutive parameters and their interactions. Such general insights into parameter relations in complex models can be used to improve constitutive laws and reduce redundancy in parameter sets. The merits of the methodology are demonstrated on the examples of a dislocation-density-based crystal plasticity model for bcc steel, a phenomenological crystal plasticity model for fcc copper, and a phenomenological crystal plasticity model incorporating twinning deformation for hcp magnesium. The approach proposed is, however, model-independent and can be also used to identify parameters of, for instance, fatigue, creep and damage models. The method has been implemented into the Dusseldorf Advanced Material Simulation Kit (DAMASK) and is available as free and open-source software. The capability of translating complex material response into a micromechanical digital twin is an essential precondition for the ongoing digitalization of material property prediction, quality control of semi-finished parts, material response in manufacturing and the long-term behavior of products and materials when in service.


Introduction
Crystal plasticity (CP) models cast the extensive knowledge gained from experimental and theoretical studies of single crystal deformation and dislocation physics into anisotropic elastic-plastic mean-field continuum approximations applicable to a wide range of scenarios from small-scale micromechanical up to engineering time and length scales (Roters et al., 2010). These models provide well-validated tools for predicting the evolution of crystallographic texture, dislocation density, stored mechanical energy, damage information about the gradient of the objective function. In these methods, first a set of potential solutions for the optimization problem is randomly generated. Then, the next generation of potential solutions and the possible solutions for the optimization problem are determined only by the evaluation of the objective functions at the pre-existing points. Direct search methods are generally proven to be robust and reliable (Andrade-Campos et al., 2007;Qu et al., 2005). However, the robustness is achieved at the cost of efficiency (Furukawa et al., 2002).
For direct search methods, a substantial number of evaluations of the objective functions to determine the quality of the potential solutions is needed. Therefore, the efficiency of these methods depends directly on the computational costs of the evaluated functions. In the case of fitting constitutive parameters, each evaluation requires to perform at least one if not several simulations, extract the outputs, and compare them with the experimental data. The computational costs of CP simulations are in general too high to repeat this procedure frequently. Therefore, despite the robustness of these optimization methods, they are inefficient in determining the material parameters for complex and computationally expensive CP laws. Direct search methods have been, however, used for numerically less demanding models, c.f. (Agius et al., 2017).
For the routine use of genetic algorithms to identify the parameters of complex material laws, the computational costs of evaluating the constitutive response need to be reduced. Therefore, we introduce here an approach to identify CP constitutive parameters efficiently by approximating the constitutive response with the help of the response surface methodology (RSM) (Moran et al., 2004;Bezerra et al., 2008). More precisely, the stress response at different loading conditions is approximated using relationships defined between the adjustable material parameters and the stress response. This allows to perform more than 10 6 evaluations of the constitutive response in approximately the same time as needed for one CP simulation with a fairly complex dislocation-density-based constitutive model used in this study. It should be noted that in this study a Fast Fourier Transform (FFT) based spectral method is used to solve the partial differential equations for static equilibrium, which is a fast and time efficient alternative to the finite element method (Eisenlohr et al., 2013).
The usage of the RSM comes at the cost of performing a series of simulations using the original CP model to build an off-line database. The computational cost to build the database depends on the number of parameters used in the optimization. However, this number is still significantly lower than the number of simulations needed for a GA. The methodology developed in this study is demonstrated by determining the constitutive parameters of three widely used crystal plasticity models: a phenomenological model (Hutchinson, 1976), a dislocation-density-based model (Ma and Roters, 2004) and a phenomenological model incorporating deformation twinning (Kalidindi, 1998), all implemented in the DAMASK package (Roters et al., 2019).

Fig. 1.
Flowchart of the genetic algorithm: The algorithm starts from an initial, randomly generated population. Then, in an iterative process, the population is evolved toward better solutions. In each generation, the fitness of the chromosomes is evaluated, and the chromosomes with better fitness are selected to produce a new generation using biological processes such as crossover and mutation. The algorithm terminates when the stop criterion is satisfied.

Methodology
In the following, a novel approach is introduced to determine constitutive parameters of (possibly complex) CP laws from macroscopic stress-strain curves using a genetic algorithm (GA, see Section 2.1). For identifying a valid set of material parameters, stress-strain curves obtained under different loading conditions, such as strain rate, temperature, and loading direction, are required. When using a GA, for each loading condition hundreds of thousands of CP simulations are required. Since this is unfeasible, the response surface methodology (RSM, see Section 2.2) is employed to replace the CP simulations required to calculate a stress-strain curve.

Optimization: Genetic algorithm
Genetic algorithms are powerful optimization methods that usually converge to a global minimum of the objective function rather than getting stuck in a local minimum (Goldberg, 1989). They are based on a randomized search technique which is motivated by the principles of natural selection and evolution processes (Goldberg, 1989;Beg and Islam, 2016). The terminology of GAs is therefore related to genetics, i.e.: • A gene is an adjustable parameter of the objective function, i.e. in the current case a constitutive parameter.
• A chromosome is a full set of genes that are required to evaluate the objective function, i.e. in the current case a complete set of adjustable constitutive parameters for a specific CP law. • A population is a set of chromosomes which are processed and compared to each other, i.e. translating in the current case to a collection of parameter sets for the CP law.
A GA is composed of different steps and processes. In the first step, a number of chromosomes are randomly selected to produce the initial population. In the next step, the chromosomes are manipulated to generate a new generation. This step involves the evaluation and selection of chromosomes inspired by biological processes such as crossover and mutation. The details of these processes are outlined in the following, and Fig. 1 summarizes the employed algorithm graphically.

Initial population
The first step in the GA is to define an initial population. The initial population of N p chromosomes is randomly generated, where N p is the size of the initial population. A chromosome, Θ, is made up of a number of genes, Θ i , The length of a chromosome, k, is equal to the number of adjustable (unknown) parameters in the optimization procedure. Each gene in the initial population is selected randomly from numbers spaced evenly in its specified range. The range for each parameter is selected based on known or reasonable physical bounds of the parameter. If the range of a parameter is larger than one order of magnitude, the parameter value is selected from numbers spaced evenly on a logarithmic scale.
It should be mentioned that defining ranges for the optimization is an advantage of the presented approach. All gradient-based optimization algorithms, such as the steepest descent algorithm, require an initial guess to be defined before starting the optimization process. The proper determination of such an initial guess is difficult and vital since the optimization result is highly sensitive to this initial choice. Usually, the optimal solutions are located in very close vicinity of the initial guess. Although these algorithms are theoretically not constrained with ranges, in practice, they only search between two local maxima. In the presented methodology, a range for the parameters needs to be presented instead of the specific initial guess. Typically, selecting a range is more convenient than selecting a strictly defined initial guess. It should be noted that the optimization result is not sensitive to the selected ranges as long as the optimal solution is located inside of these ranges.

Objective functions and fitness
For chromosomes in the population a fitness value is determined which signifies its capability of reproducing the target response, i. e. the set of all measured stress-strain curves. To this end, one or more objective functions are introduced to measure the agreement between target function and the result obtained with a specific chromosome.
2.1.2.1. Objective functions. The first objective function used here is the difference between the experimentally determined stress and the simulated stress at selected strain values. In this study, the commonly used L 2 -norm, also called the Euclidean norm (Harth et al., 2004), is used to measure the distance, d 1 , between both curves under loading condition m: where σ i is the experimentally measured stress at strain level ϵ i , and N is the total number of all strain levels considered. b σ i is the predicted stress response for the set of parameters Θ. The distance function defined in Eq. (1) is used frequently as the sole objective function for optimization, e.g. (Herrera-Solaz et al., 2014;Furukawa and Yagawa, 1997;De-Carvalho et al., 2011). However, this objective function captures only the absolute distance between two curves and fails in capturing the sign of deviation. In other words, a positive or negative distance between the simulation result and the experimental data cannot be distinguished using Eq. (1). Therefore, response stress-strain curves which are lower than, higher than, or crossing the experimental data may result in the same value for the error. To penalize a difference in the slope of the curves, a second type of objective function, d 2 , is defined which measures the difference of the average slope between the response and experimental stress.
where indexes 0 and N represent the yield strain and the maximum evaluated strain, respectively. This objective function can give zero error for a stress-strain curve parallel to the experimental data and becomes non-zero for a crossing or non-parallel stress-strain curve. This value is added to the fitness value as a penalty for the deviation from the experimentally measured average hardening. Both objective functions mentioned above cannot capture explicitly the temperature and strain rate sensitivity. In other words, they fail to capture the directional sensitivity of the stress-strain curves to a change in temperature or strain rate. Therefore, two additional types of objective functions are defined in this study, one evaluating the strain rate sensitivity: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi and one evaluating the temperature sensitivity: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi where ð_ ϵ k ; _ ϵ l Þ and (T k , T l ) represent different combinations of strain rates and temperatures, respectively.
Finally the fitness of a chromosome is calculated by combining the four types of objective functions as: where w j are weights for each type of objective functions.

Genetic algorithm processes
2.1.3.1. Selection. A chromosome with a better fitness value has a higher chance of being a parent which mates and recombines to create offspring for the next generation. In this study, the rank-based wheel approach (Ahn and Ramakrishna, 2002) is used for selecting pairs of parents. According to this method, the chromosomes are first ranked based on their fitness values. Then, each chromosome gets a weight solely based on its relative ranking in the population, N p (number of chromosomes in the population).

Crossover.
Crossover is an operation in which a pair of chromosomes swap their genes to generate a new pair of offspring. Several approaches for crossover have been proposed (Beg and Islam, 2016;Umbarkar and Sheth, 2015). For this study, the single-point and reduced surrogate methods (Umbarkar and Sheth, 2015) are considered. In the single-point crossover, the chromosomes of both parents are divided into two parts. The division is applied at the same position for both parents which is selected randomly under the condition that each part contains at least one gene. The genes up to the division point for each parent are then combined with the remaining genes of the other parent to generate a pair of offspring. In the reduced surrogate crossover, single-point crossover is used for recombination of the genes from parents. However, in this method, the division is limited to those positions which result in offsprings with different genes. If the parents are similar, and there are no possible crossover points, no action is taken. This crossover method reduces unwanted recombination in case parents have similar genes, and it prevents generating duplicate offspring. The effect of the crossover method on the robustness and efficiency of the GA is discussed in Section 3.2.8.

Mutation.
Mutation randomly changes one or more genes in a chromosome with a probability equal to the mutation rate (Soni and Kumar, 2014). Mutation is used to maintain diversity during evolution of the population. Diversity helps to avoid local minima by preventing the generations from becoming very similar. High mutation rates render a GA into a random search algorithm. In this study, the mutation only changes one gene. In addition, the mutation rate is 0.01, which means the mutation changes approximately 1 out of 100 chromosomes.
2.1.3.4. Elitist selection. By means of elitist selection a small proportion of the best chromosomes from the current generation is preserved and passed without any changes to the next generation. Elitist selection increases the performance as it ensures that the achievement is not lost from one generation to the next. The elite chromosomes are not allowed to be crossed over or subjected to mutation. However, the elite chromosomes are eligible for selection as parents.
For this purpose, the population is first ranked according to the fitness value calculated using Eq. (6). Then, the best chromosomes are passed to the next generation. It is expected that the effect of the parameters is different for the different types of objective functions defined in Eq. (1) to Eq. (4). This implies that a chromosome with a high fitness value for one type of objective functions may contain a few high-quality genes. Therefore, the elitist selection is performed independently for each type of objective function using Eq. (5). In this study, the elitist operation preserves 1% of the chromosomes for each type of objective functions.

New generation
Finally, a new generation consists of chromosomes obtained by the following processes: • Elite chromosomes (�5%) • Offspring with mutation (�1%) • Offspring without mutation (rest)

Stop criterion
The genetic algorithm stops when one of the following conditions is met: • A large number of the chromosomes in a generation become equal (70% of the population in this study).
• There has been no improvement in the best solution for a number of generations (50 generations in this study).
• The maximal number of generations is reached (500 generations in this study).
Solutions are accepted only if one of the first two criteria are responsible for the termination of the GA. Terminating the GA by the third stop criterion shows that the optimization process has been unsuccessful, and the result from the optimization procedure is not acceptable.

Approximate evaluation function: Response surface methodology
A straightforward and simple way of estimating the fitness of a chromosome is to perform a simulation and compare the outputs with experimental results with the help of above defined objective functions. However, a GA involves a substantial number of evaluations. This makes the use of GAs time-consuming, if not even impossible when the evaluation function, i.e. here the CP simulation, is computationally expensive. Here, a cost-effective approximate evaluation function based on the response surface methodology (RSM) is used to evaluate the fitness of the chromosomes.
The response surface methodology is a statistical technique that gives the relationship between explanatory variables with one or more response variables (Moran et al., 2004;Bezerra et al., 2008). Based on this approach, a polynomial is fitted to approximate the response variables. This method is especially useful when the effects of an explanatory variable are dependent on the level of the other explanatory variables. In this study, the response variable is the stress and the explanatory variables are the adjustable constitutive parameters. Applied strain, temperature, and strain rate are independent (conditional) input variables.
The response surface methodology requires a series of designed experiments in the domain being studied to estimate the response. A factorial experiment or a fractional factorial design can be used to estimate a first-degree polynomial model (Moran et al., 2004;Bezerra et al., 2008). To build up a second-order (quadratic) polynomial model, which considers curvature in the response, more complex experimental designs such as three-level factorial, Box-Behnken, central composite, or Doehlert designs are needed (Moran et al., 2004;Bezerra et al., 2008). Design of experiments (DOE) helps to select a set of experiments (i.e. CP simulations) to efficiently fit the response surface.

Design of experiments
Design of experiments (DOE) is an efficient and systematic procedure of planning and conducting a set of experiments to analyze and interpret a group of explanatory (input) variables (factors) that controls a desired response (output variable). The term experiment 7 refers to this systematic procedure of conducting the experiments. This method allows to determine the effect and interaction effect of different explanatory variables on the response variable, and what the target level of these explanatory variables should be to achieve a desired response. DOE involves selecting the variables, a range for each variable, and a design strategy.

Factorial design.
There are different approaches available for a multi-factor DOE. One approach is called a full factorial experiment, in which all possible combinations of the variables are considered for all variable levels.
2.2.1.2. Two-level factorial design. In a two-level factorial experimental design, each variable has two levels. The variables are normalized as: where θ i is the normalized value for variable (i.e. gene) Θ i . Θ m i and Θ s i are respectively the center point and the span of the range for Θ i . Hence, À 1 and 1 correspond to the lower and upper limits of an explanatory variable, respectively.
For a two-level full factorial design, the number of runs is 2 k , where k is the number of variables. Here, k corresponds to the length of the chromosomes in the GA optimization procedure. Fractional factorial design can be used to prevent the number of runs from becoming too large. The number of runs for a fractional factorial design is 2 kÀ k 0 , where k 0 describes the size of the fraction of the full factorial used. Since the two-level factorial design is efficient and economical, it has been extensively used. However, it is only possible to build up a first-degree polynomial model when using a two-level factorial design.

Central composite design.
In order to include the curvature of the response surface, at least a three-level factorial design is needed. However, a three-level full factorial design requires 3 k experiments, making it unfeasible even for a moderate number of variables. Therefore, the three-level factorial design has been rarely used. Alternatively, the central composite design (CCD) (Box and Wilson, 1951), also known as Box-Wilson Central Composite Design, can be used to include curvature without needing to use the full three-level factorial design. A CCD consists of a two-level factorial or fractional factorial design with center points and a group of star points. The star points are defined at a distance of η from the center point, and they allow considering curvature in the response function. In this study, the face-centered (CCF) composite design is used in which the star points are at the center of each face of the factorial space, i.e. η ¼ 1. Using this method the physical lower and upper limits are maintained. The number of the runs required for a central composite design is k 2 þ 2k þ c p , where c p is the number of center points.

Polynomial approximation
A polynomial is fitted to the values obtained at the support points to describe the response variable, here stress, as a function of the explanatory variables, here the adjustable material parameters. In this study, a second-order polynomial including up to four-way interaction effects is used: where θ i and b σ are the explanatory variables and response variables, respectively. e represents the error, the difference between the observed results and the predicted values. β 0 is the constant effect. β i is the main effect of parameter θ i which determines the significance of the variable. β ij , β ijl and β ijlm are two-way, three-way, and four-way interaction effects. β ii are curvilinear (or quadratic) effects which represent the curvature in the response. The main effects and the interaction effects can be estimated from a two-level design. β ii can be calculated only when a three-(or higher) level design such as CCD is used. Eq. (8) can be rewritten in matrix notation as: where Y b σ is the response matrix, X θ is the full experimental design matrix, β is the full parameter matrix, and E is the error matrix. β can be approximated using the method of least squares (Bezerra et al., 2008), which is a multiple regression technique to minimize the residual: It should be noted that the predicted polynomial is not a physical model, it is only a statistical approximation developed using regression analysis to predict the physical response.
The response stress b σ is impacted by both explanatory variables and independent variables. The explanatory variables, θ, are the unknown constitutive material parameters which are needed to be identified. The independent variables are those variables at which a CP model should be able to predict the deformation behavior correctly. In other words, for a properly identified set of parameters Θ, a CP model should predict the deformation behavior correctly for all loading conditions considered. Then, the response stress for different loading conditions can be approximated as: where b σ i ðΘ; T; _ ϵÞ is the approximate response stress for explanatory variables Θ at a discrete strain point i, temperature T, and strain rate _ ϵ. β i ðT; _ ϵÞ is the effects vector predicted at discrete strain point i, temperature T, and strain rate _ ϵ. θ is the full normalized experimental design vector. In this study, the independent variables are the applied strains, temperatures, and strain rates.

Approximate evaluation function correction
The polynomial used to predict the response stress is an approximation for the CP simulation, and it includes some statistical error. To compensate for this error and to improve the results, CP simulations are performed at the best solution determined through the use of GA and RSM. In the next step, when using the CP simulations, the constant effect in the off-line database is corrected such that the RSM matches the simulation result for this set of parameters. The updated database is then used to obtain the next best solution. This process is continued until the results are converged and no more improvement is observed.

Sensitivity analysis
As mentioned earlier, the number of experimental runs grows exponentially with an increase in the number of variables. A CP model, especially a physics-based one, has a large number of variables. However, depending on the domain of interest (the applied strain, deformation temperature, and strain rate), some parameters may have a negligible effect on the response stress. Therefore, considering these ineffective parameters in the RSM is not efficient and practical. One way to increase the time efficiency of the methodology is the identification of these ineffective parameters using a sensitivity study. A sensitivity study is a technique to efficiently investigate how an explanatory variable impacts the response variable in the domain of interest. For conducting such a study, a low-resolution fractional factorial design is used, for more information see Section 2.2.1. The statistical significance of each parameter can be evaluated using the analysis of variance (ANOVA) (Doncaster and Davey, 2007). The results from the sensitivity study serve as a tool to select proper constitutive parameters in the optimization procedure. Here it is used as an optional step before the actual optimization procedure.

Summary
The methodology presented in this section is summarized in the flowchart presented in Fig. 2.

Results and discussion
In this section, the application of the methodology for a phenomenological CP model, see Section 3.1, a dislocation-density-based CP model, see Section 3.2, and a phenomenological CP model incorporating deformation twinning, see Section 3.3, is discussed. The   Fig. 2. Flowchart of the optimization procedure. spectral solver implemented in DAMASK (Eisenlohr et al., 2013;Roters et al., 2019;Shanthraj et al., 2015) is used to conduct the CP simulations using these three constitutive laws. The representative volume element (RVE) is made out of 512 grains. One grid point is used per grain which results in a grid of 8 � 8 � 8 points. The RVE is subjected to periodic boundary conditions.
The robustness of the optimization approach is demonstrated using three sets of reference stress-strain curves computed with the employed CP models. The performance of the optimization methodology is accessed by comparing the optimization outputs with the original values of the parameters. In addition, two other main advantages of using such synthetic reference data are: Firstly, the reproduced stress-strain curves can be considered as a perfect set of reference data. Therefore, it can be best used to investigate the capabilities of the methodology without disturbance from experimental inaccuracies. Secondly, this data set follows the physics of the CP model. Therefore, the shortcomings and physical limitations of the CP model do not deteriorate the performance of the optimization methodology.

Phenomenological crystal plasticity model
The phenomenological model used in this study was first introduced by Hutchinson for fcc crystals (Hutchinson, 1976). This model uses a critical resolved shear stress, τ 0 , as the state variable on each slip system α. This CP model is modified in DAMASK to be used for other crystal structures (Roters et al., 2010(Roters et al., , 2019. The kinematics for elasto-plastic behavior is defined within the finite deformation framework.

Constitutive law
In the phenomenological model, shear on each slip system occurs according to where τ 0 is the slip resistance, _ γ 0 is the reference shear rate, and n determines the strain rate sensitivity of slip. The influence of any set of slip system, α 0 , on the hardening behavior of a slip system α is given by: where h αα 0 is the hardening matrix. It empirically captures the micromechanical interaction among different slip systems: where h 0 , a, and τ ∞ are slip hardening parameters, which are assumed to be the same for all 12 slip systems of the {111}〈110〉 type in fcc crystals. q αα 0 is a measure for latent hardening, and its value is taken as 1.0 for coplanar slip systems α and α 0 , and 1.4 otherwise.
This renders the hardening behavior anisotropic.

Numerical inputs
3.1.2.1. Reference data sets. The material parameters listed in Table 1 are used to calculate reference stress-strain curves using the phenomenological constitutive model for a strain rate range from 10 À 5 s À 1 to 10 3 s À 1 . These material parameters are according to those reported for fcc copper (Roters et al., 2019). In this example, a random orientation distribution was used. Periodic boundary conditions were applied to this RVE which was loaded in uniaxial compression up to 40% strain. Table 2 lists the ranges selected for the adjustable parameters of the phenomenological model. These ranges are used to build the off-line databases. The ranges of the parameters are selected in a way to be according to the values reported in the literature for fcc copper and to include the original set of data.

Parameter effects and their interaction
The fitting methodology presented here does not only provide a set of parameters in a "black box" style, but it also enables to quantify the effects and interactions of the individual parameters. Such insights help to develop a quantitative and thorough understanding of the role of the underlying single crystal parameters in predicting the deformation behavior of a polycrystalline aggregate. This knowledge can be used to explore the shortcomings and possibly redundant parameters of a constitutive law towards improving existing or developing new constitutive laws.
This subsection provides a discussion on the effect of each parameter and interaction effect among different parameters in a polycrystalline aggregate. A main effect shows how the response variable is affected with a change in one of the independent variables, ignoring the effects of all other independent variables. An interaction effect implies how the effect of one independent variable may depend on the level of the other independent variables. It should be noted that the magnitude of the effects depends on the range selected for the parameters, a larger range for a variable results in a larger effect. The magnitudes of the effects are calculated using Eq. (10) and Eq. (11). In this paper, only effects with an absolute magnitude larger than 30% of the absolute magnitude of the largest effect will be shown. However, it should be noted that based on the output from ANOVA, there are many more significant effects than those shown in the figures. Fig. 3(a) shows the main effects and the interaction effects for the stress at the yield point. The only significant parameters at the yield point are slip resistance, τ 0 , and strain rate sensitivity parameter, n, with positive and negative effects, respectively. A positive main effect indicates that an increase in the variable increases the stress at the considered plastic strain level, whereas a negative main effect indicates that an increase in the variable decreases the resulting stress. For example, the main effect for τ 0 has a value of 58 for the case of _ ϵ p ¼ 10 s À 1 and ϵ p ¼ 0. This reveals that a change from 15 MPa (À 1 level) to 40 MPa (þ1 level), when all other parameters are at base level 0, results in approximately 2 � 58 MPa increase in the yield stress. Fig. 3 (b) shows the dominating main effects and the interaction effects for the stress at a plastic strain of ϵ p ¼ 0.36. At this plastic strain, the effect of other parameters becomes significant. The main effect of the strain rate sensitivity parameter, n, and the slip hardening parameter, a, is negative, while the main effect of slip resistance, τ 0 , and saturation stress, τ ∞ , is positive. The main effect of the slip hardening parameter, h 0 , is positive but much smaller than the main effect for other parameters.
The interaction between some of the parameters is significant. For example, the interaction effect between a and τ ∞ is significant and negative. A negative two-way interaction means the main effect of one variable decreases as the level of the other variable increases. Therefore, the main effect of τ ∞ is lower at a higher level of a. A positive two-way interaction effect between two variables shows the opposite, the main effect of one variable increases if the level of other variable increases. In addition, the noticeable quadratic effect for n and a is an evidence for curvature in the stress response.
The effect of all parameters increases with an increase in strain rate. However, the increase is noticeably larger for the strain rate sensitivity parameter n in comparison to the other parameters. This confirms that the rate sensitivity is mainly controlled by n. Fig. 4(a) shows the effects plot for the stress at different applied strain levels. It can be seen that the magnitude of the effect for different parameters is highly dependent on the applied strain. For all parameters, the magnitude of the effect is larger at higher applied strains except for τ 0 , for which the magnitude of the main effect decreases with an increase in applied strain. Fig. 4(b) shows the effects plot for the average strain hardening rate. Obviously, the main effect for all parameters is significant. The main effect of τ 0 , n and a is negative, while the main effect of τ ∞ is positive. Again, the effects are stronger at higher strain rates. Fig. 5 shows the results from 20 independent optimization runs. The error analysis shows that the maximum error in the predicted stress-strain curves for all 20 optimization runs is around 0.5 MPa for the load case of _ ϵ ¼ 1000 s À 1 at ϵ p ¼ 0.4. Therefore, all the solutions can be considered as a good solution to predict the deformation behavior.

Determined parameter set
The optimized values for τ 0 and n are exactly the same as the original values up to two significant digits. This reveals that these parameters have a unique contribution to the CP model, and the optimization methodology is able to capture this uniqueness.
The optimized solutions for the other parameters are located in a distribution around the original values. The optimization results for τ ∞ are distributed between 57.4 MPa and 63.1 MPa. This is around 19% of the optimization range, which is in general acceptable.
Therefore, this parameter can be predicted properly, however, with some deviation from the original value. The main reason for this error can be attributed to the boundary conditions used in the optimization. τ ∞ determines the steady-state stress at high applied strains. However, the reference data used for the optimization are limited to a maximum strain of 0.4. Therefore, no unique solution can be achieved for τ ∞ .   Table 1.

Fig. 4. (a)
The effects plot for the stress at different applied strains for _ ϵ ¼ 1 s À 1 , (b) the effects plot for the average strain hardening rate. The magnitudes of the effects are calculated using Eq. (10) and Eq. (11), and they are in MPa. The material parameters are defined in Table 1. The optimization results for a are distributed between 1.77 and 2.28, which is around 26% of the optimization range. This distribution is relatively wide but around the original reference data. This parameter mainly affects the hardening behavior. As it was observed, there is a strong interaction between a and τ ∞ and both parameters can compensate each other to a high degree. Therefore, with no unique solution for τ ∞ , it is not possible to reach a unique solution for a.
The optimization results for h 0 are distributed between 73.6 MPa and 87.6 MPa, which is around 35% of the optimization range. This distribution is relatively wide but again it is essentially accumulated around the original reference data. The main effect for h 0 is considerably smaller than those of the other parameters. Therefore, the scatter in the optimized values for this parameter has a negligible effect on the stress-strain curves. In general, it is difficult to reach a unique solution for parameters with a small effect.

Dislocation-density-based crystal plasticity model
In this subsection, the novel methodology is applied to determine parameters for a dislocation-density-based model first introduced by Ma et al. (Ma and Roters, 2004;Ma et al., 2006a). The kinematics for elasto-plastic behavior is defined within the finite deformation framework, and the model is implemented in DAMASK, for more detail see (Roters et al., 2010(Roters et al., , 2019.

Constitutive law
The shear rate, _ γ α , is related to the average velocity of mobile dislocations of the slip system, v α , by the Orowan equation: where ρ α is the mobile dislocation density of the slip system α, and b is the magnitude of the Burgers vector.
The average velocity can be determined as: where l is the average distance between the short-range barriers, t w is the waiting time to overcome a barrier, and t r is the running time for a dislocation to move from one barrier to the next one. The running velocity can be estimated as (Cereceda et al., 2016;Wang et al., 2007): where B is the drag coefficient and τ *α T is the thermal stress as defined in Eq. (19). The velocity v α b is described by an activated glide model first proposed by Kocks et al. (1975) where ΔF is the total short-range barrier energy, i.e. the activation energy for glide in the absence of any applied stresses. v 0 is the dislocation glide velocity pre-factor. p and q determine the shape of the short-range barrier. Typically, ranges of 0 < p � 1 and 1 � q � 2 are proposed (Kocks et al., 1975). τ *α T is the thermal stress and it is calculated as: τ α is the total resolved shear stress on the slip system α, and τ α μ is the athermal component of the resolved shear stress which is defined as: where μ is the shear modulus, ρ α d is the dislocation dipole density of the slip system α, and ξ αα 0 is interaction matrix between the different slip systems α and α 0 . τ * 0 is the barrier's strength, i.e. the stress required to overcome short-range barriers without thermal assistance: The evolution rates of the mobile dislocation density and the dislocation dipole density are given respectively as: and _ ρ α The first term in Eq. (22) is the dislocation multiplication rate. The multiplication rate is controlled by the dislocation mean free path Λ α , which is given as: where d g is the average grain size, and where C λ is a coefficient determining the number of dislocations that a dislocation passes before it is trapped by forest dislocations. g αα 0 are projections for the forest dislocation density as introduced in (Ma and Roters, 2004). The second term in Eq. (22) represents the decrease in the mobile dislocation density due to annihilation and dipole formation. Dislocation annihilation happens if, during a time increment dt, there is a mobile dislocation with opposite sign within the area 2d α anni v α dt. d α anni is a critical value for two mobile dislocations with opposite sign to annihilate. It is preferable to express d α anni as multiples the Burgers vector b: where C anni is the dislocation annihilation coefficient. The annihilation term in Eq. (22) is estimated based on the assumption of the same density of positive and negative dislocations.
The mobile dislocation density also decreases due to dipole formation. A dipole is formed when two mobile dislocations have a distance larger than d α anni but smaller than the critical distance for dipole formation, d α dipole , see also the first and second terms in Eq. (23). The critical distance for dipole formation is given by The third term in Eq. (23) represents the decrease in the dislocation dipole density due to the thermal annihilation of edge dislocations by climb. Thermal annihilation plays a dominant role at high temperatures (Nix et al., 1985;Zhang et al., 2013). The dislocation climb velocity, v climb , is given as: where D 0 is the self-diffusion coefficient, Ω is the activation volume for climb, and Q c is the activation energy for climb. Tables 3 and 4 are used together with temperature-dependent elastic constants from (Dever, 1972;Adams et al., 2006) to compute stress-strain curves for a temperature range from 30K to 900K and a strain rate range from 0.01 s À 1 -10 s À 1 . As the model is strain rate and temperature sensitive, the respective objective functions are used. These material parameters are according to the parameters identified from experimental data for IF steel. The orientation Table 3 Material parameters used to produce reference stress-strain data for the dislocation-density-based CP model. distribution was chosen to match the weak texture commonly observed for hot-rolled IF steel as shown in Fig. 6. Periodic boundary conditions were applied to this RVE which was loaded in uniaxial compression up to 40% strain. Table 5 list the ranges selected for the adjustable parameters for the dislocationdensity-based model. This ranges are used to build the off-line databases. The ranges of the parameters are selected relatively wide and include the original set of data. It should be noted that the optimization result is not sensitive to the selected ranges as long as the RSM can statistically describe the CP model.

Recovery parameters.
The parameters defining recovery due to dislocation climb, see Eq. (28), have a significant effect on the strain hardening behavior. However, the recovery term in the investigated constitutive law is not able to describe recovery over the range of temperatures considered in this study. More precisely, the model predicts a very rapid increase of climb velocity even for a small increase in the temperature and, hence, the formulation becomes unstable after a temperature rise of around 200K-300K. Due to resulting convergence issues, it was not possible to identify values for the recovery parameters in a systematic way for a large temperature range. Therefore, the climb based dynamic recovery is neglected in this study. Although the effect of recovery may be negligible at low temperatures, it has a considerable effect on the strain hardening behavior at high temperatures. While a modified description of recovery is currently under development, this topic is out of the scope of the current work, and it will be presented in a separate publication.

Sensitivity analysis
A sensitivity study was performed to determine the statistically significant material parameters for the ranges presented in Table 5. The significance of each parameter was evaluated using the ANOVA test. It was observed that, despite the relatively large ranges selected for d g and B, their effects are not statistically significant for any loading condition. Therefore, they have been removed from the optimization process, and their values have been fixed at 50 μm and 10 À 4 Pa s, respectively, to save computational time.

Parameter effects and their interaction
3.2.4.1. Yield stress. Fig. 7 shows the effects plot of the different parameters at the yield point for the resulting stress. It can be seen that the main effect for p, τ * 0 , ΔF, and ρ α 0 is positive. Therefore, the resulting yield stress increases with an increase in these variables. The main effect for q and v 0 is negative, and an increase in the magnitude of these variables decreases the resulting yield stress.
Obviously, there are significant interactions between some of the parameters. For example, the interaction between p and τ * 0 is noticeably strong and positive. Therefore, the main effect of p is higher at a higher level of τ * 0 . The negative two-way interaction effect between p and v 0 shows the opposite, the main effect of p decreases if the level of v 0 increases.
The ANOVA also shows that some of the three-way and four-way interaction effects are statistically significant. This is especially the case at high temperatures and for interaction effects between p, q, τ * 0 , ΔF, and v 0 . Including the high-order interaction terms in the polynomial approximation therefore results in a better prediction of the response stress. Hence, in this study, up to four-way Table 4 Coefficients of interaction between different slip systems for iron from (Madec and Kubin, 2017 Fig. 7. The main effects and interactions effects for the stress at the yield point (a) for _ ϵ ¼ 1:0s À 1 at varying temperatures, (b) for T ¼ 373 K at varying strain rates. The magnitudes of the effects are calculated using Eq. (10) and Eq. (11), and they are in MPa. The material parameters are defined in Table 3.  Table 3. interaction effects are used, see Eq. (8). Fig. 8(a) shows the influence of the dominant parameters on the temperature sensitivity, ðb σ T1 À b σ T2 Þj _ ϵp¼const , at the yield point. The results show that the temperature sensitivity is mainly controlled by the parameters which define the barrier, i.e. τ * 0 , ΔF, p, and q. The main effect for p, τ * 0 , and ΔF is positive, while the main effect for q is negative. There are also significant interactions between these parameters. Fig. 8(b) shows the influence of the dominant parameters on the strain rate sensitivity, ðb σ _ ϵp 1 À b σ _ ϵp 2 Þj T¼const , at the yield point. Similar to the temperature sensitivity, the strain rate sensitivity is mainly controlled by the barrier's parameters. However, the influence of v 0 is more pronounced than for the temperature dependency. The main effect for p and τ * 0 is positive, while the main effect for q and v 0 is negative. As for the temperature sensitivity, there are some significant two-way interaction effects.

3.2.4.4.
Hardening-rate sensitivity. Fig. 9 shows the effects plot for the average strain hardening rate. The most significant parameters are C anni and C λ . The main effect for both parameters are negative while their interaction effect is positive. Therefore, the effect of each of these two parameters is stronger when the other parameter is at its lowest level. The noticeable quadratic effect for C λ is an evidence for curvature in the stress response. It can be seen that the magnitude of the main effect is strongly dependent on the applied strain. Moreover, the effect of C anni is stronger than C λ at higher strains. Therefore, a higher value for C anni results in an earlier steady-state condition in the stress-strain curve. This result shows that, despite the strong interaction between these two parameters, there is a relevant difference between their roles.

Grouping the parameters
It is important to notice that there is no significant interaction between C anni and C λ with the other parameters, i.e. p, τ * 0 , ΔF, ρ α 0 , q, and v 0 . This indicates the possibility of grouping the parameters into two blocks in the RSM study. The main advantage of grouping the parameters is the substantial decrease in the number of simulations required to build the off-line database. This reduction is especially significant when a model requires a large number of parameters and when these parameters can be split into independent groups. For the considered case of the dislocation-density-based model, 2 8 ¼ 256 simulations are needed per loading conditions to build a fullfractional database. By grouping the parameters in two blocks of six and two parameters, only 2 6 þ 2 2 ¼ 68 simulations are needed. In addition, grouping results in a lower error in the response estimated by the RSM. Therefore, a better convergence for the optimization procedure can be achieved.
A closer look on the physical effect of the parameters reveals that the yield stress is determined mainly by p, τ * 0 , ΔF, ρ α 0 , q, and v 0 , while C anni and C λ are dominating the hardening behavior. Therefore, the two groups of the parameters can be determined independently from each other. For example, using the experimental data for yield strength and in the absence of hardening data, one can still accurately identify p, τ * 0 , ΔF, ρ α 0 , q, and v 0 . This is of practical relevance since for many materials the yield strength is well characterized at different deformation temperatures and strain rates.
The results in the following are obtained by separating the parameters into two groups based on their interactions and their effects.
The first group includes p, τ * 0 , ΔF, ρ α , q, and v 0 , and the second group includes C anni and C λ . Fig. 10 shows the results from 20 independent optimization runs, and the complete set of optimized parameters for five Fig. 9. The main effects and interactions effects for stress at varying applied strains. The magnitudes of the effects are calculated using Eq. (10) and Eq. (11), and they are in MPa. The material parameters are defined in Table 3.

Determined parameter set
optimization runs is listed in Table 6. Except for one optimization (run 11, see the outliers in Fig. 10), the optimization methodology recovers the original parameters with good accuracy. Even for run 11 the relative error in yield stress is below 1%. The optimized values for ρ α , C anni , and C λ are exactly the same as the original values up to two significant figures. This reveals that these parameters have a unique contribution to the CP model, and the optimization methodology is, therefore, able to recover the original values.
For v 0 , the optimization outputs are located in a relatively narrow distribution around the original value between 2.29 � 10 4 ms À 1 to 7.94� 10 4 ms À 1 , which is around 5.7% of the optimization range. The main effect for v 0 is relatively small in comparison to the other parameters. Therefore, it is in general difficult to reach the original value for this parameter, and the obtained values scatter around the original value.
For p, q, ΔF, and τ * 0 the optimization outputs are located in a wider distribution around the original values. For p the optimization results are distributed between 0.32 and 0.36, which is around 9.6% of the optimization range. The distribution for q is between 1.43 and 1.58, which is around 25% of the optimization range. The optimization results for ΔF are distributed between 1.73 � 10 À 19 J to 1.92 � 10 À 19 J, which is around 19% of the optimization range. The distribution for τ * 0 is between 388 MPa and 412 MPa, which is around 9.6% of the optimization range.
p, q, ΔF, and τ * 0 have a strong interaction with each other, and the stress-strain curves from the CP model are underdetermined for a Fig. 10. The box plot of the optimized solutions from 20 optimization runs for the dislocation-density-based model. Table 6 The complete set of optimized parameters for five different optimization runs. optimization run  large range of temperatures and strain rates. As a result, the difference between the error from different sets of solutions is very subtle and reaching the exact original set of values for these parameters is not possible. This topic will be discussed in more detail in a subsequent paper which is focused on the numerics of the dislocation-density-based model. Fig. 11 shows the error in the yield stress prediction for 20 optimization runs. It can be clearly seen that the error for the different sets of the optimized solutions is considerably small, even for optimization run 11. The maximum error mostly occurs at the lowest temperature loading condition, i.e. T ¼ 30K, where the stress magnitude is in the order of 1 GPa. Here, the maximum relative error is approximately 1% for the optimization run 11.

Robustness
Here we used simulated data as reference, however, the experimental data usually used as input contains non-systematic errors. These errors may affect the robustness and performance of an optimization algorithm in reaching a solution. To investigate the effects of imperfection on the robustness of the optimization methodology introduced in this study, an imperfect set of reference data was created by introducing a random error between � 2.5% of the yield stress into the reference stress-strain curves.
Although the error in an experimental set of data can exceed �2.5%, these errors are normally systematic, e.g. creating a shift in the experimental data. Such a shift in the reference data set does not affect the robustness of the methodology. However, it results in a different set of optimized parameters compared to the original reference data set. Unlike the systematic errors, random non-systematic errors can deteriorate the performance of an optimization methodology. Therefore, in this study, a random non-systematic error of �2.5% has been introduced into the perfect synthetic reference data set. The value of �2.5% is selected to keep the correct trend in the strain rate and temperature sensitivity, e.g. to avoid a lower yield point at higher strain rates. As in the case of experimental results, the employed model is not able to reproduce these deteriorated curves exactly. Fig. 12 shows the optimization outputs for 20 independent optimization runs. It can be seen that the random error does not affect the values of the obtained parameters noticeably. The converged values for ρ α 0 , C λ , and d α anni are almost the same as the original values with less than 1% deviation from the original data. The optimization results for v 0 are distributed between 1.2 � 10 4 ms À 1 to 4.6 � 10 4 ms À 1 , which is around 1% of the optimization range. The distribution for ΔF is between 1.73 � 10 À 19 J to 1.84 � 10 À 19 J, which is around 11% of the optimization range. For τ * 0 the distribution is between 385 MPa and 411 MPa, which is around 10.4% of the optimization range; For p it is between 0.32 and 0.35, which is around 6.7% of the optimization range. The distribution for q is between 1.42 and 1.54, which is around 20% of the optimization range.
Interestingly, the distribution for v 0 , p, q, ΔF, and τ * 0 is narrower for the case of imperfect reference data than for the case of perfect reference data. This can be explained as follows: For the case of the perfect set of reference data, the ideal solution results in a fitness value of zero. A small deviation from the ideal case leads to a small error in the reproduced stress-strain curves. According to the equations presented in Section 2.1.2, the individual objective functions are calculated by the sum of the square of the errors. Consequently, the fitness values is less affected by small errors. While for the case of the imperfect set of reference data, the error for an imaginary best solution is already large. Therefore, a deviation from the best solution results in a relatively larger error and the convergence to the optimal solution is easier.

Performance analysis for the genetic algorithm
As mentioned in Section 2.1, a genetic algorithm consists of several steps and operations. In this section, the effect of two important features of the GA, namely, population size and type of crossover, on the robustness and efficiency of the GA are studied using the perfect reference data introduced above in Section 3.2.2. Two types of crossover, namely single-point crossover (SPC) and reduced surrogate crossover (SRC), are investigated for different population sizes ranging from 200 to 8000 chromosomes. For each case study, 25 GA runs are performed, and the resulting data is statistically analyzed.
3.2.8.1. Efficiency of the GA. Fig. 13(a) shows the number of generations required for the GA to converge to a solution. It can be seen that the number of generations needed for the GA to converge is dependent on the type of the crossover used. The number of Fig. 12. The optimized solution from 20 optimization runs for the dislocation-density-based model in the case of imperfect set of reference data. generations to convergence is higher for the reduced surrogate crossover than for the single-point crossover. This is because the reduced surrogate crossover avoids producing parent clones. Therefore, for this crossover technique, the second stop criterion, i.e. no improvement in the best solution for 50 generations, is usually responsible for terminating the GA. The population size also has a noticeable effect on the number of generations required to convergence. For both crossover methods, the number of generations needed for convergence increases with an increase in the population size. Fig. 13(b) shows the distribution of the fitness value of the converged solutions for different population sizes and types of crossover. The population size has a significant effect on the mean and the range of the distribution. For a comparably small population size of 200, the range of the distribution is wide, and most of the optimization processes do not converge to the optimal solution. The distribution range of the fitness values for the population size of 500 is almost two times smaller than the distribution of the fitness values for the population size of 200. The rate of improvement in the quality of the optimal solution, the fitness value, decreases for large population sizes.

Robustness of the GA.
A comparison between the results from the two types of crossover reveals that the reduced surrogate crossover results in a narrower distribution of the fitness values and a higher quality of the solution. However, this distinction becomes insignificant at high population sizes. Since single-point crossover needs a smaller number of generations to convergence, and the time efficiency of the GA has a  Percentage of GA runs terminated by the second stop criterion, i.e. no improvement in the best solution for 50 generations. The rest of the GA runs were terminated by the first stop criterion, i.e. 70% of the chromosomes in a generation become equal, and none of the GA runs was terminated by the third stop criterion, i.e. reaching 500 generations. The results are shown for different population sizes and two types of crossover. SPC and RSC stand for single-point crossover and reduced surrogate crossover, respectively. strong relationship with the number of generations, it can be concluded that the single-point crossover has better efficiency. Fig. 14 shows the percentage of GA runs terminated by the second stop criterion. It can be seen that in the case of using the single-point crossover and small population sizes most of the optimization runs are terminated by the first criterion, i.e. 70% of the chromosomes in a generation become equal. Hence, only for a few limited cases the second criterion, i.e. no improvement in the best solution for 50 generations, is responsible for the termination. However, at large population sizes, i.e. 4000 and 8000, the second criterion is responsible for the termination of most of GA runs. For the reduced surrogate crossover, the second criterion is responsible for the termination of most of the GA runs independently of the population size. None of the optimization cases was terminated by the third stop criterion, i.e. reaching 500 generations.

Phenomenological crystal plasticity model including deformation twinning
As a third and final application example, the performance of the methodology is investigated when numerous deformation mechanisms are available. For this purpose, the presented parameter identification procedure has been used to identify the constitutive parameters of a CP model for a Magnesium alloy with hexagonal close-packed crystal (hcp) structure. The availability of multiple slip and twin systems makes the plastic deformation of Mg alloys strongly anisotropic and non-linear. The used CP model is an extension of the above outlined phenomenological CP model (Hutchinson, 1976;Roters et al., 2019) in which twinning is modeled in a pseudo-shear approach (Kalidindi, 1998). It is briefly outlined in the following section. The kinematics for elasto-plastic behavior is defined within the finite deformation framework, and the model is part of the current DAMASK version 2.0.3. For more details the reader is referred to (Roters et al., 2010(Roters et al., , 2019.

Constitutive law
The plastic component of the material's internal state is parametrized in terms of a set of shear resistances τ 0 on N sþtw ¼ N s þ N tw slip and twin systems. During the deformation, the resistances on the slip systems α ¼ 1, …, N s evolve from their initial value τ 0 asymptotically to a system-dependent saturation value depending on the shear on different slip and twin systems as: where h are the slip-slip [142,143] and slip-twin interaction matrices, h sÀ s 0 is a slip hardening fitting parameter, and τ α 0 ∞ is the set of saturation stresses, respectively.
In a similar way, the resistances on the twin systems β ¼ 1, …, N tw evolve as: Table 7 Material parameters used to produce reference stress-strain data for the phenomenological model incorporating deformation twinning.
where h twÀ s 0 and h twÀ tw 0 are twinning-slip and twinning-twinning hardening fitting parameters, respectively. Given a set of current slip resistances, shear on each slip system occurs according to and shear due to mechanical twinning occurs according to where H is the HEAVISIDE or unit step function. f tot tw is the total twin volume fraction, and it is given by where γ char is the characteristic shear due to mechanical twinning, the value of which depends on the twin system.

Numerical inputs
3.3.2.1. Reference data sets. The material parameters listed in Table 7 have been used to calculate reference stress-strain curves using the phenomenological constitutive model incorporating deformation twinning. These material parameters have been chosen according to those reported for magnesium alloys in (Roters et al., 2019). It should be noted that n tw has a significant effect on the convergence and the numerical costs of the simulations. Zhang and Joshi (2012) and Herrera-Solaz et al. (2014) observed that for a large n tw , the convergence becomes significantly difficult and time-consuming. In this study, we also observed that the convergence time increases significantly when n tw exceeds 10. Furthermore, it was observed for some combinations of the material parameters that no convergence is achieved when n tw is larger than 10. Therefore, here, a smaller value for n tw has been used compared to what was reported in (Roters et al., 2019). The crystallographic orientation distribution used here is a strong basal texture with the c-axis parallel to the normal direction (ND), a typical crystallographic texture observed for rolled Mg alloys. The pole figures of the crystallographic orientation distribution are plotted in Fig. 15. For such a strong texture, the plastic deformation behavior is highly anisotropic, especially for an hcp crystal structure. This indicates that for different loading directions different slip and/or twin systems are activated. If a slip or twinning family is not activated for any of the imposed loading conditions, it will not be possible to identify the values of the underlying material parameters. To enable the identification of all material parameters, it is necessary to use reference data for different strain modes. For this purpose, the simulations have been performed under both compression and tension for seven different loading conditions, as shown in Fig. 16. In the figure, the arrow indicates the loading orientation (loading angle) for tension/compression, and the dark gray face indicates the constraint direction for plane-strain loading conditions. All simulations have been performed at different strain rates ranging from 10 À 5 s À 1 to 1s À 1 . Fig. 15. Pole figures of the initial texture of the Mg alloy used as input to create the RVE for the simulations. The intensities in the legend are given in units of multiples of a random distribution, whereby a random texture would give a value of unity. Table 8 lists the ranges selected for the adjustable parameters of the CP model. These ranges have been selected according to those experimentally reported in the literature for single crystal Mg alloys, see Zhang and Joshi (2012) for more detailed information. The spread of the reported data in the literature for τ 0,pyr〈cþa〉 and τ 0,C2 is larger in comparison with the rest of the parameters. Therefore wide ranges for these two parameters have been considered.

Sensitivity and effect analysis
A sensitivity analysis is performed to determine the statistically significant material parameters for the ranges presented in Table 8. Fig. 17 and Fig. 18 show the effects plot for the resulting stress, respectively, at ϵ p ¼ 0 and ϵ p ¼ 0.3 under different loading conditions. It should be noted that based on the output from ANOVA, there are many more statistically significant effects than those shown in the figures, however, only the most pronounced effects are shown here.
The anisotropic plastic behavior is fully reflected in the effect analysis. The effects and interaction effects of different parameters strongly depend on the loading conditions (loading orientation, loading direction, and loading mode). For example, a compressive loading along the c-axis (load case c) can be accommodated by pyramidal 〈c þ a〉 slip and compression twin modes. As expected, the effect analysis shows that under this loading condition the main effects and interaction effects for the pyramidal 〈c þ a〉 slip and compression twin modes are high, while the effects of the other slip and twinning modes are relatively small or even insignificant.
We observed from the effect analysis that some of the loading conditions used in this study are mutually dependent, i.e. loading condition a (uniaxial compression/tension along RD) with loading condition g (plane-strain compression/tension along RD with ND as the constraint direction), loading condition b (uniaxial compression/tension in the RD-ND plane at 45 � from both RD and ND) with loading condition e (plane-strain compression/tension in the RD-ND plane at 45 � from both RD and ND with TD as the constraint direction), and loading condition c (uniaxial compression/tension along ND) with loading condition f (plane-strain compression/ tension along ND with TD as the constraint direction). Therefore, the results of the effect analysis for loading conditions e, f, and g are not presented here. It should be noted that, although these loading conditions have been considered in the optimization procedure, their effects on the optimized solutions are negligible.
The accuracy of the optimization output for each parameter is dependent on how significant the effects of that parameter are. It is obvious that if a slip or twinning family is not activated in any of the imposed loading conditions, the effect for the related parameters will be negligible, and it will be impossible to identify those parameters accurately. According to the effect analysis, for all parameters, except h twÀ s 0 , there is at least one loading condition where the effect of the parameter is significant. Therefore, theoretically, the loading conditions used in this study suffice to determine all the parameters except for h twÀ s 0 .

Grouping the parameters
The only statistically significant parameters at the yield stress are τ 0,basal , τ 0,pris , τ 0,pyr〈a〉 , τ 0,pyr〈cþa〉 , τ 0,T1 , τ 0,C2 , n s , and n tw . As mentioned earlier, splitting the parameters into different groups results in a substantial decrease in the number of simulations required to build the off-line database. Therefore, in this application example, the parameters have been divided into two different groups. The first group includes τ 0,basal , τ 0,pris , τ 0,pyr〈a〉 , τ 0,pyr〈cþa〉 , τ 0,T1 , τ 0,C2 , n s , and n tw . These parameters will be determined using the yield stress The arrows indicate the loading orientation for the loading direction (tension/compression), and the dark gray faces indicate the constraint direction for the plane-strain loading conditions. (a) uniaxial compression/ tension along RD, (b) uniaxial compression/tension in the RD-ND plane at 45 � from both RD and ND, (c) uniaxial compression/tension along ND, (d) plane-strain compression/tension along RD with TD as the constraint direction, (e) plane-strain compression/tension in the RD-ND plane at 45 � from both RD and ND with TD as the constraint direction, (f) plane-strain compression/tension along ND with TD as the constraint direction, (g) planestrain compression/tension along RD with ND as the constraint direction.
It should be noted that τ 0,pyr〈a〉 has been considered in both groups. Although the effect of τ 0,pyr〈a〉 is statistically significant for loading condition a, uniaxial compression/tension along RD, its magnitude is much smaller than the rest of the parameters. Besides,   Table 7. there is a strong interaction between this parameter and τ 0,pris . As a result, uniquely identifying τ 0,pyr〈a〉 using only the yield stress data is difficult. To overcome this problem, this parameter was considered in both groups. Despite the relatively large range selected for h twÀ s 0 , its effect is not significant at any loading conditions. Therefore, it has been removed from the optimization procedure, and its value has been fixed as 150 MPa to save computation time. Fig. 19 shows the results from 20 independent optimization runs. It can be seen that the optimization methodology recovers the original parameters for the first group of parameters, namely τ 0,basal , τ 0,pris , τ 0,pyr〈a〉 , τ 0,pyr〈cþa〉 , τ 0,T1 , τ 0,C2 , n s , and n tw , with an accuracy of � 1%. This reveals that all these parameters have a unique contribution to the CP model, and the optimization methodology is able to capture this uniqueness. Unlike the first group of parameters, a distribution is achieved for the optimized solutions of the second group of parameters, namely τ ∞,basal , τ ∞,pris , τ ∞,pyr〈a〉 , τ ∞,pyr〈cþa〉 , h sÀ s 0 , and a. It should be noted that the error in the predicted stress-strain curves is less than 0.6 MPa for all 20 optimization runs despite the distribution achieved for these parameters. As mentioned in Section 3.1, a and τ ∞ are strongly correlated and mutually dependent. Considering both parameters in the optimization leads to an underdetermined system that has multiple feasible solutions. Therefore, no unique solution can be achieved when both a and τ ∞ are considered in the optimization procedure. The mutually dependent parameters in CP models are, in general, a serious obstacle when aiming to identify a unique solution for the associated optimization problems. This issue is often not noticed (or maybe not reported) since the optimization methodologies are usually performed using experimental data. In this case, there is no reference set of parameters to evaluate the quality of the solution, and a parameter set is considered as good if the experimental data can be reproduced. One way to overcome such a problem is to identify these parameters and to remove them from the optimization process. Fig. 20 shows the results from 20 independent optimization runs when a is removed from the optimization procedure and fixed at a value of 2.5. It can be seen that now all the parameters can be determined uniquely, and the optimization methodology can recover the original values for the parameters. On the long run, it is desirable to develop constitutive models without such redundant parameters.  Fig. 16, (a) loading condition auniaxial compression/tension along RD, (b) loading condition buniaxial compression/tension in the RD-ND plane at 45 � from both RD and ND, (c), loading conditions cuniaxial compression/tension along ND, and (d) loading condition dplane-strain compression/tension along RD with TD as the constraint direction. The results are shown for _ ϵ ¼ 0.1 s À 1 under compression and tension. The magnitudes of the effects are calculated using Eq. (10) and Eq. (11), and they are in MPa. The material parameters are defined in Table 7.

Conclusions
In this study, a novel and computationally efficient approach was developed for identifying constitutive parameters from macroscopic stress-strain data. The power of this methodology was demonstrated on the examples of a phenomenological, a dislocation-density-based CP model, and a phenomenological model incorporating deformation twinning. It was shown that the methodology reliably reproduces the known parameters of simulated reference compression/tension tests. Moreover, the outcome of the identification procedure helps to quantitatively and systematically study the role of the underlying single crystal parameters in predicting the deformation behavior of a polycrystalline aggregate.
The main observations for the phenomenological model are: • The only significant parameters at the yield point are τ 0 and n with positive and negative effects, respectively.
• At higher strains, all parameters, except h 0 , have a significant effect with negative effect for n and a and positive effect for τ 0 and τ ∞ .
• The main effect for h 0 is much smaller than for all other parameters.  • The optimization methodology can reach a unique solution for τ 0 and n, while it predicts a (small) distribution for the other parameters.
The main observations for the dislocation-density-based model are: • The main effect for p, τ * 0 , ΔF, and ρ α 0 is positive, i.e. the resulting stress increases with an increase in these parameters. • The main effect for q and v 0 is negative, i.e. the resulting stress decreases with an increase in these parameters.
• p, q, ΔF, and τ * 0 have a strong interaction with each other, and the CP model is underdetermined for a large range of temperature and strain rates.
• There is no significant interaction between d α anni and C λ and the other parameters, i.e. p, τ * 0 , ΔF, ρ α 0 , q, and v 0 . Therefore, the parameters can be grouped into two independent blocks.
• p, τ * 0 , ΔF, ρ α 0 , q, and v 0 can be determined from the yield stress data, while d α anni and C λ can be determined from the hardening behavior.
• The optimization methodology can reach a unique solution for ρ α , d α anni , and C λ , while it predicts a narrow distribution for τ * 0 , ΔF, p, q, and v 0 .
The main observations for the phenomenological model incorporating deformation twinning are: • The significant parameters at the yield point are τ 0,basal , τ 0,pris , τ 0,pyr〈a〉 , τ 0,pyr〈cþa〉 , τ 0,T1 , τ 0,C2 , n s , and n tw . These parameters can be uniquely determined if a sufficiently variable set of loading conditions is used in the optimization procedure to capture the strong anisotropic plastic deformation behavior of hcp materials.
• One way to overcome the problem of correlated parameters is to identify these parameters with the help of the current methodology and to remove them from the optimization process. The results show that a unique solution for τ ∞;basal ; τ ∞;pris ; τ ∞;pyr〈a〉 ; τ ∞;pyr〈cþa〉 ; h sÀ s 0 can be achieved, if a is removed from the optimization procedure.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.