1 Introduction

Numerous constitutive soil models were developed in the past and are still being developed today. Many of the constitutive models utilized today originate either from the framework of hypoplasticity or elasto-plasticity. Prominent and established representatives of such advanced models for cohesionless soils are Sanisand [9, 51] for elasto-plasticity as well as hypoplasticity according to Wu et al. [59] or in the version of von Wolffersdorff [57] with the extension by intergranular strain by Niemunis & Herle [38]. For cohesive soils, the anisotropic visco-hypoplastic model by Niemunis et al. [39] or the clay-hypoplasticity by Mašín [34, 35] can be listed at this point.

Recently, Fuentes et al. extended the hypoplastic constitutive model with the intergranular strain anisotropy (ISA, an extension of the intergranular strain concept first presented in [12]) to describe liquefaction phenomena of sands [13, 40]. For the Sanisand model of 2004 [9], many extensions have been proposed. For instance, Taiebat & Dafalias [51] added a yield surface with closed cap, Liu et al. [30] and Liu and Pisano [29] introduced a memory surface which improves the model’s performance under cyclic loading and allows for the simulation of several thousands of loading cycles and Barrero et al. [3] improved the model for cyclic shearing of sands in semifluidized states, among others. New and further developments in the field of constitutive models for cohesive soils are for example the anisotropic visco-ISA (AVISA) model by Tafili & Triantafyllidis [50], the coupling of clay-hypoplasticity with ISA-plasticity by Fuentes et al. [14], the barodesy with intergranular strain extension by Bode et al. [5] or a novel hypoplastic model for overconsolidated clays by Wang and Wu [55, 56].

Even if the above-mentioned works are only an excerpt from the constitutive models published in the recent years, they are testimony to the rapid and varied developments that have taken place in this field of soil mechanics. Many new models extend and improve the prediction quality of the old ones. This is usually accompanied by an increase in the number of material parameters which need to be calibrated. The quality of the forecast depends significantly on a suitable choice of these parameters. However, their calibration is often time-consuming and requires a high degree of experience in handling the model to be calibrated because many of the parameters cannot be identified straight forward by specific experiments or empirical equations. This makes the application of such advanced soil models a very challenging task not only for beginners but also for experienced engineers and researchers and by this often prevents the application in practical projects and to boundary value problems. Automatic Calibration (AC) aims to simplify and speed up the calibration process and, in particular, to reduce the application hurdles when using advanced constitutive soil models. In addition, AC helps to reduce the "human factor" in the calibration of model parameters (e.g., subconscious personal preferences, outcome-based calibrations and experience of the person performing the calibration). Another, not as obvious advantage of AC is the possibility to identify previously unknown relations between individual parameters. The detection of such relations can be used for a possible reduction in the number of required parameters of the model. A well-established tool for AC of constitutive soil models is ExCalibre by Kadlíček et al. [23,24,25] (https://soilmodels.com/excalibre/). It allows to estimate parameters of constitutive soil models such as clay and sand hypoplasticity based on the results of oedometric compression and monotonic triaxial tests. However, the software only allows to calibrate a full set of parameters (calibration of only individual parameters is not possible) and, in case of the hypoplastic model for sands, produces parameters that underestimate dilatancy [36]. In addition, custom implementations of soil models cannot be linked to the ExCalibre. Recently, an AC method for a hypoplastic model based on genetic algorithms was presented by Mendez et al. [36]. The therein developed AC uses a Fréchet Distance-based similarity measure and was able to determine parameters for the hypoplastic model that lead to simulation results closer to the experiments than those calibrated by hand. However, the influence of the optimization method and of the similarity measure used for determining the discrepancy between experiment and simulation remains unclear. In addition, the previous work was limited to a rather small number of laboratory tests when calibrating the parameters (in [36] a maximum of two oedometric compression tests and three drained monotonic triaxial tests were used for calibration). It is unclear how AC performs with a larger data set and what possible limitations follow from a calibration based on a reduced data set.

In this paper, a calibration software for constitutive soil models is presented. The calibration is based on two optimization methods: Differential Evolution (DE) [49] and Particle Swarm Optimization (PSO). The influence of the similarity measures used to quantify the differences between laboratory tests and simulation is investigated. The software is presented by means of the calibration of the parameters for a hypoplastic constitutive model for two different sands. The number of experiments taken into account is varied in order to check the influence of the data basis.

Both, DE and PSO fall in the category of so-called direct or derivative-free algorithms. Such algorithms make few or no assumptions about the underlying optimization problem and can quickly explore very large design spaces (ranges of parameters). In the field of geotechnics, they have been applied to the calibration of the parameters of the Mohr-Coulomb model by means of inverse analyses [41], to the calibration of an elasto-plastic constitutive model [43], to the automatic selection of constitutive models [21] or to the continuous parameter identification in staged excavation simulations [22]. In [36], a Genetic Algorithm (GA, also belonging to the category of direct algorithms) was used to calibrate the parameters of a hypoplastic model and in [45] for the identification of maximum discontinuity frequency in a complex rock structure. Recently, [16] used a combination of DEM simulations and Artificial Neural Networks to determine the parameters of Duncan–Chang model, namely the tangent elastic modulus and the ultimate deviatoric stress.

The paper is structured as follows. The developed global optimizer, including the constitutive equations, the definition of the cost function as well as the optimization methods are introduced in Sect. 2. Section 3 presents the experimental data set, and the corresponding reference parameter sets, based on which the performance of the optimization is evaluated. The influence of the choice of optimization method and the choice of similarity measure to calculate the error function is investigated in Sects. 4 and 5, respectively. Section 6 focuses on the solution uniqueness and Sect. 7 investigates the influence of the extent of the available data for calibration. The conclusions of the study are given in Sect. 8.

2 Numerical tool set

The Automatic Calibration (AC) software has been developed with the aim to provide easy access to the calibration of different (advanced) constitutive soil models. Several of such advanced constitutive models are implemented in the finite element program numgeo (Machaček & Staubach, [31, 32, 47, 48]). The parameter calibration should preferably be performed by using the identical implementation of the constitutive model as for the later simulation of the boundary value problem. In particular for advanced constitutive soil models the implementation can have an important influence on the response of the model. Therefore, the calibration software was built on top of numgeo. This allows easy access to all constitutive soil models implemented in numgeo and thus to utilize the same implementation for calibration and the subsequent calculation of boundary value problems. Although this work is restricted to the calibration of parameters for a hypoplastic constitutive model (see Sect. 2.1), the implementation can easily be extended to other constitutive models implemented in numgeo.

The AC software is implemented in Python, leveraged by numerical libraries such as numpy [17] and scipy [53]. An overview of the software is given in Fig. 1. A database has been designed to access and store several (and different in kind) standard laboratory tests such as oedometric compression tests or (drained monotonic) triaxial tests. Functions are available for pre-processing (if desired/required) of the raw data, such as filtering, data reduction or interpolation. Besides the global optimizer, which is the focus of this article, the software also offers the possibility to automatically estimate the parameters based on simplified calibration methods, such as the one according to Herle [18]. Two heuristic optimization algorithms are available for global parameter optimization, namely PSO and DE [49]. The global optimizer operates on the set of model parameters, comparing the numerical prediction of the constitutive model to the experimental results stored in the database until a best set (the parameters that characterize better the material behavior) is identified, see Sects. 2.3.2 and 2.3.1. Interfaces to various comparison measures are implemented as explained in more detail in Sect. 2.2.

Fig. 1
figure 1

Overview of the AC tool set for the optimization of parameters of constitutive models for soils

2.1 Hypoplastic model for sand

The finite element program numgeo implements a hypoplastic constitutive model for granular materials [59] with a predefined limit state surface [57] and intergranular strain extension [38]. The basic hypoplastic model of von Wolffersdorff [57] interrelates the stress rate \({\dot{{\varvec{\sigma }}}}\) with the strain rate \({\dot{{\varvec{\varepsilon }}}}\):

$$\begin{aligned} {\dot{{\varvec{\sigma }}}} = {\mathsf {L}}:{\dot{{\varvec{\varepsilon }}}} + \mathbf{N}\Vert {\dot{{\varvec{\varepsilon }}}\Vert }. \end{aligned}$$
(1)

Therein, L is a fourth order tensor being linear in \({\dot{{\varvec{\varepsilon }}}}\), whereas \(\mathbf{N}\) is a second order tensor being nonlinear in \({\dot{{\varvec{\varepsilon }}}}\). Both stiffness tensors L and \(\mathbf{N}\) are functions of stress and void ratio and given by the following equations [57]:

$$\begin{aligned} \mathsf{L} =\, &f_b f_e \frac{1}{\text{ tr }(\hat{{\varvec{\sigma }}}\cdot \hat{{\varvec{\sigma }}})} (F^2 \mathsf{I}+ \textit{a}^2 \hat{{\varvec{\sigma }}} \hat{{\varvec{\sigma }}}), \end{aligned}$$
(2)
$$\begin{aligned} {{\textbf {N}}} =\, &f_b f_e f_d \frac{Fa}{\text{ tr }({\hat{{\varvec{\sigma }}}}\cdot {\hat{{\varvec{\sigma }}}})} ({\hat{{\varvec{\sigma }}}}+{\hat{{\varvec{\sigma }}}}^*). \end{aligned}$$
(3)

Therein \({\hat{{\varvec{\sigma }}}} = \dfrac{{\varvec{\sigma }}}{\text {tr}{\varvec{\sigma }}}\) and \({\hat{{\varvec{\sigma }}}}^{*} = {\hat{{\varvec{\sigma }}}}-\frac{1}{3}\mathbf{I}\) are used. The scalar factors are defined by

$$\begin{aligned} a=\frac{\sqrt{3}(3-\sin \varphi _c)}{2\sqrt{2}\sin \varphi _c}~,~ f_d=\left( \frac{e-e_d}{e_c-e_d}\right) ^\alpha ~,~f_e=\left( \frac{e_c}{e}\right) ^\beta \end{aligned}$$
(4)

and

$$\begin{aligned} f_{b} =\, &\frac{h_s}{n}\left( \frac{e_{i0}}{e_{c0}}\right) ^{\beta }\frac{1+e_i}{e_i} \left( \frac{3p}{h_s}\right) ^{1-n}\nonumber \\ & \left[ 3 +a^2 -a\sqrt{3}\left( \frac{e_{i0}-e_{d0}}{e_{c0}-e_{d0}}\right) ^{\alpha }\right] ^{-1}. \end{aligned}$$
(5)

\(\varphi _c\), \(h_s\), n, \(e_{i0}\), \(e_{d0}\), \(e_{c0}\), \(\alpha \) and \(\beta \) are parameters and e is the actual void ratio. The pressure-dependent void ratios \(e_i\), \(e_c\) and \(e_d\) in Eq. (4), describing the loosest, the critical and the densest state, are calculated using the following relation [4, 59]

$$\begin{aligned} \frac{e_i}{e_{i0}}=\frac{e_c}{e_{c0}}=\frac{e_d}{e_{d0}}=\exp \left[ -\left( \frac{3p}{h_s} \right) ^n \right] . \end{aligned}$$
(6)

\(p=\sigma _{ii}/3\) is the mean effective stress. The scalar factor F in Eq. (2) and Eq. (3) is given by

$$\begin{aligned} F=\sqrt{\frac{1}{8}\tan (\psi )^2+\frac{2-\tan (\psi )^2}{2+\sqrt{2}\tan (\psi )\cos (3\theta )}} -\frac{1}{2\sqrt{2}}\tan (\psi ), \end{aligned}$$
(7)

with \(\tan (\psi )=\sqrt{3}\Vert {\hat{{\varvec{\sigma }}}}^*\Vert \) and

$$\begin{aligned} \cos (3\theta )=-\sqrt{6}\frac{\text{ tr }({\hat{{\varvec{\sigma }}}}^*\cdot {\hat{{\varvec{\sigma }}}}^*\cdot {\hat{{\varvec{\sigma }}}}^*)}{\left[ \text{ tr }({\hat{{\varvec{\sigma }}}}^*\cdot {\hat{{\varvec{\sigma }}}}^*)\right] ^{3/2}}. \end{aligned}$$
(8)

To improve the performance of the hypoplastic model in the range of small strains Niemunis & Herle, [38] introduced a new tensorial state variable, the intergranular strain h, which memorizes the recent deformation history. However, since the present work focuses on monotonic loading, the introduction of the intergranular strain extension is omitted for the sake of brevity. The present implementation uses an adaptive Newton scheme for the integration of Eq. (1) considering the rate of convergence within a substepping scheme. For void ratios e exceeding the loosest possible void ratio \(e_i(p)\) at a given mean effective stress p, a special treatment originally developed by A. Niemunis and described in [33] is applied.

2.2 Objective/cost function

The basis of any optimization is the existence of a (scalar) objective function (“cost function”) which has to be minimized by changing the parameters in the course of the optimization. The cost function in this work is defined as follows:

$$\begin{aligned} \epsilon = \sum _{iT}^{nT} W_{iT} \epsilon _{iT} = \sum _{iT}^{nT} W_{iT} \left( \sum _{it}^{nt} w_{iT}^{it} \epsilon _{iT}^{it} \right) , \end{aligned}$$
(9)

where \(\epsilon _{iT}\) is the resulting error for laboratory test type iT and nT is the total number of different types of laboratory tests which form the basis for the optimization. In the present work, oedometric compression tests (\(iT = oc\)) and drained monotonic triaxial tests (\(iT = dmt\)) are used. \(W_{iT}\) are the weighting functions, which control the calibration procedure with regard to whether the simulated material behavior should represent all test types equally well or whether the behavior in individual laboratory test types (e.g., the behavior under oedometric conditions) is more important than the others. The resulting error \(\epsilon _{iT}\) is again a weighted sum (thus an averaged value) of all errors \(\epsilon ^{it}_{iT}\) of all individual tests it of one test type iT. \(w^{it}_{iT}\) are the associated weightings and nt the total number of all individual tests of test type iT. Note that the closure conditions \(\sum _{iT}^{nT} W_{iT} = 1\) and \(\sum _{it}^{nt} w^{it}_{iT}=1\) hold. In this work equal weight is put on the models’ performance in the simulation of the oedometric compression test and drained monotonic triaxial test, i.e., \(W_{oc}=1/2\) and \(W_{dmt}=1/2\). The calculation of \(\epsilon ^{it}_{iT}\) depends on the laboratory test type. It is a natural choice to judge the quality of the optimization on the basis of the comparison of the experimental data with the simulation results. For this purpose, a comparison of the experimentally measured and simulated relationships in the stress and strain spaces is useful. For the present work, the cost function for oedometric compression tests is evaluated in the axial stress–axial strain (\({\tilde{\sigma }}_1-{\tilde{\varepsilon }}_1\)) plane, whereas for the drained monotonic triaxial tests, both the axial strain– deviatoric stress (\({\tilde{\varepsilon }}_1-{\tilde{q}}\)) and the axial strain–volumetric strain (\({\tilde{\varepsilon }}_1-{\tilde{\varepsilon }}_V\)) planes are used:

$$\begin{aligned}&\epsilon ^{it}_{oc}({\tilde{\varepsilon }}_1,{\tilde{\sigma }}_1) ~~~ ; ~~~ \epsilon ^{it}_{dmt}({\tilde{q}},{\tilde{\varepsilon }}_1,{\tilde{\varepsilon }}_V)\nonumber \\&\qquad \qquad \qquad = w_{dmt}^{\varepsilon q} \epsilon ^{it,\varepsilon q}_{dmt}({\tilde{\varepsilon }}_1,{\tilde{q}}) + w_{dmt}^{\varepsilon \varepsilon }\epsilon ^{it, \varepsilon \varepsilon }_{dmt}({\tilde{\varepsilon }}_1,{\tilde{\varepsilon }}_V). \end{aligned}$$
(10)

Therein, \(w_{dmt}^{\varepsilon q}\) and \(w_{dmt}^{\varepsilon \varepsilon }\) are weights controlling whether the focus during calibration should be on achieving the best possible reproducibility of the response in the \(\varepsilon _1-q\) plane, \(\varepsilon _1-\varepsilon _V\) plane, or a combination of both. Note that in Equation (10) scaled stress and strain measures are used instead of the actual values. This is necessary to ensure that the variables used for the error calculation—despite their different value ranges and units (\(\varepsilon \in [0-0.3]\) and \(q \in [0-500]\) kPa)—have a comparable influence on the optimization. Therefore, both the experimental (exp) and numerical (sim) data are made dimensionless using the following relations:

$$\begin{aligned} {\tilde{\varepsilon }}_1^{\sqcup } =&\frac{\varepsilon _1^{\sqcup }}{\max (\max (\varepsilon _1^{exp}),\max (\varepsilon _1^{sim}))} ~~~\text {with:}~~~\sqcup =\{exp,sim\} \end{aligned}$$
(11)
$$\begin{aligned} {\tilde{\varepsilon }}_V^{\sqcup } =&\frac{\varepsilon _V^{\sqcup }}{\max (\max (|\varepsilon _V^{exp}|),\max (|\varepsilon _V^{sim}|))} \end{aligned}$$
(12)
$$\begin{aligned} {\tilde{q}}^{\sqcup } =&\frac{q^{\sqcup }}{\max (\max (q^{exp}),\max (q^{sim}))} \end{aligned}$$
(13)

This normalization transfers the experimental and numerical results into representations which have their origin in (0, 0) and range to (1, 1) in the case of the \({\tilde{\sigma }}_1-{\tilde{\varepsilon }}_1\) and the \({\tilde{\varepsilon }}_1-{\tilde{q}}\) plane and up to \((-1,1)\) in the case of the \({\tilde{\varepsilon }}_1-{\tilde{\varepsilon }}_V\) plane.

The cost functions \(\epsilon ^{it}_{iT}\) driving the optimization of the parameters are thus built by accounting for the discrepancy between the numerical predictions and the measured (experimental) data. Traditionally, this discrepancy is often quantified using a sum-of-square-based cost function. However, in the present case this is not always possible. One problem is a potentially different number of data points on the experimental curve compared to the numerical curve. Admittedly, this circumstance could be remedied by linear interpolation—provided there are enough data points to keep the introduced error low. However, in many cases there is no unique relationship between stress and strain—the material behavior is path dependent. An example is the oedometric compression test with loading, unloading and reloading. Thus, the use of alternative objective functions is advised to measure the similarity between the experimental data and the simulation results. In a recently published study [36], the Fréchet Distance was used for this purpose. [20] compared the performance of different similarity measures for the automatic parameter calibration of a kinematic hardening material model for steel. Up to now, the influence of the similarity measure applied to evaluate the cost function on the calibration result of constitutive soil models was not reported to the authors’ best knowledge. For the present work, the same similarity measurements (and implementations) as in [20] were used:

  • Dynamic time warping (DTW)

    DTW [42, 52] is a well-established method to account for temporal variations in the comparison of related time series. First applied to speech processing, the application of DTW ranges from data mining over signal processing to issues of different engineering disciplines, see e.g., [1] and the therein included references.

  • Fréchet Distance (FD)

    The Fréchet Distance [10, 11] is a measure of the similarity between the curves taking into account the location and ordering of the points along the curves.

  • Area Between Two Curves (ABTC)

    The Area Between Two Curves is the integral of the absolute value of their difference (evaluated as the sum of piece-wise integrals). The resulting integral corresponds to the amount of mismatch between the two curves. The approach proposed and implemented in [20] is used in this work.

  • Curve Length Measure (CLM)

    The Curve Length Measure [2, 6] compares a point \(p^A\) on curve \(c^A\) (experimental data) to its corresponding curve length point \(p^B\) on curve \(c^B\) (simulation result). \(p^B\) is evaluated on \(c^B\) at the equivalent curve length distance of \(p^A\) on \(c^A\). The error is then calculated based on the distance between \(p^A\) and \(p^B\).

An illustrative graphical description of the above similarity measurements is given in [20], to which reference is hereby made. In addition, a similarity measure based on the sum-of-square method proposed by [28] and also used in [25, 60] was tested which takes the general form:

$$\begin{aligned} \epsilon = \sqrt{ \frac{1}{N} \sum _{i}^{N} \frac{U^{sim}_i - U^{exp}_i}{U^{exp}_i} }. \end{aligned}$$
(14)

Therein, N is the number of values, \(U^{exp}_i\) is the value of measurement point i and \(U^{sim}_i\) is the value of the simulation at point i. Contrary to the other similarity measures, no scaling according to Eqs. (11-13) is required for this method. However, it has to be ensured that the arrays \({\varvec{U}}^{exp}\) and \({\varvec{U}}^{sim}\) have an identical number of data points. In the following, this method will be referred to as the Modified Least Square method (MLS).

In general, it is nearly impossible for numerical models to match experimental data exactly. It is not likely that a constitutive material model matches the real (measured) material behavior perfectly. This applies in particular to the very complex behavior of soils, which is highly nonlinear, anelastic and path dependent. Even if a “perfect” constitutive model existed, measurement errors and scattering of experimental results (e.g., resulting from the preparation of soil samples by different technicians) would prevent a perfect fit. Note that it is therefore very unlikely that the cost function will take values smaller than \(\epsilon < 10^{-2}\), especially if the parameter calibration is carried out on more than a single test. A good fit must therefore be judged by both a stagnant decrease in the error function and a visual inspection of the simulation results with the experimental data.

At this point it becomes obvious that due to the required specification of weighting factors and the formulation of error measures an absolute objectivity of the AC cannot be achieved and is subject to human influences. In the case of the AC, this corresponds to the influence of the engineer who developed the AC but also to that of the user of the AC. This is shown in Sect. 5 for the choice of similarity measure as well as the choice of weighting factors \(w_{dmt}^{\varepsilon q}\) and \(w_{dmt}^{\varepsilon \varepsilon }\). The different combinations of weighting factors considered in this work are summarized in Table 1.

Table 1 Combinations of weighting factors for judging the constitutive models performance in drained monotonic triaxial tests

2.3 Optimization algorithm

When choosing an optimization algorithm, it should be noted that the objective function of constitutive soil models as represented by the hypoplastic model is a nonlinear, complex multi-peak and non-differentiable function. The reason for this is the high sensitivity of the hypoplastic material model to changes in the parameters. In addition, some parameters of the hypoplastic model are interdependent (see also Sect. 6). For the application of traditional direct search algorithms and optimization algorithms, depending on the gradient information, this will lead to local optimization and premature convergence [41]. In such cases, the use of so-called direct or derivative-free algorithms is recommended, since they do not use derivatives or finite differences. Popular representatives of these methods are the Nelder–Mead method [37], Differential Evolution algorithms [49, 58], Genetic Algorithms or Particle Swarm Optimization algorithms [26, 44], among others. Such algorithms make few or no assumptions about the underlying optimization problem and can quickly explore very large design spaces (ranges of parameters). Compared to algorithms depending on gradient information, stochastic optimization algorithms require generally a large amount of evaluations of the objective function.

2.3.1 Differential evolution

Differential evolution (DE), proposed by Storn and Price [49], is a population-based metaheuristicFootnote 1 search algorithm that optimizes a problem by iteratively improving a candidate solution \({\varvec{x}}^j\) (where j is the candidate number) based on an evolutionary process. In this work, the DE implementation of [53] is used. That implementation is based on the method proposed in [49] and later extended by [58]. In the DE, the solution of the problem is described by a vector \({\varvec{x}}=[x_1, x_2, ..., x_{nD}]\) with the dimension of the search space nD. In the present work, nD is equal to the number of parameters of the constitutive model to be calibrated. The solution space is investigated by multiple candidate solutions \({\varvec{x}}^j\) simultaneously. The set of solutions is called population \({\varvec{p}}=[{\varvec{x}}^1, {\varvec{x}}^2, ..., {\varvec{x}}^{nC}]\), with nC individuals (candidate solutions). For the present work, the overall population size is \(nC = \max \{15\cdot nD, 2^p\}= 124\) (the next power of 2 after \(15\cdot nD\), with \(nD=7\)). For the initialization of \({\varvec{p}}_0\), each parameter varies within a user provided range. These ranges remain fixed for all candidates during the whole optimization process. Details on the ranges are provided in Sect. 2.3.3. The initialization of the candidate solution \({\varvec{p}}_0\) is based on Sobol sequences, which is known to cover the solution space evenly [27]. The quality of the solution is determined based on the cost function as described in Sect. 2.2.

At each iteration I (also often referred to as evolutionary step or generation), DE uses a mutation operator for producing a so-called donor vector \({\varvec{v}}^j=[v^j_1, v^j_2, ..., v^j_{nD}]\) for each individual \({\varvec{x}}^j\) in the current population \({\varvec{p}}\). For each target vector \({\varvec{x}}^j\) at iteration I, the associated mutant vector \({\varvec{v}}^j\) can be produced using a specific mutation scheme. Various mutation strategies exist in the literature for this, see for example [15] for a comparison of the most widely used strategies. In this work, the "best/1/bin" strategy is used:

$$\begin{aligned} {\varvec{v}}^j = {\varvec{x}}_{\text {best}} + F \Big ( {\varvec{x}}_{\text {rand}(1)} - {\varvec{x}}_{\text {rand}(2)} \Big ), \end{aligned}$$
(15)

where \({\varvec{x}}_{\text {best}}\) is the currently best vector in the population \({\varvec{p}}\), \({\varvec{x}}_{\text {rand}(1)}\) and \({\varvec{x}}_{\text {rand}(2)}\) (with \({\varvec{x}}_{\text {rand}(1)} \ne {\varvec{x}}_{\text {rand}(2)}\)) are randomly chosen members of \({\varvec{p}}\). Their difference is used to mutate \({\varvec{x}}_{\text {best}}\). F is the differential weight, a positive control parameter for scaling the difference vector. In this work the differential weight randomly varies after each generation in the range of \(0.5 \le F \le 1\).

2.3.2 Particle swarm optimization

The Particle Swarm Optimization (PSO), first introduced by Kennedy, Eberhart & Shi [26, 44], is a nature inspired heuristic optimization method. In this work, the PSO implementation of [26, 44] is used. In the PSO a candidate solution \({\varvec{x}}=[x_1, x_2, ..., x_{nD}]^T\) (with the dimension of the search space nD, cf. Section 2.3.1) is iteratively adjusted using a velocity update method. Similar to the DE, the PSO considers a set (a swarm) of solution candidates named particles \({\varvec{s}}=[{\varvec{x}}^1, {\varvec{x}}^2, ..., {\varvec{x}}^{nC}]^T\), with nC individuals (candidate solutions). The position update at time \(t+1\) of each particle in the swarm is defined as follows:

$$\begin{aligned} {\varvec{x}}_i(t+1) = {\varvec{x}}_i(t) + {\varvec{v}}_i(t+1), \end{aligned}$$
(16)

where \({\varvec{x}}_i(t)\) is the position of the considered particle at time t (the last iteration) and \({\varvec{v}}_i(t+1)\) is the velocity of the particle at time \(t+1\). To calculate the velocity, the following rule is applied:

$$\begin{aligned} {\varvec{v}}_i(t+1) =\, & w {\varvec{v}}_i(t) + c_1 r_1 \Big ( {\varvec{x}}_{\text {p-best}} - {\varvec{x}}_i(t)\Big ) \nonumber \\ &+ c_2 r_2 \Big ( {\varvec{x}}_{\text {g-best}} - {\varvec{x}}_i(t)\Big ) \end{aligned}$$
(17)

Therein, \({\varvec{x}}_{\text {g-best}}\) is the best candidate solution ever explored by the swarm \({\varvec{s}}\) and \({\varvec{x}}_{\text {p-best}}\) is the personal best position of the considered particle so far. \(c_1\) and \(c_2\) are often referred to as “cognitive weights.” They control if the particle follows the overall swarm’s best solution or its personal one. w is the “inertia weight” and controls how much the particles velocity influences the update. \(r_1\) and \(r_2\) are random numbers between 0 and 1. In the present work, \(c_1 = 0.5\), \(c_2 = 0.3\) and \(w=0.9\) were used.

2.3.3 Bounds and constraints

Whenever parameter optimization is performed for advanced material models, it is necessary to limit the search space of possible parameters. If this is omitted, it may result in nonphysical values for some parameters (e.g., negative limiting void ratios in hypoplasticity) or in an unstable numerical solution. On the other hand, the optimization algorithm can only identify the best fit parameters if they are within the search space. Assigning an appropriate search space is a non-trivial task and might require some iterations. However, identifying a too narrow search space is straight forward, since a too narrow search space leads to clustering of the population at the boundaries.

For the selection of the parameter limits, the parameters are divided into two groups. The first group includes the parameters \(\varphi _c\), \(e_{c0}\) and \(e_{d0}\). For these three parameters a suitable starting point for the optimization can be estimated with comparatively high confidence from simple laboratory tests (\(\varphi _c\) as the angle of repose from loosely pluviated cones of dry sand, \(e_{c0}\) and \(e_{d0}\) from index tests on maximum and minimum void ratio). For these parameters the bounds are set at \(+/- 10\) % of the corresponding values determined in the laboratory (\(\varphi _c^{lab}\), \(e_{c0}^{lab}\) and \(e_{d0}^{lab}\)). The remaining parameters (\(e_{i0}\), \(h_s\), n, \(\alpha \) and \(\beta \)) belong to the second group. Their determination from laboratory tests or correlations is associated with greater uncertainties. Their limits are chosen freely, but in such a way that clustering does not occur. It is imperative to check this. Of course—if desired—the parameters of group 1 can also be treated like parameters of group 2. The bounds of the search space used in the present work are summarized in Table 2.

Table 2 Bounds of the search space for the optimization. \(\varphi _c^{lab}\), \(e_{c0}^{lab}\) and \(e_{d0}^{lab}\) are evaluated from standard laboratory tests as explained in the text

3 Experimental data and reference parameter sets

The performance of the calibration software is investigated by calibrating the parameters of the hypoplastic material model for two different sands. The first sand is the so-called Karlsruhe Sand (KSa): a medium coarse sand which has already been the basis for numerous model tests [54] and simulations [8, 33, 46]. The laboratory tests of this sand comprise oedometric compression tests and drained monotonic triaxial tests and were carried out at the Karlsruhe Institute of Technology. The second sand is the so-called UWA Silica Sand (SSa): a fine-to-medium-grained sub-angular sand used for centrifuge tests at the University of Western Australia. The drained monotonic triaxial tests were carried out at the Technical University of Hamburg and are documented in [7]. The oedometric compression tests were carried out at the Ruhr-University Bochum. The index parameters of both sands are given in Table 3.

Table 3 Index parameters of the Karlsruhe Sand and UWA Silica Sand used in the experiments

A summary of the drained monotonic triaxial tests is given in Tables 4 and 5 for KSa and SSa, respectively.

Table 4 Drained monotonic triaxial tests on Karlsruhe Sand (KSa)
Table 5 Drained monotonic triaxial tests on UWA Silica Sand (SSa)

For both sands, hypoplastic parameters were determined by hand by the authors in previous projects as outlined in the following:

  • In a first step, the parameters were calibrated by hand following the procedure proposed by [18]. The eight parameters of the hypoplastic model have been calibrated based on loosely pluviated cones of dry sand (\(\varphi _c\)), index tests on maximum and minimum void ratio (\(e_{d0}\), \(e_{c0}\)), oedometric compression tests (\(h_s\), n, \(\beta \)) and drained monotonic triaxial tests (\(\alpha \)).

  • The hypoplastic parameters have been further optimized by recalculating the laboratory tests using numgeo. The parameters have been iteratively adjusted in order to receive a better fit of the material behavior under monotonic loading conditions.

In addition, a second parameter set was determined for each sand using the AC tool ExCalibre [23, 24]. ExCalibre uses a combination of empirical relations and automatic calibration to determine the hypoplastic parameters. The parameters \(h_s\) and n are calculated from oedometric compression tests as described in [23]. The void ratio \(e_{c0}\) is assigned to the initial value of the void ratio of the oedometric compression test performed on the loosest available sample [23]. In doing so, it is assumed that the sample of the oedometric compression test is in its loosest possible state. The void ratios \(e_{d0}\) and \(e_{i0}\) are calculated using the empirical relations \(e_{d0}=0.5\cdot e_{c0}\) and \(e_{i0}=1.2\cdot e_{c0}\) [23]. The parameters \(\alpha \) and \(\beta \) are calibrated by using back-calculations of drained monotonic triaxial tests. \(\alpha \) and \(\beta \) are iteratively adjusted until a good fit with the experimental results is obtained. To the best of the authors’ knowledge, it is not publicly documented how the discrepancy between experiment and simulation is determined or how the iterative adjustment is performed. A comparison of the simulation results with the parameters determined by hand and by means of ExCalibre with the experimental data is given for the Karlsruhe Sand in Fig. 2 and for the UWA Silica Sand in Fig. 3. The corresponding hypoplastic parameters are summarized in Table 6. The comparison of the parameters shows two peculiarities: first, the void ratios \(e_{d0}\) determined by ExCalibre are significantly lower than those determined in the laboratory and used for calibration by hand. Secondly, an unusually high value for \(h_s\) is determined by ExCalibre for the Karlsruhe Sand.

Table 6 Parameters of the hypoplastic model for Karlsruhe Sand (KSa) and UWA Silica Sand (SSa)
Fig. 2
figure 2

Comparison of experiments (black) and simulation results (colored) for oedometric compression tests and drained monotonic triaxial tests on samples of Karlsruhe Sand with different initial densities and different initial mean effective stresses. The hypoplastic parameters were calibrated by hand in previous work [33] and by ExCalibre [23, 24]

For both sands, an acceptable agreement between the experimental data and the results of the numerical simulations can be observed for the parameters calibrated by hand. The parameters \(h_s\), n and \(\beta \) of the hypoplastic model have been calibrated to reproduce the first loading curves of the oedometric compression tests on loose samples, thus the good prediction at this phase of the test. For the UWA Silica Sand, noticeable differences for the oedometric compression tests on dense samples are observed. In comparison, the simulations with the parameters obtained from ExCalibre show a worse agreement with the experimental data. This applies in particular to the test on the loose sample of Karlsruhe Sand. For the UWA Silica Sand the parameters calibrated with the help of ExCalibre rendered better results compared to the ones calibrated by hand.

Fig. 3
figure 3

Comparison of experiments (black) and simulation results (colored) for oedometric compression tests and drained monotonic triaxial tests on samples of UWA Silica Sand with different initial densities and different initial mean effective stresses. The hypoplastic parameters were calibrated by hand and by ExCalibre [23, 24]

From the simulations of the drained monotonic triaxial tests, one may note that the peak strength is slightly overestimated for both sands. This is more pronounced in case of the UWA Silica Sand, especially for initially loose samples at low initial stress (\(p_0=\{50~\text{ kPa }, 100~\text{ kPa }\}\)). Except from the test at high initial stress (\(p_0=400\) kPa), the residual shear strength is reproduced reasonably well in the simulations of the Karlsruhe Sand. The simulated volumetric strain behavior for initially loose samples can be judged as satisfactory. However, significant differences are noted in the \(\varepsilon _v-\varepsilon _1\) plots for the medium-dense and dense samples. The inability of the hypoplastic model in reproducing the volumetric strain behavior of dense samples is clearly visibly. This is because the parameter \(\alpha \) controlling these curves was previously calibrated to reproduce the peak stress of the test (a common approach to calibrate the hypoplastic constitutive model). For Karlsruhe Sand, the parameters calibrated using ExCalibre overestimate the initial stiffness and show slightly worse agreement with the experimental results compared to the parameters calibrated by hand. For the UWA Silica Sand, the simulations using the parameters determined with the help of ExCalibre show a worse match with the experimental data than the ones using the parameters calibrated by hand. In this case, the initial stiffness in the \(q-\varepsilon _1\) plot is underestimated and the \(\varepsilon _v-\varepsilon _1\) behavior noticeably underestimates dilatancy.

In summary, the simulation results obtained with the help of the software ExCalibre are of similar quality to those using parameters calibrated by hand. The parameters determined in this way provide a reference for assessing the quality of the parameters obtained by AC. In addition, they also reveal some weaknesses, and it will be interesting to see whether the AC can reduce or even avoid them.

4 Influence of the optimization method

So far, there is little experience regarding the choice of optimization algorithm for determining the parameters of advanced constitutive soil models. In the following, the results of the parameter calibration of the hypoplastic material model using both, DE and PSO are compared with each other. For this purpose, 200 optimization runs were performed with each optimization method. A "run" denotes a complete calibration using AC, which again includes several iterations. The parameters were determined for Karlsruhe Sand on the basis of all available laboratory test data (2 oedometric compression tests and 7 drained monotonic triaxial tests). The discrepancy between the experimental data and the simulation results are quantified using the Fréchet Distance (see Sect. 2.2). The cost function considers both the discrepancies for the oedometric compression tests as well as for the drained monotonic triaxial tests. Thus, the calibration considers both types of experiments simultaneously. In general, the quality of the parameters determined in a calibration run by optimization is measured by the final value of the cost function \(\epsilon \) (see Sect. 2.2). The smaller this value, the better the agreement between experimental data and simulation results. The final values of the cost function of all runs of the optimization using DE and PSO are compared in Fig. 4.

Fig. 4
figure 4

Final values of the cost function \(\epsilon \) for 200 repeated runs of the automatic parameter calibration for Karlsruhe Sand using DE and PSO

The comparison clearly shows that the results of the optimization obtained by DE, by means of final cost function value, have a significantly lower scatter than the optimization using PSO. It is particularly noteworthy that the worst optimization result obtained in the 200 runs with the DE (\(\epsilon _{DE}^{worst}=0.0884\)) is only slightly worse than the best result obtained with PSO (\(\epsilon _{PSO}^{best}=0.0864\)). The worst run with PSO (\(\epsilon _{PSO}^{worst}=0.1349\)), on the other hand, leads to an almost 50 % larger value for \(\epsilon \) than the worst run with DE. In terms of the achievable quality of the optimization and the reproducibility of the solution (both measured by means of \(\epsilon \)), the DE is preferable to the PSO. The extent to which the scattering of the optimization results influences the actual forecast quality is illustrated in Figs. 5 and 6. Therein, the simulation results obtained with the “best fit” and the “worst fit” parameter set from 200 calibration runs are compared. The corresponding parameters are summarized in Table 7. The reference sets correspond to the one calibrated “by hand” and by means of ExCalibre, see Sect. 3.

Table 7 “Best fit” and “worst fit” parameter sets of the hypoplastic model for Karlsruhe Sand obtained by optimization with DE and PSO
Fig. 5
figure 5

Comparison of the simulation results for the parameter sets for Karlsruhe Sand with the “best fit” (solid) and the “worst fit” (dashed) from 200 repeated runs of the AC using DE

Fig. 6
figure 6

Comparison of the simulation results for the parameter sets for Karlsruhe Sand with the “best fit” (solid) and the “worst fit” (dashed) from 200 repeated runs of the AC using PSO

The comparison given in Figs. 5 and 6 shows that the simulation results obtained by AC are in satisfactory agreement with the experimental data. Furthermore, there are only small deviations in the simulation results based on the “best fit” parameter sets of the DE and the PSO. The largest deviations are found for the recalculation of the triaxial test on the very dense sample (\(D_{r0}=0.97, p_0=100\) kPa). Here, the simulations strongly overestimate the maximum deviatoric stress q, but show a good agreement in terms of volumetric strain \(\varepsilon _v\). Compared to the manually determined parameters, which serve here as a reference parameter set, the automatically calibrated parameters can be attested a similarly good agreement for the oedometric compression tests. In contrast, the automatically calibrated parameters render better simulation results for the oedometric compression tests than the parameters calibrated by means of ExCalibre. The largest deviations between both reference parameter sets and the parameters obtained by AC can be seen in the simulation results of the drained monotonic triaxial tests. While the reference parameter sets show a good prediction of the peak deviatoric stress for all relative densities, in the case of the AC there is the above-mentioned overestimation of the peak deviatoric stress for the very dense sample. For all other samples, the maximum deviatoric stress is well predicted. With respect to the residual shear strength, there is a slightly better agreement for the parameters obtained by AC. The largest difference is observed for the predicted volumetric strain \(\varepsilon _v\). Here, the simulations with the automatically calibrated parameters achieve a significantly better agreement with the values measured in laboratory tests. Overall, the parameter sets obtained by AC can be attested a better agreement with the laboratory tests than both reference parameter sets.

For the parameters calibrated by DE, a very small deviation for the “worst fit” (\(\epsilon =0.0884\)) and the “best fit” (\(\epsilon =0.0854\)) can be observed. This indicates good reproducibility and is in good agreement with the low scatter of the cost function in Fig. 4. From a user perspective, both parameter sets would be equally suitable, as measured by the comparisons shown in Fig. 5. A different picture emerges for the parameter sets optimized by means of PSO. Already in Fig. 4 it became clear that results obtained using PSO show a significantly larger scatter than the ones obtained using DE. This comparatively large deviation in the error function is also reflected in the reproducibility of the simulation results, as can be seen in Fig. 6. In general, both DE and PSO are capable of determining parameters of comparable quality. Also to be emphasized at this point is that only for \(h_s\), n, \(e_{c0}\) and \(\alpha \) (for the “best fit”) clearly different parameters of PSO and DE are determined. The other parameters are comparable. However, this good agreement in terms of simulation results only refers to the best of 200 calibration runs. The fact that not every calibration run achieves the same final value of the error function \(\epsilon \) has already been shown in Fig. 4. Compared to the DE, the optimization by means of PSO gets stuck in local minima more often. This could possibly be improved by an adjustment of the optimization constants (see Sect. 2.3.2). However, there are no general rules according to which these constants have to be chosen. The only possibility would be to search for better optimization constants by trial and error. However, this is very time-consuming and contradicts the actual goal of AC: simplification of the calibration process. Due to the significantly lower scatter, optimization by means of DE is preferred for further comparisons.

5 Influence of the objective function

The objective function, which is to be minimized in the course of the optimization, consists of two components: the similarity measure used to evaluate the discrepancy between the experimental data and the simulation results as well as the weighting of the influence of different laboratory experiments. To the authors’ best knowledge, neither the influence of the similarity measure nor the influence of the applied weighting factors were yet investigated in the light of AC of soil models.

In what follows, the performance of five widely used approaches to quantify the discrepancy between experimental data and simulation results for their use in automatic parameter calibration will be shown. These approaches are namely the Area Between Two Curves (ABTC), the Dynamic time warping (DTW), Fréchet Distance (FD), Curve Length Measure (CLM) and the Modified Least Square (MLS), see Sect. 2.2. Since the similarity measures are defined differently, a comparison of their results in terms of a comparison of a numerical value is not possible. Instead, their applicability must be based on a personal comparison of experiment and simulation. This seems like a step backwards, since it is this subjective comparison carried out by an engineer that is supposed to become superfluous by the development of a calibration software. However, this path is inevitable. Taking into account the study of the influence of the optimization algorithm presented in Sect. 4, the calibrations required for this purpose were performed using DE. The parameters were calibrated for Karlsruhe Sand and the results of this study are shown in Fig. 7 for the variation of weighting factors and in Fig. 8 for the variation of similarity measures.

Fig. 7
figure 7

Influence of the weighting factors on the results of the AC. Simulation results (blue) vs. experimental data (black) for seven monotonic drained triaxial tests (left and middle) and two oedometric compression tests (right) on Karlsruhe Sand using DE with the Fréchet Distance

As can be seen from Fig. 7 the overestimation of the maximum deviatoric stress, especially visible for the very dense sample, can be reduced by modifying the weights in favor of a better fit in the \(q-\varepsilon _1\) plane (\(w_{dmt}^{\varepsilon q}=2/3, ~w_{dmt}^{\varepsilon \varepsilon }=1/3\) and max(\(w_{dmt}^{\varepsilon q},w_{dmt}^{\varepsilon \varepsilon }\))). However, this leads to an underestimation of residual deviatoric stress and a slightly worse fit in terms of the behavior in the \(\varepsilon _v-\varepsilon _1\) plane, especially in case of the max(\(w_{dmt}^{\varepsilon q},w_{dmt}^{\varepsilon \varepsilon }\))-weighting. Not surprisingly, the best overall agreement in terms of the behavior in the \(q-\varepsilon _1\) plane is observed for the calibration that considers only the discrepancy in the \(q-\varepsilon _1\) plane to evaluate the fitness in the triaxial tests (\(w_{dmt}^{\varepsilon q}=1, ~w_{dmt}^{\varepsilon \varepsilon }=0\)). However, this comes at the price of a noticeable discrepancy between simulation and experimental results in the \(\varepsilon _v-\varepsilon _1\) plane. It should be noted that this is due to the inability of the hypoplastic constitutive model to capture both the behavior in the \(q-\varepsilon _1\) and in the \(\varepsilon _v-\varepsilon _1\) plane equally well with one set of parameters, and is not related to the algorithm. In the authors’ experience, the default equal weight settings (\(w_{dmt}^{\varepsilon q}=1/2, ~w_{dmt}^{\varepsilon \varepsilon }=1/2\)) work very well for other constitutive models, such as Sanisand [9] for instance. For the forthcoming simulations, setting \(w_{dmt}^{\varepsilon q}=2/3, ~w_{dmt}^{\varepsilon \varepsilon }=1/3\) is considered to be a good compromise between reducing the overestimation of the peak deviatoric stress and still providing good agreement in the \(\varepsilon _v-\varepsilon _1\) plane.

It should first be noted that most similarity measures are capable of producing acceptable parameter sets, see Fig. 8. Probably the worst results, measured by the comparison of the simulation results with the experimental data, are noted for the determination of the error by means of ABTC and CL. In both cases, a very significant overestimation of the peak deviatoric stress is observed for initially dense soil samples. In terms of the \(\varepsilon _v-\varepsilon _1\) plane on the other hand, both simulations show the best agreement with the experimental data. However, the deviations in the \(q-\varepsilon _1\) plane in the triaxial tests outweigh this. Thus, further use of the ABTC and DTW is omitted. The best agreement in the \(q-\varepsilon _1\) plane is observed for the calibration using the MLS similarity measure. Both peak and residual shear strength are well-matched in the simulations. However, these simulations show a significant underestimation of the dilatancy in the \(\varepsilon _v-\varepsilon _1\) plot. A compromise between good agreement with the experiments in the \(q-\varepsilon _1\) and \(\varepsilon _v-\varepsilon _1\) plots is shown by the calibrations with the CLM and FD similarity measures, with a slight preference for FD. In terms of agreement with the oedometric compression tests, on the other hand, the worst agreement is obtained with the MLS approach since the simulation results show a too soft material response at higher effective stress (\(\sigma _v > 100\) kPa). The simulation results obtained with the other similarity measures show no noticeable differences in the oedometric compression tests and reveal very good agreement with the experiments.

Fig. 8
figure 8

Influence of the similarity measurement on the results of the AC with “2/3, 1/3”-weighting. Simulation results (blue) vs. experimental data (black) for seven monotonic drained triaxial tests (left and middle) and two oedometric compression tests (right) on Karlsruhe Sand

From the authors’ perspective, the use of the similarity measure Fréchet Distance (FD) with \(w_{dmt}^{\varepsilon q}=2/3, ~w_{dmt}^{\varepsilon \varepsilon }=1/3\)-weighting (from now on referred to as “2/3, 1/3”-weighting) for the calibration of the hypoplastic constitutive model leads to the overall best agreement between simulation and experiment on average. Regarding the performance of the hypoplastic model in the oedometric compression tests, the AC was—regardless of the chosen similarity measure—able to either improve or maintain the simulation quality compared to the reference parameter sets calibrated “by hand” and using ExCalibre (compare Figs. 2, 7 and 8). To check whether these settings also give good results for another sand, the parameters of the UWA Silica Sand are calibrated (using FD and “2/3, 1/3”-weighting). The simulation results obtained with this calibration are compared with the experimental data in Fig. 9. By calibrating the parameters by means of AC, significant improvement of the agreement between simulations and laboratory tests can be achieved for the UWA Silica Sand compared to the results obtained with the reference parameters. This is especially true for the triaxial tests and applies to both the behavior in the \(q-\varepsilon _1\) plane but also (and especially) in the \(\varepsilon _v-\varepsilon _1\) plane. The overestimation of the peak deviatoric stress for dense samples can also be observed in these simulations, although not as pronounced as for the Karlsruhe Sand. As already stated, this is due to the inability of the hypoplastic constitutive model in representing both the behavior in the \(q-\varepsilon _1\) and the \(\varepsilon _v-\varepsilon _1\) planes equally well. For the oedometric compression tests, a slight or a significant improvement is observed compared to the simulations using parameters calibrated by means of ExCalibre and “by hand,” respectively. The resulting parameters for the optimization using the Fréchet Distance are summarized in Table 8.

Table 8 Parameters of the Hypoplastic model for Karlsruhe Sand and UWA Silica Sand obtained from DE with Fréchet Distance and “2/3, 1/3”-weighting
Fig. 9
figure 9

Comparison of experiments (black) and simulation results (blue) for oedometric compression tests and drained monotonic triaxial tests on samples of UWA Silica Sand with different initial densities and different initial mean effective stresses. The hypoplastic parameters were calibrated using DE with FD and “2/3, 1/3”-weighting

6 Range of parameter values

The final parameters for Karlsruhe Sand obtained from 500 AC runs using DE in combination with the Fréchet Distance and “2/3, 1/3”-weighting are shown as a scatter plot in Fig. 10. Looking at Fig. 10, the large scatter of the values determined for \(h_s\) is striking. If the standard deviation is determined for all parameters, these take the following values in relation to the mean value: the largest standard deviation of approximately 19 % is determined for the parameter \(h_s\), followed by approx. 4.2 for \(e_{i0}\), 2.8 % for n as well as 2.2 % and 1.8 % for \(\alpha \) and \(\beta \), respectively. The standard deviation for the parameters \(\varphi _c\), \(e_{c0}\) and \(e_{d0}\) is less than 0.5 %. This non-negligible variance can be associated with the parameter uncertainty and the very wide limits chosen for the parameters \(h_s\), n, \(\alpha \) and \(\beta \). In particular for \(h_s\) and n lower standard deviations could be achieved by choosing narrower bounds, see for example [36]. However, this would have the consequence that the bounds for different sands would have to be examined anew each time. Considering the results presented in the previous sections, this is not necessary, since the parameters determined by means of DE show a good agreement with the experimental results (see Fig. 5) as well as an extremely low scatter with respect to the simulation results (despite strongly differing parameters). This is due to the lack of physical significance of the parameters. At this point, the name granular hardness for the parameter \(h_s\) is certainly misleading. \(h_s\) is a fitting parameter which (together with n) describes (among other relations) the development of the critical void ratio with respect to mean effective stress.

Fig. 10
figure 10

Collection of scatter plots showing the mutual distribution of different parameter sets considering the results from 500 runs using DE with the Fréchet Distance and “2/3, 1/3”-weighting for Karlsruhe Sand

Another striking feature in Fig. 10 is the apparently strong correlation of various parameters. These are particularly noticeable for the parameter pairs \(n-h_s\), \(\beta -h_s\), \(e_{c0}-n\), and \(e_{c0}-\beta \). The correlation \(h_s-n\) is certainly known to exist [19] and is in good agreement with the linear correlation presented in [36]. However, the comparison with the results shown in [36] should be treated with caution, since therein much tighter bounds were chosen for the parameters \(h_s\) and n, and the parameters were determined for synthetic laboratory tests. However, the general statement of [36] that potential correlations exist between some parameters of the hypoplastic constitutive model can be confirmed.

7 Required test data

The calibrations presented in the previous sections were done based on the entire data set, i.e., taking into account all available experiments. Accordingly, the parameters determined in this way represent the soil behavior under these test conditions very well, see Figs. 8 or 9. However, such comprehensive laboratory data is not always available, especially when dealing with practical problems or at the beginning of new project phases. In these cases, the question of the (minimum) number of tests required for a (first) calibration of the parameters of a constitutive model often arises.

The AC offers a solution for this. This is exemplarily shown for the determination of the parameters for Karlsruhe Sand. For this purpose, four additional calibration runs were performed using DE with the Fréchet Distance and “2/3, 1/3”-weighting. Each run considered only a part of the experimental data. Which experiments were used for the calibration varied from run to run. The different combinations are shown in Table 9. The parameters determined on the basis of all experiments serve as the reference parameter set, see Sect. 4.

Table 9 Overview of the laboratory tests considered in the different calibration runs

The results of the different runs are given in the form of a comparison with the experimental data in Fig. 11. Blue are the simulation results which belong to a laboratory test considered in the calibration. Red, on the other hand, are those whose experimental counterpart was not used in the calibration. These correspond to “blind predictions.”

Fig. 11
figure 11

Comparison of experiments (black) and simulation results (colored) for the parameter sets based on different sets of laboratory tests as basis for the calibration. Blue: tests used for calibration (training data), Red: tests not used for calibration (test data)

The results shown in Fig. 11 show that considering only an oedometric compression test on an initially loose sample in the AC results in a parameter set that underestimates the stiffness of dense samples (see Data Set 3). Obviously, as can be seen from Data Set 4, it is not possible to completely dispense with oedometric compression tests for calibration. Furthermore, the result of the calibration with Data Set 3 seems to allow the conclusion that the influence of the mean effective stress can be reproduced by the hypoplastic model even if no triaxial tests on samples with different mean effective stresses are available. Of course, these observations are not universal. For different sands other conclusions could be drawn, which has to be investigated further. However, they serve as an illustration of the possibilities offered by AC. Aspects of calibration and the influence of individual parameters can be investigated in a comparatively simple way.

8 Conclusions

An automatic calibration software has been developed, which performs the calibration of constitutive model parameters by solving a regression problem. It is based on the minimization of a cost function that quantifies the discrepancy between experiment and simulation. For the hypoplastic model, this is a nonlinear and non-differentiable multi-peak function. Five different methods were used to quantify the discrepancy: Area Between Two Curves, Dynamic time warping, Partial Curve Mapping, Fréchet Distance and Curve Length Measure. Two heuristic optimization algorithms were used to solve this minimization problem: Differential Evolution (DE) and Particle Swarm Optimization (PSO). Parameter calibration was based on oedometric compression tests and drained monotonic triaxial tests. However, the consideration of other test types is also possible. The findings can be summarized as follows:

  • In principle, parameter sets obtained with both PSO and DE showed good agreement with the experimental data. However, the DE showed a much lower scatter and is therefore the recommended optimization algorithm.

  • The choice of the similarity measure to quantify the discrepancy between laboratory test and simulation was found to be anything but trivial. The same applies to an appropriate choice of factors for weighting different aspects of soil behavior (\(q-\varepsilon _1\) plane vs. \(\varepsilon _v-\varepsilon _1\) plane). In the authors’ opinion, the combination of the Fréchet Distance with a “2/3, 1/3”-weighting showed the best results and was thus chosen for the remaining comparisons. However, as already stated, this is influenced by the disadvantages of the hypoplastic constitutive model in representing both the behavior in the \(q-\varepsilon _1\) plane and the \(\varepsilon _v-\varepsilon _1\) plane equally well. Thus, the question of the most appropriate method to quantify the discrepancy between simulation and experiment cannot be answered conclusively. In particular with regard to the calibration of the “cyclic parameters” individual similarity measures could turn out to be unusable. The study of these issues is the subject of future work.

  • The scatter of the determined parameters was examined using 200 repeated calibration runs. It was shown that especially for the parameter \(h_s\) with a standard deviation of up to 19 % a large variance exists. However, this has only little influence on the quality of the simulations which are carried out with these parameters. Using DE, the difference between the simulations using different parameter sets obtained by AC is negligible. This shows that the actual value of individual parameters does not have a large influence on the prediction of the constitutive model for the element tests considered, as long as the totality of the parameters is taken into account in the calibration.

The automatic calibration is able to successfully calibrate parameters for advanced constitutive soil models. This was demonstrated for the hypoplastic model for sands. Automatic calibration reduces the entry barriers for the application of advanced constitutive models and the associated effort (of calibration) and is able to find a better parameter set compared to hand calibration.