Comparison of reliability- and design-based code calibrations

Some of the early probabilistic code calibrations minimized an error term defined as the difference between the optimal or target design and the design resulting from the trial partial factors and the code format. In contrast to these design-based calibrations, in recent years the error has been defined in terms of a reliability deviation. Aiming at minimizing societal costs, reliability-based calibrations probably provide more accurate results than design-based code calibrations, however at the cost of a significantly lower computational efficiency. The long duration of reliability-based code calibrations impedes repeated code calibrations, which are needed in the development of a code format or for a more profound understanding of it. This paper compares the reliability-based with the design-based code calibration procedure, both theoretically and by means of an example. The results show that the design-based code calibration is efficient for the development of a code format. However, the differences between the two calibration procedures regarding the calibrated partial factors are significant. As the reliability-based calibration has a more solid theoretical background, it is preferred for the final calibration of the code format.


Code calibration levels
The fact that structural codes are not constantly under discussion is the consequence of their appropriate calibration. The calibration defines the compromise between the maximization of safety and the minimization of construction costs, while keeping the code format simple to avoid unnecessary complexity and errors in its application. Four complexity levels for code calibration targets and design verification can be distinguished, following [25,1]:

Level 4 Risk-informed approach
Level 3 General reliability-based approach
Level 2 Simplified reliability-based approach
Level 1 Semi-probabilistic approach

Level 4 balances safety against construction costs by minimizing the total expected costs of a building (e.g. [26]). Simplified, the total costs consist of the construction costs and the costs associated with a failure and its probability (i.e. the risk). With increasing safety level, the probability of failure of a structure, and thus also the risk, reduces roughly inversely proportionally, while the construction costs increase nearly linearly (Fig. 1). The probability of failure is assessed in a reliability analysis and decreases strictly monotonically with increasing safety. Thus the optimum level of safety, i.e. where the total costs are minimum, can also be expressed through the probability of failure or, respectively, the reliability index β.
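To make the level 4 idea concrete, the following minimal sketch minimizes a hypothetical total expected cost over the reliability index; the cost figures and the simple cost relations are illustrative assumptions only and are not taken from [26] or Fig. 1.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Hypothetical cost figures (illustrative only, not from the paper).
C0 = 100.0            # fixed construction cost
dC = 20.0             # marginal construction cost per unit of reliability index
C_fail = 1.0e6        # cost associated with a failure

def total_cost(beta):
    construction = C0 + dC * beta        # construction cost grows roughly linearly with safety
    pf = norm.cdf(-beta)                 # probability of failure for reliability index beta
    return construction + pf * C_fail    # total expected cost = construction + risk

opt = minimize_scalar(total_cost, bounds=(1.0, 6.0), method="bounded")
print(f"cost-optimal reliability index: beta ~ {opt.x:.2f}")
```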
Basically, any design could be optimized based on a level 4 cost optimization. However, this would be extremely cumbersome for practical application, as a probabilistic, risk-based analysis would be required. Thus, optimal reliability indices were derived for certain consequence classes and relative costs of safety measures (e.g. [17]). These reliability indices are targets for design verification on levels 2 and 3, where the reliability index of the structure is required to be higher than or equal to the target reliability index. Level 2 reliability analyses are basically of historical interest, since all uncertainties must be represented by normal distributions and the limit state function must be a linear combination of them [25]. Level 3 methods do not have these limitations. However, most methods used in a level 3 verification, e.g. the first order reliability method (FORM, [14]), provide approximations of the correct result when used with non-normal distributions. Even if level 3 methods are available in modern software, their use would still be too cumbersome for the daily practice of structural engineers. Thus, modern codes use a semi-probabilistic code format (level 1), e.g. "load and resistance factor design" (LRFD, [13]) or "partial safety factor design" [11] (it could be argued that the term "partial factor" is overly general and not specifically related to safety; this paper uses the standardized term).
As any design verification method on a certain level is a simplification compared to higher level methods, accordingly designed structures are generally not optimal with respect to higher level targets. Therefore, a code calibration of a lower-level code format is an optimization such that the corresponding designs deviate as little as possible from the target set on the higher level. For the general applicability of the code format, the calibration process aims at homogenizing the safety level among a set of reference structures i that are within the scope of the code (in this paper, a structure relates to the limit state function of the dominant failure mode; 'structures' are sometimes also called 'systems'). Thus a code format is usually calibrated by minimizing an error term E through the variation of the calibratable factors. This paper focuses on semi-probabilistic code formats (level 1) such as the Eurocodes [8], and the partial factors γ_j are to be calibrated (Eq. 1). The error E is defined in terms of the level against which the code format is calibrated. Semi-probabilistic code formats (level 1) are probably most commonly calibrated against a target reliability β_t (level 3). The error term E can then be defined as shown in Eq. (2), where β_i(γ_j) denotes the reliability index reached with the given set of partial factors γ_j for the reference structure i. For a calibration against a level 4 'target', i.e. minimal costs per structure c_i,min (including construction costs, risk of failure and benefit of the structure during the chosen time horizon, see e.g. [15]), the error term could be written as in Eq. (3). The additional divisor normalizes the error of each structure, such that there is no inadvertent weighting by the cost of a structure. Conversely to reliability-based (Eq. 2) or cost-based (Eq. 3) calibration, some semi-probabilistic codes were calibrated against antecedent codes, e.g. the Eurocode, as stated in [8] (C4.4). Thereby the error term was specified in terms of the design variable d, a geometrical property, which for example is the section modulus for a beam in bending or the cross section for a (stocky) column in compression (d can also be understood more generally as a design or dimension). Such a calibration is referred to as design-based calibration. Trying to assign a complexity level to such a calibration within the aforementioned scheme, level 0 would probably be appropriate. As a perfect match of the target designs d_i,t is impossible for the whole scope of the code also in a design-based code calibration, an optimization is needed. Analogous to Eq. (3), the error term E can be defined as in Eq. (4) and expresses the relative deviation of the actual designs from the target designs in a calibration against antecedent codes. Please note that in all the error definitions above, deviations from the target are treated symmetrically, i.e. a deviation on the unsafe side (a negative deviation) is weighted the same as a deviation on the safe side (a positive deviation). Whether this is reasonable or whether negative deviations should be additionally penalized can be discussed controversially and also depends on the error definition used. Appendix A in [1] provides a good overview of different error definitions and their meaning.
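For illustration, the two kinds of error definitions can be sketched as follows; the weighting and normalization below are simplified assumptions and do not reproduce Eqs. (2)-(4) verbatim.

```python
import numpy as np

def error_reliability(beta, beta_t, w=None):
    # Eq.-(2)-like error: weighted squared deviation of each structure's
    # reliability index beta_i(gamma) from the target beta_t.
    beta = np.asarray(beta, dtype=float)
    w = np.ones_like(beta) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * (beta - beta_t) ** 2))

def error_design(d, d_t, w=None):
    # Eq.-(4)-like error: weighted squared relative deviation of the designs
    # d_i(gamma) obtained with the code format from the target designs d_i,t.
    d, d_t = np.asarray(d, dtype=float), np.asarray(d_t, dtype=float)
    w = np.ones_like(d) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * ((d - d_t) / d_t) ** 2))

# Deviations are treated symmetrically: +/-0.2 in beta contributes the same error.
print(error_reliability([4.3, 4.7], beta_t=4.5))   # 0.08
```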
An important difference between a design-based calibration towards antecedent codes (as mentioned above) and any of the other calibrations is the calculation of the error, i.e. the comparison with the target. In level 3 and 4 calibrations, the comparison always requires the calculation of each structure's reliability, which is probably the most objective parameter that a comparison can rely on. In contrast, in a design-based calibration towards an antecedent code, that code's inherent reliability level and its scatter among structures are implicitly deemed satisfactory without being assessed explicitly.

Relative and absolute calibration targets
As proposed by [10], code calibration targets can be classified into absolute and relative. A calibration towards an antecedent code is a relative calibration, as the inherent reliability level of this code is deemed satisfactory without the actual reliability being assessed. In the above example of the Eurocode calibration, the relative target was set directly on the design variable; however, it could as well be expressed in terms of the reliability index. In such a relative, but reliability-based code calibration, the mean reliability level of the antecedent code is assessed and used as the relative target for the new code calibration. In contrast, an absolute reliability target would be derived from a cost optimization on level 4, as discussed before. In order that an absolute code calibration is consistent in itself, the reliability assessment must be able to return the unbiased, 'true' probability of failure of a structure, as this is assumed in the cost optimization. Any bias in the reliability model thus has a direct impact on the calibrated result. In contrast, with any kind of relative target, model biases will appear both in the derivation of the target as well as in the reliability analysis of the calibration and thus will roughly cancel out. A relative code calibration therefore tends to allow the reliability model to be cruder, as long as the level of crudeness within the model is consistent [7]. The drawback, on the other hand, is that the absolute reliability level stays unknown. For absolute code calibrations, it is important that a 'structure' represents the same in the derivation (i.e. cost optimization) of the target as in the code calibration, e.g. the dominant failure mode (as in [26]) or the complete structure.

Objective
Optimization as in Eq. (1) is required in any calibration approach of a semi-probabilistic code format, since the code format is too simplistic to meet the exact target for arbitrary structures. Therefore, it is important to understand which structures deviate, by how much, and why. This can help to reduce the calibration error, e.g. by a redefinition of the scope of the code format or by changes in the code format itself. For an in-depth understanding of a code format over its scope, repeated calibrations, e.g. with a subset of the reference structures, are often unavoidable. Therefore, computationally efficient calibration procedures are necessary.
This paper compares two calibration procedures with reliability-based targets: the common reliability-based (level 3) calibration procedure (e.g. [28]) and the design-based (level 0) calibration procedure (e.g. [4]). In Section 2, the calibration procedures are explained and compared under theoretical considerations regarding computational efficiency and accuracy. Section 3 introduces the setting for an exemplary code calibration, which is then used in Section 4 to compare both procedures in an exemplary application.

Calibration procedures
The code calibration process towards reliability-based targets can be split into three components (the colors and line styles noted in parentheses are used in Figs. 2, 3, 9 and 10 to differentiate the components):

i) Code calibration (blue, solid). The code calibration is the main component and interacts with the other two (sub-)components, i.e. the code format and the reliability analysis. The first step is to define the target reliability and the scope of the calibration. The target reliability can either be a relative or an absolute target, as discussed in Section 1. The calibration is executed for a set of reference structures that shall represent the scope of the code format. Each reference structure must define the properties that are needed in the limit state function(s). In calibrations for universal structural code formats (e.g. load safety factors to be applied with various materials), the limit state function and thus also the reference structures can often be expressed in generic terms (see also Section 3 or e.g. [1,2,20]). The generic representation can not only represent load ratios as stated in the aforementioned literature, but should actually be understood more generally, as it can also represent different kinds of load effects (e.g. compression or bending or their combination). In this way, also future structures, i.e. structures that are not known at the time of calibration, are covered in the calibration. Finally, generic structures can often be defined with dimensionless parameters, since the mean values of loads and resistances cancel out as they appear in the code format as well as in the reliability analysis. That means that loads and resistances can be represented by probability distributions with a mean of unity and a certain coefficient of variation (CoV). This is very useful since it shows, for example, that different steel or timber grades are identical in terms of their reliability, as long as they have the same CoV and distribution type. Thus, in the set of reference structures, any normalized/dimensionless parameters do not need to be varied anymore, which reduces the required number of reference structures I by orders of magnitude: assuming p parameters are varied n times each, a total of I = n^p reference structures is required. Note that some structures (respectively limit state functions) cannot be represented by a generic formulation, e.g. the structural resistance of timber in fire [10]. In such situations, the number of varied parameters in the reference structures will be significantly higher and the computational cost increases by orders of magnitude. The actual calibration is the minimization of the measure of closeness, i.e. the error (e.g. Eqs. (2)-(4)), by adapting the partial factors. Obviously, in each iteration, the code format is evaluated for every reference structure i. Finally, it is important to analyze the remaining errors of the reference structures in order to recognize excessive or systematic deviations. As every error is the result of the simpler code format compared to the target it is calibrated to, this analysis gives insight into the simplifications the code format is making and may point out where oversimplification should be alleviated, either by a reduced scope of the code format or by adjusting it.

ii) Code format (green, dashed). The code format defines the format for the verification of each type of reference structure; that means it also defines which fractiles of the random parameters are used, which partial factors exist and how/where they are applied. During the calibration, the code format is used to calculate the necessary design d_i (i.e. the geometry) for a given reference structure, such that the verification is exactly met with a given set of partial factors. The design can be represented, for example, by the cross section of a structural member under normal loads or the section modulus of a bending member.

iii) Reliability analysis (pink, dotted). The reliability analysis assesses the reliability β_i of a certain reference structure for a given design d_i. For each type of structure, a reliability model consisting of the limit state function and the probabilistic distributions must be defined. Usually, it is important that the reliability model does not only cover the aleatoric and epistemic uncertainties of each parameter, but also includes a term for the model uncertainty [18]. In order to avoid a bias (see Section 1.2), the reliability models must/should be the same, or equivalent in terms of their crudeness, as those used for the target derivation. With relative targets derived from antecedent codes, this is always feasible. In contrast, this bias is often unavoidable with absolute targets given in codes (e.g. [8]), since the underlying models and assumptions are not specified. Finally, different reliability methods such as the first order reliability method (FORM, [14]) or Monte Carlo simulation (MCS, described e.g. in [27]) can be operated on the reliability model to calculate the reliability index.
All three components are used in both investigated calibration procedures. The difference lies in the call sequence, as shown next.

Procedure 1: Common reliability-based code calibration
Fig. 2 shows a flow chart of the commonly used reliability-based code calibration procedure (e.g. [9,25]). Initially, the reference structures, the target reliability, the code format as well as the reliability models must be defined. Then the actual calibration, i.e. the iterative optimization of the partial factors, begins with an initial guess of the latter. Each reference structure is then 'designed' according to the code format using the chosen partial factors γ_j. Afterwards, the reliability of each structure given that design d_i is assessed and, finally, the deviation between the reliabilities and the target, i.e. the error, is calculated. By changing the partial factors in each iteration, the error term is minimized and the optimum partial factors can be found.
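Schematically, procedure 1 can be sketched as follows (a Python sketch under assumptions, not the MATLAB/UQLab implementation used later in this paper); design_from_code and beta_of_design stand for assumed helper functions representing the code format and the reliability analysis.

```python
import numpy as np
from scipy.optimize import minimize

def reliability_based_calibration(structures, beta_t, design_from_code,
                                  beta_of_design, gamma0, bounds, w=None):
    # Procedure 1 (Fig. 2): the reliability analysis sits *inside* the
    # optimization loop, so each iteration costs I code-format evaluations
    # plus I (expensive) reliability analyses.
    w = np.ones(len(structures)) if w is None else np.asarray(w, dtype=float)

    def error(gammas):
        betas = np.array([beta_of_design(design_from_code(gammas, s), s)
                          for s in structures])
        return np.sum(w * (betas - beta_t) ** 2)   # Eq.-(2)-like error term

    res = minimize(error, gamma0, bounds=bounds)   # cf. fmincon in the paper
    return res.x
```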
The time required for the calibration of the partial factors can roughly be expressed by formula (5), where I is the number of reference structures. The number of iterations needed (k) depends on the number of partial factors that are calibrated and might also depend slightly on the number of structures. Only the times required for each reliability analysis (t_r) and for each code format evaluation (t_c) are taken into account, since other processes take comparably little time. It should be noted that generally t_c << t_r, since the code format evaluation consists of just simple mathematical expressions, whereas the reliability analysis is itself a lengthy and often iterative process.
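With the quantities just introduced, a plausible rough form of formula (5) is the following (an assumed reconstruction, not quoted verbatim):

t_1 ≈ I · k · (t_r + t_c)

i.e. each of the k optimization iterations requires one code format evaluation and one reliability analysis per reference structure.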

Procedure 2: Design-based code calibration with reliability-based targets
As introduced in Section 1, the intention of this calibration procedure is to combine the computationally fast code calibration of design-based optimization (Eq. 4) with a reliability-based target. Therefore, the reliability target has to be translated into a target design. Fig. 3 shows this process on the left (part 1). It is based on the idea that for each reference structure there is a design associated with the target reliability. Unfortunately, deriving the necessary structural design for a given target reliability is in general not possible directly, as the calculation of the reliability is a one-way function, i.e. the reliability can be calculated for a certain design, but not vice versa. Thus, after having defined the target reliability, the reference structures and the reliability model, the design associated with the target reliability (d_i,t) must be found iteratively. That means that in every iteration the reliability is assessed for a given trial design until the resulting reliability matches the target. This procedure is carried out for every reference structure. As the reliability analysis is time consuming, this process takes a long time. After that, the actual calibration of the partial factors can be done (Fig. 3, part 2). This second step is computationally fast, as only the code format is repeatedly evaluated.
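Schematically (again with design_from_code and beta_of_design as assumed helpers), the two parts of procedure 2 could look as follows:

```python
import numpy as np
from scipy.optimize import brentq, minimize

def target_designs(structures, beta_t, beta_of_design, d_lo=0.1, d_hi=10.0):
    # Part 1 (Fig. 3, left): per structure, find d_i,t with beta(d_i,t) = beta_t
    # by root finding over the (expensive) reliability analysis.
    return np.array([brentq(lambda d: beta_of_design(d, s) - beta_t, d_lo, d_hi)
                     for s in structures])

def design_based_calibration(structures, d_t, design_from_code, gamma0, bounds, w=None):
    # Part 2 (Fig. 3, right): minimize the Eq.-(4)-like error; only the cheap
    # code format is evaluated, so this part reruns nearly instantly for a
    # changed code format or a subset of the reference structures.
    w = np.ones(len(structures)) if w is None else np.asarray(w, dtype=float)

    def error(gammas):
        d = np.array([design_from_code(gammas, s) for s in structures])
        return np.sum(w * ((d - d_t) / d_t) ** 2)

    return minimize(error, gamma0, bounds=bounds).x
```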
Assuming it takes l iterations to find the target design per structure, the time required for part 1 can be expressed as shown in Eq. (6). The duration of the actual calibration (part 2) is expressed in Eq. (7). Assuming that the number of iterations needed for the derivation of the target is the same as needed for the calibration (k = l), the total time for a calibration is identical for both calibration procedures. The important difference, however, is that with the design-based calibration the code optimization does not involve reliability analyses anymore. This decoupling of the code format application from the reliability analysis allows a repeated calibration, e.g. with another code format or with a subset of reference structures, but for the same target reliability, to run nearly instantly, while with the common reliability-based calibration procedure also the reliability analysis would have to be repeated. Additionally, calibrations have shown that l << k. This is because (1) calculating the design for a certain reliability is a nearly linear root-finding problem and thus l is small from experience, and (2) the number of function evaluations needed (k) in the optimization of the partial factors is significantly higher (e.g. 60), as this is a multidimensional minimization problem. Additionally, k increases with the number of partial factors that are optimized, since more gradients have to be calculated in each iteration.
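Analogously, plausible rough forms of Eqs. (6) and (7) are (again assumed reconstructions, not quoted verbatim):

t_2,part1 ≈ I · l · t_r and t_2,part2 ≈ I · k · t_c

so that for k = l the total t_2,part1 + t_2,part2 ≈ I · k · (t_r + t_c) indeed equals the estimate for procedure 1, consistent with the statement above.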
The design-based calibration with reliability targets is not new: in several publications by Ellingwood [4-6], this approach was used some decades ago; however, since then it appears that only the reliability-based calibration procedure shown in Section 2.1 has been used. The reason can only be conjectured. However, [4] states that using procedure 2 was simpler, as the target geometries had already been developed in a previous step of the project. At the same time, the first order second moment (FOSM) reliability method (level 2) was used, where calculating the design for a given target reliability is of similarly (little) complexity as calculating the reliability given the design. This corresponds approximately to l = 1 in the above formula, and thus the design-based code calibration with reliability targets is obviously much faster than the reliability-based code calibration (procedure 1). With the emergence of the more accurate first order reliability method (FORM, level 3, [14]), this advantage was gone, as the computation of the necessary design for a certain reliability index with FORM now required iteration. Thus, under the assumption of k = l, both calibration procedures were identical in terms of computation duration. The reliability-based calibration then probably prevailed, since its error definition (Eq. 2) might appear more logical to a person used to reliability than the definition in terms of a design (Eq. 4).
With the introduction of inverse FORM [3], the calculation of a design corresponding to a certain reliability index became possible and thus the iteration in part 1 of Fig. 3 can be omitted. Hence, the design-based code calibration with reliability targets again has a computational advantage over the common reliability-based code calibration. However, it seems that this advantage has so far never been exploited in a code calibration. It should be noted that the advantage does not apply to reliability methods other than FORM, e.g. not to MCS.
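For illustration, a minimal sketch of the inverse-FORM idea for a simple limit state g(d, x) = d·r − g − q follows; the distributions, parameters and fixed iteration count are illustrative assumptions and do not reproduce the algorithm of [3] in detail. The internal iteration works only on the direction vector and is comparable in cost to a single forward FORM analysis, so no outer root search over trial designs is needed.

```python
import numpy as np
from scipy.stats import norm, lognorm, gumbel_r

# Illustrative distributions (means roughly 1), not the paper's reference structures.
DISTS = (lognorm(s=0.10, scale=1.0),    # resistance r
         norm(loc=1.0, scale=0.05),     # dead load g
         gumbel_r(loc=1.0, scale=0.15)) # live load q (annual maximum)

def x_of_u(u):
    # map standard normal coordinates to physical space (independent variables)
    return np.array([dist.ppf(norm.cdf(ui)) for dist, ui in zip(DISTS, u)])

def inverse_form_design(beta_t, n_iter=30):
    alpha = np.array([-1.0, 1.0, 1.0]) / np.sqrt(3.0)   # initial direction (r, g, q)
    d = 1.0
    for _ in range(n_iter):
        u = beta_t * alpha                 # trial design point on the beta_t sphere
        r, g, q = x_of_u(u)
        d = (g + q) / r                    # solve g(d, x*) = 0 directly for the design
        # update the direction from the gradient of g(d, x(u)) in u-space
        eps, grad = 1e-5, np.zeros(3)
        for i in range(3):
            up = u.copy(); up[i] += eps
            ri, gi, qi = x_of_u(up)
            grad[i] = (d * ri - gi - qi) / eps   # g(d, x(u)) = 0 at u itself
        alpha = -grad / np.linalg.norm(grad)     # alpha points towards decreasing g
    return d

print(f"design for beta_t = 4.5: d ~ {inverse_form_design(4.5):.3f}")
```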

Difference between target reliability and the resulting reliability
In both procedures, when the calibrated partial factors are applied to all the structures, the resulting mean reliability over all structures will not be exactly equal to the target reliability. This difference also depends on the method. As the difference might be unwanted, and for the comparison of both methods, an additional constraint can be added to ensure that the mean reliability over all structures equals the target reliability. Many optimization algorithms implemented in standard software provide this possibility, e.g. 'fmincon' in MATLAB [24] or 'minimize' in SciPy [19] (Python). However, they are usually not efficient for this kind of problem, as additional reliability analyses must be performed for each constraint evaluation and as the number of optimization function evaluations increases.
In reliability-based calibrations, the additional reliability analyses can be saved, since the reliability analyses done in the actual optimization can be reused. To do so, the constraint must be included in the optimization function, i.e. in the minimization of the error. Eq. (8) shows a possible implementation, as e.g. used in [10]. Thereby the constraint is implemented such that for β_mean = β_t the constraint part vanishes, while for β_mean ≠ β_t the unconstrained error term is increased. W is a weight that guarantees that the constraint is met to a satisfactory degree. In this paper, the method is referred to as the 'augmented error term'.
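An Eq.-(8)-like augmented error term could be sketched as follows (the exact form used in [10] is not reproduced; the quadratic penalty below is an assumption):

```python
import numpy as np

def augmented_error(betas, beta_t, W, w=None):
    # Unconstrained Eq.-(2)-like error plus a penalty that vanishes for
    # beta_mean = beta_t and grows otherwise; W controls how strictly the
    # mean-reliability constraint is enforced.
    betas = np.asarray(betas, dtype=float)
    w = np.ones_like(betas) if w is None else np.asarray(w, dtype=float)
    base = np.sum(w * (betas - beta_t) ** 2)
    beta_mean = np.average(betas, weights=w)
    return base + W * (beta_mean - beta_t) ** 2
```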
For design-based code calibrations, the above method is not applicable, since the reliabilities of the structures are not calculated in each iteration. The alternative is the following procedure: a virtual target reliability is iteratively searched such that, after calibrating with this virtual target, the resulting mean reliability equals the target reliability. This is especially efficient for design-based calibrations, as the reliability analyses are needed only after each calibration, but it also works for reliability-based calibrations. Since an increase of the virtual target results in nearly the same increase of the resulting mean reliability, usually not more than three calibrations are needed until the virtual target is found satisfactorily (a custom root-finding algorithm was used; the standard root-finding function of MATLAB, 'fzero', requires more iterations, as it does not exploit the additional knowledge that the gradient between virtual target and mean reliability is nearly 1).
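The virtual-target iteration can be sketched as follows (schematic; calibrate and mean_beta_of are assumed helpers that perform one full calibration and one round of reliability analyses, respectively). It exploits the near-unit gradient between virtual target and resulting mean reliability mentioned above.

```python
def calibrate_with_virtual_target(calibrate, mean_beta_of, beta_t, n_iter=3):
    # Shift the target handed to the calibration until the mean reliability
    # obtained with the calibrated partial factors hits the actual target.
    beta_virtual = beta_t
    for _ in range(n_iter):
        gammas = calibrate(beta_virtual)       # one full (fast, design-based) calibration
        beta_mean = mean_beta_of(gammas)       # reliability analyses only after calibrating
        beta_virtual += beta_t - beta_mean     # unit-gradient correction step
    return gammas, beta_virtual
```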
In this paper, both methods will be used and a comparison between the methods is shown for reliability-based calibrations.

Setting of the exemplary code calibration
In the exemplary code calibration, the load partial factors of a generic code format are calibrated, as was done for example for the Eurocode by [1] (Appendix B). The reference structures comprise four different materials and three live loads. As the resistance partial factors are not calibrated (except in calibrations 12 to 15), any inherent reliability difference between the materials will not diminish. Additionally, the permanent/dead load partial factor (for simplicity, permanent and dead load are not treated separately) is the same for all materials. This tends to result in larger differences between reliability-based and design-based calibrations: the relationship between the reliability β and the design d_i can be regarded as linear for small variations of the values. Thus the two different error terms (Eqs. 2 and 4) are equivalent. However, with increasing differences of the reliabilities between structures, the relationship becomes increasingly non-linear and thus the differences between reliability-based and design-based calibrations will also increase. The comparison setup therefore is rather severe. However, the calibration is still only exemplary and the conclusions drawn might not be applicable to some calibration problems.
The generic structures are defined by a resistance variable r_i, a dead load g_i and a live load q_i. Additionally, a model uncertainty is applied to each variable. The 'load ratio' α_q (0 ≤ α_q ≤ 1) is varied among the reference structures and represents different types of structures and load effects such as beams in bending or columns under combined action. d is the design variable, i.e. a geometrical property. Eqs. (9) and (10) show the limit state function (i.e. the reliability model) and the code format of the generic structures, respectively. Both equations have the same structure: resistance minus dead load minus live load. The main difference is that the limit state function is written with the probability distributions X_u of the uncertain variables u, whereas in the code format the respective characteristic values (denoted by the index k) and partial factors γ_u are applied. Separate partial factors γ_u are applied to each load and to the resistance.
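The exact Eqs. (9) and (10) are not reproduced here; the following sketch assumes a common generic formulation from the cited literature, with the load ratio splitting the total load into a dead and a live part and with model uncertainties multiplied onto each variable.

```python
def design_from_code(gamma_r, gamma_g, gamma_q, r_k, g_k, q_k, a_q):
    # Eq.-(10)-like code format (assumed form): the design variable d is chosen
    # such that the semi-probabilistic verification is exactly met, using
    # characteristic values (index k) and partial factors.
    return gamma_r * (gamma_g * (1.0 - a_q) * g_k + gamma_q * a_q * q_k) / r_k

def limit_state(d, r, g, q, a_q, th_r=1.0, th_g=1.0, th_q=1.0):
    # Eq.-(9)-like limit state (assumed form): resistance minus dead load minus
    # live load, each variable multiplied by its model uncertainty th_*.
    return d * th_r * r - (1.0 - a_q) * th_g * g - a_q * th_q * q
```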
With the aforementioned definition of the limit state function and the code format, the generic structure is represented largely analogously to e.g. [12,1,23].
The combinations of 3 loads, 4 materials and 13 equally spaced values for α_q define the 156 reference structures. The probability distributions for the resistance and the permanent load, the set of 'load ratios' α_q as well as the resistance partial factor γ_r depend on the material. Additionally, the material also defines the weight of the structures. All parameters of the reference structures are given in Tables 1-3 in the Annex. As it is not the intention of this paper to provide results of an actual code calibration for a certain code, the materials and live loads do not represent actual values and thus are just numbered and not named. However, the set of materials and live loads is constructed such that rather strong, but still realistic, differences in the reliabilities appear and thus a severe scenario is created. The error term is analogous to Eqs. (2) and (4), but with added weights w_i. For the reliability-based calibration, the error is given by Eq. (11). It is analogous for the design-based code calibration.
The reliability target in the calibrations was β_t = 4.5. Four load partial factors are calibrated: γ_g is the common partial factor for the dead load, whereas the partial factor γ_q is differentiated according to the live load. All calculations are done in MATLAB. UQLab [22,21] is used for the reliability analyses. The optimization uses fmincon with the interior-point algorithm. Bounds for the partial factors were set to 0.5 and 2. The upper bound is important to prevent the calculated reliabilities from becoming overly high, which would cause problems for the reliability analysis.

Application and comparison of both methods
This section compares the results of various code calibrations with varying setups (Table 4 in the Appendix). Each calibration has a unique number throughout this section. Table 5 in the Appendix shows the numerical results. The interpretation is supported with several plots:

• Box plots (e.g. Fig. 4) are used to show the scatter in reliability of the structures when the calibrated partial factors are applied. The box represents the interquartile range (IQR), the whiskers indicate the minimum and maximum values and the center line marks the median. The star in the box plot denotes the unweighted mean value and thus can differ from the weighted mean values shown in Table 5 in the Appendix. On the abscissa, eight different sets of structures are differentiated. In the leftmost box, all structures are represented. To the right, first the subsets for each live load follow and finally the subsets for each material are shown. Note that thereby each structure appears three times in the plot: once in a live load box, once in a material box and once in the 'all' box.
• Scatter plots (e.g. Fig. 5) are used to analyze the reliabilities of the structures in detail. Each plot shows the result of one calibration. Each structure is represented by one data point, where the materials are differentiated by color and the loads by the symbol used. The load ratio is indicated on the abscissa.
Experience showed that repeated calibrations might have unexpectedly large differences (e.g. 15%) in their duration. Thus, comparing durations is delicate. The duration of design-based code calibrations is always given for the case where the target designs d_t were not precalculated, i.e. the worst case. For a comparison of the computational effort between different reliability-based calibrations, the number of reliability calculations is a more reliable parameter.

Conventional calibration (unconstrained mean reliability)
Calibrations 1 and 2 are the reliability- and design-based calibrations for the setup as described in Section 3. Fig. 4 (box plots) and Fig. 5 (scatter plots) show the reliabilities of all structures associated with the calibrated partial factors. The calibrated partial factors deviate significantly between both procedures, with the partial factors of the design-based calibration being up to 0.1 lower. This is also reflected in the mean reliability, which is 4.52 for the reliability-based calibration and thus slightly above the target, whereas it is 4.39 for the design-based calibration. The same can be seen in the box plots. What stands out, moreover, is that the scatter of the mean reliabilities for the different materials is large. Additionally, the difference between the minimum and maximum reliabilities is lower in the reliability-based calibration, whereas there is a tendency for the IQR to be smaller with the design-based calibration. The reason is probably that the squaring of the error at larger deviations is more pronounced in the reliability-based calibration, which tends to reduce the total span of the structures' reliabilities. At the same time, this means that smaller deviations get relatively more weight in the design-based calibration, which is why the IQR is smaller.
In the material-specific boxes, the lower whiskers are significantly longer than the upper whiskers. From the scatter plots (Fig. 5) it can be seen that this is owed to the low reliabilities of structures with a small load share α_Q, i.e. a low live load compared to the total load. It shows that the dead load partial factor γ_G, which has more impact on the reliability for low load shares, is actually too low for those structures. However, as the optimization shows, a larger γ_G would increase the overall error, especially because roughly 10 out of 13 load shares are very well calibrated this way. The fact that the dead load partial factor loses importance for the result already at a load share of 0.25 comes from the significantly lower uncertainty of the dead load and its modeling compared to the live load. Finally, the larger scatter at lower load shares is also due to the different dead load uncertainties of the different materials combined with a common γ_G. Comparing the scatter between the different materials, it could be expected that material 4 has the highest scatter as it has the lowest weight (Table 3); however, the scatter is larger for materials 2 and 3. This is caused by the range of the load shares, which is lower for materials 2 and 3, where the reliabilities deviate most, as explained above.
Concerning computational time, the comparison between design-based and reliability-based code calibrations shows that the design-based calibration is an order of magnitude faster (152 s vs. 1608 s). Referring to the equations for estimating the duration in Section 2, the difference in the overall duration proves that k >> l. In fact, the difference is large: for the case without good initial guesses, l ≈ 6, whereas in calibration 1, k = 81.
Additionally, when the same design-based calibration is repeated, i.e. the target geometries do not need to be calculated again, the calibration is another order of magnitude faster and takes only 12 s. Thus design-based calibrations are comparatively instant, which helps a lot when performing repeated calibrations, e.g. in the development of an optimized code format. Whether the differences in the calibrated partial factors and in the interpretation of the reliability scatter are small enough for this use is investigated in Section 4.4.

Calibration with constrained mean reliability
In calibrations 1 and 2, the different mean reliabilities hinder a direct comparison. Thus, both calibrations were repeated with constrained mean reliability. The reliability-based calibration was performed once with the virtual target method (calibration 3) and with three different weighting factors W of the augmented error method (calibrations 5 to 7). As all results are comparable, the box plot in Fig. 6 only compares calibrations 3 and 4, where in both calibration procedures the virtual target method was used.
Comparing the box plots in Figs. 4 and 6, it can be seen that the boxes of the constrained calibrations are very similar to those of the unconstrained calibrations, except that they are shifted vertically such that β_mean = β_t. The virtual targets, 4.4787 for the reliability-based and 4.6067 for the design-based calibration, were found in the third iteration in both calibrations. Interestingly, the partial factors did not increase (respectively decrease) by a similar magnitude, neither relatively nor absolutely. For example, in the design-based calibration (4), γ_G increased by 0.0100 (from 1.0681 to 1.0781; 0.94%), whereas the γ_Q's increased by between 0.0582 (from 1.4785 to 1.5367; 3.93%) and 0.0921 (from 1.7906 to 1.8827; 5.15%). Comparing the partial factors between both calibration procedures, the differences changed, but overall they remain significant, even though constraining the mean reliability does render the procedures more comparable. As the shape of the error term must be rather flat in the vicinity of the optimum partial factors, it could be concluded that the reliabilities of the structures would be similar despite the significant differences in partial factors. However, in this exemplary calibration with rather strong inherent reliability differences between the structures, the reliability deviation from the target can be quite large for some structures. Those structures also show significant changes in their reliability when the partial factors change. This can be seen by comparing the whiskers in Fig. 6. Thus, even when the shape of the error term is basically flat around the optimum, which suggests that small changes of the partial factors are not significant, the change in reliability can be high if there are strong inherent reliability differences between the structures. In contrast, when the reliability levels are rather homogeneous, small changes in the calibrated partial factors do not change the reliabilities significantly (see Section 4.3).
Despite performing three calibrations in each virtual target calibration (3 and 4), the duration did not triple. In the case of the reliability-based calibration, the reason is that in every repeated calibration the resulting partial factors of the previous calibration are used as the starting point, which is beneficial compared to partial factors initialized with unity. In the design-based calibration, finding the target design utilizes the same effect by starting the iterations at the target design of the precalculated target reliability closest to the current target. This often reduces the needed iterations (l) to 3. The durations of all calibrations with the augmented error method were in the present case longer than with the virtual target method. Comparing the required reliability evaluations per structure, calibration 7 with the lowest weighting (W = 10^3) still required more evaluations (202) than the virtual target method (187), while at the same time the discrepancy in the mean reliability was larger (i.e. the same discrepancy would have required even more evaluations). With increasing weighting W, the number of required evaluations increases significantly (see calibrations 5 to 7). The reason is that, on the surface described by the error term, the normalized gradient at each sample point is increasingly dominated by the constraint term with increasing weighting W. Therefore, the optimization algorithm also reduces the step size in the direction of the actual optimum, which increases the number of steps needed until the optimum is found.

Dependencies of the method differences
As stated before, the basic setup of the calibration is rather severe for the comparison of the calibration results between both procedures, due to the inherently large scatter of reliabilities. This statement can be verified by performing a calibration with only one material included (calibrations 8 to 11), or alternatively by reducing the inherent reliability differences between the materials (calibrations 12 to 15) by calibrating the resistance partial factors of materials 1 to 3 as well. The results of both approaches are discussed here. The plots show the results for the calibrations with constrained mean (calibrations 10, 11, 14 and 15), while Table 5 in the Appendix also gives the results with unconstrained mean (calibrations 8, 9, 12 and 13). As can be seen in the results table, the mean reliability with the calibrated partial factors hardly deviated from the target in both approaches.
When only material 4 is included in the calibration, the box plots of both calibration procedures are mostly the same (Fig. 7a), while the partial factors nevertheless deviate by up to 0.064 from each other. This shows that in the case where rather many parameters are calibrated compared to the variation in the structures, and thus the inherent reliability scatter is low, the calibration quality is not very sensitive to changes in the partial factors (see also Section 4.2).
In Fig. 7b the result is shown for a calibration with structures of all materials, but where, in addition to the load partial factors, also the resistance partial factors of materials 1 to 3 were calibrated. The partial factor of one material (material 4) was kept constant, as otherwise infinitely many solutions would exist and the optimization would fail. The homogenization of the mean reliabilities per structure becomes clear in the box plot. The scatter of the reliabilities in this approach is significantly reduced compared to calibrations 1 and 2 (Fig. 4), but is much higher than with material 4 only. Nevertheless, the maximum difference of the partial factors was similar (0.066). This may be explained by the fact that many structures deviate very little from the target (see the low IQR), where the error terms E of both procedures (Eqs. 2 and 4) are more comparable than at higher deviations.

Comparison concerning code format optimization
All of the above investigations showed that the results in terms of partial factors differ by a magnitude that is not negligible. However, the main motivation for using a computationally faster alternative to the reliability-based code calibration is the development of a code format. An exemplary development of the present code format is used to investigate this application. It is assumed that the first iteration of the code format had one common partial factor for all live loads and one partial factor for the dead load. The resulting reliability scatter is shown in blue in Fig. 8 (calibrations 16 and 17). For both calibration procedures, the plots clearly show that live load 1 has a higher reliability on average and a higher scatter than the other two live loads. Thus it can be concluded that the code format should be changed to have one partial factor per live load, which is how the present code format was found. From the results of this second calibration (shown in orange; calibrations 1 and 2) it is clear that, for a further improvement of the reliability homogeneity among the structures, also the partial factors of the resistances must be calibrated. Including the resistance partial factors for materials 1 to 3 in the calibration thus leads to the results shown in green (calibrations 12 and 13).
Despite the fact that the calibrated partial factors, the resulting mean reliability and its scatter differ between both calibration procedures, they both lead to the same code format development process. Thus, both calibration procedures are equivalent concerning the code format development, but the design-based code calibration is the preferred procedure, as the time-consuming reliability analysis is decoupled from the partial factor optimization and it runs within a few seconds after the target designs are precalculated, compared to many minutes for a reliability-based calibration. The presented exemplary code format optimization was a rather simple case. However, in more complex calibrations (e.g. with multiple random parameters and additional fractiles to set), the number of iterations until a satisfactory code format is found could be significantly higher. Also, additional questions, such as whether a better calibration result would be possible with a different scope of the code (i.e. a different set of structures), can be answered efficiently with a design-based code calibration procedure.

Note on inherent reliability differences
In all code calibrations in this paper (except where also the resistance partial factors were calibrated), there are significant differences between the mean reliabilities of the different materials. It is probably a good assumption that the reliability scatter among structures of the same material is little affected by the inherent reliability differences between materials. However, the difference affects the calibration result significantly, since it introduces an inadvertent weighting between structures of different materials: assuming all materials had the same weight w, but structures of material 1 had a reliability that is on average significantly higher than the mean of the other three materials, then the contribution of material 1 to the total error E will be significantly higher than the contributions of the other materials. The squaring of deviations in the error term additionally pronounces this effect. The error term of the optimization should actually only represent how well the partial factors are calibrated. However, the inherent reliability of one material increases the error term by an amount that cannot be reduced by the optimization itself and thus acts as a weighting. It is thus important to keep an eye on systematic deviations of the mean reliabilities over parameters for which the reliability homogenization is not intended and for which no calibratable partial factor exists.

Conclusion
In contrast to the predominantly used reliability-based code calibration, the advantage of the design-based code calibration is that no computationally costly reliability analyses need to be done in the iterative optimization of the partial factors, and thus a calibration can be performed within seconds. A reliability-based calibration, instead, would take minutes or even hours. The setup of the exemplary comparison calibrations was chosen such that the expected difference between the calibration procedures regarding the calibrated partial factors was rather large. This was achieved by forcing a large reliability scatter among the reference structures by including structures of four different materials under three different live loads. The calibrations showed that the resulting partial factors of a design-based calibration differ significantly from those of a reliability-based calibration. The resulting reliability scatter of the reference structures is larger for the design-based calibration procedure in terms of the difference between minimum and maximum reliability, but the interquartile range is lower. At the same time, the difference between the calibrated mean reliability and the target reliability was significantly larger than with the reliability-based calibration and was in an unacceptable range, especially for relative code calibrations. Thus, the computational advantages of design-based calibration come at the cost of a larger deviation from the target reliability. The use of a virtual target reliability eliminates this problem and is advised especially for design-based calibrations.
Despite the differences between both calibration procedures, they proved to be similar enough that the design-based code calibration can be used in the development of an appropriate code format. For this application, the computational efficiency of design-based code calibrations is particularly beneficial, since code format optimizations require repeated calibrations with changed code formats or different (subsets of) reference structures for a better understanding of the homogeneity of the reliability achieved with the calibrated partial factors. For the final calibration of the partial factors, only reliability-based calibrations should be used, if required with constrained mean reliability.

Table 5
Results of all calibrations. Setups: see Table 4. The duration of design-based code calibrations represents the case where the target geometries are not precalculated. Note: the number of decimal places given is not representative of the accuracy of a reliability analysis. However, a comparison between calibrations with the same bias is possible.

Fig. 1. Level 4 cost optimization (schematically). Commonly, the safety level at minimum total cost is expressed as a reliability and used as the level 3 calibration or design target.

Fig. 2. Reliability-based code calibration chart. Colors identify the three different components of the process: calibration (blue), code format (green), reliability analysis (purple). The whole calibration process takes place in one iterative optimization. Fig. 9 in the appendix shows a more detailed flow chart of the same. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Design-based code calibration chart with reliability-based targets. Colors identify the three different components of the process: calibration (blue), code format (green), reliability analysis (purple). Part 1 (left) translates the reliability-based target reliabilities into geometrical targets for each reference structure and involves probabilistic reliability analysis. Part 2 (right) is the design-based calibration of the code format, involving deterministic calculations only. Fig. 10 in the appendix shows a more detailed flow chart of the same. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Comparison between reliability-based (blue, calibration 1) and design-based (orange, calibration 2) code calibrations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Comparison between reliability-based (blue, calibration 3) and design-based (orange, calibration 4) code calibrations with constrained mean reliability. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Comparison between reliability-based (blue, calibrations 10/14) and design-based (orange, calibrations 11/15) code calibrations with constrained mean reliability. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Properties of resistance random parameters.

Table 2
Properties of load random parameters. The reference period is one year.

Table 3
Material-dependent weights, resistance partial factors γ_R, and 'load ratio' ranges.

Table 4
Setups of all calibrations. Results: see Table 5.