Systematic derivation of safety factors for the fatigue design of steel bridges

This paper presents a probabilistic framework to derive the safety factors for fatigue of steel and composite steel concrete road bridges. Engineering models are used for the design and the safety factor is derived in such a way that the design meets the target reliability set by international Eurocode and ISO standards, estimated using measured data and advanced probabilistic models. Engineering model uncertainties and dynamic amplification factors are established through comparison of measurements and models. The value of visual inspections is quantified based on observations from practice and expert opinions. The safety factors are derived for Eurocode’s Fatigue Load Model 4 and Eurocode’s tri-linear S-N curve. The study shows that the safety factors for fatigue as currently recommended by the Eurocodes need to be raised.


Introduction
Fatigue is one of the main causes of structural failure of steel bridges [1][2][3][4][5]. The fatigue design of a bridge should hence satisfy a certain target reliability. Many international standards and guidelines use the concept of (partial) safety factors to meet the target reliability, the so-called level I method [6]. In this concept, the characteristic fatigue loads are multiplied by a partial factor for the load and the fatigue resistances are divided by a partial factor for the resistance, or a combined safety factor (the product of the two) is adopted. The partial safety factor for the load depends strongly on the definition of the load model. The partial safety factor for the resistance provided in standards and guidelines depends on:
• The definition of the S-N curve. This involves the adoption of a linear, a bi-linear or a tri-linear curve (on double-logarithmic scale), the stress range at the transitions of the linear parts of the curve, and the survival probability at which the corresponding characteristic S-N curve is defined (usually the 95% or 97.5% lower bound).
• The target reliability, which is related to the economic and human safety consequences of failure. Amongst others, this involves whether failure of details may result in collapse of the entire structure as well as the possibility to inspect and repair fatigue-prone details.
Case studies on specific, mainly existing, bridges are reported in which the reliability against fatigue failure was estimated through calculation [7][8][9][10]. Not surprisingly, the resulting failure probabilities differ between these studies, caused by differences in fluctuating loads and associated uncertainties during the life of the bridge, but also by differences in the probabilistic models and associated model uncertainties. Measurement campaigns may reduce (aleatory and) epistemic uncertainties, especially on the load effect, and may therefore increase the calculated reliability [11,12]. These studies used measured strains in the assessment. However, such measurements of the load effects are available only for existing bridges and not for newly designed ones.
* Corresponding author at: Eindhoven University of Technology, Groene Loper 3, Eindhoven, 5612 AE, The Netherlands. E-mail address: j.maljaars@tue.nl (J. Maljaars).
Comprehensive probabilistic studies for fatigue of newly designed motorway bridges in the United States were carried out in the years 1975 to 1990, predominantly by Moses et al. [13][14][15][16][17]. They used measurement campaigns and best-guess distributions for all relevant fatigue load and fatigue resistance variables and they established the safety factors for fatigue in the AASHTO bridge design specifications [18]. These have been updated since then [19,20]. The loads, the load models and the design procedures in [18] differ from those in the Eurocodes, the latter being applicable to European bridges. In comparison to AASHTO, the partial safety factors for fatigue in the Eurocodes EN 1993-1-9 [21] (fatigue resistance) and EN 1991-2 [22] (bridge loads) have been established based on limited studies and assumptions. The background of Eurocode's fatigue partial safety factors in [23] states that ''there is little information concerning the variation of the fatigue loading. The standard deviation of the fatigue loading must be evaluated or estimated and depends very much upon the type of traffic load (railways, roadway or highway). [...]''
Comparing the design models specified in EN 1993-1-9 [21] and EN 1991-2 [22] with actual traffic and fatigue resistance, Leander [24] demonstrated that the reliability varies greatly, depending on the span, the shape of the influence line and the number of heavy vehicles. The estimated reliability can be significantly lower than the target values given in the international standards ISO 2394 [25] and EN 1990 [26]. Similarly, D'Angelo, Faber and Nussbaumer [27] and Sørensen et al. [28] showed that the safety factors are too low and must be increased by up to 15% [27] or 40% [28].
D'Angelo and Nussbaumer [29] showed that the partial safety factors for the fatigue resistance can be too low or too high depending on the design procedure and the type of weld detail. Three causes for the discrepancy between the required safety factors according to these studies and the values in the standards are distinguished here:
• The reliability levels at which the recommended safety factors for fatigue in the Eurocodes are derived [23] appear to differ from the recommended target reliability levels in EN 1990 [26].
• The characteristic fatigue loads are based on limited measurements carried out decades ago and uncertainties in engineering models are not considered in [23]. As a result, the recommended partial safety factor, especially the value of 1 on the load side, is generally too low.
• Any model comes with assumptions and approximations. The design models for the fatigue resistance are developed for ease of use and they are not necessarily accurate. The bias and the uncertainty involved when these models are used in probabilistic assessments influence the estimated reliability.
Consequently, significant discrepancies exist between the recommended safety factors for fatigue in the main body of EN 1993-1-9 [21] and the safety factors applied in National Annexes of some member states [30]. The expert committee responsible for the development of EN 1993-1-9 [21], CEN-TC250-SC3-WG9, has therefore formulated the task to re-calibrate the safety factors for fatigue [30]. Significant effort has been put in calibrating safety factors for fatigue of offshore structures in the last decade [31][32][33]. Such comprehensive studies are currently lacking for fatigue of bridges designed according to the Eurocodes, with typical European traffic and European design models. Load spectra, environmental conditions, failure consequences, structural models and type of inspection are different for bridges compared to offshore structures. This paper presents a probabilistic framework to estimate the reliability for the fatigue design of steel bridges, where the fluctuating load and the fatigue resistance are modelled as realistically as possible and where the effects of uncertainties are considered. Based on this framework, it derives partial safety factors for Eurocode's design models and using traffic representative for European motorways. However, the framework is not restricted to these design models. Loads are obtained from Weigh In Motion (WIM) measurements. Dynamic amplification factors and model uncertainties are determined through strain measurements. Future changes in traffic load and traffic composition are estimated. Regarding the fatigue resistance, a conventional S-N curve is compared with a random fatigue limit model. The probability of all load effects remaining below the transition between finite to near infinite life of the fatigue resistance is considered in the latter model. The reliability is determined with Crude Monte Carlo (CMC) analysis, the so-called level III method [6]. 
The focus is on welded details, as these are most prone to fatigue deterioration, and on the nominal (i.e. far field) stress concept, as this is the most used concept in bridge engineering. Based on a reliability assessment with probabilistic load and resistance models, the elastic section modulus for the influence line considered is determined such that the target reliability indices according to [25] or [26] are met. Subsequently, the design elastic section modulus is determined using standardised design models for the load and the resistance. A safety factor is determined such that the design section modulus equals the section modulus that follows from the reliability assessment. This framework allows simplified models to be used in the design (Fig. 1(c)) with a safety factor (level I) such that the reliability of the same design, but based on accurate models and measurements (Fig. 1(b)), meets the target reliability (level III). For practical reasons, the flowchart of Fig. 1(a) is sometimes followed in the opposite direction, i.e. assuming a certain safety factor and calculating the corresponding reliability index. The derived factors are modified to take account of the effects of general visual inspections, if any, for which use has been made of expert opinions and observations from practice, through a Bayesian approach outlined in Section 4.2.
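The calibration idea described above can be illustrated with a deliberately simplified sketch. It is not the calibration performed in this paper: it assumes a single-slope S-N curve, a design load that equals the actual load apart from model uncertainty and dynamic amplification, and a characteristic S-N curve at the 95% lower bound. The distribution parameters are those quoted elsewhere in the text (model uncertainty, DAF, critical damage, scatter of the S-N location parameter); the function names, the sample size and the seed are hypothetical. Under these assumptions the load level and the number of cycles cancel, so each sample yields the ratio between the section modulus it requires and the design section modulus obtained with a safety factor of 1. The required factor is then the quantile of that ratio at the target failure probability.

```python
import math
import random
from statistics import NormalDist


def lognormal_params(mean, std):
    """Parameters (mu, sigma) of ln(X) for a lognormal X with given mean and std."""
    var_ln = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - 0.5 * var_ln, math.sqrt(var_ln)


def required_safety_factor(beta_target, n_samples=200_000, seed=1):
    """Crude Monte Carlo estimate of a combined safety factor (toy model)."""
    rng = random.Random(seed)
    mu_mu, s_mu = lognormal_params(0.97, 0.085)      # model uncertainty (from the text)
    mu_daf, s_daf = lognormal_params(1.015, 0.023)   # dynamic amplification (from the text)
    mu_dcr, s_dcr = lognormal_params(1.0, 0.3)       # critical damage (from the text)
    std_log_a = 0.2   # std of the log10 S-N location parameter (from the text)
    slope = 3.0       # magnitude of the S-N slope (the text uses -3)
    k_char = 1.645    # ASSUMPTION: characteristic S-N curve at the 95% lower bound

    ratios = []
    for _ in range(n_samples):
        load = rng.lognormvariate(mu_mu, s_mu) * rng.lognormvariate(mu_daf, s_daf)
        d_cr = rng.lognormvariate(mu_dcr, s_dcr)
        delta_a = rng.gauss(0.0, std_log_a)  # deviation of the location parameter from its mean
        # Section modulus needed by this sample divided by the design value for factor 1
        resistance = (10.0 ** (-k_char * std_log_a - delta_a) / d_cr) ** (1.0 / slope)
        ratios.append(load * resistance)

    ratios.sort()
    p_fail = NormalDist().cdf(-beta_target)
    index = min(n_samples - 1, int((1.0 - p_fail) * n_samples))
    return ratios[index]
```

A call such as `required_safety_factor(beta_target=2.0)` uses a low, purely illustrative target; target reliability indices of practical interest correspond to much smaller failure probabilities and need far more samples.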

Simulations with measured loads
WIM systems measure the dynamic axle loads of heavy vehicles. Here, use is made of six WIM databases of recent dates from European motorways with a substantial variation in the annual number of heavy vehicles, see Table 1.
The load spectrum is computed from the WIM data using the following procedure. An array is made of all recorded axles with the axle loads and the distances between axles and vehicles. A computer algorithm is developed in which the axle array is pulled over an influence line and the bending moment history is recorded. Rainflow counting is applied to that history, resulting in the bending moment ranges and the corresponding numbers of cycles. The black curves in Fig. 2 present a few of the resulting moment range spectra for a simply supported beam at midspan loaded by traffic on a single (slow) lane, where the abscissa gives the cumulative number of cycles divided by the recorded number of vehicles (red curves are introduced below). Despite the large difference in motorway characteristics, expressed through the annual number of heavy vehicles, most of the normalised spectra are very similar. Only the 1/E62 WIM database results in a smaller fraction of fatigue-relevant ranges, i.e. a smaller concave-down shape of the spectrum, than the other databases. The Rayleigh distribution proposed for traffic loads in [34] fits the data of the 1/E62 database better than the data of the others. This database contains a relatively large number of 2-axle vehicles, more representative of what EN 1991-2 [22] refers to as medium distance traffic, whereas the other databases are clearly composed of long-distance traffic. The 1/E62 database is therefore not considered further.
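The procedure above can be sketched as follows for the midspan moment of a simply supported beam. This is a minimal illustration, not the algorithm used in the study: the function names, the step size and the three-point rainflow variant (as in common cycle-counting practice) are assumptions, and a real WIM array would contain millions of axles rather than a single vehicle.

```python
from collections import defaultdict


def midspan_influence(x, span):
    """Influence ordinate for the midspan bending moment of a simply supported beam."""
    if x <= 0.0 or x >= span:
        return 0.0
    return 0.5 * x if x <= 0.5 * span else 0.5 * (span - x)


def moment_history(axle_loads, axle_positions, span, step=0.1):
    """Midspan moment while the axle array is pulled over the beam.

    axle_positions are distances behind the first axle (first entry 0.0).
    """
    travel = span + max(axle_positions)
    n_steps = int(round(travel / step))
    history = []
    for i in range(n_steps + 1):
        front = i * step
        history.append(sum(p * midspan_influence(front - d, span)
                           for p, d in zip(axle_loads, axle_positions)))
    return history


def reversals(series):
    """Reduce a history to its turning points (first and last point are kept)."""
    points = [series[0]]
    for value in series[1:]:
        if value == points[-1]:
            continue
        if len(points) >= 2 and (points[-1] - points[-2]) * (value - points[-1]) > 0:
            points[-1] = value  # still moving in the same direction
        else:
            points.append(value)
    return points


def rainflow(series):
    """Three-point rainflow counting; returns {range: number of cycles}."""
    counts = defaultdict(float)
    stack = []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])
            y = abs(stack[-2] - stack[-3])
            if x < y:
                break
            if len(stack) == 3:
                counts[y] += 0.5   # range contains the start point: half cycle
                stack.pop(0)
            else:
                counts[y] += 1.0   # closed cycle: remove its two reversals
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    for a, b in zip(stack, stack[1:]):
        counts[abs(b - a)] += 0.5  # residue counted as half cycles
    return dict(counts)
```

For a hypothetical two-axle vehicle, `rainflow(moment_history([60.0, 100.0], [0.0, 4.0], span=20.0))` returns the moment ranges (in kNm if loads are in kN and lengths in m) with their cycle counts.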
A WIM system has a certain guaranteed accuracy designated with a classification system [35,36], but the actual accuracy is often higher than the guaranteed value, see e.g. [24]. The A16/E19 WIM database, which contains the largest number of vehicles of the collected databases, is described and evaluated in [37]. A comparison with recorded strains in a bridge in that paper revealed that the accuracy of the WIM system is very high: the shapes of the strain and WIM-based spectra were closely aligned and the stress ranges derived from the WIM-recorded axle loads had to be altered by a fraction between 0% and 3% (bias) to obtain the same spectra as the strain measurement. Most other WIM databases of European motorways show similar axle loads, but the recorded number of heavy vehicles on A16/E19 is relatively large [37]. Most of the calculations presented later in this paper are carried out with the A16/E19 database and comparisons with the other databases are added.
The data in the low frequency tail of the histograms, → 0, are fitted with an extreme value distribution. Of the candidate distributions Gumbel, Weibull max and Frechet, the Gumbel distribution gives the best fit according to Pearson's chi-square test. The distribution is: where is the number of weeks in the considered period, and Table 2 gives the parameter and the mean and standard error of parameter ( ( ) and ( ), respectively) for simply supported beams with different spans loaded by the slow lane of the A16/E19 WIM database. Note that dynamic effects are not included in the values given.
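The role of the number of weeks in such an extreme value model can be illustrated with the generic Gumbel (maximum) distribution. The sketch below is not the parameterisation of this paper, which is not reproduced here: the location and scale arguments are assumed symbols, and independence between the weekly maxima is an assumption. It shows that the maximum over a number of periods is again Gumbel distributed, with the location shifted by the scale times the natural logarithm of the number of periods.

```python
import math


def gumbel_cdf(x, location, scale):
    """CDF of a Gumbel (maximum) distribution for one reference period."""
    return math.exp(-math.exp(-(x - location) / scale))


def gumbel_max_cdf(x, location, scale, n_periods):
    """CDF of the maximum over n independent reference periods."""
    return gumbel_cdf(x, location, scale) ** n_periods


def shifted_location(location, scale, n_periods):
    """Location of the equivalent Gumbel distribution for n periods."""
    return location + scale * math.log(n_periods)
```

With hypothetical weekly parameters, `gumbel_max_cdf(x, u, a, n)` and `gumbel_cdf(x, shifted_location(u, a, n), a)` give the same value, so only the location needs to be updated when the considered period changes.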

Model uncertainty
Engineers use various methods to determine influence lines. They apply simplifications and approximations and they may occasionally make errors. The ratio between the actual nominal stress range and the prediction thereof by the engineer is referred to as Model Uncertainty (MU), where the nominal stress is defined as the far field stress excluding any notch effect. Bayesian updating is often applied to establish the MU, e.g. [38] (general case) and [39] (specific for finite element models). An alternative is error-domain model falsification as proposed in [40]. However, both methods require some type of measurement and are therefore suited for existing structures. In absence of such measurement data, the JCSS probabilistic model code [6] provides a mean of 1 and a standard deviation of 0.1 for the MU related to the nominal stress for fatigue, i.e. for linear elastic response where stress concentrations are not to be determined by the engineer. Braml et al. [41] use a mean of 1 and a standard deviation of 0.07 and 0.1 for bending of beams and plates, respectively. Raju et al. [17] use a mean of 1 and a standard deviation of 0.1 to reflect the uncertainty in calculating the effective section modulus, and in addition a mean of 1 and a standard deviation of 0.13 in the girder lateral distribution. The latter is applicable to two dimensional models of multi-girder bridges typically used in the US at the time of publication. Leander et al. [42] use a mean of 1 and a standard deviation of 0.04 for the fatigue-relevant nominal stress in steel bridges, but the same author uses a mean of 1 and a standard deviation of 0.1 in [24]. Cheung and Li [43] use a mean of 1.15 and a standard deviation of 0.1 for fatigue of composite bridges, but this includes uncertainty in the dynamic vehicle-bridge interaction, which is considered separately in this paper (Section 2.3).
Larger standard deviations are sometimes used for offshore structures and for the hot-spot stress approach [44][45][46], although [47] uses again a standard deviation of 0.1.
For fatigue of steel bridges, the uncertainty in the prediction of the linear elastic stress range in steel beams subjected to bending and cables subjected to tension is of importance. The MU may depend on the type of engineering model. Typical two-dimensional models are usually conservative with respect to the spatial distribution of loads between members, but they ignore or do not accurately account for aspects such as distortion-induced fatigue [48] or flange rotation caused by deck deflection, which may cause fatigue cracks in stringers [49]. Nowadays, engineers typically use three-dimensional models in the structural design of bridges that have an expected longitudinal and lateral load transfer, which take these aspects into account. A small bias to the safe side is still expected because aspects such as weld volume are usually ignored in the stress calculation.
Ditlevsen and Madsen [50] emphasise the importance of the MU for structural reliability, but also note the difficulty to obtain its distribution for specific structures due to scarcity of data and the many influential factors involved. The MU distribution in most of the above listed references is therefore based on expert judgement.
The distribution of the MU in the current work is established by comparing predicted strains using engineering models with strain gauge measurements in different bridges. These measurements provided the strain range in a bridge component caused by the crossing of a vehicle with low speed (between 5 km/h and 15 km/h) on an otherwise empty bridge. The axle loads and axle distances of the crossing vehicle were known through measurement. Calculated and measured strain comparisons are included in the estimate of the MU only if they satisfy each of the following criteria:
• The influence lines are predicted by engineers working in practice, not by scholars, because scholars may have a different level of understanding of the important aspects in modelling and they may use different software.
• They are derived for the purpose of regular assessment of structures, asked for by owners or authorities. They were not part of a research campaign, as the purpose of and therefore the budget for research may be different.
• The engineers were not aware that their influence lines would be compared to measurements afterwards, as this may impact the effort spent on the model.
• The engineering models were produced within the last two decades, because models have changed with computer power.
The purpose of adopting these criteria is to obtain an unbiased set of data, representative of current engineering practice. Six bridges with 18 components, having different influence lines and assessed by five engineering companies, were collected that satisfy all criteria. The study comprises the following bridge components: the top flange of a simple single span box girder [51], the top flange of a two-span box girder [51], the moment-resisting connections between the arch and hangers in an arch bridge [52,53], the bottom flange of a main beam in a plate girder bridge [54], the bottom flange of a main beam in an arch bridge that acts as a tension tie for the arch and simultaneously as a beam spanning the distance between hangers [37], and the stay cables, main girders and crossbeams in a complex skewed multi-girder cable-stayed bridge [55], see Fig. 3 for the latter bridge. Hence, the bridge systems range from simple to complex, but the components do not include orthotropic bridge decks. The first two mentioned bridges are railway bridges. They are included here because railway bridges are designed by the same companies and using the same tools as road bridges. Components from railway bridges may be subject to distribution of loads through non-modelled rails, sleepers and ballast [56], which may create bias in the MU [57]. Care has been taken that such components are not part of the current study.
The first mentioned bridge was designed by hand calculation. Finite element models consisting of beam and/or shell elements were applied to design the other bridges. The model uncertainty is defined as the ratio between the measured maximum strain range and the predicted maximum strain range per influence line. Fig. 4 presents the 18 results as dots. A correlation between the complexity of the bridge system and the accuracy of the prediction is not observed. The smallest and the largest deviation between measurement and prediction are obtained for the two arch bridges. The results of the complex cable-stayed bridge, but also those of the simple girder bridges, are in between. It is evident from the models that the more complex the bridge, the more effort has been put into its engineering models. The mean and standard deviation of the MU derived from the comparison, 0.97 and 0.085, respectively, are used in the remainder of the study. These values are close to the recommendation in [6]. The number of data is too small to derive the shape of the distribution. A lognormal distribution is assumed (continuous curve in Fig. 4), as this has a larger upper tail than a normal distribution and it agrees with most of the references given above.
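The statement on the upper tail can be checked with a short calculation. The sketch below converts the mean and standard deviation quoted above into the parameters of a lognormal distribution by moment matching and compares the exceedance probability with that of a normal distribution having the same two moments. The function names and the evaluation points are hypothetical.

```python
import math
from statistics import NormalDist

MEAN, STD = 0.97, 0.085  # model uncertainty statistics quoted in the text


def lognormal_params(mean, std):
    """Parameters (mu, sigma) of ln(X) for a lognormal X with given mean and std."""
    var_ln = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - 0.5 * var_ln, math.sqrt(var_ln)


def tail_normal(x):
    """P(MU > x) if the MU were normally distributed."""
    return 1.0 - NormalDist(MEAN, STD).cdf(x)


def tail_lognormal(x):
    """P(MU > x) for the lognormal distribution with the same mean and std."""
    mu, sigma = lognormal_params(MEAN, STD)
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(x))
```

Evaluating both functions at values well above the mean, for instance 1.2 or 1.3, shows the lognormal exceedance probability to be the larger of the two.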

Dynamic amplification
WIM systems installed in pavement record dynamic axle loads at locations with relatively smooth pavement and they are calibrated with scale weighting. Dynamic interaction between the vehicle and the bridge is not included in the databases and it must be added explicitly. The resulting Dynamic Amplification Factor (DAF) has been subject of many studies, most of these focusing on the extreme load effect. The DAF for repetitive loading of heavy vehicles, relevant to fatigue, may be different. The fatigue-relevant DAF depends on many aspects [58], which makes a reliable prediction with analytical or numerical models difficult [59]. Paultre et al. [59] therefore advocate using measurements. One of the most important aspects is the road roughness [60], and many authors have therefore applied planks or other obstacles to evaluate the dynamic amplification. However, even though an incidental obstacle may give a large amplification of the maximum load effect, the expected effect on fatigue is small given the short duration that such obstacles are present.
Suspension characteristics of heavy vehicles influence the DAF [58] and these have changed in the last decades. Consequently, the DAF should be evaluated based on recent measurement campaigns. Žnidarič, Kalin and others [61][62][63][64] present extensive studies of recent Bridge-WIM measurements to establish the DAF, based on more than 10^5 vehicle crossings on many bridges. In line with [65,66], they observed that both the mean and the standard deviation of the DAF decrease with increasing vehicle weight, as the dynamic component of the measured strain is largely independent of the static component. As an example, for (combinations of) vehicle crossings with a (combined) weight exceeding 500 kN in [64], both the bias and the scatter of the dynamic amplification are only a few percent.
The studies mentioned above mainly focused on reinforced concrete bridges and slab bridges with spans ranging between 5 m and 35 m and first resonance frequencies ranging between 1.5 Hz and 15 Hz. In the current study, a limited set of measurements is carried out on steel bridges in order to evaluate if the low DAF found in these recent studies also applies to steel bridges. This study considers bridges with spans -defined as the distance between (intermediate) supports -ranging between 1 m and 100 m and influence line lengths ranging between 1 m and 300 m and the measurements were performed on bridges with the same range of influence line length. Iatsko et al. [67] demonstrate that relatively few, relatively heavy -overloaded -trucks provide the largest contribution to the fatigue damage in actual traffic. The axle and vehicle weights in the current measurements were therefore selected as the upper values according to legislation: 100 kN axle load for short influence lines, 1200 kN vehicle load for the longest influence lines (representing a condition with multiple vehicles present on the influence line) and 500 kN vehicle load for influence lines in between. Use is made of the same measurement campaigns as in Section 2.2, but excluding the two railway bridges, excluding components close to the expansion joints and including two additional measurement campaigns [68,69] that did not satisfy the criteria for establishing the MU. Railway bridges are excluded because the structural stiffness, vehicle stiffness, and the ratio between vehicle and bridge mass are different for railway bridges compared to road bridges. Components close to expansion joints are excluded because EN 1991-2 [22] requires to explicitly account for the dynamic amplification caused by the expansion joint in the design model. Repeated crossings with the same speed revealed that the measured strains were consistent, i.e. giving the same strain for the same speed at each measurement location. 
The pavement quality was good in all cases.
Several definitions are used in the literature for the DAF [70]. The DAF used in the current work is defined as the ratio of the maximum dynamic strain measured at a bridge component to the static strain due to the crossing of the same vehicle, measured at the same component. Assuming that dynamic amplification is negligible at low speed, the DAF is estimated as the ratio between the measured strain range at normal speed crossing (80 km/h or 90 km/h) and the measured strain range at low speed crossing (5 km/h to 15 km/h). Fig. 5 presents the results. The blue dots in these figures are outliers. They may have been caused by a difference in lateral position of the vehicle within a lane between the low and normal speed crossings and they are ignored in the following. Fig. 5(a) indicates a weak positive trend between the DAF and the span. It may be caused by the fact that the first resonance frequency of the structure tends to reduce with increasing span [59]. However, because of the low value and the low scatter of the DAF, this trend is also ignored. It should also be noted that the fatigue dominating load effects for very long spans are caused by combinations of several vehicles on the influence line and they are expected to be out of phase, causing damping, for most crossings. A lognormal distribution is used to fit the data with a mean of 1.015 and a standard deviation of 0.023, Fig. 5(b). The bi-modal nature of the data in Fig. 5(b) is thus ignored, because it has a negligible influence on the upper tail that is relevant in the reliability calculations.
The measured values of the DAF are in line with recent data using heavy vehicles as reported before, e.g. [64].

Trends in traffic composition
Future changes in traffic composition, including axle loads, number and lay-out of vehicles and intervehicle distances, are extremely difficult to estimate. Yet, trends are of great significance for structural reliability. Fatigue problems of some bridges built decades ago, that can be attributed to underestimation of road freight transport in the design stage back then, clearly demonstrate this.
The number and growth rate of heavy vehicles is different between motorways. Evaluation of WIM measurements in A16/E19 from the period 2008 up to 2018 show no significant change in number of axles. Absence of growth of the number of axles is also observed in other busy motorways in Europe and it may be caused by 'saturation' of the road: the road is fully occupied and any increase in the number of vehicles results in more traffic jam. In this study, no trend on the number of vehicles is applied. The philosophy is followed that the number of heavy vehicles for a specific road should be forecasted and used with the fatigue load model in the design of a bridge.
A trend factor on axle load with a mean of 1.15 and a standard deviation of 0.10 was assumed for deriving the safety factor for fatigue in AASHTO in 1986 [71]. A trend on axle loads has not been observed in the A16/E19 WIM database between 1998 and 2018. Croce [72] suggests that the heavy axle loads recorded in Auxerre around 1985, used for developing the load models in EN 1991-2 [22], are still representative for today's traffic on European motorways. An estimate by Dutch traffic experts is considered for the future trend in axle loads. Each axle load is multiplied by a factor : where is number of years after recording the WIM database ( Table 1). The expectation of estimated by the Dutch traffic experts is 0.002 for medium and long-distance traffic, based on a mobility forecast scenario with 'high' population and welfare growth in [73]. A coefficient of variation of is assumed equal to 0.05 based on the authors' estimate.
It should be noted that the trend does not take account of traffic load changes that require a change in legislation, such as long road trains or automated vehicle driving. Such changes may require an update of the design fatigue load model.

Design load model
Most European bridges are designed according to either Fatigue Load Model 3 (FLM3) or 4 (FLM4) as given in the standard EN 1991-2 [22]. FLM3 consists of one lorry. The stress resulting from crossing the lorry over the calculated influence line is subsequently multiplied by factors depending on the span and the shape of the influence line and by factors depending on the number and type of heavy vehicles on the road considered. FLM4 consists of five vehicles, each having prescribed axle loads and axle distances, with the annual number of each vehicle depending on the type of road, see Table 3 for motorways. The annual numbers sum up to 2 million, i.e. the same as in the WIM database of A16/E19. Each vehicle crosses the influence line individually. Dynamic amplification and trend factors are deemed to be already included in the axle loads [74]. In a previous work, the authors have demonstrated that FLM4 gives a more consistent description of the deterministic fatigue damage compared to the WIM database than FLM3 [37]. In addition, other researchers are currently re-calibrating the factors of FLM3 for the updated version of Eurocode EN 1993-2 [75]. For these reasons, FLM4 is adopted as the design load model in this paper.
The red curves in Fig. 2 present the stress range histogram following FLM4, where the partial safety factor for multiplication of the axle loads is adopted as 1, following the recommended value in EN 1990 [26]. The figure shows that, compared to the WIM data, FLM4 overestimates the number of medium weight vehicles and it does not account for very heavy vehicles with low frequency. This is not necessarily problematic, as the model is not intended to represent the actual traffic, but instead to represent the fatigue damage created by the vehicles for a finite life design. Note that FLM4 is not intended for use in an infinite life design, where all ranges are below the Constant Amplitude Fatigue Limit (CAFL). The accuracy of the damage representation will be evaluated below.

Resistance models
S-N curves are applied to model the fatigue resistance in the nominal stress approach. In this approach, S-N curves relate the nominal stress range, $\Delta\sigma$, to the number of cycles to failure, $N$, and they are based on the results of fatigue tests. Two probabilistic fatigue resistance models and one design model are considered.

Conventional probabilistic S-N model
The first model represents the current state-of-the-art and it is described in the JCSS probabilistic model code [6]. This Conventional Probabilistic S-N Model (CPSNM) consists of a bi-linear S-N curve where the predicted number of cycles to failure, $N$, for a stress range $\Delta\sigma$ follows from:
$$\log_{10} N = \begin{cases} a_1 + m_1 \log_{10} \Delta\sigma & \text{for stress ranges above the transition} \\ a_2 + m_2 \log_{10} \Delta\sigma & \text{for stress ranges below the transition} \end{cases}$$
where $a_1$ and $a_2$ are location parameters and $m_1$ and $m_2$ are slope parameters, respectively. Slope parameter $m_1$ is derived from Constant Amplitude (CA) fatigue tests. For welded details, its value is often close to $m_1 = -3$ [76] and this value is also adopted in EN 1993-1-9 [21] and used here. The mean value and standard deviation of $a_1$ depend on the detail type and they are also derived from CA fatigue tests. Gurney [77] and Keating and Fisher [76] relate the standard deviation to the category of S-N curve, i.e. the characteristic value of $a_1$, with values $0.18 \le \sigma(a_1) \le 0.25$ ([77], for fixed slope $m_1$) or $0.10 \le \sigma(a_1) \le 0.22$ ([76], for category-specific slope $m_1$). The standard deviation also depends on the way the fatigue tests are evaluated and grouped to a certain category [78]. For general application, i.e. not limited to a certain type of detail, the standard deviation is often assumed as $\sigma(a_1) = 0.2$ [79]. This value, applied in many papers such as [28,31,80], is also used here.
The extension of the S-N curve with slope parameter $m_2$ accounts for the resistance in case of stress ranges smaller than the CAFL in Variable Amplitude (VA) loading. Haibach [81] proposed $m_2 = 2 m_1 + 1$ whereas Fisher et al. [82] proposed $m_2 = m_1$. These different proposals have been adopted in EN 1993-1-9 [21] and AASHTO [18], respectively. It should be noted that the choice of $m_2$ is a convention rather than based on physics. Using fracture mechanics, it can be demonstrated that $m_2$ depends on the shape of the fatigue load spectrum. Soliman et al. [83] demonstrate that the reliability for fatigue of bridges depends significantly on the choice of $m_2$. Following prEN 1993-1-9 [84], $m_2 = 2 m_1 + 1 = -5$ is adopted here and $a_2$ is defined such that the transition of the two parts of the S-N curve is at $\log_{10}(N) = 7$ or 6.7, depending on the value of $a_1$. For a transition at $\log_{10}(N) = 7$, continuity of the curve gives:

$a_2 = 7 - \frac{m_2}{m_1} (7 - a_1)$

and likewise with 6.7 in place of 7. The location parameters $a_1$ and $a_2$ are hence assumed fully correlated, which is in agreement with [6].

The cumulative fatigue damage in a period $t$, $D(t)$, is determined using the Palmgren-Miner linear damage accumulation rule:

$D(t) = \sum_i \frac{n_i(t)}{N_i}$

where $n_i(t)$ is the number of applied cycles of stress range $\Delta\sigma_i$ in period $t$ and $N_i$ is the corresponding predicted number of cycles to failure. The cumulative fatigue damage should be compared with the critical fatigue damage. This variable considers the additional scatter related to VA loading, since the standard deviation of $a_1$ is derived from CA tests and $a_2$ is fully correlated to $a_1$. This approach follows the basic concept of [85] and has been adopted by most researchers since then. According to D'Angelo and Nusbaumer [86], the distribution of the critical damage does not depend on the detail. Wirsching [44] proposed a log-normal distribution for the critical damage with a mean of 1 and a standard deviation of 0.3. This distribution is applied here, as it is also given in guidelines [6,79] and applied in many other studies, such as [87][88][89][90][91][92]. Smaller and larger standard deviations are occasionally applied, e.g. [15,93]. Zhang and Maddox [94] show that the mean of the critical damage depends on the type and the shape of the load spectrum because of load sequence effects. However, the calculated reliability appears insensitive to the distribution of the critical damage [44].
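The bi-linear curve and the linear damage sum described above can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, the lower location parameter is obtained here from continuity at the transition point, and the example detail (2·10^6 cycles at a stress range of 100 MPa) and the two-block spectrum are made-up values chosen only to exercise the code.

```python
import math

M1 = -3.0          # slope above the transition (value adopted in the text)
M2 = 2 * M1 + 1    # Haibach slope below the transition, equals -5
LOG_N_KNEE = 7.0   # transition at log10(N) = 7 (6.7 is the alternative)


def location_a2(a1, log_n_knee=LOG_N_KNEE):
    """Lower-branch location parameter from continuity at the transition."""
    log_s_knee = (log_n_knee - a1) / M1
    return log_n_knee - M2 * log_s_knee


def cycles_to_failure(stress_range, a1, log_n_knee=LOG_N_KNEE):
    """Predicted number of cycles N for one stress range (bi-linear curve)."""
    log_s = math.log10(stress_range)
    log_s_knee = (log_n_knee - a1) / M1
    if log_s >= log_s_knee:
        return 10 ** (a1 + M1 * log_s)
    return 10 ** (location_a2(a1, log_n_knee) + M2 * log_s)


def miner_damage(spectrum, a1):
    """Palmgren-Miner sum over (stress_range, applied_cycles) pairs."""
    return sum(n / cycles_to_failure(s, a1) for s, n in spectrum)


# Hypothetical detail: a1 chosen so that N = 2e6 at a stress range of 100 MPa.
a1_example = math.log10(2e6) + 3 * math.log10(100.0)
# Hypothetical spectrum: one block above and one block below the transition.
spectrum_example = [(100.0, 1e5), (30.0, 1e7)]
damage_example = miner_damage(spectrum_example, a1_example)
```

In a probabilistic analysis, `a1` would be sampled from its distribution and the resulting damage compared with a sampled critical damage instead of the value 1.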

Six-parameter random fatigue limit model
The second probabilistic S-N curve considered is a random fatigue limit model. Such a model considers not only failed data from fatigue tests, but also tests that were stopped before failure -so-called run-outs -as right-censored data, to estimate the CAFL [95]. It should be noted that such a model is able to also describe the transition from the finite life to the very long or near-infinite life regions in case a conventional CAFL -e.g. defined at 5 million cycles -does not exist [96]. However, the calibration of the model is limited to numbers of cycles typical for bridges, implying that surface cracks are considered, but sub-surface cracks are not considered. Here, use is made of the Six-Parameter Random Fatigue Limit Model (6PRFLM) [97], in [98] referred to as generalised random fatigue limit model. The S-N curve for CA loading is: where = log 10 ( ) is the 10-base logarithm of the predicted number of cycles to failure and 0 , 1 , and 0 are detail dependent model parameters, with 0 being the CAFL. The CAFL is assumed to be log-normal distributed. Using = log 10 ( 0 ) , the Probability Density Function (PDF) is: where is the standard normal variable. The PDF of the conditional probability of the finite fatigue life (for > 0 ) is: where is the standard deviation of the fatigue life. The marginal PDF of the fatigue life is: The formulation of the model is extended to VA loading in [99]. Based on fracture mechanics principles, a crack grows because of increasing cumulative fatigue damage and the threshold stress for fatigue damage accumulation under VA loading, ℎ , reduces as a result. Fig. 6 visualises this process. The fatigue limit follows from: where is a detail and spectrum dependent deterioration rate factor, is the fatigue damage according to the 6PRFLM and is the associated critical fatigue damage, which follows a log-normal distribution. 
The 10-base logarithm of the fatigue life for a stress range in the VA spectrum depends on the fatigue threshold, as given by Eq. (11).

(Caption fragment, partly lost in extraction: "[99]; VA datasets with overloads or with a geometry that is classified differently in [21] are excluded.")
During the cyclic process, the fatigue threshold drops continuously. As the load history for bridges can be considered an ergodic process, the fatigue damage created by the entire load history can be estimated with: where ℎ is the number of integration steps and the fatigue life is determined with Eq. (11) using the fatigue threshold of Eq. (10) with the damage of the previous step (ℎ−1). The accuracy of the estimate increases with the number of integration steps. The reliability analyses presented in Section 5 are carried out with ℎ = 1000, i.e. damage increments equal to 0.1% of the critical damage.
The advantage of the 6PRFLM over the CPSNM is that the former naturally considers damaging and non-damaging stress ranges: in deterministic terms, fatigue damage does not develop if all stress ranges remain below the CAFL. The 6PRFLM gives a higher likelihood than the CPSNM to fatigue tests loaded in CA [97] and VA [99].
The model variables of the CA part of the 6PRFLM -being 0 , 1 , , , and -are determined using the maximum loglikelihood with CA fatigue test data. The log-likelihood function is: where is the marginal cumulative distribution function of . The model variables of the VA extension of the 6PRFLM -being ( ) , ( ) and -are determined using the maximum log-likelihood with VA fatigue test data. The log-likelihood function is: where is the cumulative distribution function of the standard normal distribution. The variables are estimated in [97,99] for the four welded details of Fig. 7. The underlying fatigue test databases were not extensive. A fifth database is therefore added, referred to as 'detail (e)', where an effective notch stress approach was applied to a very large CA test database of various welded joints in [98]. As an evaluation of VA data with this approach is lacking, the critical damage distribution is assumed equal to  in Section 3.1, i.e. having a mean of 1 and a standard deviation of 0.3, and the average values of details (a) to (d) are used for . Tables 4 and 5 provide the estimates and standard errors, respectively, of the model variables. The correlation matrix of the variables can be found in [98,99].

Design S-N curve
The tri-linear S-N curve for VA loading in EN 1993-1-9 [21] is used as the design S-N curve:
where 1 is defined as giving a 95% exceedance fraction of the fatigue life, with values according to [21] for details (a) to (d) and according to [98] for detail (e), see Table 6. Parameter 2 is related to 1 in the same way as parameter 2 is related to 1 with Eq. (4). The cumulative fatigue damage is determined using the Palmgren-Miner linear damage accumulation rule of Eq. (5). Failure is assumed at  = 1.
Using the already defined standard deviation of 0.2 for the location parameter $a_1$, the mean values of $a_1$ in the CPSNM of Section 3.1 can be derived from the corresponding location parameter of the design curve. Table 6 provides the values of the five details considered. Fig. 8 compares the two probabilistic S-N curves (mean values) and the design curve (95% exceedance fraction) for detail (a) subjected to VA loading. As explained above, the 6PRFLM curve changes as the damage develops. The figure provides the 6PRFLM curve at three distinct stages during damage development, where the highest (solid) curve applies at zero damage and the lowest (dotted) curve at almost full damage. Note that the design curve uses a cut-off value at $N = 10^8$. This is typical for EN 1993-1-9 [21] and not applied in most other international standards. A comparison with tests and simulations for some details in [86,101] indicates that the cut-off stress range value in [21] is too high, if it exists at all. The CPSNM does not contain such a cut-off value. This causes the mean CPSNM curve to cross the design curve at approximately $N = 8 \cdot 10^8$.

Calculation of reliability index and partial safety factor
Textbooks such as [102] provide procedures to estimate the structural reliability. One of these procedures is adopted for fatigue of motorway bridges in this section. The probabilistic S-N curves are applied using stress ranges calculated as:
in which the bending moment ranges are taken from one of two datasets. The first dataset consists of the recorded data as displayed with the black curves in Fig. 2. However, this data is bounded by the maximum range recorded during the measurement period, which is shorter than the design life of the bridge. To prevent such a truncation, the second dataset uses the WIM data of Fig. 2 for frequencies higher than once a week, combined with Eq. (1) for less frequent ranges down to 0.01 annually (once during the design life). The resulting predicted fatigue life is used together with the applied number of cycles in a period of $t$ years in Eq. (5) or Eq. (12) to determine the fatigue damage. Table 7 gives an overview of the random variables as described in the previous chapters.

CMC is used to determine the probability of failure, $P_{f,t}$, for a given reference period of $t$ years, with separate failure criteria for the CPSNM and for the 6PRFLM. CMC sampling was stopped after at least $10^6$ samples with at least 100 failures were obtained for each case considered. The corresponding reliability index for the same entire reference period, $\beta_t$, is defined as $\beta_t = -\Phi^{-1}(P_{f,t})$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. The reliability index is also determined per year. The last year of the reference period then appears decisive. Given that a structure can only fail once and using the third probability axiom, the maximum annual failure probability follows from the difference between the failure probabilities for reference periods of $t$ and $t - 1$ years.

The design S-N curve of Eq. (16) is used with stress ranges calculated from the bending moment ranges of FLM4 (red curves in Fig. 2), Eq. (22). (On one occasion, mentioned explicitly below, the FLM4 ranges are replaced with the recorded ranges in Eq. (22).) The resulting predicted fatigue life per stress range together with the applied number of cycles of FLM4 are used in Eq. (5) to determine the fatigue damage. The safety factor is tuned such that the design damage equals 1, see Fig. 1.
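The step from a sampled failure probability to a reliability index can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis of the paper: the cumulative damage is replaced by a single log-normal variable with a hypothetical mean and coefficient of variation, whereas the paper samples all variables of Table 7; only the critical damage (log-normal, mean 1, standard deviation 0.3) is taken from the text. The annual failure probability is written as the difference between two cumulative failure probabilities, which is one reading of the additivity argument above. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reliability_index(p_failure):
    """beta = -Phi^{-1}(P_f); requires 0 < p_failure < 1."""
    return -STD_NORMAL.inv_cdf(p_failure)


def lognormal_params(mean, std):
    """Return (mu, sigma) of ln X for a log-normal X with the given mean and std."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def crude_monte_carlo(damage_mean, damage_cov, n_samples, seed=1):
    """Estimate P(damage >= critical damage) by crude Monte Carlo sampling."""
    rng = random.Random(seed)
    mu_cr, s_cr = lognormal_params(1.0, 0.3)  # critical damage from the text
    # Placeholder damage model (assumption): one log-normal variable.
    mu_d, s_d = lognormal_params(damage_mean, damage_cov * damage_mean)
    failures = 0
    for _ in range(n_samples):
        critical = rng.lognormvariate(mu_cr, s_cr)
        damage = rng.lognormvariate(mu_d, s_d)
        if damage >= critical:
            failures += 1
    return failures / n_samples


def annual_failure_probability(pf_t, pf_t_minus_1):
    """Failure probability of the last year of the reference period."""
    return pf_t - pf_t_minus_1


pf_example = crude_monte_carlo(damage_mean=0.5, damage_cov=0.3, n_samples=20000)
beta_example = reliability_index(pf_example)
```

The sample size here is far smaller than the at least $10^6$ samples with at least 100 failures used in the paper; target reliability indices near 4 correspond to failure probabilities that need that order of sampling effort.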

Adaptation for a damage tolerant concept
The previous section considered a design for a structural detail not intended to be inspected in service, further called the 'safe life' concept. In contrast, bridges may be designed according to the 'damage tolerance' concept, implying that the fatigue-prone details should be readily inspectable and any fatigue cracks repairable. Bridges designed with this concept are usually inspected with the naked eye at hand or elbow distance, with an inspection interval between 5 and 10 years. This paper does not consider more detailed inspections using non-destructive testing techniques, because new bridges are usually not intentionally designed for such inspections.
Fracture mechanics is usually applied to evaluate the effect of inspections on the failure probability in fatigue. The update in reliability of fatigue of bridges heavily depends on the inspection method [80,103]. The probability of detection of visual inspections is relatively poor, reflected by the high mean and scatter values as compared to other inspection methods [104,105]. A collection of fatigue cracks in bridges by Al-Emrani [106,107] demonstrates that cracks often have through-thickness lengths larger than 0.1 m at first visual detection. This low probability of detection, combined with generally conservative assumptions applied in fracture mechanics for the growth rate of relatively large cracks and for the critical crack size, results in negligible added value of visual inspection if based on a fracture mechanics analysis. Because practice shows differently [49], an alternative assessment route is followed here.
The number of cycles in the fatigue tests used to develop an S-N curve is associated with a surface-breaking crack or, in case of small specimens, failure of the detail. If such details are applied in a whole structure, collapse does not necessarily occur when this number of cycles is reached or, in case of VA loading, when the cumulative damage reaches the critical damage. The failure probability for an inspected structure is therefore described as the failure probability of the safe life concept of the previous section multiplied by the probability that a developed fatigue crack is not detected in time, i.e. one minus the conditional probability that a fatigue crack that has developed is detected before collapse of the entire structure takes place.
The members of the combined European expert groups European Convention of Constructional Steelwork - Technical Committee 6 (ECCS-TC6) and CEN-TC250-SC3-WG9 were asked to estimate the conditional probability. Table 8 presents their responses. In addition, use is made of [3], where a data collection from literature is presented of bridge failure causes. A total of 164 failures were reported, with 87 classified as bridge collapse, of which 13% were attributed to fatigue, and 73 as non-collapse, of which 67% were attributed to fatigue. The fatigue cracks that did not result in collapse were apparently detected in time (or have stopped growing). Since most bridges are expected to be only visually inspected, this leads to an expected conditional probability of detection of 0.67 ⋅ 73∕(0.67 ⋅ 73 + 0.13 ⋅ 87) = 0.81. This value may be biased, e.g. because the relative number of collapses not reported in the open literature is probably much smaller than the unreported relative number of fatigue cracks detected in time. On the other hand, cracks observed in orthotropic bridge decks, which occur relatively frequently, generally do not cause collapse, but this may be different for other structural parts. Therefore, a standard error of 0.10 is assumed for the conditional probability. A two-parameter Weibull distribution is adopted to describe the conditional distribution.
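The estimate from the failure database can be reproduced with a few lines of Python. The sketch only restates the arithmetic on the counts quoted from [3]; the function and variable names are hypothetical, and treating every fatigue-related non-collapse case as a crack detected in time is the simplification described in the text.

```python
def detection_fraction(n_non_collapse, fatigue_share_non_collapse,
                       n_collapse, fatigue_share_collapse):
    """Share of fatigue-related failures that did not end in collapse."""
    detected = fatigue_share_non_collapse * n_non_collapse
    not_detected = fatigue_share_collapse * n_collapse
    return detected / (detected + not_detected)


# Counts and fatigue shares as quoted from the data collection in [3].
p_detect = detection_fraction(73, 0.67, 87, 0.13)
print(round(p_detect, 2))  # 0.81
```

About 49 of the roughly 60 fatigue-related cases did not lead to collapse, which is where the value of 0.81 comes from.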
The two sources are applied in a Bayesian inference, where the experts' opinions are used as prior and the data of [3] with the Weibull distribution are used as likelihood. The posterior is approximated numerically. Fig. 9 gives the corresponding probability densities. The conditional probability is subsequently calculated from the posterior.

Target reliability index
The European standard EN 1990 [26] recommends target values of the reliability index $\beta$ for the ultimate limit state and a reference period of $t = 50$ years. These are related to the Reliability Class (RC), where RC3 is for structures for which failure has a large economic impact and/or causes significant casualties, RC1 is for small consequences, and RC2 is for consequences in between. The bold values of columns 2-4 of Table 9 are the target reliability values in EN 1990 [26]. The same values of the target reliability apply to fatigue of structures that are not designed for inspection. The international standard ISO 2394 [25] gives models for the target reliability values for a reference period of $t = 1$ year, based on monetary optimisation and considering costs of safety measures, economic loss upon failure and casualties expressed in monetary units. The bold values in the last two columns of Table 9 are the target reliability values that follow from this concept in case of European bridges with small relative costs of safety measures. Classes 2 and 4 are defined for comparable consequences as RC2 and RC3 in EN 1990 [26].
Bridges in Europe are usually designed for a life of 100 years. The reliability indices are calculated below for the entire life of 100 years, $\beta_{100}$, and for the decisive year that gives the lowest annual reliability, $\beta_1$. Therefore, the target reliability indices for the given reference period of $t$ years should be transformed to other reference periods of $t'$ years. If all variables are fully correlated in time, the (target) reliability is independent of the reference period, $\beta_{t'} = \beta_t$ (Eq. (25)). On the contrary, if all variables are uncorrelated in time, the (target) reliability follows from $\Phi(\beta_{t'}) = [\Phi(\beta_t)]^{t'/t}$ (Eq. (26)), where $\Phi$ is the cumulative distribution function of the standard normal distribution. In reality, some variables are not correlated, others are fully correlated, and many are partially correlated in time. The target reliability is hence bounded by Eqs. (25) and (26). Rows 2 to 4 of Table 9 provide the resulting target reliability values for reference periods of 100, 50 and 1 year when assuming a weight of 0.5, i.e. taking the average of the fully correlated and the uncorrelated values. The table shows that the target reliability values of the two standards are almost equal for this weight.

Fig. 9. Probability density of the conditional probability that a fatigue crack is visually detected before collapse of the structure.
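The transformation of a target reliability index between reference periods can be sketched in Python as follows. The two bounds follow the description above (independence of the reference period for full correlation in time; multiplication of the survival probabilities of the individual periods for no correlation). The weighted combination is an assumption about how the two bounds are averaged, and the function names are hypothetical. The example uses the well-known EN 1990 pair for RC2, an annual index of 4.7 against a 50-year index of 3.8.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_fully_correlated(beta_t, t, t_new):
    """Full correlation in time: the index does not depend on the period."""
    return beta_t


def beta_uncorrelated(beta_t, t, t_new):
    """No correlation in time: survival probabilities of the periods multiply."""
    p_survive = STD_NORMAL.cdf(beta_t) ** (t_new / t)
    return STD_NORMAL.inv_cdf(p_survive)


def beta_weighted(beta_t, t, t_new, weight=0.5):
    """Weighted average of both bounds; weight = 0.5 gives the plain average."""
    fully = beta_fully_correlated(beta_t, t, t_new)
    uncorrelated = beta_uncorrelated(beta_t, t, t_new)
    return weight * fully + (1.0 - weight) * uncorrelated


# Annual index of 4.7 transformed to 50 years, assuming no correlation in time.
beta_50_uncorrelated = beta_uncorrelated(4.7, t=1, t_new=50)
# Same transformation with the average of both bounds.
beta_50_average = beta_weighted(4.7, t=1, t_new=50)
```

The uncorrelated transformation of 4.7 per year lands close to 3.8 for 50 years, which is the relation between the annual and 50-year values in EN 1990. For very large indices the power of a probability close to 1 loses floating-point precision, so this direct form is only suitable for the range of indices discussed here.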

Results
Fig. 10 presents the ratio of the required elastic section modulus using FLM4 and using the WIM database of A16/E19, in both cases applying the design S-N curve, i.e. these results are not obtained from a probabilistic analysis. Results are given for loading by a single (slow) lane of traffic, for five shapes of the influence line, I-V, and for spans ranging between 1 m and 100 m. The influence lengths range between 1 m and 300 m, i.e. the same range as used to derive the distributions of the MU and the DAF in Section 2. The figure shows the level of conservatism, or allowance, of FLM4 to cover (load-related) uncertainties. The figure shows that the conservatism generally reduces with span, and especially so for hogging bending moments at the supports (IV and V).
The remainder of this section gives the results of the probabilistic analyses using the A16/E19 database unless otherwise stated. Simulations with the two datasets for the load (using the complete WIM database, or the WIM database and Eq. (1) for ranges with high and low frequency, respectively) appear to give unnoticeable differences in calculated reliability for both probabilistic S-N curves: the reliability index for simulations with the two datasets showed no bias and the variation was only 0.01. The small variation may well be due to the estimation with CMC. This implies that the data in the tail are of such low frequency (once per week gives a fraction of $3 \cdot 10^{-5}$ of the total number of cycles) that they do not contribute noticeably to the fatigue damage. In other words, the period of record of the WIM database is sufficiently large. This observation agrees with the findings in [37]. Fig. 11(a) and (b) present $\beta_{100}$ for safety factors of 1 and 1.35, respectively, for influence line I and details (a) to (e) designed with the safe life concept. The selected safety factors cover the range of factors currently provided in EN 1993-1-9 [21]. The differences between the two probabilistic resistance models are due to the different nature of the models, even though they both aim at describing reality. The CPSNM uses a fixed format of the S-N curve based on conventions, irrespective of the damage developed and the shape of the spectrum, whereas the 6PRFLM evolves with the damage. The generally higher reliability indices of the 6PRFLM are caused by the near-infinite life predicted by this model when only a small fraction of the stress ranges exceeds the CAFL. This agrees with test observations, e.g. in [101]. The fatigue life is longer with the 6PRFLM than with the CPSNM if not more than approximately 1% of the number of cycles is above the CAFL, constituting approximately 20% of the calculated damage (average of all influence lines).
It is possible to calibrate the slope parameter $m_2$ of the CPSNM in such a way that the two models provide similar reliability values, but this calibrated slope parameter will then depend on the shape of the spectrum and on the target reliability. The variation in $\beta_{100}$ of the different details is larger for the 6PRFLM as compared to the CPSNM. This is predominantly caused by the similarity of the shapes of the S-N curves of the design model and the CPSNM, whereas the 6PRFLM is based on a different concept.

[...] 5 m), which gives a lower reliability than that of all other cases. This is attributed to the large number of cycles for short spans, making the CAFL relatively important. The relatively large uncertainty of the CAFL of detail (b) then provides a low reliability. It is expected that this is an atypical situation, caused by the small number of fatigue test data around the CAFL, and not representative of actual performance. Fig. 13(a) and (b) present the results of detail (c) with spans of 5 m and 100 m, respectively, for different shapes of the influence line. All results are for the safe life concept. The reliability indices calculated for the hogging bending moment influence lines IV and V are lower than those calculated for the midspan bending moment influence lines I-III at a span of 100 m. This agrees with the observation of limited or no conservatism of FLM4 for lines IV and V at a span of 100 m in Fig. 10.

Fig. 14 provides the required value of the safety factor as a function of $\beta_{100}$, for a type I influence line with a span of 100 m. The figure gives the envelope, i.e. the maximum required safety factor, of the five details considered. Subfigures (a) and (b) are for a safe life and a damage tolerant concept, respectively. The vertical lines represent the target reliability values for the three RCs. Comparing subfigures (a) and (b), it appears that the partial factor can be reduced by approximately 10% if regular visual inspections are carried out, compared to a non-inspected structure.
The 6PRFLM requires lower safety factors than the CPSNM to satisfy the target $\beta_{100}$ values. A structure in RC2 or RC3 requires a safety factor of 1.27 or 1.48, respectively, to meet the target $\beta_{100}$ for the 6PRFLM. Additional CMC analyses were carried out to compute the minimum annual reliability [...] Table 1 (except for the motorway 1/E62 dominated by medium-distance traffic). With the proposed road-specific adjustment of the number of vehicles in FLM4, the coefficient of variation in reliability for the different databases and otherwise equal cases is 0.06, i.e. relatively small, and the A16/E19 database presents an approximate average reliability of the databases considered. The simulations are repeated without any trend on the axle loads (a zero trend in Eq. (2)) to study the influence of this uncertain variable. The required factors then reduce by approximately 10%.
The partial safety factor required for RC3, influence line I, a span of 100 m and a safe life concept using the CPSNM is approximately 1.6. As a comparison, the Danish National Annex to EN 1993-1-9 [21] prescribes a factor of 1.88 for a safe life concept with high consequences. The background of this factor [28] shows that it is derived from a similar CPSNM as used here, but applying expected load ranges, provided the standard deviation of the fluctuating load is not larger than 0.1. The ratio between the partial safety factors of 1.88/1.6 = 1.18 is indeed close to the conservatism in FLM4 for the same influence line, which is 1.21 (Fig. 10).

Discussion
The smooth transition from the finite life to the CAFL and the absence of damage by stress ranges below the fatigue threshold are the main causes of the lower required safety factor for the 6PRFLM as compared to the CPSNM. This is expected to also hold for other random fatigue limit models. The CPSNM is added here for comparison purposes, because it has been used in a large number of probabilistic calculations reported in the literature, see Section 3.1. Because of the more realistic and physics-based representation of the fatigue damage process with the 6PRFLM, with a higher likelihood to fatigue test results as compared to the CPSNM, the authors consider it appropriate to base the safety factors for the design of bridges on the 6PRFLM.

The largest safety factors are required for the intermediate support regions of large span structures. However, it would be unduly conservative for most designs if the safety factors for general application in standards are [...] [22]. The safety factors are therefore based on the type I influence line with a span of 100 m, Fig. 14. Note that this is a choice made by the authors; [108] gives an alternative method to derive a partial factor. Whereas an RC refers to the whole structure, fatigue is about the details. Therefore, failure consequences ''high'', ''medium'' and ''low'' are introduced that apply to a detail and that meet the target reliability of RC3, RC2 and RC1, respectively. This allows a bridge in RC3 to be designed with the main load bearing structure in the ''high'' and the bridge deck in the ''medium'' fatigue failure consequence class. Rounding up the safety factors of Fig. 14 at the target reliabilities to the nearest multiple of 0.05, the safety factors of Table 10 are established. The partial factors required for a damage tolerant design, intended for periodic visual inspection, appear to be approximately 10% lower than those of the safe life design.
A few of the reliability calculations have been repeated with the First Order Reliability Method (FORM). Not all FORM calculations converged, but the ones that did gave approximately the same reliability index as the CMC calculations used here. The combined FORM sensitivity factors for the load and the resistance are approximately 0.5 and 0.85, respectively, with the CAFL being the dominant variable. The sensitivity factors allow the design values of all variables in the design point to be determined. The calculations reveal that, especially for high consequences, the design value of the combined load variable, normalised by the FLM4 value, is larger than 1. Hence, it is reasonable to split the safety factor into partial safety factors for the fatigue load, with a proposed value of 1.1, and for the fatigue resistance, equal to the combined safety factor divided by the partial safety factor for the load. The resulting partial safety factors for the resistance, Table 11, are then reasonably close to the partial safety factors for the fatigue resistance currently adopted in EN 1993-1-9 [21]. However, the partial safety factor for FLM4 should be increased from the current recommended value of 1.0 to a value of 1.1. The values of Table 11 are now included in the updated version of the standard prEN 1993-1-9 [84] and the increase of the partial factor, which depends on the fatigue load model, is now included in the updated version of the standard prEN 1990 [109]. Note that the partial safety factors apply under the condition that the number of heavy vehicles for FLM4 is based on road-specific data and forecasts, see Section 2.4. This implies that the proposed increase in partial safety factor requires a heavier design of road bridges only if the (forecasted) number of heavy vehicles on the road is relatively large.
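The rounding and splitting steps described in this discussion amount to simple arithmetic, sketched below in Python. The input values 1.27 and 1.48 are the required safety factors quoted in the Results for the 6PRFLM and RC2 and RC3; the function names are hypothetical, and whether Table 11 is computed from the rounded or the unrounded combined factors is not stated in the text, so dividing the rounded values is an assumption. The outputs are illustrations and are not claimed to reproduce Tables 10 and 11.

```python
import math


def round_up_to_step(value, step=0.05):
    """Round up to the next multiple of step, guarding against float noise."""
    n_steps = math.ceil(round(value / step, 9))
    return round(n_steps * step, 2)


def resistance_factor(combined_factor, load_factor=1.1):
    """Partial factor for resistance = combined factor / load factor."""
    return combined_factor / load_factor


# Required combined factors quoted in the Results (6PRFLM, RC2 and RC3).
required = {"RC2": 1.27, "RC3": 1.48}
rounded = {rc: round_up_to_step(g) for rc, g in required.items()}
split = {rc: resistance_factor(g) for rc, g in rounded.items()}
```

With a load factor of 1.1, a combined factor of 1.30 corresponds to a resistance factor of about 1.18 and a combined factor of 1.50 to about 1.36.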
The level of conservatism of FLM3 is more scattered, i.e. it depends more strongly on the span and the shape of the influence line, than that of FLM4 [37]. The required partial safety factor for FLM3 would need to be raised to a value higher than 1.1 to be on the safe side for the general case. However, as indicated before, FLM3 is currently being re-calibrated.

Conclusions
This paper proposes a derivation framework for partial safety factors, to be used for the fatigue design of steel bridges. The factors are derived for European road traffic and design models, specifically Fatigue Load Model 4 and a tri-linear S-N curve, but the framework can also be applied to other traffic and design models. The following conclusions apply:
1. A comparison between measured strains and strains calculated by engineers in practice confirms the mean of 1 and the standard deviation of 0.1 as proposed for the model uncertainty in the JCSS probabilistic model code.
2. The dynamic amplification factor on the fatigue-relevant load levels that takes account of dynamic vehicle-bridge interaction appears low: a mean of 1.02 and a standard deviation of 0.02. This implies that the dynamically measured axle loads obtained with road weigh-in-motion systems can directly be applied in the fatigue design of bridges.
3. Random fatigue limit models with a fracture mechanics-based extension for variable amplitude loading naturally take account of: a) the damage developed by stress cycles in the regime of finite to near-infinite life; and b) the gradual transition of the S-N curve in case of variable amplitude loading. The shape of the load spectrum and the fraction of the spectrum with ranges above the constant amplitude fatigue limit determine whether such models provide a lower or a higher reliability as compared to conventional probabilistic S-N models. For load spectra typical of road bridges, the random fatigue limit model adopted in this paper appears to give a higher estimated reliability as compared to a conventional probabilistic S-N model.
4. Nonetheless, even with such a model, the partial safety factor for Eurocode's Fatigue Load Model 4 needs to be raised to approximately 1.1 in order to meet the target reliability set by the Eurocode, and the number of heavy vehicles in the load model can be adjusted based on forecasts of the specific road or road network. In addition, the load model needs to be modified to accommodate the intermediate support region of long spans.
5. Periodic visual inspections (with the naked eye, at hand or elbow distance, once every five to ten years) of readily inspectable and repairable details may be taken into account in the fatigue design of bridges by lowering the partial safety factors derived for non-inspectable details by approximately 10%.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.