Uncertainty quantification of the failure assessment diagram for flawed steel components in BS 7910:2019

The failure assessment line (FAL) describes the interaction between plastic failure and fracture of flawed steel components subjected to tension or bending. This paper quantifies the model uncertainty of the FAL as provided in the internationally used British Standard BS 7910:2019 by comparing the assessment with the actual failure load of 82 wide plate and 4 tubular joint tests. In line with findings of others, it is demonstrated that the accuracy of the assessment is significantly improved if the crack tip constraint is considered in the assessment. Irrespective of this crack tip constraint consideration, a non-negligible number of wide plate tests has a lower failure load than the one predicted by the FAL in BS 7910:2019, if based on three fracture toughness tests. A penalty or safety margin on the FAL is advocated to compensate for this. It appears advantageous to base the assessment on the average instead of the minimum of three equivalent fracture mechanics tests together with an associated (quantified) safety margin.


Introduction
A flaw of a certain size in a steel structure influences the failure load. The failure mode may be related to void growth (ductile) or cleavage (brittle) fracture, depending on the material characteristics, the geometry, the loading rate, and the applied temperature. Extensive research carried out in the past has resulted in standards that describe assessment procedures for determining the acceptability of flaws in steel structures. Some of the internationally applied standards with a wide application range are BS 7910:2019 [1], API 579:2016 [2] and R6:2001 [3]. This paper focuses on [1]. An overview of the standard is given in [4]. The standard provides the failure assessment diagram for evaluating the acceptability of a flaw, see Fig. 1, where the abscissa gives the plasticity ratio, $L_r$, which is a measure of the proximity to plastic collapse, and the ordinate gives the fracture ratio, $K_r$, which is a measure of the proximity to unstable fracture. The two curves in Fig. 1 represent the Failure Assessment Lines (FAL) for materials with and without a Lüders Plateau (LP). A flaw giving an assessment point within the bound provided by the FAL is acceptable, whereas a flaw resulting in an assessment point outside the FAL is unacceptable.
The assessment procedures and FAL have been developed as an acceptability criterion, not as a quantification of the proximity to failure [5]. To achieve this, the assessment procedures for $L_r$ and $K_r$ are traditionally determined such that fracture tests on large scale components fail outside of the FAL [6,7]. Yet, the procedure is also adopted in probabilistic assessments, for determining structural reliability [8], the safety on flaw size [9], or the derivation of (partial) safety factors on fracture toughness and applied stress [10,11]. Such analyses require the quantification of the difference between the predicted load at which the assessment falls [...] test assessment point to the FAL. This difference is the model uncertainty (MU) of the FAL. Dijkstra [12] derived the MU using the assessment procedures in the guideline PD 6493:1980 [13], which is a predecessor of BS 7910 [1]. He expressed the FAL as a circle [8] (Eq. (1)), see Fig. 2(a), in which the random variable representing the MU is normally distributed with a mean of 1.7 and a standard deviation of 0.4. This model and MU distribution are included in the JCSS probabilistic model code [14]. Fig. 2(a) provides the mean and the 90% two-sided Confidence Interval (CI) of Dijkstra's FAL and the tests on which it is based.
Fig. 2. [...] (a) Dijkstra [12] using PD 6493:1980 [13] (''modern'' FAL added for reference); (b) Muhammed [15] using PD 6493:1991 [16].
Whereas Dijkstra replaced the deterministic FAL by Eq. (1), Muhammed et al. [15] used the FAL of the subsequent version PD 6493:1991 [16] as a starting point, and formulated an additive MU, i.e. the probabilistic FAL is the deterministic FAL plus the MU. Based on 90 large scale (mainly wide plate) tests, they observed a larger bias and a larger scatter for plastic collapse (low $K_r$) and for brittle fracture (low $L_r$) as compared to the interaction region. If all data are grouped, their MU is Weibull distributed with location, scale and shape parameters of −0.06, 0.97 and 1.11, respectively, see Fig. 2(b).
Burdekin and Hamour [10] have established the partial safety factors of the SINTAP/FITNET [17] fracture assessment procedures. Involving the bias and the scatter of wide plate tests in their probabilistic procedure, they determined that the partial safety factor for the applied stress can be reduced by between 0.05 and 0.1, and that the partial safety factor on fracture toughness can be reduced by between 0.2 and 1.0, compared to the case where they took the FAL as deterministic. However, they note that these reductions apply only provided the service conditions of the assessment are similar to the conditions of the wide plate tests that were used to establish the factors. Different assumptions made for the distributions of the variables cause the partial safety factors for fracture toughness recommended in different guidelines to vary widely [18].
The procedures in BS 7910:2019 [1] have been developed over decades. The current version is based on predecessors back to the previously mentioned PD 6493 [13], see [19], and advantage has been taken of other standards and guidelines such as R6 [3], FITNET [17] and SINTAP [20] as well as studies that have been published in the meantime. The procedures have been updated and alternatives added, aiming at reducing excessive conservatism in the assessment. These modifications imply that the distributions of the MU in [10,12,18] as listed before are no longer applicable or are valid only for one of the alternative procedures included in the standard.
Major developments of more detailed procedures to determine $L_r$ accommodated by BS 7910 [1] are: parametric equations for many more types of flaw as compared to its predecessors, consideration of the out-of-plane constraint (plane stress or plane strain) for some types of flaw, and limit load solutions specific for Tubular Joints (TJ). A major development incorporated in the procedures for $K_r$ is the consideration of crack tip constraint, accounted for through the T-stress [21,22]. Various procedures have been developed to take the T-stress into account in the calculation of the fracture ratio. Minami and co-authors [23,24] have developed a procedure which is based on the Beremin model [25], where the Weibull stress is determined with three-dimensional finite element models. This procedure is implemented in ISO 27306 [26], but not in BS 7910 [1]. The earlier work of Ainsworth and O'Dowd [27] is based on the same theory, but they applied the results of pre-computed plane strain finite element models to estimate the Weibull stress. Comparisons between these two methods are given in [24,28]. A third approximate procedure accounts for the crack tip constraint by a simple shift of the master curve [29]. The latter two procedures are implemented in BS 7910 [1] and they will be elaborated below. Many authors have shown that these updated procedures can reduce excessive conservatism in the assessment procedure, e.g. [30,31]. However, Hadley and Horn [28] warn that adopting the crack tip constraint procedures may result in a non-conservative assessment by showing a failed wide plate (WP) test that was assessed inside the FAL. Pisarski [32] also emphasizes the risk of non-conservative assessments with the updated procedures in BS 7910 [1] and he recommends using a lower bound (such as the 5% fraction) of the fracture toughness distribution in the assessment of safety critical structures instead of the usual three (accepted) fracture toughness tests.
Not related to specific assessment procedures, Slatcher [33] indicates that ten fracture toughness tests are sufficient for a reasonably accurate estimate of the characteristic fracture toughness. However, the execution of many fracture tests on each steel batch used in a structure may be uneconomic for new structures and even impossible for existing structures.
This demonstrates the necessity of quantifying the MU for the updated procedures in BS 7910 [1], not only for probabilistic assessments but also to define the FAL at a certain confidence level, i.e. the acceptability criterion of the standard. This work is dedicated to determining the MU distribution based on 86 large scale tests carried out in our institute's laboratories. Emphasis is on the influence of the consideration of the crack tip constraint on the MU of the FAL. The idea behind this database selection, in addition to the full details being available for these tests, is that these tests have not been used to develop the procedures. They therefore form a good basis for validation purposes.

Assessment procedure
This section describes the procedures used to assess the large scale tests. It generally follows the procedures in BS 7910 [1], but with some modifications, which are explicitly mentioned below.

Tensile test properties
Some of the tensile tests were carried out at room temperature. The yield stress and the tensile strength at WP test temperature are obtained from the room temperature values using Eqs. (2) and (3), respectively. In agreement with [1], Young's modulus is assumed to increase from $E$ = 205 GPa at $T$ = 25 °C to $E$ = 210 GPa at $T$ = −75 °C and Poisson's ratio is taken as $\nu$ = 0.3 independent of temperature.
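The temperature dependence of Young's modulus stated above can be sketched as follows. The text only gives the two end values, so the linear variation between them, the function name and the absence of any clamping outside the range are assumptions made for illustration.

```python
def youngs_modulus_gpa(temp_c: float) -> float:
    """Young's modulus in GPa at temperature temp_c (degrees Celsius).

    Assumption: linear variation between the two values quoted in the text,
    205 GPa at 25 degrees C and 210 GPa at -75 degrees C. Values outside
    this range are extrapolated linearly, which the text does not confirm.
    """
    t_warm, e_warm = 25.0, 205.0
    t_cold, e_cold = -75.0, 210.0
    fraction = (t_warm - temp_c) / (t_warm - t_cold)
    return e_warm + (e_cold - e_warm) * fraction


# Example: a WP test carried out at -25 degrees C
e_at_test_temperature = youngs_modulus_gpa(-25.0)
```

Under the linear assumption, the modulus changes by 0.05 GPa per degree, so the correction is small compared to the scatter in fracture toughness.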

Fracture ratio
Standard single edge notch bending specimens were employed for obtaining the fracture toughness. The critical J-integral is available for some specimens only, whereas the Crack Tip Opening Displacement (CTOD), $\delta$, is available for all specimens. The fracture toughness is therefore estimated from $\delta$ for all tests, using Eqs. (4) and (5). These equations are derived from deeply notched CTOD specimens. The current test database contains CTOD specimens with notch depths of $a/W$ = 0.3 or $a/W$ = 0.5, where $a$ is the notch depth and $W$ is the specimen height. To evaluate the applicability of Eq. (5), the fracture toughness estimated from $\delta$ is compared to the fracture toughness following from the J-integral for the specimens for which both values are available. The ratio between the two was on average 1.01 and 0.99 for the specimens with $a/W$ = 0.5 and $a/W$ = 0.3, respectively, with both groups containing 21 tests. The standard deviations of the ratio are 0.06 and 0.09, respectively. These standard deviations are not larger than that of the test database on which Eq. (5) is based [34]. The equation is therefore considered applicable for the test database used here. As a comparison, the value $m$ = 1.5, as often used in earlier times, gives average ratios of 0.94 and 0.92, respectively.
BS 7910 [1] provides two procedures to account for the crack tip constraint in estimating the fracture toughness. Both procedures are applied here. The first procedure is developed by Ainsworth and O'Dowd [27] and is given by Eq. (6), which contains two material parameters, $\alpha$ and $k$, and the T-stress of the WP and of the CTOD specimen. Note that the T-stress in deep notched CTOD specimens is usually neglected, implying that the denominator in Eq. (6) is taken as 1, but the correction is considered necessary here because of the CTOD tests with $a/W$ = 0.3. The T-stress for WP, TJ and CTOD specimens is determined using the parametric equations in Annex N of BS 7910 [1]. It requires the reference stress as input, which is given in Section 2.3 of this paper.
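The conversion from CTOD to a stress intensity based fracture toughness can be illustrated with the sketch below. It assumes the commonly used form $K = \sqrt{m \sigma_Y \delta E / (1 - \nu^2)}$; the equations of this paper are not reproduced in the text above, so the form, the function name and the choice to pass $m$ as an input are assumptions. Only the constant $m$ = 1.5 quoted above is used in the example.

```python
import math


def toughness_from_ctod(delta_mm: float, sigma_y_mpa: float,
                        e_mpa: float, m: float, nu: float = 0.3) -> float:
    """Fracture toughness in MPa*sqrt(m) estimated from a CTOD value.

    Assumed relation (not confirmed by the text above):
        K = sqrt(m * sigma_Y * delta * E / (1 - nu^2))
    delta_mm is converted to metres so that the result is in MPa*sqrt(m).
    The factor m is an input; the paper's own expression for m is not shown.
    """
    delta_metre = delta_mm / 1000.0
    return math.sqrt(m * sigma_y_mpa * delta_metre * e_mpa / (1.0 - nu ** 2))


# Example with the constant m = 1.5 mentioned in the text as the older choice
# (yield stress, CTOD and modulus are illustrative inputs)
k_old_choice = toughness_from_ctod(delta_mm=0.1, sigma_y_mpa=355.0,
                                   e_mpa=207000.0, m=1.5)
```

Because the toughness scales with the square root of both $m$ and $\delta$, a 10% change in $m$ changes the estimate by about 5%, which is of the same order as the bias between the two choices of $m$ reported above.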
BS 7910 [1] allows evaluation of the T-stress from the combination of the applied (primary) stress and the residual (secondary) stress if it is obtained from the finite element method but, using the option of the parametric equations in Annex N, it only considers the effect of the T-stress caused by primary loading. The implications of this will be shown later. Parameters $\alpha$ and $k$ in Eq. (6) are taken from parametric equations based on finite element analyses by Seal and Sherry [35]. Their 3 isostress contour and Rice and Tracey [36] contour solutions are used for cleavage fracture and ductile fracture (void growth), respectively.
The second procedure applies a T-stress based shift of the master curve, Eq. (7). This equation is based on Wallin [29], but both Wallin and BS 7910 [1] use a location parameter of 20 MPa√m instead of the 30 MPa√m used in Eq. (7). However, 20 MPa√m is inconsistent with the relationship between temperature and fracture toughness adopted elsewhere in BS 7910 (e.g. [1], Eqs. K.17 and L.13) and the studies on which this relationship is based (e.g. [37-39]). Therefore the location parameter is adopted as 30 MPa√m in this work. The Beremin-based crack tip constraint correction and the master curve apply to lower shelf (cleavage) and transition (from cleavage to ductile) behaviour. For this reason, the T-stress corrections according to Eqs. (6) or (7) are not applied here if the CTOD test result was reported as $\delta_m$, i.e. if it failed after reaching the maximum force.
Following the weakest link concept for cleavage fracture, the constraint-corrected fracture toughness is further corrected for the length of the crack front if the CTOD test was not reported as $\delta_m$, see Eq. (8). The crack front length in this equation is the length over which the stress intensity factor is approximately equal to the maximum value. It is equal to twice the plate thickness for centre cracked or double edge notched specimens and it is approximated as the surface crack length $2c$ for semi-elliptical cracks [1]. The crack front length is maximized here, with the maximum associated with the plane strain fracture toughness. Eqs. (8)-(10) need to be solved iteratively. Note that BS 7910 applies the crack front length correction of Eq. (8) without considering a maximum.
The Mode I stress intensity factor, $K_I$, is determined from the applied stress and from the stress intensity factor due to the residual stress, where the latter is evaluated from a pre-determined peak value. Alternatively, the self-balancing component of the residual stress field can be employed for surface breaking flaws [1], but the differences between these two alternatives are negligible for the database used here. The product of the geometric correction factor and the applied stress is composed of component correction factors $M$, where subscripts $m$ and $b$ refer to membrane and bending loading, a further subscript refers to residual stress (values given later), and subscript $k$ refers to the geometric influence of the weld detail. Parameter $f_w$ accounts for the finite width of a specimen and is obtained from the parametric equations in BS 7910 [1]. The same applies to the correction factors $M_m$ and $M_b$, except for full thickness cracks at the edge of a hole, for which [1] does not provide an equation. Various equations for this case are given in the literature [40-44], with insignificant differences for the ratios of crack length over hole diameter that are applied in the tests in Section 3. Bowie's solution [40] is applied in this paper; it depends on the diameter of the hole and on the specimen width. Factors $M_k$ are taken from the three-dimensional solution in BS 7910 [1], following [45], but they are adjusted for the weld flank angle by multiplying this solution by a factor [46] that depends on 180 minus the weld flank angle in degrees. The fracture ratio $K_r$ follows from $K_I$ and the fracture toughness, including a term that accounts for plasticity interaction effects between applied load and residual stress. This term is taken from the so-called ''simplified procedure'' in BS 7910 [1].

Plasticity ratio
The plasticity ratio, $L_r$, of the WP tests is based on the reference stress estimate, $L_r = \sigma_{ref} / \sigma_Y$, where $\sigma_{ref}$ is the reference stress in the wide plate tests, taken from the parametric equations in Annex P of BS 7910 [1], and $\sigma_Y$ is the yield stress. The equations for the T-stress in the standard also make use of the reference stress, but they are based on different parametric equations for some WP geometries. The reference stress parametric equations in Annex N of [1] are used in those cases. The reference stress in CTOD specimens, necessary to determine the T-stress in Eq. (7), requires the force at failure of these specimens. These forces are not available for all CTOD tests and they are therefore determined from a fit of the force versus $\delta$ data of those CTOD specimens for which these data are available. Using the reference stress equations for single edge notched bend specimens in [1], the fit results in a relationship between the reference stress in the CTOD specimen and $\delta$. The limit load solution is used to estimate the plasticity ratio of the TJ tests with chord failure [47]. It combines the applied axial load, in-plane bending moment and out-of-plane bending moment with their respective plastic resistance counterparts, the latter accounting for the influence of the flaw. A maximum plasticity ratio, $L_{r,max}$, applies to both WP and TJ tests. The flaw is unacceptable if either the combination ($L_r$, $K_r$) is outside the FAL, or if $L_r$ is larger than $L_{r,max}$. BS 7910 [1] provides three options for the description of the FAL. Option 1 is applied here because data required for the other two options are lacking.
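The acceptability check described above can be sketched as follows. The curve below is the Option 1 FAL for materials without a Lüders plateau in the form in which it is commonly published, together with the usual definition $L_{r,max} = (\sigma_Y + \sigma_U) / (2 \sigma_Y)$. Both are assumptions in the sense that the paper's own equations are not reproduced above; the curve for materials with a Lüders plateau is not covered, and the function names are illustrative.

```python
import math


def l_r_max(sigma_y: float, sigma_u: float) -> float:
    """Cut-off on the plasticity ratio (assumed standard definition)."""
    return (sigma_y + sigma_u) / (2.0 * sigma_y)


def fal_option1(l_r: float, sigma_y: float, sigma_u: float,
                e_mod: float) -> float:
    """Option 1 FAL without Lueders plateau (assumed form); stresses in MPa."""
    if l_r > l_r_max(sigma_y, sigma_u):
        return 0.0  # beyond the plastic collapse cut-off
    mu = min(0.001 * e_mod / sigma_y, 0.6)

    def below_yield(x: float) -> float:
        return (1.0 + 0.5 * x * x) ** -0.5 * (0.3 + 0.7 * math.exp(-mu * x ** 6))

    if l_r <= 1.0:
        return below_yield(l_r)
    n = 0.3 * (1.0 - sigma_y / sigma_u)  # strain hardening estimate
    return below_yield(1.0) * l_r ** ((n - 1.0) / (2.0 * n))


def is_acceptable(l_r: float, k_r: float, sigma_y: float, sigma_u: float,
                  e_mod: float) -> bool:
    """True if the assessment point lies strictly inside the FAL."""
    if l_r > l_r_max(sigma_y, sigma_u):
        return False
    return k_r < fal_option1(l_r, sigma_y, sigma_u, e_mod)


# Example with nominal properties of a 355 MPa steel (illustrative values)
inside = is_acceptable(0.5, 0.5, sigma_y=355.0, sigma_u=510.0, e_mod=207000.0)
outside = is_acceptable(0.5, 1.0, sigma_y=355.0, sigma_u=510.0, e_mod=207000.0)
```

Note that the FAL depends on the yield stress, the tensile strength and Young's modulus, so each material batch has its own curve.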

Wide Plate (WP) tests
The test database contains 82 WP tensile tests. Some of the specimens consist of base metal only and others contain welds with the crack in the centre of the weld or in the Heat Affected Zone (HAZ). The WP database contains centre cracked specimens in tension (CCT), specimens containing a hole (diameter 200 mm) with two cracks emanating from that hole (HCCT), surface cracked specimens in tension (SCT), surface cracks in plates with welded cover plates around the weld (CSCT), cruciform joints with curved plates with surface cracks at the weld toe (CJSCT), curved plates with surface cracks in tension (CPSCT), extended surface cracked specimens in tension (ESCT), and double edge notched specimens in tension (DENT) with a crack in the HAZ very close to the fusion line, see Fig. 3(a). The outer radius of the curved plates of the CPSCT and CJSCT specimens was 356 mm, except for specimen number 2896.1, which had an outer radius of 305 mm. Fig. 3(b) gives the cross-section of each specimen with a close-up of the crack and weld location (if any). All WP specimens except for the CJSCT specimens were notched with electro discharge machining and subsequently fatigue loaded to sharpen the crack tip before the fracture test was undertaken. The CJSCT specimens were fatigue tested from their as-welded state up to fracture. The crack dimensions at the onset of the fracture test ($a$ as the length of full thickness edge cracks, $2a$ as the length of full thickness centre cracks, $a$ as the depth of surface cracks and $2c$ as the length of surface cracks) were determined from the fracture surface after each test. The tests are reported in [48-54] and the relevant data are summarized below. Table 1 gives the data of all specimens, including the load at failure, and 'Batch' refers to the material batch from which the specimens were composed, see Table 2. Some specimens exhibited plastic deformation upon reaching the failure load. The strain at failure is not reported.
Table 2 also provides the temperatures at which the WP and CTOD tests were carried out. The standard tensile tests on base or weld metal of some of the batches were carried out at room temperature, and Table 2 gives the adjusted values at test temperature using Eqs. (2) and (3). Between one and eight CTOD tests were carried out per batch, and Table 2 gives the CTOD values ( ) for each batch. The CTOD specimens were locally compressed according to the standard BS 7448-2 applicable at the time of execution (superseded by ISO 15653 [55]). The loading rates of the CTOD and WP specimens were approximately equal. Care has been taken that the direction of the notch relative to the rolling direction in the CTOD tests matches the crack growth direction in the WP tests. Separate sets of CTOD tests were carried out for the depth and the surface directions for the specimens with a surface crack. In all surface cracked specimens, in depth direction appeared higher than that of the surface direction and therefore Table 2 gives the data for the depth direction.
All WP specimens except for the DENT specimens originated from steel grade Fe510 (various qualities) with a nominal yield stress of 355 MPa. Steel grade FeE550 with a nominal yield stress of 550 MPa was used for the DENT specimens of batch 25-32. The steel grades of the DENT specimens of batches 33-35 are not reported. Failure of all listed WP tests was characterized as unstable.

Tubular Joint (TJ) tests
Four TJ tests are available in addition to the WP tests, with two configurations (denoted as (a) and (b)) and main dimensions according to Fig. 4. Each of these configurations was tested twice, see the final rows of Table 1. The specimens were pre-fatigued from the as-welded state (i.e. without applying an artificial notch) prior to the fracture test. The TJ specimens were from steel grade Fe510 with a nominal yield stress of 355 MPa.
The failure assessment of a flawed TJ should be based on the hot-spot stress [58]. Linear elastic finite element models are made to determine the Stress Concentration Factors (SCF) and the ratio between membrane and bending stress at the crack location of the two types of TJ specimen [56,57]. The specimens are modelled in ANSYS with linear solid elements with full integration of type SOLID45. The region of interest is modelled with 5 mm cube elements (7 to 9 elements over the chord wall thickness). The weld profile is modelled with penetration values according to the construction drawings. Linear extrapolation to the weld toe from the surface points at 1.0 and 0.4 times the wall thickness away from the weld toe [59] is applied to determine the hot-spot stress at the locations 1 to 3 or 1 to 5, as indicated in the right pictures of Fig. 4(a) and (b), respectively. The SCF are determined as the ratio between this hot-spot stress and the nominal stress, the latter determined from the forces and the full cross-sectional area. Linearization of the stress over the wall thickness at the location of the weld toe (giving the same normal force and bending moment per unit length as the actual stress) is applied to determine the ratio between the membrane and bending stress. Table 3 provides the SCF and the membrane to bending stress ratio resulting from the simulations. The hot-spot stress and the membrane to bending stress ratio follow directly from the table for the deepest point of the crack at the hot spot. Quadratic interpolation using the tabulated values of the SCF and the membrane to bending stress ratio is applied for the surface point of the cracks.
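The linear extrapolation to the weld toe and the resulting SCF can be sketched as below. The function names and the example stresses are illustrative; the only inputs taken from the text are the two read-out distances of 0.4 and 1.0 times the wall thickness.

```python
def hot_spot_stress(stress_at_04t: float, stress_at_10t: float) -> float:
    """Surface stress extrapolated linearly to the weld toe (distance zero).

    The two inputs are the surface stresses at 0.4 and 1.0 times the wall
    thickness away from the weld toe.
    """
    slope_per_thickness = (stress_at_10t - stress_at_04t) / (1.0 - 0.4)
    return stress_at_04t - 0.4 * slope_per_thickness


def stress_concentration_factor(hot_spot: float, nominal: float) -> float:
    """SCF as the ratio between hot-spot stress and nominal stress."""
    return hot_spot / nominal


# Example: surface stress that decays linearly away from the weld toe
# (values in MPa, chosen for illustration only)
sigma_hs = hot_spot_stress(stress_at_04t=92.0, stress_at_10t=80.0)
scf = stress_concentration_factor(sigma_hs, nominal=50.0)
```

The extrapolation is equivalent to weighting the two read-out stresses with factors of approximately 1.67 and −0.67, so the result is sensitive to the stress at the point nearest the weld toe.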

Choices made in the assessment and application field
A note in BS 7910 [1] states that the distance between weld toes used in evaluating $M_k$ may be reduced to 0.5 times the wall thickness if the assessment is based on the hot-spot stress. This is adopted in the assessment of the TJ tests. Cracks were inserted in the centre of the weld in the CCT, HCCT and CSCT specimens (Fig. 3(b)) and the weld was ground flush in the DENT and the CPSCT specimens. Factor $M_k$ is therefore taken as unity for these geometries. A linear elastic finite element model consisting of solid elements is made to determine the SCF in the CSCT specimens. Even though the purpose of the cover plates was to generate a stress concentration, with an SCF [...]
Table 2, note (d): WP and CTOD tests are from one delivery with the same composition and texture, but possibly from different batches.
Table 3: SCF and ratio of membrane to bending stress in the TJ specimens (columns: specimen type, stress location, SCF, membrane to bending stress ratio).
[...] are taken as 0 to reflect local compression of these specimens.
BS 7910 [1] does not give parametric equations for the T-stress of cracks emanating from a hole. A three-dimensional finite element model consisting of solid elements was therefore created of the HCCT specimens. Based on the outcome, the T-stress is estimated as −0.65 times the reference stress. This is slightly less negative than the T-stress of a central crack with a length equal to the hole diameter plus the length of the two cracks. The equation parameters accounting for crack tip constraint or T-stress in BS 7910 [1] are not compatible with the equations that consider strength mismatch between base and weld metal [60,61]. Strength mismatch assessment equations have therefore not been used in the current work. Instead, $K_r$ is determined with the material properties of the metal containing the crack (weld metal or base metal), whereas $L_r$ is determined with the minimum of the yield stress of the weld metal and that of the base metal. This selection considers the local and global nature of the fracture and plasticity-induced failure modes, respectively.
For surface cracked WP specimens, BS 7910 [1] provides reference stress solutions for hinge-supported specimens and for specimens with normal restraint against out-of-plane bending. The solutions for normal restraint are used here, reflecting the restraint against out-of-plane bending exerted by the clamps in the test set-up.
The residual stress depends on the type of weld and the weld procedure. BS 7910 [1] provides guidance on the residual stress to be assumed in the assessment. Table 4 gives the membrane and bending portions of the residual stress and the maximum stress intensity factor related to the self-balancing component of the residual stress, as assumed in the assessment for the different types of specimen. HCCT specimens, CSCT specimens containing welds, TJ(b) specimens, and CCT specimens 9, 10, and 15-18 were post weld heat treated and a reduced residual stress then applies. BS 7910 [1] does not give guidance for the residual stress in full through thickness cracks that are locally compressed prior to testing, as is the case for the DENT specimens and CCT specimens 11-14. The membrane residual stress is assumed equal to that of a butt joint according to [1], whereas the bending residual stress and the self-balancing stress are assumed zero for these specimens. This estimate is selected as an intermediate value between the measured residual stresses after warm prestressing in [62] and the residual stress of through thickness flaws without treatment in [1].
The database consists of (predominantly) uniaxially loaded specimens. Pressure vessels are not part of the database, and data in [7] suggest that particularly the bias is larger for pressurized cylinders as compared to wide plates with the BS 7910 [1] assessment procedure. Similarly, WP tests subject to biaxial loading in [62] show a larger scatter as compared to uniaxially loaded tests. Hence, the results in this paper are limited to uniaxial in-plane loaded steel components.

Probabilistic model
A radial distance between the FAL and the failure point ($L_r$, $K_r$) is defined for each test, in terms of the radial distances of the FAL and of the test failure point from the origin. The polar angle for the point on the FAL is taken equal to that of the failure point, see Fig. 5 for an example. Two main probabilistic models are considered, with two variants for each. Probabilistic Model 1 considers an additive MU that is independent of the polar angle, where the probabilistic FAL is equal to the deterministic FAL plus the MU minus 1. Two distributions are considered for the MU, namely a normal distribution (Eq. (23)) and a two-parameter lognormal distribution (Eq. (24)).
The parameters of both distributions are the mean and the standard deviation of the MU, and the realizations of the MU are treated as independent and identically distributed random variables. The probabilistic formulation is defined such that the minimum possible value of the probabilistic FAL in case of the lognormal distribution is the deterministic FAL minus 1. The shift of −1 is an approximation that reflects the physics-based fact that $L_r$ and $K_r$ cannot be negative. Probabilistic Model 2 is similar to Model 1, but the distribution parameters of the MU are polar angle dependent. The frequentist paradigm is used to interpret and to estimate the distribution parameters of the first and of the second model [63]:
• The point estimates of the model parameters are obtained by maximizing the likelihood function.
• The standard errors due to sampling variability are estimated using the delta method [64].
The bounds of the 90% CI of the probabilistic FAL, i.e. the 5% and 95% one-sided confidence bounds, are subsequently determined from these parameters. The Akaike Information Criterion (AIC) [65] is used to compare the performance of the models with respect to the goodness of fit to the observations while penalizing model complexity. Following [66], a difference between two models is considered significant if the absolute difference in AIC exceeds 10.
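The main steps of the probabilistic model can be sketched as below: locating the point on the FAL at the polar angle of a failure point, and comparing a normal and a two-parameter lognormal fit through the AIC. This is a minimal sketch under stated assumptions. The way the two radial distances are combined into the MU is not reproduced above, so the first function only returns both distances; the bisection search, the function names and the closed-form maximum likelihood estimates are illustrative choices and not taken from the paper.

```python
import math
from typing import Callable, Sequence, Tuple


def radial_distances(l_r: float, k_r: float, fal: Callable[[float], float],
                     r_upper: float = 10.0, tol: float = 1e-10
                     ) -> Tuple[float, float]:
    """Return (r_fal, r_test) along the ray from the origin through (l_r, k_r).

    Assumes k_r > 0, fal(0) > 0 and that the ray leaves the FAL before r_upper.
    """
    if l_r < 0.0 or k_r <= 0.0:
        raise ValueError("sketch assumes L_r >= 0 and K_r > 0")
    r_test = math.hypot(l_r, k_r)
    theta = math.atan2(k_r, l_r)
    lo, hi = 0.0, r_upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # negative while the point on the ray is still below the FAL
        if mid * math.sin(theta) - fal(mid * math.cos(theta)) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), r_test


def aic_normal(values: Sequence[float]) -> float:
    """AIC of a normal distribution fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # ML estimate, divides by n
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return 2 * 2 - 2.0 * log_likelihood  # two fitted parameters


def aic_lognormal(values: Sequence[float]) -> float:
    """AIC of a two-parameter lognormal distribution (values must be > 0)."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / n
    log_likelihood = -sum(logs) - 0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return 2 * 2 - 2.0 * log_likelihood


def significantly_different(aic_a: float, aic_b: float,
                            threshold: float = 10.0) -> bool:
    """Apply the criterion that an absolute AIC difference must exceed 10."""
    return abs(aic_a - aic_b) > threshold
```

As a usage example, `radial_distances(0.9, 1.2, lambda x: math.sqrt(max(1.0 - x * x, 0.0)))` uses a hypothetical quarter-circle FAL of unit radius and gives a FAL distance of approximately 1.0 and a test distance of 1.5. The standard errors from the delta method and the polar angle dependence of Model 2 are not covered by this sketch.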

All tests combined
This section considers all available WP (or TJ) tests and associated CTOD tests as mutually independent realizations. In each realization, the fracture toughness is taken from an individual CTOD test and combined with the corresponding WP (or TJ) test. The MU is determined from all of these realizations without considering dependence. Table 5 gives the resulting MU parameters. The first three sets of parameters apply to different treatments of the T-stress, namely no consideration of the T-stress, Ainsworth's model (Eq. (6)) and Wallin's shift of the master curve (Eq. (7)). The last set with footnote (a) will be introduced later. The following differences between the options and models apply for the available data:
• Based on the AIC differences, the shifted lognormal distribution gives a consistently and significantly better fit of the data than the normal distribution.
• Based on the AIC differences, the probabilistic Model 2, where the MU is taken as dependent on the ratio between $K_r$ and $L_r$, performs consistently and significantly better than Model 1, where the MU distribution is independent of the assessment points.
• The sets that consider the T-stress have lower standard deviations than the set that ignores the T-stress. In addition, the mean values are closer to 1 for the sets that consider the T-stress. This demonstrates that a more accurate description of reality is obtained by considering the T-stress. For the available data and based on the same indicators, it appears that considering the T-stress through Wallin's shift of the master curve performs similarly to the model of Ainsworth.
[...] Note that these curves are given for demonstration purposes only; each material batch has its own FAL and each test is evaluated with the FAL of its batch. The curves in subfigures (b) provide the mode and the 90% CI of the lognormally distributed MU of probabilistic Model 1. Comparing the figures, it is obvious that the scatter of the data and the related 90% CI reduce substantially if the T-stress is taken into account in $K_r$. Fig. 9(a) repeats the data of Fig. 8, but it distinguishes the material zones containing the crack. The data with a notch in the HAZ or in the weld metal have a slightly larger bias and scatter as compared to the data with a crack in the base metal. The larger scatter may be due to the variation of residual stress levels and the variation of the microstructure of the HAZ and weld specimens. The larger bias may be related to the consideration of the T-stress in BS 7910 [1]. The standard allows evaluation of the T-stress from primary loading and residual stress but, using the option of the parametric equations in Annex N, it only considers the effect of the T-stress caused by primary loading. Yamashita and Minami [31,67] have shown that the residual stress also contributes to the T-stress. Their WP tests were assessed closer to the FAL if the residual stress is considered in calculating the T-stress. Fig. 9(b) distinguishes the type of specimen. This appears to have a significant influence on the MU distribution.
The surface cracked (SCT, CPSCT, CJSCT and TJ) specimens show a larger bias and a larger scatter as compared to the other types of specimen. This may be (partially) related to the derivation of the equations for the reference stress of surface cracked specimens in BS 7910 [1], in which the surface crack is modelled with a square envelope and where a uni-axial stress state and yield locus are assumed [68]. Miura and Takahashi [69] collected five possible alternatives from other literature for the reference stress of surface cracked specimens. All alternatives are considered here, and the alternative in [70] gives the best fit in terms of agreement of the MU distribution with the non-surface cracked specimens. The corresponding reference stress follows from Eq. (27). The last set of data in Table 5 as well as Fig. 10 present the results considering:
• Eq. (27) for the reference stress of surface cracks.
• The T-stress as composed from the contribution of the primary stress and the residual stress and accounting for it using the shift of the master curve.
The lower standard deviations and the values closer to 1 show that this combination gives a more accurate prediction of reality than the other sets.

Minimum and average of three equivalent (MOTE, AOTE)
The evaluation of the previous section is useful to select the optimal combination of models. In most practical assessments, however, a limited number of K_JC or CTOD tests are carried out and the Minimum Of Three Equivalent (MOTE) test results is selected to determine . The term "equivalent" refers to the need that the individual tests give reasonably close fracture toughness values. BS 7910 [1] recommends taking the fracture toughness as the lower 20th percentile of more than three fracture toughness tests if the crack tip constraint is considered in the assessment. However, this requires prior knowledge on the type of distribution of the fracture toughness, because the number of tests per batch in Table 2 appears too small to select a distribution type based on the data (using the Kolmogorov-Smirnov test or the chi-squared test). The equation for the lower 20th percentile given by the standard results in a fracture toughness close to zero or even negative for a number of specimens when assuming a normal distribution. Deriving the MU from such a database is not meaningful and it is therefore not applied here.

To establish the MU from as many available data as possible, using consistency in the number of fracture toughness tests, and to be of use also in practical cases with a limited number of fracture toughness tests, the MU is evaluated here using three CTOD tests per specimen, by a procedure outlined below, even if more CTOD tests are available. BS 7910 [1] specifies that an individual test value should neither be smaller than 70% nor larger than 140% of the average of three to be "equivalent". Otherwise, more tests need to be carried out. This practice is applied to the WP and TJ test database as follows:
• Tests from batches with fewer than three CTOD data are ignored.
• For batches with more than three CTOD specimens, three values are sampled without replacement. The sample of three is accepted if it satisfies the "equivalent" criterion.
• Re-sampling is applied if the sample of three is not accepted.
• In case the "equivalent" criterion is not met even after 100 times re-sampling, the three values closest to the mean of the batch are taken as the sample.
The minimum per sample of three equivalent is used to determine in the assessment for MOTE. In addition, the Average Of Three Equivalent (AOTE) tests is selected, which is defined as the arithmetical mean of the fracture toughness values of three equivalent tests.
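The sampling rules above can be illustrated with a minimal sketch. It assumes that a batch is simply a list of CTOD values; the function names, the seed and the example values are hypothetical and are not taken from the paper or from BS 7910.

```python
import random
from statistics import mean


def is_equivalent(sample):
    """True if every value lies between 70% and 140% of the sample average."""
    avg = mean(sample)
    return all(0.7 * avg <= value <= 1.4 * avg for value in sample)


def sample_three_equivalent(batch, rng, max_resamples=100):
    """Draw three 'equivalent' values from one batch (None if fewer than three)."""
    if len(batch) < 3:
        return None  # batches with fewer than three CTOD values are ignored
    for _ in range(1 + max_resamples):
        sample = rng.sample(batch, 3)  # sampling without replacement
        if is_equivalent(sample):
            return sample
    # Fallback: the three values closest to the batch mean
    batch_mean = mean(batch)
    return sorted(batch, key=lambda value: abs(value - batch_mean))[:3]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = [0.21, 0.25, 0.30, 0.18, 0.27]  # hypothetical CTOD values in mm
sample = sample_three_equivalent(batch, rng)
mote = min(sample)   # Minimum Of Three Equivalent
aote = mean(sample)  # Average Of Three Equivalent
```

By construction the MOTE never exceeds the AOTE of the same sample, which is why the two options lead to different MU distributions.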
To account for sampling uncertainty in the above described procedure, a total of 100 databases are generated using the bootstrap method (i.e. with re-sampled specimens), where each database contains all WP and TJ tests and three sampled "equivalent" CTOD values per test. The MU parameters are evaluated from these 100 databases for the MOTE and the AOTE, assuming a lognormal distribution for + 1. Table 6 gives the resulting MU parameters: for each parameter the mean of the 100 estimates. Obviously, the scatter of the samples and the 90% CI are smaller for AOTE in comparison to MOTE, due to the greater confidence in the mean as compared to a lower bound based on three CTOD tests. The lower 20th percentile (L20P) appears to give the largest scatter of the three alternatives for .
Based on a limited number of WP tests, Hadley [28] suggests that the FAL can give unconservative results for an assessment using MOTE and considering the T-stress influence in the assessment. Pisarski [32] and Hadley and Pisarski [34] indicate that, if accounting for the T-stress, a more rigorous definition for the lower bound fracture toughness may be required instead of MOTE to determine . This is confirmed with the current analysis, see Fig. 11, which shows the resulting MU distribution and the data of one of the 100 database realizations as an example. A non-negligible number of tests are within the FAL. The figure also shows that the confidence interval is largest for L20P (subfigure (c)) and smallest for AOTE (subfigure (d)), with MOTE (subfigure (b)) in between these two options.
The distribution parameters of Table 6 allow the estimation of the fraction of test data that exceeds the FAL. Using probabilistic Model 1, the FAL corresponds to the 8% lower confidence bound if the fracture toughness is based on MOTE and the T-stress is not considered. This fraction is 25% or 17% for the T-stress considered through Eq. (6) or Eq. (7), respectively. A division factor to the FAL can be introduced with a value such that the assessment corresponds to a certain, desired lower confidence bound. Fig. 12 gives the relationship between the confidence bound and the safety factor and, as an example, Table 7 gives the safety factors required to achieve a 5% lower confidence bound. The safety factor can be implemented in practical assessments either by dividing the FAL by the factor, or by vectorial multiplication of the (L_r, K_r) coordinate with it. The first option corresponds to the lower confidence bound displayed in Fig. 11.
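The link between a lognormal error distribution and a division factor can be sketched as follows. The sketch assumes a random variable X, lognormally distributed, that expresses the ratio between the test result and the FAL prediction, so that X < 1 marks a test failing inside the FAL. This definition, the function names and the parameter values are illustrative assumptions; the MU definition of the paper and the values of Table 6 are not reproduced here.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def fraction_inside_fal(mu_ln, sigma_ln):
    """P(X < 1) for a lognormal X with log-mean mu_ln and log-std sigma_ln."""
    return STD_NORMAL.cdf(-mu_ln / sigma_ln)


def division_factor(mu_ln, sigma_ln, p_target):
    """Factor gamma such that P(gamma * X < 1) equals p_target."""
    x_p = exp(mu_ln + STD_NORMAL.inv_cdf(p_target) * sigma_ln)  # p-quantile of X
    return 1.0 / x_p


# Hypothetical lognormal parameters, for illustration only
mu_ln, sigma_ln = 0.30, 0.25
fraction = fraction_inside_fal(mu_ln, sigma_ln)
gamma = division_factor(mu_ln, sigma_ln, p_target=0.05)
# After dividing the FAL by gamma, the ratio becomes gamma * X
fraction_after = fraction_inside_fal(mu_ln + log(gamma), sigma_ln)
```

With these assumed parameters the factor exceeds 1 because more than 5% of the assumed distribution lies below 1; a factor below 1 would indicate that the unfactored FAL already satisfies the target.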
The Appendix of this paper evaluates the MU using all available CTOD tests instead of the three used in this section. The safety factors appear relatively close to the ones given in Table 7.

Remarks about the MU estimate
The MU established in the previous section contains the uncertainty of the fracture toughness as based on a limited number of CTOD tests. Strictly, the distribution derived is hence not only a MU but it is a combination of the true uncertainty of the assessment procedure and the uncertainty of the true fracture toughness. This is underlined by the fact that the scatter in all models is lowest for low values of ∕(0.5 ), where the plasticity (which is typically more deterministic) is dominant. The coefficient of variation for a base metal batch estimated from 106 tests was 0.41 in [32] and it was 0.43 for a batch estimated from 20 tests in [71]. Both sets consist of steel specimens and contain a mix of ductile and brittle data. In order to estimate the contribution of the uncertainty in , a lognormal distribution is considered for as in [34], with a coefficient of variation of 0.41. Sets of three samples are randomly selected from that distribution and the MOTE and AOTE are determined for each set for which the criterion of equivalence is met. This is repeated until 500 MOTE and AOTE values are obtained. The coefficient of variation of the 500 MOTE values is 0.27 and that of the AOTE values is 0.25. Comparing these values with the values of in Table 6, it appears that a large fraction of the scatter can be explained by the uncertainty of . Possible causes for the remaining scatter are: variations in the residual stress in the CTOD and the WP (and TJ) tests, the complexity of geometries of the WP tests versus the nominal geometry applied in the model (such as the assumption of semi-elliptical crack shapes for surface cracks), and the multitude of material zones in the tests versus the assumption of homogeneous material for the assessment of the T-stress influence and for the effect of the crack front length, , on the fracture toughness (Eq. (9)).
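The Monte Carlo exercise described above can be sketched as follows. The unit mean of the lognormal distribution, the seed and the function names are assumptions made for illustration; the resulting coefficients of variation depend on the random draws and need not coincide with the values quoted in the text.

```python
import math
import random
from statistics import mean, stdev


def lognormal_params(mean_value, cov):
    """Log-mean and log-std of a lognormal with given mean and coefficient of variation."""
    sigma_ln = math.sqrt(math.log(1.0 + cov ** 2))
    mu_ln = math.log(mean_value) - 0.5 * sigma_ln ** 2
    return mu_ln, sigma_ln


def simulate_cov(cov=0.41, n_sets=500, seed=1):
    """Coefficients of variation of MOTE and AOTE over n_sets accepted sets of three."""
    rng = random.Random(seed)
    mu_ln, sigma_ln = lognormal_params(1.0, cov)  # unit mean is an arbitrary choice
    motes, aotes = [], []
    while len(motes) < n_sets:
        values = [rng.lognormvariate(mu_ln, sigma_ln) for _ in range(3)]
        avg = mean(values)
        # Equivalence criterion: each value within 70% to 140% of the average
        if all(0.7 * avg <= v <= 1.4 * avg for v in values):
            motes.append(min(values))
            aotes.append(avg)
    return stdev(motes) / mean(motes), stdev(aotes) / mean(aotes)


cov_mote, cov_aote = simulate_cov()
```

Both coefficients of variation come out well below the 0.41 of the underlying distribution, because averaging and the equivalence criterion both reduce the scatter of the retained sets.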
The bias and scatter in are attributed to the complexity of some of the geometries and to possible influence of the T-stress on the Von Mises or Tresca yield load, whereas most parametric equations for are based on a uniaxial stress state. As a demonstration, the figures indicate that both bias and scatter in are smaller for the DENT specimens as compared to other specimen types. Indeed, the equations for of the DENT specimens are detailed in that these consider possible weld mismatch and plane strain versus plane stress state, [72]. Moreover this specimen type is not subject to out-of-plane bending. The bias in is smaller than the bias determined in older studies, such as those of Fig. 2. This is partially due to the availability of reference stress solutions for more types of geometry in the current version of BS 7910 [1], see the comparison in [7]. The additional difference could be related to the assumption of restraint of the specimen. Normal restraint against out-of-plane bending is assumed in the current paper, whereas it is possible that a more conservative pin restraint was assumed in other studies.
Considering the T-stress influence through the master curve shift, Eq. (7), gives a model performance comparable to that of Ainsworth's model, Eq. (6). This may be a consequence of the database, where most large scale tests failed around = 1. This was also the starting point of the derivation of the master curve shift in [29]. Indeed, the master curve shift model performs worse for the test data with large polar angles. Tronskar et al. [30] show that the accuracy of Wallin's original T-stress consideration (i.e. Eq. (7) with 20 instead of 31 MPa√m) depends on the steel grade. Hadley and Pisarski [34] suggest that the FAL with T-stress consideration according to Eq. (7) could still be used if a lower bound value different than MOTE is used for . This approach has two drawbacks. First, it would require many more than three K_JC or CTOD tests. Second, as more accurate estimates of become available and hence the level of conservatism in reduces (but some scatter remains), some of the test data with small polar angle will fall inside the FAL, as is evident from Fig. 10. A very low value of is then required to get these data outside the FAL. Instead, the authors of the current paper consider it advantageous to apply (or enlarge) a safety factor on the FAL, as given in Fig. 12 or Table 7. Because the confidence in the mean is higher than that in a lower bound value when the number of test data is limited, it may then also be better to base the assessment and the safety factor on the AOTE instead of the MOTE.
The data in subfigures (a) of Figs. 6-8 and 10 show a distinct non-linear dependency on the polar angle that is in conflict with the assumed statistical model. As an approximation, the authors accept this conflict and cover this seemingly non-random pattern with a single random variable (the MU). However, this should be analysed in further studies. The difference between the figures implies that a more accurate base model ([1] and T-stress effect in this study) could largely eliminate the pattern and in turn could make the residuals more closely resemble a random sample from a (log)normal distribution.

Application Example 1: use of MOTE and AOTE
Consider a plate in an existing structure with a stress relieved butt weld and with dimensions = 25 mm, = 1000 mm, distance between weld toes = 40 mm and weld toe angle = 150 degrees. A weld toe flaw is present with ∕ = 0.5. The plate is of steel grade S355, for which [73] provides the following distributions of the material properties at room temperature:

In addition, the yield stress distribution is truncated at its lower tail at 355 MPa in order to reflect the acceptance tests carried out by steel manufacturers for this steel grade. The design temperature is −30 °C and the (50-year maximum) applied tensile stress is Gumbel distributed with an expectation of 200 MPa and a coefficient of variation of 0.1:

where ( ) is the cumulative distribution function of the applied stress and 1 and 2 are the distribution parameters with values of 0.064 and 191 MPa, respectively. The residual stress after stress relief is assumed as , = 0. The distributions of , and and the procedures of Section 2 are used in determining , . The T-stress is considered through Eq. (7) and the corresponding distribution parameters and according to Table 6 (i.e. no dependency is considered on ) are used for (note that and are different for MOTE and AOTE).

ISO 13822 [74] recommends a minimum reliability index of β = 3.8 with a reference period of 50 years for an existing structure with medium consequences of failure. This implies that the tolerable probability of not obtaining the limit state is Φ(−β), where Φ is the cumulative standard normal distribution. The first order reliability method is employed to estimate the crack depth that just satisfies the requirement. It is = 0.22 for MOTE and = 0.30 for AOTE. The assessment is repeated 100 times, each time sampling three equivalent values from the distribution. AOTE resulted in a larger tolerable flaw than MOTE in all repetitions. The average critical crack depth of the 100 repetitions is = 0.22 for MOTE and = 0.29 for AOTE.
This example demonstrates the added value of basing the assessment on AOTE instead of MOTE.
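The Gumbel parameters and the target reliability quoted in Example 1 can be checked with a short calculation. The sketch assumes the usual Gumbel (maximum) form F(s) = exp(−exp(−a1 (s − a2))), with a1 the inverse scale and a2 the location; the names a1 and a2 and the function names are chosen here for illustration only.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_params(mean_value, cov):
    """Inverse scale a1 and location a2 of a Gumbel (maximum) distribution."""
    std = cov * mean_value
    scale = std * math.sqrt(6.0) / math.pi
    location = mean_value - EULER_GAMMA * scale
    return 1.0 / scale, location


def gumbel_cdf(stress, a1, a2):
    """Assumed CDF form: F(s) = exp(-exp(-a1 * (s - a2)))."""
    return math.exp(-math.exp(-a1 * (stress - a2)))


# Applied stress: expectation 200 MPa, coefficient of variation 0.1
a1, a2 = gumbel_params(200.0, 0.1)

# Tolerable probability associated with a reliability index of 3.8
p_tolerable = NormalDist().cdf(-3.8)
```

Under this assumed form, a1 and a2 round to the 0.064 and 191 MPa stated in the example, and the tolerable probability is of the order of 7 × 10⁻⁵.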

Application Example 2: flaw in an existing bridge
Many mild steel bridges built before the Second World War are still in use to date. Charpy impact or other fracture toughness tests were not carried out in the construction industry at the time of construction. In order to obtain material data of such old bridges, samples from approximately 50 European bridges built between 1870 and 1938 were collected in a European research project, and tensile tests and K_JC tests were carried out on the samples [75]. Table 8 provides the distributions of the material properties gained from that research at a temperature of −30 °C.
Fatigue cracks can initiate from rivet holes in these structures. After reaching a certain size, the stress intensity factor of such cracks is similar to that of a crack emanating from a hole without a rivet, if the joint contains multiple rivets. A two-sided through-thickness flaw is assumed in such a joint in an old mild steel bridge. It is inspected with magnetic particle inspection. Based on the information in [76], the probability of detection as a function of the flaw size is considered here as Weibull distributed with a location parameter of 0.25 mm, a shape parameter of 0.7 and a scale parameter with a point estimate of 1.2 mm and a standard error of 0.7 mm. Herein, is the detected crack size outside of the rivet head. According to information in [77], the distance between the edge of the rivet head and the edge of the rivet hole (i.e. rivet head radius minus rivet hole radius) of the often applied rivet with shaft diameter of 24 mm is approximately = 9 mm.

The probability is determined of the crack being detected before failure of the joint. The limit state function is the same as in Example 1, Eq. (30), but , is evaluated for a crack with size = + and is taken from the MU of the FAL using all specimens, i.e. Table 5, using the set of Eq. (7), lognormal distribution for , and probabilistic Model 1. The (50-year maximum) applied stress is Gumbel distributed with a coefficient of variation of 0.1 and a mean that is varied. The first order reliability method is applied to solve the reliability problem as a function of the mean of the maximum applied tensile stress . Results are presented with the solid curve in Fig. 13. The dashed horizontal line represents the required reliability as in Example 1. The tolerable mean of the applied tensile stress distribution in the joint is 140 MPa (see Fig. 13).
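The probability of detection curve of Example 2 can be sketched with a three-parameter Weibull cumulative distribution function. The functional form and the function name are assumptions made for illustration, and the scale parameter is fixed at its point estimate, whereas the example also attaches a standard error to it.

```python
import math


def pod_weibull(crack_size_mm, location=0.25, shape=0.7, scale=1.2):
    """Assumed probability of detection for a crack size outside the rivet head (mm)."""
    if crack_size_mm <= location:
        return 0.0  # cracks below the location parameter are not detected
    return 1.0 - math.exp(-(((crack_size_mm - location) / scale) ** shape))


# Probability of detection for a few hypothetical crack sizes in mm
pod_curve = {size: pod_weibull(size) for size in (0.25, 1.0, 1.45, 5.0, 20.0)}
```

With a shape parameter below 1 the curve rises steeply just above the location parameter and then flattens, so a non-negligible probability of missing the crack remains even for cracks of several millimetres.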

Conclusions
This paper estimates the MU of the FAL by comparing the assessment according to the British Standard BS 7910:2019 with the actual failure load of 82 WP tensile tests and 4 TJ tests. The following conclusions are drawn:
1. The bias and the scatter of the MU reduce substantially if the crack tip constraint is taken into account.
2. Irrespective of whether or not the T-stress is considered, and how, a non-negligible number of WP specimens has a lower failure load than the one predicted by the FAL, if based on three fracture toughness tests. Depending on the selected assessment procedure, the FAL coincides with the 13% to 25% lower confidence bound if the fracture toughness is based on MOTE and the T-stress is considered in the assessment.
3. Instead of using MOTE, a better agreement with the tests and a less conservative assessment results if AOTE is taken as a basis for the assessment, together with an associated error distribution of the FAL (which is obviously different for AOTE compared to MOTE).
4. Instead of using more than three equivalent fracture tests to determine a more rigorous value of the lower bound fracture toughness, as proposed by others if the T-stress is considered, it is advantageous to apply (or enlarge) a safety factor on the FAL. See Table 7 for the required safety factors in case of a 5% lower confidence bound.
5. The AIC shows that the MU can be described better with a shifted lognormal distribution than with a normal distribution.
6. The assessment procedure in BS 7910 can be (further) improved by considering the residual stress field in estimating the T-stress from the compendium in Annex N and, for surface flaws, by providing a more accurate description of the reference stress.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.