Uncertainty Evaluation of Weibull Estimators through Monte Carlo Simulation: Applications for Crack Initiation Testing

The typical experimental procedure for testing stress corrosion cracking initiation involves an interval-censored reliability test. Based on these test results, the parameters of a Weibull distribution, which is a widely accepted crack initiation model, can be estimated using maximum likelihood estimation or median rank regression. However, it is difficult to determine the appropriate number of test specimens and censoring intervals required to obtain sufficiently accurate Weibull estimators. In this study, we compare maximum likelihood estimation and median rank regression using a Monte Carlo simulation to examine the effects of the total number of specimens, test duration, censoring interval, and shape parameters of the true Weibull distribution on the estimator uncertainty. Finally, we provide the quantitative uncertainties of both Weibull estimators, compare them with the true Weibull parameters, and suggest proper experimental conditions for developing a probabilistic crack initiation model through crack initiation tests.


Introduction
Stress corrosion cracking (SCC) is one of the main material-related issues that occur in the operation of nuclear reactors [1][2][3]. Particularly, in pressurized water reactors, the occurrence of SCC at a reactor's pressure boundary can cause a loss-of-coolant accident. Therefore, many researchers have endeavored to predict SCC initiation time for a given component. However, accurately predicting SCC initiation is difficult because the mechanism is quite complex and not yet clearly understood; instead, empirical SCC initiation models are generally considered [4][5][6].
However, most SCC experiments show significant variation in cracking time, even though all specimens are tested in the same experimental conditions (e.g., temperature and stress level). Therefore, the Weibull distribution [7], which considers the effect of time-dependent material degradation, is widely accepted as a probabilistic model for SCC initiation time [6,8,9]. Probabilistic models cannot offer an exact cracking time but can offer a cracking probability as a function of time for a given set of conditions. In this case, SCC initiation testing is required to determine the cracking probability function (i.e., the unreliability function).
The typical experimental procedure of an SCC initiation test involves an interval-censored reliability test. That is, stressed specimens are exposed to a corrosive environment and censored at scheduled periods. The results of these tests can be used to estimate the parameters of a Weibull distribution, using either maximum likelihood estimation (MLE) or median rank regression (MRR) [10].
Both estimation methods for Weibull parameters are anticipated to be more accurate with more test specimens and narrower censoring intervals. However, we do not yet know the optimal number of test specimens and censoring intervals required to estimate a sufficiently accurate Weibull distribution.
In this study, we use Monte Carlo simulation to compare MLE and MRR estimators and quantify the effects of specimen number, test duration, and censoring interval on the uncertainty of the estimated Weibull parameters.

Weibull Distribution
The cumulative distribution function (CDF) of a two-parameter Weibull distribution is frequently used as a cracking probability function, and is given by [10]:
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (1)$$
where t is time, β is the shape parameter, and η is the scale parameter of the Weibull distribution.
If β < 1, the cracking rate, or hazard function, decreases with time. If β > 1, the cracking rate increases monotonically. This indicates time-dependent material degradation, or aging effects. If β = 1, the Weibull distribution becomes equivalent to an exponential distribution and the cracking rate is not influenced by time. The scale parameter η is also called the characteristic time, which is the time at which the CDF of the Weibull distribution reaches 1 − e^(−1) ≈ 0.632.
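As a minimal sketch of the behavior described above, the following Python functions evaluate the two-parameter Weibull CDF and the corresponding hazard function h(t) = (β/η)(t/η)^(β−1). The function names and the numerical values are illustrative assumptions and are not taken from the original study.

```python
import math


def weibull_cdf(t, beta, eta):
    """Cracking probability F(t) = 1 - exp(-(t/eta)**beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """Cracking rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


if __name__ == "__main__":
    eta = 100.0  # illustrative scale parameter
    for beta in (0.5, 1.0, 3.0):
        early = weibull_hazard(10.0, beta, eta)
        late = weibull_hazard(90.0, beta, eta)
        # beta < 1: hazard falls; beta = 1: constant; beta > 1: rises
        print(beta, early, late, weibull_cdf(eta, beta, eta))
```

Whatever the shape parameter, the CDF evaluated at t = η is the same, which is why η is a convenient characteristic time.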

Median Rank Regression
MRR is a method that can derive a cracking probability function from the result of a crack initiation test. It is reasonable to assume that all specimens are tested independently; that is, the status of one specimen does not affect the cracking probability of the other specimens.
Let N be the total number of specimens and j be the number of cracked specimens. Then, the distribution of j at a certain time follows a binomial distribution. The CDF of the binomial distribution can be expressed as follows [11]:
$$\mathrm{CDF}_{\mathrm{Bin}}(j; N, F(t)) = \sum_{i=0}^{j} \binom{N}{i} [F(t)]^{i} [1-F(t)]^{N-i} = (N-j)\binom{N}{j} \int_{0}^{1-F(t)} u^{N-j-1} (1-u)^{j}\,du = I_{1-F(t)}(N-j,\, j+1) \qquad (2)$$
where F(t) is the cracking probability function and I is the regularized incomplete beta function. When CDF_Bin is set to 0.5, the value of F(t) at a certain time can be calculated, and is called the median rank. If the total number of specimens is very large, the value of the median rank is close to the cracked fraction j/N.
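The median rank can be computed numerically by solving the binomial condition for F. The sketch below uses bisection and assumes the common indexing convention in which the median rank of the j-th cracking is the F at which the probability of observing fewer than j cracked specimens equals 0.5. This indexing and the function names are assumptions of the sketch, not details confirmed by the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def median_rank_exact(j, n, tol=1e-12):
    """Median rank of the j-th cracking among n specimens (1 <= j <= n).

    Assumed convention: find F such that P(fewer than j cracked) = 0.5.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The binomial CDF decreases as F grows, so the root lies above mid
        # whenever the CDF at mid is still larger than 0.5.
        if binom_cdf(j - 1, n, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n, j in ((10, 5), (200, 100)):
        print(n, j, median_rank_exact(j, n), j / n)
```

For a small N the median rank differs visibly from j/N, while for a larger N the two values move closer together, in line with the statement above.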
Benard and Bos-Levenbach [12] suggested a simple approximation for non-statisticians to easily calculate the median rank:
$$F_{\mathrm{Med}}(t) \approx \frac{j - 0.3}{N + 0.4} \qquad (3)$$
where F_Med(t) is the cracking probability function calculated using median rank. Figure 1 shows that the exact median rank values are very close to their approximations. Therefore, in this study, we use the approximation, defined in Equation (3), to improve calculation speed.
If the test is not censored continuously (i.e., if it is an interval-censored test), the resulting F_Med(t) must be treated as a set of unreliability points and not as a function. With this median rank point set, it is possible to estimate the Weibull distribution, which is the model of SCC initiation, through regression [10,13]. Figure 2 is an example of Weibull estimation using MRR that uses the test data in Table 1. The red dots show the median rank points, F_MRR(t) is the Weibull CDF estimated by regression with the median rank points, and β̂_MRR and η̂_MRR are the Weibull shape and scale parameters, respectively, estimated by MRR.
A widely used Weibull regression technique employs the linearization of the Weibull distribution, which is as follows:
$$\ln\{-\ln[1 - F(t)]\} = \beta \ln t - \beta \ln \eta \qquad (4)$$
However, this Weibull estimation method encounters limitations. First, it cannot handle the case in which there are zero cracking points, because F = 0 returns a negative infinity value in the linearized form. Second, the Weibull distribution is nonsymmetrical and the error in the rank probability estimation is not random in nature [14]. Weights in the MRR linear function are based on the incorrect assumption of uncorrelated, equal variance residuals [15]. Therefore, we used a nonlinear curve-fitting function, lsqcurvefit, provided by MATLAB, which is based on the least squares method.
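To make the regression step concrete, the sketch below computes median rank points with the Benard approximation, commonly stated as (j − 0.3)/(N + 0.4), and fits the linearized Weibull form ln{−ln[1 − F]} = β ln t − β ln η by ordinary least squares. This illustrates the linearized technique discussed above, not the nonlinear lsqcurvefit fit used in the study. The inspection data are hypothetical and are not those of Table 1.

```python
import math


def benard_median_rank(j, n):
    """Benard approximation of the median rank for j cracked out of n."""
    return (j - 0.3) / (n + 0.4)


def mrr_linear_fit(times, ranks):
    """Least-squares fit of ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta).

    Points with F <= 0 or F >= 1 are skipped because the linearized form
    is undefined there (this is the first limitation noted in the text).
    """
    pts = [(math.log(t), math.log(-math.log(1.0 - f)))
           for t, f in zip(times, ranks) if 0.0 < f < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two usable median rank points")
    m = len(pts)
    mean_x = sum(x for x, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    beta = sxy / sxx
    # intercept = mean_y - beta*mean_x = -beta*ln(eta)
    eta = math.exp(mean_x - mean_y / beta)
    return beta, eta


if __name__ == "__main__":
    n = 10                                   # hypothetical specimen count
    times = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
    cumulative_cracked = [0, 1, 2, 4, 6, 8]  # hypothetical inspection results
    ranks = [benard_median_rank(j, n) for j in cumulative_cracked]
    print(mrr_linear_fit(times, ranks))
```

The first inspection, at which no specimen has cracked, yields a non-positive rank and is dropped, which shows why the linearized fit discards information that a nonlinear fit can retain.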

Maximum Likelihood Estimation
The MLE method estimates the parameters of the Weibull distribution directly by using the likelihood function, instead of the cracking probability at each censoring point. The likelihood function for the interval-censored case is given by [13]:
$$L(\beta, \eta) = \prod_{i=1}^{S} \left[1 - F(s_i; \beta, \eta)\right] \prod_{j=1}^{C} \left[F(c_j^{U}; \beta, \eta) - F(c_j^{L}; \beta, \eta)\right] \qquad (5)$$
where S is the number of suspended specimens, s_i is the last censoring time of the i-th suspended specimen, C is the number of interval-censored cracked specimens, and c_j^U and c_j^L are the upper and lower bound times, respectively, of the censoring interval for the j-th cracking. The sum of S and C is equal to the total number of specimens N. The use of log-likelihood is convenient to determine the Weibull parameters that maximize the likelihood function (i.e., argmax over (β, η) of L(β, η)). The log-likelihood function is as follows:
$$\Lambda(\beta, \eta) = \ln L(\beta, \eta) = \sum_{i=1}^{S} \ln\left[1 - F(s_i; \beta, \eta)\right] + \sum_{j=1}^{C} \ln\left[F(c_j^{U}; \beta, \eta) - F(c_j^{L}; \beta, \eta)\right] \qquad (6)$$
The maximum likelihood point is obtained where both partial derivatives of Λ(β, η) reach zero. Therefore, the maximum likelihood point is given by:
$$\frac{\partial \Lambda(\beta, \eta)}{\partial \beta} = 0, \qquad \frac{\partial \Lambda(\beta, \eta)}{\partial \eta} = 0 \qquad (7)$$
Substituting Equations (1) and (6) into Equation (7) yields the final simultaneous equations, Equation (8), whose derivation is available in the Supplementary Materials. It would be extremely difficult to determine a general analytical solution for Equation (8); therefore, we used a numerical approach, namely the nonlinear simultaneous equation solver fsolve provided by MATLAB.
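The interval-censored log-likelihood can be written compactly in code. The sketch below evaluates it for a Weibull model and locates its maximum with a coarse grid search. The grid search is a deliberately simple stand-in chosen for readability and is not the fsolve-based root finding used in the study. The data, grids, and function names are hypothetical.

```python
import math


def weibull_cdf(t, beta, eta):
    return 1.0 - math.exp(-((t / eta) ** beta))


def log_likelihood(beta, eta, suspended, intervals):
    """Sum of ln[1 - F(s_i)] over suspended specimens plus
    ln[F(c_U) - F(c_L)] over interval-censored cracked specimens."""
    ll = 0.0
    for s in suspended:
        ll -= (s / eta) ** beta          # ln[1 - F(s)] = -(s/eta)**beta
    for lower, upper in intervals:
        p = weibull_cdf(upper, beta, eta) - weibull_cdf(lower, beta, eta)
        if p <= 0.0:
            return -math.inf             # parameters incompatible with data
        ll += math.log(p)
    return ll


def grid_search_mle(suspended, intervals, betas, etas):
    """Return the (beta, eta) grid point with the largest log-likelihood."""
    candidates = ((b, e) for b in betas for e in etas)
    return max(candidates,
               key=lambda p: log_likelihood(p[0], p[1], suspended, intervals))


if __name__ == "__main__":
    # Hypothetical test: 10 specimens, inspections every 20 time units up to 120.
    intervals = [(20.0, 40.0), (40.0, 60.0), (60.0, 80.0), (60.0, 80.0),
                 (80.0, 100.0), (80.0, 100.0), (100.0, 120.0), (100.0, 120.0)]
    suspended = [120.0, 120.0]
    betas = [1.0 + 0.1 * k for k in range(51)]
    etas = [60.0 + 1.0 * k for k in range(91)]
    print(grid_search_mle(suspended, intervals, betas, etas))
```

A grid search cannot fail to converge, so it does not reproduce the convergence behavior of a root finder; it is only meant to show what quantity is being maximized.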

Monte Carlo Simulation
The goals of MRR and MLE are the same: the estimation of Weibull parameters. However, the resulting estimators are slightly different, even when both are deduced from the same test result. Figure 3 shows the different Weibull curves estimated from the same test data, found in Table 1. It is therefore of interest to determine which estimation method generates more precise estimators. A Weibull distribution with precise estimators could better describe inherent SCC initiation behavior.

Figure 3. Weibull curves estimated from the test data in Table 1. The blue line is the Weibull curve estimated by MRR and the red line is that by MLE. The black squares are the median rank points.
Theoretically, it is possible to calculate the estimation confidence for data containing the exact cracking time only; that is, when cracking is continuously monitored using a direct current potential drop technique. However, an MLE theory to set the estimation confidence for interval-censored data is not yet available [10]. Therefore, Monte Carlo simulation [16] could be used to quantitatively evaluate estimator uncertainties of MLE and MRR. The experimental factors considered in the simulation study are as follows.



• True Weibull parameters: It is assumed that the inherent cracking probability is Weibull-distributed. If the standardized estimation errors were affected by the value of the true scale parameter (η_true), merely changing the time unit (e.g., hours to seconds) would change the standardized estimation errors, which is a contradiction. In fact, a scale parameter is just a scale factor. Therefore, standardized estimation errors are not affected by the value of η_true [15]. Without loss of generality, η_true can be fixed at 100, whereas the value of the true Weibull shape parameter (β_true) could affect the standardized estimation errors. To examine the degree of aging effects, several values of β_true (2, 3, and 4) are examined. In earlier studies, the values of the Weibull shape parameter for crack initiation time range from 2 to ~4 [6,17,18,19].

• The number of specimens: The SCC initiation test for nuclear reactor materials requires a corrosive environment at high temperatures and pressures. Thus, simultaneously testing a large number of specimens is difficult. Therefore, the base number of test specimens is set at 10. To evaluate the effect of the number of specimens, additional cases were studied (see Table 2).

• Test duration: When planning the SCC test, cracking will not necessarily occur for every specimen within the available testing time. Thus, the test duration is also a factor affecting the uncertainty of Weibull estimators. For convenience, the baseline test duration is set at 120% of η_true. Additional test duration cases are shown in Table 2.

• Censoring interval: A shorter censoring interval may be better for developing an accurate SCC initiation model. However, frequent censoring would be inconvenient for the experimenters. Therefore, the baseline censoring interval is set at 20% of η_true. Other examined interval cases are shown in Table 2. Although time-dependent censoring intervals are more general for real cracking tests, it is assumed that censoring intervals do not vary with time. If we consider time-dependent censoring intervals, there are too many possible combinations of experimental conditions to perform a simulation study.

Figure 4 shows examples of the simulation experiments with different combinations of conditions. Weibull_True represents the pre-assumed true cracking probability, which follows a Weibull distribution. Median_Rank is the set of cumulative cracking point probabilities resulting from the randomly simulated cracking experiments and is calculated by the median rank method. Weibull_MLE and Weibull_MRR are the estimated Weibull distributions obtained from simulation experiments using MLE and MRR, respectively. Figure 4a is an example of a simulation in which the number of specimens is relatively small, the censoring interval is wide, and the test duration is short. In this case, the estimated Weibull curves, Weibull_MLE and Weibull_MRR, are quite different from the true cracking probability curve, Weibull_True. Figure 4b shows another example of the simulation, in which the number of specimens is relatively large, the censoring interval is narrow, and the test duration is long. In this simulation, the estimated Weibull curves approximate the true Weibull curve, as expected. The detailed experimental conditions applied in Figure 4 are described in Table 3.

By combining the considered experimental conditions, a total of 441 simulation cases were studied. Each case was simulated 20,000 times.
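A single simulated experiment under the factors listed above can be sketched as follows. Cracking times are drawn from the true Weibull distribution by inverse-transform sampling and then reduced to what an interval-censored test would record. The sketch assumes that inspections fall on integer multiples of the censoring interval, that the last inspection is the largest such multiple within the test duration, and that uncracked specimens are suspended at that time. These details and the function name are assumptions, since the study's implementation is not shown.

```python
import math
import random


def simulate_test(n, beta, eta, interval, duration, rng):
    """Simulate one interval-censored cracking test.

    Returns (intervals, suspended): a list of (lower, upper) inspection
    bounds for cracked specimens and a list of suspension times.
    """
    n_inspections = int(math.floor(duration / interval + 1e-9))
    last = n_inspections * interval
    intervals, suspended = [], []
    for _ in range(n):
        u = rng.random()
        # Inverse of F(t) = 1 - exp(-(t/eta)**beta)
        t = eta * (-math.log(1.0 - u)) ** (1.0 / beta)
        if t > last:
            suspended.append(last)       # still uncracked at the last inspection
        else:
            k = min(max(1, math.ceil(t / interval)), n_inspections)
            intervals.append(((k - 1) * interval, k * interval))
    return intervals, suspended


if __name__ == "__main__":
    rng = random.Random(2016)            # fixed seed for repeatability
    # Baseline conditions from the text: N = 10, eta_true = 100,
    # duration = 120% and interval = 20% of eta_true; beta_true = 3.
    cracked, susp = simulate_test(10, 3.0, 100.0, 20.0, 120.0, rng)
    print(cracked)
    print(susp)
```

Repeating this draw many times per case and feeding each result to the two estimators gives the empirical estimator distributions analyzed in the following sections.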
We consider it important to report the bias and the dispersion of the estimators separately for each experimental condition, because experimenters who want to develop a cracking prediction model need both to judge their model uncertainty. For the same reason, the estimation uncertainties of β and η are reported separately.
The 5th, 50th, and 95th percentiles of the Weibull estimators were derived from each simulation case. Further, these estimators were converted to the standard error, which is defined as follows:
$$\mathrm{SE}(\hat{\beta}) = \frac{\hat{\beta} - \beta_{true}}{\beta_{true}}, \qquad \mathrm{SE}(\hat{\eta}) = \frac{\hat{\eta} - \eta_{true}}{\eta_{true}}$$
where β̂ and η̂ are the Weibull parameters estimated by MRR or MLE. To quantify the Weibull estimator deviations, we utilized a standardized length of the 90% confidence interval, defined as follows:
$$\mathrm{SLCI}_{90\%}(\hat{\beta}) = \mathrm{SE}(\hat{\beta}_{95\%}) - \mathrm{SE}(\hat{\beta}_{5\%}); \qquad \mathrm{SLCI}_{90\%}(\hat{\eta}) = \mathrm{SE}(\hat{\eta}_{95\%}) - \mathrm{SE}(\hat{\eta}_{5\%})$$
The true Weibull parameters (β_true, η_true) are input as initial values of the numerical solvers (i.e., the fsolve and lsqcurvefit functions in MATLAB). If a given combination of experimental conditions is too poor (e.g., a small specimen number combined with a wide censoring interval), the numerical solver may fail to find estimators; in these cases, we exclude the failed estimators.
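The percentile-based summary described above can be sketched as follows, assuming that the standardized error of an estimate is (estimate − true value)/true value and that percentiles are taken by linear interpolation between order statistics. Both are assumptions of this sketch, as are the function names and the toy list of estimates.

```python
def percentile(sorted_vals, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def standardized_error(estimate, true_value):
    """Assumed definition: relative deviation from the true parameter."""
    return (estimate - true_value) / true_value


def summarize(estimates, true_value):
    """Return SE at the 5th, 50th, and 95th percentiles and SLCI_90%."""
    vals = sorted(estimates)
    se = {q: standardized_error(percentile(vals, q), true_value)
          for q in (5, 50, 95)}
    return {"SE_5": se[5], "SE_50": se[50], "SE_95": se[95],
            "SLCI_90": se[95] - se[5]}


if __name__ == "__main__":
    # Toy stand-in for the converged scale estimates of one simulation case.
    eta_estimates = [80.0 + k for k in range(41)]
    print(summarize(eta_estimates, 100.0))
```

SE_50 measures bias and SLCI_90 measures dispersion, so the two quantities can be reported separately, as the text requires.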

Fixed Test Duration
We fixed the test duration at 120% of η_true, the baseline case for test duration, to examine the effects of both the number of specimens and the censoring interval.
As a special case, Figure 5 shows the effect of the number of specimens on estimation uncertainties when the censoring interval is fixed at 20% of η_true. When the number of specimens is large, there is a high probability of precise and accurate estimation with both MRR and MLE. For estimating the shape parameter β, MRR and MLE provide similar estimation uncertainty levels (see Figure 5a-c). The shape parameter is likely to be overestimated when the number of specimens is less than 30 (i.e., SE_50%(β̂) > 0). For the scale parameter η, smaller deviation levels are observed over the whole range of specimen numbers as compared with those of the β estimators, especially at high β_true (see Figure 5d-f). Notably, the scale parameters estimated through MLE have only a very slight bias over the whole range of specimen numbers (i.e., SE_50%(η̂_MLE) ≈ 0).
From these data, it is possible to calculate the confidence interval and bias of the estimators when real cracking test conditions are given. For example, if the test duration is 120% of η_true and the censoring interval is 20% of η_true, the probability of obtaining 0.853 < η̂_MLE/η_true < 1.151 is approximately 90% and η̂_MLE,50%/η_true ≈ 1 with only 10 specimens when β_true = 4 for the testing material (see Figure 5f).
Figure 6 shows the convergence ratio distributions of MLE numerical estimation. The convergence ratio is defined as follows:
$$\text{Convergence Ratio} = \frac{\text{Number of converged estimations by numerical solver}}{\text{Number of total simulations } (= 20{,}000)}$$
The convergence ratio decreased when the number of specimens was small and the censoring interval was wide. This trend became more pronounced as the value of β_true increased. If the convergence ratio were too low, there would be a filtering effect caused by the disregard of outlier estimators. That is, in the low convergence ratio region, the output estimators were not purely random. The results in this region should be analyzed with care.
Although it is known that MLE convergence ratios might be improved by restricting β > 1 [20], we did not use this algorithm in this study. It will be considered in later research. For MRR estimation, the convergence ratios were mostly close to unity in all simulation cases.
Figure 6. Convergence ratio distributions of MLE numerical estimation at: (a) β_true = 2; (b) β_true = 3; and (c) β_true = 4 when test duration is 120% of η_true.
Figure 7 shows the distributions of SE_50%(β̂) by MLE or MRR. These results indicate a bias in Weibull shape parameter estimation. When the number of specimens is relatively small, β_true tends to be overestimated, as in Figure 5. However, this trend did not occur when MLE was used and the censoring interval was relatively wide. Furthermore, if the value of β_true was relatively large, underestimation occurred in the wide censoring interval regions of the MLE estimators (see Figure 7c).
Figure 8 shows the distributions of SE_50%(η̂) by MLE or MRR. These results indicate bias in Weibull scale parameter estimation. Notably, when MLE was used, very little bias was observed in all simulation cases. For MRR, a tendency toward overestimation (i.e., SE_50%(η̂) > 0) occurred when the number of specimens was relatively small. This tendency was slightly amplified when β_true was relatively small.
Figure 9 shows the distributions of SLCI_90%(β̂) by MLE or MRR. These results illustrate the variance in Weibull shape parameter estimators. As anticipated, the variance in β̂ was large when the number of specimens was relatively small and the censoring interval was wide. There appear to be critical lines beyond which the estimator variances become too large. Near the critical lines, the gradients of SLCI_90%(β̂) were very high. Experimenters who want to develop cracking prediction models with a cracking test should avoid this region.
Figure 9. Distributions of SLCI_90%(β̂_MLE) at: (a) β_true = 2; (b) β_true = 3; and (c) β_true = 4; and SLCI_90%(β̂_MRR) at: (d) β_true = 2; (e) β_true = 3; and (f) β_true = 4 (test duration: 120% of η_true).
Figure 10 shows the distributions of SLCI_90%(η̂) by MLE and MRR. These results show the variance in Weibull scale parameter estimators. The overall values of the SLCI_90%(η̂) distributions were much lower than those of the SLCI_90%(β̂) distributions, especially for high β_true values. Interestingly, shortening the censoring interval reduces the estimator deviations only slightly as compared with the case of SLCI_90%(β̂), and there is no critical line for the SLCI_90%(η̂) distributions.
In fact, the distributions of Weibull estimators were not normal in most simulation conditions. Therefore, the lower and upper bounds of the confidence intervals (e.g., SE_5% and SE_95%, respectively) must be represented. We provide these data in the Supplementary Materials.
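The numerical example quoted for Figure 5f can be translated into the uncertainty measures used in this section. The short calculation below assumes that the standardized error is the estimate divided by the true value minus one, and that 0.853 and 1.151 are the 5th and 95th percentiles of η̂_MLE/η_true. Both points are assumptions of this worked example.

```latex
\begin{aligned}
\mathrm{SE}(\hat{\eta}_{5\%})  &= 0.853 - 1 = -0.147, \\
\mathrm{SE}(\hat{\eta}_{95\%}) &= 1.151 - 1 = 0.151, \\
\mathrm{SLCI}_{90\%}(\hat{\eta}) &= 0.151 - (-0.147) = 0.298 .
\end{aligned}
```

Under these assumptions, the interval is slightly asymmetric about zero, which is consistent with the remark that the estimator distributions are not normal.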

Fixed Censoring Interval
We fixed the censoring interval at 20% of , the baseline case for the censoring interval, to examine the effects of both the number of specimens and test duration. Figure 11 shows the convergence ratio distributions of MLE numerical estimation. The convergence ratio decreased when the number of specimens was small, and the test duration short. This tendency was enlarged when the value of was increased. As previously mentioned, if the convergence ratios were too low, there would be the filtering effect. For MRR estimation, the convergence ratios were mostly close to unity in all simulation cases.  In fact, the distributions of Weibull estimators were not normal in most simulation conditions. Therefore, the upper and lower bound of the confidence intervals (e.g., SE 5% and SE 95% , respectively) must be represented. We provide these data in the Supplementary Materials.

Fixed Censoring Interval
We fixed the censoring interval at 20% of η true , the baseline case for the censoring interval, to examine the effects of both the number of specimens and test duration. Figure 11 shows the convergence ratio distributions of MLE numerical estimation. The convergence ratio decreased when the number of specimens was small, and the test duration short. This tendency was enlarged when the value of β true was increased. As previously mentioned, if the convergence ratios were too low, there would be the filtering effect. For MRR estimation, the convergence ratios were mostly close to unity in all simulation cases. In fact, the distributions of Weibull estimators were not normal in most simulation conditions. Therefore, the upper and lower bound of the confidence intervals (e.g., SE % and SE % , respectively) must be represented. We provide these data in the Supplementary Materials.

Fixed Censoring Interval
We fixed the censoring interval at 20% of $\eta_{true}$, the baseline case for the censoring interval, to examine the effects of both the number of specimens and the test duration. Figure 11 shows the convergence ratio distributions of the MLE numerical estimation. The convergence ratio decreased when the number of specimens was small and the test duration was short. This tendency was amplified as $\beta_{true}$ increased. As previously mentioned, convergence ratios that are too low introduce a filtering effect. For MRR estimation, the convergence ratios were close to unity in nearly all simulation cases.
Figure 12 shows the distributions of $\mathrm{SE}_{50\%}(\hat{\beta})$. For MLE, there was a tendency toward overestimation (i.e., $\mathrm{SE}(\hat{\beta}) > 0$) when the number of specimens was relatively small. For MRR, overestimation appeared at short test durations and underestimation at long test durations.
Figure 13 shows the distributions of $\mathrm{SE}_{50\%}(\hat{\eta})$. When MLE was used, very little bias was observed over the entire simulation range, as in the fixed test duration case (see Figure 8a-c). For MRR, overestimation (i.e., $\mathrm{SE}(\hat{\eta}) > 0$) occurred when the number of specimens was relatively small, except in cases of short test duration. This tendency was amplified when $\beta_{true}$ was relatively small.
Figure 14 shows the distributions of $\mathrm{SLCI}_{90\%}(\hat{\beta})$. As anticipated, the variance of $\hat{\beta}$ was quite large when the number of specimens was relatively small and the test duration was short. Very long test durations do not appear to be useful for reducing estimator variance. This is natural, because censoring beyond a certain time only returns repeated, uninformative results (i.e., all the specimens have already cracked by this time). As in the fixed test duration case (see Figure 9), critical lines are observed in the distributions of $\mathrm{SLCI}_{90\%}(\hat{\beta})$. The area beyond the critical line region increased when $\beta_{true}$ was relatively high.
Figure 15 shows the distributions of $\mathrm{SLCI}_{90\%}(\hat{\eta})$. The overall values of the $\mathrm{SLCI}_{90\%}(\hat{\eta})$ distributions were much lower than those of the $\mathrm{SLCI}_{90\%}(\hat{\beta})$ distributions, especially at high $\beta_{true}$. As in the case of $\mathrm{SLCI}_{90\%}(\hat{\beta})$, an overly long test duration was not useful for reducing estimator variance. Contrary to the fixed test duration case (see Figure 10), there were critical lines in the $\mathrm{SLCI}_{90\%}(\hat{\eta})$ distributions.
The upper and lower bounds for this fixed censoring interval case are also presented in the Supplementary Materials.
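The quantities discussed in this subsection can be reproduced in miniature. The following is a minimal Python sketch, not the simulation code used in this study. It assumes that SE is the relative error of an estimator with respect to the true parameter, that $\mathrm{SE}_{50\%}$ is the median of SE over the Monte Carlo trials, and that $\mathrm{SLCI}_{90\%}$ is the distance between the 95th and 5th percentiles of SE. It further assumes Bernard's median rank approximation, a regression of $\ln(-\ln(1-F))$ on $\ln t$, and that each cracked specimen is assigned the inspection time at which the crack was detected. All function names are hypothetical.

```python
import math
import random


def simulate_test(n, beta, eta, interval, duration, rng):
    """Simulate one interval-censored test; return sorted detection times."""
    n_inspections = int(duration / interval + 1e-9)
    detected = []
    for _ in range(n):
        t = rng.weibullvariate(eta, beta)    # arguments are (scale, shape)
        k = max(1, math.ceil(t / interval))  # first inspection at or after cracking
        if k <= n_inspections:               # otherwise the specimen survives the test
            detected.append(k * interval)
    return sorted(detected)


def mrr_fit(detected, n):
    """Median rank regression; returns None with fewer than 2 distinct times."""
    if len(set(detected)) < 2:
        return None
    xs = [math.log(t) for t in detected]
    # Bernard's approximation of the median rank (an assumption of this sketch)
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, len(detected) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx                     # slope of the fitted line
    eta_hat = math.exp(mx - my / beta_hat)   # from intercept = -beta * ln(eta)
    return beta_hat, eta_hat


def quantile(sorted_vals, q):
    """Nearest-rank quantile of an already sorted, non-empty list."""
    return sorted_vals[round(q * (len(sorted_vals) - 1))]


def mrr_uncertainty(n, beta_true, eta_true, interval, duration,
                    trials=1000, seed=0):
    """Monte Carlo summary of the MRR estimators for one test design."""
    rng = random.Random(seed)
    se_beta, se_eta = [], []
    for _ in range(trials):
        fit = mrr_fit(simulate_test(n, beta_true, eta_true,
                                    interval, duration, rng), n)
        if fit is None:
            continue  # not estimable; lowers the ratio reported below
        se_beta.append((fit[0] - beta_true) / beta_true)
        se_eta.append((fit[1] - eta_true) / eta_true)
    se_beta.sort()
    se_eta.sort()
    summary = {"convergence_ratio": len(se_beta) / trials}
    for name, vals in (("beta", se_beta), ("eta", se_eta)):
        q05, q50, q95 = (quantile(vals, q) for q in (0.05, 0.50, 0.95))
        summary[name] = {"SE_50": q50, "SLCI_90": q95 - q05}
    return summary


if __name__ == "__main__":
    # Censoring interval of 20% of eta_true, test duration of 2 * eta_true
    for n in (5, 10, 50):
        print(n, mrr_uncertainty(n, 3.0, 1.0, 0.2, 2.0))
```

Running the design for several specimen counts, as in the last lines, gives a one-dimensional slice through the kind of contour plots shown in Figures 12 to 15, under the assumptions stated above.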

Fixed Number of Specimens
We fixed the number of specimens at 10, the baseline case for the number of specimens, to examine the effects of both the censoring interval and the test duration. Figure 16 shows the convergence ratio distributions of the MLE numerical estimation. It was difficult to identify a general tendency from these results. We hypothesize that this complexity is due to the complex distribution of the number of censoring times (see Figure 17). For example, if only one censoring was performed during the simulation, the convergence ratio reached unity even though the experimental condition of the simulation was very poor. For MRR estimation, the convergence ratios were close to unity in nearly all simulation cases.

In this case, with a fixed number of specimens, the distributions of $\mathrm{SE}_{50\%}$ and $\mathrm{SLCI}_{90\%}$ were very complex, and it was difficult to find general tendencies for either MLE or MRR. Therefore, we present these results in the Supplementary Materials rather than in this manuscript.
In fact, we think that the end cracking fraction would be a more appropriate factor of estimation uncertainty than the test duration. First, the end cracking fraction of the test is not directly related to the number of censoring times when the censoring interval is predetermined. Second, it does not produce repeated, uninformative results after a certain time (i.e., the test ends when all specimens are cracked if the end cracking fraction is set to unity). We will study the effects of the end cracking fraction on Weibull estimation uncertainties in later research.
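The second argument can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: inspections are equally spaced, a crack is detected at the first inspection at or after it occurs, and the helper names `inspection_summary` and `stop_inspection` are hypothetical.

```python
import math


def inspection_summary(crack_times, interval, duration):
    """Count scheduled inspections and those made after all specimens cracked."""
    n_scheduled = int(duration / interval + 1e-9)
    # Inspection index at which the last specimen is found cracked
    last_needed = max(1, math.ceil(max(crack_times) / interval))
    informative = min(n_scheduled, last_needed)
    return {"scheduled": n_scheduled,
            "informative": informative,
            "redundant": n_scheduled - informative}


def stop_inspection(crack_times, interval, end_fraction):
    """First inspection index at which the cracked fraction reaches end_fraction.

    Assumes 0 < end_fraction <= 1 so that the loop terminates.
    """
    n = len(crack_times)
    k = 1
    while sum(t <= k * interval for t in crack_times) / n < end_fraction:
        k += 1
    return k


times = [0.5, 1.2, 1.9]  # hypothetical cracking times
print(inspection_summary(times, interval=1.0, duration=5.0))
# {'scheduled': 5, 'informative': 2, 'redundant': 3}
print(stop_inspection(times, interval=1.0, end_fraction=1.0))  # 2
```

With a fixed test duration, three of the five scheduled inspections only repeat the result that every specimen has cracked, whereas a test defined by an end cracking fraction of unity stops at the second inspection.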

Conclusions
The main goal of this study is to suggest proper experimental conditions for experimenters who want to develop a probabilistic SCC initiation model through cracking tests. We considered the widely used MRR and MLE methods for Weibull estimation. Using Monte Carlo simulation, the uncertainties of the MRR and MLE estimators were quantified under various experimental conditions. The following conclusions can be drawn:



• It is possible to calculate the confidence interval and bias of the estimators when the real cracking test conditions are given.
• Very little bias is observed in all simulation ranges when MLE is used to estimate the scale parameter $\eta$.
• The overall deviations of $\hat{\eta}$ are much lower than those of $\hat{\beta}$ in the simulation study range. This effect is amplified when the value of $\beta_{true}$ is relatively high. Therefore, it is not recommended to estimate $\beta$ from a cracking test when the experimental conditions are poor.
• It is likely that there are critical lines beyond which the estimators have excessively large variances. Near the critical lines, the gradients of $\mathrm{SLCI}_{90\%}$ are very high. It is recommended that experimenters avoid this region.
• Before the critical line region, an overly narrow censoring interval or an overly long test duration is not useful for reducing the estimation uncertainty.

Outlook
The following issues will be considered in later research:
• In this study, the censoring interval is assumed to be a time-independent variable. However, a time-dependent censoring interval is more general for a real SCC test.
• The end cracking fraction seems more appropriate than the test duration for use as a factor of estimation uncertainty.
• To improve the convergence ratio of MLE, we will consider a numerical algorithm that restricts $\beta > 1$.
• If a cost function (e.g., specimen cost and labor cost) is obtained for an experiment, it will be possible to find an optimum experimental condition that returns the minimum estimation uncertainty for a given cost.
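As a rough illustration of the third item, the sketch below maximizes the interval-censored Weibull log-likelihood while keeping $\beta$ inside fixed bounds. It is one hypothetical approach, not the algorithm the authors plan to adopt: the nested golden-section search, the upper bound of 20 on $\beta$, the search range for $\eta$, and all function names are assumptions made for this example. Cracked specimens contribute $\ln[F(t_{hi}) - F(t_{lo})]$ and survivors contribute $\ln[1 - F(T)]$, where $T$ is the test duration.

```python
import math
import random


def bin_cracks(crack_times, interval, duration):
    """Group cracking times into (t_lo, t_hi, count) inspection intervals."""
    n_inspections = int(duration / interval + 1e-9)
    counts, survivors = {}, 0
    for t in crack_times:
        k = max(1, math.ceil(t / interval))
        if k > n_inspections:
            survivors += 1                      # right-censored at test end
        else:
            counts[k] = counts.get(k, 0) + 1
    cracked = [((k - 1) * interval, k * interval, c)
               for k, c in sorted(counts.items())]
    return cracked, survivors


def log_likelihood(beta, eta, cracked, n_survivors, duration):
    """Interval-censored two-parameter Weibull log-likelihood."""
    total = -n_survivors * (duration / eta) ** beta
    for t_lo, t_hi, count in cracked:
        a = (t_lo / eta) ** beta
        b = (t_hi / eta) ** beta
        # ln[exp(-a) - exp(-b)], evaluated in log space to avoid underflow
        total += count * (-a + math.log(-math.expm1(-(b - a))))
    return total


def golden_max(f, lo, hi, iters=50):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                             # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
        else:                                   # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
    x = (lo + hi) / 2
    return x, f(x)


def fit_bounded_mle(cracked, n_survivors, duration, beta_bounds=(1.0, 20.0)):
    """MLE with beta restricted to beta_bounds (bounds are an assumption)."""
    t_min = min(t_hi for _, t_hi, _ in cracked)
    log_eta_bounds = (math.log(t_min) - 3.0, math.log(duration) + 3.0)

    def profile(beta):
        # Best eta for this beta, searched on a logarithmic scale
        return golden_max(
            lambda le: log_likelihood(beta, math.exp(le), cracked,
                                      n_survivors, duration),
            *log_eta_bounds)

    beta_hat, _ = golden_max(lambda b: profile(b)[1], *beta_bounds)
    log_eta_hat, ll = profile(beta_hat)
    return beta_hat, math.exp(log_eta_hat), ll


if __name__ == "__main__":
    rng = random.Random(0)
    times = [rng.weibullvariate(1.0, 3.0) for _ in range(10)]  # (scale, shape)
    cracked, survivors = bin_cracks(times, interval=0.2, duration=2.0)
    print(fit_bounded_mle(cracked, survivors, duration=2.0))
```

Because the search never leaves the bounded region, it always returns an estimate. Whether that is desirable for the convergence ratio is a design question, since an estimate pinned near a bound may simply signal a poorly informative test.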
Supplementary Materials: The following are available online at www.mdpi.com/1996-1944/9/7/521/s1. Derivation of the interval-censored ML simultaneous equations for the 2-parameter Weibull distribution (Word file); raw data of the simulation results for the "fixed test duration", "fixed censoring interval", and "fixed specimen number" cases (Excel files); all contour plots of the simulation results, which contain the upper bounds (i.e., $\mathrm{SE}_{95\%}$), the lower bounds (i.e., $\mathrm{SE}_{5\%}$), and the contour plots for the fixed specimen number case (PDF file).