From Gaussian Distribution to Weibull Distribution

Abstract


The normal distribution is also commonly known as the Gaussian distribution, and it is generally known that the height, weight, and even IQ of a group of people are relatively consistent with it. However, quantities such as the fatigue life of structures are often far from the Gaussian distribution and more in line with the Weibull distribution. In [1] it was pointed out that the Weibull distribution is a full state distribution, i.e., it can depict not only left-skewed and right-skewed data, but to some extent also symmetric data as well as data satisfying a power law. In this sense it is more versatile than the Gaussian distribution [2], [3] and plays a very important role, especially in fitting the fatigue life of structures. However, because of the difficulties encountered in determining the three parameters of the Weibull distribution, the problem has usually been sidestepped by taking the logarithm of the data so that they appear more in line with the Gaussian distribution. In fact, this approach is problematic. This paper points out that taking the logarithm of the original data is only a spatial transformation from a mathematical point of view, but from a physical point of view it changes the structure of the data and their physical meaning, so it is not appropriate to fit a Gaussian distribution to the logarithm of the original data. To determine the three parameters of the Weibull distribution, the graphical and analytical methods [4] were previously adopted. The former is inconvenient to use and has relatively large errors; the latter involves solving a system of three simultaneous transcendental equations, which, despite the availability of computers to do so, can still give inconsistent results. This problem can now be solved relatively well by using the T.Z. Gao method proposed by [

Even for an arbitrary distribution, in the case of a large sample, the distribution of the mean will approximate the Gaussian distribution.
3. Simplicity, i.e., only two parameters $(\mu, \sigma^2)$ are needed to determine the shape of the entire distribution.
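The statement that the mean of a large sample is approximately Gaussian, even when the underlying distribution is not, can be checked numerically. The following is only a minimal sketch: the exponential parent distribution, the sample sizes, and the helper names `sample_means` and `skewness` are illustrative assumptions and do not come from the paper.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Return n_samples means, each taken over sample_size exponential draws (rate 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Single draws keep the strong right skew of the exponential parent.
raw = sample_means(20000, 1)
# Means over 50 draws are far more symmetric, as a Gaussian would be.
means = sample_means(5000, 50)

print("skewness of single draws:", skewness(raw))
print("skewness of means of 50 :", skewness(means))
```

The skewness of the means shrinks toward zero as the sample size grows, which is the sense in which the distribution of the mean approaches the Gaussian shape.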

Because the normal distribution has so many good characteristics, it has become the most studied and applied distribution. However, it is obvious that not all data conform to the Gaussian distribution, and in most cases the Gaussian distribution is only a good approximation to the data. In fact [4], fatigue life data often do not fit the Gaussian distribution but fit the Weibull distribution better; sometimes the fatigue life is treated as log-normally distributed, but this is only an approximation. Because of this, the Weibull distribution needs to be introduced and studied in more depth.

It is easy to prove that the life is $x_i$ and the corresponding reliability

It can be seen that when $x = x_0$, $p_0 = 100\%$. This is the origin of the 100% reliability safe life. If $p_{50} = 50\%$, (7), which is the analytical method [4]. In addition to the analytical method, the maximum likelihood method and some methods derived from it [6], [7] have been used more recently, but they have problems such as cumbersome derivation and inconvenient calculation, so we will not discuss them in depth here.

Theoretically, if a set of fatigue life data $N$ is given, then using the median ($N_m$), mean ($N_{av}$) and standard deviation ($s$) of this data set together with the three equations (5), (??) and (7)

It is not difficult to find that $N_0$ (= 127) derived from the analytical method is greater than the minimum value of 124 for this group of fatigue lives. This contradicts the definition of the safe life $N_0$; that is, the problem of inconsistency occurs. Another question is what happens if we fit this set of data with a Gaussian distribution. That is, which distribution is the more appropriate one to fit?
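The role of the safe life and the inconsistency noted above can be illustrated with a short sketch. It assumes the standard textbook form of the three-parameter Weibull reliability function, $p(x) = \exp\{-[(x - x_0)/\lambda]^k\}$ for $x \ge x_0$, together with the usual closed forms for its median, mean and standard deviation; the paper's own equations are not reproduced in this text and may use different notation. The function names, the shape and scale values, and every life other than the reported minimum of 124 are hypothetical.

```python
import math


def weibull_reliability(x, x0, scale, shape):
    """Survival probability P(life > x); equals 1 at or below the safe life x0."""
    if x <= x0:
        return 1.0
    return math.exp(-(((x - x0) / scale) ** shape))


def weibull_median(x0, scale, shape):
    """Life at which the reliability drops to 50%."""
    return x0 + scale * math.log(2.0) ** (1.0 / shape)


def weibull_mean(x0, scale, shape):
    return x0 + scale * math.gamma(1.0 + 1.0 / shape)


def weibull_std(scale, shape):
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * math.sqrt(g2 - g1 * g1)


def is_consistent(x0, observed_lives):
    """A safe life is only meaningful if no observed failure lies below it."""
    return x0 <= min(observed_lives)


# Hypothetical lives; only the minimum (124) and N0 = 127 are taken from the text.
lives = [124, 150, 180]
print(is_consistent(127, lives))  # safe life above the smallest observed life
print(is_consistent(120, lives))  # safe life below every observed life
```

With a safe life of 127 the model would assign 100% reliability to a life at which a failure was actually observed, which is exactly the contradiction described in the text.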

The second problem can be judged by the magnitude of the coefficient of determination [8], $R^2$, obtained when fitting the ideal reliability based on the so-called "average rank" [4]. The ideal reliability is given by the following formula, which is independent of the specific distribution:

$$p_i = 1 - \frac{i}{n+1} \qquad (9)$$

where $i$ is the rank of the data ordered from smallest to largest, and $n$ is the number of data.

It can be seen that the coefficient of determination of the Weibull distribution fitted by the GZT method is greater than that of the Gaussian distribution. That is, in this sense the data are more realistically depicted by the Weibull distribution.
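The comparison described above can be sketched as follows. This is not the GZT method itself, whose details are not given in this text; it only shows how the ideal reliability of formula (9) and $R^2$ could be computed for two candidate models. The fatigue lives and the Weibull parameters `x0`, `scale` and `shape` are hypothetical values chosen for illustration, not fitted results.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_rank_reliability(n):
    """Ideal reliability p_i = 1 - i/(n+1) for ranks i = 1..n (formula (9))."""
    return [1.0 - i / (n + 1) for i in range(1, n + 1)]


def r_squared(observed, predicted):
    """Coefficient of determination of predicted values against observed ones."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def weibull_reliability(x, x0, scale, shape):
    """Three-parameter Weibull survival probability (standard textbook form)."""
    if x <= x0:
        return 1.0
    return math.exp(-(((x - x0) / scale) ** shape))


# Hypothetical fatigue lives, sorted from smallest to largest.
lives = sorted([124, 131, 138, 142, 149, 155, 163, 171, 184, 202])
ideal = mean_rank_reliability(len(lives))

# Gaussian candidate using the sample mean and standard deviation.
gauss = NormalDist(fmean(lives), stdev(lives))
r2_gauss = r_squared(ideal, [1.0 - gauss.cdf(x) for x in lives])

# Weibull candidate with hypothetical (not fitted) parameters.
x0, scale, shape = 115.0, 45.0, 1.6
r2_weibull = r_squared(ideal, [weibull_reliability(x, x0, scale, shape) for x in lives])

print("R^2 Gaussian:", r2_gauss)
print("R^2 Weibull :", r2_weibull)
```

Whichever model yields the larger $R^2$ against the ideal reliability is the one that depicts the data more faithfully in the sense used here.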

The advantage of the GZT method is that its physical meaning is very intuitive and the problem of inconsistency does not arise. The method is convenient not only for estimating the three parameters of the Weibull distribution, but also for determining whether the original data fit the Weibull distribution or the Gaussian distribution better. It is also easy to extend to similar problems, such as fitting fatigue performance curves with three parameters [1]. The confidence intervals of these three parameters will be discussed in separate papers [10], [11].

Due to the complexity of the Weibull distribution, when the original data are not so consistent with the Gaussian distribution, their logarithm is often taken instead. From a mathematical point of view this is equivalent to a spatial transformation; because the data are "compressed", they may be closer to the Gaussian distribution [4].

This has the advantage that the PDF of the log-transformed data is fitted quite well by the Gaussian distribution, which is more convenient to study and apply. However, it loses the physical meaning of the safe life and "distorts" the density distribution of the original data.

This is illustrated in the following two examples.

"back" to the original state, only the median can be "recovered" (see Table 2, line 3), while the mean is shifted to the left; the correlation coefficient and the coefficient of determination are improved. Nevertheless, it is still not possible to obtain a 100% safe life. In contrast, the fit with the Weibull distribution, as seen in Table 3, is fairly good. Even after taking the logarithm, the fit is almost the same as that of the Gaussian distribution.
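The effect of the logarithmic transformation can be made concrete with a small sketch. The data below are hypothetical and are not taken from the paper's tables. The sketch shows three things: the median survives the round trip through the logarithm, the back-transformed mean (the geometric mean) falls below the arithmetic mean, and a Gaussian fitted to the logarithms assigns a nonzero failure probability to every positive life, so no life has 100% reliability.

```python
import math
from statistics import NormalDist, fmean, median, stdev

# Hypothetical fatigue lives (odd count, so the median is a single data point).
lives = [124, 131, 138, 142, 149, 155, 163, 171, 184]
logs = [math.log(x) for x in lives]

# Gaussian parameters estimated from the log-transformed data.
mu, sigma = fmean(logs), stdev(logs)

# The median commutes with the monotone log transform; the mean does not.
back_median = math.exp(median(logs))
back_mean = math.exp(mu)  # geometric mean of the original lives


def lognormal_failure_probability(x, mu, sigma):
    """P(life <= x) when log(life) is modelled as Gaussian(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(math.log(x))


print("median:", median(lives), "recovered:", back_median)
print("mean  :", fmean(lives), "recovered:", back_mean)
# Failure probability at a life well below the smallest observation.
print("P(fail by 100):", lognormal_failure_probability(100, mu, sigma))
```

Because the failure probability stays positive below the smallest observed life, the log-Gaussian model has no counterpart to the Weibull location parameter.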

From the data in row 2 of the Weibull distribution parameters in Table 3 and (??) and (7), we can calculate that $\hat{\mu} = 5.7137$ and $\hat{\sigma} = 0.1005$. This result is almost the same as the data in row 2 of Table 2. In this sense the Weibull distribution is indeed more general than the Gaussian distribution, which can be seen as a first-order approximation to the Weibull distribution. It can be seen that using the Weibull distribution to fit this set of fatigue life data does not require taking the logarithm of the data at all, and the physical meaning of each parameter is very clear.

Example 4: Looking again at the case of a small sample, 20 data for the life of a structure using Table ??. The following parameter table and histogram can also be obtained.

As seen in Fig. 3, there is no need to take the logarithm of the original data. Even if the logarithm is taken and the data look more symmetric, the Weibull distribution does not fit worse than the Gaussian distribution. So in this sense, even for symmetric data, fitting with the Weibull distribution is possible. The difficulty in fitting the Weibull distribution used to be the estimation of its three parameters, but this is no longer a problem with the GZT method.

VI. CONCLUSION

1. The three-parameter Weibull distribution is a more general full state distribution than the Gaussian distribution. In the field of reliability, the physical meaning of its location parameter is particularly important: it is the safe life under 100% reliability.
2. Because of the complexity of the three-parameter Weibull distribution, the previous methods for determining its three parameters from test data are complicated. The graphical method is more error-prone and inconvenient to use, while the analytical method may be inconsistent. The GZT method makes full use of the advantages of Python and solves this problem better.
3. In the past, fatigue life data that were not well fitted by the Gaussian distribution were log-transformed so that they might be more consistent with it, but as a result the safe life under 100% reliability no longer exists. The fact is that the data themselves are more consistent with the Weibull distribution. Since the Weibull distribution is a full state distribution, it is generally not necessary to take the logarithm of the fatigue life in the future; fitting the fatigue life data directly with the three-parameter Weibull distribution gives a better fit.
4. The two parameters of the Gaussian distribution (mean and variance) are not very meaningful for asymmetric data, while for asymmetric data such as structural fatigue life the three parameters of the Weibull distribution (safe life, shape and scale parameters) are much more meaningful, and in a sense these three parameters "contain" the two parameters of the Gaussian distribution. This is probably the reason why the Weibull distribution can "contain" the Gaussian distribution.
5. Finally, it can be concluded that for asymmetric fatigue life data it is not necessary to take logarithms and fit a Gaussian distribution; the data can be fitted directly with the three-parameter Weibull distribution.

Furthermore, even for more symmetric data, it is better to fit directly with the three-parameter Weibull