Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

Abstract. Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as established under this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute a complete regression approach for univariate linear calibrations.


Introduction
Simple linear regression is a model with a single independent variable in which a regression line is fitted through n data points such that the sum of squared errors (SSE), i.e., the sum of the squared vertical distances between the data points and the fitted line, is as small as possible. The statistical properties of this model have been established as theorems and are presented in many statistics textbooks, e.g., the textbook written by Walpole and Myers [1]. In this model, a regression line of y on x is fitted based on the assumption that x is fixed but y varies according to a normal distribution. This model is called "basic regression" throughout the remainder of this study. Unfortunately, when calibrating an instrument such as a chemical analyzer using basic regression, a problem arises. In practical calibrations, the observed measurements (the x values) are subject to errors, and hence they vary, thus violating the assumption of fixed inputs. As a result, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of basic regression as established under this assumption.
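The least-squares fit described above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not code from the paper.

```python
# Minimal sketch of "basic regression": fit y = a + b*x by least squares,
# i.e., minimize the sum of squared vertical distances (SSE).
# The data values below are illustrative, not taken from the paper.
def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # S_xx
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # S_xy
    b = sxy / sxx        # slope = S_xy / S_xx
    a = ybar - b * xbar  # intercept
    return a, b

a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```

Under the basic-regression assumption, the xs here are fixed design points; the calibration problem discussed next arises precisely because, in practice, they are not.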
Two approaches have been considered as possible solutions for this problem. In the first approach [2], called classical regression, the "standards" (the x values) are treated as the inputs, and the observed measurements (the y values) are treated as the response; these values are used to fit a regression line of y on x. This regression approach is consistent with the assumption that x is fixed. The problem with this approach is that estimating the x value for a new observed measurement involves the reciprocal of the estimated slope. Williams [3] demonstrated that the reciprocal of the slope has an infinite variance, which indicates that classical regression has an infinite variance and, hence, an infinite mean squared error. Nevertheless, Parker et al. [4] obtained an asymptotic approximation of the variance of the prediction interval using a formula derived by Casella and Berger [5] using the Delta Method. However, Parker et al.'s approach still has limitations. Even if we rely on this approximation, we cannot determine a prediction interval with a given confidence level because the approximation cannot be used to express the prediction interval as a t_{n-2} distribution.
In the second approach [6], called inverse regression, the standards (the x values) are treated as the response, the observed measurements (the y values) are treated as the inputs, and these values are used to fit a regression line of x on y. This regression approach is inconsistent with the assumption that the inputs are fixed. Shukla and Datta [7] and Oman [8] derived expressions for the mean and mean squared error of the predicted x value based on multiple measurements taken during the prediction stage of the calibration process. Fuller [9] made a similar suggestion regarding the derivation of both the predicted x value and the prediction interval. Fuller's approach requires that the variance of the observed measurements is known. In his approach, it is necessary to measure a standard multiple times independently to estimate the variance. Parker et al. [4] derived the bias in prediction using a formula established by Pham-Gia et al. [10] with the aid of the Delta Method. Parker et al. [4] also showed through several simulation studies that inverse regression is preferable to classical regression in terms of bias and mean squared error. However, to derive the statistical properties of inverse regression, Parker et al. were obliged to borrow their estimate for the variance of the slope from "reversed basic regression" because of technical difficulties, which devalues their approach. (Reversed basic regression is basic regression in which the roles of x and y have merely been reversed.)
As a fundamental solution for the calibration problem, which has not yet been resolved completely, the current study introduces "reversed inverse regression" along with a new methodology for deriving its statistical properties. (Simply put, "fundamental solution for the univariate linear calibration problem" = "reversed inverse regression" + "new methodology for deriving the statistical properties of the regression".) In the proposed regression approach, the observed measurements (the x values) are treated as the inputs, and the standards (the y values) are treated as the response; these values are used to fit a regression line of y on x. The statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations". In this regression approach, it is not necessary to measure any standards multiple times independently. We present an example of practical calibration. Each of three types of regression (i.e., classical regression, inverse regression and reversed inverse regression) is applied to the calibration example, and the corresponding calibration results, including the subsequently calculated estimates for the variance of the prediction interval, are compared. In addition, the accuracy of the statistical properties derived using the new methodology is investigated in a Monte Carlo simulation study.

Regression and methodology
If the roles of x and y are reversed, then inverse regression becomes reversed inverse regression. Reversed inverse regression is more convenient to use for calibration than inverse regression because the reversed roles are consistent with the convention that the variable x represents the inputs, whereas the variable y represents the response. This regression approach also violates the assumption that the inputs are fixed. (It may be desirable to use some term other than "reversed inverse regression", e.g., "pseudo-basic regression", to eliminate potential confusion in terminology.) It is modeled as follows.
- There is a linear relationship between x and y.
- The observed measurements (the x values) are treated as the inputs, the standards (the y values) are treated as the response, and these values are used to fit a regression line of y on x.
- For the fitting of the regression line, n data points of the form (x_i, y_i) (i = 1, …, n) are used. The x_i value varies according to a normal distribution, whereas the y_i value is fixed; y_i = a + bx_i + e_i, e_i ∼ N(0, σ²).
- The x_i's (i.e., x_1, …, x_n) are treated as variables. The variables x_i and x_j (i ≠ j) are independent of each other.
- σ²_xi denotes the variance of the variable x_i; σ²_x1 = … = σ²_xn = σ²_x. In other words, the variance of the observed measurements is equal over the entire calibration range of interest.
- The population regression line y = a + bx is defined in terms of the mean values of the variables, where ȳ = (Σ y_i)/n, x_i0 is the mean of x_i, and x̄_0 = (Σ x_i0)/n.
- All points (x_i0, y_i) (i = 1, …, n) lie on the population regression line. In this study, we call these points the "mean data points". (Σ denotes summation from i = 1 to n throughout this study.)
In reversed inverse regression, the assumption that the observed measurements (the x values), despite being the inputs, vary according to normal distributions is very important. Suppose that the regression line fitting is repeated an infinite number of times using a "new set of n different standards (or reference solutions)" each time. Here, this "new set of n different standards" refers to newly prepared standards whose nominal y values (or target y values) and confidence levels are identical to those of the previous set of standards. In this case, the x_i's (i.e., x_1, …, x_n) will be observed to vary according to normal distributions. The standards are subject to errors that may arise when preparing or manufacturing them. However, such errors will appear as variations in the x_i's after being combined with random measurement errors. If the "same set of n different standards" is measured repeatedly, we will only observe the variance associated with the random measurement errors; the errors of the standards themselves will not be reflected. Such a variance should not be treated as the variance needed to derive the statistical properties of linear regression. In this respect, Fuller [9] is incorrect, because his approach requires a standard to be independently measured multiple times to estimate the variance. As previously mentioned, reversed inverse regression does not require any such separate prior measurements.
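The data-generating model described above can be sketched as follows. The population line y = 0.025x and the nominal values are illustrative assumptions, not the paper's actual standards.

```python
# Sketch of the reversed inverse regression data model: the standards'
# nominal y values are fixed, while each observed measurement x_i varies
# about its mean x_i0 (the mean data point) with common variance sigma_x^2.
# The line y = 0.025*x and the nominal values are illustrative assumptions.
import random

random.seed(0)
a_true, b_true = 0.0, 0.025                 # population regression line y = a + b*x
y_nominal = [30.0, 40.0, 50.0, 60.0, 70.0]  # fixed responses (the standards)
x_means = [(y - a_true) / b_true for y in y_nominal]  # mean data points x_i0

sigma_x = 60.0
# One simulated set of observed measurements (one regression-line fitting):
x_obs = [random.gauss(x0, sigma_x) for x0 in x_means]
```

Repeating the last line with a fresh "set of standards" each time is exactly the infinite-repetition thought experiment described above: the y values stay fixed while the x values vary normally.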
The slope of the regression line that is fitted on the basis of reversed inverse regression is

b̂ = S_xy/S_xx = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)².

Unfortunately, it is technically difficult to derive the variance of the slope directly from the definition of the variance, i.e., var[f(x_1, …, x_n)] = E[{f(x_1, …, x_n) − E[f(x_1, …, x_n)]}²], because b̂ is a fractional expression that contains Σ(x_i − x̄)² in the denominator and the x_i's vary rather than being fixed. Because of this difficulty, we directly treat the x_i's as variables and derive the variance of the slope based on the first-order Taylor approximation as follows:

var[f(x_1, …, x_n)] ≈ E[{f(x_1, …, x_n) − f(x_10, …, x_n0)}²] ≈ Σ [∂f/∂x_i]*² σ²_xi + 2 Σ_{i<j} [∂f/∂x_i]* [∂f/∂x_j]* cov(x_i, x_j).

Note: E[{f(x_1, …, x_n) − f(x_10, …, x_n0)}²] = E[{f(x_1, …, x_n) − E[f(x_1, …, x_n)]}²] + {bias in f(x_1, …, x_n)}²,

where the notation [ ]* or { }* indicates that the value of the function contained within the bracket is determined using the mean values of the variables, i.e., x_10, …, x_n0 [11]. Even in the case of derivation of expectations, this notation is often used for the same purpose. In particular, we define the expectation E[{f(x_1, …, x_n) − f(x_10, …, x_n0)}²] as the "mean-data-point-based variance". The approximation method for deriving the variance described herein is commonly referred to as the "error propagation rule", and only the first-order partial derivatives are included in its derivation. To derive the variance of the slope, var[b̂], after the partial differentiation of b̂ with respect to the x_i's, the variances of the x_i's, including the covariances of x_i and x_j (j > i), are combined in accordance with the error propagation rule. The final result obtained from this combination process is the approximate variance of the slope. The same method can be used to derive the variance of the intercept and the variance of the predicted y value. All other statistical properties of reversed inverse regression, such as the expectation and bias of the slope and the expectation of the mean squared error, are derived by utilizing another special method, called the "method of simultaneous error equations" in this study, in combination with the error propagation rule. When we need to derive another statistical property from the primary expressions already obtained using the error propagation rule, the first-order Taylor approximation is mainly used. Error terms of orders higher than (σ_x/A)² are discarded during or after the approximation calculations. For example, (σ_x/A)⁴ (= 1/10⁸) is very small and can be neglected in comparison with (σ_x/A)² (= 1/10⁴).
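As a numerical illustration of the error propagation rule, the first-order variance of the slope can be assembled from partial derivatives evaluated at the mean data points; with independent x_i's the covariance terms drop out. This is a sketch with illustrative values, not the paper's derivation.

```python
# Error propagation rule for var[b_hat], where b_hat = S_xy/S_xx:
# var[b_hat] ~= sum_i ([d b_hat / d x_i]*)^2 * sigma_x^2, with the
# derivatives evaluated numerically at the mean data points x_i0 and the
# x_i's treated as independent. Data values are illustrative.
def slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

def propagated_var(x0, ys, sigma_x, h=1e-4):
    var = 0.0
    for i in range(len(x0)):
        xp = list(x0); xp[i] += h
        xm = list(x0); xm[i] -= h
        d = (slope(xp, ys) - slope(xm, ys)) / (2 * h)  # [d b_hat / d x_i]*
        var += d * d * sigma_x ** 2
    return var

# Mean data points exactly on y = 2x, so the propagated value should match
# the closed form sigma_x^2 * b^2 / S_xx = 0.1^2 * 4 / 10 = 0.004:
v = propagated_var([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], 0.1)
```

When the mean data points lie exactly on the population line, [S_yy/S²_xy]* reduces to [1/S_xx]*, so the propagated value agrees with the closed-form variance of the slope.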
The Delta Method is also an asymptotic approximation method based on Taylor approximation [12]. Parker et al. [4] used the Delta Method to derive the variance of the prediction interval for classical regression. When the Delta Method is applied to the inverted equation x = −â′/b̂′ + (1/b̂′)y, the x_i's and y_i's are not directly treated as variables. Instead, U (= −â′ + y_0 − e_0) and V (= b̂′) are treated as the variables [4,5,10]. This is the most notable difference between the Delta Method and the approximation method used in this study.

Statistical properties of reversed inverse regression
The variance and bias of the slope and the expectation of the mean squared error are the statistical properties that are primarily required in linear regression because other properties, such as the variance and bias of the intercept and the variance of the prediction interval, depend on them. Therefore, the variance of the slope, var[b̂], is first derived using the error propagation rule as follows (see supplementary material):

var[b̂] ≈ [S_yy/S²_xy]* σ²_x b².    (1)

To investigate the accuracy of the variance obtained using equation (1), we should consider two factors. One is that error terms of orders higher than σ²_x are not included in the derivation. The other is that because equation (1) represents the population-regression-line-based variance, the bias in b̂ is not reflected in the calculation of [S_yy/S²_xy]* σ² (= [S_yy/S²_xy]* σ²_x b²). The bias in b̂ depends on σ²_x and n. The details of the effects of these two factors are explained based on the simulation results in the simulation-study section. For reference, the variance of b̂ for basic regression is [1/S_xx]* σ², and this variance is not an approximation but an exact expression. The relationship between the estimates of var[b̂]_reversed inverse and var[b̂]_basic for a given set of data points is as follows:

Estimate for var[b̂]_reversed inverse = {1/r²(x, y)} (Estimate for var[b̂]_basic),

where r(x, y) is the estimated correlation coefficient between x and y, i.e., r(x, y) = S_xy/(S_xx S_yy)^{1/2}, and r²(x, y) is typically very close to 1 in linear calibrations. The variance of the intercept, var[â], is also derived using the error propagation rule (equation (2); see supplementary material). Separately from the previous derivation process, another equation for deriving var[â], equation (3), can be obtained by applying the error propagation rule to â = ȳ − b̂x̄. From equations (2) and (3), we can see that r(b̂, x̄) ≈ 0, and hence, b̂ and x̄ are nearly independent of each other. In equation (2), var[â] is derived by treating â as a function of the x_i's (i = 1, …, n), whereas in equation (3), var[â] is derived by treating â as a function of b̂ and x̄. In this way, by formulating two separate equations to obtain the variance of a statistic using the error propagation rule, we can derive the covariance or correlation coefficient between any two statistics. This method is called the "method of simultaneous error equations" in this study. Nearly all of the covariances (or correlation coefficients) in a linear regression problem can be derived using this method. In addition, the derived covariances can be further used to derive other statistical properties. However, we should note that the covariances thus derived are typically approximations, not exact expressions.
A predicted y value is the y value of a point (x, ŷ) on the fitted regression line and is determined by substituting x into ŷ = â + b̂x. The variance of such a predicted y value, var[ŷ], is derived using the error propagation rule (equation (4); see supplementary material). Separately from equation (4), another equation for deriving var[ŷ], equation (5), can be obtained by applying the error propagation rule to ŷ = â + b̂x. From equations (4) and (5), the correlation coefficient r(â, b̂) can be determined. As the next step, we derive the expectations of b̂ and â and the biases in b̂, â and ŷ. For this purpose, several auxiliary statistical properties are derived in advance using the method of simultaneous error equations (see supplementary material). The expectation of the slope, b_E, can then be derived (see supplementary material for more details). If we apply the first-order Taylor approximation to simplify the resulting expressions, we obtain the following expressions for b_E and a_E:

b_E ≈ b − b[1/S_xx]* (n − 3)σ²_x,   a_E ≈ a + x̄_0 b[1/S_xx]* (n − 3)σ²_x.

Accordingly, the biases in b̂, â and ŷ are as follows:

bias[b̂] = b_E − b ≈ −b[1/S_xx]* (n − 3)σ²_x,    (6)
bias[â] = a_E − a ≈ x̄_0 b[1/S_xx]* (n − 3)σ²_x,
bias[ŷ] ≈ (x̄_0 − x) b[1/S_xx]* (n − 3)σ²_x.    (7)

Based on these biases, we can see that b and a are not the mean, median, or mode of the b̂ and â distributions. However, we can say that b̂ and â, despite being slightly skewed, follow approximately normal distributions centered at b and a, respectively, because the terms b[1/S_xx]* (n − 3)σ²_x and x̄_0 b[1/S_xx]* (n − 3)σ²_x are each very small in magnitude in practical calibrations. (When n is 3, b coincides with b_E. The same can be said of a and a_E.)
To show that the slope, intercept and predicted y value in reversed inverse regression can be expressed as t_{n-2} distributions, it is necessary to know the statistical properties of the mean squared error (MSE). The expectation of MSE is first derived, as equation (8) (see supplementary material for more details). To investigate the accuracy of the expectation of MSE obtained using equation (8), we should consider the same factors taken into account in the case of the variance of b̂; the accuracy of the derived E[MSE] is discussed in detail based on simulation results in the simulation-study section. The correlation coefficient between the slope and the mean squared error, r(b̂, MSE), is derived using the method of simultaneous error equations. Let K = b̂ Σ(y_i − â − b̂x_i)² = (S_xx S_yy S_xy − S³_xy)/S²_xx, A = b̂ = S_xy/S_xx, and F = Σ(y_i − â − b̂x_i)² = (S_xx S_yy − S²_xy)/S_xx. Then, two separate equations for deriving the variance of K can be established, and r(b̂, MSE) is obtained from these two equations; it turns out to be approximately zero. Additionally, b̂ and x̄ are independent of each other and x̄ and MSE are also independent of each other; hence, r(â, MSE) = r(ȳ − b̂x̄, MSE) ≈ 0.
In the expression Σ(y_i − â − b̂x_i)²/(n − 2), the y_i's are constant, b̂ and â follow approximately normal distributions, and the x_i's also follow normal distributions. Therefore, (n − 2)MSE/σ² approximately follows a χ² distribution with n − 2 degrees of freedom. In addition, both b̂ and â are nearly independent of MSE. Based on these facts, the following expressions can be obtained (see equations (1), (2), (4) and (8)), where ŝ is the square root of MSE and y_0 is the nominal y value of a newly prepared standard. The T's are all approximate t_{n-2} distributions. Although x̄², (x − x̄)² and S_yy/S²_xy, which appear in the T's, are functions of x_i (i = 1, …, n), the t_{n-2} distributions are not greatly deformed by these functions because the fluctuations of S_yy/S²_xy (or [1/n + x̄²(S_yy/S²_xy)]) corresponding to the variations of the x_i's are typically very small compared with the magnitude of S_yy/S²_xy (or [1/n + x̄²(S_yy/S²_xy)]) itself. Based on these t_{n-2} distributions, we can evaluate the uncertainty (or confidence interval) of a measurement value determined based on the fitted regression line.
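Such a t_{n-2}-based interval for a predicted y value might be computed as below. The slope-variance estimate (S_yy/S²_xy)ŝ² is the one quoted later for the simulations, but the full var[ŷ] form 1/n + (x − x̄)² S_yy/S²_xy used here is our assumption, patterned on the basic-regression analogue; treat it as illustrative rather than the paper's exact expression.

```python
# Sketch: approximate 95% interval for a predicted y value on the fitted
# line, using a t distribution with n - 2 degrees of freedom. The variance
# form MSE * (1/n + (x - xbar)^2 * S_yy/S_xy^2) is an assumed
# reversed-inverse analogue of the basic-regression formula.
import math

T_CRIT_3DF = 3.182  # two-sided 95% t critical value for 3 degrees of freedom

def predict_interval(xs, ys, x_new, t_crit=T_CRIT_3DF):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    mse = (syy - b * sxy) / (n - 2)   # mean squared error, s_hat^2
    y_hat = a + b * x_new
    var_yhat = mse * (1 / n + (x_new - xbar) ** 2 * syy / sxy ** 2)
    half = t_crit * math.sqrt(var_yhat)
    return y_hat - half, y_hat + half

lo, hi = predict_interval([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 9.9], 3.0)
```

With n = 5 calibration points there are only 3 degrees of freedom, which is why the t critical value (3.182) is so much larger than the normal 1.96.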

Comparison of regression approaches
Krutchkoff [6,13] compared classical regression and inverse regression using Monte Carlo simulations and recommended inverse regression based on the mean squared error. However, Berkson [14] and Halpern [15] presented significant criticisms of Krutchkoff's work. Parker et al. [4] also conducted several simulation studies and concluded that inverse regression performs better than classical regression. It seems that such debates arise because the existing regression approaches and accompanying methodologies are theoretically incomplete. Unusually, we compare the different linear regression approaches using a practical calibration example. Each of three types of regression (classical, inverse and reversed inverse) is applied to the calibration scenario. In practical calibrations, the variance of the prediction interval is one of the most important statistical properties. Therefore, we identify the differences among the three regressions based on a comparison of the variances of the prediction interval estimated using the three regression approaches. For the fitting of a regression line as an example of practical calibration, we use a set of data points collected by Suh [16] while evaluating the uncertainty in the measurements recorded by an absorption spectrometer. The spectrometer determines the chemical concentrations (ppm) in a sample by measuring the absorbances (%) due to the corresponding chemical elements. Suh measured five different Cd (cadmium) standards. The data points collected by Suh are as follows:

(0.1 ppm, 0.028%), (0.3 ppm, 0.084%), (0.5 ppm, 0.135%), (0.7 ppm, 0.180%), (0.9 ppm, 0.215%).

The estimate EV_RI derived via reversed inverse regression at x = 0.215% (the upper end of the calibration range) is compared with the estimate EV_C derived via classical regression at x = 0.8685 ppm and with the estimate EV_I derived via inverse regression at y = 0.215%. All three estimates are different from one another. Classical regression yields the
largest estimate, and inverse regression yields the smallest one. This can be explained by rewriting and comparing the three estimators EV_I, EV_RI and EV_C. (Both EV_C and EV_I are those derived by Parker et al. [4].) When rewriting EV_C and EV_I, the roles of x and y were reversed to facilitate comparison. In addition, ŝ²_y(1/b̂′)² in the expression for classical regression was changed to ŝ²_x(1/b̂′)². Because ŝ²_x(1/b̂′)² is greater than ŝ², the estimates can be arranged in order of increasing magnitude as follows: "inverse," "reversed inverse" and then "classical". This ordering holds for all linear calibrations. The differences among the three estimates depend on r(x, y). In Suh's measurement experiment, r(x, y) is 0.9964 (n = 5); the estimate derived via classical regression at the upper end of the calibration range is approximately 1.5% greater than that derived via inverse regression, and the estimate derived via reversed inverse regression is approximately 0.15% greater than that derived via inverse regression. If Suh had repeated this measurement experiment, the results would have been similar to those of this calibration. Regarding these calibration results, we should remind ourselves that even if we rely on the estimate derived via classical regression, we cannot determine the prediction interval with a given confidence level because the estimate cannot be used to express the prediction interval as a t_{n-2} distribution. In addition, we should remind ourselves that the estimate derived via inverse regression is not a theoretically correct one.
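Suh's five data points make the fitted quantities easy to verify. The sketch below carries out the reversed inverse fit (absorbance as the input x, concentration as the response y) and computes r(x, y), which should come out near the quoted 0.9964.

```python
# Reversed inverse fit of Suh's five Cd calibration points,
# written here as (concentration in ppm, absorbance in %).
import math

data = [(0.1, 0.028), (0.3, 0.084), (0.5, 0.135), (0.7, 0.180), (0.9, 0.215)]
ys = [c for c, _ in data]   # standards (ppm), treated as the response
xs = [a for _, a in data]   # observed absorbances (%), treated as the inputs

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b = sxy / sxx                    # fitted slope (ppm per %)
a = ybar - b * xbar              # fitted intercept
r = sxy / math.sqrt(sxx * syy)   # estimated correlation coefficient r(x, y)
y_at_top = a + b * 0.215         # predicted concentration at x = 0.215%
```

Note that the reversed-inverse prediction at the upper end of the range lands close to, but not exactly at, the 0.8685 ppm point used for the classical-regression comparison; the three approaches fit slightly different lines.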

Simulation study
We conducted a Monte Carlo simulation study to investigate the accuracy of the statistical properties derived using the error propagation rule and the method of simultaneous error equations based on the first-order Taylor approximation.

var[b̂], bias[b̂] and E[MSE] were the main targets of investigation because the accuracy of other properties, such as var[â], bias[â], var[ŷ], bias[ŷ] and var[prediction interval], depends on the accuracy of these three properties. We designed a simulation of regression line fitting using five data points based on reversed inverse regression. We first created the five intended mean data points (x_i0, y_i) (i = 1, …, 5) that were needed for the simulation. Depending on the intended variance σ²_x, the simulation study was organized into five simulation groups, SG1, SG2, SG3, SG4 and SG5, and the intended variances assigned to the five groups were 90², 60², 24², 12² and 6², respectively. Five simulations per group were conducted (25 simulations in total). In every simulation, the regression line fitting was repeated 50 000 times using independent random numbers generated from normal distributions using the program "Minitab 15". The results of the conducted simulations are presented along with the corresponding theoretically derived properties in Tables 1 and 2. (Even if different parameters, such as a different number of data points, a different ratio of ȳ to x̄_0, or non-equal distances between the x_i0's, were applied in a simulation study, such a simulation study would yield conclusions essentially similar to those of this study.) In Tables 1 and 2, the ratio of Svar[b̂] to Dvar[b̂] ranges from 0.971 to 1.017 and the ratio of SE[MSE] to DE[MSE] ranges from 0.983 to 1.002. In addition, the two derived variances *Dvar[b̂] and Dvar[b̂] are very close to each other. Therefore, we can conclude that the variance of the slope and the expectation of the mean squared error derived using the error propagation rule and the method of simultaneous error equations largely coincide with the simulation results.
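One simulation group can be sketched as below. The mean data points are a reconstruction consistent with the values quoted for SG1 (b = 0.025 and [S_yy/S²_xy]* = 1000/40 000²), but they are our assumption, and Python's generator stands in for Minitab.

```python
# Sketch of one simulation group: repeat the line fitting many times with
# x_i ~ N(x_i0, sigma_x^2) and compare the observed slope variance
# Svar[b_hat] with the derived Dvar[b_hat] = [S_yy/S_xy^2]* sigma_x^2 b^2.
# Mean data points are assumed (consistent with b = 0.025, S_xx* = 1.6e6).
import random, statistics

random.seed(2)
y = [30.0, 40.0, 50.0, 60.0, 70.0]
x0 = [1200.0, 1600.0, 2000.0, 2400.0, 2800.0]  # exactly on y = 0.025 x
sigma_x = 24.0                                  # SG3's intended variance is 24^2
b_true = 0.025

def slope(xs, ys):
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys))
    sxx = sum((u - xbar) ** 2 for u in xs)
    return sxy / sxx

slopes = [slope([random.gauss(m, sigma_x) for m in x0], y)
          for _ in range(20000)]
svar = statistics.pvariance(slopes)             # Svar[b_hat]

xbar0 = sum(x0) / len(x0)
sxx0 = sum((m - xbar0) ** 2 for m in x0)        # S_xx at the mean data points
# With the mean data points on the line, [S_yy/S_xy^2]* = 1/S_xx*:
dvar = sigma_x ** 2 * b_true ** 2 / sxx0        # Dvar[b_hat]
```

For a small σ²_x such as 24² the ratio Svar[b̂]/Dvar[b̂] should sit near 1, mirroring the 0.971 to 1.017 range reported in the tables; raising σ_x toward 90 makes the bias-related discrepancy discussed below visible.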

According to Table 1, when σ²_x is 6², the ratio of bias[b̂] to {var[b̂]}^{1/2} is approximately −0.01, and when σ²_x is 90², the ratio is approximately −0.14. These two ratios are very different from each other in magnitude. In the case of either simulation or derivation, as the variance σ²_x increases, both the absolute value of the bias in b̂ and the variance of b̂ increase. The rate of increase of the absolute value of the bias in b̂ is equal to the rate of increase of σ²_x (see equation (6)), whereas the rate of increase of {var[b̂]}^{1/2} is the square root of the rate of increase of σ²_x (see equation (1)). This indicates that as σ²_x increases, the b̂ distribution becomes more skewed. In Tables 1 and 2, the derived values of the bias in b̂ largely coincide with the simulation results regardless of σ²_x. This indicates that although the first-order Taylor approximation is used to derive the bias in b̂, the derived bias does not greatly differ from the simulation result. The bias in b̂ plays an important role in analyzing the accuracy of other derived statistical properties.
When σ²_x is small, the derived variance of b̂ exactly coincides with the simulation result; however, when σ²_x is large, the derived variance of b̂ is generally slightly greater than the simulation result. When the variance of b̂ (i.e., var[b̂] = [S_yy/S²_xy]* σ²_x b²) is derived using the error propagation rule, the partial derivatives of orders higher than the first are not included in the derivation, and the approximation var[f(x_1, …, x_n)] ≈ E[{f(x_1, …, x_n) − f(x_10, …, x_n0)}²] is used to derive the variance. This results in two phenomena. The first phenomenon is that error terms of orders higher than σ²_x are excluded from the derivation, and the second phenomenon is that the bias in b̂ is not reflected in the derivation. The bias in b̂ depends on σ²_x and n (see equation (6)). In this simulation study, n is 5. The first phenomenon typically causes the derived variance of b̂ (i.e., Dvar[b̂]) to decrease, whereas the second phenomenon tends to cause it to increase. If σ²_x is small, both effects are trivial, and Dvar[b̂] nearly coincides with Svar[b̂]; if σ²_x is large, both of these effects are also large. However, the effect of the second phenomenon is much greater than that of the first. As a result, if σ²_x is large, then Dvar[b̂] is greater than Svar[b̂]. If we substitute b_E (SMean[b̂] in Table 2) into equation (1) in place of b, we can obtain a variance of b̂ that is much closer to the simulation result. For example, for SG1-1, we can obtain (1000/40 000²) × 90² × 0.0247465² = 0.0017607² by substituting b_E (= 0.0247465) into [S_yy/S²_xy]* σ²_x b²_E. (The difference between [S_yy/S²_xy]* σ²_x b² and [S_yy/S²_xy]* σ²_x b²_E is approximately equal to the square of bias[b̂].) This value is very close to the simulation result. The difference that still remains can be regarded as the effect of the first phenomenon.
With regard to the expectation of the mean squared error, a similar explanation is possible. Even in this case, the effect of the second phenomenon is greater than that of the first phenomenon, and hence, DE[MSE] is generally greater than SE[MSE]. In particular, let us attempt to approximately calculate the effect of the first phenomenon using another expression for the expectation of MSE (see supplementary material); in that expression, the last term on the right-hand side reflects the effect of the first phenomenon to a certain extent. This equation helps us understand the two phenomena.
In Table 2, if σ²_x is large, then *Dvar[b̂] is generally greater than Dvar[b̂]. In every simulation, the estimate for the variance of the slope, i.e., (S_yy/S²_xy)ŝ², was calculated for each regression line; *Dvar[b̂] is the mean of the 50 000 estimates thus calculated. We can also obtain *Dvar[b̂] using another method, in which the term 2(7 − n)S_yy σ⁴_x/S³_xx, evaluated at the mean data points, reflects the difference between *Dvar[b̂] and Dvar[b̂]; the difference depends on σ⁴_x and n. In this section, we investigated the accuracy of the statistical properties of reversed inverse regression as derived using the error propagation rule and the method of simultaneous error equations through comparisons with simulation results. However, it should be noted that the main target that calibration experts wish to obtain (or approach) by means of regression line fitting is the population regression line y = a + bx, not the average regression line y = a_E + b_E x. In this respect, it is recommended that after the physical or chemical value of a sample is determined based on the fitted regression line, the determined value be corrected taking into account the bias in the predicted y value (see equation (7)); such a bias correction will lead us closer to the true value.

Conclusion
From Osborne [17], it can be seen that considerable effort has been made to resolve the linear calibration problem since the 1930s. Most representatively, Eisenhart [2] suggested classical regression as a solution for the problem, and Krutchkoff [6] suggested inverse regression as another solution. Later, Parker et al. [4] derived the variances of the prediction interval and the biases in x for these two types of regression using the Delta Method. However, it can be said that the problem has not yet been resolved completely. As a fundamental solution for this problem, the current study introduced reversed inverse regression along with a methodology for deriving its statistical properties. In this study, the statistical properties of reversed inverse regression, such as the variance and bias of the slope, the expectation of the mean squared error, and the variance of the predicted y value, were derived using the error propagation rule and the method of simultaneous error equations. The method of simultaneous error equations, which was introduced for the first time in this study, is a useful tool for deriving the covariance of any two statistics. As another example of its use, all of the statistical properties of basic regression can be derived much more easily with the aid of this method. Even in the case of weighted linear regression, this method can be used to derive its statistical properties.
We presented an example of practical calibration. Each of the three types of regression (i.e., classical, inverse and reversed inverse) was applied to this calibration example. As a result, we found that the estimates of the variance of the prediction interval can be arranged in order of increasing magnitude as follows: "inverse," "reversed inverse" and then "classical". This ordering holds for all linear calibrations. The differences among the three estimates depend on r(x, y). As the next step, to investigate the accuracy of the three derived statistical properties of reversed inverse regression, i.e., Dvar[b̂], Dbias[b̂] and DE[MSE], a Monte Carlo simulation study was conducted. Through this simulation study, we found that when the variance of the observed measurements, i.e., σ²_x, is small, the theoretically derived variance and bias of the slope as well as the theoretically derived expectation of the mean squared error coincide with the simulation results. However, when σ²_x is large, there are small differences between the derived properties and the simulation results. Such differences are caused by two phenomena. The first phenomenon is that error terms of orders higher than σ²_x are excluded from the derivation, and the second phenomenon is that the bias in b̂ is not reflected in the derivation. The first phenomenon typically causes the derived statistical properties to decrease, whereas the second phenomenon tends to cause them to increase (when n is greater than 3). The effect of the second phenomenon is larger than that of the first phenomenon, and hence, the values of the derived properties are typically slightly greater than the simulation results. In this way, after performing simulations we could investigate and analyze the differences between the derived statistical properties and the simulation results. This is another benefit of the new methodology used to derive the statistical properties of reversed inverse regression.

Implications and influences
Lwin and Maritz [18] suggested that regression models do not require the assumption of fixed inputs. In other words, regardless of whether the regression model of interest is consistent with this assumption, the method of least squares can be applied to fit a regression line. In that sense, it is meaningless to ask whether the line fitted using one regression approach is preferable to that fitted using another regression approach. However, it is nevertheless essential to know the statistical properties of the type of regression used for fitting. Unfortunately, the known statistical properties of the existing regression approaches are not without flaw. By contrast, all of the statistical properties of reversed inverse regression can be derived using the newly proposed methodology, and the statistical properties derived in this manner are theoretically correct and sufficiently accurate. In this respect, we claim that reversed inverse regression and the new methodology for deriving its statistical properties together serve as a fundamental solution for the univariate linear calibration problem, which had not previously been completely resolved. Finally, we expect this new methodology to be widely used in the field of calibration.

P. Kang et al.: Int. J. Metrol. Qual. Eng. 8, 28 (2017)

Notes to Table 1:
a. Svar[b̂] is the variance of b̂ observed through each simulation. The square of the standard deviation of the 50 000 slopes obtained from the 50 000 regression line fittings is treated as Svar[b̂].
b. Sbias[b̂] is the mean of the 50 000 slopes minus the slope of the population regression line.
c. SE[MSE] is the mean of the 50 000 MSEs obtained from the 50 000 regression line fittings.
d. Dvar[b̂] is the variance of b̂ derived using equation (1) based on the intended mean data points and the intended variance σ²_x.
e. Dbias[b̂] is the bias in b̂ derived using equation (6).
f. DE[MSE] is the expectation of MSE derived using equation (8).


Table 1. Simulation results and theoretically derived properties.

Table 2. Ratios of the simulation results to the corresponding derived properties.
Notes to Table 2:
a. In every simulation, the estimate for the variance of b̂, i.e., (S_yy/S²_xy)ŝ², was calculated for each regression line. *Dvar[b̂] is the mean of the 50 000 estimates thus calculated.
b. SMean[b̂] is the mean of the 50 000 slopes obtained from the 50 000 regression line fittings.