Consensus Values, Regressions, and Weighting Factors

An extension to the theory of consensus values is presented. Consensus values are calculated from averages obtained from different sources of measurement. Each source may have its own variability. For each average a weighting factor is calculated, consisting of contributions from both the within- and the between-source variability. An iteration procedure is used and calculational details are presented. An outline of a proof for the convergence of the procedure is given. Consensus values are described for both the case of the weighted average and the weighted regression.


Introduction
The problem of computing consensus values when the errors of measurement involve both internal (within group) and external (between group) components has been discussed in a number of papers [1][2][3][4]. The present authors have studied the case of a simple weighted average, as well as that in which the measured quantity y is a linear function of a known variable x.
In the present paper we extend our results to cases in which the error standard deviations are functions, of known form, of the x-variables. We also provide an outline of a proof for convergence of the iterative process described in reference [1].
While our procedure is entirely reasonable, and results in acceptable values, we have no mathematical proof that the weights, which we calculate from the data, are optimal in any well-defined theoretical sense. The problem has been recognized in the literature [5], but we know of no attempt to provide the proof of optimality.

Review
If $\omega_i$ denotes the weight (reciprocal variance) of a quantity, $Y_i$, then the general equation for a weighted average is:

$$\tilde{Y} = \frac{\sum_i \omega_i Y_i}{\sum_i \omega_i}. \tag{1}$$
Equation (3) is used in "reverse fashion" to estimate the $\omega_i$ and $\tilde{Y}$ from the sample data. This is possible if, in eq (2), the $\sigma_{w_i}$ are estimated from the within-group variability, so that the only unknown is $\sigma_b$. Note in eq (3) that $\sigma_b$ is embedded within each weight and therefore within $\tilde{Y}$. The estimated $\sigma_{w_i}$ and $\sigma_b$ can also be used to estimate the standard deviation of the weighted average, which is equal to $1/\sqrt{\sum_i \omega_i}$. Henceforth, the symbol $\omega_i$ denotes the sample estimate of the weight. The same general reasoning holds for the weighted regression case: the variance of a simple weighted average is replaced by the residual mean square from a weighted least squares regression [eq (4)], where $\omega_i$ is given by eq (2) and $\hat{Y}_i$ is the fitted value.
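As a rough illustration of eq (1) and of the standard deviation of the weighted average, the following minimal sketch computes both for a given between-group variance. It assumes that the variance of each group average is the sum of a within-group term and a between-group term `v`; the function name, argument names, and data are hypothetical and are not taken from the paper.

```python
import math


def weighted_average(y, var_within, v):
    """Return (weighted mean, its standard deviation, weights).

    Assumption: the variance of the i-th group average is
    var_within[i] + v, where v is the between-group variance,
    so each weight is the reciprocal of that sum.
    """
    weights = [1.0 / (s2 + v) for s2 in var_within]
    total = sum(weights)
    mean = sum(w * yi for w, yi in zip(weights, y)) / total   # eq (1)
    return mean, 1.0 / math.sqrt(total), weights


# Hypothetical group averages with equal within-group variances.
mean, sd, weights = weighted_average([10.0, 12.0, 11.0], [1.0, 1.0, 1.0], v=0.0)
```

With equal weights the result reduces to the ordinary mean; unequal within-group variances or a nonzero `v` shift the average toward the more precise groups.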
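We now describe the case of a weighted regression with $p=2$. The fitted value $\hat{Y}_i$ for the $i$th group can be written as follows:

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \tag{5}$$

or, equivalently,

$$\hat{Y}_i = \tilde{Y} + \hat{\beta}(X_i - \tilde{X}), \tag{5'}$$

where $\tilde{X}$ is a weighted average analogous to the weighted average described by eq (1), and $\hat{\alpha}$ and $\hat{\beta}$ are weighted least squares estimates of the coefficients, $\alpha$ and $\beta$. Again, the only unknown is $\sigma_b$, which can now be estimated from sample data by use of eq (4).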
A direct solution for $\sigma_b$ in either eq (3) or (4) would be extremely complicated since $\omega_i$, $\tilde{Y}$, and $\hat{Y}_i$ all contain $\sigma_b$. The number of terms, $m$, in both equations will vary depending on the number of groups in a particular sample data set. Furthermore, for the regression case, $\hat{\beta}$ and $\tilde{X}$ also depend on $\sigma_b$. Therefore an iterative solution was proposed in reference [1]. This iterative procedure is central to the practical solution of either eq (3) or (4). In order that this paper be self-contained, we briefly review the iterative procedure for the regression case using eq (4) with $p=2$.

Iteration Procedure
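We define the function:

$$F(s_b^2) = \sum_i \omega_i (Y_i - \hat{Y}_i)^2 - (m - p). \tag{6}$$

In view of eqs (2) and (4), the estimate $s_b^2$ of $\sigma_b^2$ must be such that $F(s_b^2)=0$. For ease of notation let $s_b^2 = v$. Start with an initial value, $v_0 \approx 0$, calculate an initial set of weights, and then evaluate eq (6). In general, $F(v_0)$ will be different from zero. It is desired to find an adjustment, $dv$, such that $F(v_0+dv)=0$. Using a truncated Taylor series expansion, one obtains:

$$F_0 + \left(\frac{\partial F}{\partial v}\right)_0 dv = 0.$$

Evaluating the partial derivative in this equation, one obtains:

$$dv = \frac{F_0}{\sum_i \omega_i^2 (Y_i - \hat{Y}_i)^2}. \tag{7}$$

The adjusted (new) value for $v$ is:

$$\text{New } v_0 = \text{Old } v_0 + dv. \tag{8}$$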
This new value is now used and the procedure is iterated until dv is satisfactorily close to zero.
The iterative procedure is easily adapted to the computer. The programming steps are as follows:
1. Evaluate the $s_{w_i}^2$ from the individual groups of data.
2. Start the iteration process with a value of $v_0$ just slightly over zero.
3. Evaluate eq (2) to get estimates of $\omega_i$.
4. Fit eq (5) by a weighted least squares regression of $Y_i$ on $X_i$, and get estimates of the $\hat{Y}_i$.
5. Use eq (6) to evaluate $F_0$. If $F_0<0$, then stop the iteration and set $v=0$. If not, continue with step 6.
6. Use eq (7) to evaluate $dv$.
7. If $dv$ is positive and small enough to justify stopping, then stop. If it is positive, but is not small enough, repeat steps 3-7 [using the new $v_0$ from eq (8)].
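The programming steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' program: the weight is taken as the reciprocal of (within-group variance + v), the function names `wls_line` and `consensus_line` and the data are hypothetical, and the test for a negative F is applied as a "set v to zero" rule only on the first pass, so that rounding error near convergence is not mistaken for a missing between-group component.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def consensus_line(x, y, var_within, tol=1e-10, max_iter=100):
    """Iterate on v (between-group variance) until F(v) = 0.

    var_within holds positive within-group variances (step 1).
    Returns (a, b, v).
    """
    m, p = len(y), 2
    v = 1e-12                                             # step 2: just above zero
    for iteration in range(max_iter):
        w = [1.0 / (s2 + v) for s2 in var_within]         # step 3 (assumed weight form)
        a, b = wls_line(x, y, w)                          # step 4
        res2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
        f0 = sum(wi * r for wi, r in zip(w, res2)) - (m - p)      # step 5
        if f0 <= 0.0:
            if iteration == 0:
                v = 0.0          # between-group component not detectable
            break
        dv = f0 / sum(wi ** 2 * r for wi, r in zip(w, res2))      # step 6
        v += dv
        if dv < tol:                                      # step 7
            break
    # Final fit with the converged v.
    a, b = wls_line(x, y, [1.0 / (s2 + v) for s2 in var_within])
    return a, b, v


# Hypothetical group averages with equal within-group variances.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.5, 2.7, 4.4, 4.9]
a, b, v = consensus_line(x, y, [0.01] * 5)
```

Because the within-group variances are equal in this made-up data set, the fitted line does not change during the iteration and only `v` moves; with unequal variances the coefficients are refitted with new weights on every pass.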
The consensus values are the final coefficients of the regression equation. One is also interested in the final $v = s_b^2$ value since this is needed to characterize the imprecision of the fit.
For the case of a weighted average [see eq (1)] the above iteration steps are the same, except that in place of step 4, $\tilde{Y}$ is calculated by eq (1), steps 5 and 6 use $\tilde{Y}$ in place of $\hat{Y}_i$, and unity is used for the $p$ value. The authors have frequently used this procedure for the evaluation of Standard Reference Materials [6].

Theoretical Extensions
Once one recognizes the between- as well as the within-group component of variance in the evaluation of consensus values, one can begin to consider functional forms for these components. The within-group component can be of any form, and can be easily handled since the appropriate sample values of the component are simply substituted into the weights described by eq (2). Thereafter, this component does not affect the iteration procedure. See for example reference [7], where the within component of variance refers to a Poisson process. The between-group component, however, affects the iteration procedure and must be handled more carefully. As an example, consider the case where the between-group component of standard deviation is believed to be a linear function of the level of $X_i$: $\sigma_b = \gamma + \delta X_i$. Let us assume that we have preliminary estimates, $c$ and $d$, for the $\gamma$ and $\delta$ coefficients. Suppose further that we wish to adjust the estimated value of the variance by a fixed scale factor, say $v'$. The desired between-group component of variance is thus:

$$s_b^2 = v'(c + dX_i)^2. \tag{10}$$

The weights estimated by eq (2) would then be:

$$\omega_i = \frac{1}{s_{w_i}^2 + v'(c + dX_i)^2}, \tag{2'}$$

where $s_{w_i}^2$ denotes the within-group variance term of eq (2). This newly defined weight can be used in the iteration process. The iteration process proceeds as before, but now the adjustable iteration parameter $v'$ is the multiplier needed to make eq (4) true, that is, to make it consistent with the sample data sets. The denominator of eq (7), which is used in iteration step 6 for calculating $dv$, needs to be slightly modified since the derivative of $F$ with respect to $v'$ now contains the function described by eq (10):

$$dv' = \frac{F_0}{\sum_i \omega_i^2 (c + dX_i)^2 (Y_i - \hat{Y}_i)^2}.$$

All other steps in the iteration process are the same. The final between-group components of variance will be described by eq (10).

Example
The iteration process will be used to fit the data of table 1 to a straight line. These are real data taken from a large interlaboratory study for the determination of oxygen in silicon wafers. A preliminary examination of the data indicates that the within error has a constant standard deviation and that the between error has a standard deviation proportional to X. Thus, the error structure for the example combines a constant within-group variance with a between-group variance of the form $vX_i^2$, where $v$ now stands for the product $v'd^2$ of eq (2').
From the replicates, the pooled within standard deviation is readily calculated to be 0.265. The iteration process then yields the following results. The figures support the assumptions made concerning the nature of the within and between errors.
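The error structure of this example (constant within-group standard deviation, between-group standard deviation proportional to X) can be sketched as follows. The data of table 1 are not reproduced here, so the `x` and `y` values below are synthetic and purely illustrative; only the pooled within standard deviation 0.265 comes from the text. Treating its square as the within-group variance of each `y` value is an assumption, as are the names `fit_line` and `f_and_step`.

```python
def fit_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def f_and_step(x, y, var_within, g, v):
    """Return F(v) and the correction dv for weights 1/(var_within + v*g_i)."""
    w = [1.0 / (var_within + v * gi) for gi in g]
    a, b = fit_line(x, y, w)
    res2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    f = sum(wi * r for wi, r in zip(w, res2)) - (len(y) - 2)
    # Modified denominator: the derivative of each weight now carries g_i.
    denom = sum(wi ** 2 * gi * r for wi, gi, r in zip(w, g, res2))
    return f, (f / denom if denom > 0.0 else 0.0)


S_WITHIN = 0.265                              # pooled within s.d. quoted in the text
var_within = S_WITHIN ** 2                    # assumed variance of each y value
x = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]       # synthetic levels, NOT table 1
y = [5.6, 9.4, 15.9, 19.2, 26.3, 28.8]        # synthetic measurements, NOT table 1
g = [xi ** 2 for xi in x]                     # between-group variance = v * X^2

v = 1e-12                                     # start just above zero
for iteration in range(100):
    f, dv = f_and_step(x, y, var_within, g, v)
    if f <= 0.0:
        if iteration == 0:
            v = 0.0                           # no detectable between-group component
        break
    v += dv
    if dv < 1e-12:
        break
```

At convergence `v` is the multiplier for which the weighted residual sum of squares equals the degrees of freedom, and `v * xi ** 2` is the estimated between-group variance at level `xi`.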

Sketch of Proof of Convergence
The general functional form of the $F$ of eq (6) is as shown in figure 2a or 2b. It is because of the nature of these forms that convergence always occurs. If the functional form of $F$ is as shown in figure 2a, the previously described iterative procedure is used to determine the $s_b^2$ satisfying the equation $F(s_b^2)=0$. If an initial estimate of $s_b^2$ is chosen that is very slightly above zero, then convergence of the iteration process always occurs. This is a result of the fact that the first derivative of the function $F$ with respect to $s_b^2$ is negative, and the second derivative is positive. This means that each iteration will undershoot, since the iteration process extrapolates the slope of the $F$ curve at the current $s_b^2$ estimate to the $F=0$ value. Since each new iteration estimate of $s_b^2$ is the abscissa value of the intersection of the tangent line with the $F=0$ horizontal line, the iteration process will never overshoot and convergence is obtained.
If the form is that of figure 2b, then there will be no positive solutions for the $s_b^2$ that is associated with the function $F$. This represents a situation where the variability between the sample groups is less than that expected from the variability within the sample groups. For this situation $F_0$ is negative and $s_b^2$ is set to zero (see iteration step 5).
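The undershooting behavior can be observed numerically. The sketch below uses the simpler weighted-average case (p = 1) and records each iterate; under the stated signs of the derivatives the iterates should increase steadily and F should stay non-negative until it reaches zero. The data, the assumed weight form 1/(within-group variance + v), and the name `f_and_slope` are illustrative assumptions.

```python
def f_and_slope(y, var_within, v):
    """F(v) and dF/dv for the weighted-average case (p = 1)."""
    w = [1.0 / (s2 + v) for s2 in var_within]
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    res2 = [(yi - ybar) ** 2 for yi in y]
    f = sum(wi * r for wi, r in zip(w, res2)) - (len(y) - 1)
    slope = -sum(wi ** 2 * r for wi, r in zip(w, res2))
    return f, slope


y = [9.8, 10.9, 10.1, 11.6, 10.4]              # hypothetical group averages
var_within = [0.04, 0.09, 0.02, 0.16, 0.05]    # hypothetical within-group variances

v, path, f_values = 1e-12, [], []
for _ in range(100):
    f, slope = f_and_slope(y, var_within, v)
    path.append(v)
    f_values.append(f)
    dv = -f / slope          # where the tangent line crosses F = 0
    if dv < 1e-12:           # correction negligible: stop
        break
    v += dv
```

Printing `path` and `f_values` side by side shows the approach to the root from the left, which is the behavior the figure 2a argument describes.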
The proof regarding the signs of the first and second derivatives of $F$ with respect to $s_b^2$ follows. The simple regression case will be considered, with $s_b^2 = v$ being constant. (The extension to the variable $s_b^2$ case is straightforward.)

Proof That the First Derivative of F is Negative
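An examination of eqs (1), (2), (5'), and (6) shows $\omega_i$ and $\hat{Y}_i$ to be functions of $s_b^2 = v$. Equation (5') also indicates that $\tilde{Y}$, $\tilde{X}$, and $\hat{\beta}$ are functions of $v$. We start with the first derivative of the $F$ of eq (6):

$$\frac{dF}{dv} = \sum_i \frac{d\omega_i}{dv}(Y_i - \hat{Y}_i)^2 - 2\sum_i \omega_i (Y_i - \hat{Y}_i)\frac{d\hat{Y}_i}{dv}.$$

The derivative of $\omega_i$ will frequently be encountered in the following material. At this point, it will be convenient to note its value:

$$\frac{d\omega_i}{dv} = -\omega_i^2.$$

Continuing, and making use of eq (5'):

$$\frac{dF}{dv} = -\sum_i \omega_i^2 (Y_i - \hat{Y}_i)^2 - 2\frac{d\hat{\beta}}{dv}\sum_i \omega_i X_i (Y_i - \hat{Y}_i) - 2\frac{d(\tilde{Y} - \hat{\beta}\tilde{X})}{dv}\sum_i \omega_i (Y_i - \hat{Y}_i). \tag{A1}$$

The last two terms of eq (A1) each contain summations that are equal to zero (they are the normal equations of the weighted least squares fit), so these terms drop out. Next, an examination of the remaining term shows that each product is a positive square, and that the summation is preceded by a minus sign. Thus, the first derivative is negative.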
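The conclusion that only the sum-of-squares term survives can be checked numerically by comparing it with a finite-difference derivative of F. The sketch below is such a check under assumptions: the weight form 1/(within-group variance + v), the helper names `fit_line`, `big_f`, and `analytic_slope`, and the data are all hypothetical.

```python
def fit_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def weights_and_res2(x, y, var_within, v):
    """Weights and squared residuals of the weighted fit at a given v."""
    w = [1.0 / (s2 + v) for s2 in var_within]
    a, b = fit_line(x, y, w)
    return w, [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]


def big_f(x, y, var_within, v):
    """Weighted residual sum of squares minus (m - p), with p = 2."""
    w, res2 = weights_and_res2(x, y, var_within, v)
    return sum(wi * r for wi, r in zip(w, res2)) - (len(y) - 2)


def analytic_slope(x, y, var_within, v):
    """Surviving term of the first derivative: minus sum of w^2 * residual^2."""
    w, res2 = weights_and_res2(x, y, var_within, v)
    return -sum(wi ** 2 * r for wi, r in zip(w, res2))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]                  # hypothetical data
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
var_within = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04]
v, h = 0.05, 1e-6

# Central difference; the fit is recomputed at v + h and v - h, so any
# contribution from the changing coefficients would show up here.
numeric = (big_f(x, y, var_within, v + h) - big_f(x, y, var_within, v - h)) / (2 * h)
analytic = analytic_slope(x, y, var_within, v)
```

Agreement between `numeric` and `analytic` indicates that the terms involving the derivatives of the fitted coefficients contribute nothing, as the argument above states.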

Proof That the Second Derivative of F is Positive
The evaluation of the second derivative is involved and only an outline of the steps is presented.

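Let $Z_i = \omega_i (Y_i - \hat{Y}_i)$, and let $\hat{Z}_i$ be the fitted value of $Z_i$ in a weighted regression of $Z$ on $X$ with weights $\omega_i$. Differentiating the first derivative once more gives:

$$\frac{d^2F}{dv^2} = 2\left[\sum_i \omega_i Z_i^2 - \sum_i \omega_i \hat{Z}_i^2\right].$$

The first term on the r.h.s. is the "total" weighted sum of squares of $Z$. The second term is the weighted sum of squares for the regression of $Z$ on $X$. Therefore the difference between the two terms is a "residual" sum of squares:

$$\sum_i \omega_i (Z_i - \hat{Z}_i)^2,$$

where $\hat{Z}_i$ is the fitted value of $Z_i$ in a weighted regression of $Z$ on $X$. Thus, the second derivative is positive for $\omega_i > 0$. The iteration process therefore will never overshoot, and convergence is always assured.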

Extensions
The extension of the convergence proof to the variable $s_b^2$ case is very similar to that given above. Two basic changes are needed. These changes, which introduce a function of $X_i$, say $g(X_i)$, are in the derivative of $\omega_i$ and in the definition of $Z_i$:

$$\frac{d\omega_i}{dv'} = -\omega_i^2\, g(X_i), \qquad Z_i = \omega_i\, g(X_i)\,(Y_i - \hat{Y}_i).$$

Equation (10) represents an example of a variable $s_b^2$. For that case,

$$g(X_i) = (c + dX_i)^2.$$

The reader may note that the new $Z_i$ contains $X_i$, and that $Z_i$ is regressed on $X_i$. The argument does not require that this regression "make sense", only that the sum of squares can be partitioned by a regression process. Again, convergence is obtained.
The weighted average is a special and simple application of the weighted regression case.

Acknowledgments
We wish to thank Miss Alexandra Patmanidou, a graduate student at Johns Hopkins University. She noted that whenever a negative correction in the iteration process is obtained, the process should be terminated and the between-group component of variance set to zero.