Response surface designs using the generalized variance inflation factors

We study response surface designs using the generalized variance inflation factors for subsets as an extension of the variance inflation factors.


1. Introduction
We consider a linear regression Y = Xβ + ε with X a full rank n × p matrix and ε ∼ N(0, σ²I_n). The variance inflation factor VIF, Belsley (1986), measures the penalty for adding one non-orthogonal additional explanatory variable to a linear regression model, and it can be computed as a ratio of determinants. The extension of the VIF to a measure of the penalty for adding a subset of variables to a model is the generalized variance inflation factor GVIF of Fox and Monette (1992), which will be used to study response surface designs, in particular, as the penalty for adding the quadratic terms to the model.

2. Variance inflation factors
For our linear model Y = Xβ + ε, let D_X be the diagonal matrix with diagonal entries D_X[i, i] = ((X′X)[i, i])^{1/2}, which scale the explanatory variables to the "ideal" variances had the columns of X been orthogonal. Note that we follow Stewart (1987) and do not necessarily center the explanatory variables. View X = [X_{[p]}, x_p] with x_p the p-th column of X and X_{[p]} the matrix formed by the remaining columns. The variance inflation factor VIF_p measures the effect of adding column x_p to X_{[p]}; for notational convenience, we demonstrate VIF_p with the last column p. An ideal column would be orthogonal to the previous columns, with the off-diagonal entries in the p-th row and p-th column of X′X all zero. Denote by M_p the idealized moment matrix obtained from X′X by setting these off-diagonal entries to zero. The VIFs are the diagonal entries of S_X^{-1}, where S_X = D_X^{-1}(X′X)D_X^{-1}. It remains to note that the inverse S_X^{-1} can be computed using cofactors C_{i,j}; in particular, VIF_p is the ratio of the determinant of the idealized moment matrix M_p to the determinant of the moment matrix X′X:

VIF_p = det(M_p) / det(X′X).    (1)

This definition extends naturally to subsets and is discussed in the next section.
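As a numerical check of the two formulations above, the following sketch computes the VIFs both as the diagonal entries of S_X^{-1} and as the determinant ratio of Equation 1. The small design matrix is an arbitrary illustration, not one of the designs studied in this paper.

```python
import numpy as np

# Illustrative 5x3 design (an assumption; not a design from the paper).
X = np.array([
    [1.0, -2.0, 4.0],
    [1.0, -1.0, 1.0],
    [1.0,  0.0, 0.5],
    [1.0,  1.0, 1.0],
    [1.0,  2.0, 4.0],
])
M = X.T @ X                                   # moment matrix X'X

# VIFs as the diagonal entries of S_X^{-1}, S_X = D^{-1} (X'X) D^{-1}.
D = np.diag(np.sqrt(np.diag(M)))
vifs = np.diag(D @ np.linalg.inv(M) @ D)

# VIF_p as the determinant ratio det(M_p)/det(X'X) of Equation 1,
# where M_p zeroes the off-diagonal entries of row and column p.
def vif_det(M, p):
    Mp = M.copy()
    Mp[p, :] = 0.0
    Mp[:, p] = 0.0
    Mp[p, p] = M[p, p]
    return np.linalg.det(Mp) / np.linalg.det(M)

det_vifs = np.array([vif_det(M, p) for p in range(M.shape[1])])
```

The two computations agree, and each VIF is at least 1 since S_X is a positive definite matrix with unit diagonal.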
For an alternate view of how collinearities in the explanatory variables inflate the model variances of the regression coefficients when compared to a fictitious orthogonal reference design, consider the formula for the model variance

Var(β̂_j) = (σ² / Σ_i (x_{ij} − x̄_j)²) × (1 / (1 − R_j²)),

where R_j² is the square of the multiple correlation from the regression of the j-th column of X = [x_{ij}] on the remaining columns, as in Liao and Valliant (2012). The first term σ² / Σ_i (x_{ij} − x̄_j)² is the model variance for β̂_j had the j-th explanatory variable been orthogonal to the remaining variables. The second term 1/(1 − R_j²) is a standard definition of the j-th VIF, as in Theil (1971).
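The decomposition above can be verified directly; the correlated predictors below are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Illustrative design with an intercept and two correlated predictors.
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
x2 = np.array([-1.9, -1.1, 0.2, 0.9, 2.1])   # nearly collinear with x1
X = np.column_stack([np.ones(5), x1, x2])

# Model variance of beta_hat_j (per unit sigma^2) from the full fit.
var_full = np.diag(np.linalg.inv(X.T @ X))

j = 1                                         # the x1 coefficient
others = [0, 2]                               # intercept and x2
# R^2_j from regressing x1 on the remaining columns (intercept included).
coef, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
resid = X[:, j] - X[:, others] @ coef
sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
r2 = 1.0 - np.sum(resid ** 2) / sst
vif_j = 1.0 / (1.0 - r2)

# Var(beta_hat_j)/sigma^2 = [1 / sum_i (x_ij - xbar_j)^2] * [1/(1 - R^2_j)]
assert np.isclose(var_full[j], (1.0 / sst) * vif_j)
```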

3. Generalized variance inflation factors
In this section, we introduce the GVIFs as an extension of the classical variance inflation factors VIF from Equation 1. For the linear model Y = Xβ + ε, view X = [X_1, X_2] partitioned with X_1 of dimension n × r, usually consisting of the lower order terms, and X_2 of dimension n × s, usually consisting of the higher order terms. The idealized moment matrix for the (r, s) partitioning of X is the block diagonal matrix M_(r,s) = diag(X_1′X_1, X_2′X_2), Jensen and Ramirez (1993), and the generalized variance inflation factor is the ratio of determinants

GVIF(X_2 | X_1) = det(M_(r,s)) / det(X′X) = det(X_1′X_1) det(X_2′X_2) / det(X′X).    (2)

A similar measure of collinearity is mentioned in Note 2 of Wichers (1975), Theorem 1 of Berk (1977), and Garcia, Garcia, and Soto (2011). For the simple linear regression model with p = 2, Equation 2 gives VIF = 1/(1 − ρ²) with ρ the correlation coefficient, as required. Fox and Monette (1992) suggested that X_1 contains the variables which are of "simultaneous interest," while X_2 contains additional variables selected by the investigator. We will set X_1 for the constant and main effects and set X_2 for the (optional) quadratic terms with values from X_1. Willan and Watts (1978) measured the effect of collinearity using the ratio of the volume of the actual joint confidence region for β̂ to the volume of the joint confidence region in the fictitious orthogonal reference design. Their ratio is in the spirit of GVIF, as det(X′X) is inversely proportional to the square of the volume of the joint confidence region for β̂. They also introduced a measure of relative predictability, and they note: "The existence of near linear relations in the independent variables of the actual data reduces the overall predictive efficiency by this factor." For a simple case study, consider the simple linear regression model with n = 4, x_1 = [−2, −1, 1, 2]′, and y = [4, 1, 1, 4]′. The 95% prediction interval for x_1 = 0 is 2.5 ± 10.20. If the model also includes x_2 = [−2.001, −1.001, 1.001, 2.001]′, then the 95% prediction interval for (x_1, x_2) = (0, 0) is 2.5 ± 46.02, demonstrating the loss of predictive efficiency due to the collinearity introduced by x_2.
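The case study above can be reproduced in a few lines. The critical t-quantiles are the standard table values; because the three-column moment matrix is nearly singular, the exact width of the second interval is sensitive to rounding, so the sketch only checks that it is several times wider than the first.

```python
import numpy as np

def pi_halfwidth(X, y, x0, t_crit):
    """Half-width of the 95% prediction interval at x0; t_crit is the
    0.975 Student-t quantile for n - p degrees of freedom."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.sum((y - X @ beta) ** 2) / (n - p)   # residual variance
    h = x0 @ np.linalg.solve(X.T @ X, x0)        # leverage of x0
    return t_crit * np.sqrt(s2 * (1.0 + h))

x1 = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([4.0, 1.0, 1.0, 4.0])

# Intercept + x1 only: df = 2, t_{0.975,2} = 4.3027
X_a = np.column_stack([np.ones(4), x1])
hw_a = pi_halfwidth(X_a, y, np.array([1.0, 0.0]), 4.3027)   # about 10.20

# Adding the nearly collinear x2: df = 1, t_{0.975,1} = 12.7062
x2 = np.array([-2.001, -1.001, 1.001, 2.001])
X_b = np.column_stack([np.ones(4), x1, x2])
hw_b = pi_halfwidth(X_b, y, np.array([1.0, 0.0, 0.0]), 12.7062)
```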
For the (r, s) partition of X = [X_1, X_2] with X_1 of dimension n × r and X_2 of dimension n × s, set

B = (X_1′X_1)^{−1/2} (X_1′X_2) (X_2′X_2)^{−1/2}

and denote the canonical moment matrix as

R = [ I_r  B ; B′  I_s ]    (3)

with determinant

det(R) = det(X′X) / (det(X_1′X_1) det(X_2′X_2));

equivalently, GVIF(X_2 | X_1) = 1/det(R). In the case {r = p − 1, s = 1}, X_2 = x_p is an n × 1 vector and the partitioned design X = [X_1, x_p] gives GVIF(x_p | X_1) = VIF_p from Equation 1. From standard facts for the inverse of a partitioned matrix, for example, Myers (1990, p. 459), det(X′X) = det(X_1′X_1) det(X_2′X_2 − X_2′X_1(X_1′X_1)^{−1}X_1′X_2), from which the determinant identity above follows. We study the eigenvalue structure of M_(r,s) in Appendix 1, where it is shown that an alternative formulation for GVIF is

GVIF(X_2 | X_1) = ∏_i 1/(1 − λ_i²),

with λ_1 ≥ λ_2 ≥ … ≥ 0 the singular values of the off-diagonal array B.

4. Quadratic model with p = 3
For the partitioning X = [X_1 | X_2], the canonical moment matrix, Equation 3, has the identity matrices I_r, I_s down the diagonal and off-diagonal array B. For the quadratic model y = β_0 + β_1 x + β_{11} x² + ε, we take X_1 = [1, x] and X_2 = [x²], so B is a 2 × 1 array with a single positive singular value. Surprisingly, many higher order designs also have an off-diagonal array of the canonical moment matrix with a unique positive singular value, with GVIF(X_2 | X_1) = 1/(1 − τ_X²) and with the collinearity between the lower order terms and the upper order terms a function of the canonical index τ_X², the square of that singular value.
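A sketch verifying the three equivalent expressions for GVIF noted above: the determinant ratio of Equation 2, 1/det(R), and the product over the singular values of B. The design matrix and the partition r = 2 are arbitrary illustrations.

```python
import numpy as np

def inv_sqrt(A):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def canonical_moment(X, r):
    """Canonical moment matrix R of Equation 3 for the partition [X1, X2]."""
    X1, X2 = X[:, :r], X[:, r:]
    B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
    s = X.shape[1] - r
    return np.block([[np.eye(r), B], [B.T, np.eye(s)]]), B

# Illustrative 6x4 design, partitioned with r = 2 (an assumption).
X = np.array([
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -0.5, 0.25, 0.5],
    [1.0,  0.0, 0.0,  0.0],
    [1.0,  0.5, 0.25, -0.5],
    [1.0,  1.0, 1.0,  1.0],
    [1.0,  1.5, 2.25, 0.75],
])
r = 2
R, B = canonical_moment(X, r)
X1, X2 = X[:, :r], X[:, r:]

# GVIF as a determinant ratio (Equation 2) ...
gvif_det = (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
            / np.linalg.det(X.T @ X))
# ... equals 1/det(R) ...
gvif_R = 1.0 / np.linalg.det(R)
# ... equals the product over the singular values of B (Appendix 1).
lam = np.linalg.svd(B, compute_uv=False)
gvif_sv = np.prod(1.0 / (1.0 - lam ** 2))
```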

5. Central composite and factorial designs for quadratic models (p = 6)
In this section, we compare the central composite design (CCD) X of Box and Wilson (1951) and the factorial design Z. The design points are shown in Table B1 of Appendix 2. Both designs are 9 × 6 and use the quadratic response model

y = β_0 + β_1 x_1 + β_2 x_2 + β_{11} x_1² + β_{22} x_2² + β_{12} x_1 x_2 + ε.    (4)

The CCD traditionally uses the value a = √2 in four entries, while the factorial design uses the value a = 1. To study the difference in the designs with these different values, we computed the GVIF to compare the "orthogonality" between the lower order terms X_1 of dimension 9 × 3 and the higher order quadratic terms X_2 of dimension 9 × 3. The off-diagonal 3 × 3 array B of R from Equation 3 in Section 3 has only one non-zero row. For the factorial design Z with a = 1, it has the form

B = [ 2/√10  2/√10  0 ; 0  0  0 ; 0  0  0 ],

with single positive singular value λ = 2√2/√10, canonical index τ_Z² = 8/10, and GVIF(Z_2 | Z_1) = 5, while the CCD X with a = √2 has τ_X² = 8/9 and GVIF(X_2 | X_1) = 9. Surprisingly, the classical choice of a = √2 gives the largest value for GVIF(X_2 | X_1), that is the worst value, indicating the greatest collinearity between the lower and higher order terms, as noted in O'Driscoll and Ramirez (in press).
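The two GVIF values can be checked from the determinant ratio of Equation 2. The design-point layout below (2² factorial points, four axial points at distance a, one center run) is our reading of the 9-run designs of Table B1, which is not reproduced here.

```python
import numpy as np

def quad_design_2f(a):
    """9-run two-factor design: 2^2 factorial, four axial points at
    distance a, and a center run, expanded to the quadratic model (4)."""
    pts = [(-1, -1), (1, -1), (-1, 1), (1, 1),
           (-a, 0), (a, 0), (0, -a), (0, a), (0, 0)]
    return np.array([[1, u, v, u*u, v*v, u*v] for (u, v) in pts])

def gvif(X, r):
    """GVIF(X2 | X1) as the determinant ratio of Equation 2."""
    X1, X2 = X[:, :r], X[:, r:]
    return (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
            / np.linalg.det(X.T @ X))

gvif_factorial = gvif(quad_design_2f(1.0), r=3)      # a = 1: GVIF = 5
gvif_ccd = gvif(quad_design_2f(np.sqrt(2.0)), r=3)   # a = sqrt(2): GVIF = 9
```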

6. Larger designs (p = 10)
We consider the quadratic response surface designs for the three-factor model

y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_{11} x_1² + β_{22} x_2² + β_{33} x_3² + β_{12} x_1 x_2 + β_{13} x_1 x_3 + β_{23} x_2 x_3 + ε    (5)

with n responses and with X partitioned into [X_1, X_2] with X_1 the four lower order terms (r = 4) and X_2 the six quadratic terms (s = 6). Four popular designs are given in Appendix 2. They are the hybrid designs (H310 and H311B) of Roquemore (1976), Tables B2 and B3; the Box and Behnken (1960) design (BBD), Table B4; and the CCD of Box and Wilson (1951), Table B5.
For each design, we compute the 10 × 10 canonical moment matrix. It is striking that, for all these designs, the off-diagonal 4 × 6 array B in R has only one non-zero singular value, with its square the canonical index τ_X². It follows that GVIF(X_2 | X_1) = 1/(1 − τ_X²). Table 2 reports that the design H310 is the best conditioned with respect to the GVIF, with the least amount of collinearity between the lower and higher order terms.
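The Appendix 2 tables are not reproduced here, but the rank-one structure can be illustrated on a standard 15-run three-factor CCD (2³ factorial, six axial points at a = √2, one center run), an assumed stand-in for Table B5.

```python
import numpy as np
from itertools import product

def ccd_3f(a=np.sqrt(2.0)):
    """A standard 15-run three-factor CCD (an assumption; the Appendix 2
    tables are not reproduced here): 2^3 factorial, six axial, one center."""
    pts = list(product([-1.0, 1.0], repeat=3))
    for i in range(3):
        for s in (-a, a):
            q = [0.0, 0.0, 0.0]
            q[i] = s
            pts.append(tuple(q))
    pts.append((0.0, 0.0, 0.0))
    return np.array([[1, x, y, z, x*x, y*y, z*z, x*y, x*z, y*z]
                     for (x, y, z) in pts])

def inv_sqrt(A):
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

X = ccd_3f()
X1, X2 = X[:, :4], X[:, 4:]
B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
lam = np.linalg.svd(B, compute_uv=False)     # singular values, descending

tau2 = lam[0] ** 2                           # canonical index
gvif_sv = 1.0 / (1.0 - tau2)                 # valid since rank(B) = 1
gvif_det = (np.linalg.det(X1.T @ X1) * np.linalg.det(X2.T @ X2)
            / np.linalg.det(X.T @ X))
```

The 4 × 6 array B indeed has a single non-zero singular value, so GVIF(X_2 | X_1) = 1/(1 − τ_X²) agrees with the determinant ratio.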

7. More complicated designs with ordered singular values
Let X be the minimal design of Box and Draper (1974) (BDD) with n = 11 from Table B6, and let Z be the small composite design of Hartley (1959) (SCD) with n = 11 from Table B7, both for the quadratic response surface model (r = 4 and s = 6) as in Equation 5. Let λ = {λ_1 ≥ … ≥ λ_r ≥ 0} and μ = {μ_1 ≥ … ≥ μ_r ≥ 0} be the non-negative singular values of the off-diagonal arrays of R_X and R_Z, respectively. As λ_i ≤ μ_i (1 ≤ i ≤ r) (Table 3), it follows that GVIF(X_2 | X_1) = ∏ 1/(1 − λ_i²) ≤ ∏ 1/(1 − μ_i²) = GVIF(Z_2 | Z_1), showing less collinearity between the lower and higher order terms for the BDD design.

8. An improved H310 design
When the diagonal matrix Λ_{r×s} in Equation 6 in Appendix 1 has only one non-zero entry, we have denoted the square of this value the canonical index. We extend this definition to the case when B has multiple positive singular values by using the squared Frobenius norm for a rectangular matrix, τ_X² = Σ_i λ_i². We examine, in detail, the H310 design matrix X_{11×10}, Table B2 in Appendix 2, with our attention on the value of −0.1360 in row 2 for x_3. In succession, we will replace the values {1.1736, 0.6386, −0.9273, 1.0000, 1.2906, −0.1360} by a free parameter and use τ_X² to determine an optimal value. For example, replacing the four entries which are 1.1736 with c_1, we calculate the minimum value for τ_X² = 0.8199 with c_1 = 1.1768, denoted c_min in Table 4. These values are within the four digit accuracy of the data. We performed a similar calculation with c_2 using the four entries which are 0.6386; with c_3 and the four entries which are −0.9273; with c_4 and the eight entries which are 1; and with c_5 and the single entry 1.2906. The original design has τ_X² = 0.8199. The entries in the H310 design are given to four significant digits. With this precision, the original design is nearly optimal with respect to the canonical index τ_X² for the first five entries in Table 4. The sixth entry, c_6 = −0.1360, was not optimal: τ_X² = 0.8181 with c_min = −0.01264, an order of magnitude smaller.
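The H310 table itself sits in Appendix 2, so as an illustration of the one-parameter scan used to locate each c_min, the sketch below frees the axial value a of the two-factor design compared earlier and scans the Frobenius canonical index over a grid. Consistent with that comparison, the scan locates the worst (largest) value of τ² near the classical a = √2.

```python
import numpy as np

def inv_sqrt(A):
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def canonical_index(X, r):
    """Frobenius extension of the canonical index: the sum of the
    squared singular values of the off-diagonal array B."""
    X1, X2 = X[:, :r], X[:, r:]
    B = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
    return np.sum(np.linalg.svd(B, compute_uv=False) ** 2)

def quad_design_2f(a):
    """9-run two-factor quadratic design with free axial value a
    (a stand-in for freeing one entry of the H310 table)."""
    pts = [(-1, -1), (1, -1), (-1, 1), (1, 1),
           (-a, 0), (a, 0), (0, -a), (0, a), (0, 0)]
    return np.array([[1, u, v, u*u, v*v, u*v] for (u, v) in pts])

# Grid scan of the free parameter, as done for c_1, ..., c_6.
grid = np.linspace(0.5, 2.0, 151)
tau2 = np.array([canonical_index(quad_design_2f(a), r=3) for a in grid])
a_worst = grid[np.argmax(tau2)]    # close to sqrt(2), the classical choice
```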
Denote the "improved" H310 design as the H310 design with the value of c 6 = −0.01264. The "improved" H310 also has a unique positive singular value for the off-diagonal of R with its square the canonical index 2 X . All of the standard design criteria favor the "improved" H310 design over the H310 design, which was originally constructed based on the rotatability criterion to maintain equal variances for predicted responses for points that have the same distance from the design center. As usual A(X) = tr((X � X) −1 ), D(X) = det((X � X) −1 ), and E(X) = max{eigenvalues of (X � X) −1 }. The small relative changes Δ in the design criteria are shown in Table 5 in Column 4.
The abnormality of the second row in H310 was noted by Jensen (1998), who showed that the design is least sensitive to the second row of X, the row containing the value c_6 = −0.1360.

9. Conclusions
The VIF measures the penalty for adding a non-orthogonal variable to a linear regression. The VIF can be computed as a ratio of determinants as in Equation 1. A similar ratio criterion was studied by Fox and Monette (1992) to measure the effect of adding a subset of new variables to a design, and they dubbed it the generalized variance inflation factor GVIF, Equation 2. We have noted the relationship between GVIF and the singular values of the off-diagonal array in the canonical moment matrix and have used GVIF to study standard quadratic response designs. The H310 design of Roquemore (1976) was shown not to be optimal with respect to GVIF, and an "improved" H310 design was introduced which was favored over H310 using the standard design criteria A, D, and E.

Appendix 1
We study the eigenvalue structure of M_(r,s).
As with the canonical correlation coefficients, Eaton (1983), write the off-diagonal rectangular array B_{r×s} of R as PΛQ′ with P and Q orthogonal matrices and Λ_{r×s} the rectangular diagonal matrix with the non-negative singular values down the diagonal. For notational convenience, we assume r ≤ s. Set

L = [ P_{r×r}  0_{r×s} ; 0_{s×r}  Q_{s×s} ].

The matrix L is orthogonal and transforms R → L′RL into diagonal blocks:

L′RL = [ I_r  Λ ; Λ′  I_s ],    (6)

where Λ = [SV_{r×r} | 0_{r×(s−r)}] and SV_{r×r} is the diagonal matrix of the non-negative singular values λ_1 ≥ … ≥ λ_r ≥ 0. Since L is orthogonal, this transformation has not changed the eigenvalues. To compute the determinant of R, convert the matrix in Equation 6 into an upper diagonal matrix by Gauss elimination on Λ′. This changes r of the 1's on the diagonal, in rows r + 1 to 2r, into 1 − λ_i², and thus

det(R) = ∏_{i=1}^{r} (1 − λ_i²) and GVIF(X_2 | X_1) = 1/det(R) = ∏_{i=1}^{r} 1/(1 − λ_i²).

The singular values of R_12 = (X_1′X_1)^{−1/2}(X_1′X_2)(X_2′X_2)^{−1/2} are the non-negative square roots of the eigenvalues of R_12 R_12′, denoted by λ_1 ≥ … ≥ λ_r ≥ 0. If the trace of the inverse of the matrix in Equation 6 is required, then we note that

(L′RL)^{−1} = [ (I_r − ΛΛ′)^{−1}   −(I_r − ΛΛ′)^{−1}Λ ; −Λ′(I_r − ΛΛ′)^{−1}   (I_s − Λ′Λ)^{−1} ]

with trace given by tr((L′RL)^{−1}) = Σ_{i=1}^{r} 2/(1 − λ_i²) + (s − r).