Path analysis under multicollinearity in S0 x S0 maize hybrids

The goal of this study was to partition genotypic correlations into direct and indirect effects of yield components on grain weight of S0 x S0 maize hybrids by means of path analysis. Since multicollinearity was detected in the correlation matrix of the independent variables, the suitability of the data for the analysis was evaluated. Alternative methods to least squares were used to avoid the adverse effects of this phenomenon. The experimental design was randomized complete blocks with two replications. The associations among traits obtained from path analysis under multicollinearity, using least squares solutions, showed little coherence. With ridge path analysis or analysis with trait culling, ear number and 50-kernel weight had the highest direct effects on grain weight/plant. These methods were efficient in reducing the effects of multicollinearity.


INTRODUCTION
Correlation studies make it possible to verify whether indirect selection can provide faster genetic gains than direct selection. However, interpreting a simple correlation on its own can mislead the selection strategy, because a high correlation between two traits can be a consequence of other traits (Dewey and Lu, 1959).
Path coefficients are needed to estimate the direct and indirect effects of a group of traits on a basic trait. These coefficients are calculated through regression equations using previously standardized variables. However, coefficient estimates can be adversely affected by multicollinearity between traits. Depending on the degree of multicollinearity, the variances associated with the path coefficient estimates can become very large, making the estimates unreliable (Carvalho, 1995).
To avoid the adverse effects of multicollinearity, the parameters can be estimated using either trait culling or the alternative method to least squares proposed by Carvalho (1995). This method modifies the system of normal equations by adding a constant K to the diagonal of the correlation matrix of the independent variables, similar to the ridge regression proposed by Hoerl and Kennard (1970a).
The objectives of this study were: (i) to partition genotypic correlations into direct and indirect effects of yield components on grain weight of S0 x S0 maize hybrids by means of path analysis; (ii) to evaluate the suitability of the data for this analysis, using alternative methods to least squares to avoid the adverse effects of multicollinearity in the correlation matrix of the independent variables.

MATERIAL AND METHODS
The experiment was carried out in Viçosa, MG, Brazil, with 130 S0 x S0 maize hybrids from Flint x Dent crosses. The Flint and Dent composites were synthesized in the Genetic Institute at ESALQ/USP. They were submitted to two mass selection cycles for prolificacy before the reciprocal recurrent selection program began. The Dent composite was obtained from crosses among white and yellow dent maize, mainly of the Tuxpeño race, representative of Mexican, Central American and South American germplasm. The Flint composite was obtained from crosses among hard white and yellow maize, representative of Central American, Colombian and Brazilian germplasm.
The experimental design was randomized complete blocks with two replications. Each plot consisted of one 10.2 m long row, with rows spaced 1 m apart. Two seeds/hole were sown, with 0.3 m between holes in the row. After 40-45 days, the plots were thinned to one plant/hole, totalling 36 plants/row.
The assessed traits were: a) grain weight/plant (GW, kg/10.2 m²), b) ear number (EN), c) 50-kernel weight (50KW, g), d) number of leaves above the first ear (LA), e) number of leaves below the first ear (LU), f) plant height (PH, m), g) ear height (EH, m), h) root lodging (RL), i) stalk lodging (SL) and j) days to flowering (DF).
Genotypic correlation estimates (r) were obtained according to Mode and Robinson (1959). The variance used to test the significance level of r was calculated as proposed by Vencovsky and Barriga (1992). The correlations were partitioned into direct and indirect effects by means of path analysis (Wright, 1921). In the regression model established, GW was the dependent variable and the other traits were considered independent.
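The partition of a correlation into direct and indirect effects can be illustrated with a minimal NumPy sketch. It assumes the usual standardized form of path analysis, in which the direct effects p solve R_xx p = r_xy and the indirect effect of trait i via trait j is r_ij * p_j. The function name, the three-trait matrix and the correlations with the dependent variable are hypothetical and are not data from this study.

```python
import numpy as np

def path_decomposition(R_xx, r_xy):
    """Split each correlation with the dependent variable into a direct
    effect and indirect effects via the other explanatory traits."""
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    # Direct effects (path coefficients): solve R_xx p = r_xy.
    p = np.linalg.solve(R_xx, r_xy)
    # effects[i, j] = r_ij * p_j is the effect of trait i via trait j.
    # The diagonal holds the direct effects because r_ii = 1.
    effects = R_xx * p[np.newaxis, :]
    return p, effects

# Hypothetical correlations among three explanatory traits.
R_xx = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.5],
                 [0.2, 0.5, 1.0]])
# Hypothetical correlations of each trait with the dependent variable.
r_xy = np.array([0.6, 0.4, 0.3])

p, effects = path_decomposition(R_xx, r_xy)
# Direct plus indirect effects in each row add up to the total correlation.
row_totals = effects.sum(axis=1)
# Coefficient of determination of the standardized regression.
r_squared = float(p @ r_xy)
```

Each row of `effects` reads like one block of a path analysis table: the diagonal entry is the direct effect and the off-diagonal entries are the indirect effects via the other traits.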
The correlation matrix among independent variables had at least one value close to one. This condition is enough to cause problems in the regression analysis due to multicollinearity (Kmenta, 1971). Path coefficients were estimated after discarding the traits that contributed most to the phenomenon. An alternative method to least squares, proposed by Carvalho (1995), was also used. This method adopts the following equation: $(X'X + K I_p)\,b^{*} = X'Y$, where X'X is the correlation matrix among the independent variables of the regression model; K is a small constant added to the diagonal of the X'X matrix; I_p is the identity matrix of order p (p = number of parameters); b* is the ridge path coefficient vector; and X'Y is the correlation matrix between the dependent variable of the regression model and each independent variable. The value of the constant K was determined by examining the ridge trace (Hoerl and Kennard, 1970a,b). The ridge trace is a two-dimensional chart showing how the path coefficient values vary as a function of K in the interval 0 < K < 1. The K chosen was the smallest value capable of stabilizing most of the path coefficients.
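As a rough illustration of the ridge solution and the ridge trace described above, the sketch below adds K to the diagonal of the correlation matrix of the independent variables and solves the modified normal equations over a grid of K values. It is not the GENES implementation; the function names and the three-trait correlation matrix (with one pair of highly correlated traits) are hypothetical.

```python
import numpy as np

def ridge_path(R_xx, r_xy, k):
    """Solve (R_xx + k I) b = r_xy; k = 0 gives the least squares solution."""
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    p = R_xx.shape[0]
    return np.linalg.solve(R_xx + k * np.eye(p), r_xy)

def ridge_trace(R_xx, r_xy, ks):
    """Return one row of path coefficients for each value of k."""
    return np.array([ridge_path(R_xx, r_xy, k) for k in ks])

# Hypothetical matrix: the first two traits are highly correlated (0.94).
R_xx = np.array([[1.00, 0.94, 0.30],
                 [0.94, 1.00, 0.25],
                 [0.30, 0.25, 1.00]])
# Hypothetical correlations with the dependent variable.
r_xy = np.array([0.25, 0.15, 0.60])

ks = np.linspace(0.0, 1.0, 21)       # grid of K values from 0 to 1
trace = ridge_trace(R_xx, r_xy, ks)  # shape (21, 3): rows = K, columns = traits
```

Plotting each column of `trace` against `ks` gives the ridge trace. The choice of K remains a visual judgement, as in the text: the smallest value beyond which most curves stop changing appreciably.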
The degree of multicollinearity of the X'X matrix was established on the basis of its condition number (NC = ratio between the largest and the smallest positive eigenvalues of the matrix) (Montgomery and Peck, 1981). Multicollinearity was considered not to cause serious problems in the analysis when NC < 100, moderate to strong when 100 < NC < 1000, and severe when NC > 1000. Eigenvalue analysis was also used to identify the causes of linear dependency between traits and so determine which traits contributed to multicollinearity. The traits with the highest elements in the eigenvectors associated with the smallest eigenvalues were those that contributed most to multicollinearity (Belsley et al., 1980). The multicollinearity diagnosis, like all the other analyses, was carried out with the GENES software (Cruz, 1997).
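The diagnosis described above can be sketched in a few lines of NumPy, under the assumption that the condition number is taken over the positive eigenvalues only and that the NC classes are those given in the text. This is an illustration, not the GENES routine; the function name, the trait labels and the correlation matrix are hypothetical.

```python
import numpy as np

def multicollinearity_diagnosis(R_xx, names):
    """Condition number, determinant and trait ranking for a correlation matrix."""
    R_xx = np.asarray(R_xx, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R_xx)
    positive = eigvals[eigvals > 0]
    nc = float(positive.max() / positive.min())
    if nc < 100:
        degree = "weak"
    elif nc <= 1000:
        degree = "moderate to strong"
    else:
        degree = "severe"
    # Eigenvector associated with the smallest eigenvalue (first column).
    weakest = eigvecs[:, 0]
    # Rank traits by the absolute size of their element in that eigenvector.
    order = np.argsort(-np.abs(weakest))
    ranked = [(names[i], float(weakest[i])) for i in order]
    return {
        "nc": nc,
        "degree": degree,
        "min_eigenvalue": float(eigvals[0]),
        "determinant": float(np.linalg.det(R_xx)),
        "ranked": ranked,
    }

# Hypothetical matrix: traits T1 and T2 are highly correlated (0.94).
R_xx = np.array([[1.00, 0.94, 0.30],
                 [0.94, 1.00, 0.25],
                 [0.30, 0.25, 1.00]])
result = multicollinearity_diagnosis(R_xx, ["T1", "T2", "T3"])
```

A negative `min_eigenvalue` or a negative `determinant` would flag a matrix that is not non-negative definite, a condition that the condition number alone does not reveal.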

RESULTS AND DISCUSSION
Significant differences (P<0.01) were obtained among the S0 x S0 maize hybrids for all the traits assessed (Table 1). The coefficients of variation ranged from 4.42 to 16.62 for GW, EN, 50KW, PH and EH. These values are considered low or medium according to the classification of Scapim et al. (1995), which indicates good experimental accuracy. The coefficients of variation ranged from 2.12 to 70.42 for the traits LA, LU, RL, SL and DF. Although there is no appropriate classification in the literature for the coefficients of variation associated with these traits in maize, similar values were obtained by Soares (1979) and Ferrão (1985).
Genotypic correlations among traits are shown in Table 2. Ear number (r = 0.67), 50KW (r = 0.30) and LA (r = 0.30) showed the highest correlations with GW. Path analysis was performed to verify whether the magnitudes of these correlations reflected direct cause-and-effect relationships or indirect effects of other traits (Table 3).

2001, Sociedade Brasileira de Melhoramento de Plantas
When the least squares method (K = 0) was used and the analysis was performed with all the traits assessed, a considerable direct effect of RL on GW (0.691) was obtained. This implies that increasing root lodging helped to increase grain weight. This incoherent result was attributed to an adverse effect of multicollinearity.
The NC value (52.77) for the X'X matrix indicates weak multicollinearity among the independent variables. However, when the properties of the genotypic correlation matrix were tested, one negative eigenvalue was found. By definition, a correlation matrix is non-negative definite, so its eigenvalues should be zero or positive (Searle, 1971). Furthermore, the matrix determinant was negative (D = -0.0139). Therefore, in spite of NC < 100, the negative eigenvalue and the negative determinant indicate that problems could arise in the analysis carried out in this study (Hill and Thompson, 1978), probably due to multicollinearity. The high correlation between PH and EH (0.94) points to the association between these traits. When the PH trait was excluded from the X'X matrix, its determinant became positive (D = 0.0148) and only non-negative eigenvalues were found.
Table 1 - Analysis of variance, mean and coefficient of variation of ten traits assessed in an S0 x S0 maize hybrid experiment.
Crop Breeding and Applied Biotechnology, v. 1, n. 3, p. 263-270, 2001
Table 3 - Estimates of direct effects of yield components on grain weight/plant (GW) obtained from the assessment of S0 x S0 maize hybrids.
1 For abbreviations see Table 1. 2 The path coefficients were estimated using the ridge path analysis (K = 0 or K = 0.25) and with trait culling.
The least squares procedure gives unbiased estimators of minimum variance (Searle, 1971). However, since the X'X matrix is adversely affected by multicollinearity, better results can be obtained using either ridge path analysis or trait culling. This occurs because the least squares estimator cannot provide reliable b* (K = 0) estimates: the path coefficient vector is a function of the inverse of that matrix. If there is perfect multicollinearity between some of the independent variables, the matrix is singular, that is, it has no inverse (Carvalho et al., 1999a). An infinite number of vectors can be obtained from generalized inverses, but none of them has practical meaning. In this context, these authors argue that the closer the correlation matrix comes to this condition, the less reliable the path coefficient estimates become, because the variances associated with these coefficients increase.
In ridge path analysis, the variances of the path coefficients are reduced as K increases. However, the estimates become more biased (Carvalho, 1995). As a consequence, estimators with desirable properties cannot be obtained when K is too high. The chosen K value must be one that reduces the estimator variance while introducing only a small bias. In that case, the mean square error (variance + squared bias) will be smaller than that of the least squares solution (Hoerl and Kennard, 1970b). Examination of the ridge trace showed that the b* estimates were similar for K values close to zero. However, this vector only stabilized at K = 0.25.
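The trade-off between variance and bias can be written out explicitly. The expression below is the standard mean square error decomposition for the ridge estimator given by Hoerl and Kennard, restated here in the notation of this paper. The symbols λ_i (eigenvalues of X'X), σ² (residual variance) and b (the true coefficient vector) do not appear in the original text and are introduced only for this illustration.

```latex
\mathrm{MSE}(b^{*})
  = \underbrace{\sigma^{2}\sum_{i=1}^{p}\frac{\lambda_i}{(\lambda_i + K)^{2}}}_{\text{total variance}}
  + \underbrace{K^{2}\, b'\,(X'X + K I_p)^{-2}\, b}_{\text{squared bias}}
```

At K = 0 the bias term vanishes and the variance term reduces to σ² Σ 1/λ_i, which becomes very large when the smallest eigenvalue is close to zero. A small positive K shrinks that term sharply, while the bias term grows from zero as K increases.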
Since there is no statistical test to verify whether the mean square error of b* (K > 0) is lower than that obtained by the least squares method, it is difficult to decide whether the ridge path analysis estimates are better. However, the values were more coherent when K = 0.25 (Table 3). In this case, the direct effect of RL on GW was relatively low (-0.178). Positive direct effects for PH (0.457) and for 50KW (0.504), and direct effects close to zero for LA (0.018) and DF (0.080), were also found. These results were not obtained with K = 0.
In addition to providing the K value, the ridge trace detects multicollinearity and the variables that most affect it (Draper and Smith, 1981). In Figure 1, the variation in the parameter estimates over the interval 0 < K < 1 indicated multicollinearity. The more the coefficients oscillated, the more the associated variables contributed to the phenomenon. For clarity, the figure shows the ridge trace for the five traits that contributed most to multicollinearity.
The contribution of each trait to multicollinearity was confirmed by analysis of the eigenvector elements associated with the lowest eigenvalues. Plant height showed the highest eigenvector element associated with the lowest eigenvalue of the X'X matrix and, therefore, contributed most to the multicollinearity effects. In general, excluding this trait from the analysis yielded path coefficients similar to those of the ridge path analysis (Table 3). The coefficient of multiple determination (R² = 0.67) for the resulting regression equation was similar to that of the least squares analysis considering all traits assessed (R² = 0.62), and slightly higher than that of the ridge analysis (R² = 0.56). Similar results in path analysis using these alternative methods were obtained with sweet pepper (Carvalho et al., 1999b). In spite of the simplicity of culling traits, Johnston (1972) and Heady and Dillon (1969) describe limitations to the culling of independent variables in regression analyses.
Despite the discrepancies among the results, the three methods indicated a positive direct effect (0.613 to 0.825) of EN on GW. In this case, the multivariate approach based on path analysis confirmed the result obtained in the simple correlation study. However, this was not the case for LA, for example. The indirect effects of the traits are not shown in Table 3; in general, they were negligible. Low indirect effects for most traits were also found by Singh et al. (1995) in the genetic analysis of maize. Considering the results of the ridge path analysis and the analysis with trait culling, EN and 50KW had the highest direct effects on GW. These traits have been shown to influence grain yield in some experiments (Ottaviano and Camussi, 1981; Agrama, 1996; Djordjevic and Ivanovic, 1996; Arias et al., 1999). Although PH showed a positive direct effect on GW, of similar magnitude to that obtained for 50KW, the feasibility of using this result (an increase in PH leading to an increase in GW) in the selection process is small. The reason is that the maize breeder seeks an increase in yield without a great increase in plant height, which would imply more lodging and breaking. Furthermore, the relationship of PH with GW depends on EH (indirect effect = -0.28), and the inclusion of these traits in a multivariate analysis may result in adverse effects caused by multicollinearity in the matrix. The possibility of using this matrix in genetic breeding is discussed by Carvalho (1995) and Carvalho et al. (1999a).
Positive and significant direct effects of PH on GW were also estimated in narrow-base maize (Djordjevic and Ivanovic, 1996).

Figure 1 - Path coefficient estimates (b*) as a function of K values.