Combined Multistage Linear Genomic Selection Indices To Predict the Net Genetic Merit in Plant Breeding

A combined multistage linear genomic selection index (CMLGSI) is a linear combination of phenotypic and genomic estimated breeding values useful for predicting the individual net genetic merit, which in turn is a linear combination of the true unobservable breeding values of the traits weighted by their respective economic values. The CMLGSI is a cost-saving strategy for improving multiple traits because the breeder does not need to measure all traits at each stage. The optimum (OCMLGSI) and decorrelated (DCMLGSI) indices are the main CMLGSIs. Whereas the OCMLGSI takes into consideration the index correlation values among stages, the DCMLGSI imposes the restriction that the index correlation values among stages be zero. Using real and simulated datasets, we compared the efficiency of both indices in a two-stage context. The criteria we applied to compare the efficiency of both indices were that the total selection response of each index must be lower than or equal to the single-stage combined linear genomic selection index (CLGSI) response and that the correlation of each index with the net genetic merit should be maximum. Using four different total proportions for the real dataset, the estimated total OCMLGSI and DCMLGSI responses explained 97.5% and 90%, respectively, of the estimated single-stage CLGSI selection response. In addition, at stage two, the estimated correlations of the OCMLGSI and the DCMLGSI with the net genetic merit were 0.84 and 0.63, respectively. We found similar results for the simulated datasets. Thus, we recommend using the OCMLGSI when performing multistage selection.

composed entirely of the additive effects of genes, and that the LPSI and the net genetic merit have bivariate normal distribution. When the phenotypic and genotypic covariance matrices of the individual traits under selection are known, the LPSI is the best linear predictor of the net genetic merit and is called optimum LPSI. The major advantages of the LPSI are that it assigns higher weights to traits whose differences are genetic and that it is relatively simple to analyze. Its disadvantages are that it requires large amounts of information, that the economic weights are difficult to assign and that the sampling error could be large. All linear selection indices associated with the LPSI theory have the same advantages and disadvantages (Cerón-Rojas and Crossa 2018, Chapters 2 to 6). Cochran (1951) was the first to propose a two-stage LPSI, and Young (1964) generalized the Cochran (1951) approach to the multistage LPSI context. Young (1964) combined the independent culling selection method (Hazel and Lush 1942) and the LPSI theory to develop the optimum multistage LPSI (OMLPSI) for selecting several traits. One stage is the individual age (the length of time that a plant or animal has lived) at which the breeder measures and selects the individual based on the traits of interest into one specific selection cycle. Thus, suppose two vectors of individual traits, x and y, become evident at different animal or plant ages. In the single-stage context, both vectors of information are selected jointly using LPSI, whereas in the two-stage selection context, first x is selected at stage 1, and, at stage 2, we select x and y using OMLPSI. Young (1964) called the OMLPSI the "part and whole index selection method", whereas Xu and Muir (1992) called it "selection index updating" because as traits become available, each subsequent index contains all traits available up to that stage.
Breeders apply the OMLPSI mainly in animal and tree breeding, where the target traits become evident at different individual ages. The OMLPSI is a cost-saving strategy for improving multiple traits because the breeder does not need to measure all traits at each stage. That is, the main advantage of the OMLPSI over the single-stage LPSI is that the breeder does not need to carry a large population of individuals throughout the multi-trait selection process. If the breeder selects with the LPSI, the same individuals for each trait of interest must be measured, and then the number of traits measured per mature individual is the same as for an immature individual. When traits have a developmental sequence in ontogeny, or when there are large differences in the costs of measuring several traits, the efficiency of OMLPSI over LPSI, in terms of cost saving, can be substantial Muir 1991, 1992;Xu et al. 1995;Xie et al. 1997).
Some problems associated with the OMLPSI are as follow. First, after the first selection stage, the OMLPSI values could be nonnormally distributed. Second, for more than two stages, the OMLPSI requires multiple integration techniques to derive selection intensities. Third, there are problems of convergence when the traits and the index values of successive stages are highly correlated, and finally, the computational time could be unacceptable if the number of selection stages becomes too high (Börner and Reinsch 2012). For these reasons, Xu and Muir (1992) developed selection index updating or the decorrelated multistage linear phenotypic selection index (DMLPSI).
In a similar manner as Cochran (1951) and Young (1964) developed the OMLPSI, Xu and Muir (1992) developed the DMLPSI combining the independent culling selection method and the LPSI theory for selecting several traits in the multistage context. However, while the OMLPSI theory takes into account the correlation among the OMLPSI values at different stages when predicting the net genetic merit, the DMLPSI theory imposes the restriction that the correlation between the DMLPSI values at different stages be zero; hence the name decorrelated multistage index (Börner and Reinsch 2012). Under this restriction, exact truncation points and selection intensities can be determined for a fixed selection proportion before selection is carried out, and the selected individual index values after the first selection stage could be normally distributed; in addition, it is not necessary to use multiple integration techniques to derive selection intensities. Xu and Muir (1992) derived a set of nonlinear equations in the DMLPSI context to obtain truncation points and selection intensities, and indicated that these equations may be solved iteratively using a multidimensional Newton method. Xu et al. (1995), however, found that the Newton method is sensitive to the initial values and frequently converges with solutions at a local maximum. Another problem associated with DMLPSI is that its selection responses and correlation with the net genetic merit are lower than the OMLPSI selection response and accuracy after the first selection stage (Börner and Reinsch 2012;Cerón-Rojas et al. 2019a, b).
In the marker-assisted selection (MAS) context, Lande and Thompson (1990) proposed a linear marker selection index (LMSI) which uses phenotypic and marker score values jointly to predict the net genetic merit. This index exploits the linkage disequilibrium between markers and quantitative trait loci (QTL) that occurs when inbred lines are crossed (Lange and Whittaker 2001). The LMSI requires regressing phenotypic values on marker coded values and, with these estimates, it constructs the marker score for each individual trait of the candidate for selection. Subsequently, the marker scores are combined with phenotypic information using the LMSI to predict the net genetic merit. Lande and Thompson (1990) assumed that the average effects on phenotype and the favorable alleles are known; however, this assumption is valid for major gene traits but not for quantitative traits that are affected by the environment, as well as many QTL with small effects that could interact with the environment and among themselves (Heffner et al. 2009). Several authors (Lange and Whittaker 2001;Meuwissen et al. 2001;Dekkers 2007) have criticized the LMSI approach because it makes inefficient use of the available data, as one would rather use all the available data in a single step to achieve maximally accurate estimates of marker effects. In addition, because the LMSI is based on only a few large QTL effects, it violates the selection index assumptions of multivariate normality and small changes in allele frequencies (Heffner et al. 2009). Dekkers (2007) proposed a slightly modified version of the Lande and Thompson (1990) index. Instead of using marker scores, the Dekkers (2007) index uses the genomic estimated breeding values (GEBV) jointly with the phenotypic values to predict the net genetic merit. Céron-Rojas and Crossa (2018, Chapter 5) called this index the combined linear genomic selection index (CLGSI), and because it uses GEBV instead of marker scores, it is free of the problems (indicated earlier) that the LMSI presents. In the CLGSI context, all marker effects and GEBV of the genotyped individuals in the training population are estimated using marker and phenotypic data, and then the GEBVs are combined with the phenotypic values in a CLGSI to predict the net genetic merit and select parents for the next generation. Xie and Xu (1998) extended the DMLPSI to the MAS context for developing a decorrelated multistage LMSI similar to the index of Lande and Thompson (1990). That is, the Xie and Xu (1998) index is an LMSI used to predict the net genetic merit in the multistage selection context. The main objective of Xie and Xu (1998) was to increase the efficiency of MAS in a two-stage breeding selection scheme. For this reason, they decided to select immature individuals (embryos) or seedlings at stage one based on a linear combination of trait molecular scores only, and, at stage two, to select mature individual traits based on a linear combination of trait molecular scores and phenotypic values jointly. According to Xie and Xu (1998), this selection method was implied in the paper by Lande and Thompson (1990) and the problem of these last two authors was how to find the selection intensities associated with a two-stage breeding scheme. For this reason, Xie and Xu (1998) adapted the DMLPSI theory to the MAS breeding context. This approach, however, has the same problems as those associated with the LMSI, which we indicated earlier.
In this work, we adapted the Dekkers (2007) index (which is an optimum index) to the multistage selection context. This index uses GEBV instead of marker scores; thus, it is free of the problems associated with the Xie and Xu (1998) index. We applied the proposed index in the two-stage context as follows. In stage one, we selected immature seedlings and embryos based on a linear combination of GEBV only, and, in stage two, we selected individual traits based on a linear combination of GEBV and phenotypic values jointly. This is the Xie and Xu (1998) idea but in the genomic selection context.
We validated the results of the proposed index using the optimum and decorrelated selection index theory in a two-stage breeding selection scheme (this approach can be extended to any number of stages). The optimum index was named optimum combined multistage linear genomic selection index (OCMLGSI), while the decorrelated index was called decorrelated combined multistage linear genomic selection index (DCMLGSI) because, at stage two, both indices use GEBV and phenotypic information jointly to predict the net genetic merit. While the OCMLGSI was based on the Dekkers (2007) and Young (1964) index theory, the DCMLGSI was based on the Xie and Xu (1998) and Xu and Muir (1992) index theory. We obtained the theoretical results of both indices under the assumption that the indices and the net genetic merit values have bivariate normal distribution. Under this assumption, the regression of the net genetic merit on any linear function of the phenotypic or GEBV values is linear (Kempthorne and Nordskog 1959) and the total selection response for two or more stages is the sum of each response obtained at each stage (Cochran 1951;Young 1964;Cerón-Rojas et al. 2019a).
We compared the relative efficiency of OCMLGSI and DCMLGSI using real and simulated datasets. The criteria used to compare the relative efficiency of both indices were that the total selection response of each index must be lower than, or equal to, the single-stage CLGSI (Dekkers 2007) selection response (Young 1964;Saxton 1983) and that the correlation of each index with the net genetic merit should be the maximum possible. The results of this study are the first ones comparing (with real and simulated data) the relative efficiency of the OCMLGSI with DCMLGSI efficiency using the total selection response and the maximized correlation with the net genetic merit as the main criteria to compare the efficiency of both indices.

MATERIAL AND METHODS
We completed this section with three supplementary materials (Supplementary material 1, 2 and 3) that are located at http:// hdl.handle.net/11529/10548356.

Objectives of the combined multistage linear selection indices
Two objectives of the OCMLGSI and DCMLGSI are to maximize the selection response and predict the net genetic merit (H ¼ w'g, where w' ¼ ½ w 1 w 2 . . . w t and g ' ¼ ½ g 1 g 2 . . . g t are vectors of economic weights and true unobservable breeding values, respectively, and t ¼ number of traits). Additional OCMLGSI and DCMLGSI objectives are to select individuals with the highest H values as parents of the next generation, and provide the breeder with a rule for evaluating and selecting several traits simultaneously.
The OCMLGSI and DCMLGSI at stage one At stage 1, the OCMLGSI and DCMLGSI use only GEBV to predict the net genetic merit and select individual traits. The index to predict the individual net genetic merit at stage 1 is where t ¼ number of traits, w' ¼ ½ w 1 w 2 . . . w t and z9 ¼ ½ z 1 z 2 ⋯ z t (Appendix 1, Equation A1 for details) are vectors of economic weights and genomic breeding values, respectively. At stage 1, Equation (1) is the same for both indices.
The OCMLGSI and DCMLGSI parameters at stage one The maximized OCMLGSI and DCMLGSI selection responses are (2) respectively, where k O1 and k D1 (the selection intensities of each index) are the only difference between R O1 and R D1 . In Equation (2), G ¼ VarðzÞ is the covariance matrix of genomic breeding values (Appendix 1, Equation A2 for details), whereas w was defined earlier. The maximized correlation between H ¼ w9g and is the standard deviation of the variance of H ¼ w9g, and C ¼ VarðgÞ is the covariance matrix of g.
The combined multistage linear selection index and the net genetic merit at stage two At stage 2, the OCMLGSI and DCMLGSI use genomic and phenotypic information jointly to predict the individual net genetic merit and can be written as where y9 ¼ ½ y 1 y 2 ::: y t is a vector of trait (t ¼ number of traits) phenotypic values, and z was defined earlier; b G and by are vectors of coefficients of genomic and phenotypic weight values, respectively; b9 ¼ ½ b9 G b9 y and t9 ¼ ½ z9 y9 . The only difference between OCMLGSI and DCMLGSI is the way vector b9 ¼ ½ b9 G b9 y is obtained. The net genetic merit (H ¼ w9g) can be written as where g, w and z have been defined above; w9 0 ¼ ½ 0 1 ⋯ 0 t is a null vector associated with vector z, whereas a9 ¼ ½ w9 0 w9 and f9 ¼ ½ z9 g9 .
are block covariance matrices associated with OCMLGSI and DCMLGSI at stage 2. Equation (5) indicates that the covariance matrix between g and z is G ¼ Covðz; gÞ(Appendix 1, Equation 2). In Appendix 1 (Equations A6 to A8) we describe how we estimated matrices G, P, C.
The OCMLGSI parameters at stage two In Supplementary material 1 (see http://hdl.handle.net/11529/10548356), we extend the OCMLGSI theory to the multistage context. Here we present only the main results for the two stages. Since the OCMLGSI theory is based on LPSI theory, the OCMLGSI vector of coefficients (b) that maximizes the OCMLGSI response (and the correlation between the OCMLGSI and the net genetic merit values) is where b G ¼ ½I 2 ðP2GÞ 21 ðC 2 GÞw and by ¼ ðP2GÞ 21 ðC 2 GÞw. By Equation (6), the maximized OCMLGSI selection response is where k O2 is the OCMLGSI intensity at stage 2. The total selection response for stages 1 and 2 is R Ot ¼ R O1 þ R O2 . The maximized correlation between H (Equation 4) and the OCMLGSI is In Appendix 1 (Equation A9), we indicated how to estimate R O2 . Additional details of the parameters associated with Equations (7) and (8)  The DCMLGSI parameters at stage 2 In Supplementary material 2 (see http://hdl.handle.net/11529/10548356), we extended the DCMLGSI theory to the multistage context, and we showed that the DCMLGSI vector of coefficients at stage 2 is  (6) and (9). At stage 1, S ¼ 0 (no constraints), K ¼ I and b ¼ b, the OCMLGSI vector of coefficients. The maximized DCMLGSI selection response for stage two is where k D2 is the DCMLGSI selection intensity at stage 2. The total selection response for stages 1 and 2 is R Dt ¼ R D1 þ R D2 . In Appendix 1 (Equation A10), we indicate how to estimate R D2 . The maximized correlation between H (Equation 4) and DCMLGSI for stage 2 is Note that the only difference between Equations (10) and (7), and Equations (11) and (8) is the vector of coefficients of each index.

The OCMLGSI and DCMLGSI selection intensities for stages 1 and 2
The OCMLGSI selection intensities for stages 1 and 2 (k O1 and k O2 , respectively) are those values associated with the maximum value of R Ot ¼ R O1 þ R O2 , which were obtained with the method described in Appendix 2 (Equations A11 and A12). We obtained the DCMLGSI selection intensities for stages 1 and 2 (k D1 and k D2 , respectively) using the Xu and Muir (1992) method. The value of k D1 and k D2 should maximize the total DCMLGSI selection response In the OCMLGSI and DCMLGSI context, we fitted, in a statistical model, phenotypic and marker data from the training population to estimate all available marker effects and obtain the GEBV (Appendix 1, Equations A3 and A4 for additional details).
Testing the OCMLGSI and DCMLGSI normality assumption Several authors (Shapiro and Wilk 1965; Mohd-Razali and Bee-Wah 2011; Rani Das and Rahmatullah-Imon 2016) have given details of how to perform a normality test procedure on a dataset, and many statistical packages provide graphs and normality tests (Crawley 2015).
For the real dataset, we corroborated the OCMLGSI and DCMLGSI normality assumption at stage 2 using graphical methods (histograms and normal Quantile-Quantile plots) and analytical test procedures (the Shapiro-Wilk and Kolmogorov-Smirnov normality tests). The corroboration procedure was as follows. In a two-stage context, let p ¼ q 1 q 2 be the fixed total proportion retained, where q 1 and q 2 denote the proportion selected at stages 1 and 2, respectively, and let n be the size of the dataset at stage 1; then nq 1 will be the size of the selected individuals at stage 1. We used the nq 1 individual genotypes and traits at stage 2 to construct graphs and statistical tests to corroborate the OCMLGSI and DCMLGSI normality assumption.
Criteria for comparing OCMLGSI efficiency vs.

DCMLGSI efficiency
The criteria to compare OCMLGSI efficiency vs. DCMLGSI efficiency were that the total OCMLGSI and DCMLGSI selection responses , respectively) should be lower than, or equal to, the single-stage CLGSI selection response (R) described by Dekkers (2007) and Céron-Rojas and Crossa (2018, Chapter 5). In addition, the maximized correlation between the net genetic merit and the OCMLGSI and DCMLGSI (Equations 9 and 11, respectively) should be a maximum at each stage. Thus, the greater the OCMLGSI and DCMLGSI correlation with the net genetic merit, the more effective they are at predicting the net genetic merit and the selection response at each stage.

Real data
We used a real maize (Zea mays L.) F2 population with 247 genotypes, 195 markers and 4 phenotypic traits: grain yield (GY, t/ha), plant height (PHT, cm), ear height (EHT, cm), and anthesis days (AD, d) to compare OCMLGSI efficiency vs. DCMLGSI efficiency to predict the net genetic merit. Beyene et al. (2015) described this dataset and denoted it as JMpop1 DTMA Mexico optimum environment. We assumed that the breeding objective was to increase GY while decreasing PHT, EHT, and AD. The vector of economic weights for those traits was w ' ¼ ½ 5 20:1 20:1 21 , while the total proportions (p ¼ q 1 q 2 ) of retained values were p ¼ 0.05, 0.10, 0.20 and 0.30 for both indices.
The estimated matrices of P, C, and G for all four traits werê P ¼ we estimated the OCMLGSI and DCMLGSI selection responses (Appendix 1, Equations A9 and A10, respectively).

Simulated datasets
The datasets were simulated with QU-GENE software (Podlich and Cooper 1998) by Ceron-Rojas et al. (2015) using 2500 molecular markers and 315 QTL for eight phenotypic selection cycles (C0 to C7), each with four traits (T 1 , T 2 , T 3 , and T 4 ), 500 genotypes and 4 replicates of each genotype. The authors distributed the markers uniformly across 10 chromosomes and the QTL randomly across the 10 chromosomes to simulate maize (Zea mays L.) populations. A different number of QTL affected each of the four traits: 300, 100, 60, and 40, respectively. The common QTL affecting the traits generated genotypic correlations of -0.5, 0.4, 0.3, -0.3, -0.2, and 0.1 between T 1 and T 2 , T 1 and T 3 , T 1 and T 4 , T 2 and T 3 , T 2 and T 4 , T 3 and T 4 , respectively. The economic weights for T 1 , T 2 , T 3 , and T 4 were 1, -1, 1, and 1, respectively, in all selection cycles. To illustrate the efficiency of the OCMLGSI and DCMLGSI to predict the net genetic merit, in this work we used six selection cycles (C1 to C6) with p ¼0.05, 0.10 and 0.20 in each cycle. We selected all four traits in each selection cycle.

Data availability
The real and simulated datasets are available in the Application of a Genomics Selection Index to Real and Simulated Data repository, at http://hdl.handle.net/11529/10199. The real dataset used in this work is the folder named "File Real_Data_Sets_GSI" which contains four folders named "DATA_SET-3, 4, 5, and 6". Each of the four folders contains four Excel data files. The four Excel data files within the folder DATA_SET-3 are as follows: DATA_SET-3_Markers_Cycle-0, 1, 2, and DATA_SET-3_Phenotypic_Cycle-0. The first three Excel files contain the marker-coded values for cycles 0, 1, and 2, while the Excel file DATA_SET-3_Phenotypic_Cycle-0 contains the phenotypic information of cycle 0 (training population). The Excel data files of the other folders were described in a similar manner as for folder 3. In this work, we used dataset 3 for cycle 0 to make selections and to estimate the selection response and the correlation of the OCMLGSI and DCMLGSI with the net genetic merit. The results are presented in Table 1. Folder Simulated_Data_GSI contains two folders: Data_Phenotypes_ April-26-15 and Haplotypes_GSI_April-26-15. In turn, folder Data_ Phenotypes_April-26-15 also contains two folders: GSI_Phenotypes-05 and PSI_Phenotypes-05. Within folder GSI_Phenotypes-05, there are six Excel data files, each denoted as C2_GSI_05_Pheno, C3_GSI_05_ Pheno, C4_GSI_05_Pheno, C5_GSI_05_Pheno and C6_GSI_05_Pheno, corresponding to the phenotypic simulated information for the genomic selection index for cycles 2-7. In addition, folder GSI_Phenotypes-05 contains eight Excel datasets denoted as C0_Pheno_05, C1_PSI_05_ Pheno, C2_PSI_05_Pheno, C3_PSI_05_Pheno, C4_PSI_05_Pheno, C5_PSI_05_Pheno, C6_PSI_05_Pheno, and C7_PSI_05_Pheno corresponding to the phenotypic simulated information for the phenotypic selection index for cycles 0-7. File Haplotypes_GSI_April-26-15 contains the haplotypes of the markers for cycles 0-7 of GSI. We present the results of the simulated datasets in Tables 2 and 3 for cycles 1 to 6.

Matching phenotypic and genomic real data
To estimate the OCMLGSI and DCMLGSI parameters and make selections, we use the following two Excel files: (1) "DATA_SET-3_ Phenotypic_Cycle-0" (which contains the raw phenotypic data) and (2) the "DATA_SET-3_Markers_Cycle-0" (which contains the coded molecular markers). Both datasets are in the folder named "DATA_SET-3".
Matching phenotypic and genomic simulated data To estimate the OCMLGSI and DCMLGSI parameters and to make selections, we used the data of two folders. The first one is called "PSI_Phenotypes-05" (which contains the raw phenotypic data of six Excel files named: C1_PSI_05_Pheno to C6_PSI_05_Pheno) and a second one named "Haplotypes_GSI_April-26-15"(which contains the raw marker data of six text files named: C1_PSI_S2_05_Haplo.pop to C6_PSI_S2_05_Haplo.pop). Both datasets are in the folder named "Simulated_Data_GSI". To estimate the OCMLGSI and DCMLGSI parameters, the foregoing files were matched as follows. For selection cycle 1, we matched the Excel file C1_PSI_05_Pheno with the text files C1_PSI_S2_05_Haplo.pop; for selection cycle 2, we matched the Excel file C2_PSI_05_Pheno with the text files C2_PSI_S2_05_Haplo.pop, etc. Finally, in cycle 6, we matched the Excel file C6_PSI_05_Pheno with the text files C6_PSI_S2_05_Haplo.pop.

Real data
Estimated OCMLGSI parameters for stages 1 and 2: The estimated OCMLGSI values at stages 1 and 2 wereÎ 1 ¼ w9ẑ andÎ 2 ¼b9t, respectively, where w9 andb9 were the estimated vector or coefficient (Appendix 1, Equation A9),ẑ (Appendix 1, Equations A3 and A4) was a vector of GEBV associated with the vector of traits y, and t9 ¼ ½ẑ9 y9 . The maximum estimated total OCMLGSI selection response wasR Ãb q were the estimated selection responses at each stage, and matrixT Ã was the adjusted matrixT for prior selection onÎ 1 (Appendix 3, Equation A13). Figure 1 shows the theoretical relationship between one truncation point (u 1 ) value, the proportion retained (q 1 ), and the density values (zðu 1 Þ ¼ e 20:5u 2 1 = ffiffiffiffiffiffi 2p p ) of the truncation point at stage 1, while Figure 2 describes the theoretical relationship between two truncation point (u 1 and u 2 ) values and their density values [zðu 1 ; u 2 Þ, Appendix 2, Equation A11] for two stages. In Appendix 2 (Equation A12), we described a method to obtain the OCMLGSI selection intensities (k O1 and k O2 ) that maximizeR Ot for both stages. We found the k O1 and k O2 values as follows: for a fixed value p ¼ q 1 q 2 , we used an iterative process with an R-code (Kabakoff 2011) where, by successively changing the possible values of q 1 (q 2 ¼ p=q 1 ), u 1 , and u 2 , we found the maximum value of the estimated total OCMLGSI selection responseR Ot (Figure 3). Thus, for p ¼ 0:05, the values of the truncation points (u 1 ¼ 0:61 and u 2 ¼ 0:90), proportions retained (q 1 ¼ 0:27 and q 2 ¼ 0:18) and selection intensities (k O1 ¼ 1:22 and k O2 ¼ 1:44) at both stages, were those associated with the maximumR Ot ¼ 8:18 value (Figure 3).
In the OCMLGSI and DCMLGSI context, p ¼ q 1 q 2 , q 2 ¼ p q1 , that is, the only parameter that changes is q 1 because p is fixed. The same is true for the truncation points (u 1 and u 2 ) because u 1 ¼ F 21 ð1 2 q 1 Þ and u 2 ¼ F 21 ð1 2 q 2 Þ, where F 21 ðÞis the inverse function of the standard normal distribution Muir 1991, 1992). Thus, in this context, k O1 , k O2 , andR Ot values are mainly associated with the possible values of q 1 . Figure 3 presents theR Ot values on the y-axis, whereas the x-axis presents the possible values of the combinations of k O1 and k O2 for all possible realizations ofR Ot , and for this reason, the x-axis takes values n■ Table 1 Real data for different total proportions (p ¼ q 1 q 2 ) retained; estimated optimum and decorrelated combined multistage linear genomic selection index truncation points (u 1 and u 2 ), proportions retained (q 1 and q 2 ), selection intensities (k 1 and k 2 ) and maximized estimated selection responses (R 1 , R 2 and R t ¼ R 1 þ R 2 ) for stages 1 and 2. Values of R correspond to maximized estimated single-stage combined linear genomic selection index responses. n■ Table 2 Simulated data for estimated optimum and decorrelated combined multistage linear genomic selection indices responses (R 1 , R 2 , R t ¼ R 1 þ R 2 ) and single-stage combined linear genomic selection index responses (R 0:05 , R 0:10 , R 0:20 ) for six simulated selection cycles in a twostage breeding scheme for total proportions retained p ¼ q 1 q 2 ¼0.05, 0.10, and 0.20.
The estimated DCMLGSI correlations with the net genetic merit at stages 1 and 2 werer 1 ¼ ffiffiffiffiffiffiffiffi ffi Truncation points, proportion retained and selection intensities Table 1 presents the OCMLGSI and DCMLGSI truncation points, proportions retained and selection intensities for p ¼ q 1 q 2 ¼0.05, 0.10, 0.20, and 0.30 in a two-stage context. When the p ¼ q 1 q 2 values changed from 0.05 to 0.30, the truncation point values decreased, the proportions retained values increased and the selection intensity values decreased in both indices, as we would expect. In addition, while the DCMLGSI truncation points and proportions retained values were the same at both stages, the OCMLGSI truncation point values at stage 1 were lower than at stage 2, and then q 1 . q 2 . For this reason, the OCMLGSI selection intensity was different from the DCMLGSI selection intensity.

Simulated data
Estimated maximized OCMLGSI and DCMLGSI selection responses: For p ¼ q 1 q 2 ¼0.05, 0.10, and 0.20, Table 2 presents the estimated maximized OCMLGSI and DCMLGSI selection responses (R 1 ,R 2 ,R t ¼R 1 þR 2 ) and the estimated maximized singlestage CLGSI responses (R 0:01 ,R 0:10 ,R 0:20 ) for six simulated selection cycles in a two-stage breeding selection scheme. For p ¼0.05, the average of the estimated total OCMLGSI selection responses (17.47) explained 98.10% of the average of the estimated CLGSI selection responses (17.81), whereas for p ¼ 0.10 and 0.20, the average of the estimated total OCMLGSI selection responses (14.89 and 11.93, n■ Table 3 Simulated data for estimated maximum correlation values of optimum (r 1 and r 2 ) and decorrelated (r 1 and r 2 ) combined multistage linear genomic selection indices with the net genetic merit under a two-stage (each stage denoted by 1 and 2) breeding scheme for six simulated cycles.

Optimum index Decorrelated index
Cycler 1r2r  The foregoing results indicate that while the average of the total OCMLGSI selection responses (for all p values) explained 98.60% of the average of the CLGSI, the average of the total DCMLGSI selection responses (for all pvalues) explained 91.80% of the average of the CLGSI. Thus, OCMLGSI accuracy was higher than DCMLGSI accuracy for predicting the selection response.
Estimated OCMLGSI and DCMLGSI correlations with the net genetic merit: Table 3 presents the estimated and maximized values of the OCMLGSI (r 1 andr 2 ) and DCMLGSI (r 1 andr 2 ) correlations with the net genetic merit in a two-stage context for six simulated selection cycles. At stage 1, the averages of the estimated OCMLGSI and DCMLGSI correlations were the same. However, at stage 2, the average of the estimated OCMLGSI correlations with net genetic merit was 40.0% higher than the average of the estimated DCMLGSI correlations for six simulated selection cycles.
Histograms and quantile-quantile plots for the estimated OCMLGSI and DCMLGSI values at stage two: We used the real dataset selected in cycle 1 to test the normality assumption of the estimated OCMLGSI and DCMLGSI values at stage 2. For p ¼ q 1 q 2 ¼ 0.05 and 0.30, the q 1 values for OCMLGSI were 0.27 and 0.63, while those values for DCMLGSI were 0.22 and 0.55, respectively. Then, at stage 2, ð0:27Þð247Þ ¼ 67 and ð0:63Þð247Þ ¼ 156 were the number of genotypes for OCMLGSI, whereas for DCMLGSI, the number of genotypes were ð0:22Þð247Þ ¼ 54 and ð0:55Þð247Þ ¼ 136, where 247 was the number of genotypes at stage 1. With these genotypes, we constructed histograms ( Figure 4)    Normality test for the estimated OCMLGSI and DCMLGSI values at stage two: For the real dataset, we present the results of the Shapiro-Wilk and Kolmogorov-Smirnov normality tests of the estimated OCMLGSI and DCMLGSI values at stage 2 when the number of genotypes was 67 and 156 for OCMLGSI, and 54 and 136 for DCMLGSI. We tested the null hypothesis that the estimated OCMLGSI and DCMLGSI values have normal distribution. The statistical value of the Shapiro-Wilk test should be close to 1.0 to accept the null hypothesis, while the statistical value of the Kolmogorov-Smirnov test should be close to 0.0 to accept the null hypothesis (Rani Das and Rahmatullah-Imon 2016). In the present case, for the OCMLGSI values (67 and 156), the Shapiro-Wilk test values were 0.976 and 0.987, respectively, while the Kolmogorov-Smirnov test values were 0.088 and 0.089, respectively. Thus, the null hypothesis was true for the estimated OCMLGSI values. Similarly, for the DCMLGSI values (54 and 136), the Shapiro-Wilk test values were 0.988 and 0.984, respectively, while the Kolmogorov-Smirnov test values were 0.067 and 0.058, respectively. Thus, we accepted that the estimated DCMLGSI values approach the normal distribution.

DISCUSSION
The DCMLGSI restrictions imposed on the covariance values The DCMLGSI imposed the restriction that the covariance between DCMLGSI values among stages be zero. This restriction was to ensure the existence of solutions for the truncation points at different stages without resorting to numerical multiple integration (Xu and Muir 1992;Xie and Xu 1998). However, the restriction decreased the estimated DCMLGSI selection response and the estimated DCMLGSI correlation with the net genetic merit after stage 1. Xu and Muir (1992) indicated that the loss of DCMLGSI efficiency after stage 1 is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which would otherwise be very difficult or impossible to do. Xu and Muir (1991;1992) indicated that the restriction imposed on the covariance between DCMLGSI values is similar to the Kempthorne and Nordskog (1959) restriction imposed on the expected genetic gain per trait, which makes some traits not change their mean values while the rest of the trait means remain without restrictions (Cerón-Rojas and Crossa, 2018, Chapter 3). Xu and Muir (1992) and Kempthorne and Nordskog (1959) used a projection matrix (e.g., K) to project the LPSI vector of coefficients (e.g., d) into a space smaller than the original space of d. The reduction of the space into which the Kempthorne and Nordskog (1959) matrix projects d is equal to the number of zeros that appears on the expected genetic gain per trait, and the selection response and accuracy decrease as the number of restrictions increases (Cerón-Rojas and Crossa 2018, Chapter 3). However, it is not clear if under the Xu and Muir (1992) restrictions, the selection response and accuracy decrease as the number of stages increases. If this were true, the Xu and Muir (1992) method would not give the breeder the opportunity to implement an unlimited number of stages, because the selection response and accuracy would decrease as the number of stages increases and soon would be null. For example, Xie et al. (1997) compared the estimated single-stage LPSI selection response with the estimated DMLPSI selection response for two and three stages and found that at stages 2 and 3, the estimated total DMLPSI selection response explained only 92 and 87%, respectively, of the estimated LPSI selection response. That is, at stage 3, the estimated total DMLPSI selection response was lower (5%) than at stage 2.

The OCMLGSI and DCMLGSI vector of coefficients
The OCMLGSI is an optimum index and we obtained its vector of coefficients based on the OMLPSI (Young 1964) and LPSI (Smith 1936) theory. For this reason, the OCMLGSI vector of coefficients was easier to obtain than the DCMLGSI vector of coefficients. This last vector is a linear transformation of the OCMLGSI vector of coefficients made by a transforming matrix (K) which is idempotent (K ¼ K 2 ) and projects the OCMLGSI vector of coefficients (b) into a space smaller than the original space of b (Cerón-Rojas and Crossa 2018, Chapter 3). Xu and Muir (1992) and Xie and Xu (1998) did not identify that matrix, and for this reason, the equations they used to estimate the decorrelated vector of coefficients seem very complex. This matrix makes the DCMLGSI values independent among stages and is the base for assuming that the DCMLGSI values are normally distributed after stage one. However, due to this matrix, the correlation of the DCMLGSI with the net genetic merit and its selection response after stage one are lower than the correlation of the OCMLGSI with the net genetic merit and its selection response after stage one. Xu and Muir (1992) and Xie and Xu (1998) indicated that the loss of efficiency is justified because their method for obtaining the selection intensities and total responses gives the breeder the opportunity to implement an unlimited number of selection stages, which otherwise would be very difficult or impossible to do. However, Börner and Reinsch (2012) indicated that decorrelated indices should not be used due to the availability of accurate and fast algorithms for exact multidimensional integration to find the selection intensities for the OCMLGSI. Mi et al. (2014) described an R-code algorithm in the multistage context, but it is useful only when the selection is made for a trait at each stage. That is, up to now, there is no quick algorithm for finding the selection intensities for the OCMLGSI for more than two stages, and for this reason, in this work we described a method to find the OCMLGSI selection intensities in the two-stage context.
The multivariate normality assumption Based on the normality assumption of the estimated OCMLGSI, DCMLGSI, GEBV, and phenotypic values, we developed and applied the OCMLGSI and DCMLGSI to the real and simulated datasets. The histograms, quantile-quantile plots, and the Shapiro-Wilk and Kolmogorov-Smirnov normality tests of the OCMLGSI and DCMLGSI values at stage two indicated that these values approached the normal distribution. The multivariate normality distribution is very important for breeding plant and animal quantitative traits because these traits show continuous variability and are the result of many gene effects interacting among themselves and with the environment. That is, quantitative traits are the result of unobservable gene effects distributed across plant or animal genomes, which interact among themselves and with the environment to produce the observable characteristic plant and animal phenotypes (Falconer and Mackay 1996). Under the multivariate normal distribution assumption, the indices, the traits, and GEBV can be described using only means, variances, and covariances. In addition, if the traits are not correlated, they are independent. Linear combinations of traits are also normal; and even when the trait phenotypic values do not have normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have multivariate normal distribution (Rencher 2002). By this reasoning, a fundamental assumption in this work was that the net genetic merit and each index have multivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic and GEBV values was linear and the total OCMLGSI selection response was the sum of the responses obtained at each stage.

DCMLGSI and OCMLGSI
The DCMLGSI is an application of the Xie and Xu (1998) index to the genomic selection (GS) context. Based on the LMSI (Lande and Thompson 1990) and on the DMLPSI (Xu and Muir 1992) theoretical results, Xie and Xu (1998) developed their multistage index in the MAS context before Meuwissen et al. (2001) GS theory. For this reason, those authors used molecular scores instead of GEBV to predict the net genetic merit at each stage. Because the Xie and Xu (1998) index has the same theoretical and practical problems as the LMSI indicated in the Introduction of this work, we extended the Xie and Xu (1998) index to the GS context and developed the DCMLGSI, which uses GEBV instead of molecular scores in the prediction. This index is free of the problems of the Xie and Xu (1998) index associated with the LMSI. However, because the DCMLGSI is based on the DMLPSI, it has the same advantages and disadvantages as the DMLPSI, indicated in the Introduction of this work.
The OCMLGSI is an application of the OMLPSI (Young 1964;Cerón-Rojas et al. 2019 a and b) to the GS context based on the Xie and Xu (1998) idea. The OCMLGSI has the same advantages and disadvantages as the OMLPSI, as indicated in the Introduction of this work, and is an optimum index. In this work, we showed that its selection response and correlation with the net genetic merit is higher than the DCMLGSI selection response and correlation with the net genetic merit.
The OCMLGSI and the DCMLGSI exploit the linkage disequilibrium between markers and QTL that is produced when inbred lines are crossed, which is useful for identifying markers correlated with the traits of interest and for obtaining GEBV (Meuwissen et al. 2001). Börner and Reinsch (2012) indicated that GS could replace traditional progeny testing when maximizing the genetic gain per year, as long as the accuracy of GEBV is higher than 0.45. For the simulated dataset, the correlation values between the GEBVs and the traits' true breeding values in cycle two were 0.52, 0.74, 0.69, and 0.73 for each of the four traits, respectively, whereas in cycle seven, those correlations were 0.40, 0.55, 0.54, and 0.50 for each of the four traits (Cerón-Rojas and Crossa 2019). In all selection cycles, the estimated correlations were higher than, or equal to, 0.45; thus, the GEBVs obtained with the simulated datasets were good predictors of the individual trait breeding values, and the OCMLGSI and the DCMLGSI were good predictors of the net genetic merit because both indices are linear combinations of GEBV at stage 1, and of the GEBV and phenotypic values at stage 2.
The OCMLGSI and the DCMLGSI can only be used in training populations when there is phenotypic and marker information, while the LGSI (Ceron-Rojas et al. 2015; Cerón-Rojas and Crossa 2019) is used in testing populations where there is only marker information. However, because both indices incorporate more information than the single-stage LPSI and LGSI, their selection response and correlation with the net genetic merit are higher than the LPSI and the LGSI selection response and correlation with the net genetic merit in all selection cycles. This is the main reason why the OCMLGSI should be used instead of the LPSI and the LGSI.

Method for obtaining the OCMLGSI selection intensity
The method used in this work to obtain the OCMLGSI selection intensities in a two-stage context is simple and can be programmed in a computer using an R code. Cochran (1951) and Young (1964) described methods to obtain OMLPSI intensities in the two-stage context; however, such methods overestimate the OMLPSI selection intensity (Cerón-Rojas et al. 2019 a and b). The method proposed here was good for obtaining the selection intensity values of OCMLGSI in a two-stage context and did not overestimate the OCMLGSI selection intensity. Thus, breeders should use the proposed method when they perform multistage selection.
The estimated total DCMLGSI selection response was counter-intuitive The estimated total OCMLGSI and DCMLGSI selection response should be lower, or equal to, the single-stage CLGSI. This implies that when the total proportion selected (p) increased, e.g., from 0.05 to 0.30, the estimated total OCMLGSI and DCMLGSI selection response should tend to be more similar to the estimated single-stage CLGSI selection response. This was true for the estimated total OCMLGSI selection response but not true for the estimated total DCMLGSI selection response for the real and simulated datasets. Thus, for the real dataset, when p ¼0.05, 0.10, 0.20, and 0.30, the estimated total OCMLGSI selection response explained 97.27, 97.35, 97.55, and 97.68%, respectively, of theR values, while the estimated total DCMLGSI selection response explained 91.68, 90.64, 89.32, and 88.16%, respectively, of theR values. That is, the estimated total DCMLGSI selection response decreased when p values increased. We found similar results for the simulated dataset. Our results were in accordance with those of Börner and Reinsch (2012) and Cerón-Rojas et al. (2019b) when these authors compared the optimum with the decorrelated multistage indices in the genomic and phenotypic selection context, respectively. Börner and Reinsch (2012) called the decorrelated multistage index results counter-intuitive and difficult to interpret. Thus, we are in agreement with Börner and Reinsch (2012) that breeders should not use decorrrelated indices when they make multistage selection.

CONCLUSION
We evaluated the relative efficiency of two combined multistage linear genomic selection indices. We determined the efficiency of both indices based on the estimated total selection response and correlation of each index with the net genetic merit using real and simulated datasets. In both datasets, we found that the OCMLGSI was a better predictor of the net genetic merit than the DCMLGSI. Therefore, breeders should use the OCMLGSI when performing multistage selection.

ACKNOWLEDGMENTS
We are grateful for the financial support provided by the Bill & Melinda Gates Foundation and CIMMYT's CGIAR CRP (maize and wheat), as well as the USAID projects (Cornell University and Kansas State University). We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806. relationship matrix between individuals (Lynch and Walsh 1998), and V j ¼ As 2 g j þ Is 2 e j . We estimated s 2 g j and s 2 e j assuming absence of dominance and epistatic effects.

Estimating matrices P and C using the Expectation and Maximization algorithm
The Expectation and Maximization algorithm allows computing the REML for the variance components s 2 g j and s 2 ej by iterating the following equations: