Efficiency of a Constrained Linear Genomic Selection Index To Predict the Net Genetic Merit in Plants

The constrained linear genomic selection index (CLGSI) is a linear combination of genomic estimated breeding values useful for predicting the net genetic merit, which in turn is a linear combination of true unobservable breeding values of the traits weighted by their respective economic values. The CLGSI is the most general genomic index and allows imposing constraints on the expected genetic gain per trait to make some traits change their mean values based on a predetermined level, while the rest of them remain without restrictions. In addition, it includes the unconstrained linear genomic index as a particular case. Using two real datasets and simulated data for seven selection cycles, we compared the theoretical results of the CLGSI with the theoretical results of the constrained linear phenotypic selection index (CLPSI). The criteria used to compare CLGSI vs. CLPSI efficiency were the estimated expected genetic gain per trait values, the selection response, and the interval between selection cycles. The results indicated that because the interval between selection cycles is shorter for the CLGSI than for the CLPSI, CLGSI is more efficient than CLPSI per unit of time, but its efficiency could be lower per selection cycle. Thus, CLGSI is a good option for performing genomic selection when there are genotyped candidates for selection.

The unconstrained linear genomic selection index (LGSI) is a linear combination of genomic estimated breeding values (GEBVs) and was originally proposed by Togashi et al. (2011); however, Ceron-Rojas et al. (2015) developed the LGSI theory completely and applied the LGSI theoretical results to real and simulated data. The LGSI is useful for predicting the net genetic merit, which in turn is a linear combination of true unobservable breeding values of the traits weighted by their respective economic values (Hazel 1943). In LGSI, all marker effects of the genotyped individuals in the training population are estimated using marker and phenotypic data; these estimated effects are then used in subsequent selection cycles to obtain GEBVs that are predictors of the individual breeding values in the testing population for which there is only marker information about the candidates for selection. In LGSI, the GEBVs can be obtained by multiplying the genomic best linear unbiased predictor (GBLUP) of the estimated marker effects in the training population (VanRaden 2008) by the coded marker values obtained in the testing population in each selection cycle. Applying LGSI in plant or animal breeding requires genotyping the candidates for selection to obtain marker effects and GEBVs, and then predicting and ranking the net genetic merit of the candidates for selection.
The LGSI was developed in the genomic selection (GS) context in which animals and plants are selected based on the GEBV of the candidates for selection. Meuwissen et al. (2001) developed the GS theory and showed that it increases the accuracy of predicting the breeding values of the candidates for selection, and reduces the intervals between selection cycles and the costs of the breeding programs. Meuwissen et al. (2001) suggested estimating all marker effects jointly using linear mixed models and Bayesian methods to increase the accuracy of the predicted breeding values because many genes affect the quantitative traits, which are of most concern to plant and animal breeders. Isik et al. (2017), however, have indicated that GS has not changed the fundamental basis of breeding methods, since in GS the individuals are ranked based on their estimated breeding values, which has been the selection method used by animal and plant breeders for a long time. Nevertheless, because GS decreases the generation interval, it leads to a much higher genetic gain per year. For example, in the plant breeding context, a four-year breeding cycle, which includes three years of field testing, can be reduced to only four months, i.e., the time required to grow and cross plants (Lorenz et al. 2011). Börner and Reinsch (2012) have indicated that GS could replace traditional progeny testing when maximizing the genetic gain per year, as long as the accuracy of GEBV is higher than or equal to 0.45.
One of the main problems associated with the LGSI theory is that the values of its expected genetic gain per trait (or multi-trait selection response) can increase or decrease in a positive or negative direction without control. In the phenotypic selection context, Kempthorne and Nordskog (1959) developed a restricted linear selection index that allows imposing restrictions equal to zero on the expected genetic gain of some traits. Other authors (Mallard 1972;Harville 1975;Tallis 1985) extended the Kempthorne and Nordskog (1959) approach and developed a constrained linear phenotypic selection index (CLPSI) that attempts to make some traits change their expected genetic gain values based on a predetermined level, while the rest of the traits remain without restrictions. Itoh and Yamada (1987) showed that the Mallard (1972), Harville (1975) and Tallis (1985) indices give the same results. The CLPSI is the most general index and includes the unconstrained and restricted phenotypic indices as particular cases.
Céron-Rojas and Crossa (2018, Chapter 3) developed a constrained linear genomic selection index (CLGSI); however, these authors did not evaluate this index completely. For example, they did not give results associated with the CLGSI expected genetic gain per trait, which is the main parameter of this index because the breeder imposes constraints on it to make some traits change their expected genetic gain values based on a predetermined level, while the rest of the traits remain without restrictions.
This study had two main objectives: first, to apply the CLGSI theoretical results to two real datasets and to seven simulated datasets using only GEBVs for selecting non-phenotyped candidates for selection, and second, to compare the relative efficiency of CLGSI and CLPSI using real and simulated datasets in a single-stage context. The criteria we used to compare CLGSI vs. CLPSI efficiency were the estimated expected genetic gain per trait values, the selection response, and the interval between selection cycles. One additional criterion was that the CLGSI and CLPSI proportional constant value should be positive (Céron-Rojas and Crossa 2018, Chapters 3 and 6 for details). The results indicated that because the interval between selection cycles was shorter for CLGSI than for CLPSI, CLGSI is more efficient than CLPSI per unit of time, but its efficiency could be lower per selection cycle.
We applied the CLGSI theory assuming that the GEBV values have multivariate normal distribution and that the CLGSI (CLPSI) and the net genetic merit have joint bivariate normal distribution. Under this last assumption, the regression of the net genetic merit on any linear function of the phenotypic or GEBV values is linear (Kempthorne and Nordskog 1959).

MATERIAL AND METHODS
Objectives of the constrained linear selection index Cerón-Rojas et al. (2016) and Céron-Rojas and Crossa (2018, Chapter 3) described the CLPSI theory; thus, in this work, we shall describe only the CLGSI theory. Let m i be the population mean of the i th trait before selection. One of the main CLGSI objectives is to change m i to m i þ d i , where d i is the i th (i = 1, 2, . . ., r; r = number of constrained traits) trait constraint or predetermined proportional gain imposed by the breeder on the CLGSI expected genetic gain per trait. Additional CLGSI objectives are to maximize the selection response; to predict the net genetic merit (H ¼ w9g, where w9 ¼ ½ w 1 w 2 . . . w t and g9 ¼ ½ g 1 g 2 . . . g t are 1 · t (t ¼number of traits) vectors of economic weights and true unobservable breeding values, respectively); to select individuals with the highest H values as parents of the next generation; and to provide the breeder with an objective rule for evaluating and selecting several traits simultaneously.
The constrained linear genomic selection index Let z9 ¼ ½ z 1 z 2 ⋯ z t be a vector of genomic breeding values for t traits (Appendix 1, Equations A1 to A4 for details). The CLGSI used to predict the individual net genetic merit of a candidate for selection is . . b t is the CLGSI vector of coefficients and z i (i= 1, 2, . . ., t) is the genomic value associated with trait i th . Ceron-Rojas et al. (2015) described Equation (1) in the unconstrained LGSI.

The CLGSI selection response
The CLGSI selection response (R) can be written as where k is the selection intensity, L denotes the interval between selection cycles,  (1) indicates that the genetic gain that can be achieved in R by selecting for several traits simultaneously within a population of plants, is the product of k, s H and r HI (Kempthorne and Nordskog 1959). Thus, to increase selection progress, r HI should be as large as possible. Céron-Rojas and Crossa (2018, Chapter 3) described the selection response of the CLPSI, which is very similar to Equation (2).
The CLGSI expected genetic gain per trait The CLGSI expected genetic gain per trait (E, or multi-trait selection response) is the covariance between the genomic breeding value vector z9 and the index, , multiplied by the selection intensity (k) and divided by the interval between selection cycles (L), i.e., is the expected genetic gain of trait i th , z i is the i th genomic breeding value, I ¼ P t j¼1 b j z j is the index value (Equation 1) for one individual, CovðI; z i Þ is the covariance between I and z i , and s ji is the covariance between z i and the j th (j ¼ 1, 2, . . .,t) index genomic breeding value. We defined all the other terms of Equation (3) in Equation (2).

The CLGSI vector of coefficients
Maximizing r HI (Equation 2) is equivalent to minimizing the mean squared difference between the net genetic merit H ¼ w9g and the index I ¼ b 0 z, i.e., E½ðH2IÞ 2 , with respect to the vector of coefficients b under the restriction that Equation (3) values are equal to the d i values (i = 1, 2, . . ., r) imposed by the breeder. Details of the process used to obtain the CLGSI vector of coefficients (b) are given in Appendix 2 (Equations A6 and A7). Here we present only the main result. The CLGSI vector of coefficients is where w9 ¼ ½ w 1 w 2 . . . w t is a vector of economic weights, K ¼ ½I t 2 Q, Q ¼ UDðD9U9GUDÞ 21 D9U9G, I t is an identity matrix of size t · t, D is a Mallard (1972) matrix described in Appendix 2, and U is the Kempthorne and Nordskog (1959) In this last case, the CLGSI is a null restricted LGSI similar to the Kempthorne and Nordskog (1959) restricted index. When D ¼ U and U9 is a null matrix, b ¼ w, the vector of economic weights or the unconstrained LGSI vector of coefficients (Ceron-Rojas et al. 2015). Thus, the CLGSI is more general than the LGSI and includes the null restricted and the unrestricted LGSI as particular cases. The vector of coefficients b should maximize the selection response (R, Equation 2) and make Equation (3) values similar to d i values.
It is possible to show (Itoh and Yamada 1987; Céron-Rojas and Crossa 2018, Chapter 3) that another way of writing Equation (4a) where u¼ d9ðU9GUÞ 21 U9Gw d9ðU9GUÞ 21 d is the proportionality constant and d9 ¼ ½ d 1 d 2 ::: d r is the vector of the constraints imposed by the breeder. According to Itoh and Yamada (1987), u should be higher than zero (u . 0) because when u , 0, the CLGSI moves the population means in the opposite direction to the predetermined desired direction. In addition, when u¼ 0, we would have the CLGSI with null constraints similar to the Kempthorne and Nordskog (1959) restricted index. Thus, u is a good criterion for determining when the CLGSI moves the population means in the desired direction, which will occur when u . 0. While Equation (4a) is associated with the Mallard (1972) constrained index, Equation (4b) is associated with the Tallis (1985) constrained index; however, Equations (4a) and (4b) express the same result in a different mathematical way.
In Appendix 1 (Equation A5), we give a method for estimating matrix G, and in Appendix 2 (Equation A8), we explain how to estimate b.
Maximized CLGSI selection response and optimized expected genetic gain per trait The maximized CLGSI selection response is while the optimized CLGSI expected genetic gain per trait is the same as Equation (3). In Equation (5) we omitted L to simplify notation. Whereas in Equation (2) the selection response can take any value, in Equation (5), R gives the maximum value of Equation (2). In Appendix 2 (Equations A9 and A10), we indicate how to estimate R and E. In CLGSI, we fitted phenotypic and marker data from the training population in a statistical model to estimate all available marker effects; these estimates were then used to obtain GEBV that are predictors of the individual traits' true genomic breeding values in a testing population for which there is only marker information. We obtained the GEBV in the non-phenotyped testing population by multiplying the estimated marker effects obtained in the training population by the coded marker values obtained in the testing population at each selection cycle.
Criteria for comparing CLGSI efficiency vs.

CLPSI efficiency
The criteria used to compare CLGSI vs. CLPSI efficiency when making genomic and phenotypic selection were as follows. First, the estimated theta value (u) should be positive (u . 0). Second, the estimated expected genetic gain values should be close to the constraints (d i ) imposed by the breeder. Third, the estimated selection response value should be equivalent to the true selection response value. Fourth, the size of the interval between selection cycles (L) should be short to increase the selection gain per year.
In this work, we do not consider the problem associated with the cost of obtaining measures of each phenotypic trait and the process of genotyping individual candidates for selection. However, some authors (Schaeffer 2006;Börner and Reinsch 2012) have considered this problem in the animal breeding context.

Real data
We used two real maize (Zea mays L.) F2 populations: "JMpop1 DTMA Mexico optimum environment" and "6x1020 WEMA Africa optimum environment" (hereafter we shall refer to the first and second maize F2 populations as dataset 1 and dataset 2, respectively). The training population (C0) of each dataset contains genotypic data and four phenotypic traits: grain yield (GY, t/ha), plant height (PHT, cm), ear height (EHT, cm), and anthesis days (AD, d). In addition, each dataset has three sets of individuals from the training population (C0) and two testing populations (C1 and C2). We present the number of individuals and molecular markers in each population in Table 1. Assuming that the breeding objective was to increase GY while decreasing PHT, EHT, and AD, the vector of economic weights in C0, C1, and C2 for GY, PHT, EHT, and AD was w9 ¼ ½ 5 20:3 20:3 21 for both indices and the two datasets. For illustration purposes only, in this work we used three selection cycles (C0, C1 and C2) for both datasets to illustrate the theoretical results and the efficiency of CLGSI and CLPSI.
For illustration purposes only, to select traits GY, PHT, EHT and AD, we imposed two sets of restrictions. First we restricted traits GY and PHT with vector d9 ¼ ½0:5 2 1:0 and matrices D9 ¼ ½ 21:0 20:5 and (CLGSI and CLPSI). The total proportion (p) of retained value for these datasets was p ¼ 0.10 (k ¼ 1:755) for both indices and the two datasets with the two sets of constraints.

Simulated datasets
The datasets were simulated by Ceron-Rojas et al. (2015) with QU-GENE software (Podlich and Cooper 1998) using 2500 molecular markers and 315 quantitative trait loci (QTL) for eight phenotypic selection cycles (C0 to C7), each with four traits (T 1 , T 2 , T 3 and T 4 ), 500 genotypes and 4 replicates for each genotype. The authors distributed the markers uniformly across 10 chromosomes and the QTL randomly across the 10 chromosomes to simulate maize (Zea mays L.) populations. A different number of QTL affected each of the four traits: 300, 100, 60, and 40, respectively. The common QTL affecting the traits generated genotypic correlations of -0.5, 0.4, 0.3, -0.3, -0.2, and 0.1 between T 1 and T 2 , T 1 and T 3 , T 1 and T 4 , T 2 and T 3 , T 2 and T 4 , T 3 and T 4 , respectively. The economic weights for T 1 , T 2 , T 3 and T 4 were 1, -1, 1 and 1, respectively. Additional details of the simulated data can be seen in Ceron-Rojas et al. (2015). We used seven phenotypic and genomic selection cycles (C1 to C7) with p ¼ 0.10 (k ¼ 1:755) in each cycle. We selected all four traits in each selection cycle. For illustration purposes only, to select traits T 1 , T 2 , T 3 and T 4 , we imposed two sets of restrictions. First we restricted traits T 1 and T 2 with vector d9 ¼ ½5 2 2 and matrices Real and simulated data availability The real and simulated datasets are available in the Application of a Genomics Selection Index to Real and Simulated Data repository, at n■ Table 1 Two real maize (Zea mays L.) F 2 populations and the number of genotypes (g) and molecular markers (m) used in three selection cycles (cycles 0, 1 and 2) b Mean1 is the average of the three selection cycles. c Mean2 = Mean1/1.5, where 1.5 is the interval between selection cycles and denotes the average of the genetic gain per year. http://hdl.handle.net/11529/10199. The two real datasets used in this work are the folder named "File Real_Data_Sets_GSI" that contains four folders called "DATA_SET-3, 4, 5 and 6". Each of the four folders in turn contains four Excel data files. The four Excel data files within the folder DATA_ SET-3 are as follows: DATA_SET-3_Markers_Cycle-0, 1, 2, and DATA_ SET-3_Phenotypic_Cycle-0. The first three Excel files contain the marker n■ Table 3 Two real datasets for estimated LGSI a parameters constructed with four traits (GY, PHT, EHT and AD) without constraints for three selection cycles coded values for cycles 0, 1 and 2, while the Excel file DATA_SET-3_ Phenotypic_Cycle-0 contains the phenotypic information of cycle 0 (training population). The Excel data files of the other folders were described in a similar manner as for folder 3. In this work, we used datasets 3 and 6 to make selections and to estimate the theta values, the selection response and the genetic expected gains. The results are presented in Tables 2 and 3. Folder Simulated_Data_GSI contains two folders: Data_Pheno-types_April-26-15 and Haplotypes_GSI_April-26-15. In turn, folder Data_Phenotypes_April-26-15 contains two folders: GSI_Phenotypes-05 and PSI_Phenotypes-05. Within folder GSI_Phenotypes-05, there are six Excel data files, each denoted as C2_GSI_05_Pheno, C3_GSI_05_Pheno, C4_GSI_05_Pheno, C5_GSI_05_Pheno and C6_GSI_05_Pheno, corresponding to the simulated phenotypic information for the genomic selection index for cycles 2-7. In addition, folder GSI_Phenotypes-05 contains eight Excel datasets denoted as C0_Pheno_05, C1_PSI_05_ Pheno, C2_PSI_05_Pheno, C3_PSI_05_Pheno, C4_PSI_05_Pheno, C5_ PSI_05_Pheno, C6_PSI_05_Pheno, and C7_PSI_05_Pheno corresponding to the phenotypic simulated information for the phenotypic selection index for cycles 0-7. File Haplotypes_GSI_April-26-15 contains the haplotypes of the markers for cycles 0-7 of GSI. We present the results of the simulated datasets in Tables 4, 5 and 6.

Real data
The normality assumption: Figure 1 presents the frequency distribution of the GEBVs associated with traits GY (Figure 1a) and PHT ( Figure 1b) for Dataset 1, in cycle 1. In addition, Figure 2 presents the frequency distribution of the CLGSI values for real Dataset 1 (in cycle 1) with two constraints (Figure 2a, d9 ¼ ½ 0:5 21:0 ), whereas Figure 2b presents the frequency distribution of the CLGSI values for real Dataset 2 (in cycle 2) with three constraints (d9 ¼ ½ 0:5 21:0 20:5 ). Based on these results, we can assume that the GEBVs associated with traits GY and PHT, and the CLGSI values for the two set of restrictions, approach the normal distribution.
Estimated maximized CLGSI parameters: Tables 2 and 3 present the estimated CLGSI and LGSI parameters, respectively. The estimated parameters are the theta (û) values (for CLGSI only), the expected genetic gains per traitÊ (Appendix 2, Equation A10), and the selection responsesR (Appendix 2, Equation A9). In Table 2, all parameters were estimated for datasets 1 and 2 with two (d9 ¼ ½ 0:5 21:0 ) and three (d9 ¼ ½ 0:5 21:0 20:5 ) constraints, whereas in Table  3, the parameters were estimated, without constraints, for three selection cycles (0, 1 and 2). The estimated theta values were all positive (Table 2) for the two real datasets, which indicates that the CLGSI moves the population means in the desired direction, as we would expect.
Estimated CLGSI expected genetic gains per trait: In Tables 2 and 3, there are two means: Mean1 and Mean2 = Mean1/L (L= interval between selection cycles) associated with the estimated CLGSI and LGSI expected genetic gain per trait and selection response for the two real maize (Zea mays L.) F2 populations, respectively. Mean1 is the average of the three n■  selection cycles, whereas Mean2 (Mean1/1.5, where 1.5 is the interval between selection cycles or the time required to complete one selection cycle ) is the average of the genetic gain per year.
When we constrained three traits (GY, PHT and EHT) by vector d9 ¼ ½ 0:5 21:0 20:5 , we found results similar to those for two constraints. That is, the Mean1 of theÊ9 q values associated with traits GY, PHT and EHT (0.75, -1.50 and -0.75, respectively) overestimated the d9 values by 50% (Table 2). However, for Mean2 = Mean1/1.5, we found that theÊ9 q values associated with traits GY, PHT and EHT (0.50, -1.0 and -0.50, respectively) were equal to the d9 values. Thus, for dataset 1, the values of the vector of restrictions, d9 ¼ ½ 0:5 21:0 and d9 ¼ ½ 0:5 21:0 20:5 , were closer to Mean2 than to Mean1 values. This means that the estimated maximized expected genetic gain per trait estimated the genetic gain per year better than the genetic gain per selection cycle for dataset 1.
For dataset 2 (p ¼ 0:10 and k ¼ 1:755), with two (d9 ¼ ½ 0:5 21:0 ) and three (d9 ¼ ½ 0:5 21:0 20:5 ) constraints, Mean1 for theÊ9 q values associated with traits GY and PHT and those associated with GY, PHT and EHT, were closer to the d9values than the Mean2 values (Table 2). In this case, the estimated expected genetic gain per trait estimated the genetic gain per year better than the genetic gain per selection cycle.
Dataset 1 gave higher estimated maximized CLGSI expected genetic gains per trait and genetic gains per year than dataset 2. This means that the number of genotypes affected the estimated values of the expected genetic gains (Table 1). The number of genotypes in real dataset 1 for cycles 0, 1 and 2 were 247, 320 and 303, respectively, whereas for dataset 2, the number of genotypes for those cycles were 181, 274 and 274 (Table 1), respectively.

Estimated maximized CLGSI selection response:
The results of the estimated maximized CLGSI selection responses for both datasets and constraints, were as follows. For dataset 1, the Mean1 of the estimated maximized CLGSI selection responses for two and three constraints were 4.24 and 4.29, respectively, while for dataset 2, those estimated values were 2.79 and 3.01. For dataset 1, the Mean2 of the estimated maximized CLGSI selection responses for two and three constraints were 2.82 and 2.99, respectively, while for dataset 2, those estimated values were 1.86 and 2.01. These results indicate that dataset 1 gave higher estimated CLGSI genetic gains per year. Again, we explain these results by noting that the number of genotypes in real dataset 1 for cycles 0, 1 and 2 were 247, 320 and 303, respectively, n■  while for dataset 2, the number of genotypes for those cycles were 181, 274 and 274 (Table 1), respectively. This means that, in effect, the number of genotypes affected the estimated values of the selection response.
When we compared the CLGSI results (Table 2) with unconstrained LGSI results (Table 3), we found that the estimated CLGSI expected genetic gains per trait were different to the estimated unconstrained LGSI expected genetic gains per trait for both datasets. However, the estimated selection responses of both indices were very similar (Tables 2 and 3); this means that the two sets of constraints imposed on the CLGSI when we obtained its vector of coefficients mainly affected the CLGSI expected genetic gain per trait, as we would expect.

Simulated data
Correlation of the GEBV With the true breeding values of the traits: Figure 3 presents the estimated correlations between the GEBVs and true breeding values for four traits in six (C2 to C7) selection cycles. Each selection cycle contains four columns: the first column (from left to right) corresponds to the correlation between the GEBV and the T1 true breeding values; the second column corresponds to the correlation between the GEBV and the T2 true breeding values, and so on. In this figure, all correlation values tend to decrease. In C7, the correlation values between the GEBVs and the traits' true breeding values were 0.40, 0.55, 0.54, and 0.50 for each of the four traits, respectively, whereas in cycle two (C2), these correlations were 0.52, 0.74, 0.69, and 0.73 for each of the four traits, respectively. In percentage terms, the correlation values of C7 were only 76%, 74%, 78%, and 68% of the correlation values in C2. That is, the correlation between the GEBVs and the traits' true breeding values decreased more for traits 2 and 4 than for traits 1 and 3. We can explain these results by the number of QTL that affected each trait and the size of the QTL effects on the traits in each selection cycle. In all selection cycles, the estimated correlations were higher than or equal to 0.45; thus, the GEBVs obtained with the simulated data were good predictors of the individual breeding values, and so the CLGSI was a good predictor of the net genetic merit because the CLGSI is a linear combination of GEBV. Tables 4 and 5 present the estimated maximized CLPSI and CLGSI parameters, whereas Table 6 presents the unconstrained estimated maximized linear phenotypic and genomic selection index (LPSI and LGSI, respectively) parameters for seven simulated selection cycles. Céron-Rojas and Crossa (2018, Chapters 2 and 5) have given details of how to estimate the LPSI and LGSI parameters. We imposed two sets of constraints on the CLPSI and CLGSI expected genetic gains for seven selection cycles (C1 to C7). The vector for two constraints imposed on traits T1 and T2 was d9 ¼ ½ 5:0 22:0 , whereas the vector for three constraints imposed on traits T1, T2 and T3 was d9 ¼ ½ 5:0 22:0 3:0 . The estimated theta values were all positive for the simulated dataset in the seven selection cycles for both sets of constraints, which indicates that CLPSI and CLGSI move the population means in the desired direction.

Estimated maximized CLPSI and CLGSI parameters:
We shall compare the estimated maximized CLPSI parameters to the estimated maximized CLGSI parameters using the results in Tables 4 and 5. In Tables 4 and 5, there are three means: Mean1 and Mean2 (Mean3) = Mean1/L (L= interval between selection cycles) associated with the estimated maximized CLPSI and CLGCI expected genetic gain per trait and selection response, respectively. Mean1 is the average of the seven selection cycles, whereas, for the CLPSI, Mean2= Mean1/4 (4= interval between selection cycles, L ) is the average genetic gain per year. For the CLGSI, L = 1.5 , from where Mean3= Mean1/1.5. This means that the interval between selection cycles is a characteristic of each index and that the interval was shorter for the CLGSI than for the CLPSI.
The total proportion retained for both indices was p ¼ 0.10 (k ¼ 1.755) for all seven selection cycles. Each estimated CLPSI and CLGSI expected genetic gain per traitÊ9 q (q ¼1, 2, . . ., 7) for two and three constraints is a vector of values associated with the expected genetic values of traits T1, T2, T3 and T4 (Equation 3). Some authors (Cerón-Rojas et al. 2016; Céeron-Rojas and Crossa 2018, Chapter 3) have described how to obtainÊ9 q for the CLPSI. In Appendix 2 (Equations A9 and A10), we describe how to estimate the maximized CLGSI selection response and expected genetic gain per trait.
Estimated and maximized CLPSI and CLGSI expected genetic gain per trait: For two constraints (d9 ¼ ½ 5 22 ), the Mean1 of the estimated CLPSI expected genetic gain values associated with traits T1 and T2 were 6.99 and -2.80, respectively, whereas the Mean1 of the estimated CLGSI expected genetic gain values associated with traits T1 and T2 were 5.91 and -2.36, respectively. This means that the CLPSI overestimated the d9 values by 38.80%, whereas the CLGSI overestimated those values by 18.20%.
For three constraints (d9 ¼ ½ 5:0 22:0 3:0 ), the Mean1 of the estimated CLPSI expected genetic gain values associated with traits T1, T2 and T3 were 5.86, -2.34 and 3.52, respectively, whereas the Mean1 of the estimated CLGSI expected genetic gains associated with those traits were 4.69, -1.87 and 2.81, respectively. This means that while the CLPSI overestimated the d9 values by 17.20%, the CLGSI underestimated those values by only 6.20%. Thus, the expected genetic gain per trait per selection cycle of the CLGSI was closer to the true gain than was the expected gain of the CLPSI. In addition, because the interval between selection cycles was higher for the CLPSI (L = 4) than for the CLGSI (L = 1.5), the genetic gain per year associated with the expected genetic gain per trait was also higher for the CLGSI than for the CLPSI (Tables 4 and 5).
Finally, note that the estimated LPSI and LGSI expected genetic gains per trait (Table 6) were more different from the vectors of constraints, than the estimated CLPSI and CLGSI expected genetic gains per trait. Thus, the two sets of constraints imposed on the CLGSI (CLPSI) when we obtained its vector of coefficients, affected the CLGSI (CLPSI) expected genetic gain per trait, as we would expect.
Efficiency of the CLPSI and CLGSI selection response: In this subsection, we shall compare the average (Mean1) of the estimated CLPSI and CLGSI selection response to the true selection response obtained for both indices (Tables 4 and 5). In Appendix 2 (Equations A11a and A11b), we indicate how we obtained the maximum true selection response for LPSI (LGSI) and for CLPSI (CLGSI). The maximum possible true selection responses for the economic index was the same for the genomic and phenotypic indices when we started the selection process, as this did not depend on the weights used in a particular selection index. However, from selection cycle two to selection cycle seven, the maximum possible true selection responses for both indices were different (Ceron-Rojas et al. 2015).  (Fig. 2a) and three (Fig. 2b) constraints for real Datasets 1 and 2, in cycles 1 and 2, respectively. Again, let p ¼ 0.10 (k ¼1.755) be the total proportion retained for both indices for all seven selection cycles with two set of constraints, as indicated in the last subsection. For two constraints, the average of the estimated CLPSI selection response (13.91) explained 84.66% of the true selection response (16.43), while the average of the estimated CLGSI selection response (11.94) explained only 80.0% of the true selection response (14.89) ( Table 4). In a similar manner, for three constraints, the average of the estimated CLPSI selection response (13.14) explained 89.02% of the true selection response (14.76), while the average of the estimated CLGSI selection response (10.63) explained only 79.0% of the true selection response (13.41) ( Table 5). That is, in this case, the estimated CLPSI selection response was closer to its true response than the estimated CLGSI selection response was to its true response. Note, however, that because the interval between selection cycles was higher for the CLPSI (L = 4) than for the CLGSI (L = 1.5), the genetic gain per year associated with the selection response was higher for the CLGSI than for the CLPSI (Tables 4 and 5) for both constraints.
Finally, when we compared the estimated CLPSI and LPSI selection responses, we observed that they were very similar. The same was true when we compared the estimated CLGSI and LGSI selection responses (Tables 4 and 6). This means that the two sets of constraints imposed on the expected genetic gains of both indices did not affect the selection response of the indices, at least for this dataset.

CLGSI (CLPSI) expected genetic gain per trait and selection response
For the two real and simulated datasets, the CLGSI and CLPSI results indicated that CLGSI was more efficient than CLPSI per unit of time, but not per selection cycle, when the criterion for comparing the efficiency of both indices was the estimated selection response.
However, when the criterion used to compare the efficiency of both indices was the estimated expected genetic gain per trait, the CLGSI was more efficient than the CLPSI per unit of time and per selection cycle. In addition, we found that when we compared the estimated CLPSI selection response with the LPSI selection response, they were very similar. The same was true for CLGSI and LGSI when we compared their selection responses. This means that the two sets of constraints imposed on the CLGSI (CLPSI) when we obtained its vector of coefficients, mainly affected the estimated and maximized CLGSI (CLPSI) expected genetic gain per trait, not its estimated selection response.

Interval between selection cycles
For the two real and simulated datasets, the time required to conduct a selection cycle is a function of the time required to collect the data needed to estimate the index parameters. For CLPSI, the interval between selection cycles was 4, while for the CLGSI it was 1.5 . The interval between selection cycles was the main factor that made CLGSI a more efficient index than CLPSI when we measured the efficiency of both indices by the genetic gains per year. It could also be the main justification for implementing genomic selection programs instead of phenotypic selection programs.

Accuracy of the GEBV and the CLGSI values
For the simulated datasets, the average of the accuracy of the GEBV and the estimated correlation of the CLGSI with the net genetic merit were higher than or equal to 0.5. The accuracy of the GEBV in cycle 2 was 0.67, while in cycle 7 it was 0.5. The average of the estimated correlation of the CLGSI with the net genetic merit (data not presented) for the seven selection cycles was 0.8 for two constraints and 0.72 for three constraints. Thus, the correlation between the estimated CLGSI and the net genetic merit decreases when the number of constraints increased. Céron-Rojas and Crossa (2018; Chapters 5 and 6) have given a method for estimating the correlation of the CLGSI with the net genetic merit. In addition, Cerón-Rojas et al. (2016) found similar results in the CLPSI context. These results indicated the reliability of the GEBV for predicting the breeding value of the traits and of the CLGSI for predicting the net genetic merit; thus breeders can use the CLGSI as a good option for performing GS when there are genotyped individuals.
The multivariate normality assumption Based on the normality assumption of the estimated CLGSI (CLPSI) and GEBV values, we developed and applied the CLGSI (CLPSI) to real and simulated data. The histograms of the GEBV and the CLGSI values indicated that these values approached the normal distribution. The multivariate normality distribution is very important for breeding plant and animal quantitative traits because these traits show continuous variability and are the result of many gene effects interacting among themselves and with the environment. That is, quantitative traits are the result of unobservable gene effects distributed across plant or animal genomes, which interact among themselves and with the environment to produce the observable characteristic plant and animal phenotypes (Falconer and Mackay 1996). Under the multivariate normal distribution assumption, the traits under selection can be described using only means, variances, and covariances. In addition, if the traits are not correlated, they are independent; linear combinations of traits are also normal; and even when the trait phenotypic values do not have that distribution, the normal distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have multivariate normal distribution (Rencher 2002). By this reasoning, the fundamental assumptions in this work were that the GEBVs have multivariate normal distribution, while the net genetic merit and the index have bivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic values was linear.
Criteria for evaluating the relative efficiency of the indices We used four main criteria to compare CLGSI vs. CLPSI efficiency when performing genomic and phenotypic selection. Those criteria were the estimates of the theta values, the expected genetic gain per trait values, the selection response, and the interval between selection cycles. For real and simulated data, the estimated theta values were always positive, as we would expect. The estimated expected genetic gain per trait indicated how close these estimates were to the constraints imposed by the breeder for each trait, whereas the estimated selection response predicts the mean value of the net genetic merit in the progeny population. The interval between selection cycles is the time required to collect information to evaluate the index and complete one selection cycle. The four criteria were useful for evaluating and comparing the efficiency of both indices.

CONCLUSION
We compared the relative efficiency of a CLGSI vs. a CLPSI. We determined the efficiency of both indices based on four criteria using real and simulated datasets. In both type of datasets, we found that the CLGSI genetic gain per year was higher than the CLPSI genetic gain per year because the CLGSI interval between selection cycles was shorter than the CLPSI interval. Therefore, breeders should use the CLGSI when performing selection.

APPENDIX 1 Genomic breeding values
In the training population, the i th phenotypic value (y i ) can be denoted as y i ¼ g i þ e i , where g i is the breeding value (i.e., the average additive effects of the genes an individual receives from both parents) and e i is the residual. It is assumed that g i and e i are independent and that both have normal distribution with an expectation equal to zero and variance s 2 gi and s 2 ei , respectively. The vector of genomic breeding values z9 i ¼ ½ z i1 z i2 ⋯ z in associated with the vector of observations y9 i ¼ ½ y i1 y i2 ⋯ y in (n ¼ number of observations for trait i, i ¼1, 2,. . .,t; t ¼ number of traits) of the candidates for selection for trait i can be written as where X is an n · m matrix (n ¼number of observations and m ¼number of markers in the population) of coded marker values (2 2 2q, 1 2 2q and 22q for genotypes AA, Aa, and aa, respectively) associated with the additive effects of the quantitative trait loci (QTL), and u i is an m · 1 vector of the additive effects of the QTL associated with markers that affect the i th trait. It is assumed that z i has multivariate normal distribution (MVN) with mean 0 and covariance matrix Gs 2 z i , where s 2 z i is the additive genomic variance of z i and G ¼ XX9=c is the n · n additive genomic relationship matrix; c ¼ P m j¼1 2q j ð1 2 q j Þ in an F 2 population; q is the frequency of allele A and 1 2 q is the frequency of allele a in the j th marker (j ¼ 1; 2; :::; m).
Let g ij (i ¼1, 2,. . .,t, t ¼number of traits; j ¼ 1; 2; :::; n, n ¼number of observations) be the j th element of the vector of true genotypic breeding values g9 i ¼ ½ g i1 g i2 ⋯ g in ; then the covariance between z ij and g ij is Covðg ij ; z ij Þ ¼ s 2 zi . That is, the covariance between z ij and g ij is equal to the additive genomic variance of z ij (Dekkers 2007). Let z9 ¼ ½ z 1 z 2 ⋯ z t and g9 ¼ ½ g 1 g 2 ⋯ g t be 1 · t vectors of genomic and true breeding values, respectively, for t traits; then is the covariance matrix of genomic breeding values.

Estimating marker effects
Let u9 ¼ ½ u9 1 u9 2 ⋯ u9 t be a vector 1 · nt associated with t traits. The i th vector u i (i ¼1, 2,. . .,t) of marker effects in the training population can be estimated asû i ¼ c 21 X9½G þ yI n 21 ðy i 2 1m i Þ, where y ¼ ; s 2 gi , s 2 ei , and the other parameters were defined earlier. In addition, we can estimate vector u9 ¼ ½ u9 1 u9 2 ⋯ u9 t in the multi-trait context aŝ u ¼ c 21 W9½ðI t 5GÞ þ ðN5I n Þ 21 ðy 2 m51Þ; where W ¼ I t 5X, "5" denotes the Kronecker product (Schott 2005); c and X were defined in Equation (A1); N ¼ RC 21 , R and C are the residual and genotypic covariance matrices for t traits, respectively; y9 ¼ ½ y9 1 y9 2 ⋯ y9 t MVN(m, V) is a vector of size 1 · tn, with covariance matrix V ¼ C5G þ R5I n ; I t and I n are identity matrix of size t · t and n · n, respectively; m9 ¼ ½ m 1 m 2 ⋯ m t is a vector 1 · t of means associated with vector y, and 1 is a vector n · 1 of 1's.

Estimating genomic breeding values
We can estimate the vector z9 ¼ ½ z9 1 z9 2 ⋯ z9 t 1 · nt of genomic breeding values for t traits in the testing population aŝ z ¼ Wû: Equation (A4) is the vector of GEBV for the multi-trait case and W ¼ I t 5X (Equation A3). Thus, in the testing population, the only thing that will change in Equation (A4) will be the coded values in matrix X, whileû will be the same in each selection cycle. In Equations (A3) and (A4), we assumed that m, C and R are known.
Estimating the genomic covariance matrix G Letẑ j ¼ Xû j andẑ i ¼ Xû i (Equation A4) be the genomic estimated breeding values (GEBVs) of z j ¼ X i u j and z i ¼ Xu i (Equation A1), respectively, and denote bym j andm i the arithmetic means of the values ofẑ j andẑ i . We can estimate matrix G (Equation A2) when there is no phenotypic information, asĜ ¼ Èŝ . ., t); g is the number of genotypes; 1 is a g · 1 vector of 1's and G ¼ c 21 XX9 is the additive genomic relationship matrix (Ceron-Rojas et al., 2015, for details).
To obtain the CLGSI vector of coefficients (b) that maximizes Equations (2) and (3), we will minimize the mean squared difference between H ¼ w9g and I ¼ b 0 z (E½ðH2IÞ 2 ) with respect to b under the restriction D9U9Gb ¼ 0.
Let M9 ¼ D9U9G and suppose that G, U9, D9 and w are known. To minimize E½ðH2IÞ 2 under the restriction M9b ¼ 0, we need to minimize the function Fðb; vÞ ¼ b9Gb 2 2w9Gb þ 2v9M9b (A6) with respect to vectors b and v9 ¼ ½ v 1 v 2 ⋯ v r21 , where v is a vector of Lagrange multipliers. Equation (A6) derivative results are: , from where the vector that minimizes E½ðH2IÞ 2 under the restriction M9b ¼ 0 is where K ¼ ½I t 2 Q, Q ¼ UDðD9U9GUDÞ 21 D9U9G and I t is an identity matrix of size t · t. If in Equation (A7), d i ¼ 0 for all traits, then D ¼ U and the CLGSI will be similar to the Kempthorne and Nordskog (1959) index, but in the genomic selection context. When D ¼ U and U9 is a null matrix, b ¼ w, the vector of economic weights or the unconstrained LGSI vector of coefficients (Ceron-Rojas et al., 2015). Thus, the CLGSI is more general than the LGSI and includes the LGSI as a particular case.
Estimating the maximized CLGSI selection response and expected genetic gain per trait The estimated maximized CLGSI selection response (Equation 5) iŝ while the estimated optimized CLGSI expected genetic gain per trait (Equation 3) iŝ

Maximized true selection responses
We obtained the maximized true selection response for the unconstrained linear phenotypic and genomic selection indices (LPSI and LGSI, respectively) as respectively, where matrices C and G are the covariance matrix of true genotype values associated with the phenotypic (C) and the genomic values (G, Equation A2) in each selection cycle, k is the selection intensity and w is a vector of economic weights. In addition, the maximized true selection responses for CLPSI and CLGSI are respectively, where b C ¼ K C w and b G ¼ K G w are the vector of coefficients; K C ¼ ½I t 2 Q C , K G ¼ ½I t 2 Q G , Q C ¼ UDðD9U9CUDÞ 21 D9U9C, and Q G ¼ UDðD9U9GUDÞ 21 D9U9G (Equation A7). Matrices C and G were defined in Equation (A11a). We obtained these last parameters only for the simulated datasets.

APPENDIX 3
The U9 matrix of restrictions In addition to matrix D, to obtain the CLGSI (CLPSI) vector of coefficients, we need the matrix U9 ðt 2 rÞ · t(t ¼ number of traits; r ¼number of traits restricted), which is a matrix of 1's and 0's (1 indicates that the trait is restricted and 0 that the trait is not restricted), and depends on the number of restricted traits. Thus, suppose that we restrict only one of t traits; then, we can restrict the first of them as U9 ¼ ½ 1 0 0 ⋯ 0 , the second as U9 ¼ ½ 0 1 0 ⋯ 0 , the third as U9 ¼ ½ 0 0 1 ⋯ 0 , etc. When we restrict two of t traits, matrix U9 could be constructed as follows. We can restrict the first and second traits as