The marker effects of a single-step random regression model for four test-day traits in German Holstein

Single-step genomic model has become the golden standard for routine evaluation in livestock species, like Holstein dairy cattle. The single-step genomic model with direct estimation of marker effects has been proven to be efficient in accurately accounting for millions of genotype records. For diverse applications including frequent genomic evaluation updates on a weekly basis, estimates of the marker effects from the single-step evaluations play a central role in genomic prediction. In this study we focused on exploring the marker effect estimates from the single-step evaluation. Phenotypic, genotypic and pedigree data were taken from the official evaluation for German dairy breeds in April 2021. A multi-lactation random regression test-day model was applied to more than 242 million test-day records separately for 4 traits: milk, fat and protein yields and somatic cell scores. Nearly one million genotyped Holstein animals were considered in the single-step genomic evaluations including about 21 million animals in pedigree. Deregressed multiple across country breeding values of Holstein bulls having daughters outside Germany were integrated into the national test-day data to increase the reliability of genomic breeding values. To assess the stability and bias of the marker effects of the single-step model, test-day records of the last 4 years were deleted and the integrated bulls born in the last 4 years were truncated from the complete phenotypic data set. Estimates of the marker effects were shown to be highly correlated, with correlations around 0.9, between the full and truncated evaluations. Regression slope values of the marker effect estimates from the full on the truncated evaluations were all close to their expected value, being about 1.03. Calculated us-ing random regression coefficients of the marker effect estimates, drastically different shapes of genetic lactation curve were seen for 2 markers on chromosome 14 for the 4 test-day traits. The contribution of individual chromosomes to the total additive genetic variances seemed to follow the polygenic inheritance mode for protein yield and somatic cell scores. However, chromosome 14 was found to make an exceptionally large contribution to the total additive genetic variance for milk and fat yields because of markers near the major gene DGAT1. For the first lactation test-day traits, we obtained nearly zero correlations of chromosomal direct genomic values between any pair of the chromosomes; no spurious correlations were found in our analysis, thanks to the large reference population. For trait milk yield, chromosomal direct genomic values appeared to have a large variation in the between-lactation correlations among the chromosomes, especially between first and second / third lactations. The optimal features of the random regression test-day model and the single-step marker model allowed us to track the differences in the shapes of genetic lactation curves down to the individual markers. Furthermore, the single-step random regression test-day model enabled us to better understand the inheritance mode of the yield traits and somatic cell scores, e.g., variable chromosomal contributions to the total additive genetic variance and to the genetic correlations between lactations.


INTRODUCTION
Genomic selection using single nucleotide polymorphism (SNP) information has revolutionized dairy cattle breeding (VanRaden, 2008) based on the concept of the genomic model (Meuwissen et al., 2001).A single-step genomic BLUP model (ssGBLUP; Aguilar et al., 2010;Christensen and Lund 2010) utilizes simultaneously all phenotypic, genotypic and pedigree data to conduct genomic prediction.Therefore, ssGBLUP model has become the golden standard for routine genomic evaluation in livestock species like Holstein.In comparison to the ssGBLUP model, a single-step SNP BLUP model (ssSNPBLUP; Liu et al., 2014) estimates additive genetic effects of the SNP markers jointly with all other effects of the single-step model The marker effects of a single-step random regression model for four test-day traits in German Holstein and has been proven to be most efficient for large-scale genomic evaluation in livestock (Vandenplas et al., 2019;2021;Osawa et al., 2022;Alkhoder et al., 2022).The SNP effects estimated with the phenotypic data in genotyped animals play a central role for predicting genomic breeding values (GEBV) of particularly young selection candidates without their own performance.Moreover, frequent genomic evaluation such as weekly genomic evaluation in German Holstein requires accurate estimates of the SNP marker effects.
Before the genomic selection, genetic improvement was mainly based on conventional genetic evaluation that focused on optimal modeling of diverse types of phenotype data.For example, test-day milk yields were analyzed with a so-called random regression test-day model (RRTDM, Jamrozik et al., 1997;Schaeffer et al., 2000;Liu et al., 2004).However, the complex conventional model like RRTDM has not been able to be considered in the current 2-step genomic evaluation which usually used a single-trait or single-effect genomic model (Alkhoder et al., 2022).Oliveira et al. (2019) applied the ssGBLUP model to the test-day traits of Canadian dairy cattle breeds using a multiple-trait RRTDM model for their single-step genomic evaluation.Alkhoder et al. (2022a) conducted a single-step evaluation of test-day traits in German dairy cattle using a ssSNPBLUP RRTDM model.Both studies demonstrated the combined advantages of the single-step model as well as the RRTDM model.
According to the theory of the RRTDM model, additive genetic effects of the SNP markers from the ssSNPBLUP model are expressed in form of random regression coefficients (RRC).For the multiple-lactation RRTDM model (Liu et al., 2004;Alkhoder et al., 2022), every SNP marker received estimates of the RRC effects for each of the lactations, with which SNP effects on a 305-d lactation basis can be calculated thereafter.To our knowledge, little has been investigated in the literature yet about the properties of the SNP effects in the forms of RRC for milk production traits in dairy cattle.Furthermore, we would like to learn how the SNP effects change between genomic evaluations across time.
The objectives of this study were 1) to assess the stability and bias of the SNP effect estimates of the ssSNPBLUP model, 2) to evaluate the shape of genetic lactation curves by individual SNP markers, and 3) to compare chromosomal contributions to the additive genetic variance and genetic correlations between lactations.

A single-step SNP BLUP multiple-lactation random regression test-day model
Test-day milk yields (MKG), fat yields (FKG), protein yields (PKG) or somatic cell scores (SCS) from first 3 lactations of a cow were analyzed with the following RRTDM model for dairy cattle in Germany (Liu et al., 2004): where y is a vector of the cow's test-day yields or SCS in first 3 lactations that have been pre-adjusted for heterogeneous variances (Reents et al., 1998), h is a vector of fixed effects of herd-test-date-parity-milking-frequency associated with her test-day records, f is a vector of fixed lactation curve effects, modeled as regressions on days in milk (DIM) using Wilmink function (Liu et. al., 2004), p is a vector of permanent environmental effects that are expressed as RRC, u is a vector of additive genetic effects or breeding values of the cow also as RRC, and e is a vector of random residuals.For modeling the effect u, normalized orthogonal Legendre polynomials with 3 parameters are used, e.g., for the additive genetic effects: with values ranging from −1 to 1 and DIM with the range x ~ [5,305] (Liu et al., 2004) and u i is the ith RRC (i = 1,2,3) of breeding value as regression coefficient of variable v on test-day yield or SCS.The RRC u 1 , u 2 and u 3 are the intercept, slope, and a quadratic term, respectively.The same Legendre polynomials were used for effect p. Design matrix V for either u or p contains the Legendre coefficients for the first 3 lactations of the cow.X h is the incidence matrix for the fixed herd-test-date-parity-milking-frequency effects associated with this cow's test-day records h, X f is the design matrix for the fixed lactation curve effects f.The joint distribution of the 3 RRC in first 3 lactation for the additive genetic effects of the cow is where V i denotes the design matrix for this cow's test-day records in lactation i (i = 1,2,3) and U 0_ij the genetic covariance matrix of RRC between lactation i and j, which is defined as where and σ kl ij _ represent additive genetic variance of k-th RRC (k = 1,2,3) and covariance between k-th and l-th (l = 1,2,3) RRC for the lactation pair i and j, respectively.Similar joint distribution is also defined for the permanent environmental RRC.The additive genetic and permanent environment effects are correlated between the lactations.However, no residual correlations are assumed between the lactations for the multi-lactation test-day model.
Table 1 shows genetic parameters of the multi-lactation RRTDM model for test-day MKG.Genetic parameters for the other 3 test-day traits can be found in Liu et al. (2004).The first genetic RRC, the intercept, has the highest heritability value, among the 3 RRCs, for any of the lactations.A negative genetic correlation is seen between the first and third RRC, the quadratic term of the Legendre polynomials, in each of the lactations.On the 305-d lactation basis, the lactations are genetically highly correlated, particularly between the second and third lactation.Phenotypic correlations of 305-d yields between the lactations in Table 1 are clearly lower than the corresponding genetic correlations.
According to the ssSNPBLUP model (Liu et al., 2014), genomic breeding values of all genotyped animals, u all, sorted by RRCs within animal, are further decomposed into 2 components: u all = Zg + a, [5] where matrix Z is a design matrix containing regression coefficients on animals' genotype data (Liu et al., 2016), and g is a vector of all SNP marker effects also in form of RRC as u all with the joint distribution: where k represents percentage of additive genetic variance not explained by the SNP markers, ⊗ denotes Kronecker product, and matrix B being where m represents the number of SNP markers, and p j is allele frequency of the marker i of the current population, and I is the identity matrix.The vector a is a vector of residual polygenic effects (RPG) for all where A gg represents pedigree relationship matrix for all the genotyped animals.It was assumed that RPG explained 30% additive genetic variance, k = 0.3, for any of the test-day traits (Alkhoder et al., 2022).A total of 45,613 SNP markers, m = 45,613, were selected for genomic evaluation in German Holstein, covering all 31 chromosomes including 29 autosomes and the sex chromosomes.The selection of the SNP marker set was primarily based on the minor allele frequency with a minimum of 0.01 in the German Holstein genotyped population.At the time of setting the SNP marker set, Illumina Bovine SNP 50K chips were used as the standard chip for German genomic evaluation.Table 2 gives numbers of SNP markers selected for all the chromosomes for genomic evaluation of German Holstein.
In the German dairy cattle evaluation (Liu et al., 2004), breeding values on 305-d basis for the 3 yield traits or on an average daily basis for SCS were weighted across 3 lactations and were defined as the official breeding values (u comb ): where w l represents the weight for lactation l, u l 305 305d lactation breeding value of lactation l, u lj breeding value of j-th RRC of lactation l, and v i the standardized day in milk i.
The official breeding values expressed on the combined lactation basis were submitted to Interbull's bull multiple across country evaluation (MACE).Deregressed MACE estimated breeding values (EBV) of bulls from the MACE evaluation are analyzed with a single-trait animal model: [10] where y m represents deregressed EBV (DRP) of the bulls for a MACE trait, like MKG, FKG, PKG or SCS, µ is a general mean, 1 is a vector of 1's, u m is s vector of breeding values of the bulls for the MACE trait and e m is a vector of residual effects of the bulls for the MACE trait with where σ e m 2 is residual variance of the MACE trait, matrix D is a diagonal matrix containing reciprocal values of effective daughter contribution (EDC) of the bulls expressed on an animal-model basis.For genotyped MACE bulls, then their GEBV can be decomposed as where g m is a vector of SNP effects for the MACE trait, Z m is a design matrix containing regression coefficients on the bulls' genotype data, a m is a vector of RPG effects of the bulls.The RPG effects of the genotyped bulls with MACE data have a joint distribution: where A mm represents pedigree relationship matrix for the genotyped MACE bulls and σ m 2 is additive genetic variance of the MACE trait which is equal to variance of the official breeding values u comb .Like Equation [6], joint distribution of additive genetic effects of the SNP markers for the MACE trait is To avoid double counting the contribution of domestic daughters to the bulls evaluated in MACE, adjustments were carried out for both EDC and DRP of the bull (Alkhoder et al., 2022).Because the MACE trait is the German official breeding value expressed on a combined lactation basis, its genetic correlations with the national genetic RRC were derived based on the genetic (co)variance matrix of u.Table 1 shows heritability values, genetic and phenotypic correlations for the genetic RRCs and 305-d lactations.

Direct genomic breeding values per SNP marker and per chromosome
For a genotyped animal i, direct genomic breeding value (DGV) for a selected trait is calculated, according to Liu et al. (2016), with where d i denotes DGV of the animal i for the chosen trait, g j is additive genetic effect of SNP marker j describing the substitution effect of allele A, p j is observed frequency of allele A of the j-th marker calculated from the current genotyped population, M = 45,613 is the total number of fitted SNP markers, and n ij = 0,1,2 is the number of copies of allele A in genotype BB, AB or AA, respectively.The term n ij − 2p j equals z ij defined in the design matrices of genotyped animals in Equations [5] and [12].For a specific SNP marker j, this animal's DGV can be defined, following Liu et al. (2016), as Similarly for a given chromosome c, we define DGV of this animal i as where M c1 and M cn represent starting and ending SNP marker on the chromosome c.For the genotyped animal, d i , d ic , and d ij represent genome-wide, chromosome-wide and SNP marker specific DGV, respectively.

Breeding values of the SNP markers
Same as the GEBV or DGV of evaluated animals, the SNP marker effects g in Equation [5] are expressed in RRC as well (Equation [2]).For a given SNP marker j, its 305-d lactation breeding value for lactation l is computed as where L jl represents 305-d lactation breeding value of the j-th SNP marker for lactation l (l = 1,2,3), g kl is the SNP effect of the k-th RRC (k = 1,2,3) of lactation l, and c 1 = 305, c 2 = 0, c 3 = 2.273 (Liu et al., 2004).For SCS, lactation breeding values are defined as average of breeding values of DIM 5 to DIM 305, instead of the sum of breeding values of all DIMs as for the 3 yield traits.A combined lactation breeding value for SNP marker j is a weighted average of its 305-d lactation breeding values of the first 3 lactations: where C j represents the combined lactation breeding value of SNP marker j, w l denotes relative weight on lactation l, and w 1 + w 2 + w 3 = 1.For the 3 yield traits, the 3 lactations had equal weights w w w whereas for SCS the weights are 0.26, 0.37 and 0.37 for first, second and third lactation 305-d breeding values, respectively.
Like the expression of the SNP effects on a 305-d lactation basis, marker DGV of the diverse expression forms can be expressed on different bases: SNP specific DGV calculated with Equation [16] for a given marker for a lactation, chromosome-wide DGV of a chromosome for lactation l using Equation [18], and genomewide DGV on a combined lactation basis with Equation [19] .

Phenotypic, genotypic and pedigree data for the single-step evaluation
Phenotypic, genotypic and pedigree data were obtained from the official April 2021 evaluation for German dairy cattle (Alkhoder et al., 2022).records from 12,432,940 cows of the breeds, Holstein, Red Dairy Cattle and Jersey, were analyzed together with 138,770 MACE Holstein bulls that had daughters outside Germany.The tot/al number of national cows with test-day data and integrated bulls with foreign daughters amounted to 12,571,710.All genotyped Holstein animals, including culled animals, were jointly evaluated with those animals with phenotype data, and the total number of genotyped Holstein animals was 949,636 for the April 2021 evaluation.A maximum number of 20 generations was set to trace ancestors of the genotyped or phenotyped animals.Additionally, the oldest bulls with daughters or cows with records were guaranteed to have at least 3 generations of ancestors.The pedigree file for the single-step evaluation contained 20,461,400 real animals and 176 phantom parent groups that were defined according to breed, country of origin, 4 selection paths and birth year of animals having parents missing.There were 130, 20, and 26 phantom parent groups for the breeds Holstein, Red Dairy Cattle and Jersey, respectively.Among all the cows with test-day records, 293,724 cows had genotype data.In total, 43,699 genotyped Holstein bulls had daughters either in Germany or in foreign countries included in the MACE evaluation.

A genomic validation for the single-step evaluation
Test-day records from last 4 test years, 2017 to 2021, were removed to simulate a genomic evaluation 4 years ago.Owing to the lack of MACE EBV from a truncated MACE evaluation, bull MACE evaluation from April 2021 were used instead for the validation study.The 4 youngest birth years of the bulls were deleted from the MACE data.As an extra step, national daughters of the validation bulls were additionally deleted from the truncated test-day data, if there were any daughters left in the truncated test-day data set.
All the genotype records used in the full evaluation were also considered in the truncated genomic evaluation.After the data truncation, 222,634,210 test-day records from 10,903,891 cows and 128,504 bulls with foreign daughters remained in the phenotype data for the validation study (Table 3).The numbers of genotyped bulls with daughters or genotyped cows with test-day records were reduced to 37,283 from 43,699 or to 112,759 from 293,724, respectively.Due to the routine whole-herd female animal genotyping service started in Germany in 2016, the number of genotyped female animals is much higher than that of genotyped male animals.Furthermore, more than 60% genotyped cows with test-day records from the last 4 years were removed for the validation study.Figure 1 shows the numbers of cows with test-day records or with genotype records in the full or truncated evaluations.Because of the relatively short history of the large-scale genotyping service in Germany, the number of genotyped cows with test-day records was significantly lower in the truncated data set than in the full data set, but still at a high level.

RESULTS AND DISCUSSION
Our SNP markers in Table 2 contained a total of 866 markers on the sex chromosomes: 773 on X and 93 on Y chromosome.This set of SNP markers was set up at the beginning of genomic selection based on a 2-step genomic model and a bull reference population.As argued by Fernando and Grossmann (1990), the transmission of markers on the sex chromosomes was not correctly modeled by the pedigree relationship matrix A. As Druet and Legarra (2020) pointed out, a bull does not transmit the non-pseudo-autosomal sex chromosome X to his male progeny, therefore, using the pedigree matrix A to model the transmission of sex chromosomes is not correct.Removal of the SNP markers on the sex chromosomes from routine single-step evaluation is planned for German Holstein.
However, the SNP markers on the sex chromosomes were included in this investigation.Because of most of the genotyped animals being females (Table 3), the impact of the inadequate modeling of the SNP markers on the X chromosomes in the genotyped sires should be limited.Single-step evaluations for the test-day traits with the full or truncated data sets were conducted using the software package MiX99 (Strandén and Lidauer, 1999), in which the ssSNPBLUP by Liu et al. (2014) was implemented in a special way (Esa Mäntysaari, personal communication).In addition, we applied the software package MiXBLUP (Ten Napel et al., 2020) to the same data sets.Approximately equal execution time was needed for either of the software packages to reach the pre-defined convergence criteria.However, MiXBLUP needed only a maximum RAM of 135Gb for the full data evaluation, but MiX99 required 358Gb for the same data set due to the difference in the implementation of the ssSNPBLUP model.Because identical effect estimates were obtained with the 2 software packages, we analyzed only the solutions from MiX99 for further analyses.For the single-step evaluation of trait MKG, the total number of all estimated effects was 336,548,724 with the full data set (Table 3).The total clock time per round of iteration was 1.63 min using 17 threads on a Linux server with 2x10 cores of Intel® Xeon® CPU E5-2690 v2 @3.00Ghz.Based on the convergence criteria CD or CR (Strandén and Lidauer, 1999), reasonably accurate solutions could be obtained within approximately 2000 rounds of iteration.The single-step model gave higher correlation of the SNP effect estimates between the full and truncated validation data sets than the current multi-step genomic model.Regression coefficient of the SNP effect estimates, expressed on the combined lactation basis, from the full on the truncated evaluation was close to 1 for the single-step model.Based on the results of the GEBV test on German validation bulls, the single-step model was proven to give unbiased genomic prediction for the 4 test-day traits.More details about the rate of convergence and GEBV validation results can be found in the study by Alkhoder et al. (2022).

Correlation and regression of the SNP effect estimates between the evaluations
Between the full and truncated single-step evaluations, estimates of SNP effects or marker breeding values in different expression forms were compared.First, SNP effect estimates on the combined lactation basis (Equation [19]) from the full evaluation were regressed on those from the truncated evaluation.Figure 2 shows observed correlation and regression coefficients of the SNP effect estimates in the form of combined lactation for the 4 test-day traits.The correlations of the SNP effect estimates are high, ranging from 0.895 for trait protein to 0.935 for SCS, despite the large difference in the number of genotyped and phenotyped cows with test-day records and bulls with daughters (Table 3) between the full and truncated evaluations.The regression slopes of the SNP effect estimates are all close to their expected value of 1.Because standard errors of the regression slope estimates are about 0.02, none of the slope estimates deviate significantly from their expected value.These regression slope coefficients indicate neither an inflation nor a deflation of the SNP effect estimates.

Correlation and regression of the SNP effect estimates in the RRC form.
Second, we compared the SNP effect estimates or marker breeding value estimates also in RRC form.Table 4 shows correlations of the SNP marker effect estimates between the full and truncated evaluation as well as regression slopes of the SNP marker effect estimates of the full on truncated evaluation for the testday trait MKG.The SNP effect estimates in the RRC form were highly correlated between the 2 evaluations, all being 0.88 or higher.The quadratic term of the Legendre polynomials was the least correlated among the 3 RRC coefficients, with the lowest correlation of 0.88 for the quadratic term of first lactation MKG.The highest correlation was seen for the slope term of the Legendre polynomials among all the 3 terms.Overall, the SNP effect estimates of first lactation were correlated between the 2 evaluations with the lowest correlation value among all the 3 lactations, possibly caused by the largest increase in the number of genotyped cows with first lactation test-day records between the full and truncated evaluations.Regarding the regression slopes of the SNP effect estimates in the RRC form, their estimates ranged from 0.983 for the quadratic term of the Legendre polynomials for first lactation to 1.032 for the intercept term for second lactation MKG.In general, all the SNP effect estimates in the RRC form were unbiased, because of the standard error of the regression slope being about 0.02.It appeared that the quadratic term of the SNP effect estimates has lower regression slope value than the other 2 terms, all slightly below 1.Similar results of the SNP effect estimates in the RRC form were found also for the other test-day traits, FKG, PKG and SCS, with little difference to those for the trait MKG.

Correlation and regression of the SNP effect estimates in the form of 305-d lactation.
Third, we examined the consistency of the SNP effect estimates or marker breeding value estimates on the 305-d lactation basis for the trait MKG.The correlations of the SNP effect estimates between the full and truncated evaluations were between 0.89 for first lactation to 0.92 for the second lactation, being close to the correlations of the SNP effect estimates in the RRC form.The combined lactation SNP effect estimates had a correlation of 0.91 between the full and truncated evaluations.Little difference in the SNP effect estimates in the form of 305-d lactation existed in the other traits, FKG, PKG and SCS, as the trait MKG.
Genetic correlations between the 305-d lactation and the RRC terms in Table 1 were close to 1 for the in- Table 4 also shows the regression slope values of the SNP effect estimates in the 305-d lactation form from the full evaluation on the SNP effect estimates in the RRC form from the truncated evaluation.They are also close to their expected value of 1.The regression slopes of the 305-d lactation SNP effect estimates from the full evaluation on the truncated evaluation are around 1.03 for each of the 3 lactations.On the combined 305-d lactation basis, the regression slope of the SNP effect estimates is 1.031.All the regression slope values suggest an unbiased estimation of the SNP effect estimates for the trait MKG, which was the case for all the other traits.
With more phenotypic data added to single-step evaluation, variance of the SNP effect estimates or the marker breeding value estimates is expected to increase.Table 4 shows also the ratio of standard deviations of the SNP effect estimates, in either RRC or 305-d lactation forms, of the full evaluation over the truncated evaluation.We can see that there is a 14% increase in standard deviation of the SNP effect estimates for the combined 305-d lactation, and a 15%, 13% and 12% increase for the first, second and third lactation respectively.Looking at the SNP effects in the RRC form, we observe that the variance increase is largest for the intercept term, followed by the slope and quadratic term of the Legendre polynomials.The positive correlation of the SNP effect estimates between the evaluations and the higher variance of the SNP effect estimates in the full evaluation mean that the SNP markers of larger effect will receive even larger estimates over time and those of small effect will deviate further from zero when more phenotypic data will be added to the single-step evaluation.Oliveira et al. (2019) demonstrated that the ssGB-LUP model was able to model the genetic lactation curves for GEBV of individual animals.Comparing to the ssGBLUP model that can also estimate SNP effects by back-solving GEBV, our ssSNPBLUP (Liu et al., 2014) enables direct modeling of not only animal GE-BVs but also the SNP effects in genetic lactation curves using the Legendre polynomials.Figure 3 and Figure 4 show genetic lactation curves for 2 selected SNP markers, both near the DGAT1 gene on chromosome 14, one with the largest positive effect and the other with the largest negative effect on trait MKG from the full single-step evaluation, respectively.

Marker specific genetic lactation curves
For the SNP marker, ARS-BFGL-NGS-4939 with a frequency of allele A being 0.73 and a large positive effect of allele A on trait MKG, genetic lactation curve shapes appear to differ between the trait MKG or PKG in concave form on one hand and FKG or SCS in convex form on the other hand.But no difference in the shape of the genetic curves is observed across the first 3 lactations for any of the traits.This SNP marker seems to have a biologically favorable effect on MKG and PKG and unfavorable effect on FKG and SCS.  Figure 4 shows genetic lactation curves of the SNP marker near the DGAT1 gene on chromosome 14, Hapmap30381-BTC-005750 with a frequency of allele A being 0.81 and a large negative effect of allele A on trait MKG.In contrast to Figure 3 for the SNP marker with a large positive effect on MKG, this SNP marker has a convex form of genetic lactation curve for traits MKG and PKG but a concave form for FKG and SCS.No difference in the shape of the genetic curves is observed across the lactations for any of the traits.
Furthermore, we compared the genetic lactation curves of the 2 SNP markers between the full and truncated evaluation.No change in the curve shape was seen in any of the traits or lactations.However, the 305-d lactation breeding values of the 2 SNP markers increased from the truncated to the full evaluation with more phenotype data added.This is indeed expected: the SNP effect estimates differentiated more from one another when more phenotype data were used in the genomic evaluation.
We also examined genetic lactation curves for 2 genotyped female animals, one with extremely high and the other with extremely low GEBV for trait MKG on the combined lactation basis.At the chromosomal level, the superior female animal had clearly higher GEBV average across all chromosomes than the inferior one, especially for chromosome 14.GEBV curve of first lactation for the superior female had a large positive intercept term but a concave form for chromosome 14, whereas the inferior female animal had a GEBV lactation curve, for chromosome 14, with a large negative intercept term and in convex form for MKG in first lactation.

Additive genetic variances by chromosomes
For all 293,724 genotyped cows with test-day records and 43,699 genotyped bulls with daughters in the full single-step evaluation, we computed, using Equation [17], their chromosomal DGV of each chromosome for all the 3 lactations of the 4 test-day traits.Then we calculated variances of the chromosomal DGV in the genomic reference population for each trait and lactation combination.Figure 5 shows proportions of additive genetic variances contributed by all the chromosomes for first 3 lactations and combined lactation of the 4 test-day traits, calculated as the variance of chromosome DGV divided by their sum.Due to the low number of SNP markers, 93, on the sex chromosome Y (Table 2), we plotted variance contributions only for 29 autosomes and the sex chromosome X labeled as chromosome 30.
Because of the assumption of equal SNP marker variances under the ssSNPBLUP model, the additive genetic variances contributed by chromosomes in Figure 5 should be highly correlated with the numbers of SNP markers on the chromosomes (Table 2).This can be indeed seen for trait PKG or SCS.However, for traits MKG and FKG heavily influenced by the major gene DGAT1, chromosome 14 harboring DGAT1 seems to contribute a significantly higher proportion of additive genetic variance, c.a. Fifteen to 25% for FKG and 25 to 30% for MKG, in the first 3 lactations and the combined lactation.Chromosome 14 contributes the lowest percentage of additive genetic variance for first lactation MKG (MKGlact1) and FKG (FKGlact1) among the 3 lactations.Chromosome 5 also appears to explain more additive genetic variances than expected based on the number of SNP markers for both traits MKG and FKG, likewise chromosome 6 for trait MKG and chromosome 2 for FKG.Traits PKG and SCS seem to have a more polygenic inheritance than MKG or FKG, i.e., additive genetic variances contributed by the chromosomes are in proportion to the numbers of SNP markers on the chromosomes.However, chromosomes 1, 3, 5, 6, 14 or 15 seem to contribute a little bit more than expected from the numbers of SNP markers, above 4% of the total additive genetic variance each for traits PKG or SCS.

Correlations of DGV between chromosomes
The genomic model by Meuwissen et al. (2001) treats all the SNP markers theoretically in the same way by making no distinguishment between closely linked markers on the same chromosome and SNP markers located on different chromosomes.When the number of genotyped and phenotyped animals is not sufficiently large enough, spurious correlations might be shown up between the SNP marker effect estimates, even for markers located on different chromosomes.To verify if the SNP effect estimates are not correlated between the markers located on different chromosomes, we show in Figure 6 the correlations of chromosomal DGVs of any chromosome with all the other 30 chromosomes for first lactation of the 4 traits based on the reference animals from the full single-step evaluation.The correlation of chromosomal DGV of any chromosome with itself has been excluded.It can be seen in Figure 6 that all the correlations of chromosomal DGVs between any pair of chromosomes are between −0.08 and 0.08 for all the traits in first lactation.No systematic pattern in the correlation structure can be identified in Figure 6.These low correlations of chromosomal DGV between the chromosome pairs demonstrate no spurious association of the SNP effect estimates of the markers located on different chromosomes.Additionally, nearly zero correlations between the chromosomal DGV were also found for the later lactations of the 4 test-day traits.

Correlations of chromosomal DGV between lactations
For first 3 lactations of trait MKG, we calculated correlations of chromosomal DGV between any pair of the lactations using the reference animals from the full single-step evaluation.Because of the small number of SNP markers on chromosome Y, we did not consider this sex chromosome in this calculation.Figure 7 shows the between-lactation correlations of chromosomal DGV for each chromosome.At the genome level, genetic correlation between first and second lactation is  1), we can see that there is a considerable variation in the genetic correlations between the 2 lactations on the chromosome level (Lact1-2, blue line), ranging from 0.79 for chromosome 22 to 0.98 for chromosome 14.However, average of the lactation correlations across all chromosomes is close to the assumed genetic correlation of 0.85 on the genome level.Between first and third lactations we observe a slightly higher variation in the between-lactation correlations (Lact1-3, red line) among all the chromosomes than between first and second lactations.On average, the chromosomal correlations between first and third lactations are a little lower than between first and second lactations, which is also the case for the genetic correlation between lactations on genome level shown in Table 1.Among all the chromosomes, we see that chromosome 14 has the highest correlation of chromosomal DGV between first and third lactations.For the pair of second and third lactations, genetic correlation is assumed to be 0.97 on the genome level (Table 1) and we see in Figure 7 that the correlations of chromosomal DGVs between the 2 lactations (Lact2-3, green line) are all between 0.95 and 0.99.The highest correlation between second and third lactation MKG is found again for chromosome 14, likely caused by the major gene DGAT1.

CONCLUSIONS
The single-step random regression test-day model was applied to more than 242 million test-day records and nearly 1 million genotype records in the Holstein genomic evaluation of 4 test-day traits.The SNP effects, directly estimated with all the other effects of the single-step SNP BLUP model, were proven to be accurate and unbiased, via a genomic validation study with a truncated data set.Marker specific genetic lactation curves obtained from the single-step random regression model were shown to differ in the shape of lactation curves not only between the markers but also between the test-day traits.Regarding the chromosomal contribution to the total additive genetic variance, protein yield and somatic cell scores seemed to have a pure polygenic inheritance mode, whereas chromosome 14 made a much larger contribution to the total additive genetic variance for milk and fat yields.Due to the very large genotype and phenotype data sets, chromosomal direct genomic values were almost uncorrelated between any pair of chromosomes, and no spurious correlation was observed.The chromosomes seemed to differ evidently in the correlation of direct genomic values between the lactations.Combining properties of the single-step SNP BLUP model with the random regression test-day model allowed us to explore the SNP effect estimates for a more accurate genomic selection at the level of chromosomes or markers.

Figure 1 .
Figure 1.Numbers of cows with test-day records or with genotype data by birth year for the full and truncated single-step evaluations

Figure 2 .
Figure 2. Correlation and regression values of the SNP effect estimates, expressed on the combined 305-d lactation basis, of the full and truncated single-step evaluations for the 4 test-day traits Alkhoder et al.: MARKER EFFECTS OF A RANDOM REGRESSION MODEL

Figure 3 .
Figure3.Genetic lactation curves for a SNP marker, ARS-BFGL-NGS-4939, near DGAT1 gene on chromosome 14 that has a large positive effect on milk yield (MKGlact2 for milk yield in second lactation, PKGlact3 for protein yield in third lactation, etc., units in kg or somatic cell scores)

Figure 4 .
Figure 4. Genetic lactation curves for a SNP marker, Hapmap30381-BTC-005750, near DGAT1 gene on chromosome 14 that has a large negative effect on milk yield (FKGlact1 for fat yield in first lactation, SCSlact2 for somatic cell scores in second lactation, etc., units in kg or somatic cell scores)

Figure 5 .Figure 7 .
Figure 5. Additive genetic variances contributed by the chromosomes for first 3 lactations (MKGlact1 for milk yield of first lactation, etc.) and combined lactation (PKGcombi for protein yield of combined lactation, etc.) of the 4 test-day traits

Table 1 .
Alkhoder et al.: MARKER EFFECTS OF A RANDOM REGRESSION MODEL Genetic parameters of the single-step random regression test-day model used in the genomic evaluation of milk yields for German dairy breeds

Table 2 .
Number of SNP markers by chromosome selected for genomic evaluation in German Holstein Alkhoder et al.: MARKER EFFECTS OF A RANDOM REGRESSION MODEL Table 3 summarizes the genotype and phenotype data used for the single-step evaluation.A total of 242,121,126 test-day Alkhoder et al.: MARKER EFFECTS OF A RANDOM REGRESSION MODEL

Table 3 .
Alkhoder et al.: MARKER EFFECTS OF A RANDOM REGRESSION MODEL Description of the phenotypic and genotypic data sets for the full and truncated single-step evaluations for trait milk yield in German dairy cattle

Table 4 .
Correlation and regression values of the marker effect estimates of trait milk yield between the full and the truncated single-step evaluations Correlation of marker effect estimates between the full and truncated evaluations