Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs

Improved performance of crossbred animals is partly due to heterosis. One of the major genetic bases of heterosis is dominance, but it is seldom used in pedigree-based genetic evaluation of livestock. Recently, a trivariate genomic best linear unbiased prediction (GBLUP) model including dominance was developed, which can distinguish purebreds from crossbred animals explicitly. The objectives of this study were: (1) methodological, to show that inclusion of marker-based inbreeding accounts for directional dominance and inbreeding depression in purebred and crossbred animals, to revisit variance components of additive and dominance genetic effects using this model, and to develop marker-based estimators of genetic correlations between purebred and crossbred animals and of correlations of allele substitution effects between breeds; (2) to evaluate the impact of accounting for dominance effects and inbreeding depression on predictive ability for total number of piglets born (TNB) in a pig dataset composed of two purebred populations and their crossbreds. We also developed an equivalent model that makes the estimation of variance components tractable. For TNB in Danish Landrace and Yorkshire populations and their reciprocal crosses, the estimated proportions of dominance genetic variance to additive genetic variance ranged from 5 to 11%. Genetic correlations between breeding values for purebred and crossbred performances for TNB ranged from 0.79 to 0.95 for Landrace and from 0.43 to 0.54 for Yorkshire across models. The estimated correlation of allele substitution effects between Landrace and Yorkshire was low for purebred performances, but high for crossbred performances. Predictive ability for crossbred animals was similar with or without dominance. The inbreeding depression effect increased predictive ability and the estimated inbreeding depression parameter was more negative for Landrace than for Yorkshire animals and was in between for crossbred animals. 
Methodological developments led to closed-form estimators of inbreeding depression, variance components and correlations that can be easily interpreted in a quantitative genetics context. Our results confirm that genetic correlations of breeding values between purebred and crossbred performances within breed are positive and moderate. Inclusion of dominance in the GBLUP model does not improve predictive ability for crossbred animals, whereas inclusion of inbreeding depression does.


Background
Crossbreeding is primarily and intensively applied in meat production systems [1], especially for swine and poultry. Crossbreeding capitalizes on heterosis effects and complementarity between breeds, and results in an increased performance of crossbred animals compared to purebred animals [1]. In terminal crossbreeding systems, selection on purebred animals to maximize their crossbred performance is the ultimate goal [2,3]. Due to the existence of genotype-by-environment interaction effects and non-additive genetic effects in combination with different allele frequencies in different breeds [3,4], the genetic correlation of breeding values between purebred and crossbred performances ($r_{PC}$) is usually lower than 1 [1,5], and therefore, purebred performance under nucleus conditions may not be an optimal predictor for crossbred performance in commercial animals [4,6].
One of the major genetic bases of heterosis is dominance [7,8]. At the level of gene action, dominance is due to interactions between alleles at the same locus [9]. In pedigree-based genetic evaluation, dominance is rarely included because large-scale datasets that comprise a high proportion of full sibs are required to obtain accurate estimates and because the computational complexity is high [10]. With the recent availability of single nucleotide polymorphism (SNP) information and the development of genomic selection, estimation of the dominance effects of SNPs has become more feasible [11,12].
Genomic evaluation has been successfully used in purebred [13,14] and crossbred populations [15][16][17]. However, these studies generally ignore the dominance effects. A number of studies have been carried out on genomic evaluation including dominance effects using either simulated [18] or real purebred data [9,12].
Recently, several studies [19,20] have tried to extend genomic evaluation including dominance effects from purebred performance to crossbred performance. However, they either used genomic information on purebred animals only [19] or applied a genomic model that assumed that all animals belong to a single population, and thus the variance components were estimated based only on the genotyped crossbred animals [20]. Nevertheless, combining purebred and crossbred information is essential to implement genetic evaluation for crossbred performance [1,19]. Furthermore, because of genotype-by-environment interaction effects and different patterns of linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTL), the effects of SNPs may be breed-specific [21]. To overcome these issues, a trivariate genomic best linear unbiased prediction (GBLUP) model that explicitly distinguishes between purebred and crossbred data and includes dominance was recently developed by Vitezica et al. [22]. This model allows the estimation of different, yet correlated, additive and dominance marker effects in crossbred and purebred individuals. However, the empirical predictive ability of the trivariate GBLUP model has not been evaluated yet.
Thus, the current study had the following objectives: (1) to show how genomic inbreeding can be meaningfully included in GBLUP, even for crossbred animals; (2) to estimate the variance components of additive and dominance genetic effects by using data on total number of piglets born (TNB) in two Danish purebred and one crossbred pig populations using the trivariate GBLUP model; (3) to show how to derive, from variance component estimates, estimated genetic correlations of breeding values between purebred and crossbred performances in each pure breed, and also correlations of allele substitution effects between the two pure breeds; and (4) to evaluate the impact of dominance effects from genomic information on genomic evaluation by comparing accuracies of estimated genomic values in different cross-validation scenarios.

Animals and genotypes
We begin this section with a short presentation of the data used in the study, with the aim of defining the notation for the methodological developments that follow. For this study, all datasets were provided by the Danish Pig Research Centre. Data from three Danish pig populations were analyzed simultaneously: Landrace (L), Yorkshire (Y) and their reciprocal crosses (LY). Only TNB records for the first parity of sows in the three populations were used. In total, there were 2126, 2218 and 5143 genotyped sows with own records on TNB for L, Y and LY, respectively. Instead of using original records, corrected phenotypic values of TNB were used as dependent variables for the trivariate GBLUP model, because the pre-correction for non-genetic effects, such as herd-year-season, month at farrowing, and service sire, was more accurately achieved on a larger dataset (293,339 L, 180,112 Y, and 10,974 LY). Among the crossbred animals, 7407 LY had a Landrace sire and a Yorkshire dam, while 3567 LY had a Yorkshire sire and a Landrace dam; L and Y populations were from nucleus farms and LY from a commercial farm. The litters of purebred sows were both purebred and crossbred litters. The relationships between LY and L and between LY and Y are comparable since, in both cases, parents of the F1 animals are in the purebred datasets; further details about the model used for the pre-correction are in [17]. All the purebred sows had first farrowing dates between 2003 and 2013, while the crossbred sows first farrowed between 2010 and 2013. Only five of these purebred L and Y sows were dams of the LY.
The pedigrees for both purebred and crossbred sows were available and all crossbred animals were traced back to their purebred ancestors until 1994 by the DMU Trace program [23], as was done for the larger dataset used for pre-correction. Consequently, 8227 L, 9851 Y and 5143 LY individuals were in the pedigree. The dataset of pre-corrected TNB records for genotyped individuals is termed "full genomic dataset" throughout the whole paper, and it should not be confused with the larger dataset used to do the pre-correction.
For the "full genomic dataset", purebred sows were genotyped with the Illumina PorcineSNP60 Genotyping BeadChip [24], while the crossbred sows were genotyped with an 8.5 K GGP-Porcine Low Density Illumina Bead SNP chip [25]. SNP quality controls (call rate for individuals ≥80%; call rate for SNPs ≥90%; minor allele frequencies ≥0.01; etc.) were applied on the same dataset in a previous study [26], which provides more details. Then, for the crossbred individuals, imputation from low density to moderate density was done with the software Beagle version 3.3.2 [27], using a joint reference panel of the two pure breeds [26] (imputation accuracies ≥95% in terms of correlation coefficients and ≥99% in terms of correct rates between imputed and true genotypes). Finally, 41,009 SNPs were available for all the recorded purebred and crossbred sows.

Considering genomic inbreeding and heterosis
Inbreeding can be defined as the proportion of homozygous SNPs across all loci for each animal, as suggested by several authors (e.g., [28]). If there is directional dominance causing inbreeding depression [29], then inbreeding should be considered in the genetic evaluation models [30]. Otherwise, using pedigree or marker data, estimates of genetic parameters are inflated [30,31]. In Vitezica et al. [22], genomic inbreeding was fitted as a covariate and, in the current study, we prove this reasoning by using a parametric genomic model, such as a GBLUP.
Theory and evidence of directional dominance (equivalently, inbreeding depression) suggest that dominance effects of genes (here associated with markers) should have a priori a positive value for traits that exhibit inbreeding depression or heterosis. If we call $\mathbf{d}$ the vector of dominance marker effects, the following prior distribution is plausible:

$$E(\mathbf{d}) = \mathbf{1}\mu_d,$$

where $\mu_d$ is the overall mean of dominance effects, which should be positive if there is heterosis due to dominance. A typical model for genomic prediction is that in Toro and Varona [11]:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d} + \mathbf{e}, \tag{1}$$

where $\mathbf{y}$ contains phenotypic values; $\mathbf{X}\boldsymbol{\beta}$ stands for fixed effects and random effects other than additive and dominance effects; $\mathbf{a}$ is the vector of "biological" additive SNP effects and $\mathbf{d}$ is the vector of "biological" dominance SNP effects for each of the markers; matrix $\mathbf{Z}$ has entries 1, 0, −1 for SNP genotypes AA, Aa and aa, respectively, while matrix $\mathbf{W}$ has entries 0, 1, 0 for SNP genotypes AA, Aa and aa, respectively; and $\mathbf{e}$ is the vector of overall random residual effects.
Typically, genetic models require $\mathbf{a}$ and $\mathbf{d}$ to have zero means, which is not true for $\mathbf{d}$ when directional dominance exists. Defining $\mathbf{d}^* = \mathbf{d} - E(\mathbf{d})$, then $E(\mathbf{d}^*) = \mathbf{0}$, and Eq. (1) can be written as:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d}^* + \mathbf{W}\mathbf{1}\mu_d + \mathbf{e}.$$

The term $\mathbf{W}\mathbf{1}\mu_d$ is the expected sum of dominance effects for each individual and is equal to $\mathbf{h}\mu_d$, where $\mathbf{h} = \mathbf{W}\mathbf{1}$ contains the row-sums of $\mathbf{W}$, i.e. individual heterozygosities (note that $\mathbf{W}$ has a value of 1 at the heterozygous loci of an individual). Inbreeding coefficients $\mathbf{f}$ can be calculated as:

$$\mathbf{f} = \mathbf{1} - \mathbf{h}/N,$$

where $N$ is the number of SNPs. Then, the prior means $\mathbf{h}\mu_d$ can be rewritten as:

$$\mathbf{h}\mu_d = (\mathbf{1} - \mathbf{f})N\mu_d = \mathbf{1}N\mu_d + \mathbf{f}(-N\mu_d).$$

The term $\mathbf{1}N\mu_d$ is confounded with the overall mean of the model ($\mu$), while the term $\mathbf{f}(-N\mu_d)$ models the inbreeding depression, and $b = -N\mu_d$ is the inbreeding depression parameter summed over the SNPs, which has to be estimated. Thus, the linear model including genomic inbreeding is, finally:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{f}b + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d}^* + \mathbf{e}.$$

This shows why fitting the overall homozygosity of an individual as a covariate accounts for directional dominance and thereby for inbreeding depression.
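The identity between the heterozygosity term and the inbreeding covariate can be checked numerically. The following is a minimal sketch, assuming genotypes are coded as 0, 1 or 2 copies of allele A (a coding chosen here for illustration, so that Z = genotype − 1); the toy genotypes and the value of `mu_d` are hypothetical.

```python
import numpy as np

def genomic_inbreeding(genotypes):
    """Return W, h and f for an (animals x SNPs) array coded 0/1/2 (assumed coding)."""
    genotypes = np.asarray(genotypes)
    n_snps = genotypes.shape[1]
    W = (genotypes == 1).astype(float)  # 1 at heterozygous loci, 0 otherwise
    h = W.sum(axis=1)                   # h = W1, individual heterozygosity counts
    f = 1.0 - h / n_snps                # f = 1 - h/N, proportion of homozygous SNPs
    return W, h, f

# Toy data: 3 animals, 4 SNPs (hypothetical values)
geno = np.array([[2, 1, 0, 1],
                 [0, 0, 2, 2],
                 [1, 1, 1, 1]])
W, h, f = genomic_inbreeding(geno)

mu_d = 0.05                 # hypothetical mean dominance effect
N = geno.shape[1]
b = -N * mu_d               # inbreeding depression parameter b = -N * mu_d

# h * mu_d equals the intercept shift 1*N*mu_d plus the inbreeding term f*b
lhs = h * mu_d
rhs = N * mu_d + f * b
```

With a positive `mu_d`, `b` is negative, so a fully homozygous animal (f = 1) receives the largest penalty relative to the intercept.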

Estimating genetic (co)variances of markers with additive and dominance effects
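A trivariate model based on "biological" (genotypic) additive and dominance effects of SNPs [22,32], and including genomic inbreeding as above, was applied considering TNB as a different trait in each population:

$$\begin{aligned}
\mathbf{y}_L &= \mathbf{1}\mu_L + \mathbf{f}_L b_L + \mathbf{Z}_L\mathbf{a}_L + \mathbf{W}_L\mathbf{d}_L + \mathbf{e}_L,\\
\mathbf{y}_Y &= \mathbf{1}\mu_Y + \mathbf{f}_Y b_Y + \mathbf{Z}_Y\mathbf{a}_Y + \mathbf{W}_Y\mathbf{d}_Y + \mathbf{e}_Y,\\
\mathbf{y}_{LY} &= \mathbf{1}\mu_{LY} + \mathbf{f}_{LY} b_{LY} + \mathbf{Z}_{LY}\mathbf{a}_{LY} + \mathbf{W}_{LY}\mathbf{d}_{LY} + \mathbf{e}_{LY},
\end{aligned} \tag{2}$$

where $\mathbf{y}_L$, $\mathbf{y}_Y$ and $\mathbf{y}_{LY}$ contain corrected phenotypic values for purebred L, purebred Y and crossbred LY sows, respectively; $\mu_L$, $\mu_Y$ and $\mu_{LY}$ are the respective means; $\mathbf{a}_L$, $\mathbf{a}_Y$ and $\mathbf{a}_{LY}$ are the "biological" additive SNP effects and $\mathbf{d}_L$, $\mathbf{d}_Y$ and $\mathbf{d}_{LY}$ are the "biological" dominance SNP effects for each of the SNPs for L, Y and LY, respectively; matrices $\mathbf{Z}$ and $\mathbf{W}$ are as above, with subscripts denoting the rows for the animals of each population; $\mathbf{f}_L b_L$, $\mathbf{f}_Y b_Y$ and $\mathbf{f}_{LY} b_{LY}$ model the inbreeding depression for the L, Y and LY populations; and $\mathbf{e}_L$, $\mathbf{e}_Y$ and $\mathbf{e}_{LY}$ are the overall random residual effects. Note that "biological" is used here to refer to the genotypic additive and dominance values of the SNPs, to distinguish them from the traditional treatment of quantitative genetics in terms of "statistical" effects (breeding values and dominance deviations) [32].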
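The above equations can be reformulated in terms of genotypic values of individuals instead of SNPs, in order to be compatible with the classical GBLUP model and animal breeding software, such as BLUPF90 [33] and DMU [34]:

$$\begin{aligned}
\mathbf{y}_L &= \mathbf{1}\mu_L + \mathbf{f}_L b_L + \mathbf{u}_L + \mathbf{v}_L + \mathbf{e}_L,\\
\mathbf{y}_Y &= \mathbf{1}\mu_Y + \mathbf{f}_Y b_Y + \mathbf{u}_Y + \mathbf{v}_Y + \mathbf{e}_Y,\\
\mathbf{y}_{LY} &= \mathbf{1}\mu_{LY} + \mathbf{f}_{LY} b_{LY} + \mathbf{u}_{LY} + \mathbf{v}_{LY} + \mathbf{e}_{LY},
\end{aligned} \tag{3}$$

where $\mathbf{u} = \mathbf{Z}\mathbf{a}$ and $\mathbf{v} = \mathbf{W}\mathbf{d}$ within each population. Note that $\mathbf{u}$ and $\mathbf{v}$ are vectors of genotypic additive and dominance effects and therefore cannot be directly compared to breeding values and dominance deviations in the pedigree-based genetic evaluation. In addition, $\mathbf{f}$ is a vector of genomic inbreeding coefficients and $b$ is a population-specific inbreeding depression parameter per unit of genomic inbreeding. Note that there is potentially inbreeding depression at the level of the crossbred animals, although, first, the numeric values of the vector $\mathbf{f}$ should be smaller since crossbred animals have a higher level of heterozygosity, and second, the estimates of the inbreeding depression parameters ($b$) do not need to be identical across the three populations, which gives considerable flexibility.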
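In terms of the genotypic additive effects $\mathbf{u}$, the variances within each population are (subscripts on $\mathbf{Z}$ index the animals of each population):

$$\mathrm{Var}(\mathbf{u}_L) = \mathbf{Z}_L\mathbf{Z}_L'\sigma^2_{a_L}, \quad \mathrm{Var}(\mathbf{u}_Y) = \mathbf{Z}_Y\mathbf{Z}_Y'\sigma^2_{a_Y}, \quad \mathrm{Var}(\mathbf{u}_{LY}) = \mathbf{Z}_{LY}\mathbf{Z}_{LY}'\sigma^2_{a_{LY}},$$

where $\sigma^2_{a_L}$, $\sigma^2_{a_Y}$ and $\sigma^2_{a_{LY}}$ are the additive variances of SNP effects in populations L, Y and LY, respectively. The covariances between the genotypic additive effects $\mathbf{u}$ are:

$$\mathrm{Cov}(\mathbf{u}_L, \mathbf{u}_Y') = \mathbf{Z}_L\mathbf{Z}_Y'\sigma_{a_{L,Y}}, \quad \mathrm{Cov}(\mathbf{u}_L, \mathbf{u}_{LY}') = \mathbf{Z}_L\mathbf{Z}_{LY}'\sigma_{a_{L,LY}}, \quad \mathrm{Cov}(\mathbf{u}_Y, \mathbf{u}_{LY}') = \mathbf{Z}_Y\mathbf{Z}_{LY}'\sigma_{a_{Y,LY}}, \tag{4}$$

where $\sigma_{a_{L,Y}}$, $\sigma_{a_{L,LY}}$ and $\sigma_{a_{Y,LY}}$ are the additive covariances of SNP effects between populations L and Y, populations L and LY, and populations Y and LY, respectively. Analogous structures exist for the dominance genotypic effects $\mathbf{v}$, with $\mathbf{W}$ in place of $\mathbf{Z}$ and dominance (co)variances of SNP effects in place of additive ones, e.g. $\mathrm{Var}(\mathbf{v}_L) = \mathbf{W}_L\mathbf{W}_L'\sigma^2_{d_L}$ and $\mathrm{Cov}(\mathbf{v}_L, \mathbf{v}_Y') = \mathbf{W}_L\mathbf{W}_Y'\sigma_{d_{L,Y}}$.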

Estimation of marker-based variance components using an equivalent model
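The variance components $\sigma^2_{a_L}$, $\sigma^2_{a_Y}$, $\sigma^2_{a_{LY}}$ and $\sigma_{a_{L,Y}}$, $\sigma_{a_{L,LY}}$, $\sigma_{a_{Y,LY}}$ in Eq. (4) cannot be estimated by regular methods or software (i.e. REML or Gibbs sampling) because they cannot be factorized out from Eq. (4). To fit such a multivariate structure, we used an equivalent model. Additional effects need to be defined, even if they are of no interest per se. For instance, the vectors of hypothetical genotypic additive effects of the genotypes of the L breed on the scale of breed Y ($\mathbf{u}_{L,Y}$) and LY ($\mathbf{u}_{L,LY}$) have variance-covariance matrices $\mathbf{Z}_L\mathbf{Z}_L'\sigma^2_{a_Y}$ and $\mathbf{Z}_L\mathbf{Z}_L'\sigma^2_{a_{LY}}$, respectively. Thus, as a whole, with $\mathbf{u}$ now stacking the effects of all animals on each of the three scales (L, Y and LY), the genetic variance and covariance structure for the genotypic additive effects $\mathbf{u}$ is:

$$\mathrm{Var}(\mathbf{u}) = \begin{pmatrix} \sigma^2_{a_L} & \sigma_{a_{L,Y}} & \sigma_{a_{L,LY}} \\ \sigma_{a_{L,Y}} & \sigma^2_{a_Y} & \sigma_{a_{Y,LY}} \\ \sigma_{a_{L,LY}} & \sigma_{a_{Y,LY}} & \sigma^2_{a_{LY}} \end{pmatrix} \otimes \mathbf{Z}\mathbf{Z}',$$

where matrix $\mathbf{Z}$ contains elements 1, 0, −1 for the three genotypes, and is defined across the three breeds, $\mathbf{Z} = (\mathbf{Z}_L', \mathbf{Z}_Y', \mathbf{Z}_{LY}')'$. To construct a relationship matrix similar to the classical G-matrix of GBLUP [35], Vitezica et al. [22] introduced a normalized genomic relationship matrix

$$\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}'}{\mathrm{tr}(\mathbf{Z}\mathbf{Z}')/n},$$

where $n$ is the number of animals across the three populations and the division by $\mathrm{tr}(\mathbf{Z}\mathbf{Z}')/n$ scales the matrix such that the average of the diagonal elements equals 1. This alters the variances across genotypic additive effects $\mathbf{u}$ in the following way:

$$\mathrm{Var}(\mathbf{u}) = \mathbf{G}_0 \otimes \mathbf{G}, \tag{5}$$

where $\mathbf{G}_0$ contains the variance components associated with the genotypic additive effects $\mathbf{u}$. This structure (a Kronecker product) is compatible with animal breeding software for BLUP and REML, and the variance-covariance matrix $\mathbf{G}_0$ can be estimated in a straightforward manner. Then, the (co)variances of additive genotypic effects of SNPs across populations can be obtained as:

$$\begin{pmatrix} \sigma^2_{a_L} & \sigma_{a_{L,Y}} & \sigma_{a_{L,LY}} \\ \sigma_{a_{L,Y}} & \sigma^2_{a_Y} & \sigma_{a_{Y,LY}} \\ \sigma_{a_{L,LY}} & \sigma_{a_{Y,LY}} & \sigma^2_{a_{LY}} \end{pmatrix} = \frac{\mathbf{G}_0}{\mathrm{tr}(\mathbf{Z}\mathbf{Z}')/n}. \tag{6}$$

The variances across genotypic dominance effects $\mathbf{v}$ are altered in a similar way:

$$\mathrm{Var}(\mathbf{v}) = \mathbf{D}_0 \otimes \mathbf{D}, \tag{7}$$

where $\mathbf{D}_0$ contains variances and covariances associated with the genotypic dominance effects $\mathbf{v}$ and $\mathbf{D} = \mathbf{W}\mathbf{W}' / \{\mathrm{tr}(\mathbf{W}\mathbf{W}')/n\}$, where the matrix $\mathbf{W}$ contains elements 0, 1, 0 for the three genotypes, and is defined as $\mathbf{W} = (\mathbf{W}_L', \mathbf{W}_Y', \mathbf{W}_{LY}')'$. Then, the (co)variances of dominance genotypic effects of SNPs are:

$$\begin{pmatrix} \sigma^2_{d_L} & \sigma_{d_{L,Y}} & \sigma_{d_{L,LY}} \\ \sigma_{d_{L,Y}} & \sigma^2_{d_Y} & \sigma_{d_{Y,LY}} \\ \sigma_{d_{L,LY}} & \sigma_{d_{Y,LY}} & \sigma^2_{d_{LY}} \end{pmatrix} = \frac{\mathbf{D}_0}{\mathrm{tr}(\mathbf{W}\mathbf{W}')/n}. \tag{8}$$

This approach, which is an extension of Vitezica et al. [22], makes it possible to estimate (co)variances of genotypic effects of SNPs in purebred and crossbred populations under a genomic model with additive and non-additive (dominance) inheritance.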
Matrices Z and W, their crossproducts and the inverses of G and D were built using our own programs. Genetic parameters were estimated by average information REML with the software airemlf90 [33]. Standard errors of functions of genetic parameters (e.g. standard errors of correlations) were estimated from the average information matrix using the REML-MVN method of Houle and Meyer [36].
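The normalization used in the equivalent model, and the back-transformation from estimated variance components on the individual scale to (co)variances of SNP effects, can be illustrated as follows. This is a minimal sketch with simulated genotypes, not the programs used in the study; the 0/1/2 genotype coding, the function names and all numeric values are assumptions made for illustration.

```python
import numpy as np

def normalized_relationships(genotypes):
    """Build G = ZZ'/(tr(ZZ')/n) and D = WW'/(tr(WW')/n) from 0/1/2 genotypes."""
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]                      # animals across all populations
    Z = g - 1.0                         # entries 1, 0, -1 for AA, Aa, aa
    W = (g == 1.0).astype(float)        # entries 0, 1, 0 for AA, Aa, aa
    ZZt, WWt = Z @ Z.T, W @ W.T
    scale_z = np.trace(ZZt) / n         # average diagonal of ZZ'
    scale_w = np.trace(WWt) / n         # average diagonal of WW'
    return ZZt / scale_z, WWt / scale_w, scale_z, scale_w, ZZt

def snp_covariances(components, scale):
    """Back-transform individual-scale components to SNP (co)variances."""
    return np.asarray(components, dtype=float) / scale

# Simulated genotypes: 6 animals x 50 SNPs (hypothetical data)
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(6, 50))
G, D, scale_z, scale_w, ZZt = normalized_relationships(geno)

# Hypothetical SNP (co)variance matrix for the three scales (L, Y, LY)
sigma_a = np.array([[2.0, 0.5, 1.0],
                    [0.5, 3.0, 1.5],
                    [1.0, 1.5, 2.5]]) * 1e-4
G0 = sigma_a * scale_z                  # what REML would estimate on the G scale
recovered = snp_covariances(G0, scale_z)
```

Because the scaling factor is a scalar, the Kronecker structures built from the SNP (co)variances and ZZ' and from `G0` and `G` describe the same covariance matrix, which is why the two models are equivalent.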

Additive and dominance variances in purebred and crossbred populations
The additive and dominance (co)variances of genotypic effects of SNPs, either within breed or between breeds, were calculated using Eqs. (6) and (8), respectively. Using these calculated additive and dominance (co)variances of SNPs across all the SNPs, the corresponding traditional, individual-based genetic parameters can be obtained as follows. The genetic parameters obtained are directly comparable to pedigree-based estimates [32].
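Consider the allele substitution effect $\alpha = a + (q - p)d$. According to [32], the additive genetic variances for purebred performance (mating animals in the same breed) for breed L ($\sigma^2_{AP_L}$) and breed Y ($\sigma^2_{AP_Y}$) are:

$$\sigma^2_{AP_L} = \sum_i 2p_i^L q_i^L\,\sigma^2_{a_L} + \sum_i 2p_i^L q_i^L (q_i^L - p_i^L)^2\,\sigma^2_{d_L},$$

$$\sigma^2_{AP_Y} = \sum_i 2p_i^Y q_i^Y\,\sigma^2_{a_Y} + \sum_i 2p_i^Y q_i^Y (q_i^Y - p_i^Y)^2\,\sigma^2_{d_Y},$$

where $\sigma^2_a$ and $\sigma^2_d$ are the variances of additive and dominance genotypic effects of SNPs in either breed L or Y; $p_i$ and $q_i$ are allele frequencies for SNP $i$; indices L and Y denote the breeds Landrace and Yorkshire, respectively. For crossbred performance of, say, Landrace, the allele substitution effect is $\alpha_{AC_L} = a_{AC_L} + (q^Y - p^Y)\,d_{AC_L}$. Thus, the additive genetic variances within purebred L and Y for crossbred performance (due to gametes from the L or Y individuals in the crossbred population) are equal to:

$$\sigma^2_{AC_L} = \sum_i 2p_i^L q_i^L\,\sigma^2_{a_{LY}} + \sum_i 2p_i^L q_i^L (q_i^Y - p_i^Y)^2\,\sigma^2_{d_{LY}},$$

$$\sigma^2_{AC_Y} = \sum_i 2p_i^Y q_i^Y\,\sigma^2_{a_{LY}} + \sum_i 2p_i^Y q_i^Y (q_i^L - p_i^L)^2\,\sigma^2_{d_{LY}},$$

where $\sigma^2_{AC_L}$ represents the additive genetic variance of animals in breed L when mated to animals in breed Y; $\sigma^2_{AC_Y}$ represents the additive genetic variance of animals in breed Y when mated to animals in breed L; and $\sigma^2_{a_{LY}}$ and $\sigma^2_{d_{LY}}$ are the variances of additive and dominance genotypic effects of SNPs in the crossbred LY population, respectively. The additive genetic variance for animals in the crossbred LY population ($\sigma^2_{AC_{LY}}$) is the sum of the additive genetic variance of Landrace alleles and that of Yorkshire alleles in the crossbred animals [22] as follows:

$$\sigma^2_{AC_{LY}} = \tfrac{1}{2}\sigma^2_{AC_L} + \tfrac{1}{2}\sigma^2_{AC_Y}.$$

Note that this variance is not the additive genetic variance of the crossbred animals acting as reproducers (i.e., creating an F2) [37].
The additive genetic covariances between purebred and crossbred performances within breeds L ($\sigma_{AP_L,AC_L}$) and Y ($\sigma_{AP_Y,AC_Y}$) are:

$$\sigma_{AP_L,AC_L} = \sum_i 2p_i^L q_i^L\,\sigma_{a_{L,LY}} + \sum_i 2p_i^L q_i^L (q_i^L - p_i^L)(q_i^Y - p_i^Y)\,\sigma_{d_{L,LY}},$$

$$\sigma_{AP_Y,AC_Y} = \sum_i 2p_i^Y q_i^Y\,\sigma_{a_{Y,LY}} + \sum_i 2p_i^Y q_i^Y (q_i^Y - p_i^Y)(q_i^L - p_i^L)\,\sigma_{d_{Y,LY}},$$

where $\sigma_{a_{L,LY}}$ and $\sigma_{d_{L,LY}}$ are the covariances of SNP effects between purebred L and crossbred LY populations for additive and dominance, respectively; $\sigma_{a_{Y,LY}}$ and $\sigma_{d_{Y,LY}}$ are the covariances of SNP effects between purebred Y and crossbred LY populations for additive and dominance, respectively. Therefore, the genetic correlations of breeding values between purebred and crossbred performances within L ($r_{PC_L}$) and Y ($r_{PC_Y}$) are:

$$r_{PC_L} = \frac{\sigma_{AP_L,AC_L}}{\sqrt{\sigma^2_{AP_L}\,\sigma^2_{AC_L}}}, \qquad r_{PC_Y} = \frac{\sigma_{AP_Y,AC_Y}}{\sqrt{\sigma^2_{AP_Y}\,\sigma^2_{AC_Y}}}.$$

According to [22], the dominance genetic variances within purebred populations L and Y are $\sigma^2_{D_L} = \sum_i (2p_i^L q_i^L)^2\,\sigma^2_{d_L}$ and $\sigma^2_{D_Y} = \sum_i (2p_i^Y q_i^Y)^2\,\sigma^2_{d_Y}$, respectively. The dominance genetic variance in crossbred LY animals is $\sigma^2_{D_{LY}} = \sum_i 4p_i^L q_i^L p_i^Y q_i^Y\,\sigma^2_{d_{LY}}$. The broad sense heritabilities for purebred performance ($H^2_P$) were calculated as the ratio of total genetic variances for purebred performance ($\sigma^2_{AP} + \sigma^2_D$) to phenotypic variances ($\sigma^2_{AP} + \sigma^2_D + \sigma^2_e$).

Correlations of allele substitution effects between two breeds
The breeding value of an individual includes the allele substitution effects of all genes and the allele frequencies.
For purebred performance, the allele substitution effects of one locus for breeds L and Y are:

$$\alpha_i^L = a_i^L + (q_i^L - p_i^L)\,d_i^L, \qquad \alpha_i^Y = a_i^Y + (q_i^Y - p_i^Y)\,d_i^Y,$$

where $a$ is the additive effect and $d$ is the dominance effect for each SNP; $p_i$ and $q_i$ are allele frequencies for SNP $i$, with superscripts denoting breeds L or Y. In the case of purely additive gene action, the covariance between $\alpha^L$ and $\alpha^Y$ is $\sigma_{\alpha_{L,Y}}$, which can be interpreted as a genetic correlation among populations [38-40]. Then, the covariance between the allele substitution effects of one locus is:

$$\mathrm{Cov}(\alpha_i^L, \alpha_i^Y) = \sigma_{a_{L,Y}} + (q_i^L - p_i^L)(q_i^Y - p_i^Y)\,\sigma_{d_{L,Y}},$$

where $\sigma_{a_{L,Y}}$ and $\sigma_{d_{L,Y}}$ are the additive and dominance covariances of SNP effects between breeds L and Y, respectively. If we assume that SNP effects (both additive and dominance) are independent across loci, then the covariance between the allele substitution effects across all $n$ loci is:

$$\sigma_{\alpha P_L,\alpha P_Y} = n\,\sigma_{a_{L,Y}} + \sum_i (q_i^L - p_i^L)(q_i^Y - p_i^Y)\,\sigma_{d_{L,Y}}.$$

Also, the variances of allele substitution effects across all $n$ loci for breeds L and Y are:

$$\sigma^2_{\alpha P_L} = n\,\sigma^2_{a_L} + \sum_i (q_i^L - p_i^L)^2\,\sigma^2_{d_L}, \qquad \sigma^2_{\alpha P_Y} = n\,\sigma^2_{a_Y} + \sum_i (q_i^Y - p_i^Y)^2\,\sigma^2_{d_Y},$$

where $\sigma^2_a$ and $\sigma^2_d$ are the additive and dominance variances of SNPs. Then, the correlation of allele substitution effects for purebred performance between populations L and Y is

$$r_{\alpha P_L,\alpha P_Y} = \frac{\sigma_{\alpha P_L,\alpha P_Y}}{\sqrt{\sigma^2_{\alpha P_L}\,\sigma^2_{\alpha P_Y}}}.$$

If there is no dominance variation, $r_{\alpha P_L,\alpha P_Y}$ relates to the additive (co)variances of SNP effects as

$$r_{\alpha P_L,\alpha P_Y} = \frac{\sigma_{a_{L,Y}}}{\sqrt{\sigma^2_{a_L}\,\sigma^2_{a_Y}}}.$$

The correlation of allele substitution effects for crossbred performance between populations L and Y is similar to that for purebred performance, but the allele frequencies are swapped, as:

$$r_{\alpha C_L,\alpha C_Y} = \frac{n\,\sigma^2_{a_{LY}} + \sum_i (q_i^Y - p_i^Y)(q_i^L - p_i^L)\,\sigma^2_{d_{LY}}}{\sqrt{\left[n\,\sigma^2_{a_{LY}} + \sum_i (q_i^Y - p_i^Y)^2\,\sigma^2_{d_{LY}}\right]\left[n\,\sigma^2_{a_{LY}} + \sum_i (q_i^L - p_i^L)^2\,\sigma^2_{d_{LY}}\right]}}.$$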
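The expressions for the additive variances, the purebred-crossbred genetic correlation and the correlations of allele substitution effects can be sketched numerically as below. This is an illustration under the assumptions of the section (Hardy-Weinberg proportions within breed, SNP effects independent across loci), following the formulas as understood from the definitions of the allele substitution effects; function names, allele frequencies and variance components are hypothetical.

```python
import numpy as np

def additive_variance(p_own, p_mate, var_a, var_d):
    """Sum_i 2 p_i q_i [var_a + (q_mate_i - p_mate_i)^2 var_d].

    Purebred performance: p_mate = p_own. Crossbred performance: p_mate is
    the allele frequency in the other breed.
    """
    het = 2.0 * p_own * (1.0 - p_own)
    shift = 1.0 - 2.0 * p_mate                  # q - p in the mate population
    return np.sum(het * (var_a + shift**2 * var_d))

def r_pc(p_own, p_other, var_a_pure, var_d_pure,
         var_a_cross, var_d_cross, cov_a, cov_d):
    """Genetic correlation between purebred and crossbred breeding values."""
    het = 2.0 * p_own * (1.0 - p_own)
    cov = np.sum(het * (cov_a + (1 - 2 * p_own) * (1 - 2 * p_other) * cov_d))
    var_p = additive_variance(p_own, p_own, var_a_pure, var_d_pure)
    var_c = additive_variance(p_own, p_other, var_a_cross, var_d_cross)
    return cov / np.sqrt(var_p * var_c)

def r_alpha_purebred(p_l, p_y, var_a_l, var_d_l, var_a_y, var_d_y, cov_a, cov_d):
    """Correlation of allele substitution effects for purebred performance."""
    n = p_l.size
    s_l, s_y = 1 - 2 * p_l, 1 - 2 * p_y         # (q - p) in L and in Y
    cov = n * cov_a + np.sum(s_l * s_y) * cov_d
    v_l = n * var_a_l + np.sum(s_l**2) * var_d_l
    v_y = n * var_a_y + np.sum(s_y**2) * var_d_y
    return cov / np.sqrt(v_l * v_y)

def r_alpha_crossbred(p_l, p_y, var_a_ly, var_d_ly):
    """Same correlation for crossbred performance (allele frequencies swapped)."""
    n = p_l.size
    s_l, s_y = 1 - 2 * p_l, 1 - 2 * p_y
    cov = n * var_a_ly + np.sum(s_l * s_y) * var_d_ly
    v_l = n * var_a_ly + np.sum(s_y**2) * var_d_ly   # L alleles, Y frequencies
    v_y = n * var_a_ly + np.sum(s_l**2) * var_d_ly   # Y alleles, L frequencies
    return cov / np.sqrt(v_l * v_y)

# Hypothetical allele frequencies at four SNPs in each breed
p_l = np.array([0.2, 0.5, 0.7, 0.9])
p_y = np.array([0.4, 0.1, 0.6, 0.8])

# Without dominance, the correlations reduce to ratios of SNP (co)variances
r_pure_no_dom = r_alpha_purebred(p_l, p_y, 1e-4, 0.0, 4e-4, 0.0, 0.5e-4, 0.0)
r_cross_no_dom = r_alpha_crossbred(p_l, p_y, 1e-4, 0.0)
r_pc_no_dom = r_pc(p_l, p_y, 1e-4, 0.0, 4e-4, 0.0, 1e-4, 0.0)
r_cross_dom = r_alpha_crossbred(p_l, p_y, 1e-4, 2e-5)
```

With dominance variance in the crossbreds, `r_cross_dom` falls below 1 only to the extent that allele frequencies differ between the two breeds.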

In the expression for $r_{\alpha C_L,\alpha C_Y}$, $\sigma^2_{a_{LY}}$ and $\sigma^2_{d_{LY}}$ are the additive and dominance variances of SNPs in the crossbred LY population. If there is no dominance variation, $r_{\alpha C_L,\alpha C_Y}$ is equal to 1, by assumption in the model.

Scenarios
Variance components, genetic correlations of breeding values between purebred and crossbred performances ($r_{PC}$) within each pure breed and correlations of allele substitution effects for purebred ($r_{\alpha P_L,\alpha P_Y}$) and crossbred ($r_{\alpha C_L,\alpha C_Y}$) performance between the two pure breeds were first investigated using the full genomic dataset. To explore the effects of using genomic information and of including dominance deviations on the genetic evaluation of crossbred performance in the trivariate model, three different scenarios were compared.

Nogen
The statistical model was a trivariate BLUP model, similar to Eq. (3), but the dominance deviation was excluded. Instead of using a genomic relationship matrix, a single pedigree-based relationship matrix A was constructed across the three breeds, assuming that they form a single population. Thus, the genetic (co)variances of additive genetic effects u were:

$$\mathrm{Var}(\mathbf{u}) = \mathbf{A}_0 \otimes \mathbf{A},$$

where $\mathbf{A}_0$ contains the variance components associated with genetic additive effects, and not with the genotypic additive effects in Eq. (5). Pedigree-based inbreeding depression was also included in the model. The pedigree-based inbreeding coefficients were calculated as in [41] using the software inbupgf90 [33].

Gen_AM
The statistical model was similar to Eq. (3), but without dominance deviations. Genomic information was used to construct the additive genomic relationship matrix.

Gen_ADM
The statistical model includes additive and dominance effects as in Eq. (3). Genomic information was used to construct the additive and dominance genomic relationship matrices.
To explore the impact of genomic information and dominance effects on genomic evaluation for crossbred performance, the full genomic dataset was split into training and validation populations and the predictive ability for crossbred animals in the validation population was investigated in the different scenarios. The farrowing date of January 1, 2013 was used as the cut-off date to divide recorded purebred and crossbred sows into training and validation populations. As a result, 6769 sows (1270 L, 1405 Y and 4094 LY) were included in the training population, while the remaining 2716 sows (854 L, 813 Y and 1049 LY) were included in the validation population. Predictive ability for crossbreds was measured as the correlation $\mathrm{cor}(y_c, \hat{y})$ in the validation population for each scenario, where $y_c$ are the corrected phenotypic records of TNB for crossbred animals and $\hat{y}$ are the predicted corrected observations of TNB for crossbred animals, equal to the sum of the estimated population mean ($\hat{\mu}$), inbreeding depression ($f\hat{b}$) and genotypic value ($\hat{g}$); the genotypic value $\hat{g}$ was calculated as the sum of additive and dominance genetic effects in the scenario Gen_ADM. In the other two scenarios, the genotypic value $\hat{g}$ only included the additive genetic effect. A Hotelling-Williams t test at a significance level of 5% was applied to evaluate the significance of the differences in validation correlations between scenarios. Furthermore, to detect possible biases in the predictions, the regression coefficients of $y_c$ on $\hat{y}$ were explored. Note that no bias implies a regression coefficient equal to 1. In addition, to measure the uncertainty associated with the predictions, 1000 bootstrap samples [42] were used to estimate the means and standard errors.
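The validation statistics described above (correlation, regression coefficient and bootstrap standard errors) can be sketched as follows. This is an illustrative outline on simulated values rather than the procedure actually run in the study; the function names, the simulated data and the resampling of animals with replacement are assumptions.

```python
import numpy as np

def predictive_ability(y_c, y_hat):
    """Correlation between corrected phenotypes and their predictions."""
    return np.corrcoef(y_c, y_hat)[0, 1]

def regression_slope(y_c, y_hat):
    """Slope of the regression of y_c on y_hat; 1 indicates no bias."""
    c = np.cov(y_c, y_hat)
    return c[0, 1] / c[1, 1]

def bootstrap_summary(y_c, y_hat, n_boot=1000, seed=0):
    """Bootstrap means and standard errors of both statistics."""
    rng = np.random.default_rng(seed)
    n = len(y_c)
    stats = np.empty((n_boot, 2))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample animals with replacement
        stats[k] = (predictive_ability(y_c[idx], y_hat[idx]),
                    regression_slope(y_c[idx], y_hat[idx]))
    return stats.mean(axis=0), stats.std(axis=0, ddof=1)

# Simulated validation set (hypothetical values, low-heritability trait)
rng = np.random.default_rng(42)
y_hat = rng.normal(0.0, 0.3, size=500)          # predictions
y_c = y_hat + rng.normal(0.0, 2.7, size=500)    # corrected phenotypes
means, ses = bootstrap_summary(y_c, y_hat)

# Sanity check on an exact linear relationship: correlation 1, slope 2
x = np.arange(10, dtype=float)
exact_cor = predictive_ability(2.0 * x + 1.0, x)
exact_slope = regression_slope(2.0 * x + 1.0, x)
```

For a trait with low heritability, the bootstrap standard error of the regression slope is typically much larger than that of the correlation, which is worth keeping in mind when comparing scenarios on bias.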
For comparison, the predictive ability for crossbred animals was also investigated in a model without inbreeding depression effects, for all three scenarios. The predictive ability was measured as the correlation $\mathrm{cor}(y_c, \hat{y})$, where $\hat{y}$ is the sum of the estimated population mean ($\hat{\mu}$) and genotypic value ($\hat{g}$).

Variance components, heritabilities and correlations
Table 1 shows the estimates of variance components for additive genetic effects for purebred ($\sigma^2_{AP}$) and crossbred ($\sigma^2_{AC}$) performance in the different scenarios, and the dominance variances ($\sigma^2_D$) in the Gen_ADM scenario. For all scenarios, the additive genetic variances for purebred performance ($\sigma^2_{AP}$) were larger than those for crossbred performance ($\sigma^2_{AC}$). Estimated variance components in the scenarios Gen_AM and Gen_ADM were very close, but differed from those obtained in the scenario without genomic information (Nogen). In general, estimates had large standard errors in all scenarios, but no obvious differences in standard errors were detected between scenarios. Residual variance for purebred animals ($\sigma^2_e$) was larger than for crossbred animals ($\sigma^2_{e_{LY}}$) in each scenario. For the scenario Gen_ADM, the ratios of dominance genetic variance to additive genetic variance ranged from 5 to 11% for both purebred and crossbred populations.
The broad sense heritabilities for purebred and crossbred animals, genetic correlations between breeding values for purebred and crossbred performances within pure breeds and correlations of allele substitution effects across the two breeds are in Table 2. Across scenarios, the heritabilities of purebred performance ($H^2_P$) ranged from 0.07 (0.03) to 0.08 (0.03) and from 0.06 (0.03) to 0.10 (0.03) for breeds L and Y, respectively. Standard errors of $H^2_P$ were nearly identical across scenarios. Estimated genetic correlations of breeding values between purebred and crossbred performances ($r_{PC}$) increased from 0.76 (0.20) (Nogen) to 0.95 (0.06) (Gen_AM) for breed L and from 0.43 (0.22) (Gen_ADM) to 0.54 (0.30) (Nogen) for breed Y. The $r_{PC}$ was higher for breed L than for breed Y in all scenarios, and the standard errors of $r_{PC}$ were always higher for breed Y than for breed L. With genomic information, the correlations of allele substitution effects for purebred ($r_{\alpha P_L,\alpha P_Y}$) and crossbred ($r_{\alpha C_L,\alpha C_Y}$) performance between breeds L and Y were estimated, as shown in Table 3. For purebred performance, $r_{\alpha P_L,\alpha P_Y}$ was equal to 0.14 and 0.19 in Gen_AM and Gen_ADM, respectively. However, the standard errors were large, around 0.2 in both scenarios. For crossbred performance, $r_{\alpha C_L,\alpha C_Y}$ was equal to 0.98 in Gen_ADM. This high correlation is a byproduct of assuming that additive biological effects in crossbred animals are the same regardless of the Yorkshire or Landrace origin of the allele. However, the same allele has potentially different effects in the respective Landrace or Yorkshire genetic backgrounds, and the difference is modeled through the correlations, hence the low values of $r_{\alpha P_L,\alpha P_Y}$. Without dominance effects in the model (Gen_AM), $r_{\alpha C_L,\alpha C_Y}$ was equal to 1 by definition.

Table 1 Variance components of additive and dominance genetic effects for purebred and crossbred animals
Numbers in brackets are the standard errors of the corresponding parameters. $\sigma^2_{AP}$ is the additive genetic variance for purebred performance; $\sigma_{AP,AC}$ is the additive genetic covariance between purebred and crossbred performance; $\sigma^2_{AC}$ is the additive genetic variance for crossbred performance; $\sigma^2_D$ is the dominance genetic variance for purebred animals; $\sigma^2_e$ is the residual variance for purebred animals; $\sigma^2_{AC_{LY}}$ is the additive genetic variance for the F1 crossbred animals LY; $\sigma^2_{D_{LY}}$ is the dominance genetic variance for the F1 crossbred animals LY; and $\sigma^2_{e_{LY}}$ is the residual variance for the F1 crossbred animals LY.

Predictive abilities
Predictive abilities for crossbred pigs in the validation population are in Table 4. The correlation between the corrected phenotypic values and the predicted observations for TNB, $\mathrm{cor}(y_c, \hat{y})$, ranged from 0.010 in the scenario Nogen to 0.056 in scenarios Gen_AM and Gen_ADM. Standard errors of $\mathrm{cor}(y_c, \hat{y})$ based on 1000 bootstrap samples were equal to 0.03 across all scenarios. No significant differences in predictive ability between scenarios were detected by the Hotelling-Williams t test at the significance level of 5%. The regression coefficients of corrected phenotypic values on the predicted corrected observations for TNB are in the second row of Table 4. Regression coefficients were smaller than 1 for the three scenarios. Among these scenarios, regression coefficients for scenarios with genomic information (Gen_AM and Gen_ADM) were slightly closer to 1 than that for the pedigree-based scenario (Nogen). Except for the Nogen scenario, standard errors of regression coefficients were around 0.39. For the Nogen scenario, the standard error was around 5 times larger than that for the other scenarios. Overall, there was no clear trend towards a scenario with less bias.
For comparison, predictive abilities $\mathrm{cor}(y_c, \hat{y})$ for crossbred pigs in the validation population for the models without the inbreeding depression effect were equal to −0.08 in scenario Nogen, 0.045 in scenario Gen_AM and 0.046 in scenario Gen_ADM. In all cases, these are lower than the predictive abilities in Table 4, and these differences are statistically significant according to the Hotelling-Williams t test.

Inbreeding depression
Marker-based and pedigree-based inbreeding coefficients (f) for each population and their corresponding estimated inbreeding depression parameters (b) in the different scenarios are in Table 5. Marker-based inbreeding coefficients were almost identical for breeds L and Y, but they were larger than those for LY, which was expected because crossbred animals have a higher level of heterozygosity than purebred animals. However, according to the pedigree-based inbreeding coefficients, the Landrace population was slightly more inbred than the Yorkshire population. The inbreeding depression parameters (b) were all negative (thus, genomic inbreeding has detrimental effects on TNB even in crossbred animals) but not of the same magnitude across the three populations. Note that for the scenario Nogen, b was estimated based on the pedigree-based inbreeding coefficients. As a whole, breed L had the most negative b, while breed Y had the least negative b, regardless of the scenario. Thus, TNB was more negatively affected by inbreeding in breed L than in breed Y and population LY.

Table 3 Correlations of allele substitution effects for purebred and crossbred performance between Landrace and Yorkshire breeds
Numbers between brackets are the standard errors of the corresponding parameters.
r_αPL,αPY is the correlation of allele substitution effects for purebred performance between the Landrace and Yorkshire breeds; r_αCL,αCY is the correlation of allele substitution effects for crossbred performance between the Landrace and Yorkshire breeds. For Gen_AM, r_αCL,αCY is equal to 1 by definition.

Table 4 Predictive ability for crossbred animals in the validation population
Numbers between brackets are the standard errors of the corresponding parameters.
a Predictive ability, cor(y_c, ŷ), is given by the correlation coefficient between the corrected phenotypes (y_c) and their predictions (ŷ) for total number of piglets born (TNB) in crossbred animals.
b Regression coefficient of the corrected phenotypes (y_c) on the predicted observations (ŷ) in crossbred animals.

Discussion
This study extended the trivariate GBLUP model of Vitezica et al. [22] in order to obtain (co)variances of SNP effects, genetic correlations of breeding values between purebred and crossbred performances, and correlations of allele substitution effects under dominance. We also evaluated this model using different scenarios for the genetic evaluation of crossbred performance in Danish purebred and crossbred pigs. Scenarios with and without genomic information were studied to estimate the genetic correlations of breeding values between purebred and crossbred performances. To our knowledge, this is the first study to report correlations of allele substitution effects between two breeds in the presence of dominance effects. The results show that the Vitezica model [22] is a tool that can be used for the genomic evaluation of crossbred performance in genotyped animals. In this study, for TNB, models with dominance deviations did not improve the genomic evaluation of crossbred performance with regard to either predictive ability or unbiasedness, but the inclusion of an inbreeding depression effect in the models significantly improved predictive ability.
Phenotypic variances were larger for purebred animals (11.76 for breed L and 9.99 for breed Y) than for crossbred animals (7.30 for LY). This could be the reason why the estimated additive genetic variances for purebred performance (σ²_AP) were larger than those for crossbred performance (σ²_AC). However, compared to results in a previous study that used a much larger Danish purebred and crossbred dataset [17], both estimated additive genetic variances and phenotypic variances in the current study were smaller, which can be attributed to three reasons. (1) The dataset in the current study was a genotyped subset of the population used in the previous study. Purebred genotyped individuals were pre-selected and their performances were more homogeneous than those of the whole population. The preselection process resulted in a loss of about 15% of the purebred phenotypic variation. However, the genotyped crossbred animals were an almost random sample of the whole population and there was only a small loss of about 5% of phenotypic variation for crossbred animals. (2) The phenotypic values for TNB in the current study were pre-corrected for fixed and non-genetic random effects. This pre-correction led to a loss of about 11 and 17% of phenotypic variation for purebreds and crossbreds, respectively. (3) During the pre-correction, some genetic variation may have been allocated to other random effects (e.g. service boar effects), in particular because TNB is a lowly heritable trait.
The estimated heritabilities of TNB for purebred performance (H²_P) were slightly lower than those previously reported (0.11 and 0.09 for breeds L and Y, respectively) [17,22,43]. Large standard errors of H²_P implied that the current dataset was not large enough to estimate them precisely. The consistent standard errors across scenarios indicated that even when genomic information was included, the uncertainty of H²_P did not decrease. Taking the standard errors into account, the estimated H²_P across scenarios were not very different. Compared to the results of [17], the lower H²_P found in the current study was due to the sharp decrease in additive genetic variances (σ²_AP). The ratios of estimated dominance genetic variances to additive genetic variances in the current study (5 to 11%) were generally a little smaller than in other studies on TNB. Vitezica et al. [22] reported that this ratio was equal to about 20% for litter size in both purebred and crossbred lines by using the same trivariate GBLUP model. Esfandyari et al. [19] stated that, by using purebred genomic information in a univariate Bayesian mixture model at the SNP level, the ratio between dominance variance and additive variance for TNB was equal to 15 and 18% for breeds L and Y, respectively. Based on pedigree information, Misztal et al. [10] reported a ratio that reached about 25% for number of piglets born alive in a Yorkshire population. However, some studies did report smaller ratios than those reported here. For instance, Hidalgo [20] reported that, based on genotyped crossbred animals, the dominance variance for TNB accounted for almost none of the total genetic variance and concluded that TNB was not affected by dominance effects in the Dutch Landrace and Yorkshire populations. For other traits or species, different ratios of dominance genetic variance to additive genetic variance were also reported. For average daily gain in Duroc pigs, Su et al. [9] estimated a ratio of 15%, but their results were based on genotypic variance components and cannot be directly compared to genetic variance components [32]. For average daily weight gain in Yorkshire and Landrace pigs, Lopes et al. [44] reported ratios of 13.8 and 28%, respectively, by including genomic information. For Fleckvieh cattle, Ertl et al. [12] calculated ratios that ranged from 3.4% for stature to 69% for protein yield by using a univariate SNP-BLUP model. Overall, these different ratios of dominance genetic variance to additive genetic variance may reflect differences in the traits analyzed and in the type of information used for the estimation [9], and also uncertainty in the estimates.
The genetic correlation of breeding values between purebred and crossbred performances (r_PC) is a key parameter in crossbreeding schemes [2]. In the current study, the estimated r_PC was in line with results reviewed by Wei et al. [3]. Lutaaya et al. [5] also reported r_PC that ranged from 0.32 to 1. Such differences in r_PC may reflect differences in the extent of GxE interactions and in the genetic distance between breeds. In our study, estimated r_PC did not vary dramatically across the scenarios when the standard errors were taken into account. These standard errors were very large, which indicated that the amount of available information was too small to ensure accurate r_PC estimates. Across scenarios, standard errors of r_PC decreased when genomic information was included, which indicates that including genomic information may reduce the uncertainty of the estimates. r_PC was larger for breed L than for breed Y, which was in agreement with a previous study [17] and may be due to the data structure. Among the 5143 crossbred animals, the number with Yorkshire sires (N = 1125) was much smaller than the number with Landrace sires (N = 4018). Such a difference in the amount of information affects the accuracy of the estimates, and thus the standard error of r_PC was larger for breed Y than for breed L (see Table 2). However, compared to the results reported in [17], the r_PC for breed L increased by about 10% while that for breed Y did not change much. Both the pre-correction of data and the use of a genotyped subset of the original data may play a role in the differences observed between the current and previous results [17]. In the previous study, a single-step method, which can use pedigree information and genomic information simultaneously, was used. In this study, the use of only phenotypic records on genotyped individuals affected the accuracy of estimates. Our results confirmed the moderate value of the r_PC for TNB in breeds L and Y.
To our knowledge, this is the first time that correlations of allele substitution effects for both purebred (r_αPL,αPY) and crossbred (r_αCL,αCY) performance between two breeds have been estimated in the presence of dominance variation. In genomic selection, SNPs are assumed to be in LD with QTL along the whole genome [45]. The correlation of allele substitution effects between breeds measures the degree of average similarity between SNP effects, assuming that the QTL effects are the same in breeds 1 and 2 [38][39][40]. In practice, the correlation of allele substitution effects between two breeds can be interpreted as indicating "how consistent the SNP substitution effects are across two breeds". For purebred performance, the estimated SNP substitution effects were based on the within-breed allele frequencies. A high r_αPL,αPY correlation means that the estimated SNP substitution effects based on allele frequencies from breed L can be used for breed Y and vice versa. However, r_αPL,αPY was not significantly different from 0 in the current study, which indicates that SNP effects estimated from a reference population that consists of one pure breed (e.g. Landrace) cannot be readily applied to the other breed (e.g. Yorkshire). This was in agreement with the findings of [46], who reported that prediction based on an across-population reference panel was worse than within-population prediction. In other species, estimated correlations of allele substitution effects between breeds based on models without dominance range from 0 to 0.8 and are trait-dependent [38,47]. For crossbred performance, an r_αCL,αCY close to 1 was found in the current study, which indicated that the allele substitution effects based on the allele frequencies from the opposite breeds were very similar for the L and Y breeds. In practice, this suggests that SNP substitution effects that are estimated based on a reference population consisting of crossbred animals can be used to estimate crossbred breeding values for both breeds L and Y.
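The contrast between the two correlations can be illustrated with a small simulation. The sketch assumes the standard substitution-effect formula α = a + (q − p)d, trait-specific SNP effects (separate a and d for purebred L, purebred Y and crossbred performance), and, for crossbred performance, allele frequencies taken from the opposite breed as described above. Drawing the purebred effects independently across breeds is an extreme choice made only for illustration; all names and values are hypothetical.

```python
import numpy as np

def substitution_effects(a, d, p):
    """alpha = a + (q - p) * d, with q = 1 - p."""
    return a + (1.0 - 2.0 * p) * d

rng = np.random.default_rng(42)
n_snp = 5000

# Hypothetical allele frequencies in each pure breed.
p_L = rng.uniform(0.05, 0.95, n_snp)
p_Y = rng.uniform(0.05, 0.95, n_snp)

# Hypothetical additive (a) and dominance (d) SNP effects per trait:
# purebred L, purebred Y, crossbred C. Independence across traits is
# an assumption of this sketch, not an estimate from the study.
a_L, a_Y, a_C = rng.normal(0.0, 1.0, (3, n_snp))
d_L, d_Y, d_C = rng.normal(0.0, 0.3, (3, n_snp))

# Purebred performance: own effects, own-breed frequencies.
alpha_PL = substitution_effects(a_L, d_L, p_L)
alpha_PY = substitution_effects(a_Y, d_Y, p_Y)

# Crossbred performance: shared crossbred effects, opposite-breed frequencies.
alpha_CL = substitution_effects(a_C, d_C, p_Y)
alpha_CY = substitution_effects(a_C, d_C, p_L)

r_P = float(np.corrcoef(alpha_PL, alpha_PY)[0, 1])
r_C = float(np.corrcoef(alpha_CL, alpha_CY)[0, 1])
print("purebred correlation :", r_P)
print("crossbred correlation:", r_C)
```

Under these assumptions the crossbred substitution effects of the two breeds share the same a and d and differ only through allele frequencies, so their correlation stays high, and it equals 1 when d is zero, which matches the Table 3 note that r_αCL,αCY is 1 by definition in Gen_AM.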
It was expected that genomic evaluations obtained by including dominance deviations in the model would be improved, especially when records of crossbred animals were included [9]. However, our results showed that inclusion of dominance deviations did not increase the predictive ability for crossbreds. This result was in line with conclusions in [9,12,20], but was opposite to those in [18,19,48,49]. Theoretically, estimating dominance genetic effects should be useful because ignoring them will result in less accurate estimates of allele substitution effects and consequently less accurate estimated breeding values in genomic prediction [11]. However, regarding the additive genetic variance, estimates were nearly the same in scenarios Gen_AM and Gen_ADM, which demonstrated that the additive variances were already well captured by the additive model. Thus, the accuracy of the estimated additive genetic effects was not affected when dominance effects were included in the model [12]. Moreover, a simulation study at the level of the gene action showed that when all gene actions were purely additive, including dominance in addition to the additive effects in the model was not advantageous compared to using an additive model. Hidalgo [20] showed that TNB was not affected by dominance in the Dutch crossbred population. In the current study, we also observed similar results, and dominance variation accounted for a small proportion of the total genetic variation (4 to 10%). The lack of change in predictive ability also indicated the difficulty of distinguishing dominance genetic effects from additive genetic effects [9], but it confirmed a previous simulation study that concluded that the use of a dominance model did not negatively affect genomic evaluation even if the trait was purely additive [18].
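The ratio of dominance to additive variance (5 to 11%) and the share of dominance in total genetic variation (4 to 10%) describe the same estimates on two scales. As a rough check, assuming that total genetic variance is simply the sum of additive and dominance variances, and writing k for the ratio (a symbol introduced here for convenience), the conversion is:

```latex
\[
\frac{\sigma^2_D}{\sigma^2_A + \sigma^2_D} = \frac{k}{1 + k},
\qquad k = \frac{\sigma^2_D}{\sigma^2_A}
\]
\[
k = 0.05 \;\Rightarrow\; \frac{0.05}{1.05} \approx 0.048,
\qquad
k = 0.11 \;\Rightarrow\; \frac{0.11}{1.11} \approx 0.099
\]
```

These values are close to the 4 to 10% range quoted above; the exact figures depend on the unrounded, population-specific estimates, which are not reproduced here.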
Scenarios in which genomic information was included (Gen_AM and Gen_ADM) showed higher predictive abilities than the pedigree-based scenario (Nogen). For the Nogen scenario, the relationship matrix was constructed based on a base population that was considered as a mixture of L and Y animals, which was not the case. Therefore, the results of the Gen_AM and Gen_ADM scenarios were more reliable than those of the Nogen scenario. Although predictive abilities were not significantly different according to the Hotelling-Williams t test, predictive ability was higher with genomic information than in the Nogen scenario in about 90% of the bootstrap samples (894 of 1000; results not shown). Comparison of the predictive abilities that were estimated in the current study with those from a previous study [17] indicated that the single-step model [16] might be more robust than the Vitezica model [22] used in this paper in terms of both predictive ability and unbiasedness for the crossbred performance. Our results suggested that using a small set of genotyped animals and pre-corrected data to implement genetic evaluation for crossbred performance was less powerful than using the whole dataset, which is similar to the conclusions for purebred performance [43].
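The two comparisons used here, a test for two dependent correlations that share one variable and a count of bootstrap samples in which one scenario beats the other, can be sketched as below. The test statistic follows the commonly cited Williams form with n − 3 degrees of freedom; whether the authors' software uses exactly this form is an assumption. Function names and the simulated data are hypothetical.

```python
import numpy as np

def williams_t(y, pred_a, pred_b):
    """Williams t statistic for cor(y, pred_a) versus cor(y, pred_b).

    Returns (t, degrees of freedom); a p-value follows from a
    t distribution with n - 3 degrees of freedom.
    """
    n = len(y)
    r12 = np.corrcoef(y, pred_a)[0, 1]
    r13 = np.corrcoef(y, pred_b)[0, 1]
    r23 = np.corrcoef(pred_a, pred_b)[0, 1]
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * (n - 1) / (n - 3) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * np.sqrt((n - 1) * (1 + r23) / denom)
    return float(t), n - 3

def share_of_bootstrap_wins(y, pred_a, pred_b, n_boot=1000, seed=0):
    """Fraction of bootstrap samples in which pred_a correlates better with y."""
    rng = np.random.default_rng(seed)
    n = len(y)
    wins = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same animals for both scenarios
        r_a = np.corrcoef(y[idx], pred_a[idx])[0, 1]
        r_b = np.corrcoef(y[idx], pred_b[idx])[0, 1]
        wins += int(r_a > r_b)
    return wins / n_boot

# Simulated predictions from two scenarios (hypothetical values).
rng = np.random.default_rng(3)
y = rng.normal(size=300)
pred_genomic = 0.1 * y + rng.normal(size=300)
pred_pedigree = 0.02 * y + rng.normal(size=300)

print(williams_t(y, pred_genomic, pred_pedigree))
print(share_of_bootstrap_wins(y, pred_genomic, pred_pedigree))
```

Resampling the same animals for both scenarios keeps the comparison paired, so the share of wins reflects the difference between scenarios rather than sampling noise in the validation set alone. A high share can coexist with a non-significant test, as reported above.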
The regression coefficients obtained with the Vitezica model were less than 1, which suggests that the variation in predicted total genetic effects could be overestimated (inflated). In terms of unbiasedness, there was no clear trend among the scenarios examined, regardless of whether genomic information was included or not. Overall, bias was not a problem in the current study because the regression coefficients in all scenarios did not significantly differ from 1.
Inbreeding depression for litter size in pigs is a well-known phenomenon [50,51], and we found that inclusion of inbreeding effects in the model improved predictive abilities for crossbred animals. Estimates of inbreeding depression effects are rarely reported, but our estimates agree with those previously reported for commercial and Iberian pigs [52]. Inbreeding depression was, for the same amount of marker-based inbreeding, more detrimental in the Landrace than in the Yorkshire breed. There are many possible explanations, one of which is the purging of lethal recessive alleles [53]. We also report an estimate of the inbreeding depression parameter for the crossbred animals, which is between the estimates for the parental breeds. To our knowledge, this estimate has never been reported.
The correlation between breeding values and dominance deviations is of theoretical concern [30]. However, this does not apply to the current marker-based analyses for the following reasons. (1) In a pedigree-based analysis, mating in an inbred population produces deviations from the Hardy-Weinberg equilibrium, which generate correlations between breeding values and dominance deviations [30]. However, in our study, SNPs are in Hardy-Weinberg equilibrium if allele frequencies are considered in the current generation. (2) Such a correlation occurs because the pedigree information forces the genetic model to refer to the base population, since the state of alleles is not known, i.e. only probabilities of IBD are known. In our study, the states of alleles are known and the model can be described as referring to the current generation instead. (3) The equivalent GBLUP models in Eq. (3) used genotypic additive and dominance values, not breeding values and dominance deviations. A reasonable assumption in the model is that additive and dominance effects are unrelated at each SNP. Thus, covariance between additive and dominance genetic effects was ignored in the current study.

Conclusions
We present, for the first time, the use of genomic inbreeding in crossbred and purebred genomic evaluation. Estimates are biologically sound and are relevant even for crossbred animals. We also report, for the first time, estimated correlations of allele substitution effects in the presence of dominance. For TNB, the dominance genetic variance accounts for only a small proportion of the total genetic variation (4 to 10%). A moderate, positive genetic correlation between breeding values for TNB for purebred and crossbred performances was confirmed. Inclusion of dominance in the GBLUP model did not improve predictive ability for crossbred animals, whereas inclusion of inbreeding depression effects did. An additive GBLUP model is sufficient to capture the additive genetic variances and to carry out genomic evaluation. The GBLUP model [22] was applied successfully for genetic evaluations for crossbred performance in pigs. This model can potentially be a useful tool in genetic evaluation for crossbred performance.