Genetic evaluation for three-way crossbreeding

Commercial pig producers generally use a terminal crossbreeding system with three breeds. Many pig breeding organisations have started to use genomic selection, for which genetic evaluation is often done with single-step methods in which the pedigree-based additive genetic relationship matrix is replaced by a combined relationship matrix based on both marker genotypes and pedigree. Genomic selection is implemented for purebreds, but it also offers opportunities for incorporating information from crossbreds and selecting for crossbred performance. However, models for genetic evaluation in the three-way crossbreeding system have not been developed. Four-variate models for three-way terminal crossbreeding are presented in which the first three variables contain the records for the three pure breeds and the fourth variable contains the records for the three-way crossbreds. For purebred animals, the models provide breeding values for both purebred and crossbred performances. Heterogeneity of genetic architecture between breeds and genotype by environment interactions are modelled through genetic correlations between these breeding values. Specification of the additive genetic relationships is essential for these models, and these relationships can be defined either within populations or across populations. For both types of additive genetic relationships, pedigree-based, marker-based and combined relationships based on both pedigree and marker information are presented. All these models for three-way crossbreeding can be formulated using Kronecker matrix products and can therefore be fitted using Henderson's mixed model equations and standard animal breeding software. Models for genetic evaluation in the three-way crossbreeding system are presented. They provide estimated breeding values for both purebred and crossbred performances, and can use pedigree-based or marker-based relationships, or combined relationships based on both pedigree and marker information.
This provides a framework that allows information from three-way crossbred animals to be incorporated into a genetic evaluation system.


Background
Commercial pig producers generally use a terminal crossbreeding system with three breeds. In this system, F1 sows from two maternal breeds are mated to purebred boars from a breed with high performance for production traits (growth, leanness, feed efficiency) to produce pigs for slaughter. In Europe, boars are commonly Duroc or Pietrain, and sows are commonly crosses between Large White and Landrace. Genetic evaluation is usually done within each of these breeds based on phenotypes recorded on purebred animals. However, ideally, genetic evaluation of purebreds should incorporate phenotypes of interest recorded on crossbreds, and breeding values for performance in the three-way cross should be estimated.
Many pig breeding organisations have started to use genomic selection [1], for which genetic evaluation is often done by applying single-step methods [2][3][4] to handle the fact that only a fraction of the animals are genotyped. Here, the pedigree-based additive genetic relationship matrix is replaced by a combined relationship matrix based on both marker genotypes and pedigree. Genomic selection is implemented for purebreds, but it also offers opportunities for incorporating information from crossbreds and selecting for crossbred performance [5][6][7].
For two-way terminal crossbreeding (two breeds named A and B, and all crossbred animals AB have known purebred parents), Wei and van der Werf [8] proposed the following trivariate model:
$$\begin{aligned}\mathbf{y}_A &= \mathbf{X}_A\boldsymbol{\beta}_A + \mathbf{Z}_A\mathbf{a}_A + \mathbf{e}_A,\\ \mathbf{y}_B &= \mathbf{X}_B\boldsymbol{\beta}_B + \mathbf{Z}_B\mathbf{a}_B + \mathbf{e}_B,\\ \mathbf{y}_{AB} &= \mathbf{X}_{AB}\boldsymbol{\beta}_{AB} + \mathbf{Z}_{AB}\mathbf{g}_{AB} + \mathbf{e}_{AB},\end{aligned}\qquad (1)$$
where the vectors $\mathbf{y}_A$, $\mathbf{y}_B$ and $\mathbf{y}_{AB}$ contain phenotypes on animals from breeds A and B and from the cross AB, respectively; for the three populations A, B and AB, the vectors $\boldsymbol{\beta}_A$, $\boldsymbol{\beta}_B$ and $\boldsymbol{\beta}_{AB}$ contain fixed effects (note that intercepts should always be included), the matrices $\mathbf{X}$ and $\mathbf{Z}$ are the corresponding incidence matrices, and $\mathbf{e}_A \sim N(\mathbf{0}, \sigma^2_{e,A}\mathbf{I})$, $\mathbf{e}_B \sim N(\mathbf{0}, \sigma^2_{e,B}\mathbf{I})$ and $\mathbf{e}_{AB} \sim N(\mathbf{0}, \sigma^2_{e,AB}\mathbf{I})$ are the residual error vectors. The vectors $\mathbf{a}_A$ and $\mathbf{a}_B$ contain breeding values for purebred performance (mating within breed) for breeds A and B, respectively, and the vector of genetic values on the crossbreds, $\mathbf{g}_{AB}$, is related to the vectors of breeding values on purebred animals for crossbred performance (mating with the other breed), $\mathbf{g}_A$ and $\mathbf{g}_B$, by additive pedigree-based relationships (throughout this paper, additive genetic effects for purebred performance and for crossbred performance are denoted by a and g, respectively). Each purebred animal then has two breeding values (one related to mating within breed, e.g. $\mathbf{a}_A$, and another related to mating with the other breed to produce the cross, e.g. $\mathbf{g}_A$), and these are correlated. Genetic correlations less than 1 are due to non-additive gene action in combination with different allele frequencies in the two breeds [9,10], but also to genotype by environment interactions. The model also allows different genetic variances in the two pure breeds, which is often the case in practice. Christensen et al. [11] reformulated the model using partial relationship matrices (see below) and constructed these from a combination of marker genotypes and pedigree in such a way that the model could be fitted with standard animal breeding software, i.e. they developed a single-step method.
The aim of this work was to develop models for three-way terminal crosses that handle both pedigree-based and marker-based relationships, as well as combined relationship matrices based on both pedigree and marker genotypes. As indicated above, an essential part of the model is the specification of relationships such that the model can be fitted with standard animal breeding software.

Methods
We present a specific scenario with records on all three pure breeds and on three-way crossbred production pigs, but not on two-way crossbred sows, having in mind production traits such as daily gain, leanness or feed efficiency. However, since we will specify relationships across all five populations (A, B, C, AB and C(AB)), it is straightforward to generalise to other recording scenarios. The model for this three-way terminal crossbreeding system is in principle a straightforward generalisation of the Wei and van der Werf model [8] to the following four-variate model:
$$\begin{aligned}\mathbf{y}_A &= \mathbf{X}_A\boldsymbol{\beta}_A + \mathbf{Z}_A\mathbf{a}_A + \mathbf{e}_A,\\ \mathbf{y}_B &= \mathbf{X}_B\boldsymbol{\beta}_B + \mathbf{Z}_B\mathbf{a}_B + \mathbf{e}_B,\\ \mathbf{y}_C &= \mathbf{X}_C\boldsymbol{\beta}_C + \mathbf{Z}_C\mathbf{a}_C + \mathbf{e}_C,\\ \mathbf{y}_{C(AB)} &= \mathbf{X}_{C(AB)}\boldsymbol{\beta}_{C(AB)} + \mathbf{Z}_{C(AB)}\mathbf{g}_{C(AB)} + \mathbf{e}_{C(AB)},\end{aligned}\qquad (2)$$
where notation is defined as for Eq. (1), and it is assumed that all C(AB) animals have a purebred C father and a crossbred AB mother, and that these AB animals all have purebred parents. Breed C animals have two correlated breeding values: $\mathbf{a}_C$ for purebred performance (mating within breed) and $\mathbf{g}_C$ for C(AB) crossbred performance (mating with an AB crossbred female). Breed A animals also have two breeding values: $\mathbf{a}_A$ for purebred performance (mating within breed) and $\mathbf{g}_A$ for crossbred C(AB) performance (mating with a breed B animal whose AB crossbred female offspring is mated with a breed C male). Finally, breed B animals have two breeding values, $\mathbf{a}_B$ and $\mathbf{g}_B$, defined similarly to those for breed A. For each breed, the association between the breeding values for purebred and crossbred performances is determined by a 2 × 2 genetic variance-covariance matrix. An essential part of the model is the specification of the additive relationships between the genetic values for crossbred performance on crossbred animals and on purebred animals, and in particular of marker-based versions of these relationships, such that pedigree-based and marker-based relationships are consistent. These relationships should also be specified in such a way that the model can be formulated using Kronecker products, allowing it to be fitted with Henderson's mixed model equations and standard animal breeding software.
Additive relationships are relationships between gene substitution effects and these can be defined either within populations or across populations [12]. These two approaches will be called "partial genetic" and "common genetic" approaches in the following.
Lo et al. [13] derived recursive formulas for the variance and covariance of genotypic values of animals composed of multiple breeds under an additive model. Let the genotypic value of individual i be $g_i$; its additive variance is then given by their recursion (Eq. (3)), in which $\sigma^2_{g,b}$ is the breed b genetic variance, $g_{f(i)}$ and $g_{m(i)}$ are the additive genetic values of parents f(i) and m(i), respectively, and $\sigma^2_{g,b,b'}$ is the breed b and breed b′ segregation genetic variance. The additive covariance between the genotypic values of individuals i and i′ is given by a corresponding recursion (Eq. (4)). García-Cortés and Toro [14] showed that Eqs. (3) and (4) can be expressed in matrix notation as
$$\operatorname{Var}(\mathbf{g}) = \sum_b \mathbf{A}_b\,\sigma^2_{g,b} + \sum_{b<b'}\mathbf{A}_{b,b'}\,\sigma^2_{g,b,b'},$$
where the matrices $\mathbf{A}_b$ and $\mathbf{A}_{b,b'}$ are separately defined using recursions, and that this provides a partition of the vector of genotypic values into
$$\mathbf{g} = \sum_b \mathbf{g}_b + \sum_{b<b'}\mathbf{g}_{b,b'},$$
with mutually independent terms, $\operatorname{Var}(\mathbf{g}_b) = \mathbf{A}_b\sigma^2_{g,b}$ and $\operatorname{Var}(\mathbf{g}_{b,b'}) = \mathbf{A}_{b,b'}\sigma^2_{g,b,b'}$. They termed matrix $\mathbf{A}_b$ the breed b specific partial relationship matrix and matrix $\mathbf{A}_{b,b'}$ the breed b and breed b′ segregation partial relationship matrix. The vectors $\mathbf{g}_b$ and $\mathbf{g}_{b,b'}$ depend on genetic origin, such that $\mathbf{g}_b$ is the breed b specific partial genetic vector, and $\mathbf{g}_{b,b'}$ is the breed b and breed b′ segregation partial genetic vector. Matrices $\mathbf{A}_b$ and $\mathbf{A}_{b,b'}$ have sparse inverses that can be computed using the usual methods for the additive relationship matrix (see [14]). In this paper, the approach using a partition of the genetic effects into independent terms is named the partial genetic approach.
Legarra et al. [15] proposed that pedigree relationships should be specified across all animals, and that for base animals in the pedigree, the pedigree-based relationships within and across breeds and inbreeding should be estimated from observed marker genotypes. This approach is incompatible with the García-Cortés and Toro [14] approach described above, since it violates the assumption that the vectors $\mathbf{g}_b$ and $\mathbf{g}_{b,b'}$ are independent. The approach in which relationships are specified across breeds is named the common genetic approach.
First, partial genetic and common genetic approaches for constructing pedigree-based relationships are presented, then the corresponding two different ways of constructing marker-based relationships are presented, and finally the genetic variances and covariances in model (2) are shown for the two approaches. Detailed derivations are in the "Appendix".

Additive genetic model for crossbred C(AB) performance: partial genetic approach
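For the three-way crossbreeding system, the decomposition of the additive genetic effects by García-Cortés and Toro [14] is as follows. For a C(AB) crossbred animal,
$$\mathbf{g}_{C(AB)} = \mathbf{g}^{C}_{C(AB)} + \mathbf{g}^{A}_{C(AB)} + \mathbf{g}^{B}_{C(AB)} + \mathbf{g}^{AB}_{C(AB)},$$
where the terms $\mathbf{g}^{C}_{C(AB)}$, $\mathbf{g}^{A}_{C(AB)}$ and $\mathbf{g}^{B}_{C(AB)}$ are breed-of-origin specific partial genetic effects and $\mathbf{g}^{AB}_{C(AB)}$ is a breed-segregation term. For an AB crossbred sow,
$$\mathbf{g}_{AB} = \mathbf{g}^{A}_{AB} + \mathbf{g}^{B}_{AB},$$
with the terms $\mathbf{g}^{A}_{AB}$ and $\mathbf{g}^{B}_{AB}$ being breed-of-origin partial genetic effects. Finally, for purebred animals, the three vectors of breeding values for crossbred C(AB) performance, $\mathbf{g}_A$, $\mathbf{g}_B$ and $\mathbf{g}_C$, are defined as being equal to the genotypic values.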
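In this way, a breed-specific partial genetic effect is defined for all animals carrying genes of that breed, and a breed-segregation partial genetic effect is defined for crossbred C(AB) animals. The assumption that base individuals in the three breeds are unrelated across breeds implies that the vectors
$$\begin{pmatrix}\mathbf{g}_A\\ \mathbf{g}^A_{AB}\\ \mathbf{g}^A_{C(AB)}\end{pmatrix},\qquad \begin{pmatrix}\mathbf{g}_B\\ \mathbf{g}^B_{AB}\\ \mathbf{g}^B_{C(AB)}\end{pmatrix},\qquad \begin{pmatrix}\mathbf{g}_C\\ \mathbf{g}^C_{C(AB)}\end{pmatrix}$$
are independent. In addition, for a crossbred C(AB) individual, whether it inherits a breed A or a breed B allele is independent of which particular alleles its AB mother carries and of which alleles all other AB individuals carry; hence $\mathbf{g}^{AB}_{C(AB)}$ is independent of the vectors above. The variance-covariance matrices of the partial genetic effects become (García-Cortés and Toro [14]):
$$\operatorname{Var}\begin{pmatrix}\mathbf{g}_A\\ \mathbf{g}^A_{AB}\\ \mathbf{g}^A_{C(AB)}\end{pmatrix} = \mathbf{A}_A\sigma^2_{g,A},\quad \operatorname{Var}\begin{pmatrix}\mathbf{g}_B\\ \mathbf{g}^B_{AB}\\ \mathbf{g}^B_{C(AB)}\end{pmatrix} = \mathbf{A}_B\sigma^2_{g,B},\quad \operatorname{Var}\begin{pmatrix}\mathbf{g}_C\\ \mathbf{g}^C_{C(AB)}\end{pmatrix} = \mathbf{A}_C\sigma^2_{g,C},\quad \operatorname{Var}\left(\mathbf{g}^{AB}_{C(AB)}\right) = \mathbf{A}_{AB}\sigma^2_{g,AB},$$
where the breed-specific partial relationship matrices are defined by the recursive formulas
$$(\mathbf{A}_b)_{ii} = \tfrac{1}{2}\left(f^{f(i)}_b + f^{m(i)}_b\right) + \tfrac{1}{2}(\mathbf{A}_b)_{f(i),m(i)},\qquad (\mathbf{A}_b)_{ij} = \tfrac{1}{2}\left[(\mathbf{A}_b)_{f(i),j} + (\mathbf{A}_b)_{m(i),j}\right]\ \text{for } j \text{ older than } i,$$
for breed b = A, B, C, with $f^i_b$ denoting the breed b proportion of individual i and $(\mathbf{A}_b)_{ii} = 1$ for a base animal of breed b, and the breed-segregation partial relationship matrix is defined by the recursive formulas
$$(\mathbf{A}_{AB})_{ii} = 2\left(f^{f(i)}_A f^{f(i)}_B + f^{m(i)}_A f^{m(i)}_B\right) + \tfrac{1}{2}(\mathbf{A}_{AB})_{f(i),m(i)},\qquad (\mathbf{A}_{AB})_{ij} = \tfrac{1}{2}\left[(\mathbf{A}_{AB})_{f(i),j} + (\mathbf{A}_{AB})_{m(i),j}\right]\ \text{for } j \text{ older than } i,$$
where in both cases non-contributing animals are not included in the resulting matrices. C(AB) animals are the only animals contributing to matrix $\mathbf{A}_{AB}$, and since $f^{m(i)}_A = f^{m(i)}_B = 1/2$ (and $f^{f(i)}_A = f^{f(i)}_B = 0$) for these animals, the matrix is diagonal with diagonal elements equal to 2 × 1/2 × 1/2 = 1/2. This specification of additive relationships using partial relationship matrices is equivalent to the specification in Eqs. (3) and (4).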
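As an illustration of the recursions above, the following minimal NumPy sketch computes the breed-specific partial relationship matrices $\mathbf{A}_A$, $\mathbf{A}_B$, $\mathbf{A}_C$ and the breed-segregation matrix $\mathbf{A}_{AB}$ for a small pedigree. It assumes that animals are ordered with parents before offspring, that base animals have both parents unknown (coded −1) and are non-inbred purebreds unrelated across breeds, and that the diagonal recursions take the form $\tfrac{1}{2}(f^{f(i)}_b + f^{m(i)}_b) + \tfrac{1}{2}(\mathbf{A}_b)_{f(i),m(i)}$ and $2(f^{f(i)}_A f^{f(i)}_B + f^{m(i)}_A f^{m(i)}_B) + \tfrac{1}{2}(\mathbf{A}_{AB})_{f(i),m(i)}$. The function name `partial_relationships` and the seven-animal pedigree are hypothetical and are not the pedigree of Table 1; non-contributing animals are kept as zero rows and columns instead of being dropped.

```python
import numpy as np


def partial_relationships(sire, dam, base_breed, breeds=("A", "B", "C")):
    """Hypothetical helper: breed-specific partial relationship matrices A_b
    and the A x B breed-segregation matrix A_AB via the recursions in the text.

    Animals must be ordered so that parents precede offspring; base animals
    have sire = dam = -1. Non-contributing animals keep zero rows/columns.
    """
    n = len(sire)
    f = {b: np.zeros(n) for b in breeds}        # breed proportions f_b^i
    A = {b: np.zeros((n, n)) for b in breeds}   # breed-specific A_b
    A_seg = np.zeros((n, n))                    # breed-segregation A_AB
    for i in range(n):
        s, d = sire[i], dam[i]
        if s < 0:
            # Assumed base animal: purebred, non-inbred, unrelated to other bases.
            b = base_breed[i]
            f[b][i] = 1.0
            A[b][i, i] = 1.0
            continue
        for b in breeds:
            f[b][i] = 0.5 * (f[b][s] + f[b][d])
            row = 0.5 * (A[b][s, :i] + A[b][d, :i])
            A[b][i, :i] = row
            A[b][:i, i] = row
            A[b][i, i] = 0.5 * (f[b][s] + f[b][d]) + 0.5 * A[b][s, d]
        row = 0.5 * (A_seg[s, :i] + A_seg[d, :i])
        A_seg[i, :i] = row
        A_seg[:i, i] = row
        A_seg[i, i] = (2.0 * (f["A"][s] * f["B"][s] + f["A"][d] * f["B"][d])
                       + 0.5 * A_seg[s, d])
    return f, A, A_seg


# Hypothetical pedigree: 0 = A boar, 1 = B sow, 2 = C boar,
# 3 and 4 = full-sib AB sows, 5 and 6 = C(AB) pigs out of sows 3 and 4.
sire = [-1, -1, -1, 0, 0, 2, 2]
dam = [-1, -1, -1, 1, 1, 3, 4]
base_breed = ["A", "B", "C", None, None, None, None]
f, A, A_seg = partial_relationships(sire, dam, base_breed)
print(np.diag(A_seg))            # non-zero (1/2) only for the C(AB) pigs 5 and 6
print(A["A"][5, 5], A["C"][5, 5])  # expected 1/4 and 1/2
```

For the two C(AB) pigs, the breed-segregation diagonal is 1/2 and the off-diagonal element is 0, in line with $\mathbf{A}_{AB}$ being diagonal.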
To illustrate the different partial relationship matrices, we analysed the small pedigree in Table 1. Tables 2, 3, 4 and 5 show the partial relationship matrices for this example.
Wei and van der Werf [8] presented a reduced form of the two-way crossbreeding model (1) in which the Mendelian sampling term of the genetic effect on crossbred animals was included in the residual error term. A reduced model can also be formulated for the three-way crossbreeding model by expressing
$$g_{C(AB),i} = \tfrac{1}{2}g_{C,f(i)} + \tfrac{1}{2}\left(g^A_{AB,m(i)} + g^B_{AB,m(i)}\right) + \delta_{C(AB),i},$$
where $\delta_{C(AB),i}$ is the Mendelian sampling term. The Mendelian sampling terms are independent among the C(AB) crossbred animals, and by making the approximation that father f(i) is not inbred, and since mother m(i) is not inbred, their variance is constant. In this way, the Mendelian sampling term can be included in the residual error term $\mathbf{e}_{C(AB)}$ in model (2), and the model can be formulated using three breed-specific partial relationship matrices defined on the A, B, C and AB animals. However, as explained in Christensen et al. [11], such a reduced model cannot be extended to incorporate marker genotypes, since these provide information about the Mendelian sampling term. Therefore, we did not pursue the reduced form of the model any further. Note that model (2) with relationships as presented here is the most obvious generalisation of the Wei and van der Werf model in Eq. (1) from two to three breeds, since base individuals are assumed unrelated. Without a formulation using partial relationship matrices, it would be difficult to estimate parameters in this model with standard animal breeding software.
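The statement that the Mendelian sampling variance of C(AB) pigs is constant can be checked by subtracting the parental contributions from each partial relationship diagonal. The sketch below assumes a non-inbred AB dam from an A sire and a B dam, uses the partial relationship values implied by the recursions described earlier ($(\mathbf{A}_C)_{ii} = 1/2$, $(\mathbf{A}_A)_{ii} = (\mathbf{A}_B)_{ii} = 1/4$ and $(\mathbf{A}_{AB})_{ii} = 1/2$ for a C(AB) pig), and adds a hypothetical sire inbreeding argument `F_sire` to show what the approximation of a non-inbred father removes. The function name is also hypothetical.

```python
def ms_variance_cab(var_g_A, var_g_B, var_g_C, var_g_AB, F_sire=0.0):
    """Hypothetical helper: Mendelian sampling variance of a C(AB) pig's
    genetic value in the reduced model, assuming a non-inbred F1 AB dam
    (A sire x B dam)."""
    # Breed C part: A_C(i,i) = 1/2; the sire term (1/2) g_C,f(i) has
    # variance (1/4)(1 + F_sire), leaving (1 - F_sire)/4.
    ms_C = 0.5 - 0.25 * (1.0 + F_sire)
    # Breed A part: A_A(i,i) = 1/4; the dam term (1/2) g^A_AB,m(i) has
    # variance (1/4) * A_A(m,m) = (1/4)(1/2), leaving 1/8. Breed B likewise.
    ms_A = 0.25 - 0.25 * 0.5
    ms_B = 0.25 - 0.25 * 0.5
    # Segregation: an F1 dam carries no segregation term, so all of
    # A_AB(i,i) = 1/2 is Mendelian sampling.
    ms_AB = 0.5
    return ms_C * var_g_C + ms_A * var_g_A + ms_B * var_g_B + ms_AB * var_g_AB


print(ms_variance_cab(1.0, 1.0, 1.0, 1.0))              # 1.0
print(ms_variance_cab(1.0, 1.0, 1.0, 1.0, F_sire=0.5))  # smaller with an inbred sire
```

With `F_sire = 0` the result depends only on the four variance parameters, which is why the term can be absorbed into a homogeneous residual.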

Additive genetic model for crossbred C(AB) performance: common genetic approach
In the previous subsection, base animals were assumed to be unrelated. An alternative proposed by Legarra et al. [15] is to assume that base animals are related and inbred within breeds, and related between breeds, with relationships determined by the matrix
$$\boldsymbol{\Gamma} = \begin{pmatrix}\gamma_A & \gamma_{A,B} & \gamma_{A,C}\\ \gamma_{A,B} & \gamma_B & \gamma_{B,C}\\ \gamma_{A,C} & \gamma_{B,C} & \gamma_C\end{pmatrix}.$$
This means that, among the base animals, the variance-covariance of genetic effects is as follows. Within breed, it is defined by
$$\operatorname{Var}(g_i) = \sigma^2_g\left(1 + \gamma_b/2\right)$$
for an individual i in breed b, and
$$\operatorname{Cov}(g_i, g_{i'}) = \sigma^2_g\,\gamma_b$$
for two individuals i and i′ in breed b, i.e. base animals are inbred with coefficient $\gamma_b/2$ and related with relationship coefficient $\gamma_b$. Furthermore,
$$\operatorname{Cov}(g_i, g_{i'}) = \sigma^2_g\,\gamma_{b,b'}$$
for two individuals i and i′ in different breeds b and b′, i.e. base animals in different breeds are related. Therefore, a joint relationship matrix is specified among all base animals, and by applying the usual recursive definition
$$A_{ii} = 1 + \tfrac{1}{2}A_{f(i),m(i)},\qquad A_{ij} = \tfrac{1}{2}\left(A_{f(i),j} + A_{m(i),j}\right)\ \text{for } j \text{ older than } i,$$
an additive relationship matrix $\mathbf{A}(\boldsymbol{\Gamma})$ is defined across all animals, with relationships among the three base populations A, B and C defined by matrix $\boldsymbol{\Gamma}$. The variance-covariance of genetic effects is therefore determined by
$$\operatorname{Var}(\mathbf{g}) = \mathbf{A}(\boldsymbol{\Gamma})\,\sigma^2_g. \qquad (5)$$
Table 6 shows the common relationship matrix for the pedigree in Table 1. Legarra et al. [15] suggested a framework in which individuals in the base population of the pedigree are related because they originate from overlapping ancestral populations of finite size, and they termed each of these ancestral populations a metafounder, to be included in the pedigree. Here, A, B and C are metafounders, and each base individual in the pedigree has the metafounder of its breed as both of its parents; see the example in Table 7. When the pedigree and the matrix $\mathbf{A}(\boldsymbol{\Gamma})$ are extended with these metafounders, Legarra et al. [15] showed that the usual algorithms apply: the sparse inverse $\mathbf{A}(\boldsymbol{\Gamma})^{-1}$ can be computed directly as in Henderson [16], and submatrices of $\mathbf{A}(\boldsymbol{\Gamma})$ by the Colleau algorithm [17].

Table 5 Breed AB segregation partial relationship matrix $\mathbf{A}_{AB}$ for the pedigree in Table 1
id    9     10
9     1/2   0
10    0     1/2
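The usual recursion with metafounders can be sketched in a few lines of dense NumPy code. This is a minimal illustration only, assuming that the metafounders occupy the first rows of the extended pedigree with $\boldsymbol{\Gamma}$ as their relationship matrix and that each base animal has its breed's metafounder as both parents; the $\boldsymbol{\Gamma}$ values, the five-animal pedigree and the function name are hypothetical. In practice, $\mathbf{A}(\boldsymbol{\Gamma})^{-1}$ would be set up directly with Henderson's rules rather than by forming $\mathbf{A}(\boldsymbol{\Gamma})$.

```python
import numpy as np


def relationship_with_metafounders(Gamma, sire, dam):
    """Hypothetical helper: A(Gamma) by the usual tabular recursion.

    Indices 0..k-1 are the metafounders (k = Gamma.shape[0]); animal t has
    index k + t, and every animal must have both parents coded (a base
    animal has its breed's metafounder as both sire and dam).
    """
    k = Gamma.shape[0]
    n = k + len(sire)
    A = np.zeros((n, n))
    A[:k, :k] = Gamma                   # relationships among metafounders
    for t, (s, d) in enumerate(zip(sire, dam)):
        i = k + t
        row = 0.5 * (A[s, :i] + A[d, :i])
        A[i, :i] = row
        A[:i, i] = row
        A[i, i] = 1.0 + 0.5 * A[s, d]
    return A[k:, k:]                    # drop the metafounders themselves


# Hypothetical Gamma for metafounders A (index 0), B (1) and C (2).
Gamma = np.array([[0.60, 0.30, 0.20],
                  [0.30, 0.50, 0.25],
                  [0.20, 0.25, 0.70]])
# Animals (index 3 + t): base A, base B, base C, AB sow, C(AB) pig.
sire = [0, 1, 2, 3, 5]
dam = [0, 1, 2, 4, 6]
A_Gamma = relationship_with_metafounders(Gamma, sire, dam)
print(A_Gamma[0, 0], A_Gamma[0, 1])   # 1 + gamma_A/2 and gamma_{A,B}
```

The base A animal gets $1 + \gamma_A/2$ on the diagonal and the base A and base B animals are related by $\gamma_{A,B}$, as stated above.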
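The parameter $\sigma^2_g$ in Eq. (5) does not correspond to the usual genetic variance, which is the variance among unrelated individuals in the base population. As explained in Legarra et al. [15], $\sigma^2_g(1 - \gamma_b/2)$ corresponds to the variance among unrelated breed b animals; therefore, the genetic variances for crossbred C(AB) performance $\sigma^2_g(1 - \gamma_A/2)$, $\sigma^2_g(1 - \gamma_B/2)$ and $\sigma^2_g(1 - \gamma_C/2)$ correspond to $\sigma^2_{g,A}$, $\sigma^2_{g,B}$ and $\sigma^2_{g,C}$ in the previous section, respectively. In addition, Legarra et al. [15] explained how the breed-segregation variance, which corresponds to $\sigma^2_{g,AB}$ in the previous section, can be expressed in terms of $\sigma^2_g$ and the parameters in $\boldsymbol{\Gamma}$.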
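The correspondence between $\sigma^2_g$ and the breed-specific variances can be checked from relationships: for two base animals i and j of breed b, half the expected squared difference of their genetic values is $\sigma^2_g(A_{ii} - A_{ij}) = \sigma^2_g\left[(1 + \gamma_b/2) - \gamma_b\right] = \sigma^2_g(1 - \gamma_b/2)$. The sketch below evaluates both forms for hypothetical values of $\sigma^2_g$ and $\boldsymbol{\Gamma}$; the function name is an assumption, and it covers only the breed-specific variances, not the breed-segregation variance.

```python
import numpy as np


def partial_variances_from_common(var_g, Gamma, breeds=("A", "B", "C")):
    """Hypothetical helper: sigma^2_{g,b} = sigma^2_g (1 - gamma_b / 2)."""
    return {b: var_g * (1.0 - Gamma[k, k] / 2.0) for k, b in enumerate(breeds)}


# Hypothetical parameter values.
Gamma = np.array([[0.60, 0.30, 0.20],
                  [0.30, 0.50, 0.25],
                  [0.20, 0.25, 0.70]])
var_g = 10.0
via_formula = partial_variances_from_common(var_g, Gamma)

# Same quantity from relationships between two base animals of breed A:
# sigma^2_g * (A_ii - A_ij) with A_ii = 1 + gamma_A/2 and A_ij = gamma_A.
gamma_A = Gamma[0, 0]
via_relationships = var_g * ((1.0 + gamma_A / 2.0) - gamma_A)
print(via_formula["A"], via_relationships)   # both approximately 7.0
```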

Genomic model for crossbred C(AB) performance: partial genetic approach
Marker-based partial relationship matrices are constructed by tracing the breed of origin of alleles and defining relationships according to breed of origin. Assume that the breed of origin of alleles can be determined for all animals, and define breed-specific allele content matrices as follows: matrix $\mathbf{m}_b$ with entries 0, 1, 2 for purebred b animals; matrices $\mathbf{z}_A$ and $\mathbf{z}_B$ with entries 0, 1 for the paternal and maternal alleles, respectively, of crossbred AB animals; matrix $\mathbf{z}_C$ with entries 0, 1 for the paternal allele of crossbred C(AB) animals; and finally matrices $\mathbf{z}^A_p$ and $\mathbf{z}^B_p$ for crossbred C(AB) animals, with entries 0, 1 giving the content of the maternal allele when it is of breed A or breed B origin, respectively, and zero otherwise. This means that the breed of origin of each allele needs to be traced, usually by phasing software [18].

Table 6 Common relationship matrix $\mathbf{A}(\boldsymbol{\Gamma})$ for the pedigree in Table 1 (animals 1 to 8)
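Marker-based breed-specific partial relationship matrices are constructed as follows (details can be found in the "Appendix"). For breed A, the marker-based breed A specific partial relationship matrix $\mathbf{G}_A$ is divided into submatrices with indices denoting genotyped breed A, crossbred AB and crossbred C(AB) animals, which are defined by
$$\mathbf{G}_A = \frac{1}{s_A}\mathbf{W}_A\mathbf{W}_A^{\top},\qquad \mathbf{W}_A = \begin{pmatrix}\mathbf{m}_A - 2\,\mathbf{1}\mathbf{p}_A^{\top}\\ \mathbf{z}_A - \mathbf{1}\mathbf{p}_A^{\top}\\ \mathbf{z}^A_p - \mathbf{p}^A_p\end{pmatrix},$$
where the vector $\mathbf{p}_A$ contains breed A specific allele frequencies, matrix $\mathbf{p}^A_p$ has elements (i, j) equal to $p_{A,j}$ when crossbred C(AB) individual i inherited a breed A allele at marker j and zero otherwise, and $s_A$ is a scaling parameter. The marker-based breed B specific partial relationship matrix $\mathbf{G}_B$ is defined similarly to $\mathbf{G}_A$, and the marker-based breed C specific partial relationship matrix, with submatrices for genotyped breed C and crossbred C(AB) animals, is
$$\mathbf{G}_C = \frac{1}{s_C}\mathbf{W}_C\mathbf{W}_C^{\top},\qquad \mathbf{W}_C = \begin{pmatrix}\mathbf{m}_C - 2\,\mathbf{1}\mathbf{p}_C^{\top}\\ \mathbf{z}_C - \mathbf{1}\mathbf{p}_C^{\top}\end{pmatrix},$$
where the vector $\mathbf{p}_C$ contains estimated breed C specific allele frequencies and $s_C$ is a scaling parameter. The breed-segregation partial relationship matrix is defined as
$$\mathbf{G}_{AB} = \frac{2}{n}\left(\mathbf{D} - \tfrac{1}{2}\mathbf{1}\mathbf{1}^{\top}\right)\left(\mathbf{D} - \tfrac{1}{2}\mathbf{1}\mathbf{1}^{\top}\right)^{\top},$$
where $\mathbf{D}$ has elements (i, j) equal to 1 if crossbred C(AB) individual i inherited a breed A allele at marker j and 0 if it inherited a breed B allele, and n is the number of markers. Note that the diagonal elements of $\mathbf{G}_{AB}$ equal the diagonal elements of $\mathbf{A}_{AB}$ (i.e. 1/2). Off-diagonal elements of $\mathbf{G}_{AB}$ measure whether pairs of individuals share more alleles from a particular parental breed (A or B) than expected. Expectations of the off-diagonal elements of $\mathbf{G}_{AB}$ equal the off-diagonal elements of $\mathbf{A}_{AB}$ (i.e. 0).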
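A minimal NumPy sketch of the breed A, breed C and breed-segregation constructions follows, on simulated data. It assumes that breed of origin is known without error, builds $\mathbf{G}_A = \mathbf{W}_A\mathbf{W}_A^{\top}/s_A$ by stacking the centred blocks $\mathbf{m}_A - 2\mathbf{1}\mathbf{p}_A^{\top}$, $\mathbf{z}_A - \mathbf{1}\mathbf{p}_A^{\top}$ and $\mathbf{z}^A_p - \mathbf{p}^A_p$ (and $\mathbf{G}_C$ analogously), and takes $\mathbf{G}_{AB} = (2/n)(\mathbf{D} - \tfrac{1}{2})(\mathbf{D} - \tfrac{1}{2})^{\top}$, where $\mathbf{D}$ is a 0/1 matrix indicating a breed A maternal allele in each C(AB) pig; this is one form that reproduces the stated properties of $\mathbf{G}_{AB}$, not necessarily the authors' exact expression. The choice $s_b = 2\sum_j p_{b,j}(1-p_{b,j})$ is arbitrary, as the text allows, the helper name `partial_G` is hypothetical, and $\mathbf{G}_B$ would be built like $\mathbf{G}_A$.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_markers = 200
n_A, n_C, n_AB, n_CAB = 30, 30, 20, 40

# Simulated (hypothetical) phased data with known breed of origin.
p_true = {b: rng.uniform(0.05, 0.95, n_markers) for b in "ABC"}
m_A = rng.binomial(2, p_true["A"], (n_A, n_markers))     # purebred A, 0/1/2
m_C = rng.binomial(2, p_true["C"], (n_C, n_markers))     # purebred C, 0/1/2
z_A = rng.binomial(1, p_true["A"], (n_AB, n_markers))    # AB: paternal (A) allele
z_C = rng.binomial(1, p_true["C"], (n_CAB, n_markers))   # C(AB): paternal (C) allele
D = rng.integers(0, 2, (n_CAB, n_markers))               # 1 if maternal allele is breed A
mat = np.where(D == 1,
               rng.binomial(1, p_true["A"], (n_CAB, n_markers)),
               rng.binomial(1, p_true["B"], (n_CAB, n_markers)))
z_A_p = mat * D            # maternal allele content if of breed A origin, else 0

# Breed-specific allele frequencies from purebred genotypes and the
# breed-specific alleles of crossbreds.
p_A = (m_A.sum(0) + z_A.sum(0) + z_A_p.sum(0)) / (2 * n_A + n_AB + D.sum(0))
p_C = (m_C.sum(0) + z_C.sum(0)) / (2 * n_C + n_CAB)


def partial_G(blocks, centres, s):
    """Hypothetical helper: G_b = W_b W_b' / s_b, W_b stacks centred blocks."""
    W = np.vstack([blk - c for blk, c in zip(blocks, centres)])
    return W @ W.T / s


s_A = 2.0 * np.sum(p_A * (1.0 - p_A))     # one arbitrary choice of s_A
s_C = 2.0 * np.sum(p_C * (1.0 - p_C))
G_A = partial_G([m_A, z_A, z_A_p], [2 * p_A, p_A, D * p_A], s_A)  # D * p_A is p^A_p
G_C = partial_G([m_C, z_C], [2 * p_C, p_C], s_C)
Dc = D - 0.5
G_AB = 2.0 / n_markers * Dc @ Dc.T
print(G_A.shape, G_C.shape, np.diag(G_AB)[:3])   # diagonal of G_AB is 1/2
```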
Relationship matrices that combine pedigree and marker information [2,4] can then be constructed. For b = A, B, C and AB, the inverse of the combined relationship matrix is
$$\mathbf{H}_b^{-1} = \mathbf{A}_b^{-1} + \begin{pmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}_b^{-1} - \mathbf{A}_{b,22}^{-1}\end{pmatrix},$$
where $\mathbf{A}_{b,22}$ is the submatrix of $\mathbf{A}_b$ for genotyped animals, which can be computed by the Colleau algorithm [17]; see Christensen et al. [11]. The breed-specific partial marker-based relationship matrices above require estimates of breed-specific allele frequencies. Such estimates can be obtained from marker genotypes of purebred animals and breed-specific marker alleles of crossbred animals. Furthermore, these matrices need to be adjusted to be compatible with the partial pedigree relationship matrices, similar to Christensen et al. [11,19]. The scaling parameters $s_b$ in the marker-based relationship matrices $\mathbf{G}_b$, b = A, B, C, are left unspecified above because the compatibility adjustment involves a scaling parameter $\beta_b$ for each breed, and therefore $s_b$ can be arbitrary. In contrast, matrix $\mathbf{G}_{AB}$ does not need an adjustment.
Finally, to account for the fact that marker genotypes only capture a fraction of the genetic effects, the partial marker-based relationship matrices $\mathbf{G}_b$, b ∈ {A, B, C}, and $\mathbf{G}_{AB}$ above may be replaced by
$$\mathbf{G}_{b,\omega} = \mathbf{G}_b(1 - \omega) + \mathbf{A}_{b,22}\,\omega,\qquad b \in \{A, B, C, AB\},$$
where ω is the fraction of genetic variance not captured by marker genotypes [4].
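The combination step can be sketched with dense matrices for a toy example. The function below is hypothetical and assumes the standard single-step form $\mathbf{H}^{-1} = \mathbf{A}^{-1} + \operatorname{diag}(\mathbf{0}, \mathbf{G}_{\omega}^{-1} - \mathbf{A}_{22}^{-1})$ with $\mathbf{G}_{\omega} = \mathbf{G}(1-\omega) + \mathbf{A}_{22}\,\omega$; in the partial genetic approach it would be applied once per matrix $\mathbf{A}_b$, b = A, B, C, AB, restricted to contributing animals, after the compatibility adjustment of $\mathbf{G}_b$. Large-scale implementations would use a sparse $\mathbf{A}^{-1}$ and the Colleau algorithm instead of dense inverses, and the toy $\mathbf{G}$ values are made up.

```python
import numpy as np


def h_inverse(A, genotyped, G, omega=0.05):
    """Hypothetical helper: single-step H^{-1} = A^{-1} + [[0, 0], [0, G_w^{-1} - A_22^{-1}]],
    with G_w = G (1 - omega) + A_22 omega (dense, for illustration only)."""
    idx = np.asarray(genotyped)
    A22 = A[np.ix_(idx, idx)]             # Colleau algorithm in practice
    G_w = (1.0 - omega) * G + omega * A22
    H_inv = np.linalg.inv(A)              # sparse Henderson rules in practice
    H_inv[np.ix_(idx, idx)] += np.linalg.inv(G_w) - np.linalg.inv(A22)
    return H_inv


# Toy example: animal 2 is the offspring of unrelated animals 0 and 1;
# animals 1 and 2 are genotyped (hypothetical G).
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
G = np.array([[1.05, 0.45],
              [0.45, 0.98]])
H_inv = h_inverse(A, [1, 2], G)
print(np.round(H_inv, 3))
```

A quick consistency check: if $\mathbf{G}$ equals $\mathbf{A}_{22}$, the correction term vanishes and $\mathbf{H}^{-1}$ reduces to $\mathbf{A}^{-1}$.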

Genomic model for crossbred C(AB) performance: common genetic approach
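The marker-based relationship matrix is constructed as usual across all genotyped animals:
$$\mathbf{G} = \frac{1}{s}\left(\mathbf{m} - 2\,\mathbf{1}\mathbf{p}^{\top}\right)\left(\mathbf{m} - 2\,\mathbf{1}\mathbf{p}^{\top}\right)^{\top},$$
where $\mathbf{m}$ is the gene content matrix with entries 0, 1, 2, $\mathbf{p}$ is the vector of allele frequencies and s is a scaling parameter. As in Christensen [20] and Legarra et al. [15], we chose common allele frequencies, i.e. $p_j = 0.5$, and then determined the parameters in matrix $\boldsymbol{\Gamma}$ and parameter s such that the pedigree-based and marker-based relationship matrices are compatible. Parameters in matrix $\boldsymbol{\Gamma}$ and the scaling parameter s can be estimated by matching $\mathbf{A}(\boldsymbol{\Gamma})$ and $\mathbf{G}$ for purebred individuals; see Legarra et al. [15]. For example, if genotyping is done in each of the three pure breeds, a system of equations matching $\mathbf{A}(\boldsymbol{\Gamma})$ and $\mathbf{G}$ can be used to determine the parameters. This is a linear system of 7 equations in the 7 parameters $\gamma_A$, $\gamma_B$, $\gamma_C$, $\gamma_{A,B}$, $\gamma_{A,C}$, $\gamma_{B,C}$ and 1/s, and can therefore be solved directly to obtain estimates. The relationship matrix that combines pedigree and marker information then has the inverse
$$\mathbf{H}(\boldsymbol{\Gamma})^{-1} = \mathbf{A}(\boldsymbol{\Gamma})^{-1} + \begin{pmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}(\boldsymbol{\Gamma})_{22}^{-1}\end{pmatrix}.$$
Finally, similar to the previous section, the marker-based relationship matrix $\mathbf{G}$ above may be replaced by $\mathbf{G}_\omega = \mathbf{G}(1 - \omega) + \mathbf{A}(\boldsymbol{\Gamma})_{22}\,\omega$, where ω is the fraction of genetic variance that is not captured by marker genotypes.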
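The exact 7-equation system is not reproduced in this text, so the following is only an illustrative alternative: a minimal sketch that assumes the genotyped purebreds are base animals (so that $\mathbf{A}(\boldsymbol{\Gamma})$ has $1 + \gamma_b/2$ on the diagonal, $\gamma_b$ within breed and $\gamma_{b,b'}$ across breeds), matches 9 block averages of $\mathbf{A}(\boldsymbol{\Gamma})$ and $\mathbf{G}$ with $p_j = 0.5$, and solves the resulting linear equations in $(\gamma_A, \gamma_B, \gamma_C, \gamma_{A,B}, \gamma_{A,C}, \gamma_{B,C}, 1/s)$ by least squares. The function name and the simulated genotypes are hypothetical.

```python
import numpy as np


def fit_gamma_and_scale(m, breed_of, breeds=("A", "B", "C")):
    """Hypothetical helper: least-squares moment matching of A(Gamma) and
    G = (m - 1)(m - 1)'/s, assuming genotyped purebreds are base animals.
    Unknowns x = (gamma_A, gamma_B, gamma_C, gamma_AB, gamma_AC, gamma_BC, 1/s)."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    W = m - 1.0                              # common allele frequencies p_j = 0.5
    K = W @ W.T                              # G = K / s
    idx = [np.flatnonzero(breed_of == b) for b in breeds]
    rows, rhs = [], []
    for k, ib in enumerate(idx):
        Kbb = K[np.ix_(ib, ib)]
        nb = len(ib)
        diag_mean = np.trace(Kbb) / nb
        off_mean = (Kbb.sum() - np.trace(Kbb)) / (nb * (nb - 1))
        r = np.zeros(7)                      # diag_mean / s - gamma_b / 2 = 1
        r[k], r[6] = -0.5, diag_mean
        rows.append(r)
        rhs.append(1.0)
        r = np.zeros(7)                      # off_mean / s - gamma_b = 0
        r[k], r[6] = -1.0, off_mean
        rows.append(r)
        rhs.append(0.0)
    for k, (b1, b2) in enumerate(pairs):
        r = np.zeros(7)                      # across-breed mean / s - gamma_{b,b'} = 0
        r[3 + k], r[6] = -1.0, K[np.ix_(idx[b1], idx[b2])].mean()
        rows.append(r)
        rhs.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    g = x[:6]
    Gamma = np.array([[g[0], g[3], g[4]],
                      [g[3], g[1], g[5]],
                      [g[4], g[5], g[2]]])
    return Gamma, 1.0 / x[6]


rng = np.random.default_rng(11)
n_markers, n_per_breed = 300, 40
m = np.vstack([rng.binomial(2, rng.uniform(0.05, 0.95, n_markers),
                            (n_per_breed, n_markers)) for _ in range(3)])
breed_of = np.repeat(np.array(["A", "B", "C"]), n_per_breed)
Gamma_hat, s_hat = fit_gamma_and_scale(m, breed_of)
print(np.round(Gamma_hat, 3), round(s_hat, 1))
```

Because every equation is linear in the γ parameters and 1/s, `numpy.linalg.lstsq` returns the solution in one step; with 7 suitably chosen equations the system would be square and could be solved directly.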

Genetic models for both purebred and crossbred C(AB) performances
In the previous sections, partial genetic and common genetic models for additive genetic effects for crossbred C(AB) performance were presented, and in both cases genomic versions of the models and combined relationship matrices were shown. We now show what the genetic variances and covariances for the model in Eq. (2) look like in the two cases.
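For the partial genetic case, the vector of genetic effects on crossbred C(AB) individuals equals $\mathbf{g}^C_{C(AB)} + \mathbf{g}^A_{C(AB)} + \mathbf{g}^B_{C(AB)} + \mathbf{g}^{AB}_{C(AB)}$, and through $\mathbf{g}^A_{C(AB)}$, $\mathbf{g}^B_{C(AB)}$ and $\mathbf{g}^C_{C(AB)}$, the breed-specific partial relationships define the breeding values for crossbred C(AB) performance on purebred animals, $\mathbf{g}_A$, $\mathbf{g}_B$ and $\mathbf{g}_C$, respectively. Combining these effects with the breeding values for purebred performances, $\mathbf{a}_A$, $\mathbf{a}_B$ and $\mathbf{a}_C$, the variance-covariance of genetic effects is determined by
$$\operatorname{Var}\begin{pmatrix}\mathbf{a}_A\\ \mathbf{a}^{\star}_{AB,A}\\ \mathbf{a}^{\star}_{C(AB),A}\\ \mathbf{g}_A\\ \mathbf{g}^A_{AB}\\ \mathbf{g}^A_{C(AB)}\end{pmatrix} = \boldsymbol{\Sigma}_A\otimes\mathbf{A}_A,\quad \operatorname{Var}\begin{pmatrix}\mathbf{a}_B\\ \mathbf{a}^{\star}_{AB,B}\\ \mathbf{a}^{\star}_{C(AB),B}\\ \mathbf{g}_B\\ \mathbf{g}^B_{AB}\\ \mathbf{g}^B_{C(AB)}\end{pmatrix} = \boldsymbol{\Sigma}_B\otimes\mathbf{A}_B,\quad \operatorname{Var}\begin{pmatrix}\mathbf{a}_C\\ \mathbf{a}^{\star}_{C(AB),C}\\ \mathbf{g}_C\\ \mathbf{g}^C_{C(AB)}\end{pmatrix} = \boldsymbol{\Sigma}_C\otimes\mathbf{A}_C,\quad \operatorname{Var}\left(\mathbf{g}^{AB}_{C(AB)}\right) = \mathbf{A}_{AB}\sigma^2_{g,AB},$$
with the four vectors being independent. Here, ⊗ denotes the Kronecker product, ⋆ denotes artificial random vectors introduced so that the genetic variance-covariance matrices can be expressed using Kronecker products, and the matrices
$$\boldsymbol{\Sigma}_b = \begin{pmatrix}\sigma^2_{a,b} & \sigma_{ag,b}\\ \sigma_{ag,b} & \sigma^2_{g,b}\end{pmatrix},\qquad b = A, B, C,$$
are the 2 × 2 variance-covariance matrices containing the genetic variances for purebred and crossbred breeding values and the covariance between them. Thus, using partial relationship matrices provides a formulation of the model in Eq. (2) using Kronecker products, such that parameters can be estimated and breeding values predicted using standard animal breeding software. In this model, there are 10 genetic parameters and 2(n_A + n_B + n_C + n_AB) + 3n_C(AB) genetic values, where n_X is the number of individuals in population X.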
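The Kronecker structure can be illustrated numerically. In the sketch below, the partial relationship matrices are those of a hypothetical pedigree (an A boar and a B sow producing an AB sow, which is mated to a C boar to produce one C(AB) pig), and the matrices $\boldsymbol{\Sigma}_b$ and $\sigma^2_{g,AB}$ are made-up values. It shows that the crossbred genetic variance of the C(AB) pig is the sum of the four independent partial contributions, and how the A boar's purebred breeding value covaries with the pig through $\sigma_{ag,A}$.

```python
import numpy as np

# Hypothetical partial relationship matrices for the toy pedigree above.
A_A = np.array([[1.0, 0.5, 0.25],        # animals: A boar, AB sow, C(AB) pig
                [0.5, 0.5, 0.25],
                [0.25, 0.25, 0.25]])
A_B = A_A.copy()                          # animals: B sow, AB sow, C(AB) pig
A_C = np.array([[1.0, 0.5],               # animals: C boar, C(AB) pig
                [0.5, 0.5]])
A_AB = np.array([[0.5]])                  # animals: C(AB) pig

# Hypothetical 2 x 2 matrices Sigma_b = [[var_a, cov_ag], [cov_ag, var_g]].
Sigma = {"A": np.array([[40.0, 24.0], [24.0, 30.0]]),
         "B": np.array([[35.0, 15.0], [15.0, 20.0]]),
         "C": np.array([[50.0, 30.0], [30.0, 40.0]])}
var_g_AB = 6.0
A_part = {"A": A_A, "B": A_B, "C": A_C}

# Var of (a_b, a*_b, g_b, g^b) = Sigma_b kron A_b: the first half of the rows
# are purebred-performance effects, the second half crossbred effects.
V = {b: np.kron(Sigma[b], A_part[b]) for b in "ABC"}

# The C(AB) pig's crossbred genetic value is a sum of four independent
# parts, so its variance is the sum of the four variances.
var_cab = sum(V[b][-1, -1] for b in "ABC") + var_g_AB * A_AB[0, 0]
# Covariance of the A boar's purebred value with the pig's breed A effect.
cov_boar_pig = V["A"][0, A_A.shape[0] + 2]
print(var_cab, cov_boar_pig)
```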
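For the common genetic case, all individuals are related, and the breeding values for crossbred C(AB) performance on purebred animals, $\mathbf{g}_A$, $\mathbf{g}_B$ and $\mathbf{g}_C$, are defined by additive relationships to the genetic effects on crossbreds, $\mathbf{g}_{C(AB)}$. Combining these effects with the breeding values for purebred performances, $\mathbf{a}_A$, $\mathbf{a}_B$ and $\mathbf{a}_C$, the variance-covariance of genetic effects equals
$$\operatorname{Var}\begin{pmatrix}\mathbf{a}^{\star}_A\\ \mathbf{a}^{\star}_B\\ \mathbf{a}^{\star}_C\\ \mathbf{g}\end{pmatrix} = \boldsymbol{\Sigma}\otimes\mathbf{A}(\boldsymbol{\Gamma}),$$
where $\mathbf{g}$ contains $\mathbf{g}_A$, $\mathbf{g}_B$, $\mathbf{g}_C$, $\mathbf{g}_{AB}$ and $\mathbf{g}_{C(AB)}$ in the order of $\mathbf{A}(\boldsymbol{\Gamma})$, each $\mathbf{a}^{\star}_b$ contains $\mathbf{a}_b$ augmented with artificial random effects (⋆) for the animals outside breed b, and $\boldsymbol{\Sigma}$ is the 4 × 4 genetic variance-covariance matrix
$$\boldsymbol{\Sigma} = \begin{pmatrix}\sigma^2_{a,A} & \sigma_{a,A,B} & \sigma_{a,A,C} & \sigma_{ag,A}\\ \sigma_{a,A,B} & \sigma^2_{a,B} & \sigma_{a,B,C} & \sigma_{ag,B}\\ \sigma_{a,A,C} & \sigma_{a,B,C} & \sigma^2_{a,C} & \sigma_{ag,C}\\ \sigma_{ag,A} & \sigma_{ag,B} & \sigma_{ag,C} & \sigma^2_{g}\end{pmatrix}.$$
Formulating the model in Eq. (2) using Kronecker products implies that parameters can be estimated and breeding values predicted using standard animal breeding software. This model contains 10 genetic parameters and 2(n_A + n_B + n_C) + n_AB + n_C(AB) genetic values, and in addition, 6 parameters in matrix $\boldsymbol{\Gamma}$.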
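A corresponding sketch for the common genetic case forms $\boldsymbol{\Sigma}\otimes\mathbf{A}(\boldsymbol{\Gamma})$ for a hypothetical $4\times 4$ matrix $\boldsymbol{\Sigma}$ and a toy $\mathbf{A}(\boldsymbol{\Gamma})$ (a base A animal, a base B animal and their AB offspring with $\gamma_A = 0.6$, $\gamma_B = 0.5$ and $\gamma_{A,B} = 0.3$), counts the free (co)variance parameters and computes the purebred-crossbred genetic correlations. All numbers are illustrative assumptions, not estimates.

```python
import numpy as np

# Hypothetical 4 x 4 genetic (co)variance matrix: purebred performance in
# A, B, C and crossbred C(AB) performance (last row/column).
Sigma = np.array([[40.0, 10.0, 8.0, 30.0],
                  [10.0, 35.0, 6.0, 25.0],
                  [8.0, 6.0, 50.0, 35.0],
                  [30.0, 25.0, 35.0, 60.0]])
# Toy A(Gamma): base A animal, base B animal and their AB offspring
# (hypothetical gamma_A = 0.6, gamma_B = 0.5, gamma_{A,B} = 0.3).
A_Gamma = np.array([[1.30, 0.30, 0.80],
                    [0.30, 1.25, 0.775],
                    [0.80, 0.775, 1.15]])

n_var = Sigma.shape[0]
n_params = n_var * (n_var + 1) // 2              # free (co)variances in Sigma
V = np.kron(Sigma, A_Gamma)                      # Var of (a*_A, a*_B, a*_C, g)
sd = np.sqrt(np.diag(Sigma))
r_pc = Sigma[:3, 3] / (sd[:3] * sd[3])           # purebred-crossbred correlations
np.linalg.cholesky(V)                            # raises LinAlgError if not PD
print(n_params, np.round(r_pc, 2))
```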
In the common genetic case, there are three parameters, $\sigma_{a,A,B}$, $\sigma_{a,A,C}$ and $\sigma_{a,B,C}$, which are genetic covariances between purebred performances, and these parameters are not present in the partial genetic case. The reason is that they would not be identifiable, since relationships across breeds are not specified in the partial genetic case. In the common genetic case, the identifiability of $\sigma_{a,A,B}$, $\sigma_{a,A,C}$ and $\sigma_{a,B,C}$ relies on the genomic relationships between pairs of animals in different breeds. In the partial genetic case, there are four genetic parameters for crossbred performance, $\sigma^2_{g,A}$, $\sigma^2_{g,B}$, $\sigma^2_{g,C}$ and $\sigma^2_{g,AB}$, that scale each of the four partial relationship matrices, whereas in the common genetic case there is only one such parameter, $\sigma^2_g$. As explained in a previous section, these parameters correspond via the parameters in matrix $\boldsymbol{\Gamma}$: $\sigma^2_{g,b} = \sigma^2_g(1 - \gamma_b/2)$ for b = A, B, C, and $\sigma^2_{g,AB}$ is likewise determined by $\sigma^2_g$ and the parameters in $\boldsymbol{\Gamma}$. However, note that estimating $\sigma^2_{g,A}$, $\sigma^2_{g,B}$, $\sigma^2_{g,C}$ and $\sigma^2_{g,AB}$ from phenotypes, as in the partial genetic case, differs from determining them from a general $\sigma^2_g$ and the parameters in $\boldsymbol{\Gamma}$, which are estimated from marker genotypes, as in the common genetic case.

Discussion
For three-way crossbreeding, we presented models based on pedigree-based, marker-based and combined relationships. Using combined relationship matrices results in a model in which pedigree and marker genotypes are used simultaneously for genetic evaluation, i.e. a single-step method for genomic evaluation. This paper provides the models and mathematical formulas, but a numerical implementation is needed before the methods are ready for use in practice. Such methods make it possible to incorporate phenotypes and genotypes on crossbreds into an existing genetic evaluation system, assuming that such a system is based on a single-step method.
The models for three-way crossbreeding investigated in this paper were four-variate models where each variable was measured in a specific population, A, B, C or C(AB). The main scenario that we have in mind is one where the four variables represent the same biological trait measured in four different genetic backgrounds and possibly different environments, but in principle the four variables could also be different biological traits. An extension of the model to a situation where multiple biological traits are measured in each of the four populations is in principle straightforward, since the additive relationship matrices are the same, although in practice it may require the estimation of a very large number of genetic parameters. Extending the approaches to other types of models that are implemented in standard animal breeding software, such as threshold models, models with indirect genetic effects or models for test-day records, is also in principle straightforward. Finally, modifying the models for other data-recording scenarios, for example with records on AB individuals or no records on one of the pure breeds, is also straightforward. In general, designing data recording for these complex models is an issue; for example, to obtain precise estimates of the genetic correlation parameters, it is important that crossbred animals with records are closely related to purebred animals with records.
Two types of approaches for constructing additive relationships were presented, based on different assumptions about the allele substitution effects of causal loci or SNPs. In the partial genetic approach, allele substitution effects of SNPs were assumed to be independent between breeds, whereas in the common genetic approach, they were assumed to be the same in different breeds. The partial genetic approach requires that alleles are traced according to breed of origin, which is feasible in some scenarios but may be difficult to do with sufficient accuracy in others. In particular, when crossbred C(AB) animals are genotyped, a reasonable requirement is that their breed C fathers are also genotyped, which makes tracing the breed C paternal allele feasible. Tracing the breed of origin (A or B) of the maternal allele may be more uncertain: it depends on whether the AB mothers are genotyped (they may not be, due to logistical issues), whether the maternal grandfathers are genotyped, and whether the maternal grandmothers are genotyped (which may be difficult if they are in multiplier herds). An advantage of the common genetic approach is that the marker-based relationship matrix is easier to construct because tracing the breed of origin of alleles is not required, but a disadvantage may be the computational burden of using a larger relationship matrix. In addition, the parameters in matrix $\boldsymbol{\Gamma}$ need to be estimated, and the sensitivity of genetic evaluation to these estimates is unknown. Future research using simulated and real data is needed to clarify the differences between the two approaches.
Other terminal crossbreeding systems are of interest in pig production. Models for two-way crossbreeding are relevant for sow traits measured on animals from breeds A and B and the cross AB, and such models were presented in Christensen et al. [11] using partial genetic relationship matrices. An alternative to this partial genetic approach would be to use the common genetic approach presented here. The four-way crossbreeding system, where crossbred CD sires are mated to AB dams to produce (CD)(AB) pigs for slaughter, is also used in pig production. The approaches in this paper can be extended to such a system, and the resulting model would be a five-variate model. Using the partial genetic approach, there would be four breed-specific partial relationship matrices and two breed-segregation partial relationship matrices, and the corresponding model for purebred and crossbred performances would contain 14 genetic parameters, whereas using the common genetic approach, the model for purebred and crossbred performances would contain 15 genetic parameters.
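The parameter counts follow from simple bookkeeping: in the partial genetic approach each breed contributes a $2\times 2$ matrix (3 parameters) and each breed-segregation matrix one variance, whereas in the common genetic approach there is a single symmetric matrix over the purebred variables and the crossbred variable. The helper functions below are hypothetical and count only genetic (co)variance parameters, not the parameters in $\boldsymbol{\Gamma}$.

```python
def n_params_partial(n_breeds, n_segregation):
    """Hypothetical helper: one 2 x 2 matrix (3 parameters) per breed plus
    one variance per breed-segregation partial relationship matrix."""
    return 3 * n_breeds + n_segregation


def n_params_common(n_breeds):
    """Hypothetical helper: one symmetric matrix over the n_breeds purebred
    variables plus the single crossbred variable."""
    k = n_breeds + 1
    return k * (k + 1) // 2


print(n_params_partial(3, 1), n_params_common(3))   # three-way: 10 10
print(n_params_partial(4, 2), n_params_common(4))   # four-way: 14 15
```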
Many papers have reported genetic correlations between purebred and crossbred performances [21][22][23][24][25][26]. The reported estimates ranged from 0.38 to 0.946, depending on the trait and on differences in environment, and generally had relatively high standard errors. The higher the genetic correlation, the less there is to gain from including crossbred data in the genetic evaluation system. All these results are from two-way crosses, and the authors are not aware of publications based on data from three-way crossbreeding in which purebred and crossbred performances are treated as different traits. The models presented in this paper should be useful for investigating such data from three-way crossbreeding.

Conclusion
Models for genetic evaluation in the three-way crossbreeding system are presented. These models provide estimated breeding values for both purebred and crossbred performances, and can use pedigree-based or marker-based relationships, or combined relationships based on both pedigree and marker information. This provides a framework that allows information from three-way crossbred animals to be incorporated into a genetic evaluation system.

Appendix
Here, explicit and detailed derivations of the additive relationships across purebred and crossbred animals related to the C(AB) crossbreeding system are presented.
In the derivation, both the partial genetic and common genetic approaches for the variance-covariance of genetic effects for crossbred C(AB) performance are inspired by the derivation of formulas (3) and (4) in Lo et al. [13]. Lo et al. [13] based their derivation on the genotypic value expressed as a sum over loci of the effects of paternal and maternal alleles:
$$g_i = \mu + \sum_j\left(\alpha^s_{ji} + \alpha^d_{ji}\right),$$
where $\alpha^s_{ji}$ and $\alpha^d_{ji}$ are the additive effects of the paternal and maternal alleles at locus j, respectively, and these effects depend on the breed of origin b of the alleles, such that
$$E\left(\alpha^s_{ji} \mid s_{ji}\in b\right) = E\left(\alpha^d_{ji} \mid d_{ji}\in b\right) = \epsilon^b_j,$$
where $s_{ji}$ and $d_{ji}$ denote the paternal and maternal alleles, the expectation is taken across all individuals in breed b, and the symbol ∈ is used to denote the breed of origin of an allele. The term $\epsilon^b_j$ is the mean additive effect, and it differs between breeds because allele frequencies differ between breeds. To be explicit, $\epsilon^b_j$ is the frequency-weighted mean of the allele effects at locus j in breed b, determined by the allele frequency $p^b_j$ in breed b and the additive effect $a_j$ of the allele. Above, both expectations and allele frequencies refer to the base populations, and it is assumed that in each base population, alleles are assigned randomly to individuals. It is further assumed that effects for different loci are independent. Here, we introduce the notation $\alpha^{s,b}_{ji} = \alpha^s_{ji} - \epsilon^b_j$ and $\alpha^{d,b}_{ji} = \alpha^d_{ji} - \epsilon^b_j$ for alleles of breed b origin, such that the expectations of these $\alpha^b$ terms are equal to 0.
First, pedigree-based additive genetic relationships are derived using the partial genetic approach and the common genetic approach; second, the corresponding marker-based relationships are derived.
Additive genetic model for crossbred C(AB) performance: partial genetic approach
In contrast to Lo et al. [13] and Garcia-Cortes and Toro [14], the derivation presented here first splits the genotypic values according to breed of origin, instead of first computing the variances and covariances and then splitting them. This is done so that the derivation parallels that of the corresponding genomic model using the partial genetic approach, which appears in a following subsection.
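For the C(AB) crossbred animals studied here, the paternal allele is always from breed C and the maternal allele is from either breed A or breed B with equal probability; therefore, the genotypic value becomes
$$g_i = \mu_{C(AB)} + g^C_{C(AB),i} + g^A_{C(AB),i} + g^B_{C(AB),i} + g^{AB}_{C(AB),i},$$
with $\mu_{C(AB)} = \mu + \sum_j \epsilon^C_j + \frac{1}{2}\sum_j (\epsilon^A_j + \epsilon^B_j)$ and
$$g^C_{C(AB),i} = \sum_j \alpha^{C,s}_{ji}, \quad g^A_{C(AB),i} = \sum_j I(d \in A)\,\alpha^{A,d}_{ji}, \quad g^B_{C(AB),i} = \sum_j I(d \in B)\,\alpha^{B,d}_{ji}, \quad g^{AB}_{C(AB),i} = \sum_j \left(I(d \in A) - \tfrac{1}{2}\right)\left(\epsilon^A_j - \epsilon^B_j\right),$$
where $I(d \in b)$ equals 1 if the maternal allele at locus $j$ originates from breed $b$ and 0 otherwise. In this way, the genotypic value has been split into partial genetic effects, where the terms $g^C_{C(AB)}$, $g^A_{C(AB)}$ and $g^B_{C(AB)}$ are specific to the breed of origin and $g^{AB}_{C(AB)}$ is a breed-segregation term.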
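The decomposition can be checked numerically. The sketch below assumes per-locus lists of breed means $\epsilon^b_j$, paternal deviations $\alpha^{C,s}_{ji}$, maternal deviations for either possible breed of origin, and a Boolean per locus telling whether the maternal allele came from breed A; all function names, argument names and values are hypothetical. It computes $g_i$ directly from the allele effects and from the partial genetic effects, and confirms that the two agree.

```python
from fractions import Fraction
from typing import Dict, List

Vec = List[Fraction]


def direct_value(mu: Fraction, eps_A: Vec, eps_B: Vec, eps_C: Vec,
                 dev_C: Vec, dev_A: Vec, dev_B: Vec,
                 from_A: List[bool]) -> Fraction:
    """g_i = mu + sum_j (alpha^s_ji + alpha^d_ji) for one C(AB) animal."""
    total = mu
    for j in range(len(eps_C)):
        alpha_s = eps_C[j] + dev_C[j]  # paternal allele is always from breed C
        if from_A[j]:
            alpha_d = eps_A[j] + dev_A[j]  # maternal allele from breed A
        else:
            alpha_d = eps_B[j] + dev_B[j]  # maternal allele from breed B
        total += alpha_s + alpha_d
    return total


def partial_effects(mu: Fraction, eps_A: Vec, eps_B: Vec, eps_C: Vec,
                    dev_C: Vec, dev_A: Vec, dev_B: Vec,
                    from_A: List[bool]) -> Dict[str, Fraction]:
    """Split g_i into mu_C(AB) and the partial genetic effects."""
    n = len(eps_C)
    half = Fraction(1, 2)
    return {
        "mu_C(AB)": mu + sum(eps_C) + half * sum(eps_A[j] + eps_B[j] for j in range(n)),
        "g_C": sum(dev_C, Fraction(0)),
        "g_A": sum((dev_A[j] for j in range(n) if from_A[j]), Fraction(0)),
        "g_B": sum((dev_B[j] for j in range(n) if not from_A[j]), Fraction(0)),
        # breed-segregation term: (I(d in A) - 1/2)(eps_A - eps_B) summed over loci
        "g_AB": sum(((int(from_A[j]) - half) * (eps_A[j] - eps_B[j]) for j in range(n)),
                    Fraction(0)),
    }


args = dict(
    mu=Fraction(10),
    eps_A=[Fraction(1, 2), Fraction(-1, 5)],
    eps_B=[Fraction(-1, 2), Fraction(1, 5)],
    eps_C=[Fraction(1, 10), Fraction(0)],
    dev_C=[Fraction(1, 3), Fraction(-1, 4)],
    dev_A=[Fraction(1, 7), Fraction(2, 9)],
    dev_B=[Fraction(-1, 6), Fraction(1, 8)],
    from_A=[True, False],
)
parts = partial_effects(**args)
assert sum(parts.values()) == direct_value(**args)
```

Only the deviation of the actually inherited maternal allele enters $g^A_{C(AB)}$ or $g^B_{C(AB)}$, while the breed-of-origin mean is split into a fixed half-half average (absorbed in $\mu_{C(AB)}$) and the segregating term $g^{AB}_{C(AB)}$.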
For the AB crossbred sows (where, for simplicity of notation, it is assumed that their sires are from breed A and their dams from breed B),
$$g_i = \mu_{AB} + g^A_{AB,i} + g^B_{AB,i}, \quad g^A_{AB,i} = \sum_j \alpha^{A,s}_{ji}, \quad g^B_{AB,i} = \sum_j \alpha^{B,d}_{ji},$$
with $\mu_{AB} = \mu + \sum_j \epsilon^A_j + \sum_j \epsilon^B_j$. For a purebred animal $i$ of breed $b$, $g_i = \mu_b + g^b_i$ with $\mu_b = \mu + 2\sum_j \epsilon^b_j$, where
$$g^b_i = \sum_j \left(\alpha^{b,s}_{ji} + \alpha^{b,d}_{ji}\right)$$
is the breeding value for crossbred C(AB) performance. From this, the three vectors of breeding values for crossbred C(AB) performance, $\mathbf{g}^A$, $\mathbf{g}^B$ and $\mathbf{g}^C$, are defined. In this way, a breed-specific partial genetic effect has been defined for all animals that have the specific breed in their breed composition, and a breed-segregation partial genetic effect has been defined for crossbred C(AB) animals. The resulting variance-covariance matrices are as shown in the "Methods" section.
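Note that the different means $\mu_{C(AB)}$, $\mu_{AB}$, $\mu_A$, $\mu_B$ and $\mu_C$ should, strictly speaking, be included in the genetic effects and breeding values, but they have been omitted here because the genetic values refer to performance in a specific C(AB) cross and these means cannot be inferred from the data. Because the maternal allele at each locus originates from breed A or breed B with probability 1/2, its breed-of-origin mean deviates from $(\epsilon^A_j + \epsilon^B_j)/2$ by $\pm(\epsilon^A_j - \epsilon^B_j)/2$ with equal probability, and the variance of the breed-segregation term for a C(AB) animal therefore equals
$$\mathrm{Var}\left(g^{AB}_{C(AB),i}\right) = \sum_j \tfrac{1}{4}\left(\epsilon^A_j - \epsilon^B_j\right)^2 = \tfrac{1}{2}\sigma^2_{g,AB},$$
where $\sigma^2_{g,AB} = \sum_j (\epsilon^A_j - \epsilon^B_j)^2/2$ is the breed-segregation variance, i.e. the additional genetic variance in an F2 cross compared to an F1 cross. Here, the $\epsilon^A_j$s and $\epsilon^B_j$s are assumed to be fixed constants.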
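The factor 1/2 and the F1/F2 interpretation can be verified by exact enumeration. The sketch below assumes independent loci and treats the $\epsilon^A_j$ and $\epsilon^B_j$ as fixed constants, as in the text; the function name and the example values are hypothetical. It enumerates all equally likely breed-of-origin patterns and returns the exact variance of the summed breed means, for the maternal allele of a C(AB) animal (the paternal C allele is constant and is left out), for an F1 (A × B) and for an F2 ((AB) × (AB)).

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Sequence, Tuple


def origin_variance(eps: Dict[str, List[Fraction]],
                    patterns: Sequence[Tuple[str, ...]]) -> Fraction:
    """Exact variance of sum_j sum_{b in pattern_j} eps[b][j] when each locus
    independently takes one of the equally likely breed-of-origin patterns."""
    n_loci = len(next(iter(eps.values())))
    combos = list(product(patterns, repeat=n_loci))
    values = [sum(eps[b][j] for j, pattern in enumerate(combo) for b in pattern)
              for combo in combos]
    weight = Fraction(1, len(combos))
    mean = weight * sum(values, Fraction(0))
    return weight * sum(((v - mean) ** 2 for v in values), Fraction(0))


eps = {"A": [Fraction(1, 2), Fraction(-1, 5), Fraction(3, 10)],
       "B": [Fraction(-1, 2), Fraction(1, 5), Fraction(1, 10)]}
sigma2_gAB = sum((a - b) ** 2 for a, b in zip(eps["A"], eps["B"])) / 2

var_cab = origin_variance(eps, [("A",), ("B",)])   # maternal allele of C(AB)
var_f1 = origin_variance(eps, [("A", "B")])         # one A and one B allele
var_f2 = origin_variance(eps, [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")])

print(var_cab, var_f1, var_f2, sigma2_gAB)  # 3/10 0 3/5 3/5
```

The F1 has no breed-segregation variance, the F2 has $\sigma^2_{g,AB}$, and a C(AB) animal, with only one segregating allele per locus, has half of it.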
Additive genetic model for crossbred C(AB) performance: common genetic approach
Using the notation defined above, the genetic value for s are independent between breeds, and N is the number of genes. Compared to the partial genetic approach, here we assume that both the $\alpha^b$s and the $\epsilon^b_j$s are random variables. The randomness in the $\alpha^b$s arises because different animals inherit different alleles, but the randomness in $\epsilon^b_j$ has a different origin. Note that the differences in $\epsilon^b_j$ between breeds are due to differences in allele frequencies, and, as explained previously, $\epsilon^b_j = p^b_j a_j + (1 - p^b_j)(-a_j) = 2(p^b_j - 1/2)a_j$, where $p^b_j$ is the allele frequency and $a_j$ is the allelic effect. Assigning prior distributions with expectation 1/2 to the allele frequencies, as in Christensen [20], corresponds to assigning to $\epsilon^b_j$ a prior distribution with expectation 0 and variance $4\,\mathrm{Var}(p^b_j)(a_j)^2$, i.e. proportional to $(a_j)^2$, for all loci. Furthermore, assuming that the prior distributions for allele frequencies are correlated between breeds implies that the covariances of the $\epsilon^b_j$s, $4\,\mathrm{Cov}(p^b_j, p^{b'}_j)(a_j)^2$, also become proportional to $(a_j)^2$.
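To illustrate why the prior variances and covariances of the $\epsilon^b_j$s scale with $(a_j)^2$, the sketch below replaces the prior on allele frequencies by a small, hypothetical discrete joint distribution for $(p^A_j, p^B_j)$ that has expectation 1/2 in both breeds and is positively correlated between breeds. This is not the prior used by Christensen [20]; it only serves to make the moments exactly computable.

```python
from fractions import Fraction as F
from typing import Dict, Tuple

# Hypothetical joint prior for (p_A, p_B): expectation 1/2 in both breeds,
# positively correlated between breeds. Probabilities sum to 1.
JOINT_PRIOR: Dict[Tuple[F, F], F] = {
    (F(1, 5), F(1, 5)): F(3, 10),
    (F(4, 5), F(4, 5)): F(3, 10),
    (F(1, 5), F(4, 5)): F(1, 10),
    (F(4, 5), F(1, 5)): F(1, 10),
    (F(1, 2), F(1, 2)): F(1, 5),
}


def eps(p: F, a: F) -> F:
    """epsilon^b_j = p a + (1 - p)(-a) = 2 (p - 1/2) a."""
    return p * a + (1 - p) * (-a)


def prior_moments(a: F) -> Tuple[F, F, F]:
    """Prior mean of eps^A, variance of eps^A, covariance of eps^A and eps^B."""
    mean_A = sum(w * eps(pA, a) for (pA, _), w in JOINT_PRIOR.items())
    mean_B = sum(w * eps(pB, a) for (_, pB), w in JOINT_PRIOR.items())
    var_A = sum(w * (eps(pA, a) - mean_A) ** 2 for (pA, _), w in JOINT_PRIOR.items())
    cov_AB = sum(w * (eps(pA, a) - mean_A) * (eps(pB, a) - mean_B)
                 for (pA, pB), w in JOINT_PRIOR.items())
    return mean_A, var_A, cov_AB


for a_j in (F(1), F(3)):
    mean_A, var_A, cov_AB = prior_moments(a_j)
    # The ratios to (a_j)^2 are the same for every locus.
    print(a_j, mean_A, var_A / a_j ** 2, cov_AB / a_j ** 2)
```

Here $\mathrm{Var}(p^A_j) = 9/125$ and $\mathrm{Cov}(p^A_j, p^B_j) = 9/250$, so the ratios printed are $4\,\mathrm{Var}(p^A_j)$ and $4\,\mathrm{Cov}(p^A_j, p^B_j)$, independent of $a_j$.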
Thus, we assume that the $\epsilon^b_j$s are random variables that are independent of the $\alpha^b$s. Defining new parameters $\sigma^2_g = \sum_j (a_j)^2 (2\tau^2_{\alpha,b} + 2\tau^2_{\epsilon,b})$ and $\gamma_b = 4\tau^2_{\epsilon,b}/(2\tau^2_{\alpha,b} + 2\tau^2_{\epsilon,b})$, we see that the elements of the variance-covariance matrix within breed are defined by
$$\mathrm{Var}(g_i) = \sigma^2_g (1 + \gamma_b/2)$$
for an individual in breed $b$, and
$$\mathrm{Cov}(g_i, g_{i'}) = \sigma^2_g \gamma_b$$
for two individuals $i$ and $i'$ in breed $b$, i.e. animals in the base population are inbred with coefficient $\gamma_b/2$ and related with relationship coefficient $\gamma_b$. Note that from $\tau^2_{\alpha,b} + \tau^2_{\epsilon,b} = 1/4$, we see that $\sigma^2_g = \sum_j (a_j)^2/2$ does not depend on breed and that $\gamma_b = 8\tau^2_{\epsilon,b}$. Furthermore, defining $\gamma_{b,b'} = 8\tau_{\epsilon,b,b'}$, we obtain for two individuals $i$ and $i'$ in different breeds $b$ and $b'$
$$\mathrm{Cov}(g_i, g_{i'}) = \sigma^2_g \gamma_{b,b'},$$
i.e. base animals in different breeds are related. Therefore, a joint relationship matrix is specified among all base animals, and by applying the usual recursive definition, an additive relationship matrix is defined across all animals, as shown in the "Methods" section.
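The recursive (tabular) computation with related base animals can be sketched as follows. The sketch assumes a pedigree ordered so that all base animals come first and every non-base animal has both parents in the pedigree; base-animal entries follow the formulas above, with $1 + \gamma_b/2$ on the diagonal, $\gamma_b$ within breed and $\gamma_{b,b'}$ across breeds, and all entries are in units of $\sigma^2_g$. The function name, pedigree format and parameter values are hypothetical.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Optional, Tuple

# (animal id, sire id, dam id, breed of a base animal or None)
Record = Tuple[str, Optional[str], Optional[str], Optional[str]]


def relationship_matrix(pedigree: List[Record],
                        gamma_within: Dict[str, Fraction],
                        gamma_between: Dict[FrozenSet[str], Fraction]) -> List[List[Fraction]]:
    """Tabular method in which base animals are inbred and related via breed-level gammas."""
    index = {rec[0]: k for k, rec in enumerate(pedigree)}
    n = len(pedigree)
    A = [[Fraction(0)] * n for _ in range(n)]
    for k, (animal, sire, dam, breed) in enumerate(pedigree):
        if sire is None and dam is None:
            # Base animal: inbreeding gamma_b / 2, relationships from the gammas.
            A[k][k] = 1 + gamma_within[breed] / 2
            for m in range(k):
                other = pedigree[m][3]
                if other is None:
                    raise ValueError("base animals must precede all other animals")
                if other == breed:
                    A[k][m] = A[m][k] = gamma_within[breed]
                else:
                    A[k][m] = A[m][k] = gamma_between[frozenset((breed, other))]
            continue
        if sire is None or dam is None:
            raise ValueError(f"{animal}: both parents are required in this sketch")
        s, d = index[sire], index[dam]
        if s >= k or d >= k:
            raise ValueError(f"{animal}: parents must be listed before offspring")
        # Usual recursion: covariance with earlier animals is the parental average.
        for m in range(k):
            A[k][m] = A[m][k] = (A[m][s] + A[m][d]) / 2
        A[k][k] = 1 + A[s][d] / 2
    return A


pedigree = [
    ("A1", None, None, "A"),
    ("A2", None, None, "A"),
    ("B1", None, None, "B"),
    ("AB1", "A1", "B1", None),
    ("AB2", "A2", "B1", None),
]
A = relationship_matrix(pedigree,
                        gamma_within={"A": Fraction(1, 10), "B": Fraction(1, 5)},
                        gamma_between={frozenset(("A", "B")): Fraction(1, 20)})
print(A[3][3], A[3][4])  # 41/40 13/40
```

Setting all $\gamma_b$ and $\gamma_{b,b'}$ to 0 turns the base block into an identity matrix, and the sketch then reduces to the ordinary tabular method.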
Note that the breed-segregation term disappears here, since the differences in the $\epsilon^A$ and $\epsilon^B$ terms are