Accounting for genetic differences among unknown parents in microevolutionary studies: how to include genetic groups in quantitative genetic animal models

Summary Quantifying and predicting microevolutionary responses to environmental change requires unbiased estimation of quantitative genetic parameters in wild populations. ‘Animal models’, which utilize pedigree data to separate genetic and environmental effects on phenotypes, provide powerful means to estimate key parameters and have revolutionized quantitative genetic analyses of wild populations. However, pedigrees collected in wild populations commonly contain many individuals with unknown parents. When unknown parents are non‐randomly associated with genetic values for focal traits, animal model parameter estimates can be severely biased. Yet, such bias has not previously been highlighted and statistical methods designed to minimize such biases have not been implemented in evolutionary ecology. We first illustrate how the occurrence of non‐random unknown parents in population pedigrees can substantially bias animal model predictions of breeding values and estimates of additive genetic variance, and create spurious temporal trends in predicted breeding values in the absence of local selection. We then introduce ‘genetic group’ methods, which were developed in agricultural science, and explain how these methods can minimize bias in quantitative genetic parameter estimates stemming from genetic heterogeneity among individuals with unknown parents. We summarize the conceptual foundations of genetic group animal models and provide extensive, step‐by‐step tutorials that demonstrate how to fit such models in a variety of software programs. Furthermore, we provide new functions in r that extend current software capabilities and provide a standardized approach across software programs to implement genetic group methods. Beyond simply alleviating bias, genetic group animal models can directly estimate new parameters pertaining to key biological processes. We discuss one such example, where genetic group methods potentially allow the microevolutionary consequences of local selection to be distinguished from effects of immigration and resulting gene flow. We highlight some remaining limitations of genetic group models and discuss opportunities for further development and application in evolutionary ecology. We suggest that genetic group methods should no longer be overlooked by evolutionary ecologists, but should become standard components of the toolkit for animal model analyses of wild population data sets.

definition unknown.
genetic group: A subset of the pedigree base population that can be genetically distinct from the rest of the base population. The genetic group effect is the mean deviation of total additive genetic effects in a particular genetic group. Conceptually, the genetic group effects to be estimated can be thought of as fixed effects.
identical-by-descent (IBD): Alleles that are direct descendants of the same allele in a common ancestor (Lynch & Walsh 1998, p.132).
implicit genetic groups: A method of analysis which directly models the total additive genetic effects of individuals (as opposed to modelling their genetic group effects separately from their breeding values), thereby incorporating the genetic group effects into the individual random effects. Estimates of the fixed genetic group effects are also provided by the model. This approach results in a statistical model that is equivalent to the model under the explicit genetic group approach.
infinitesimal model: A genetic model of quantitative traits that assumes phenotypes are determined by an infinite number of loci, each with an infinitesimally small effect on trait expression (Fisher 1918;Lynch & Walsh 1998, p.47;Mrode 2005, p.2). phantom parent: An unknown (i.e., unidentified or unobserved) parent of an individual. Analyses that make genetic inferences from pedigrees typically assume that phantom parents have one offspring and have the same genetic properties as the pedigree base population . random effects: Factors for which the levels are a sample from the entire population of values constituting the random variable (Searle 1997). For a given dataset with a sample of the observed dependent variable, random effects are the realized (but unobservable) values of the random variable. Predicting the realized values of the random effects and estimating the parameters describing the distribution of levels are the main aims in fitting mixed effect genetic models (Searle 1997;Bolker et al. 2009).
relatedness: A measure of the extent to which individuals share alleles identical-by-descent that is defined relative to the base population. Relatedness has many formal definitions (see Lynch & Walsh 1998, p.132), but is used here to refer to elements of the additive genetic relatedness matrix (which equal twice the coefficient of coancestry).
total additive genetic effects: The sum of the additive effects of an individual's genes on any focal phenotype. The expected value is the mean of the additive genetic effects that an individual receives from both parents.

Appendix S2: Collection of pedigrees from wild populations
To illustrate that substantial numbers of parents are commonly unknown in wild population pedigrees, we extracted standard pedigree statistics from published papers citing the R package pedantics (Morrissey & Wilson 2010). The focal statistics comprised the total number of individual animals in the pedigree (i.e., pedigree size), the percentages of individuals that had unknown dams and sires [calculated as: 100 * (pedigree size -the number of identified maternities or paternities) / pedigree size], and maximum pedigree depth (i.e., the maximum number of lineal ancestors of any individual). Only pedigrees from wild populations were retained; pedigrees from captive populations were excluded (e.g., zebra finch, Taeniopygia guttata, located in Seewiesen, Germany). Use in quantitative genetic analyses was not a requirement for inclusion of a pedigree in our summary. When data for both a full pedigree and a pruned pedigree were available in the same paper, we report the largest pedigree for which the required statistics were provided. Pedigree statistics from multiple papers using the same study population are included since reported pedigree structures often differed due to further data collection, paternity assignment, or different degrees of pruning to informative individuals for different analyses.   J., Reichert, S., Zahn, S., Hegelbach, J., Massemin, S., Keller, L.F., Postma, E. & Criscuolo, F. (2015) Mother-offspring and nest-mate resemblance but no heritability in early-life telomere length in white-throated dippers. Proceedings of the Royal Society B, 282, 20142924.

Appendix S3: Breeding value prediction
Breeding value reliability, or accuracy, will be affected by both missing phenotypic and pedigree information.
Missing pedigree information also implies that the phenotypic data for an individual's relatives are also incomplete. In particular, predicted breeding values may be inaccurate when an individual with unknown parents also has an unknown phenotype itself. To illustrate the difference between true and predicted breeding values in such a situation, table S3.1 portrays a simple pedigree and three scenarios in which different forms of missing information might affect breeding value predictions.

References: Appendix S3
Postma, E. (2006) Implications of the difference between true and predicted breeding values for the study of natural selection and micro-evolution. Journal of Evolutionary Biology, 19, 309-320.

Appendix S4: Simulating data with genetic group effects
It is often useful to be able to simulate datasets on which animal models can be tested for bias, imprecision, or low power. The rbv() function in the MCMCglmm package can assign breeding values to a pedigree, given a specified variance-covariance matrix and genetic group means. However, rbv is limited to the supplied pedigree and cannot simulate trends over time in breeding values. We therefore provide the R function simGG(), included in the nadiv package (≥v2. 14 Individuals in the base population of the "focal" population are assigned total additive genetic values with a mean of muf and variance VAf. Offspring total additive genetic values u are the mean of their parents' u values plus a Mendelian sampling deviation drawn from a normal distribution with mean of 0 and variance equal to 0.5V A (1 − f sd ) where V A is VAf and f sd is the mean of the parents' coefficient of inbreeding f. Each "immigrant" (individual with unknown parents in generations >1) is assigned a total additive genetic effect that is drawn from a normal distribution with mean of mui and variance equal to VAi. Residual deviations are sampled for "focal" and "immigrant" populations separately, using normal distributions with means of murf and muri, respectively, and variances of VRf and VRi, respectively. Phenotypes are the sum of total additive genetic effects and residual deviations plus an overall mean mup.
Trends in total additive genetic effects and/or residual deviations can be specified for both the "focal" and "immigrant" populations. Trends in total additive genetic effects occurring in the "immigrants", in the residual deviations occurring in the "focal" population, and in the residual deviations occurring in the "immigrants" are all produced by altering the mean each generation for the separate distribution from which each of these effects are drawn. The change in mean over a generation is specified in units of standard deviations of the respective distributions (e.g., square roots of VAi, VRf, and VRi) and is set with d_bvi, d_rf, or d_ri, respectively. Trends in total additive genetic effects for the "focal" population are produced by selecting individuals to be parents of the next generation according to their predicted total additive genetic effects. Individuals are assigned probabilities of being selected as a parent of the next generation depending on how closely a prediction of their total additive genetic effect matches an optimum value. The parameter d_bvf specifies how much the optimal total additive genetic effect changes per generation. d_bvf is multipled by the square root of VAf (i.e., the additive genetic standard deviation of the "focal" population) and the number of generations since selection began, and it is this quantity that is added to the base generation's mean total additive genetic effect (muf) to determine the optimal total additive genetic effect in a given generation. Individuals with predicted total additive genetic effects closest to this optimum have a higher probability of being randomly sampled to be parents of the next generation. This represents selection directly on predicted total additive genetic effects.
The function yields a data.frame with a row for every individual and columns: id, dam, and sire indicating individual identity, dam, and sire; parAvgU which is the individual's dam and sire average total additive genetic effect; mendel which is the Mendelian sampling deviate for an individual; u is the total additive genetic effect of each individual; r is the residual deviation; p is the individual's phenotype; is indicates 0 if the individual was born in the "focal" population or 1 if the individual is an "immigrant"; and gen specifies the generation in which each individual was born.
Details and examples of the function can be found in the help file associated with the simGG() function.
These can be accessed by running the command ?simGG in an R session.

Simulating a fixed difference between groups
The data in Figs 2 and 4 of the main text or the object ggTutorial of these tutorials, was simulated so that all additive genetic and residual variances are the same for both the founder population and immigrant More information on the ggTutorial object can be found in the help file for this dataset, which can be accessed by running the command ?ggTutorial in an R session.

Simulating a temporal trend in breeding values
Here is an example of simulating data where a positive trend in breeding values occurs in the immigrant population (e.g., response to directional natural selection in the immigrant population). Analysing the above dataset with 6 genetic groups (divide generations in which immigrants occur into 6 groups) provides unbiased estimates of the simulated genetic group effects and additive genetic variance.

Appendix S5: Development of genetic groups in agricultural science
Genetic group methods have a long history in agricultural science and were first developed to control for selection in sire models (e. g., Henderson 1973). In a sire model, only the sire breeding values are modelled (Lynch & Walsh 1998, p.758;Mrode 2005, p.52). Genetic group effects are modelled in a sire model by nesting the random effects for each sire within fixed effects of sire herd and/or year (i.e., the genetic group), because sires may have different histories of artificial selection in their ancestry or belong to different herds.
However, these models assume that all sires are unrelated and the genetic group effect applies to only the sire.
Methodological advancements made it possible to first incorporate relatedness among sires and subsequently to incorporate the relatedness among all individuals through A (i.e., A −1 in an animal model). A accounts for selection if the model includes records for all individuals on which selection decisions are based and individuals in the base population have the same expected breeding value (Pollak and Quaas 1983). However, it is not always possible to obtain records for selection decisions when parents originate from different populations, often with different expected mean breeding values (Famula, Pollak & Van Vleck 1983). Including a model factor that reflects differences in breeding values due to selection, however, accommodates both of these issues. Thus, group effects were re-defined for use in animal models (Pollak & Quaas 1983;Famula, Pollak & Van Vleck 1983).  re-defined group effects in the context of relatedness among all individuals (A) to reflect differences in breeding values among unknown parents. This allowed the animal model to further account for any selection that created differences in breeding values among unknown parents  Fitting a fixed regression on columns of Q, however, requires a post-analysis calculation to obtain the predicted total additive genetic effects of each individual. It is the total additive genetic effect that is of practical use in a selective breeding context. Therefore, Quaas & Pollak (1981) developed a method to modify the mixed model equations to directly obtain the total additive genetic effects of individuals as the solutions to the model equations in a sire model  showed how a similar transformation to that suggested by Quaas & Pollak (1981)  with the random effect of individual total additive genetic effects (Appendix S6 ).  formally demonstrated the equivalence between models that included genetic group effects by implementing either the fixed covariate regression or random effects specification of A * .
These basic genetic group methods have been extended to accommodate more complexities. First, genetic groups have been defined for maternal genetic effects (Van Vleck 1990). It is thus possible to allow the direct and maternal additive genetic effects to have different genetic groups (and hence different mean genetic effects) to reflect selection acting differently on these two components (Cantet et al. 1992). Secondly,  has expanded the framework to include genetic group effects, conceptually, as random effects. As random effects, genetic groups are assumed to have a normal distribution with expectation of zero and a covariance model. Typically, random genetic group effects are modelled with a covariance structure of Iσ 2 g , implying zero covariances among groups. More practically, the work by Schaeffer demonstrates how to remove potential dependencies among the group effects by simply making them random effects. This can be achieved by adding 1 to the diagonal values in the group coefficients partition of A * (see further discussion in Appendix S6.3.2 ). More recently, grouping methods have progressed such that individuals with missing parents do not have to have their phantom parents assigned to a single group. In fuzzy classification of genetic groups (Fikse 2009;Appendix S7 ), phantom parents are assigned to genetic groups with a probability of membership in each group. These classification methods are a simple extension to the methods for setting up Q and A * , but they have yet to be widely implemented. Finally, Misztal et al. (2013) have shown how to properly account for genetic group effects when fitting a combined genomic and pedigree derived relatedness matrix in the animal model.

Appendix S6: Construction and implementation of Q and A*
Fitting animal models with genetic groups first necessitates formulating the appropriate Q and/or A * matrices.
First, we generally describe how Q and A * are formed, with an emphasis on the code used to do this in the R package nadiv (>v2.14.0; Wolak 2012). This is followed by explanations of minor changes that can be made to A * to improve animal model convergence and how to deal with warnings from software regarding singularities and non-positive definite matrices. Next are sections with code that illustrate fitting genetic group animal models in the MCMCglmm and asreml packages in R, the ASReml standalone program, and finally the WOMBAT standalone program. In each of these four sections, code and examples are provided to fit genetic groups using the nadiv package in R to create the Q and A * matrices that can then be supplied to these animal model fitting programs (in some cases outside of the R environment), followed by examples using existing functionality within each software program (see Table S6.1 for a summary of these programs and their capabilities).
The tutorials for these software programs all use the simulated dataset (ggTutorial available in nadiv or to download in WOMBAT and ASReml standalone formats) depicted in Figs 2 and 4 of the main text and described below. In Appendix S4, the R code in the nadiv function simGG() that is used to simulate the ggTutorial dataset is explained. Details of package and software program versions are available at the end of the supporting information. matrices, respectively, from a supplied pedigree. We indicate whether fitting genetic groups explicitly (as fixed effects) or implicitly within the random effects structure (as either fixed or random effects) is recommended (+) or not (-) when using each software program.

MCMCglmm
ASReml The table above is essentially just a pedigree (first three columns indicate an individual's unique identifying code, dam, and sire). However, two rows have been added at the top of the pedigree for the unique genetic groups and extra columns are included for ease of using all possible pedigree formats accepted by the nadiv functions that implement genetic group methods.
In the dataset, "g1" and "g2" are two genetic groups, lower case letters are phantom parent identities, and upper case letters are observed individual identities. The genetic groups and phantom parents are assigned as in . Additional details regarding the Q1988 dataset in nadiv are in the R help file that can be viewed by running the command ?Q1988 in an R session.
Below, we will only deal with observed individuals in this dataset, such that the subset of Q1988 we will use is: Q1988sub <-Q1988[-c(3:7), c("id", "damGG", "sireGG")] Note, this is not the only format for including genetic group information in functions of the nadiv package. To see a full description of alternative formats see the function help files, particularly for the ggcontrib (to make Q) and makeAinv (to make A * ) functions.

Constructing Q
As described in the main text, the matrix Q contains the fractional contributions from each of r genetic groups to each individual's genome, calcluated from the pedigree (see also Mrode 2005, ch. 2).
Therefore, if there are n individuals in the pedigree (e.g., n=4 in Q1988sub), Q will be an n row by r column matrix. In Henderson's (1976) decomposition of the numerator relationship matrix (A=TDT'), the lower triangle matrix T reflects the contribution of each individual in the pedigree to every other individual. In other words, T traces the flow of genes from one generation to another. Therefore, if we add the r genetic groups to the top of the pedigree (originally of length n) as identities with unknown dam and sire (NA) and fill in genetic group labels as dam and sire identities of every individual with a missing parent (such as how Q1988sub is arranged) we can obtain the T matrix for a pedigree of length n+r. The first r columns of the T matrix contain the fractional contributions from each genetic group. Therefore, Q for the n individuals in a pedigree is the r+1 to r+n rows and the first r columns of T.
The nadiv function ggcontrib() constructs Q for a given pedigree with r potential genetic group contributions to each of the n individuals in the pedigree. For example, Q for Q1988sub is obtained: Each row in Q (each entry is a proportional contribution) sums to one and offspring inherit half of each genetic group contribution from each of their parents. Individuals A and C in Q1988sub each have a phantom dam from genetic group "g1" and phantom sire from genetic group "g2". These individuals correspond to the first and third rows of Q above. As expected, the two genetic groups (each column in Q) contribute equally to the genomes of A and C.
Individual B has observed dam A and a phantom sire from genetic group "g2". Implementing an animal model with genetic group effects fitted explicitly as separate fixed regressions requires the Q matrix. Because every row of Q sums to one, it is not possible to solve an animal model to obtain estimates for every genetic group effect and an overall model intercept. Thus, genetic group effects are not estimable themselves, but can only be estimated when expressed as deviations (or functions) from other group effects (Lynch & Walsh 1998, p.839-841). Therefore, a column/genetic group is chosen as the "reference group" and is assigned a genetic group effect of 0 (see practical implementation of this below). All other group effects will represent deviations from this reference group. For example, we might choose genetic group "g1" in the Q1988sub pedigree as our reference group. The estimate of the fixed regression slope from the second column of Q equals the "g2" genetic group mean expressed as a deviation from the reference group ("g1") mean of zero. If we had used a pedigree with more than two genetic groups, still setting the first group as the reference, then the genetic group estimates for each group are all deviations from the "g1" reference group mean of zero.
Q is also needed to calculate breeding values (a) from an animal model that has fitted fixed genetic group effects implicitly within the random effects syntax of the model software. This is because such a model will yield predictions of total additive genetic effects (u). From equation (6) in the main text, an individual's breeding value (a i ) equals the total additive genetic effect (u i ) minus a weighted sum of genetic group effects contributing to that individual. The weighted sum of genetic group contributions for all individuals can be calculated as the matrix product Qg, where g is a vector of all genetic group effects and Q is calculated above.

Constructing A*
A major advance leading to the widespread use of animal models comes from Henderson (1976) and subsequent researchers' (e.g., Quaas 1976Quaas , 1995Meuwissen and Luo 1992) work to directly construct the matrix inverse of the additive genetic relatedness matrix (A −1 ). Direct methods for obtaining A −1 are necessary, because A −1 and not A is used when solving the equations in the animal model. A −1 has a unique structure that can be formed by tracing the flow of alleles from each individual to its offspring. Thus methods to directly construct A −1 rely on the basic premise that the flow of alleles through a pedigree can be traced because each individual inherits, on average, half of its alleles from each parent with some sampling variation that can be modeled based on the Mendelian sampling variance in the population.
To estimate genetic group effects by implicitly fitting the genetic groups within the random effect syntax of the model software, requires an inverse matrix to model the covariance among individual total additive genetic values and group effects based on alleles shared identical by descent. Conveniently, the matrix inverse accounting for individual-individual, group-group, group-individual, and individual-group sharing of alleles can be constructed directly from the pedigree in a very similar manner to A −1 (see also Appendix S5 ; Westell, ). This matrix, A * (following notation of Quaas 1988), is a compilation of four sub-matrices that are each a mathematical function of Q, A −1 , or both ( Fig. 3e in main text).
The A * for any pedigree with genetic groups can be obtained using the makeAinv() function in the nadiv package. For example, A * for the Q1988sub pedigree is: ## 6 x 6 sparse Matrix of class "dgCMatrix" There are several points to highlight regarding A * . First, this matrix can be split into four blocks: (1) the upper-left 4-by-4 (nxn) block (individual-individual covariance structure), (2) the lower-left 2-by-4 (rxn) block (group-individual covariance structure), (3) the upper-right 4-by-2 (nxr) block (individual-group covariance structure), and (4) the lower-right 2-by-2 (rxr) block (group-group covariance structure). Note, the upper-left block of A * is the same as the A −1 for the Q1988 pedigree without implicitly including genetic group effects (see also Fig. 3e in main text). For example, A −1 for the Q1988 pedigree without genetic groups can be obtained and compared to A * above using the same nadiv function without invoking the genetic A * constructed in nadiv can be used in various animal model software programs (Table S6.1) to implicitly include genetic group effects within the random effects structure of the model. Although the details vary among software programs (see tutorials in the subsections to Appendix S6.4 for detailed instructions in each of the highlighted software programs), the general approach is to associate the inverse covariance matrix A * with the term in the model representing additive genetic effects; almost exactly the same way as the typical A −1 is associated with additive genetic effects in a basic animal model. However, some potential alterations to A * should be considered before and during the model fitting process.

Genetic groups on top or bottom of A*?
The entries in A * corresponding to genetic groups can be placed in the last r rows and columns (as above) or the first r rows and columns. When setting up A * , switching between these placements is trivial. The consequences for using A * , however, can be rather large. Solving an animal model requires solving a large set of equations (the Mixed Model Equations or MME). Often, some of these equations may not be unique or in other words two equations may be exactly the same, very similar, or have a linear dependency (such that knowing the solution to one equation means the solution to the other equation is also known). This is a cause of singularities in the MME coefficient matrix of an animal model.
The MME are solved by either moving from the first set of equations to the last set ('top-down') or from the last set to the first set ('bottom-up'). In either case, if two equations are nearly the same then the model will obtain a solution for the first of the pair, but then encounter a problem with the second. Some software packages might automatically deal with this second, non-unique equation. For example, ASReml solves the equations from the bottom up and will delete the second equation (equation on top) to continue to solve the MME (Butler et al. 2009, pp. 134-135;Gilmour et al. 2014, pp. 106-107).
Unfortunately, the algorithms to solve the MME are not this simple in practice. Even though ASReml states which way the equations are solved, some options allow the program to re-order the equations to simplify solving the model. This re-ordering is based on the contents of each equation without respect to its categorization in the model. In other words, one genetic group equation might be re-ordered so that it is placed next to an equation for a year random effect level while another genetic group equation in the same model may be placed elsewhere with a random effect of individual additive genetic value. Although one cannot be completely certain where the genetic group equations will end up when the software program optimizes the order of equations to most efficiently solve the model, the ability to easily switch where in A * the genetic groups are positioned has been a strategy that has sometimes worked well in our own personal experience. Switching the placement of the genetic groups from the top to the bottom (or vice versa) of A * has sometimes solved convergence issues.
However, the standalone version of ASReml has the !LAST job qualifier (see Table S6.1 and the ASReml tutorial in Appendix S6.4.4 ) to consistently re-order the MME such that the genetic groups equations are solved last. This option assumes that the genetic groups are the first r rows and columns of A * , where r is the number of levels following the !LAST qualifier, and therefore 'on top'. We are not aware of a similar argument that can be implemented for a model using asreml in R.
WOMBAT, also, has several MME ordering strategies that can be implemented. However, these re-order the entire set of equations and the order of certain levels of random effects cannot be specified (Meyer 2007, section 5.3.1). MCMCglmm re-orders the MME according to algorithms associated with the underlying CSparse functions (Davis 2006;Hadfield 2010) and also cannot designate certain levels of random effects to a certain location in the re-ordered equations. Therefore, it is not clear how specifying genetic groups at the top or bottom of A * will affect the convergence of any particular model.
The nadiv function makeAinv(), enables the user to place genetic groups at the top or bottom of A * , in an attempt to solve genetic group equations first or last. Therefore, if any singularities in the MME coefficient matrix occur then the user can attempt to re-order the genetic group equations in the hope that this will lead to model convergence. The gOnTop argument to the makeAinv() function switches between placing the genetic groups on top or on bottom of A * , with the default to place genetic groups at the bottom of A * [makeAinv (..., gOnTop = FALSE, ...)].

Removing singularities and other problems within A*
Fitting genetic group effects implicitly often causes singularities in the coefficient matrix of the MME, that can sometimes be overcome by slight changes to the strategy for grouping . Software programs may report this issue as a non-positive definite generalized inverse matrix (often abbreviated GIV, GIN, or ginverse), or alternatively stating that a GIV/GIN/ginverse is negative definite, non-positive semi-definite, or ill-conditioned. For example, ASReml often deals with these types of singularities according to its own strategies (Butler et al. 2009;Gilmour et al. 2014). If genetic groups are specified in an ASReml analysis and a singularity occurs the program will introduce a row to the matrix to overcome this (see Lagrangian multipliers in Gilmour et al. 2014, pp. 164-165). Regardless of which software program is used, however, adding values to some of the matrix elements may also remove singularities occurring within the MME in addition to the aforementioned strategy of ordering A * with genetic groups either on top or at the bottom.
Adding a small number to some diagonals can remove singularities (Schaeffer , 1999Oikawa and Yasuda 2009;Gilmour 2010). Schaeffer ( , 1999 advocates adding an identity matrix to the rxr group-group block of A * such that all of the diagonals (d ii ) become 1 + d ii . This addition to A * means that the genetic group effects are no longer, conceptually speaking, fixed effects but are random effects with expectations of zero and a variance describing their distribution (see Appendices S1 & S5 ). Further, predicted genetic group effects will be biased toward the expected value of zero when genetic groups are random effects, just like any other random effects, with the amount of bias depending on how much information the data provide to predict genetic group effects . This bias will extend to any functions involving the predicted genetic group effects, specifically the predicted total additive genetic effects (u). Although, in many cases the mixed model equations and subsequent variance component estimates change only slightly, thus basic interpretations of results will remain the same (but see cautionary note below; further discussion in Schaeffer 1994). The strategy of adding an identity matrix is desirable if there are many individuals assigned to each group, because it does not re-rank the predicted breeding values and can improve convergence . The main change to the model when the leading diagonal elements of the group-group portion of A * are altered is a (co)variance matrix for traits within groups is added to the model equations dealing with the genetic group effects. In the simple case of a single trait, this variance is assumed to be the same across group effects with no covariances between groups (i.e., groups effects are normally distributed with a variance of Iσ 2 g ). In practice this variance is often assumed to equal the additive genetic variance for the individual total additive genetic effects (Sullivan 1999, see also WOMBAT's genetic groups in Appendix S6.4.5.4 ).
A further interpretation of this assumption means that a model's estimate of additive genetic variance will also include the magnitudes of the genetic group effects. This may cause problems if the variance in the genetic group effects themselves is much greater than the true additive genetic variance. In such a case, adding a smaller value than 1 to the diagonals (d ii ) of A * structures the model to reflect this difference.
Using values other than 1 for the addition to the diagonal elements effectively means that the ratio of the variance among genetic group effects to the additive genetic variance is no longer constrained to be 1:1. Thus, when modelling genetic groups as random effects we strongly caution users to carefully consider the sensitivity of the additive genetic variance estimate to the choice of value added to the group-group diagonal elements of A * . Multiple models fitted with different values added to the diagonal elements are recommended to ensure an unbiased estimate of the additive genetic variance.  provides a detailed discussion and interpretation of different strategies for altering A * .
Note that WOMBAT fits a separate variance component to the genetic group effects (see Appendix S6.4.5.4 ), thus offering a direct way to check that the variance in genetic group effects can be assumed to be the same as the additive genetic variance. ASReml provides the !GOFFSET pedigree file qualifier to add a value to the group-group diagonal elements when using ASReml's !GROUPS pedigree qualifier (see Appendix S6.4.4.4 ). Otherwise, this alteration of A * is easily accomplished manually after creating A * with the makeAinv() function in the nadiv package. Altering the A * for the Q1988sub pedigree above (with genetic groups on the bottom) would proceed as follows and result in the matrix displayed below: n <-4 r <-2 (ggIdentity <bdiag(Diagonal(n = n, x = 0), Diagonal(n = r, x = 1))) ## 6 x 6 sparse Matrix of class "dtCMatrix"

library(nadiv)
The first three columns of ggTutorial contain the complete pedigree and all 6000 individuals have a phenotypic record in p.

head(ggTutorial)
## id dam sire parAvgU mendel u r p is gen

The rest of the columns contain:
• parAvgU is the average u (total additive genetic effect) of each individual's parents • mendel is the Mendelian sampling deviation from the mid-parent value that is unique to each individual • u is the total additive genetic effect of each individual's genotype • r is the residual deviation of each individual • p is the phenotype for each individual (population expected mean is 20) • is indicates the immigration status, where focal population residents have is == 0 and immigrants have is == 1 • gen is the generation in which each individual was born The supplementary files for use in ASReml and WOMBAT contain a subset of these variables plus three derived variables. The ASReml and WOMBAT data files contain continuous variables for the coefficient of inbreeding (f ) and two genetic group genomic contributions (from Q). For demonstration purposes, these are added to the basic ggTutorial below within each section. Note the WOMBAT pedigree and data files have had 10000 added to the individual identity codes to comply with WOMBAT data structure requirements.
It is necessary to fit a fixed regression on the coefficient of inbreeding (f ) to account for inbreeding depression and minimize bias in estimates of quantitative genetic parameters (Kennedy, Schaeffer & Sorensen 1988;Reid & Keller 2010;Wolak & Keller 2014). This is not directly part of the process to fit genetic groups in the animal model, but it is considered necessary when fitting animal models to population data in which inbreeding occurs.

MCMCglmm
Modelling genetic groups either explicitly as separate fixed regressions or implicitly within the random effects syntax of MCMCglmm requires either Q or A * to be created using the nadiv package. Below we demonstrate how to create these matrices and fit each in a MCMCglmm model.

library(MCMCglmm) library(nadiv)
To complete the basic preparations of the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (f ; to minimize bias from inbreeding depression, see the last paragraph of Appendix 6.4.1 above). This calculation can be done using the inverseA() function of MCMCglmm. Consequently, the A −1 matrix is also created, which we will assign as its own object in R, because it will be needed in the model fitting genetic group effects explicitly as fixed covariate regressions with Q (see Appendix 6.4.2.2 ).
MCMCglmm facilitates modelling of non-Gaussian response variables, which has greatly extended the application of animal models in evolutionary ecology to a range of non-Gaussian phenotypes. Fitting genetic groups is no different when modelling non-Gaussian response variables. In such models, the genetic group effects (fitted either explicitly as separate fixed regressions or implicitly within the random effects) are estimated on the underlying latent scale.

Preparing a pedigree with genetic groups
Here, we provide general instructions to prepare pedigrees. Calculating the matrix of genetic group contributions (Q) and the augmented inverse relatedness matrix (A * ) requires a pedigree that indicates groups instead of missing values for parents. In the ggTutorial data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (foc0). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (g1). Alternatively, this second group could be further divided into genetic groups based on generation. However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which the immigrant was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (ggPed).
ggPed <-ggTutorial[, c("id", "dam", "sire", "is", "gen")] naPar <which(is.na(ggPed[, 2])) ggPed$GG <rep("NA", nrow(ggPed)) # focal genetic group = "foc0" and immigrant = "g1" # obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively Note, the approach and format of the pedigree above is different from the Q1988sub pedigree earliermostly because the format here makes it easier to specify genetic groups in a pedigree for this particular case, but partly to illustrate the flexibility of nadiv functions. To be more specific, the Q1988sub pedigree contained two extra rows for the two genetic groups. The ggroups argument to makeAinv() supplied an integer indicating how many rows at the begining of the pedigree contained genetic groups and not individuals.
However, in the ggPed above no extra rows are added to the pedigree, but in the next few sections the character vector given to the ggroups argument in both ggcontrib() and makeAinv() specifies the name of the unique genetic groups that have been filled in for individuals instead of missing dam and sire identities.

Fixed explicit genetic group effects with Q (from nadiv) Fitting genetic group effects
explicitly requires the columns of Q to be included as separate fixed covariate regressions. The code below creates Q for the ggPed pedigree and adds the columns (foc0 and g1) as variables in ggTutorial so that they can be included in a model. An important point to make is that, because the pedigree and data for ggTutorial are in the same dataframe, the genetic group coefficients from Q can be added directly to the data with ggTutorial <cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or Astar). However, the model returns an error that G-structure 1 is ill-conditioned. In other words, A * has at least one eigenvalue that is either negative or zero. Re-defining the groupings can eliminate such a singularity. However, removing the singularity in the matrix while maintaining the current groupings is accomplished by adding a small value to the diagonal elements of the group-group portion of the matrix. Note that such an addition changes the interpretation of the effects in the model, as discussed above (Appendix 6.3.2 as well as Appendix S5 ), now implying that the genetic group effects are random effects in the model.
Below, we add 0.1 to the diagonal element of the A * which will constrain the variance in genetic group effects to be one tenth the additive genetic variance estimated by the model. A value of 0.1 may bias estimates of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.

AstarAdd <-Astar
(ggrows <match(c("foc0", "g1"), dimnames(AstarAdd The AstarAdd sparse inverse matrix can now be associated with the individual identities (id) by supplying it as an argument to ginverse in the MCMCglmm model. We specify the same default prior distribution for the fixed effects below (i.e., the same ones as we wrote out in Appendix 6. 4.2.2 ). The prior distribution for the id term in the model is the prior used for the additive genetic variance and, in this case, also the same prior for the variance of the genetic group effects. It is unclear how the prior specified for the additive genetic variance will affect the estimate of the genetic group effects or even what prior is to be used in this context. As usual with Bayesian analyses, we recommend priors that are framed within the context of the data at hand, the model being fitted, and any a priori belief in the model parameters. Further we suggest testing the sensitivity of any particular prior used for a given model and especially when genetic group effects are treated as random effects by the model.
The model is written out below, but a saved version is available on Dryad (Wolak & Reid 2016) as supporting information file "./MCMCglmm/ggAstarRandMC.RData" so that it is not necessary to re-run this model to examine the results.

ASReml in R
To model genetic group effects explicitly as separate fixed regressions in an animal model using the R package asreml requires Q to be created using the nadiv package and then included in the asreml() model function.
Including genetic group effects implicitly within the random effects of an animal model in asreml() can be done either using A * constructed with nadiv or with asreml::asreml.Ainverse(). All three of these approaches will be demonstrated below using the simulated dataset ggTutorial:

library(asreml) library(nadiv)
To complete the basic preparations of the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (f ; to minimize bias from inbreeding depression, see the last paragraph of Appendix 6.4.1 ). This calculation can be done using the asreml.Ainverse() function of asreml. Consequently, the A −1 matrix is also created, which we will assign as its own R object, because it will be needed in the model fitting genetic group effects explicitly as separate fixed covariate regressions with Q below (Appendix 6.4.3.2 ).

Preparing a pedigree with genetic groups
Calculating the matrix of genetic group contributions (Q) and the augmented inverse relatedness matrix (A * ) requires a pedigree that has genetic groups indicated instead of missing values for parents. In the ggTutorial data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (foc0). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (g1). Alternatively, this second group could be further divided into genetic groups based on generation.
However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which it was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (ggPed).
ggPed <-ggTutorial[, c("id", "dam", "sire", "is", "gen")]  Note that the nadiv function makeAinv() used to construct A * is more flexible than asreml's asreml.Ainverse() in the way pedigrees containing genetic groups can be formatted (for more detail on makeAinv() formats, see the help file by running the command ?makeAinv in an R session -particularly the examples at the end of the file -or the genetic group pedigree used above in the MCMCglmm tutorial). To be more specific, the format required by asreml.Ainverse() is similar to the Q1988sub pedigree used earlier in the tutorial (Appendix 6.1 ) which contains two extra rows for the two genetic groups.

Fixed explicit genetic group effects with Q (from nadiv)
Fitting genetic group effects explicitly in asreml requires the columns of Q to be included as separate fixed covariate regressions. The code below creates Q for the ggPed pedigree and adds the columns (foc0 and g1) as variables in ggTutorial so that they can be included in a model. An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
The fixed regressions on foc0 and g1 can be combined with the standard A −1 to specify an animal model in asreml(). Because unique solutions to the model do not exist (see discussion of estimability in Appendix S6.2 ) for the intercept, coefficient of inbreeding (f ), and genetic group effects (foc0 and g1), we fit the foc0 genetic group as the last fixed effect. This ensures that the model sets foc0 as a reference group and estimates the g0 genetic group effect as a deviation from the mean breeding value in the foc0 group. To re-scale the variances of the fixed effects back to the phenotypic scale and report them as standard errors: The above estimate of 3.117 agrees with the expected difference between mean breeding value in the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted breeding values (a) can be accessed from ggRegASR$coefficients$random. Because the order of rows in Q matches the order of individual identities in ggTutorial, the total additive genetic effects (u) can be calculated without any further manipulation as (eqn 5 in main text): reg_gHat <matrix(ggRegASR$coefficients$fixed[1:2], ncol = 1) reg_u <-Q %*% reg_gHat + ggRegASR$coefficients$random Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the breeding values (a). The R function match() is particularly helpful in these cases.

Fixed implicit genetic group effects with A* (from nadiv or asreml) Fitting fixed effects
of genetic groups implicity within the random effects of an animal model requires the augmented inverse relatendess matrix (A * ) to be supplied as a list in the ginverse argument. First, we create the sparse matrix of A * for the ggPed pedigree using the makeAinv function in the nadiv package and then demonstrate how to obtain this same matrix with the asreml package.

listAstar <-makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$listAinv
In the above code, we directed makeAinv() to include the genetic groups on bottom (bottom-right block) of A * using gOnTop = FALSE. This was done to improve model convergence (see Appendix 6.3 and its subsections). In this case, the asreml model converges. In other pedigrees, models, and/or genetic group classifications, if singularities occur then sometimes this can be overcome by adding one (or a small value) to the diagonal elements of the group-group portion of the matrix (although see discussion of this in Appendix S5 and cautions/considerations in Appendix 6. 3.2 ). The alteration is demonstrated below along with cautions and interpretations (Appendix 6.4.3.4 ). Note, the easiest approach is to adjust the genetic group diagonals of A * using the matrix returned by makeAinv() (i.e., object Ainv in the list), then convert the matrix to the list format required by asreml.
Alternatively, we can use asreml's function asreml.Ainverse() to obtain a list format of A * .

CAUTION: under the current and previous versions of asreml (current version at the end of Supporting
Information), quite often asreml.Ainverse() simply does not work. It will return an object with a ginv entry that is a data.frame, but with zero rows. Sometimes, re-running the above command will workotherwise use nadiv's makeAinv() (Appendix 6. 4.3 )! Note that in the above code, asreml.Ainverse() includes the genetic groups on top (top-left block) of A * . Often models fitting this structure of A * have trouble converging due to this matrix not being positive-definite (see Appendix 6.3 and its subsections). One strategy to remedy this is to place the genetic groups on the bottom of A * . Since there is no easy option to specify this in asreml.Ainverse() we recommend using nadiv's function makeAinv() as demonstrated above.
Regardless of which package was used to create listAstar, this sparse inverse matrix in list format has two extra properties ('attributes') that are critical for being able to use it in an asreml model. First (and not specific to just genetic group models), all asreml ginverse objects must have the "rowNames" attribute to map row/column numbers back to the factor levels in the model variable. Second, specific to models fitting fixed genetic groups implicitly within the random effects the listAstar object must have the "geneticGroups" attribute where the first number indicates the number of unique genetic groups. This is essential for asreml ## -attr(*, "rowNames")= chr "1" "2" "3" "4" ...

## -attr(*, "geneticGroups")= num 2 0
Both of these attributes are discussed in the help file to makeAinv() and can be accessed by typing ?makeAinv in R. Now the object listAstar can be associated with the individual identities (id) in the model by supplying the list to the ginverse argument in the asreml() function. # find the location of the genetic groups (ggrows <match(c("foc0", "g1"), attr(listAstar, "rowNames"))) Note the software has set the intercept estimate to zero in this model (see discussion of estimability in Appendix S6.2 ), implying this is the reference to which the foc0 and g1 genetic group effects are the estimated deviations. Therefore, the predicted total additive genetic effects (u) include the overall phenotypic mean. However, the difference between the genetic group effects 3.117 agrees with the expected difference between the mean breeding values of the immigrant group (3) and the founder population (0) that was specified in the simulation.

The predicted total additive genetic effects (u) can be accessed from ggAstarASR$coefficients$random[-ggrows].
Because the order of rows in Q matches the order of individual identities in ggTutorial, the breeding values (a) can be calculated without any further manipulation as (eqn 6 in main text): Astar_gHat <matrix(ggAstarASR$coefficients$random[ggrows], ncol = 1)

Astar_a <-ggAstarASR$coefficients$random[-ggrows] -Q %*% Astar_gHat
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (u) and used to calculate breeding values (a).
The R function match() is particularly helpful in these cases.

Random implicit genetic group effects with A* (from nadiv)
Fitting random effects of genetic groups implicity within the random effects of an animal model requires the augmented inverse relatendess matrix (A * ) to be supplied as a list in the ginverse argument. First, we create the sparse matrix of A * for the ggPed pedigree using the makeAinv function in the nadiv package, then make the necessary alterations (see discussion of this in Appendix S5 and cautions/considerations in Appendix 6. 3.2 ) so the model interprets the genetic groups as random effects, convert the sparse matrix to a list, and fit the model.

Astar <-makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$Ainv
In the above code, we directed makeAinv() to include the genetic groups on bottom (bottom-right block) of A * using gOnTop = FALSE. This was done to improve model convergence (see Appendix 6.3 and its subsections).
Next we add a small value to the diagonal elements of the group-group portion of the matrix taking the same approach as in the MCMCglmm tutorial above (Appendix 6.4.2.3 ). As a result of this alteration, the genetic group effects are conceptually random effects in the model. Below, we add 0.1 to the diagonal element of the A * which will constrain the variance in genetic group effects to be one tenth the additive genetic variance estimated by the model. A value of 0.1 may bias estimates of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.
The easiest approach is to adjust the genetic group diagonals of A * , and this is why we stored the matrix returned by makeAinv() above (i.e., object Ainv in the list). Once the additions are made, we can then convert the matrix to the list format required by asreml.

The object listAstarAdd can be associated with the individual identities (id) in the model by supplying
the list to the ginverse argument in the asreml() function. # find the location of the genetic groups (ggrows <match(c("foc0", "g1"), attr(listAstarAdd, "rowNames"))) Here, the prediced total additive genetic effects include the genetic group effects expressed as deviations from 0. Because the order of rows in Q matches the order of individual identities in ggTutorial, the breeding values (a) can be calculated without any further manipulation as (eqn 6 in main text): AstarRan_gHat <matrix(ggAstarRanASR$coefficients$random[ggrows], ncol = 1)

AstarRan_a <-ggAstarRanASR$coefficients$random[-ggrows] -Q %*% AstarRan_gHat
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (u) and used to calculate breeding values (a).
The R function match() is particularly helpful in these cases.

ASReml Standalone
Here we cover four approaches to fit genetic groups in an ASReml analysis. The first two use nadiv functions in R to create Q and A * , which are each separately included in ASReml models. The third approach uses ASReml arguments and qualifiers to rely entirely upon the ASReml capabilities for fitting genetic groups. Whereas the first three approaches are used to fit genetic groups as fixed effects, the final approach demonstrates how to fit random effects of genetic groups.
The preparation of the ggTutorial data and requisite outputs from nadiv in R will first be demonstrated before detailing the code used in the ASReml model fitting. However, the basic files necessary to fit the ASReml models without any preparation in R are available in the accompanying supporting files on Dryad (Wolak & Reid 2016). The content of these files is summarized as: • ggTutorial.dat contains the ggTutorial data. This file contains space separated columns from a subset of columns in the original data.frame in the following order: id, p, is, gen, f, foc0, and g1.
This is needed for all models fitted.
• reg.ped contains the pedigree and is analogous to the first three columns of ggTutorial. This is needed to fit a model with genetic group effects explicitly as separate fixed regressions in the model.
• nadivAstar.giv and nadivAstar.txt are needed to fit genetic group effects implicitly within the random effects of the model using nadiv's makeAinv() in R to create A * (contained in nadivAstar.giv).
nadivAstar.txt provides the row names for nadivAstar.giv and is needed to associate levels in nadivAstar.giv to identities in the data ggTutorial.dat. Further explanation is provided below where these files are created.
• asremlGG.ped contains the genetic group pedigree. It is the same as reg.ped except two rows are added for each of the two genetic groups and individuals do not have missing values, but genetic groups, specified for missing dams and sires. This is needed to fit genetic group effects implicitly within the random effects using ASReml's own capabilities.
• ggReg.as, ggNadivAstarFxd.as, ggAsremlAstarFxd.as, and ggAsremlRan.as are included within the folders with similar names. Each specifies a different model to run.
To prepare the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (f ; to minimize bias from inbreeding depression, see the last paragraph of Appendix 6.4.1 ). This calculation can be done using the makeAinv() function in the nadiv package.

Preparing a pedigree with genetic groups
Calculating the matrix of genetic group contributions (Q) and the augmented inverse relatedness matrix (A * ) requires a pedigree that has genetic groups indicated instead of missing values for parents. In the ggTutorial data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (foc0). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (g1). Alternatively, this second group could be further divided into genetic groups based on generation.
However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which it was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (ggPed).
ggPed <-ggTutorial[, c("id", "dam", "sire", "is", "gen")] naPar <which(is.na(ggPed[, 2])) ggPed$GG <rep("NA", nrow(ggPed)) # focal genetic group = "foc0" and immigrant = "g1" # obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively  Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example. Now we add the columns of Q (foc0 and g1) as variables in ggTutorial so that they can be included in a model. An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Now we need to write the pedigrees and portion of the data to files in formats that ASReml can use.
Note, ASReml expects pedigree columns ordered ID, Sire, Dam and we will use '0' to denote missing values (instead of NA The fixed regressions on foc0 and g1 can be combined with the standard A −1 to specify an animal model in ASReml (see also the "./asreml/ggReg" folder in the supporting files). Because unique solutions to the model do not exist (see discussion of estimability in Appendix S6.2 ) for the intercept, coefficient of inbreeding (f ), and genetic group effects (foc0 and g1), we fit the foc0 genetic group as the last fixed effect.
This ensures that the model sets foc0 as a reference group and estimates the g0 genetic group effect as a deviation from the mean breeding value in the foc0 group.  Note the software has set the foc0 group estimate to zero in this model, implying this is the reference group and the g1 genetic group effect is the estimated deviation from this reference. The above estimate of 3.117 agrees with the expected difference between mean breeding value in the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted breeding values (a) are also reported in the "ggReg.sln" file (see now R object regPred) and follow below the fixed effect estimates. Because the order of rows in Q matches the order of individual identities in ggTutorial, the total additive genetic effects (u) can be calculated without any further manipulation as (eqn 5 in main text): reg_gHat <matrix(regPred[1:2, "Effect"], ncol = 1) reg_u <-Q %*% reg_gHat + regPred [-c(1:4), "Effect"] Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the breeding values (a). The R function match() is particularly helpful in these cases.

Fixed implicit genetic group effects with A* (from nadiv) Genetic group effects can be fit
implicitly within the random effects when the augmented inverse relatedness matrix A * has been supplied as a list to ASReml. The code below creates A * for the ggPed pedigree and saves it as a list in the format ASReml requires.
First, we will calculate Q and include it in the dataset for consistency with other versions of this data. Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example.
Add the columns of Q (foc0 and g1) as variables in ggTutorial.
An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Now we create the sparse matrix of A * for the ggPed pedigree using the makeAinv function in the nadiv package.
listAstar <-makeAinv(ggPed, ggroups = 2, gOnTop = TRUE)$listAinv In the above code, we directed makeAinv() to include the genetic groups on top (top-left block) of A * using gOnTop = TRUE. This is necessary so that ASReml solves the equations in an order that improves model convergence (see Appendix 6.3 and its subsections). In this case, the ASReml model converges. In other pedigrees, models, and/or genetic group classifications, if singularities occur then sometimes this can be overcome by adding one (or a small value) to the diagonal elements of the group-group portion of the matrix (although see discussion of this in Appendix S5 and cautions/considerations in Appendix 6. 3.2 ). The alteration is demonstrated above in the asreml tutorial along with cautions and interpretations (Appendix 6.4.3.4 ). The resulting list could then be given to the standalone ASReml program much the same as we do here (see also Appendix 6.4.4.5 ). Note, the easiest approach is to adjust the genetic group diagonals of A * using the matrix returned by makeAinv() (i.e., object Ainv in the list), then convert the matrix to the list format required by ASReml.
Now we need to write the data (ggTutorial) and list of the A * (listAstar) to a file for ASReml.
Further, we need to write to a file all of the identity codes in the pedigree that match to the rows and columns in listAstar. This enables ASReml to match levels in the generalized inverse (row and column numbers of listAstar) to observations in ggTutorial. Critically, we need to include the "geneticGroups" attribute of Here, genetic group effects are fitted in the model using nadiv's generalized inverse by including giv1(id) in the random effects syntax statement. The line below the data file, but before the model statement, (!LAST id 2) tells ASReml to fit the first two equations associated with the id variable (the genetic groups) last. This is done to help the model to converge (see Appendix 6.3.1 ).
The genetic group predictions and standard errors are reported in the "ggNadivAstarFxd.sln" file.
These can be read back into R: nadivAstarFxdPred <read. Note the software has set the intercept (mu) estimate to zero in this model (see discussion of estimability in Appendix S6.2 ), implying this is the reference to which the foc0 and g1 genetic group effects are the estimated deviations. Therefore, the predicted total additive genetic effects (u) include the overall phenotypic mean. However, the difference between the genetic group effects 3.11 agrees with the expected difference between the mean breeding values of the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted total additive genetic effects (u) are also reported in the "ggNadivAstarFxd.sln" file (see now R object nadivAstarFxdPred) and follow below the genetic group predictions. Because the order of rows in Q matches the order of individual identities in ggTutorial, the predicted breeding values (a) can be calculated without any further manipulation as (eqn 6 in main text): AstarFxd_gHat <matrix(nadivAstarFxdPred[3:4, "Effect"], ncol = 1) AstarFxd_a <-nadivAstarFxdPred[-c(1:4), "Effect"] -Q %*% AstarFxd_gHat Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (u) and used to calculate breeding values (a).
The R function match() is particularly helpful in these cases.

Fixed implicit genetic group effects with A* (from ASReml) ASReml can fit genetic
group effects implicitly within the random effects using its own function to construct A * . First, we construct the data and pedigree files in R.
Calculate Q and include it in the dataset for consistency with other versions of this data. Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example.
Add the columns of Q (foc0 and g1) as variables in ggTutorial.
An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Write the pedigree containing genetic groups and the relevant portion of the data to files in formats that ASReml can use. Note, ASReml expects pedigree columns ordered ID, Sire, Dam. Here, genetic groups are specified as the top two lines of the pedigree file asremlGG.ped using !GROUPS 2 as an additional qualifier to the pedigree file line. ASReml then creates A * from the pedigree and saves this as a list in the file "ggAsremlAstarFxd_A.giv" (when the !GIV qualifier is included in the pedigree line).
The line below the data file, but before the model statement, (!LAST id 2) tells ASReml to fit the first two equations associated with the id variable (the genetic groups) last. This is done to help the model to converge (see Appendix 6.3.1 ).
The genetic group predictions and standard errors are reported in the "ggAsremlAstarFxd.sln" file.
These can be read back into R: asremlAstarFxdPred <read. Note the software has set the intercept (mu) estimate to zero in this model (see discussion of estimability in Appendix S6.2 ), implying this is the reference to which the foc0 and g1 genetic group effects are the estimated deviations. Therefore, the predicted total additive genetic effects (u) include the overall phenotypic mean. However, the difference between the genetic group effects 3.11 agrees with the expected difference between the mean breeding values of the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted total additive genetic effects (u) are also reported in the "ggAsremlAstarFxd.sln" file (see now R object asremlAstarFxdPred) and follow below the genetic group predictions. Because the order of rows in Q matches the order of individual identities in ggTutorial, the predicted breeding values (a) can be calculated without any further manipulation as (eqn 6 in main text): AstarFxd_gHat <matrix(asremlAstarFxdPred[3:4, "Effect"], ncol = 1) AstarFxd_a <-asremlAstarFxdPred[-c(1:4), "Effect"] -Q %*% AstarFxd_gHat Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (u) and used to calculate breeding values (a).
The R function match() is particularly helpful in these cases.

Random implicit genetic group effects with A* (from ASReml)
Fitting random effects of genetic groups implicity within the random effects of an animal model requires a modified augmented inverse relatendess matrix (A * ) to be created. This can be accomplished in R with nadiv such that the object is then passed to ASReml or the built-in functions of ASReml can do this while fitting the model. Creating the modified matrix in R using nadiv has been demonstrated in Appendix 6.4.3.4 and including such a matrix in an ASReml analysis is demonstrated in Appendix 6.4.4.3. Here, we demonstrate how to use ASReml to create a modified A * for the ggPed pedigree that has the necessary alterations (see discussion of this in Appendix S5 and cautions/considerations in Appendix 6. 3.2 ) so the model interprets the genetic groups as random effects and fit the model all in a single execution of ASReml.
First, we construct the data and pedigree files in R. Calculate Q and include it in the dataset for consistency with other versions of this data. Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example.
Add the columns of Q (foc0 and g1) as variables in ggTutorial. An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Write the pedigree containing genetic groups and the relevant portion of the data to files in formats The model is specified in ASReml (see also the "./asreml/ggAsremlRan" folder in the supporting files): ASReml s Astar for random genetic groups implicitly within the random effects of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.
ASReml then creates A * from the pedigree and saves this as a list in the file "ggAsremlRan_A.giv" (when the !GIV qualifier is included in the pedigree line). The line below the data file, but before the model statement, (!LAST id 2) tells ASReml to fit the first two equations associated with the id variable (the genetic groups) last. This is done to help the model to converge (see Appendix 6.3.1 ).
The genetic group predictions and standard errors are reported in the "ggAsremlRan.sln" file. These can be read back into R: asremlRanPred <read. agrees with the expected difference between the mean breeding values of the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted total additive genetic effects (u) are also reported in the "ggAsremlRan.sln" file (see now R object asremlRanPred) and follow below the genetic group predictions. Here, the prediced total additive genetic effects include the genetic group effects expressed as deviations from 0. Because the order of rows in Q matches the order of individual identities in ggTutorial, the predicted breeding values (a) can be calculated without any further manipulation as (eqn 6 in main text): AstarRan_gHat <matrix(asremlRanPred[3:4, "Effect"], ncol = 1) AstarRan_a <-asremlRanPred [-c(1:4), "Effect"] -Q %*% AstarRan_gHat Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (u) and used to calculate breeding values (a).
The R function match() is particularly helpful in these cases.

WOMBAT
Here are three approaches to fit genetic groups in a WOMBAT analysis. All three require some preparation using the nadiv functions in R. However, only the explicit regression on the genetic group covariate includes genetic groups as fixed effects. The second and third examples below include genetic groups as random effects in the model. It should be noted that WOMBAT sets up the mixed model slightly differently than the previous software programs. Notably, WOMBAT fits response variables and covariates as deviations from the mean response and mean of each covariate, respectively. Thus, regression coefficients are equivalent, but have slightly different estimates/interpretations from the estimates in the above tutorials. The preparation of the ggTutorial data and requisite outputs from nadiv in R will first be demonstrated before detailing the code used in the WOMBAT model fitting. However, the basic files necessary to fit the WOMBAT models without any preparation in R are available in the accompanying supporting files on Dryad (Wolak & Reid 2016). The content of these files is: • ggTutorial.d contains the re-coded ggTutorial data. A value of 10000 has been added to the id to allow unique integer codes for the genetic groups (WOMBAT requires integer identities). This file contains space separated columns from a subset of the columns in the original data.frame in the following order: id, p, is, gen, f, foc0, and g1. This is needed for all three types of models fitted.
• wombatPed.d contains the pedigree and is analogous to the first three columns of ggTutorial • ggReg.par, ggNadivAstarRan.par, and ggWombatRan.par are included within each folder and each specifies a separate model to run.
• id.gin contains an inverse covariance matrix, in list format. This is either the nadiv created A * or A −1 , depending on the analysis.
• id.codes contains the mapping of running integers to the particular name of individuals in ggTutorial.d. For WOMBAT's implementation of a genetic group analysis, this file must contain an extra column (dummy column in these analyses) that contains an integer code specifying if a genotype is available, as well as extra columns with the Q matrix (which must be constructed within nadiv as there is no function to do this in WOMBAT).
To prepare the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (f ; to minimize bias from inbreeding depression, see the last paragraph of Appendix 6.4.1 ). This calculation can be done using the makeAinv() function in the nadiv package.

Preparing a pedigree with genetic groups
Calculating the matrix of genetic group contributions (Q) and the augmented inverse relatedness matrix (A * ) requires a pedigree that has genetic groups indicated instead of missing values for parents. In the ggTutorial data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (foc0). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (g1). Alternatively, this second group could be further divided into genetic groups based on generation.
However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which it was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more than 2 groups!).
Create an object that will be the pedigree with genetic groups (ggPed).

Fixed explicit genetic group effects with Q (from nadiv) Fitting genetic group effects
explicitly requires the columns of Q to be included as separate fixed covariate regressions. The code below creates Q for the ggPed pedigree and adds the columns (foc0 and g1) as variables in ggTutorial so that they can be included in a model.

str(ggTutorial)
## data.frame : 6000 obs. of 13 variables: ## $ id : Factor w/ 6000 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...  An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures, the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.

##
Now we need to write the pedigree and portion of the data to files in formats that WOMBAT can use.
Also, for consistency with analyses below, we will add a constant value of 10000 to the integer ids.
IDaddDF Estimates of the genetic group effects are reported in the "./wombat/ggReg/FixSolutions.out" file. Note the software has set the foc0 group estimate to zero in this model, implying this is the reference group and the g1 genetic group effect is the estimated deviation. The estimate of 3.098 agrees with the expected difference between mean breeding value in the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted breeding values (a) are also reported in the "RnSoln_id.dat" file.

Random implicit genetic group effects with A* (from nadiv) Fitting genetic group effects
implicitly within the random effects requires the list format of a modified A * to be supplied to WOMBAT.
Currently, there is no option to incorporate the number of genetic groups fitted implicitly when the software calculates the residual degrees of freedom. Residual degrees of freedom that incorporate the number of genetic groups are needed to calculate the residual variance and the log-likelihood for a model in which genetic groups are considered fixed effects (e.g., see how the residual degrees of freedom are adjusted in an asreml analysis in Appendix 6.4.3.4 ). However, the number of genetic groups are not considered in these same calculations when genetic groups are considered random effects. Therefore, we demonstrate fitting genetic groups implicitly when groups are treated as random effects.
Fitting random effects of genetic groups implicity within the random effects of an animal model requires the augmented inverse relatendess matrix (A * ) to be supplied as a list. First, we create the sparse matrix of A * for the ggPed pedigree using the makeAinv function in the nadiv package in R, then make the necessary alterations (see discussion of this in Appendix S5 and cautions/considerations in Appendix 6. 3.2 ) so the model interprets the genetic groups as random effects, convert the sparse matrix to a list, and fit the model.
The code below creates A * for the ggPed pedigree, the data file, and id.codes so that the genetic group and identities in A * can be associated with identities in the data file. We will also calculate Q and include it in the dataset for consistency with other versions of this data.
Q <ggcontrib(ggPed[, 1:3], ggroups = c("foc0", "g1")) ggTutorial <cbind(ggTutorial, Q) An important point to make is that, because the pedigree and data for ggTutorial have essentially the same sizes and structures (only ggPed now has two extra rows for the genetic groups -but these will be dropped automatically from Q by ggcontrib()), the genetic group coefficients from Q can be added directly to the data with ggTutorial <-cbind(ggTutorial, Q) above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the Q matrix contains a row for every individual in the pedigree, only the subset of rows in Q that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Next create A * : Astar <-makeAinv(ggPed[, 1:3], ggroups = c("foc0", "g1"), gOnTop = TRUE)$Ainv Next we add a small value to the diagonal elements of the group-group portion of the matrix. As a result of this alteration, the genetic group effects are conceptually random effects in the model. Below, we add 0.1 to the diagonal element of the A * which will constrain the variance in genetic group effects to be one tenth the additive genetic variance estimated by the model. A value of 0.1 may bias estimates of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.
The easiest approach is to adjust the genetic group diagonals of A * , and this is why we stored the matrix returned by makeAinv() above (i.e., object Ainv in the list). Once the additions are made, we can then convert the matrix to the list format required by asreml.
The predicted total additive genetic effects (u) are also reported in the "./wombat/ggNadivAstarRan/RnSoln_id.dat" file. Here, the prediced total additive genetic effects include the genetic group effects expressed as deviations from 0.

WOMBAT's genetic groups (Random group effects with Q from nadiv) Fitting genetic
group effects using WOMBAT's own functionality still requires calculating and supplying Q. Although Q is supplied, genetic groups are included implicitly within the random effects. Further, the genetic group effects themselves are treated as random effects with a variance among group effects that is estimated by the model.
WOMBAT does not provide a function to calculate Q, so we calculate it using nadiv's ggcontrib() function and provide A −1 as a generalized inverse. WOMBAT can be used to create A −1 , but here we will use the makeAinv() function in nadiv. For more details, see example 18 supplied with WOMBAT, particularly the pdf file entitled "RNote_WOMBATS2Step.pdf" that is included within the Example 18 folder and the "wombat.par" file in the folder "~/Example18/C".
The code below creates A −1 for the ggPed pedigree and creates the data file and id.codes that contains Q (we also include the columns of Q in the data file for consistency with other versions of this data set).
Q <ggcontrib(ggPed[, 1:3], ggroups = c("foc0", "g1")) ggTutorial <cbind(ggTutorial, Q)  Note that WOMBAT wants the genetic group contributions in the id.codes file as an integer. Therefore, we have multiplied Q by a large number (10000 used above) and then rounded these to the nearest whole number. This scaling factor of 10000 will need to be indicated in the parameter file ("ggWombatRan.par").
The animal model in WOMBAT is then (see also the "./wombat/ggWombatRan" folder in the supporting files): COM WOMBATs genetic groups (random genetic group effects with Q from nadiv ) Here, genetic groups are specified as their own random effect in the model as ggrps with a diagonal covariance matrix specified (IDE). Thus, genetic group effects are conceptually treated as random effects (Appendix S1 ) in the model. Note the SPECIAL sections, which are necessary because the coefficient of inbreeding (f ) has values of zero that are meaningful (i.e., these zeroes do not indicate missing values) and for fitting genetic groups. The genetic group SPECIAL section indicates the scaling factor by which we multiplied elements in Q that were supplied in the file "./wombat/ggWombatRan/id.codes".
The variance estimates in "./wombat/ggWombatRan/SumEstimates.out" match the simulated values for the id and residual terms. The estimated variance for the genetic group term (ggrps) is approximately 2.
All other simulated values are recovered in this analysis (model estimates match simulated values).
The genetic group effect predictions are reported in the "./wombat/ggWombatRan/RnSoln_ggrps.dat" file. The genetic group predictions are random effects and thus are deviations from their expected value of zero. The difference between the genetic groups 3.111 agrees with the expected difference between the mean breeding values of the immigrant group (3) and the founder population (0) that was specified in the simulation.
The predicted total additive genetic effects (u) are reported in the "./wombat/ggWombatRan/RnSoln_id.dat" file. Here, the prediced total additive genetic effects include the genetic group effects expressed as deviations from 0.
In the id column, lower case letters are phantom parent identities and upper case letters are observed individual identities. The second and third columns indicate an individual's dam and sire, respectively. Every observed individual (upper case letter) with an unknown parent is assigned a unique phantom parent instead of a missing value (e.g., a instead of NA). The phantom parents are assigned to individuals as in .
Phantom parents, however, have missing values (NA) assigned as their dams and sires. Additional details regarding the Q1988 dataset in nadiv are in the R help file that can be viewed by running the command ?Q1988 in an R session. The next step is to assign phantom parents to genetic groups, specifically in such a way as to allow group membership to be probabilistic.

Constructing the fuzzy classification matrix F
When fuzzy classification of genetic group membership is used, unknown parents are no longer replaced with genetic groups in the pedigree. Instead, each unknown parent is assigned a unique phantom parent identity, these phantom parent identities are given their own entry/row in the pedigree. Also, a fuzzy classification (F) matrix that has p rows, one for each phantom parent, and r columns, one for each genetic group, needs to be constructed. The F matrix contains individual elements f ij that take values between 0 and 1 and quantify the probability of phantom parent i belonging to genetic group j. The rows of F (i.e., a phantom parent's probabilities of group membership across all genetic groups) must sum to one.
In the example pedigree Q1988fuzz, F is a 5 row by 2 column matrix. Here, we arbitrarily assign fuzzy classification probabilities (f ij ). We exclusively assign phantom parent a to genetic group "g1" and b to genetic group "g2". We assign phantom parents c, d, e each a 0.5 probability of group membership in each of the two genetic groups.

# Find the phantom parents in the pedigree Q1988fuzz
pp <which(is.na(Q1988fuzz[, 2])) # First, an empty F matrix F <matrix(0, nrow = length(pp), ncol = 2, dimnames = list(as.character(Q1988fuzz[pp, 1]), c("g1", "g2"))) # Assign "a" to "g1" and "b" to "g2" Now that the phantom parents have been assigned probabilities of genetic group membership, we construct the Q and A * matrices. To help distinguish the approaches, we will call the resulting matrices created with fuzzy classification Q f and A * f . These will have the same properties as the Q and A * matrices constructed above without fuzzy classification, however, the individual matrix elements will differ. The difference reflects the uncertainty in phantom parent group membership. If all phantom parents are assigned with complete probability to one genetic group each (i.e., each row of F, corresponding to each of the phantom parents, contains a single value of one and the rest zeroes), then the resulting Q f and A * f will be exactly the same as if they were constructed without using fuzzy classification. Further details of the method are in Fikse (2009), whereas below the focus is on how to implement fuzzy classification using the nadiv functions ggcontrib() to obtain Q f and makeAinv() to obtain A * f .

Constructing Q f
As described in the main text (and above), the matrix Q contains the fractional contributions from each of r genetic groups to each individual's genome, calcluated from the pedigree. With n observed individuals (i.e., not phantom parents) in the pedigree (e.g., n=4 in Q1988fuzz), the fuzzy classification matrix Q f (the subscript "f" is used to distinguish this Q matrix as including fuzzy classification of phantom parents) is an n row by r column matrix. As detailed above (and in the main text), offspring inherit half of every genetic group contribution from each of their parents. Similarly, offspring of phantom parents inherit half of each phantom parents genetic group contributions. In the normal construction of Q, phantom parents only belong to a single genetic group. With fuzzy classification, however, each phantom parent has a probability of group membership that can be non-zero for more than one group. These probabilities can be thought of as fractional contributions of each genetic group to each phantom parent's genome. Consequently, offspring of phantom parents inherit half of the phantom parents genetic group contributions described in F in exactly the same manner as offspring inherit all of the genetic group contributions from an observed parent quantified in a row of Q.
The nadiv function ggcontrib() can construct Q f for a given pedigree with r potential genetic group contributions to each of the n individuals in the pedigree when the fuzzy classification of phantom parents (F) is supplied to the fuzz argument of the function. For example, Q f for Q1988fuzz is obtained: To implement an animal model with genetic group effects estimated explicitly as separate fixed regressions, where the phantom parents have fuzzy classification of genetic group membership, requires the Q f matrix as just calculated. The implementation in the animal model is no different with fuzzy classification of genetic groups than without fuzzy classification, just the details of constructing Q f differ slightly. Similarly, there is no difference when using Q f with fuzzy classification of genetic groups to calculate breeding values (a) from an animal model that has genetic group effects implicitly within the random effects (see above and equation 6 in the main text).

Constructing A * f
Methods to directly construct A −1 and A * rely on the basic premise that the flow of alleles through a pedigree can be traced because each individual inherits, on average, half of its alleles from each parent with some sampling variation that can be modeled based on the Mendelian sampling variance in the population.
The same principles apply when phantom parents have fuzzy classification of genetic group membership.
Therefore, the same methods to directly construct A * without fuzzy classification can now be used to directly construct an augmented inverse relatedness matrix from a pedigree when fuzzy classification of genetic groups has been implemented (we will call this matrix A * f , where the subscript "f" is used to distinguish this matrix from A * constructed without fuzzy classification of genetic groups) (Fikse 2009).
A * f is also a compilation of four sub-matrices that are each a mathematical function of Q f (as calculated directly above incorporating fuzzy classification), A −1 , or both (see Fig. 3e in main text, replacing Q with the Q f constructed with fuzzy classification).
The A * f for any pedigree with phantom parents assigned can be obtained using the makeAinv() function in the nadiv package and supplying the fuzzy classification defined by F to the fuzz argument. For example, A * f for the Q1988fuzz pedigree is: Astar_fOut <-makeAinv(Q1988fuzz, fuzz = F) (Astar_f <-Astar_fOut$Ainv) ## 6 x 6 sparse Matrix of class "dgCMatrix" To model genetic group effects, where the phantom parents have fuzzy classification of genetic group membership, implicitly within the random effects of an animal model, A * f is supplied to the model the same way as the typical A −1 or A * (see tutorials above). It remains to be seen if the same alterations to A * f are also necessary as when fitting A * in an animal model (see Appendices 6.3.1 and 6.3.2 ).

Software version information
The following software versions were used to run all of the code above: