A fast indirect method to compute functions of genomic relationships concerning genotyped and ungenotyped individuals, for diversity management

Pedigree-based management of genetic diversity in populations, e.g., using optimal contributions, involves computation of the Ax\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Ax}}$$\end{document} type yielding elements (relationships) or functions (usually averages) of relationship matrices. For pedigree-based relationships A\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{A}}$$\end{document}, a very efficient method exists. When all the individuals of interest are genotyped, genomic management can be addressed using the genomic relationship matrix G\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{G}}$$\end{document}; however, to date, the computational problem of efficiently computing Gx\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Gx}}$$\end{document} has not been well studied. When some individuals of interest are not genotyped, genomic management should consider the relationship matrix H\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{H}}$$\end{document} that combines genotyped and ungenotyped individuals; however, direct computation of Hx\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Hx}}$$\end{document} is computationally very demanding, because construction of a possibly huge matrix is required. Our work presents efficient ways of computing Gx\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Gx}}$$\end{document} and Hx\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Hx}}$$\end{document}, with applications on real data from dairy sheep and dairy goat breeding schemes. For genomic relationships, an efficient indirect computation with quadratic instead of cubic cost is x=ZZ′x/k\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{x}} = {\mathbf{Z}}\left( {{\mathbf{Z^{\prime}x}}} \right)/k$$\end{document}, where Z is a matrix relating animals to genotypes. For the relationship matrix H\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{H}}$$\end{document}, we propose an indirect method based on the difference between vectors Hx-Ax\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Hx}} - {\mathbf{Ax}}$$\end{document}, which involves computation of Ax\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Ax}}$$\end{document} and of products such as Gw\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{Gw}}$$\end{document} and A22-1w\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{A}}_{22}^{ - 1} {\mathbf{w}}$$\end{document}, where w\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{w}}$$\end{document} is a working vector derived from x\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{x}}$$\end{document}. The latter computation is the most demanding but can be done using sparse Cholesky decompositions of matrix A-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{A}}^{ - 1}$$\end{document}, which allows handling very large genomic and pedigree data files. Studies based on simulations reported in the literature show that the trends of average relationships in H\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{H}}$$\end{document} and A\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{A}}$$\end{document} differ as genomic selection proceeds. When selection is based on genomic relationships but management is based on pedigree data, the true genetic diversity is overestimated. However, our tests on real data from sheep and goat obtained before genomic selection started do not show this. We present efficient methods to compute elements and statistics of the genomic relationships G\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{G}}$$\end{document} and of matrix H\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf{H}}$$\end{document} that combines ungenotyped and genotyped individuals. These methods should be useful to monitor and handle genomic diversity.


Background
Optimal contribution [1][2][3] is a method of choice for the management of genomic diversity. In this method, reproducers are chosen such that the expected genetic gain and expected increase in homozygosity are properly weighted. The increase in homozygosity is estimated based on average relationships between selected individuals, and in livestock these relationships are usually pedigree-based. Such measures of diversity can be represented as x ′ Kx where K is a matrix of relationships and x a vector of contributions to the next generation. Optimizing contributions in x is a non-linear problem that requires repeated computation of x ′ Kx, where the most difficult part is the computation of Kx. In the case of pedigree relationships, a very fast method exists for this computation [4]. Here we recall that, in genomic selection, genomic relationships must be included in matrix K [5], and we present computational strategies in the case of genomic selection where all, or part of, the animals have been densely genotyped.
Genomic evaluation considers several tens of thousands of single nucleotide polymorphisms (SNPs) that are distributed across the whole genome, and in the most frequent implementation (genomic best linear unbiased prediction (GBLUP), or single-step GBLUP) it uses a so-called genomic relationship matrix. Following this approach, the accuracy in the evaluation of breeding values is improved compared to that of pedigree-based evaluation by exploiting existing linkage disequilibrium with neighboring quantitative trait loci (QTL) [6]. Consequently, genomic selection affects gene transmission, directly for SNPs and indirectly for QTL.
However, because of linkage between these high-density SNPs, indirect hitch-hiking also affects gene transmission at loci other than SNPs and QTL [7]. This fact impairs the conventional pedigree-based methods used for computing coancestry and inbreeding coefficients, where selection is assumed to be neutral, at any locus, concerning the gene transmission probabilities. Neutrality means that selection does not modify the probabilities of gene transmission within a given pedigree. For instance, before selection proceeds, the genotype of an unselected individual at a given locus comes from any possible grandparent pair with a probability of 1/4. However, when genomic selection occurs, grandparent combinations have local selective values that depend on the direction of selection (generally a combination of traits), and some of these combinations are better than others. Then, genomic selection restricts variability faster than predicted by the conventional algorithms based on pedigrees.
Sonesson et al. [5] illustrated by simulation that neutrality is impossible in a genomic selection scenario: they showed that the evolution of genomic relationship coefficients estimates the evolution of true inbreeding much more accurately than the evolution of pedigreebased coefficients. For instance, if two close sibs are selected because they have inherited the same beneficial allele at a QTL, and if they have a genomic relationship of 0.6, pedigree relationships can account for only 0.5 of the relationship. This is logical as genomic relationships describe realized instead of expected relationships, and take into account Mendelian segregation and linkage due to the finite size of the genome [8]. Thus, genomic management of genetic variability is required in order to avoid detrimental trends. For instance, Sonesson et al. [5] tried to maximize genetic gain when using genomic selection while restricting the rate of inbreeding per generation to 0.50% by using either genomic (each individual was genotyped) or pedigree-based coefficients. The true rate of inbreeding was 0.53% in the first case, which is in fairly good agreement with the restricted value. However, it reached a value as high as 2.26% when pedigree-based coefficients were used. In comparison, pedigree management with pedigree-based evaluation yields a true rate of inbreeding of only 0.74%, due to lower selection pressure on the QTL.
When monitoring evolution of genetic variability over time, or even optimizing management of genetic diversity at a given time, some individuals of interest may be ungenotyped (see "Appendix" for a comprehensive list of these situations). A simple example is when young genotyped rams are chosen, in which case these are genotyped whereas females are not. Estimating future inbreeding needs to consider both the genotyped rams and ungenotyped females.
Then, (ungenotyped, genotyped) and (ungenotyped, ungenotyped) relationships by combining pedigree and genomic information should be estimated. A natural approach is to use the matrix usually called H that was conceived to extend the information in genomic relationships to all individuals in a pedigree, regardless of the genotyping status [9]. Extensions of the theory accommodate different origins (metafounders), selection and drift [9][10][11]. Although it is most often used for genetic evaluation in the single-step GBLUP [9,12], its use for management of diversity is natural, even if the evaluation is not by single-step GBLUP e.g., for dairy cattle where multi-step methods are the most common.
The objective of our study was to develop an indirect method for computing genomic relationship coefficients and vector functions Gx and Hx, where the pedigree-based relationship matrix A may (or not) account for single or multiple origins. The method that we present here is useful to expedite the computations needed when monitoring or optimizing management of diversity in genomic selection, as already done by the indirect method for computing vectors Ax in the pedigree-based context [4].
The new approach was evaluated using data from dairy goat and dairy sheep breeding programs. We also discuss the issues that raise from using H instead of A in the world of practitioners and breeders and suggest methods to present genomic relationships at the classical pedigree scale via shift and scale conversion factors.

Computation of the matrix product Gx
Consider the genomic relationship matrix G = ZZ ′ /k [13], where Z is a matrix of genotypes for n animals and m markers coded additively, and often "centered" locuswise with reference either to base or to observed allele frequencies, and k is a scale factor, typically the sum of heterozygosities at the markers. Weights for each locus can be introduced in the form G = ZDZ ′ , and methods in this paper extend easily to this case. Aguilar et al. [14] presented efficient methods to compute G. To compute products, it is more efficient to use Gx = Z(Z ′ x)/k (without explicitly forming G) at a quadratic cost 2mn instead of the cubic cost of forming first G (cubic cost mn 2 ) to later compute Gx (quadratic cost n 2 ). The exception is when n is small compared to m, in which case it is easier to compute and store G.
Either of the matrix-vector products in Gx = Z(Z ′ x)/k can be programmed using public, already optimized, possibly parallel, subroutines such as DGEMV from BLAS [15]. Note that optimal contribution decisions are invariant to the choice of the reference allele (which results in the same G) or to different estimates of base allelic frequencies used in Z and k, because changing assumed allele frequencies only scale and sum constants to Gx but the optimum is the same.

Recalling the properties of the indirect computation of the matrix product Ax
Vectors Ax (where x is any vector) can be quickly obtained following [4] based on the well-known fact that the sparse matrix A −1 is the product of an upper sparse triangular matrix by its transpose [16,17]. The fast method is very handy to compute portions or functions of A wihout explicitly setting it up. For instance, extracting sections of A column-wise can be done by computing column i as the product Ax, where x contains 1 in position i and 0 elsewhere. After only a single run, it also allows the computation of average relationships within groups of individuals a = x ′ Ax or between two groups of individuals a = y ′ Ax, where x and y are vectors of individual contributions. On the contrary, setting up explicitly matrix A by the tabular rule is prohibitive because it involves a number of operations equal to the square of the number of individuals in the pedigree of candidates, which can be very large.

Computation of the matrix product Hx
Matrix H expands genomic information contained in G to ungenotyped individuals via pedigree relationships as follows [9,12]: where subindexes 1 and 2 refer to ungenotyped and genotyped individuals, respectively. The inverse of H is sparse and regularly used in singlestep GBLUP. However, computing Hx is more demanding than any of the two previous cases Ax or Gx, because it involves dense products and inverses that involve A 22 and G. The first purpose of our paper is to show this complexity and how this computation can be efficiently carried out.
An additional problem arises from the fact that the two terms forming H, i.e., G and A, should ideally refer to the same genetic base. Vitezica et al. [11] and Christensen et al. [18] suggested to compute G first using observed allele frequencies and then to convert it into matrix G , following metrics of pedigree base. The conversion principle was that the average of G and its average diagonal should be equal to their counterparts in matrix A 22 . Then, G = αJ + βG, where shift parameter α and scale parameter β were obtained from four means: the average terms A 22 and G, and the average diagonal terms d(A 22 ) and d(G). Based on the two constraints, This can be understood as correcting for drift of the overall mean (α) and reduction in variance (β) [19]. If the genotyped population is large enough and mating is approximately at random, then average inbreeding (in either G or A 22 ) is the average half relationships and β ≈ 1 − α 2 . These coefficients can also be interpreted as α = 2F st and β = 1 − F st , where F st is a measure of differentiation from the more recent genotyped population in G to the base population of A 22 [11,19,20].
However, considering the genomic base as the reference is preferable, i.e. modifying A, not G. Indeed, matrix A depends on pedigree recording and relies upon the assumption that pedigree founders are fully unrelated. This assumption can be removed using the metafounder approach [21], which postulates that the pedigree-based additive relationship between any pair of founders is equal to a positive parameter γ (from 0 to 2) that summarizes the situation of the pedigree base in reference to the genomic base [10]. This parameter γ can be estimated from genomic data [22], and represents the homozygosity across founders in the pedigree that would yield observed genomic relationships in G, where G is computed as the cross-product G = ZZ ′ /(2/m) with Z containing {− 1, 0, 1} values. Furthermore, Garcia-Baccino et al. [22] showed that the value of γ is relative to a theoretical genomic base that displays maximum variability at each marker locus (allelic frequencies 0.5), thus giving rise by drift to the pedigree base and to differentiation of frequencies in the genotyped population. Then, γ is simply eight times the variance of the (unobserved) marker frequencies in the pedigree founders. In this context, they interpreted γ as an F st index [23] and they proposed several estimation methods for parameter γ. The metafounder approach extends easily to several breeds or origins (e.g. genetic groups) by considering Ŵ, a matrix extension of γ, and this also provides an elegant solution to the problem of computing relationships including unknown parent groups [17], a case for which relationship is not a welldefined concept. As a result, we considered the metafounder approach to be adequate for the monitoring and management of genetic variability.

Direct computation of matrix product Hx
The following algorithms to compute Hx use the pedigree-based matrix A [24] and they are exactly the same when including metafounders in A [Γ ] [21].
Matrix H has the following components: x 1 x 2 be the product of matrix H by any vector x. The matrix expression of Gx 2 due to the complexity of H 11 . If w denotes a working vector, intermediate computations such as Gw, Aw (indirect method) and A −1 22 w (iterative or exact solving) are involved. The computation sequence that has to be carried out in order to obtain H 11 x 1 is quite long. Fortunately, results can be obtained more efficiently by an indirect method as detailed below.

An indirect computation of matrix product Hx
The computation method is indirect for two reasons. First, because it uses the difference d = y − z between y = Hx and z = Ax. Second, the method exploits the very simple expression of the inverse matrix H −1 [12,25]: so that To obtain d, note that x = H −1 y. Then: Consequently, and Then, we obtain Finally, y 1 = z 1 + d 1 . Then, computing y 1 through the indirect method is as simple as for y 2 , in total contrast with the direct method.
To summarize, in order to compute y = Hx: This is the final step.

Efficient solving
Product GA −1 22 z 2 can be obtained as G times vector A −1 22 z 2 , using the fast method for Gx described before. The main numerical hurdle consists in solving linear equation systems that involve A 22 , a full matrix. Replacing these systems by others that involve matrix A 11 , a sparse matrix, is appropriate because Furthermore, it is less time-consuming to restrict this equation to the genotyped individuals and their ancestors [26,27]. If B denotes the relationship matrix corresponding to such a pedigree, then (1) Eq. (4) becomes d 1 = A 12 f (d 2 ) i.e., section 1 of vector .

Computations in practical conditions
The indirect method can be used for monitoring and optimization diversity in large livestock populations: its implementation areas are briefly described in the "Appendix". Usually, breeding organizations that are willing to control genetic variability consider at a given time an (possibly long) operational list. This list consists in male and/or female candidates for selection, possibly extended by the rest of the live population when generations are overlapping. Optimal contributions of candidates to the next generation, represented by vector x, must be found, minimizing a function of the type 0.5x ′ Hx + w ′ x [22]. If all individuals in this operational list are genotyped, then computations are simple (at quadratic cost), restricted to the section of G individuals pertaining to the operational list. However, if some individuals in the operational list are not genotyped, computation of Hx vectors is needed.
In this case, all genotyped animals add information to the full matrix H and the full G matrix should be used. The fast indirect method is only used to compute (and possibly store) matrix H * , the section of H pertaining to the operational list. Afterwards, direct computations considering matrix H * provide function derivatives and Lagrange multipliers when analytic optimisation methods are used [1,3] or variations of functions for alternative contribution vectors when Monte-Carlo optimization is used [30,31]. In the first case, a small number of configurations is considered before obtaining the optimal one, whereas this number can be very high for a Monte-Carlo method such as simulated annealing.

Tayloring genomic relationship statistics to practitioners
In this section, we present elements to yield statistics in a scale that can be used by breeders. Genomic relationship coefficients derive from a statistical construction that has been basically developed for genomic evaluation purposes [13] although these coefficients are similar to marker-based relationships developed for conservation genetics [32]. Breeders and breeding organizations easily understand the output of research in the area of genetic evaluation, but understanding the concept of genomic relationships is more demanding. Practitioners are often puzzled by the unusual values of the genomic relationship coefficients (for instance negative genomic inbreeding, negative or very high relationships) in comparison with pedigree-based coefficients. This might deter breeders from implementing an effective genomic management of diversity.
A pragmatic compromise consists in optimization based on genomic relationships, possibly with metafounders, while the results (e.g. average inbreeding) are converted into more conventional scales before editing in output files. Conversion into conventional (pedigree-based) coefficients is carried out via a shift factor α conversion and a scale factor β conversion . We use the superscript "conversion" because these factors have not the same meaning as the α and β in the section on "Computation of matrix product Hx": these are essentially operational factors. For instance, they cannot be interpreted as drift between pedigree founders and genotyped individuals in later generations [11]. These factors α conversion and β conversion should be computed "once for all" based on a reference set of individuals that are genotyped before the effective start of genomic selection. Estimation forces equality of diagonals and overall means of G or H (computed with metafounders) and A (computed without metafounders), so that H converted = α conversion J + β conversion H, and output files meet the familiar scale of probabilities of identity-bydescent from unrelated founders. First, stability of conversion factors is required to allow management and monitoring of genomic variability across cohorts over time, i.e. the average inbreeding in 2016 can be reliably compared to the average inbreeding in 2017.
Moreover, α conversion and β conversion need to be estimated based on the animals genotyped before genomic selection proceeds. Otherwise, the shift factor α conversion would be biased negatively. This can be predicted from Sonesson et al. [5], who showed that, in the case of genomic selection with pedigree management, the average pedigree-based relationship increases less than the average genomic relationship. Conversion is unbiased if the rates ( F) of genomic inbreeding over time, either directly or based on converted values, are the same. At times t and t + 1, the average genomic relationship coefficients are h t and h t+1 with conversion formula h converted t = α conversion + β conversion h t . If the asymptotic regime has already been reached, then the rate of inbreeding based on genomics is If the rate of inbreeding is evaluated based on converted values, then �F converted = β conversion h t −h t−1 2−α conversion −β conversion h t . Both expressions are equal when β conversion = 1 − α conversion /2, which is usually the case if Hardy-Weinberg equilibrium holds in the genotyped population for which α conversion and β conversion have been estimated [19], i.e. if genotyping is at random or before genomic selection proceeds, but will possibly not hold if genotyped animals are selected based on genomic evaluation.

Application to real data
Pedigree and genomic data from dairy goat and dairy sheep breeding programs were used. The French dairy goat breed Alpine uses an optimized selection program where the average conventional relationship is minimized for desired genetic gains [2,30,31]. Genomic selection is under study [33] and is planned in a near future. The Manech Tête Rousse, (blonde faced Manech), MTR dairy sheep breed belongs to the genetic improvement schemes in the French Western Pyrenees that are transitioning towards genomic selection [34]. In this part of the whole Alpine population, pedigree recording was satisfactory and as a result, the pedigrees of the youngest individuals were 10 to 11 equivalent generations long, on average, and this is why tracing back the 3075 individuals yielded 30,000 more individuals. All the sires and maternal grand-sires of the ungenotyped individuals of the operational list were genotyped. Then, these males provided the connection between the 1006 ungenotyped individuals and the initial 2069 genotyped individuals. Sections of the A and H matrices corresponding to these animals were obtained.
The MTR dataset consists of 2108 genotyped rams born between 1999 and 2009, and 500,626 pedigree records, corresponding to the whole pedigree of the breed. Rams were genotyped with the OvineSNP50 Bead chip (Illumina Inc., San Diego, CA, USA). After applying filtering criteria [34], 38,997 SNPs were retained. Table 1 shows the number of genotyped rams and of females with all four grand-parents known (only these females are used as dams of rams) per year. With this MTR dataset, we were not able to carry out the same studies as in Alpine goats because neither of the 2108 genotyped individuals had genotyped sire, dam or grandparents, and they did not constitute a clear operational list since they already had offspring, i.e., they were not candidates to selection. In this case, we computed average relationships per year to assess robustness of these statistics to using either A or H.
For both breeds, these datasets were not affected by genomic selection, which is planned only for the near future. The genotypic values in Z were coded as − 1, 0, 1 and G obtained as G = ZZ ′ /(m/2) for m SNPs [10,21,22]. The conventional relationship coefficients considered for constructing matrix H introduced a single metafounder with parameter γ, estimated by generalized least squares [22].
In both cases, using the indirect method and optimized computations of Gw, Aw (by the indirect method) and A −1 22 w, computations are inexpensive, taking a few seconds on a laptop for any of the two datasets. To give a flavor of timing, in an Apple Macbook with 4 threads, computation of Gw = Z(Z′x)/k (2nm operations) with 5000 simulated animals and 50,000 simulated SNPs took 0.4 s, whereas computation of G itself (n 2 m operations) took 37 s.

Results for the Alpine breed
Parameter γ was estimated as 0.30. This means that the genetic variance in the pedigree base (the metafounder gene pool) was only 0.85 = 1 − γ /2 times that in the conceptual genomic base [21].
The terms of A 22 (pedigree-based and not accounting for γ) were sorted by ascending order and classified into 10 groups of equal size (deciles). Figure 1 shows for each decile the average a, the average h, the average difference between both a and h and the standard deviation of the difference over replicates (after multiplication by 100 for clarity). Parameter γ was estimated to be equal to 0.30, which explained the large average differences. The standard deviation of the difference was fairly constant irrespective of the decile considered. Differences in lower deciles were not less variable than in higher deciles: this meant that the relative impact of modifications was larger for lower pedigree-based coefficients. The order of magnitude of the standard deviation of the difference was 2. Expressed in usual terms (coancestry coefficient (%) in the classical pedigree base), this corresponded to 2×0.5 0.7 = 1.43, a very small value. Finally, relationship modifications revealed by genotyping were substantial.
Deciles of A were constructed for the ungenotyped × genotyped section of the operational list (see Fig. 2) and for the ungenotyped × ungenotyped section (see Fig. 3). Basically, H-matrix genomic relationships involving ungenotyped individuals are estimated by regression and consequently, are intermediate between conventional relationships and true genomic relationships. The standard deviation of the difference between pedigree-based and genomic coefficients substantially decreased from 2 to 1.1 (Fig. 2) and 0.8 (Fig. 3) for the (ungenotyped, genotyped) and (ungenotyped, ungenotyped) sections, respectively. Thus, H estimations of genomic relationships for ungenotyped animals by regression yielded shrunken relationships, which were intermediate between genomic and pedigree-based coefficients in spite of the sires and maternal-grandsires of the ungenotyped individuals being genotyped. Consequently, if some candidates (e.g. dams of young males) are not genotyped in the future as in our operational list, the efficiency of the selection optimization will be affected in comparison with full genotyping. If the objective is to maximize genetic gain while constraining for genomic inbreeding rate [1], ungenotyped individuals with good estimated breeding values (EBV) will not be sufficiently selected because they cannot be shown to be "original", leading to a loss for EBV. If the objective is to minimize inbreeding rate while constraining for genetic gain [2], as for the Alpine breed, favorable ungenotyped individuals will also be neglected, leading to a weaker minimization of inbreeding rate. Due to this partial genomic inbreeding control, targeted genetic gains are smaller than under full control.
From the 2069 genotyped individuals, 129 were in the operational list. The remaining 1940 genotyped individuals were reduced to 1500 or 1000 or 500 in order to check the effect of reducing the genomic information. These individuals were selected by considering the highest relationship with the 1006 ungenotyped individuals from the operational list to reduce the loss of genomic information. Three different values were obtained for the operational section of H. These values were compared, term by term, with the values obtained with the complete matrix G. When the size of the working G decreased, the average term of the operational section of H was lower than the Conversion of genomic relationships into pedigree-based coefficients for animals of the operational list provided the following results: shift factor α conversion = − 0.345 and scale factor β conversion = 1.177. Then, β conversion was very close to 1 − α conversion /2, which would be the result obtained under Hardy-Weinberg equilibrium. The absence of negative bias on α conversion might be due to the fact that data were obtained from a past breed history in conventional (not genomic) conditions of genetic evaluation, selection and management of diversity. Figure 4 shows the statistics about the converted values. Only a very small proportion of negative values was obtained in the lowest deciles (1)(2)(3).

Results for the Manech Tête Rousse breed
Parameter γ was estimated as 0.47. This means that the genetic variance in the pedigree base (the metafounder gene pool) was only 77% of that in the conceptual genomic base. Figure 5 compares both alternative measures of overall relationship (both A and H include metafounders, thus they are comparable) and shows that, in general, both are very similar. The decrease in overall relationship observed from 2006 onwards is due to the larger number of rams genotyped (Table 1). Before this date, genotyped rams were only elite rams whereas from 2006 onwards, these were candidate rams, thus more diverse.
The mean value of the difference of relationships based on H or on A is represented in Fig. 6. Although very small, the trend seems to indicate that H detects more inbreeding than A.

Discussion
The analytic expressions of the relationship matrix H and its inverse are complex because the terms of H concerning the ungenotyped individuals are estimated by regression, conditionally on the observed genomic matrix G. Managing or monitoring genomic variability when some of the individuals involved are not genotyped requires to be able to compute vectors Hx. Consequently, a naive extension of the indirect method used for computing conventional vectors Ax to compute Hx provides tedious expressions. However, the analytic expressions are rather simple after considering the difference Hx − Ax in two main computation steps. This makes the genomic indirect method easy to implement and very efficient. In this method, the conventional indirect method is used several times and the main computation hurdle is linked to dense matrices G and A 22 (the pedigree-based counterpart of G ). If w denotes a working vector, computations of vectors Gw and A −1 22 w are needed, but they can be carried out by efficient methods. Using G −1 , or an approximation as in the algorithm for the proven and young animals (APY) algorithm [35], is possible but has no operational advantage because it results in a larger number of operations.
In spite of the above-mentioned methodological improvements, computing estimated genomic relationship coefficients when needed is quite demanding in terms of memory requirements and computation time. Although this does not pose a problem for national genetic evaluations, this might be a hurdle for some breeding companies that use local personal computers. First, all the genotyped individuals should be accounted for, even if they are little related or unrelated with the ungenotyped individuals under consideration (a fact confirmed by the study on the Alpine breed). This unfavorable finding can be puzzling at first sight but is quite natural because pedigree founders (typically nominally 'unrelated' individuals) exhibit substantial genomic relationships (the γ parameter). Then, it is easy to infer that every member of the population pedigree is linked to the genotyped population, even if nominally (through pedigree) "unrelated". Second, many runs of the genomic indirect method should be carried out if the size of the operational list involved in managing procedures is large. This is also the case if monitoring procedures aim at estimating the average genomic inbreeding per cohort: these averages require computing each individual coefficient by a specific run of the indirect method.
Converting genomic coefficients into pedigree-based coefficients, ideally through formulas that are established before starting genomic selection, was proposed and tested on the Alpine data. This might help breeders to really implement genomic management in parallel with genomic selection, a mind attitude imperiously needed [5]. In particular, if management continues to be based on pedigree relationships, the decrease in the actual (genomic) variability will be faster than its estimate based on pedigree, a warning signal for breeders.

Conclusions
We presented efficient computation methods of products Hx for the single-step relationship matrices, which combine genotyped and ungenotyped individuals. Our methods are efficient and extend well to large datasets based on existing appropriate algorithms for computation of products Gw and A −1 22 w. These algorithms are useful for the management of genetic diversity in the genomic era.
When genomic selection has already started, but remains conventionally (pedigree) managed, an optimization akin to the previous one can be carried out for genotyping in selection nuclei, i.e., choosing which young candidates (especially females) deserve to be genotyped, for a given genotyping investment.
Another implementation is the optimization of insemination in large commercial populations, in order to determine the contribution of each candidate and the resulting mating design. This requires the full table of relationships between male candidates and females to be known. It can be constructed after computing as many vectors Ax as the number of male candidates (generally a few in comparison with commercial population size).

The scope of Hx methodology (partial genotyping)
In genomic selection, the same operations as described above need to be done by the breeders: monitoring and handling by optimal contribution or similar methods. Ideally, this can be carried out using genomic relationships and the Gx methodology. However, it does happen that some individuals of interest are genotyped and others are not. The probability of such an event depends on the kind of implementation.
The methodology Hx is useful for monitoring the evolution of large commercial populations, where many females are ungenotyped. For selection nuclei with extensive genotyping, e.g. in pigs or poultry, where all animals are genotyped, it allows a full description of the evolution of the population including generations with old ungenotyped animals. For instance, it allows the comparison of the increase in inbreeding in pedigreed generations versus genotyped generations.
Optimizing genotyping will be permanently useful with open nuclei, where some females (e.g. rams' dams) originate from commercial herds, are not necessarily genotyped, and for which the Hx methodology will be necessary.
Optimization of insemination in large commercial populations, with a large proportion of ungenotyped females will be always needed, which is an operation that requires the potential of the Hx methodology.