Selection against genetic defects in conservation schemes while controlling inbreeding

We studied different genetic models and evaluation systems to select against a genetic disease with additive, recessive or polygenic inheritance in genetic conservation schemes. When using optimum contribution selection with a restriction on the rate of inbreeding (ΔF) to select against a disease allele, selection directly on DNA-genotypes is, as expected, the most efficient strategy. Selection on BLUP or segregation analysis breeding value estimates needs 1–2 more generations to halve the frequency of the disease allele, but these methods do not require knowledge of the disease mutation at the DNA level. BLUP and segregation analysis methods were equally efficient when selecting against a disease with single gene or complex polygenic inheritance, i.e. knowledge about the mode of inheritance of the disease was not needed for efficient selection against the disease. Smaller schemes or schemes with a more stringent restriction on ΔF needed more generations to halve the frequency of the disease alleles or the fraction of diseased animals. Optimum contribution selection maintained ΔF at its predefined level, even when selection of females was at random. It is argued that in the investigated small conservation schemes with selection against a genetic defect, control of ΔF is very important.


INTRODUCTION
Many domesticated animal populations show heritable defects. Some defects are inherited by a single gene, e.g. complex vertebral malformation (CVM) in cattle [1]. Other diseases have a complex inheritance involving multiple genes plus environmental effects, e.g. hip and elbow dysplasia in dogs [17].
One way to eliminate the disease from the population is to select against the disease in a breeding program. For diseases caused by an identified single gene, direct selection on DNA-genotypes against the disease allele is possible. This can be done irrespective of whether the disease is additionally affected by the environment (complete penetrance or not). For unknown genes, segregation analysis can be used to infer the genotype probabilities of individual animals, using phenotypic records of the animal itself and its relatives [2,5,7,13]. Segregation analysis can also be used to save genotyping costs when selecting on DNA-genotypes for known genes [15]. For diseases with complex inheritance (involving many genes), the assumption of normally distributed genetic effects seems more appropriate, leading to BLUP [12] or threshold model breeding value estimation [8]. However, the inheritance is unknown for many diseases and the breeding value estimation is not straightforward. We will here investigate the genetic models and evaluation methods to select against a disease of known [2,5,7,12,13] or unknown modes of inheritance.
Genetic drift increases the occurrence of heritable diseases. Genetic conservation schemes are often small, and care must therefore be taken to avoid high rates of inbreeding when selecting against the disease in such small populations. Increased inbreeding could for instance result from direct selection for a non-disease allele, detected by DNA genotyping, when the non-disease alleles come from a limited number of ancestral families. We will use a selection method that maximises genetic response with a restriction on the rate of inbreeding [10,11,18,20]. The optimum contributions, which are translated to the optimum number of progeny, will be calculated for each male selection candidate, assuming that female selection is at random. This reflects the situation where every female is needed in a conservation scheme, or where there is little control over selection of the females.
The aim of this study was to find the best strategy for eliminating different kinds of genetic diseases, where the genetic evaluation method does not always agree with the true inheritance of the disease. We compared a threshold model, where many genes and environmental effects affect the liability of an animal to be diseased, with a genetic model for a single gene. We also compared breeding values estimated from DNA-genotyping (for a known disease gene) to breeding values estimated by BLUP [12] or segregation analysis [2,5,7,13]. The disease trait is binary and is not (systematically) affected by the presence or absence of an infectious agent. Also, the disease is not genetically correlated to other traits under (natural or artificial) selection.

Threshold model
The threshold genetic model assumes liabilities underlying the probability of having a diseased animal. The liability was assumed normally distributed. Genetic values for liability, $g_i$, of the base animals were sampled from the distribution $N(0, \sigma^2_a)$, where $\sigma^2_a = 0.5$ is the base generation genetic variance. Environmental effects on liability, $e_i$, of base animals were sampled from the distribution $N(0, \sigma^2_e)$, where $\sigma^2_e = 0.5$ is the environmental variance. Total liability was $x_i = g_i + e_i$. Later generations were obtained by simulating offspring genotypes from $g_i = \tfrac{1}{2} g_s + \tfrac{1}{2} g_d + m_i$, where $s$ and $d$ refer to the sire and dam, respectively, and $m_i$ is the Mendelian sampling component, sampled from $N(0, \tfrac{1}{2}(1 - \bar{F})\sigma^2_a)$, where $\bar{F}$ is the average inbreeding coefficient of parents $s$ and $d$. If $x_i$ was higher than the threshold value, $T$, then the individual was diseased and $y_i = 0$. Healthy animals had $y_i = 1$. The threshold $T$ was set to 0.0, which resulted in a disease incidence of 50% in the base generation. These phenotypic values, $y_i$, were used as input to estimate breeding values (EBV).
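The sampling steps of the threshold model can be sketched as follows. This is a minimal illustration under the variances and threshold given above, not the simulation program used in the study; the function names, the seed and the sample size of 20 000 base animals are assumptions made for the example.

```python
import math
import random

SIGMA2_A = 0.5   # base-generation genetic variance (from the text)
SIGMA2_E = 0.5   # environmental variance (from the text)
THRESHOLD = 0.0  # threshold T (from the text)


def sample_base_genetic_value(rng):
    """Genetic liability g_i of a base animal, drawn from N(0, sigma2_a)."""
    return rng.gauss(0.0, math.sqrt(SIGMA2_A))


def sample_offspring_genetic_value(g_sire, g_dam, f_parents, rng):
    """g_i = 1/2 g_s + 1/2 g_d + m_i, with m_i ~ N(0, 1/2 (1 - F) sigma2_a).

    f_parents is the average inbreeding coefficient of the two parents.
    """
    mendelian_var = 0.5 * (1.0 - f_parents) * SIGMA2_A
    m_i = rng.gauss(0.0, math.sqrt(mendelian_var))
    return 0.5 * g_sire + 0.5 * g_dam + m_i


def phenotype(g, rng):
    """Binary record: 0 = diseased (liability above T), 1 = healthy."""
    x = g + rng.gauss(0.0, math.sqrt(SIGMA2_E))
    return 0 if x > THRESHOLD else 1


# Hypothetical check: with T = 0 about half of the base animals are diseased.
rng = random.Random(1)
base = [sample_base_genetic_value(rng) for _ in range(20000)]
records = [phenotype(g, rng) for g in base]
incidence = records.count(0) / len(records)
```

With a symmetric liability distribution and T = 0, the simulated base incidence should be close to the 50% stated in the text.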

Single gene
For the base generation, two alleles of each animal were sampled, where allele A was sampled with probability $q_0$ and allele a was sampled with probability $(1 - q_0)$. For later generations, individual genotypes were sampled following Mendel's rules. Animal $i$ was diseased ($y_i = 0$) with probability $P(y_i = 0 \mid XX_i)$, the penetrance probability of having a diseased animal given genotype $XX_i$. When the inheritance was additive, the input values $P(y_i = 0 \mid XX_i)$ were 0.0, 0.5 and 1.0 for genotypes $XX_i$ = aa, Aa and AA, respectively. When the inheritance was recessive, these values were 0.0, 0.0 and 1.0 for genotypes aa, Aa and AA, respectively. The phenotypic disease records, $y_i$, which resulted from this sampling, were used as input for the genetic evaluation.
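A short sketch of the single gene sampling described above, using the penetrance values from the text. The string encoding of genotypes ("AA", "Aa", "aa") and the function names are illustrative assumptions, not taken from the original program.

```python
import random

# Penetrance P(y = 0 | genotype), i.e. probability of being diseased (from the text).
PENETRANCE = {
    "additive": {"aa": 0.0, "Aa": 0.5, "AA": 1.0},
    "recessive": {"aa": 0.0, "Aa": 0.0, "AA": 1.0},
}


def sample_base_genotype(q0, rng):
    """Sample two alleles; A (disease allele) with probability q0, else a."""
    alleles = ["A" if rng.random() < q0 else "a" for _ in range(2)]
    return "".join(sorted(alleles))  # uppercase sorts first: 'AA', 'Aa' or 'aa'


def sample_offspring_genotype(sire, dam, rng):
    """Mendelian sampling: one random allele from each parent."""
    alleles = [rng.choice(sire), rng.choice(dam)]
    return "".join(sorted(alleles))


def sample_record(genotype, mode, rng):
    """Binary record: 0 = diseased, 1 = healthy, drawn from the penetrance."""
    return 0 if rng.random() < PENETRANCE[mode][genotype] else 1


rng = random.Random(7)
founders = [sample_base_genotype(0.5, rng) for _ in range(10)]
child = sample_offspring_genotype("AA", "aa", rng)  # always heterozygous
```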

BLUP
Phenotypic values from the threshold and single gene model were input to obtain EBV using a BLUP-breeding value estimation procedure [12]. This ignores the binary nature of the disease traits, but, when the fixed effect structure is as simple as here, where only an overall mean is fitted, linear BLUP-EBV are almost as accurate as generalised linear mixed model EBV, which account for the binary nature of the disease trait [19].
For the threshold model [8], the animals are assumed to be diseased when a normally distributed liability trait is below a certain threshold, $T$, and the animals are assumed healthy when the trait is above $T$. For the estimation of BLUP breeding values, the heritability on the diseased scale, $h^2_{disease}$, is needed and obtained from [8]: $h^2_{disease} = h^2_{liab} \, f(T)^2 / [z(1 - z)]$, where $z$ is the proportion of diseased animals when the threshold value is $T$, $f(\cdot)$ = Normal density function and $h^2_{liab}$ = heritability of the liability trait. Here, $T = 0$, $z = 0.5$ and $h^2_{liab} = 0.5$, yielding $h^2_{disease} = 0.318$.
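As a numerical check of the value 0.318, the sketch below assumes that the transformation from the liability scale to the observed (disease) scale is the standard one, $h^2_{disease} = h^2_{liab} f(T)^2 / [z(1 - z)]$, with $f$ the standard normal density. The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def h2_observed(h2_liab, threshold, z):
    """Heritability on the binary scale from the liability-scale heritability.

    Assumes h2_obs = h2_liab * f(T)^2 / (z * (1 - z)).
    """
    return h2_liab * normal_pdf(threshold) ** 2 / (z * (1.0 - z))


# Values from the text: T = 0, z = 0.5, h2_liab = 0.5.
h2_disease = h2_observed(0.5, 0.0, 0.5)
```

Under this assumption the result rounds to the 0.318 reported in the text.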

DNA genotyping
In this case, the disease was assumed to be due to a single known gene and only males were genotyped. When assigning the recessive genotype a value of 1, and the others a value of 0 (in Falconer and Mackay [6] notation $a = -d = 0.5$), it follows that the frequency of the disease genotype, $q^2$, equals the disease incidence in the population [6]. Breeding values for the single gene were calculated as $EBV(aa) = 2q\alpha$, $EBV(Aa) = (q - p)\alpha$ and $EBV(AA) = -2p\alpha$, where $\alpha$ is the average effect of gene substitution, $\alpha = a + d(q - p)$, and $d$ is the dominance deviation, $d = P(y_i = 0 \mid Aa) - 0.5\,[P(y_i = 0 \mid aa) + P(y_i = 0 \mid AA)]$. These breeding values correspond to (twice the deviation of) disease incidences in progeny of the respective genotypes, and will be used as input for the selection algorithm to reduce disease incidence.
In the case of the threshold genetic model, the genetic effect is affected by many genes. We assume that not all genes are known, such that EBV from DNA genotyping cannot be calculated for the threshold genetic model.
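The single gene breeding values defined above can be computed with a few lines of code. The sketch is a direct transcription of the expressions as they appear in the text. It assumes that $q$ is the frequency of the disease allele A, that $p = 1 - q$, and that $a = 0.5$; the function names are illustrative. Only the additive case is evaluated here.

```python
def dominance_deviation(pen):
    """d = P(0|Aa) - 0.5 * [P(0|aa) + P(0|AA)], as written in the text."""
    return pen["Aa"] - 0.5 * (pen["aa"] + pen["AA"])


def genotype_breeding_values(q, pen, a=0.5):
    """Breeding values 2q*alpha, (q - p)*alpha and -2p*alpha for aa, Aa, AA.

    Assumption: q is the frequency of allele A and p = 1 - q.
    """
    p = 1.0 - q
    d = dominance_deviation(pen)
    alpha = a + d * (q - p)  # average effect of gene substitution
    return {
        "aa": 2.0 * q * alpha,
        "Aa": (q - p) * alpha,
        "AA": -2.0 * p * alpha,
    }


additive = {"aa": 0.0, "Aa": 0.5, "AA": 1.0}  # penetrances from the text
ebv = genotype_breeding_values(0.5, additive)
# ebv == {"aa": 0.5, "Aa": 0.0, "AA": -0.5}
```

With Hardy-Weinberg genotype frequencies ($p^2$, $2pq$, $q^2$ for aa, Aa, AA) these breeding values average to zero, which is a convenient sanity check.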

Segregation analysis
The algorithm by Kerr and Kinghorn [14] was used to calculate genotype probabilities of each animal. It is an algorithm based on iterative peeling [2,13] and it takes account of effects of selection.
Input for the segregation analysis is the probability that the phenotype was diseased given the genotype $XX_i$, i.e. the penetrance probabilities. For an additive trait, the penetrance probabilities, $P(y_i = 0 \mid XX_i)$, of a diseased animal $i$ are 0.0, 0.5 and 1.0 for genotypes aa, Aa and AA, respectively. The probability of a non-diseased animal is $P(y_i = 1 \mid XX_i) = 1 - P(y_i = 0 \mid XX_i)$. For a recessive trait, $P(y_i = 0 \mid XX_i)$ is 0.0, 0.0 and 1.0 for genotypes aa, Aa and AA, respectively, and again $P(y_i = 1 \mid XX_i) = 1 - P(y_i = 0 \mid XX_i)$. From these penetrance probabilities, the algorithm by Kerr and Kinghorn [14] calculates the probability that individual $i$ has genotype XX, $P(XX)_i$. The $P(XX)_i$ are used to calculate the EBV, which are input for the selection algorithms.
For the threshold genetic model, we estimated the penetrance probabilities from the genotype probabilities $P(XX)_i$. Similarly, the initial allele frequencies were estimated as $q_0 = \left[\sum_{base} P(AA)_i + \sum_{base} \tfrac{1}{2} P(Aa)_i\right] / N_{base}$, where $N_{base}$ is the number of base animals. Because these estimates of penetrance probabilities and initial frequencies depend on estimates of genotype probabilities $P(XX)_i$, which themselves depend on initial frequencies and penetrance probabilities, iteration was used to simultaneously estimate all these probabilities.
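The estimate of the initial allele frequency from the genotype probabilities of the base animals amounts to a simple average. The sketch below illustrates only this one step, not the iterative peeling of Kerr and Kinghorn [14]; the dictionary layout and function name are assumptions.

```python
def estimate_q0(base_genotype_probs):
    """q0 = (sum P(AA)_i + sum 1/2 P(Aa)_i) / N_base over the base animals.

    base_genotype_probs: one dict per base animal with keys 'AA', 'Aa', 'aa'.
    """
    n_base = len(base_genotype_probs)
    total = sum(p["AA"] + 0.5 * p["Aa"] for p in base_genotype_probs)
    return total / n_base


# Hypothetical genotype probabilities for four base animals.
probs = [
    {"AA": 1.0, "Aa": 0.0, "aa": 0.0},
    {"AA": 0.0, "Aa": 1.0, "aa": 0.0},
    {"AA": 0.0, "Aa": 0.0, "aa": 1.0},
    {"AA": 0.0, "Aa": 1.0, "aa": 0.0},
]
q0_hat = estimate_q0(probs)
```

In the iterative scheme described above, this estimate would be recomputed each time the genotype probabilities are updated.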

Optimum contribution selection method (OC)
Optimum contribution selection was used as proposed by Meuwissen [18]. This method maximises the genetic level of the next generation of animals, $G_{t+1} = c_t' EBV_t$, where $c_t$ is the vector of genetic contributions of the selection candidates to generation $t + 1$ and $EBV_t$ is the vector of estimated breeding values of the candidates for selection in generation $t$. The term $c_t' EBV_t$ is maximised for $c_t$ under two restrictions: the first is on the rate of inbreeding and the second is on the contribution per sex. Rates of inbreeding are controlled by constraining the average coancestry of the selection candidates to $\bar{C}_{t+1} = c_t' A_t c_t / 2$, where $A_t$ is the ($n \times n$) relationship matrix among the selection candidates, $\bar{C}_{t+1} = 1 - (1 - \Delta F_d)^t$, and $\Delta F_d$ is the desired rate of inbreeding [10]. Note that the level of the restriction, $\bar{C}_{t+1}$, can be calculated for every generation before the breeding scheme starts. Contributions of males (females) are constrained to 1/2, i.e. $Q' c_t = \tfrac{1}{2}$, where $Q$ is an ($n \times 2$) incidence matrix of the sex of the selection candidates (the first column yields ones for males and zeros for females, and the second column yields ones for females and zeros for males) and $\tfrac{1}{2}$ is a ($2 \times 1$) vector of halves. The selection algorithm presented in the Appendix of [18] optimised genetic contributions for each male selection candidate, $c_t$, given that all dams had (a priori) equal contributions, i.e. there was no selection of females. In cases of single genes, at some point all selection candidates can have the desired genotype and a maximisation of genetic response is no longer relevant, in which case the algorithm switched to minimising inbreeding. What happens computationally is that the Lagrangian multiplier, $\lambda_0$, becomes zero when all animals have the same EBV and the equations for the optimal contributions cannot be solved (since they require dividing by $\lambda_0$). If this was the case, the simulation program called the minimisation routine presented in [22], which was modified here to handle discrete generations.
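The two restrictions can be checked for any candidate contribution vector with the sketch below. It does not perform the optimisation of [18]; it only evaluates the constraint level and the quantities that are constrained. The function names and the four-animal example are hypothetical, and the exponent $t$ in the constraint schedule is taken as written in the text.

```python
def coancestry_constraint(delta_f, t):
    """Constraint level 1 - (1 - dF_d)^t, computable before the scheme starts."""
    return 1.0 - (1.0 - delta_f) ** t


def group_coancestry(c, A):
    """Average coancestry c' A c / 2 for contributions c and relationships A."""
    n = len(c)
    return sum(c[i] * A[i][j] * c[j] for i in range(n) for j in range(n)) / 2.0


def sex_contributions(c, is_male):
    """Total contribution of males and of females (each should equal 1/2)."""
    males = sum(ci for ci, m in zip(c, is_male) if m)
    females = sum(ci for ci, m in zip(c, is_male) if not m)
    return males, females


# Hypothetical example: four unrelated, non-inbred candidates (A = identity),
# two males and two females, all contributing equally.
A = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
c = [0.25, 0.25, 0.25, 0.25]
is_male = [True, True, False, False]

coancestry = group_coancestry(c, A)            # 0.125
by_sex = sex_contributions(c, is_male)         # (0.5, 0.5)
schedule = [coancestry_constraint(0.01, t) for t in range(1, 4)]
```

A contribution vector is admissible in a given generation when its group coancestry does not exceed the scheduled level and both sexes contribute one half.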

Mating
Random mating was applied. For each mating pair, a sire was randomly sampled with probabilities following the optimal contributions of the sires and a dam was randomly sampled from the available females. A mating pair always had two progeny, one female and one male.
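The mating step can be sketched as weighted sampling of sires and uniform sampling of dams. Sampling dams with replacement is an assumption, since the text does not state it; the identifiers and function names are also hypothetical.

```python
import random


def sample_matings(sire_ids, sire_contributions, dam_ids, n_pairs, rng):
    """Sample mating pairs: sires by optimal contribution, dams at random.

    Assumption: dams are sampled with replacement.
    """
    sires = rng.choices(sire_ids, weights=sire_contributions, k=n_pairs)
    dams = [rng.choice(dam_ids) for _ in range(n_pairs)]
    return list(zip(sires, dams))


def progeny_of(pair):
    """Each mating pair produces one female and one male offspring."""
    sire, dam = pair
    return [
        {"sire": sire, "dam": dam, "sex": "F"},
        {"sire": sire, "dam": dam, "sex": "M"},
    ]


rng = random.Random(3)
pairs = sample_matings(["s1", "s2", "s3"], [0.3, 0.2, 0.0],
                       ["d1", "d2", "d3", "d4"], 50, rng)
offspring = [animal for pair in pairs for animal in progeny_of(pair)]
```

With 50 pairs this yields 100 selection candidates in the next generation, 50 of each sex; a sire with zero contribution (here "s3") is never drawn.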

Schemes
The general structure was that of a closed scheme with discrete generation structure. Recording of the disease was on both sexes before selection. The results were based on 100 replicated schemes with 60 or 100 selection candidates and on 50 replicated schemes for schemes with 200 selection candidates. Each replicate consisted of 15 generations of selection. Different constraints of ∆F per generation were considered. Firstly, ∆F was constrained to 0.010, which is considered as the maximum acceptable rate of inbreeding for a population to survive [3]. Secondly, for the larger schemes, the use of a more stringent ∆F constraint was simulated, with ∆F = 0.006 and 0.003 for the schemes with 100 or 200 animals per generation, respectively. These more stringent ∆F constraints had the same ratio of N e to N as the small schemes with 60 animals (0.833). We compared the evaluation models for the number of generations they needed to halve the frequency of the disease allele or the fraction of diseased animals for the single gene and threshold models, respectively.
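The ratio of 0.833 follows from the standard relation between the rate of inbreeding and the effective population size, $N_e = 1/(2\Delta F)$. The short check below uses the ∆F constraints and scheme sizes from the text; the variable names are illustrative.

```python
def effective_size(delta_f):
    """Effective population size implied by a rate of inbreeding: 1 / (2 dF)."""
    return 1.0 / (2.0 * delta_f)


# Scheme size (animals per generation) -> dF constraint, from the text.
schemes = {60: 0.010, 100: 0.006, 200: 0.003}
ratios = {n: effective_size(df) / n for n, df in schemes.items()}
```

All three ratios are equal to 5/6, which rounds to the 0.833 given in the text.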

Single gene model
For the genetic model with a single gene, the genetic evaluation was on DNA-genotype (GENO), BLUP EBV (BLUP) or on EBV based on genotype probabilities calculated by segregation analysis (SEGR).
As expected, GENO was the most efficient in reducing the frequency of the disease allele. BLUP and SEGR schemes always gave very similar results. For a scheme with 100 animals per generation and additive inheritance, GENO needed 2.0 generations to halve the frequency of the disease allele, whereas both BLUP and SEGR needed 3.0 generations (Fig. 1). For a gene with recessive inheritance, GENO also needed 2.0 generations to halve the frequency of the disease allele, as expected (Tab. I). However, BLUP and SEGR needed more generations (4.0) than in the case of additive inheritance, because, when inheritance is recessive, it is more difficult to identify and avoid selecting heterozygous animals, which have the same phenotype as non-diseased homozygotes. Both BLUP and SEGR schemes achieved the restriction on ∆F of 0.010 during all generations (Fig. 2). The GENO scheme kept the restriction exactly until generation 3 (Fig. 2) and thereafter ∆F was lower than the maximum indicated by the restriction. This is because most animals have the non-disease genotype after three generations, and the simulation program switched to minimisation of ∆F, while still achieving the maximum selection response (selection of only homozygous non-disease genotypes).
In fact, the minimisation algorithm may already be used when many, but not all sires have the desirable genotype. In the latter situation, the selection algorithm leads to negative contributions for the disease allele carriers. The disease carriers will subsequently be eliminated from the list of selection candidates by the algorithm. In the resulting list of candidates, all animals have the desirable genotype and ∆F is minimised using these animals that are homozygous for the desirable allele (aa). EBV will differ somewhat in the BLUP and SEGR schemes, even if the gene frequency of the non-disease allele is 1.0. Selection among the candidates is then always possible, and the optimum contribution selection algorithm will attempt to maximise EBV of the parents within the restriction on ∆F. Therefore, BLUP and SEGR kept the restriction on ∆F exactly and selected somewhat fewer sires than GENO (Tab. I). For the small schemes with 60 animals per generation, GENO needed 3.0 and BLUP and SEGR 4.0 generations to halve the frequency of the disease allele for the gene with additive inheritance, i.e. smaller numbers of animals reduced the genetic response (Tab. I). For schemes with 200 animals per generation, GENO needed 1.5 and BLUP and SEGR 3.0 generations to halve the frequency of the disease allele. Hence, it takes a longer time to reduce gene frequency in smaller schemes, which is expected, because fewer selection candidates have the non-disease genotype. Since it took more time to reduce the frequency of the disease allele for the smaller scheme with 60 animals per generation, ∆F was kept at the level of restriction for GENO for more generations (six for schemes with a gene that has an additive inheritance) than for the scheme with 100 animals per generation (not shown). Similarly, for the larger scheme with 200 animals per generation, ∆F was kept at the level of the restriction for GENO for only two generations. Thereafter, ∆F was minimised and thus lower than the restriction. For the BLUP and SEGR schemes, ∆F was kept at the restricted level during the whole period.
For the scheme with 200 animals per generation, GENO seemed in general to select more (about 38) sires than BLUP and SEGR (about 21), for the schemes with additive and recessive inheritance (Tab. I), because in later generations, the simulation program was able to minimise ∆F and still achieve a maximum selection response.
In order to investigate whether the higher genetic gain in the larger schemes is entirely due to their larger actual population size relative to their effective population size, we also simulated single gene schemes where the ratio of N e over N was the same as for 60 animals. In those schemes, N e over N was 0.833 and the rate of inbreeding was restricted to 0.006 and 0.003 per generation for schemes with 100 and 200 animals per generation, respectively. At constant N e /N, the three schemes with 60, 100 and 200 animals per generation indeed achieved a very similar selection response, i.e. GENO needed 3.0 generations and BLUP and SEGR 4.0 generations to halve the frequency of the disease allele for a gene with additive inheritance (Tab. I). For a gene with recessive inheritance, when compared at the same ratio of N e to N, GENO needed 3.0 generations and BLUP and SEGR schemes 5.0 generations to halve the frequency of the disease allele for all three sizes of schemes. Thus, the ratio of N e to N seems to determine the selection intensity of the scheme and also the genetic response.
For the scheme with 100 and 200 animals per generation, but the same ratio of N e to N as the scheme with 60 animals per generation, GENO kept ∆F at the restricted level for 8 generations and thereafter ∆F was lower than the maximum indicated by the restriction for both genes with additive and recessive inheritance (not shown). BLUP and SEGR kept the restriction on ∆F during all generations.
The number of selected sires increased with increasing effective population size. Twice as many sires were selected for the scheme with ∆F restricted to 0.003 (about 70) as for the scheme with ∆F restricted to 0.006 (about 35) (Tab. I). The same number of sires (about 21) was selected for schemes with the same ∆F (0.010), but with different actual population sizes.
For all BLUP and SEGR schemes, the accuracy of selection was between 0.67 and 0.76 (Tab. I).

Threshold model
For the threshold genetic model, where the genetic evaluation was either with BLUP or segregation analysis (SEGR), the fraction of diseased animals, which started at 0.50, was monitored. For schemes with 100 animals per generation, it took about 3.5 generations to halve the fraction of diseased animals to 0.25 for both BLUP and SEGR (Tab. II). Hence, even if the true genetic model involves many genes, but it is believed that the disease is determined by a single gene, SEGR selects animals with high disease resistance and reduces the fraction of diseased animals as fast as BLUP.
For both BLUP and SEGR, ∆F was kept at the restricted level of 0.01 (not shown).
The number of selected sires was also about the same (Tab. II) for both BLUP (22.7) and SEGR (21.7). Schemes that had the same ratio of N e to N as the scheme with 60 animals per generation were also investigated for the threshold model. For schemes with 60, 100 and 200 animals per generation, it took 5.0 generations for both BLUP and SEGR to halve the fraction of diseased animals (Tab. II). Hence, also for the threshold model scheme, the ratio of N e to N seemed to determine the selection intensity and thus also genetic response.
For the threshold genetic model schemes, about the same numbers of sires were selected as for the single gene model.

General
For the single gene model, direct selection against the disease allele (GENO) was, as expected, the most efficient evaluation method to reduce the frequency of a disease allele of a known single gene. SEGR and BLUP needed about 1.0 to 2.0 more generations to halve the frequency of the disease allele. This period was shorter for a gene with an additive inheritance and longer for a gene with a recessive inheritance. In the situation where the gene is unknown, an approach could be to first identify the gene and then apply GENO. A gene needs to be identified approximately within 1.0 (additive gene) to 2.0 (recessive gene) generations for this approach (identifying the gene and using GENO) to be as effective as SEGR and BLUP to halve the frequency of the disease allele.
For a threshold genetic model, SEGR and BLUP were equally efficient in reducing the fraction of diseased animals. Hence, when genes are unknown, either BLUP or SEGR can be used, although BLUP is a more natural choice, because it assumes many genes, which reflects the true genetic model here.

Genetic evaluation methods
When selecting against a disease allele, there is no steady state reached after some generations, as is the case when selection is for normally distributed traits. Instead, the number of non-diseased candidates changes with the frequency of the allele. This has implications for the achieved ∆F. Here we used the optimum contribution selection algorithm for discrete generations [18] to restrict the rate of inbreeding. BLUP and SEGR achieved the restriction for all schemes, because EBV generally differ somewhat such that selection can be made easily. Another selection method, e.g. truncation selection on EBV, would yield different selection intensities, but differences in selection accuracy between BLUP-EBV and SEGR-EBV are probably similar to those in the current study with optimum contribution selection. Hence, the differences in selection response between BLUP-EBV and SEGR-EBV are probably similar for truncation and optimum contribution selection.
In contrast with previous studies on optimum contribution selection, e.g. [10,11,18], we assumed no total control of the selection scheme. Males were sampled, as in previous studies, according to their optimum contribution, but all females were selected. Also in these schemes, optimum contribution selection resulted in an effective selection scheme, and the restrictions on inbreeding were kept.
The method of Kerr and Kinghorn [14] was used to calculate genotype probabilities for individuals in the SEGR scheme. This iterative method handles the many loops in the pedigree. For the single gene model, SEGR has information on both the mode of inheritance and penetrance probability. BLUP only has information on the level of heritability. However, SEGR and BLUP give very similar genetic response and accuracy of selection (0.743 for the scheme with additive inheritance and 100 animals per generation). In the case of disease genes with recessive inheritance, SEGR was expected to be more accurate, because it accounts for the dominance effects. There was hardly an effect on the selection response, probably because selection was for EBV, i.e. the additive part of the SEGR model. For the threshold model, there was iteration on the genotypes and penetrance probabilities of the genotypes, which was used to estimate EBV of the animals, although there was no single gene and thus no need to estimate the genotypes or penetrance probabilities. Their estimated effects are thus an artefact of the segregation analysis model. However, the resulting EBV yield very similar selection response as those of the BLUP model. SEGR only showed a small reduction in genetic response for the threshold model when the frequency of diseased animals was very low (not shown). Possibly, this reduction in the long-term genetic response for SEGR was because the SEGR model expected genetic variance to be reduced faster than it occurred in the threshold model. Generally, BLUP-EBV yield good approximations to SEGR-EBV in the single gene model, and SEGR-EBV yield good approximations to BLUP-EBV in the polygenic threshold model.
The BLUP-EBV evaluation did not account for the binary nature of the disease trait. For ease of computation, we approximated these non-linear EBV of binary traits by linear BLUP-EBV, which yield very similar selection response when the fixed effect structures are rather simple as in the case considered here [19].

Relaxation of assumptions
All dams were selected and only selection of sires was optimised. For a scheme, where also the selection of dams was optimised, we would expect a faster decrease of the frequency of the disease allele, such that fewer animals, which are heterozygous carriers, would be selected at the same ∆F. Most practical schemes have hierarchical mating systems with more dams than sires, implying that only males would be genotyped, because it is too expensive to also genotype all females.
We used an initial frequency of the disease allele of 0.5. In practice, the initial frequency of the disease allele will often be lower. We also simulated a scheme with q 0 = 0.2. Then, the evolution of the allele frequency under selection was approximated by the part of the curve of Figure 1, where the allele frequency is smaller than q 0 (not shown). Similar results were found for the threshold model. This suggests that our approach of studying time to 50% reduction is also applicable to other starting frequencies.
The schemes simulated here had a discrete generation structure. An overlapping generation structure would mainly have an effect for the SEGR and BLUP schemes, which would increase the accuracy of selection due to the increased number of offspring for some individuals.
When the disease trait is determined by an infectious agent, the probability of getting the disease depends not only on genotypes, but also on epidemiological parameters, that is, how infectious the disease is, how many other animals are diseased, etc. Different models have to be used for such traits.
Biallelic single genes were considered here. The results would probably be similar for a tri-allelic gene, but it would take more generations to make the population homozygous for the best of the three alleles.
In the current study, selection was entirely against the disease with either a single gene or threshold model. In general, the results can also be applied to major genes in mixed inheritance models, where a major gene has a large effect on the trait, and many background genes have small effects [4]. Villanueva et al. [23] simulated schemes with a mixed model inheritance for BLUP optimum contribution selection (similar to our BLUP scheme). They found that the frequency of the favourable allele had reached 0.95 after eight generations for a gene with recessive inheritance. This finding is similar to the results in Figure 1, although the schemes differed considerably in the genotypic value (2.0) and dominance deviation (−0.5), the initial frequency of the non-disease allele (0.15), and the size of the scheme (120 animals per generation). In the case where a major (disease) gene and other (economic) polygenic traits are included in the breeding goal, the question arises how much weight should be given to the major gene relative to that of the polygenic traits in the short and long term [4,9,16,21,23]. This question is beyond the scope of this paper and is currently under active investigation [24,25].
The disease gene that was selected against was not lethal, because also individuals with the homozygous disease alleles could be selected and they would also survive. A reduction of fitness due to natural selection was not accounted for here, but should be accounted for in cases where the artificial selection against a disease allele is weak relative to the natural selection.
We assumed random mating among the selected animals. When the gene is unknown, non-random mating can also be used to test whether an animal is a heterozygous carrier of the disease allele, by mating it to a known homozygous (non-carrier) in a progeny-testing scheme. This is, however, an expensive and time-consuming strategy to reduce the frequency of the disease allele.

CONCLUSIONS
1. Selecting against a known genetic defect (GENO) is more efficient than selection against an unknown genetic defect, although the difference in efficiency was rather small (GENO reduced the frequency of the disease allele only 1–2 generations faster than BLUP and SEGR). Thus, selection against a genetic defect that is not identified at the DNA level is only slightly less efficient.
2. Knowledge about the mode of inheritance of the disease was rather unimportant, because assuming the wrong mode of inheritance hardly reduced the efficiency of the selection scheme.
3. Optimum contribution selection was able to maintain ∆F at the predefined level even when it controlled only the selection of the sires and females were randomly selected. Such control of ∆F is very important in small selection and conservation schemes.