Modeling the mutation and reversal of engineered underdominance gene drives

Highlights • A model with mutation in engineered underdominance gene drive components is proposed.• Loss-of-function mutations are likely to cause elimination of the gene drive.• The gene drive is likely to persist at high frequency for hundreds of generations.• Engineered underdominance can be reversed by release of free-suppressor constructs.• Free-suppressors can reverse engineered underdominance with extremely small releases.


Introduction
Gene drive systems have gained much attention in recent years for their predicted ability to increase the frequency of desirable genetic material within a population. These systems have been proposed to have a number of important applications ( Gould, 2008 ). Our interest in these systems relates to their potential use in preventing the spread of mosquito-borne viruses such as dengue ( Alphey, 2014 ). In this context, refractory genes have been developed that are capable of significantly reducing the ability of Aedes aegypti mosquitoes to transmit the virus ( Franz et al., 2006 ). However, the incorporation of these genes into the genome resulted in individuals of lower fitness than their wild-type counterparts ( Franz et al., 2014 ). It is thus necessary to develop gene drives that are capable of increasing the frequency of such desirable genes in spite of their expected fitness costs.
Engineered underdominance (UD) is one class of gene drive that has been proposed for driving desirable genetic traits into a pop-ulation ( Davis et al., 2001 ). This technique is based on the introduction of two transgenic constructs ( A and B ) at unlinked genetic loci (see Fig. 1 ). These constructs consist of three component genes, namely a lethal effector, a suppressor for the lethal at the other locus and a "cargo" gene conferring a desirable phenotype. Due to the cross suppressing transgenic constructs, individuals carrying one or more copies of either construct will be non-viable if they do not also possess at least one copy of the other construct. Individuals carrying one or more copies of both constructs are viable since each of the lethal effectors are inactivated by suppressors carried at the other locus. These effects combine to create a selection pressure for individuals to either carry both transgenic constructs or neither.
Theoretical work on UD has demonstrated that these systems, when introduced above a threshold frequency, are capable of spreading desirable genes thus replacing wild populations with those carrying UD transgenic constructs ( Davis et al., 2001;Magori and Gould, 2006;Huang et al., 2009;Edgington and Alphey, 2017;Dhole et al., 2018 ). Whilst the existence of this threshold frequency means that UD systems may be relatively expensive to deploy in the field relative to other classes of gene drive, it A schematic diagram of the engineered underdominance gene drive system. Each transgenic construct possesses three genes; a lethal, a suppressor for the lethal on the other construct and a desirable genetic cargo. Genotypes possessing one or more copies of both constructs (or none of either) are viable since lethals will be deactivated by the suppressor(s) on the other construct. Those genotypes carrying one or more copies of either construct but none of the other are non-viable since they have non-suppressed lethal genes.
should restrict the invasion of transgenes into neighboring populations ( Marshall, 2009;Marshall and Hay, 2012 ).
Previous modeling work has mostly assumed a population genetics framework neglecting the possibility of mutations forming within the introduced transgenes ( Davis et al., 2001;Magori and Gould, 2006;Marshall and Hay, 2012;Marshall, 2009;Edgington and Alphey, 2017;Huang et al., 2011;2009 ). Under this assumption it has been predicted that UD should persist indefinitely when introduced above the threshold frequency.
This threshold-dependent nature of UD systems has lead many to suggest that they may be reversed via the introduction of wildtype individuals ( Champer et al., 2016;Marshall and Akbari, 2016 ). This should lower the transgene frequency to sub-threshold levels and result in the elimination of introduced transgenes. To our knowledge this is the only reversal strategy proposed for UD gene drives and has yet to be investigated in detail.
Here we extend upon results in the previous literature by formulating a population genetics model of two-locus UD (as proposed in Davis et al., 2001 ) that incorporates the effects of loss-offunction mutations in the introduced transgenes. In particular, we investigate the predicted dynamics of UD systems in the presence of a constant rate of mutation for each introduced transgene. We then go on to propose that the release of individuals carrying "free suppressors" could function as a genetics-based reversal strategy and compare this to the introduction of wild-types.

Mathematical modeling
We consider here a discrete generation deterministic population genetics model of two-locus UD gene drive in a panmictic (randomly mating), isolated (closed) population of infinite size. Mutation is allowed to occur in each transgenic construct at a constant rate ( m per gene) and we assume that m is low enough for multiple mutations in a single generation to be neglected ( Fig. 2   Transgenic constructs are assumed to mutate at a rate m per gene. This is assumed low enough that multiple mutations per generation may be neglected. For example the initial transgenic construct (say, A ) mutates at a rate of 3 m , producing mutations in the lethal (giving A L ), suppressor ( A S ) and the cargo ( A C ) gene each at a rate of m . Then, transgenic constructs possessing one mutated gene (e.g. A L ) mutate at a rate of 2 m (giving A LS and A LC each at rate m ). Transgenic constructs with two mutated genes (e.g. A LS ) then mutate at rate m producing constructs with all three genes mutated (i.e. A LSC ). Here non-mutated genes are represented by black squares whereas genes with loss-of-function mutations are shown in red circles. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)  The fitness of each genotype is expressed relative to wild-type and represents a reduction in survival due to the carrying of transgenic constructs. Many genotypes also suffer a lethal effect from non-suppressed lethal genes. These factors are combined to give the relative fitness of each genotype: where ε denotes the relative fitness per construct conferred by non-mutated ( A, B ) or mutated ( A M , B M where M = L, S, C, LS, LC, SC, LSC) transgenic constructs. We assume relative fitnesses act multiplicatively such that the overall relative fitness of an individual resulting from all transgenic constructs carried is restricted to the biologically feasible range 0 ≤ ≤ 1. We also assume that all mutated transgenic constructs confer the same relative fitness regardless of the type or number of mutations (although resulting genotypes may confer a separate lethal effect ( γ i )). Exponents β, φ, μ and ψ denote the number (0, 1 or 2) of each transgenic construct type carried by a given genotype while γ i represents the lethality of that genotype ( γ i = 1 if non-viable or γ i = 0 if viable). Finally, we assume all lethal effectors are 100% effective when unsuppressed and that any number (i.e. one or two) of lethal effector copies are fully suppressed by any number (one or more) of the relevant suppressor (i.e. the strong suppression case in Edgington and Alphey, 2017 ). For numerical simulations we used a set of MATLAB (MATLAB R2014b, The MathWorks Inc., Natick, MA) scripts run in parallel using the MATLAB Parallel Computing Toolbox through a MATLAB Distributed Computing Server. These allow simulation of the system without a manual formulation of 2025 difference equations Diagram showing the simulation procedure for an engineered underdominance system with a constant rate of mutation ( m ) per gene. Parameter symbols used here are: ε, the relative fitness conferred by a given allele (denoted in the subscripts where M represents a mutated allele); β and φ, the numbers of non-mutated copies of construct A and B, respectively; μ and ψ, the number of mutated copies of transgenic constructs A and B, respectively; γ i , the lethality conferred by a given genotype ( i ); i , the overall relative fitness of individuals of genotype i ; and ¯ , the average relative fitness of the entire population.
(see Fig. 3 ). Briefly, the process begins by converting each possible genotype into a numerical form and computing relative fitnesses for each (using Eq. (1) ). Initial conditions are then calculated according to: where G t i is the i -th genotype frequency t generations after the initial release; α is the release ratio (introduced/wild) of non-mutated transgene double homozygotes ( AABB ) into a wild-type ( aabb ) population; and all other genotypes have an initial frequency of zero (i.e. G t=0 i = 0 for i = WT, AABB ). We assume the absence of mutations in released individuals is feasible due to quality controls at the rearing facility. Genotype frequencies in subsequent generations are computed iteratively. Expected (Mendelian) frequencies are first calculated in absence of any mutation using a matrix of outcomes from every genotype mating pair and then multiplied by a matrix of mutation rates (giving proportional frequencies G e i ). As such, it is assumed that mutation occurs during gamete production, meaning fitness effects are applied based on an individuals genotype after mutations occur. Finally, proportional frequencies ( G e i ) are normalized using the average fitness of the overall population ( ¯ = i G e i ) such that genotype frequencies ( G t i ) sum to one.
Simpler haplotype-based population genetics models of UD in absence of mutation are also considered here to allow stability analyses of various equilibrium states since they are more analytically tractable. Details of these models and their respective stability analyses are given in the Supplementary Information.

Example numerical simulations with various rates of mutation per gene
To our knowledge, experimentally measured rates of mutation in key organisms considered as likely targets for UD gene drives are not well known. Thus, here we conduct numerical simulations for rates of mutation (per gene) spanning five orders of magnitude, namely m = 10 −4 , 10 −5 , 10 −6 , 10 −7 and 10 −8 . Given previous estimates of the mutation rate in Drosophila melanogaster (2.8 ×10 Note the difference in time-scales between these panels. nucleotide per generation), the size of gene drive components used in previous studies (1-10 kb e.g. Reeves et al., 2014;Windbichler et al., 2011;Champer et al., 2017b ) and an approximation that 1-10% of nucleotides in these sequences are essential (i.e. causes a loss of gene function when mutated) we anticipate that this range should include rates relevant to a range of proposed target species. Fig. 4 shows some sample results considering a 1:1 (introduced:wild) release of double homozygote (i.e. AABB ) individuals.
In these examples, we assume each non-mutated transgenic construct confers a fitness load of either 5% or 10% (i.e. ε A = ε B = 0.95 or 0.90) and mutated constructs 4% or 8% (i.e. ε A M = ε B M = 0.96 or 0.92), respectively. Thus, mutated transgenic constructs have a small fitness advantage over non-mutated ones, yet still a deficit relative to wild-type. In examples conducted with mutated transgenic alleles conferring a greater fitness cost than non-mutated alleles, the UD system progressed without significant accumulation of mutated alleles, thus we do not consider these cases any further.
In Fig. 4 the non-mutated UD system initially reaches a high frequency, as previously modeled ( Davis et al., 2001;Magori and Gould, 2006;Marshall and Hay, 2012;Marshall, 2009;Edgington and Alphey, 2017;Huang et al., 2011;2009 ). However, each type of mutated construct begins to accumulate in the population, with large frequencies reached by constructs carrying a single mutation in either the lethal or cargo gene. This results in a concurrent decrease in the frequency of non-mutated constructs. Once these mutated constructs have reached a high frequency in the population, they begin to be replaced by the remaining wild-type alleles due to the relative fitness advantage of wild-type alleles over mutated transgenic alleles. This eventually returns the population to a fully wild-type state. Note that we did not observe the stable coexistence of non-mutated and mutated transgenes under any parameter combination considered within this study.
For the full range of mutation rates considered here we observe similar dynamics to those in Fig. 4 except that increasing (decreasing) mutation rates lead to faster (slower) accumulation and higher (lower) maximum frequencies of most types of mutated transgene allele (see Fig. 5 (a)). This inevitably means that higher rates of mutation reduce the period over which the UD system persists (see Fig. 5 (b)) and also the period for which functional cargo genes are present at high frequency (see Fig. 5 (c)). This should form a reasonable proxy for the efficacious period of a released UD system. Importantly, while mutation may eliminate the UD system, it is still likely to be maintained at high frequency for hundreds of generations -likely long enough for introduced cargo genes to have produced their desired effect.

Reversal strategies
To our knowledge, the only reversal strategy previously proposed for UD systems is the release of wild-type individuals in sufficient numbers to push the transgene frequency to sub-threshold levels ( Champer et al., 2016;Marshall and Akbari, 2016 ). When successful, wild-type alleles should recover and eliminate transgenes from the population. Whilst this mechanism has been mentioned a number of times previously it has yet to be explored in any detail.
Based on results presented in Fig. 4 here we propose an alternative, genetics-based, reversal strategy for UD systems. In this strategy transgenic individuals carrying only the suppressor genes of the original UD system would be released. These "free suppressor" carrying individuals should, under certain conditions, be able to increase in frequency at the expense of the initial UD system. Once at high frequency, assuming they suffer even a small fitness load, these free-suppressor carriers should begin to decline in frequency as they are replaced with fitter wild-type individuals.
To provide theoretical support for this reversal mechanism we conduct mathematical stability analyses of equilibria resulting from deterministic haplotype-based population genetics models of UD systems both in absence and in the presence of freesuppressor constructs since this is more analytically tractable than the genotype-based model used elsewhere in this study (see Supplementary Information for details). The models used in these stability analyses are formulated in absence of mutation in other transgenic components. This ensures that the proposed reversal strategy is capable of reversing the initial UD system on its own (i.e. without requiring additional mutation in other transgenic components). Note also that the haplotype-based and genotypebased models considered in this study produce equivalent allele frequencies for all parameter configurations tested and thus results obtained here should apply to both model formulations. In the models both with and without free-suppressor constructs we find that the transgene free (i.e. fully wild-type) equilibrium state is stable for all relative fitness parameters considered (i.e. 0 ≤ ε ≤ 1 and where applicable 0 ≤ ε S ≤ 1). In the model without free suppressors, the transgene introgression equilibrium is stable if ε * ≤ ε ≤ 1, where ε * is the relative fitness threshold for a given transgene release ratio. However, under certain conditions, when free-suppressor constructs are introduced into the model this transgene introgression equilibrium is found to be unstable which is indicative of the ability of free-suppressor constructs to reverse an initially introduced UD system. The introduction of freesuppressor constructs with ε S ≥ ε at any release ratio appears to result in a return to a fully wild-type population. Perhaps more surprisingly, even where free suppressors confer a slight relative fitness disadvantage compared to the initial transgenic constructs, they may still lead to a fully wild-type population so long as the release ratio is large enough. Some examples of this are given in Figures S3-S6. Interestingly, in spite of the large difference in release ratio of free-suppressor constructs and the differences in the maximum frequencies achieved (see Figures S4-S6), there does not appear to be a major difference in the time taken for a given ε and ε S combination to return a fully wild-type population (See Figure   S7).
We conduct numerical simulations in order to compare the different reversal strategies discussed here. Similar to the examples above, we consider the relative fitness conferred by the initial UD system to be ε A = 0 . 95 = ε B or ε A = 0 . 90 = ε B and the free suppressor elements to confer a relative fitness of ε Rev = 0 . 96 or ε Rev = 0 . 92 , giving them a small fitness advantage over the original UD constructs and a deficit relative to wild-type. Results are shown in Fig. 6 for a 2:1 (introduced:wild (all genotypes)) release of wild-type individuals and individuals carrying free suppressors at both loci. These releases are assumed to occur 100 generations after an initial 1:1 (introduced:wild) UD release so that the original system is at high frequency and mutations would not have accumulated to significant frequencies. As such we initially assume that no mutation occurs (i.e. m = 0 ) to ensure that free suppressors work in absence of other types of allele. Fig. 6 (b) demonstrates the feasibility of reversing of UD systems via the introduction of free suppressors in the absence of mutated alleles. We then relax this assumption to consider the behavior of a reversal drive in the presence of mutation (here we assume A Rev = A LC and B Rev = B LC ). These numerical simulations demonstrate that this reversal strategy can function even with small releases (i.e. α = 0 . 01 and α = 0 . 1 ; see Fig. 7 ). However, one would need to be wary of stochastic effects when making such small releases. These results imply that the release of free suppressor carrying individuals could represent a very cost effective reversal strategy for UD systems.
It is clear that release of wild-type individuals provides a much faster option for returning a fully wild-type population ( Fig. 6 ). However, releases of wild-type individuals must be significantly larger (for example ∼ 1.75 times the wild population when ε A = 0 . 95 = ε B ) in order to achieve reversal of the initial UD system. This likely makes the direct cost of releasing wild-type individuals far higher than for the use of free-suppressor carrying individuals.

Discussion
Since gene drive systems were first proposed they have gained much attention for their potential to help fight a number of important global issues (e.g. Alphey (2014) ; Gould (2008) ; Prowse et al. (2017) ; Thresher et al. (2014) ). However, various genetic control measures have been shown to be limited by the potential generation of mutation or resistance ( Alphey et al., 2011;Watkinson-Powell and Alphey, 2017;Hammond et al., 2017;Unckless et al., 2017;Champer et al., 2017b;2017a ) the dissociation of gene drive components (e.g. Marshall (2008) ) and loss-of function mutation similar to the type studied here ( Beaghton et al., 2017 ). A number of studies have also begun to investigate mechanisms capable of reversing gene drives in case they produce unexpected consequences ( Vella et al., 2017;DiCarlo et al., 2017;Wu et al., 2016 ). Here we showed that mutations will likely lead to the break down of two-locus UD systems and the eventual elimination of introduced transgenes (a feature similar to that seen previously for homing-based gene drives Beaghton et al., 2017 ) although under the assumptions used, the UD system is likely to persist at high allele frequency for hundreds of generations. We then went on to demonstrate that reversal of UD systems is feasible via the release of either wild-type individuals or individuals carrying free suppressors.
As with all mathematical models, the work presented here relies on a number of simplifying assumptions, the majority of which are common in this type of study ( Edgington and Alphey, 2017;Davis et al., 2001;Magori and Gould, 2006;Marshall and Hay, 2012 ). Since these assumptions have been discussed previously, we do not consider them any further here. There are however a few areas specific to this study where more detailed modeling would be useful to further elucidate the effects of mutation in UD systems. For example, here mutations are assumed to fully eliminate gene function whereas in reality they may produce only a partial loss of function. We also assumed that all mutated transgenic constructs confer the same fitness cost regardless of the number and type of mutations carried. These assumptions could be relaxed in future models by expressing gene efficacy as a function of mutations carried. Another assumption is that the consideration of a population with discrete generations is realistic. For some target species, such as mosquitoes, this may be a reasonable approximation where populations are synchronized either by climate (e.g. wet and dry seasons) or laboratory conditions. However, it is unlikely that this assumption will hold in other areas where populations are thought to reproduce continuously -although we would not anticipate vastly different conclusions emerging from overlapping generation and/or population dynamics models. Finally, it is possible that both natural genetic polymorphism and mutation outside of the introduced transgenic constructs within a target population could mean that certain individuals would be, at least partially, resistant to the effects of introduced genes. In such cases, the UD system may place resistance alleles under a selective pressure causing them to increase in frequency and reduce the efficacy of UD systems. In the worst case scenario, such polymorphisms/mutations would be fully effective and dominant suppressors of one or both lethal transgenic components. We would anticipate that this would produce similar dynamics to the introduction of a free-suppressor construct albeit with slight differences in inheritance patterns since such polymorphisms/mutations would not be allelic to either of the UD constructs. Screening and sampling of the target population would likely ensure that any preexisting troublesome polymorphisms were not present at high frequency within the population.
Results presented here assume that individuals carrying mutated transgenes are fitter than those carrying non-mutated transgenic constructs. This is a common assumption when modeling the formation of resistance/mutations in gene drive systems (e.g. Unckless et al., 2017;Alphey et al., 2011 ). The fitness advantage of individuals carrying specific mutations in each transgene (or combination of transgenes) is likely to determine the exact dynamics and time scales that would be observed in experimental work. Results presented here are intended to give a guide as to the type of behavior we would expect to emerge rather than giving definitive predictions for specific applications.
Previous literature has estimated the mutation rate in Drosophila melanogaster to be ∼ 5 . 6 × 10 −9 per nucleotide per generation (mean of estimates in Keightley et al. (2014) and Haag-Liautard et al. (2007) ). Given the size of gene drive components used in previous experimental studies (1-10 kb e.g. Reeves et al., 2014;Windbichler et al., 2011;Champer et al., 2017b ) and an approximation that 1-10% of nucleotides in these sequences are essential (i.e. causes a loss of gene function when mutated), we believe the mutation rates explored here to be feasible for a range of proposed target organisms. Using these mutation rates, we assumed only one mutation was able to emerge in a single generation. In practice, deletions could remove two adjacent components simultaneously. However, assuming the ordering of components shown in Fig. 1 such deletions would leave individuals without suppressor elements, thus making them non-viable in the presence of the construct at the other locus. Coupling this with the extremely low probability of multiple mutations occurring simultaneously, we do not anticipate that relaxing this assumption would lead to major differences from the results presented here. Within this study we have also neglected the possibility of recombination leading to loss of transgenic components. However, since the components of each transgenic construct are located extremely close together, We would expect this to occur at an extremely low rate. As such, we would not expect any major differences to emerge if recombination were incorporated into the model.
A number of recent studies have discussed the need to explore reversal strategies for gene drives in case they have unexpected consequences. While the majority of work has focused on the reversal of CRISPR-Cas9 gene drive systems (e.g. Vella et al., 2017;DiCarlo et al., 2017;Oye et al., 2014;Wu et al., 2016 ), it has been suggested that the release of wild-type individuals could reverse UD systems ( Champer et al., 2016;Marshall and Akbari, 2016 ). The release of wild-type individuals represents a threshold dependent reversal strategy with the precise number of wild-types required to reverse a released UD system depending on the fitness of UD carrying individuals and the size of the wild population, both of which are difficult to measure. What is known however, is that the size of these wild-type releases would need to be very large. For example, when transgenic constructs each confer a 5% fitness cost (i.e. ε A = 0 . 95 = ε B ), the wild-type release would need to be ∼ 1.75 times the size of the entire wild population. Importantly, insufficiently sized releases of wild-type individuals would fail to reverse a given UD system. While it would be feasible to make further wild-type releases, this would have obvious cost implications. To help overcome these issues, here we proposed a genetics-based alternative (release of free suppressor carriers) that appears to be threshold independent (i.e. it is predicted to be effective from very small releases, leaving aside stochastic effects) if free-suppressor constructs confer an equal or smaller fitness load than the originally introduced transgenic constructs. This approach may also be effective where free-suppressor constructs confer a slightly larger fitness load than the original transgenic constructs, however much larger releases may be required for this to work. One of the main issues associated with employing free-suppressor carrying individuals as a reversal strategy is that it will take longer to return a fully wild-type population than release of wild-types. It may also be difficult to convince the public that releasing further transgenic individuals is an acceptable method of eliminating a gene drive that produced unexpected consequences. Here we do not wish to draw a firm conclusion on which reversal strategy should be pursued, since this may depend on case-specific social, economic and technical factors that are beyond the scope of this study.
In spite of demonstrating that UD gene drives will likely break down over time with the emergence of mutations, this study provides reason to be optimistic about the prospect of using UD gene drives to spread desirable genes through a target population. In particular, even though introduced transgenes are likely to be eliminated, the desirable genetic cargo is expected to reach and maintain a high frequency in the population for hundreds of generations. This is likely long enough for them to have produced their desired effect. Here we have given an initial theoretical examination of this UD break down due to mutation. We anticipate that future modeling studies will be able to produce more application specific models, thus refining the predictions presented here.