The evolution of hybrid fitness during speciation

The evolution of postzygotic reproductive isolation is an important component of speciation. But before isolation is complete there is sometimes a phase of heterosis in which hybrid fitness exceeds that of the two parental species. The genetics and evolution of heterosis and postzygotic isolation have typically been studied in isolation, precluding the development of a unified theory of speciation. Here, we develop a model that incorporates both positive and negative gene interactions, and accounts for the evolution of both heterosis and postzygotic isolation. We parameterize the model with recent data on the fitness effects of 10,000 mutations in yeast, singly and in pairwise epistatic combinations. The model makes novel predictions about the types of interactions that contribute to declining hybrid fitness. We reproduce patterns familiar from earlier models of speciation (e.g. Haldane’s Rule and Darwin’s Corollary) and identify new mechanisms that may underlie these patterns. Our approach provides a general framework for integrating experimental data from gene interaction networks into speciation theory and makes new predictions about the genetic mechanisms of speciation.


Author summary
As populations diverge and ultimately evolve into distinct species, hybrids between them gradually become inviable and infertile. Before this happens, there can be a brief period in which hybrids are actually more fit than their parents, a condition called heterosis or hybrid vigor. This paper describes a model of how hybrid fitness changes as two populations diverge that can explain both hybrid vigor and hybrid inviability in a single framework. These results show that interactions between alleles within a population can often be more important to hybrid fitness than new interactions first seen in hybrids. Simulations of the model, using state-of-the-art experimental data on gene interactions, show evolutionary trajectories that mirror patterns seen in nature. The model suggests how the process by which populations diverge affects the rate of speciation.

Introduction
Isolated populations accrue genetic differences. These differences can reduce the fitness of hybrids, constituting a major step the origin of new species [1,2]. However, important exceptions exist: there are many examples of recently diverged populations whose hybrids are actually more fit than either parent [3][4][5][6][7][8]. Existing theories of speciation genetics do not provide a unified explanation of the seemingly contradictory observations of heterosis and hybrid incompatibility.
The decline in hybrid fitness is generally attributed to Dobzhansky-Muller incompatibilities, or "DMIs" [9][10][11][12]. These are negative epistatic interactions that occur between alleles that are brought together for the first time in hybrids. DMIs have been a major focus of empirical and theoretical studies of speciation genetics [13][14][15]. Mathematical models of DMIs [16,17] have generated testable predictions about the evolution of hybrid inviability [16,18], the asymmetric inviability of reciprocal crosses [19], and sex differences in hybrid fitness [20] (reviewed in [21]). Although these predictions have generally been supported by empirical studies [2], the majority of these models share four limitations. First, they consider only deleterious gene interactions, and so are not able to explain heterosis. Second, many previous speciation models do not account for the mechanism by which mutations become fixed as two species diverge. Substitutions are typically assumed to occur randomly through time, with no explicit consideration of the roles played by selection and drift. How these mutations become fixed could have important consequences for how they affect hybrid fitness. Third, although models of DMI evolution consider the effects of new interactions among genes brought together for the first time in hybrids, they neglect the effects of disrupting beneficial gene interactions that evolved within the parental species. Last, previous models assume interactions between all pairs of genes in hybrids are equally likely, while real gene interaction networks are highly structured [22]. A new wave of models has begun to challenge some of these assumptions individually [23][24][25][26][27]. Here, we develop a more general model that relaxes all four assumptions.
Some recent models have set aside the DMI framework in favor of explaining hybrid fitness using extensions of Fisher's Geometric Model [27][28][29][30]. In such models, mutations have direct effects on phenotypes, and fitness depends on distance to one (or a few) optimal combinations of traits. Appeals of this approach are that hybrid fitness is easily predictable given assumptions about parental fitnesses and it is able to fit empirical data surprisingly well [28]. This framework can explain heterosis as the result of uniparentally inherited elements [27] or low fitness in the parents [28]. Nonetheless, the approach has limitations. Predictions about heterosis are only consistent with speciation under limited assumptions about dominance [28]. More importantly, these models make highly restrictive assumptions about the relationship of genotype to phenotype and the shape of the fitness landscape.
Here we develop a framework to model how hybrid fitness evolves as the result of individual and epistatic fitness effects of mutations. Unlike previous DMI models, we allow for positive as well as negative epistatic fitness effects, as shown schematically in Fig 1. We also explicitly model the substitution process. These advances lead to several new conceptual insights. Depending on how mutations become fixed and their spectrum of fitness interactions, hybrid fitness can be higher, lower, or equal to that of the parental species. We find that hybrid fitness is contingent on two classes of epistatic interactions. The first, which has been the focus of previous DMI models, are the new interactions created by hybridization (upper half of Fig 1). The second, which has generally been ignored, are gene interactions present in the two parental species that are disrupted in the hybrids (bottom half of Fig 1).
Under simplified conditions, this framework leads to several analytic results that can explain both heterosis and hybrid incompatibility. We extend the framework with a stochastic model that allows for variation in the effects of mutations and accounts for the probability that a mutation will become fixed in a parental species given its fitness effects. We then parameterize the model with data from recent studies in yeast systems biology. We use the fitness measurements of 10,277 unique single mutants and 20,712,321 double mutants measured in refs. [22,31,32] to calculate fixation probabilities and hybrid fitness. The result is the first model Gene interactions determine parental and hybrid fitness. Parents P 1 and P 2 cross to produce a hybrid F 1 offspring. Each individual carries two loci (circle and square). The ancestral allele is white and derived mutations that have fixed in the two parental species are shaded. If one mutation is fixed in each species (top panels), negative epistasis causes decreased hybrid fitness (top left), while positive epistasis causes heterosis (top right). If two mutations are fixed in one species and none in the other (bottom panels), both negative epistasis and positive epistasis cause the hybrid to have intermediate fitness. Depending on the dominance of epistasis, hybrid fitness may be higher or lower than the average of the parents. for the evolution of postzygotic isolation that is grounded in empirical estimates of fitness effects.
We find that heterosis sometimes occurs and sometimes not. When it does occur, it is always transient and happens early in the speciation process. Heterosis may appear when epistatic interactions within a parental species are deleterious, and their disruption causes F 1 hybrids to have higher fitness than its parents (Fig 1, bottom left). This case is possible when drift is strong, and mutations with deleterious interactions are fixed in the same population. On the other hand, if epistatic interactions within a parent are beneficial, then F 1 hybrids can have lower fitness, even in the absence of deleterious DMIs (Fig 1, bottom right). Simulation results show that that the decline of hybrid fitness mainly results from many gene interactions of small effect, which suggests that mapping of genetic incompatibilities will often be difficult. We find that the disruption of co-adapted sets of alleles in a parental species are often more important to determining F 1 fitness than are novel negative interactions that occur in hybrids.
Our study suggests that understanding the structure of epistatic networks may be a crucial prerequisite to predicting hybrid fitness.

Results
Our model considers two populations that are fixing mutations while diverging in allopatry. Isolation is complete: there is no gene flow between the populations. For simplicity, we assume no single mutation becomes fixed in both populations, and that a negligible number of loci are polymorphic for mutations at any time.
The fitness of an individual, either from a parental population or a hybrid, is a function of the derived alleles (that is, mutations) that it carries. A mutation's independent and epistatic fitness effects, which can be either positive or negative, are assumed to be multiplicative. Individuals homozygous for mutations at loci i and j have fitness (1 + s i )(1 + s j )(1 + ε ij ), where s i and s j are the independent fitness effects of the mutations, and ε ij is the epistatic fitness effect between them when they are both homozygous. An individual that is heterozygous for mutations in set G 1 and homozygous for mutations in G 2 has fitness ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Here, h i is the dominance of a mutation at locus i. There can be one, two, or four interactions between mutations at a pair of loci, depending on whether the loci are heterozygous or homozygous (S1 Fig). An interaction between two heterozygous derived alleles may be of the same strength as that between two homozygous derived alleles-a form of epistatic dominance. We parameterize epistatic dominance as α 1 and α 2 for interactions between two heterozygous loci and between a heterozygous and a homozygous locus, respectively. This approach is similar to the H 0 , H 1 and H 2 parameters described by Turelli and Orr [17], but allows us to parameterize our model more easily. The fitness of the common ancestor of the diverging species is defined to be 1, and epistatic interactions in the ancestor are defined to be zero. All fitness effects caused by interactions between a derived mutation at locus i and ancestral alleles throughout the genome are accounted for by the independent selection coefficient s i . The square root appears in Eq (1) because each combination of mutations i and j occurs twice in the products over sets G 1 and G 2 .
To make our results comparable to empirical studies, we measure the fitness of hybrids relative to the mean fitnesses of the two parental species, and denote relative hybrid fitness as w H . The expression for w H is given in the Methods section.

Analytical results
We first studied the model analytically by making several simplifying assumptions. Intuitively, we expect mutations that become fixed within a population will tend to have positive interactions with mutations that have previously fixed. If not, then the new mutations would be deleterious and unlikely to fix. We denote the mean of these "within-population epistasis" effects as � ε w , which for simplicity we assume is constant in time. In contrast, the interactions between mutations fixed in different populations will not have experienced this selective sieve, so we expect them to be less positive (and perhaps even negative). We denote the mean of the "between-population epistasis" effects as � ε b , and again assume that it is constant in time. The final simplifying assumptions are that the means and the coefficients of variation of the fitness effects of mutations are much smaller than one, and that independent fitness effects have additive dominance (h = 1/2).
Consider the situation when a total of n substitutions have fixed in the two populations, a fraction F of which fixed in the first population. The relative fitness of hybrids (which is derived in File S1) is then The  Fig 1, top panels). These interactions may on average be deleterious (� ε b < 0), as assumed by previous Dobzhansky-Muller incompatibility (DMI) models, or they may be beneficial (� ε b > 0). Second, epistatic interactions between mutations that were fixed in each parental population are disrupted in the hybrids, represented by the second product in the brackets (Fig 1, bottom panels). This term has been overlooked in most prior DMI models. The strength of each of these two effects depends on both the dominance of epistasis and the number of mutations fixed in the two populations. When only one population evolves, or when epistatic effects are completely recessive (v = 0 or α 1 = 0), then hybrid fitness is affected solely by the loss of interactions within the parental populations. The epistatic dominance parameter α 2 does not appear in Eq (2) because all mutations carried by F 1 hybrids are heterozygous (due to our assumption that polymorphic loci are vanishingly rare). The independent fitness effects of mutations (s i ) are also absent because the number of mutations carried by hybrids is equal to the mean number carried by their parents.
Heterosis occurs whenever w H is greater than 1. Of particular interest are cases in which heterosis is transient, as seen in empirical studies of some emerging pairs of species [3,8]. The results from Eq (2) are particularly simple when the two populations evolve at the same rate, and epistatic dominance is additive (v = ¼, α 1 = ¼). Heterosis then occurs whenever both within-and between-population epistasis are on average positive and within-population epistasis is stronger than between-population epistasis (0 < � ε b < � ε w ). Hybrid fitness ultimately declines relative to parents, and heterosis is replaced by postzygotic isolation, after a total of (� ε w À � The potential for heterosis thus depends on the relative effects of within-population and between-population epistasis. A mutation is more likely to fix in one population if it has positive interactions with mutations that are already fixed in that same population, while its potential interactions with mutations fixed in another population are irrelevant to its fixation probability. Between-population effects will therefore generally represent a random draw from the distribution of mutant effects. The within-and between-population effects are in turn expected to be quite different. The interactions between mutations fixed within a population can be influenced by the strength of drift: mutations with negative interactions are more likely to fix when drift is strong, which can cause � ε w to be negative. Hybrids are rescued from these negative effects, and heterosis can result. Finally, if mutations that fix in one population happen to interact badly with those fixed in the other (so � ε b is large and negative), then no heterosis will occur, and hybrid fitness can decline rapidly with the fixation of only a small number of mutations.

Classic patterns in speciation
In a classic model of speciation caused by Dobzhansky-Muller incompatibilities, the decline in hybrid fitness accelerates, or "snowballs", because the number of possible DMIs grows quadratically with the number of mutations fixed in the diverging populations [33]. This prediction has not been strongly supported by empirical studies, however, which has led to the suggestion that the strength of interactions changes over the course of speciation [34,35]. The evolution of hybrid fitness during speciation Our model suggests a different explanation for the missing snowball. When the strengths of within-and between-population epistasis are equal (� ε b ¼ � ε w ) and the populations evolve at the same rate (v = ¼), hybrid fitness changes in a linear manner as mutations are successively fixed, rather than accelerating. The reason for this discrepancy with the snowball model is quite simple. While the number of new gene interactions in hybrids grows combinatorically, so does the number of beneficial interactions that have evolved in the parental species and that are disrupted in the hybrids. Even when both populations are evolving at equal rates, the number of between-population interactions is only greater than the number of within-population interactions by a small linear factor (n/2). Hybrid fitness is therefore determined by this small number of between population interactions. These conditions occur when epistasis is very weak or drift is very strong. In that case, within-population epistasis will not be much affected by selection and will strongly resemble epistasis between populations. Earlier conclusions about the pattern of decline in hybrid fitness failed to separate out the effect of within-population interactions and therefore need reevaluation (although see [36] for a model that only includes within population interactions).
Haldane's Rule is the observation that when only one sex of hybrids suffers low fitness, it is the heterogametic sex [20, 37,38]. This pattern is frequently attributed to dominance [39], but a large body of theoretical work has identified recessive interactions between sex chromosomes and autosomes, and faster evolution of the X chromosome, as important drivers (see [40] for a recent review). To learn how this pattern might emerge from our model, we extended it to include sex-linked loci (Section 2d of Supplementary Text). Hybrid female fitness now depends on interactions between mutations that fixed on X chromosomes evolving independently in the two populations, while hybrid male fitness depends on interactions between mutations fixed on the X in one population and mutations fixed on the Y in the other. The relevant comparison for Haldane's Rule is then to ask whether a diverged X would have more deleterious interactions with an X or Y from a different species (considering only mutations fixed on non-recombining regions of the X and Y). If X mutations tend to compensate for those fixed on the Y (which is highly susceptible to drift), Haldane's Rule may result due to largely negative epistasis between a Y and an X from different populations. In an important departure from previous theory [17], our model shows that Haldane's Rule can result if epistasis between diverged Xs is positive even when no mutations are fixed on the Y. This result makes the testable prediction that Haldane's Rule in XO systems could be explained by positive epistasis between diverged Xs [20]. Our model thus compliments prior explanations for Haldane's Rule that depend on dominance.
Darwin's Corollary is the observation that hybrid fitness sometimes depends on the direction of the cross [19, 41,42]. It is thought this asymmetry results from genetic interactions between one set of mutations on autosomes and another set on the sex chromosomes and/or mitochondria [41,42]. Extending our model further to include mutations that fix on mitochondria (Supplementary Text, section 2f), we find that the pattern of Darwin's Corollary emerges whenever one population evolves faster than the other, and is especially prominent if the X or mitochondria evolve faster than autosomes (consistent with the "faster-X effect" [43,44]).

Parameterizing the model with data
We next explored the implications of our model when the fitness effects are fit to data from laboratory populations of yeast. We estimated the selection coefficients that appear in Eq (1) using data on the 10,277 knockout mutations studied by Costanzo et al. [22], who measured the independent effects and the epistatic effects between 20,712,321 pairs of mutations. These measurements were taken during the haploid phase of the life cycle, so they provide no information about dominance effects. We therefore simply assumed mutations have no dominance (h i = ½). Further details about the data are given in Materials and Methods.
As observed previously [31], in these yeast there is a significant positive correlation between a mutation's independent effect and its interactions with other mutations (S3 Fig). Mutations that are highly deleterious individually also tend have highly negative interactions with other mutations. While the biological cause of this correlation remains unclear, it does have an important implication for speciation: pairs of mutations that have strongly negative interactions, and are therefore more capable of killing hybrids, are unlikely to fix in the first place.
This pattern leads to a novel prediction: interactions between mutations fixed early in divergence may contribute less to reproductive isolation than those fixed later. Mutations that appear early will interact with only a small number of mutations that have previously fixed, and so the fate of these mutations is determined mainly by their independent effects. Only those with positive independent effects are likely to fix, and because of the correlation noted above, these early mutations will also tend to have beneficial interactions in hybrids. Later in divergence, the probability that a mutation becomes fixed is mainly determined by how well it interacts with mutations that were previously fixed in that population rather than its own independent effect. It will therefore not be more likely than random to have beneficial interactions with mutations fixed in the other population. Thus within-population epistasis is expected to become increasingly strong and positive, while between population epistasis will tend to become more negative. The result is that the loss of hybrid fitness accelerates. This acceleration resembles the snowball effect discussed earlier [33], but occurs for a very different reason: changes in the average epistatic effects of mutations over time, rather than changes in their numbers.

A stochastic model
To determine the full evolutionary trajectories of hybrid fitness, we turned to stochastic simulations based on the parameterized model. In the simulations, mutations fix sequentially as the result of selection and drift (details are given in the Methods). As the populations diverge, the fitness effects of mutations that fix are altered by their interactions with mutations that were fixed previously. This change allows the fixation of mutations later in the process of divergence that previously had been highly deleterious (S4 Fig). As a result, each population becomes enriched for co-adapted sets of mutations. In contrast, between-population effects of mutations change little through time because those effects are not tested by selection. These contrasting patterns are shown in Fig 3.  Fig 4 shows how hybrid fitness evolves when we parameterize our general model with fitness data from yeast. Heterosis is quite common early in the process of speciation: it occurs in fully 10% of the simulations after 10 mutations are fixed in each population. But even in those cases, hybrid fitness inevitably declines. On average, relative hybrid fitness decreases by only 1% with each additional mutation that becomes fixed, and in only 2% of simulations does a single mutation causes a large change (> 10%) at any time during divergence. These results suggest that the genetic mapping of incompatibilities will often be difficult because hybrid fitness is typically determined by many interactions of small effect.
The two dominance parameters for epistatic effects modify our conclusions in several ways. The dominance of epistasis between two heterozygous loci, α 1 , acts to modify the effect of within and between population epistasis to hybrid fitness (S8 Fig). On the other hand, the dominance of epistasis between a heterozygous and a homozygous locus, α 2 , modifies which mutations fix within populations. This greatly affects how hybrid fitness evolves (S9-S12 Figs).
Most notably, when α 2 is very small (epistatic effects are highly recessive), the probability that a mutation becomes fixed is almost entirely based on its independent fitness effect. Mutations that do fix will also have little effect on hybrid fitness (because they are heterozygous in F 1 s). As a result, the average epistatic effects within and between populations are very close to the average effect of all mutations, and to each other (� ε b � � ε w ). As seen in our analytical results, this leads to linear change in hybrid fitness as the number of substitutions increases.
Lastly, we find that even late in divergence, when few hybrids survive, a large proportion of mutations that have fixed in one population are also beneficial in the genomic background of the other. This fraction declines from roughly 75% when only 2 mutations are fixed in each population to 50% after 50 substitutions have fixed (S5 Fig). Thus, even when hybrids are far less fit than their parents, many of the genes they carry may be able to introgress from one population into the other. Hybridization may thus involve both reduced hybrid fitness overall, but also deterministic introgression for particular alleles.

Discussion
Although observations of heterosis and postzygotic isolation may appear to be incompatible, the results of this model show they can neatly be explained by the same factors. Hybrid fitness is affected by two kinds of epistatic interactions: those between mutations fixed within a population, and those between mutations fixed in different populations. Each of the two can be beneficial or deleterious. Previous models of speciation have focused on the negative consequences of between-population epistasis, and neglected the consequences of disrupting the interactions between mutations fixed within each parental population.
Our model reveals the importance that within-population epistasis may have in speciation. Data from yeast suggest that interactions between many mutations are positive [22] (although The evolution of hybrid fitness during speciation these data are biased for mutations of positive effect), and our results show that these can contribute to heterosis. Positive epistasis also plays a key role in the evolution of reproductive incompatibility because mutations fixed in a parental population are likely to interact positively with each other, and hybrids can miss out on those fitness benefits (Fig 1). Our simulations show that early in divergence, interactions between mutations fixed in a population are typically weak, and disrupting them has little effect on hybrid fitness. But as a population continues to evolve in isolation, it builds up sets of co-adapted alleles, and their positive interactions are disrupted in hybrids. The loss in hybrid fitness caused by this disruption can be an order of magnitude larger than that caused by new deleterious interactions between mutations that first encounter each other in hybrids (Fig 3). In the most extreme cases, the disruption of beneficial interactions within the parent populations could cause hybrid lethality.
The average strength of within-population epistasis (� ε w ) depends strongly on the roles that selection and drift play in the divergence of the two populations. If epistasis is strong and drift is weak, mutations that fix will tend to have positive interactions, and (as we have seen) their loss can be the major cause of hybrid incompatibility. But when epistatic effects are recessive or very weak and drift is strong, then � ε w will be similar to the network average. In that case, the disruption of gene interactions that occur in the parental population will play a small role in determining hybrid fitness.
The second type of gene interaction that appears in our model is between-population epistasis. In contrast to within-population epistasis, the average strength of between-population epistasis (� ε b ) is not determined by adaptation during divergence. Instead, it is expected to represent the effects of random samples from the epistatic interaction network. Our simulations confirm this intuition (Figs 4, S12 and S14). The evolution of hybrid fitness during speciation Another factor that contributes to the evolution of isolation is the relative rates at which the two populations are evolving. Our results suggest that isolation develops more quickly when one population is evolving much more rapidly than the other. In the extreme case when only one population is evolving (and therefore v is zero), Eq (2) shows hybrid fitness is determined entirely by the loss of beneficial epistatic effects enjoyed within that population. A follow-up prediction is that backcrossing an F 1 hybrid with the faster adapting population should restore fitness, while the reciprocal backcross should not.
A final factor that influences the evolution of postzygotic isolation is dominance. Eq (2) shows that interactions between pairs of heterozygous mutations (measured by α 1 ) play a direct role in determining hybrid fitness, but interactions between a heterozygous and a homozygous mutation (measured by α 2 ) do not. This second parameter does, however, influence which mutations fix, as a rare mutant allele will occur in a genetic background where it is heterozygous but other derived alleles are homozygous. This intuition is confirmed by the simulations (S13 and S15 Figs). To our knowledge, there has been no experimental study of the dominance of epistatic effects, and so these parameters, crucial to determining the trajectory of hybrid fitness, remain unknown. We suspect, however, that deleterious gene interactions are often recessive, because that is the basis of the highly successful theories constructed to explain Haldane's Rule [11,40] and the large X effect [17].

Relations with existing models
Heterosis has previously been explained by overdominance and by rescue from recessive deleterious alleles that have fixed by strong drift [4,5,45,46]. The role of epistasis has received less attention. Studies of natural populations generally do not have sufficient power to detect epistatic effects. Studies in crop plants have found that epistasis may be important to hybrid fitness [47][48][49][50], but it is not known whether these findings generalize to natural populations. In nature, heterosis could well result from a combination of the epistatic effects that are the focus of our model and the effects of overdominance and recessive deleterious alleles that have been modeled previously. On the other hand, epistasis seems to be the only hypothesis that can explain why heterosis is frequently seen between closely related species, but not between ones that are more highly diverged. Our model suggests two explanations for this pattern. First, heterosis is transient under simple limiting conditions laid out by Eq 2. Second, in our simulations the variance in epistasis within and between populations is large early in divergence and becomes small as populations continue to diverge (Fig 3). This suggests that heterosis may often be a result of the stochastic sampling of epistatic interactions within and between populations.
Adaptive introgression between hybridizing species has been uncovered in several taxa including Heliconius butterflies [51], Anopheles mosquitoes [52], and (most famously) in humans [53]. Previous work has shown that DMIs, even of strong effect, may be weak barriers to gene flow across the rest of the genome [54][55][56], and so even when hybrid fitness is low, introgression may be possible. The genetic consequences of hybridization depend not only on the individual effects that genes have on fitness, but also on how they interact: introgression is aided by positive epistasis and hampered by negative epistasis. In our simulations, fully half of the substitutions in one population can be positively selected in the other even when the populations are nearly completely reproductively isolated (S4 Fig). These results confirm that there is widespread potential for adaptive introgression.
The classic models for the evolution of postzygotic isolation focus on Dobzhansky-Muller incompatibilities (DMIs). A core prediction from these models is the so-called "snowball effect" [16,18,57]. This is the pattern in which the loss of hybrid fitness accelerates because the number of new deleterious interactions that occur in hybrids increases quadratically with the number of substitutions that have occurred. The snowball pattern is also predicted by our model under some conditions (Fig 2B and 2C).
There is mixed support for the snowball pattern in natural populations [18,57], leading to the development of theories to explain the so-called "missing snowball" [24,34,35]. Our model is also compatible with the missing snowball. When the populations evolve at the same rate (v = ¼) and the average strength of epistasis is equal within and between populations (� ε b ¼ � ε w ), hybrid fitness declines linearly (rather than accelerating) with the number of mutations that have fixed (n). That is because the number of new gene interactions that appear in hybrids is largely offset by the number of co-adapted interactions that occur in the parents that are lost in the hybrids. Thus, even when the total number of incompatibilities increases quadratically, hybrid fitness may change linearly.
Another family of models for the evolution of postzygotic isolation is based on Fisher's Geometric Model (FGM) [27,28,30,58]. They assume there is a single phenotypic optimum, and that mutations have direct phenotypic effects. In contrast, our model (as well as the models of DMIs) does not make any assumptions about the underlying phenotypes. The price for this generality is that our model has many more parameters. Under some conditions, the two types of models can produce similar patterns of the change in postzygotic with time. There are, however, differences in some predictions. The core difference is the number of incompatibilities. In our model and the DMI models, the number of potential incompatibilities increases at least quadratically as populations diverge. FGM models, on the other hand, predict only a linear increase in the total number of incompatibilities. In practice it is difficult to determine the number of incompatibilities, making this difference hard to test. Our results suggest another contrast between the models: we expect more rapid speciation when one population is evolving much faster than the other, while FGM models make no such prediction.

Caveats and future extensions
Our model assumes the two populations diverge in complete isolation, and its conclusions might change substantially if there is gene flow between them. If gene flow causes mutations to be tested in the genetic backgrounds of both populations, they may have more positive effects when brought together in hybrids than they do in the absence of gene flow. Conversely, epistatic effects within populations may be less beneficial because a mutation's survival is affected by its interactions with substitutions in the alternate population. Both effects will tend to cause hybrid fitness to be greater in the presence of gene flow than in its absence, even when the number of mutations that differ between the populations is controlled for.
Our model does not consider interactions between alleles at three or more loci. Recent work on yeast has shown that these higher order interactions may be common, but weaker than pairwise interactions [59]. As populations diverge, the number of higher order interactions increases even more rapidly than the number of pairwise interactions. Consequently, higher order interactions could be the main determinant of hybrid fitness breakdown. Future empirical and theoretical investigation should consider these higher order effects.
We did not study the fitness of F 2 , backcrosses, or later generations of hybrids. It is plausible that later generations may experience both effects of further hybrid breakdown and of transgressive segregation [60][61][62]. Fitness in later generations depends strongly on the linkage relationships between interacting genes. (For example, if there are sets of interacting alleles that are tightly linked, their contribution to hybrid fitness will break down very slowly as recombination breaks them apart.) Thus, extending this framework to later generations of hybrids requires reformulating the model to account for recombination. We parameterized the model with data from mutations in yeast that may not be representative. Their fitness effects were measured in the lab, and based on only one fitness component (growth rate). Measurements were made in a single benign environment, so they neglect the potential effects of environmental heterogeneity that may be important to speciation in yeast [63,64]. Most importantly, our model is of a diploid organism, but the yeast were measured in their haploid state. The dominance of these mutations is unknown, and the distribution of their fitness effects as haploids may not reflect those in diploids. The actual proportion of positive epistasis is difficult to measure and is likely to be different in different species and for different measures of epistasis [65]. Speciation studies in yeast have shown a variety of patterns for both the tempo and cause of speciation in laboratory settings, suggesting the particular rate of speciation observed in our simulations is not universal [4,64,[66][67][68][69][70][71][72][73]. But while these caveats may challenge the quantitative results, the qualitative results should be robust if the distributions of epistatic effects in nature are similar to those in the data we used.
Here, we have shown that a model of speciation can be parameterized using data from a real biological system. The resulting model can recapitulate basic patterns observed in speciation, including both hybrid incompatibility and heterosis. The model also provides new insights into the genetic basis of postzygotic isolation. Refinements of this modeling framework will be possible when additional data about gene interactions become available, leading to further understanding of the kinetics of speciation.

Data and simulations
To parameterize a stochastic model of evolutionary divergence and speciation, we used the fitnesses in haploids for all 10,858 single knockouts and all 20,712,321 double knockouts in each strain reported by Costanzo et al. [22]. The fitness proxy is the colony growth rate of the knockouts. See Costanzo et al. supplemental materials for details of fitness measurements and experimental design [22,32]. Data on the distribution of single-mutation and pairwise epistatic fitness effects in yeast were obtained from the online repository at http://www. thecellmap.org/costanzo2016/.
Our goal in using these data was to estimate the complete distribution of epistatic effects, including weak and absent effects. These weak effects, which were disregarded in the paper reporting the data, may nonetheless be evolutionarily important. To obtain this distribution, we downloaded and concatenated all data files containing fitness measurements including nonessential-by-nonessential genes, nonessential-by-essential, essential-by-essential, and the DAmP experiments. To avoid biasing the dataset against small selection coefficients, we used all measured interactions regardless of their p-value.
We write the haploid fitnesses of a single knockout at locus i and a double knockout at loci i and j (respectively) as: Using these relations, we fit the independent selection coefficients and the pairwise epistatic selection coefficients to the yeast dataset. Since some measurements did not report either double knockout fitness or the single knockout fitness of one of the mutations, we were left with 18,743,950 interactions.
To determine which mutation will be the next to become fixed, we calculated the fitness of mutations at all loci in the genome not already fixed for mutations. The fitness of a mutation depends on both its individual effect and its interactions with mutations already fixed (as described by Eq 1). Thus, the fitness of a given mutation changes in time as the genetic background of its population evolves with the fixation of mutations at other loci. The fitnesses of all new mutations were used to calculate their probability of fixation using a standard diffusion approximation, and assuming an effective population size of 10 6 . Finally, a mutation was randomly chosen for fixation with probability proportional to the chance that it would become fixed. This process was iterated until each population was fixed for 50 mutations. The entire process was replicated in 10,000 stochastic simulations.
Analyses and simulations were performed in R, and figures were generated using ggplot2 and cowplot packages [74][75][76]. The scripts used to generate the simulation data and perform the following analyses are available at https://github.com/adagilis/SpeciationSimulation2018.

Analytical results
We consider two populations that are diverging in isolation and independently become fixed for different sets of mutations. To assess hybrid fitness as the populations, diverge, we assume that the number of loci at which mutations can fix is so large that there is a negligible chance the same mutation will become fixed independently in both populations. Denote as A and B the sets of loci at which mutations have fixed in populations A and B, respectively, and assume that neither population is polymorphic for derived mutations. We measure hybrid fitness relative to the mean of the two parental species: where W X;Y is the absolute fitness of an individual that is heterozygous for mutations at loci in set X and homozygous for loci in set Y, [ represents the union operator, and ; is the empty set. The numerator is the absolute fitness of an F 1 hybrid, which is heterozygous for mutations at loci in the set ðA [ BÞ, and homozygous for no mutations. The denominator represents the mean of the absolute fitnesses of the parental species. Individuals from population A are heterozygous for mutations at no loci, but homozygous for mutations at loci in set A, and likewise for individuals from population B.

Divergence in allopatry
To model allopatric population divergence, we calculated the probability of fixation for each mutation in the yeast genome using Kimura's diffusion approximation [77]. This probability depends on the mutation's fitness in heterozygous and homozygous states. Since the strength of selection on each mutation will depend on the genetic background in which it occurs, we use the effective selection coefficient and the effective dominance of each mutation. Let s ijA and o ijA be the effective selection coefficient and dominance for a mutation at locus i, which accounts for the mutation's independent fitness effect and its epistatic interactions with mutations at loci in the set A that have already fixed. Then: We assume that polymorphism is sufficiently rare that a population can be assumed fixed at all other loci when considering the fixation probability of a new mutation. (An interesting extension for future work would relax the assumption of no polymorphism [78].) When no other mutations have yet been fixed, s ijA is equal to s i and o ijA = h i. Relative hybrid fitness when assumptions about dominance of epistasis are modified. We consider our simulations in which α 2 = 0.5. Since hybrids will only experience epistasis of strength α 1 ε, we reanalyze their fitness assuming epistasis seen in hybrids is recessive (blue, α 1 = 0), additive (yellow, α 1 = 1/2 α 2 ) and dominant (red, α 1 = α 2 ). Solid lines represent means of simulations, shaded areas are 95% confidence intervals. As suggested by our model, highly dominant interactions lead to slower decline in fitness. Simulations demonstrate that a secondary effect is increased variance in observed hybrid fitness. Relative hybrid fitness in simulations when epistatic effects are nearly completely recessive as mutations are fixed (α 2 = 0.01). Hybrid fitness depends very little on whether epistasis seen in hybrids is recessive (blue, α 1 = 0), additive (yellow, α 1 = 1/2 α 2 ), or dominant (red, α 1 = α 2 ). Hybrid fitness therefore increases linearly with respect to fixed substitutions, proportional to the average epistatic effect. Because within population epistasis is large, hybrid fitness depends heavily on whether epistasis seen in hybrids is recessive (blue, α 1 = 0), additive (yellow, α 1 = 1/2 α 2 ), or dominant (red, α 1 = α 2 ). When effects are recessive, hybrids lose out on co-adapted blocks and rapidly lose fitness, while interactions that display nearly the same strength as seen in parents (red) lead to extreme heterosis due to combinations of positive epistatic effects from both parents. (TIF)