Heterozygote advantage can explain the extraordinary diversity of immune genes

The majority of highly polymorphic genes are related to immune functions and with over 100 alleles within a population, genes of the major histocompatibility complex (MHC) are the most polymorphic loci in vertebrates. How such extraordinary polymorphism arose and is maintained is controversial. One possibility is heterozygote advantage (HA), which can in principle maintain any number of alleles, but biologically explicit models based on this mechanism have so far failed to reliably predict the coexistence of significantly more than ten alleles. We here present an eco-evolutionary model showing that under HA evolution can result in the emergence and maintenance of more than 100 alleles if the following two assumptions are fulfilled: first, pathogens are lethal in the absence of an appropriate immune defence; second, the combined effect of multiple pathogens on host survival exceeds the sum of the effects of each pathogen alone. Thus, our results show that HA can be a more potent force in explaining the extraordinary polymorphism found at MHC loci than currently recognized.


Introduction
Heterozygote advantage (HA) is a well-established explanation for single locus polymorphism, with the sickle cell locus as a classical text book example (Allison, 1954).However, whether HA is generally important for the maintenance of genetic polymorphism is questioned (Hedrick, 2012;Sellis et al., 2016).Genes of the major histocompatibility complex (MHC), responsible for inducing immune defences by recognizing pathogenic antigens, are the most polymorphic loci among vertebrates (Duncan et al., 1979;Apanius et al., 1997;Penn, 2002;Sommer, 2005;Eizaguirre and Lenz, 2010).HA as an explanation for this high level of polymorphism was introduced almost 50 years ago by Doherty and Zinkernagel (1975).The idea suggests that individuals with MHC molecules from two different alleles are capable of recognizing a broader spectrum of pathogens, resulting in higher fitness.This is especially evident when the MHC molecules of the two alleles have complementary immune profiles (Pierini and Lenz, 2018), a phenomenon known as divergent allele advantage (Wakeland et al., 1990).Stefan et al. (2019) show that this allows for the coexistence of alleles with larger variation in their immune efficiencies.Early theoretical work suggested that HA can maintain an arbitrarily high number of alleles if these alleles have appropriately fine-tuned homo-and heterozygote fitness values (Kimura and Crow, 1964;Wright, 1966;Maruyama and Nei, 1981).However, later work suggests that such genotypic fitness values are unlikely to emerge through random mutations (Lewontin et al., 1978).More mechanistic models have also failed to reliably predict very high allele numbers (Spencer and Marks, 1988;Hedrick, 2002;de Boer et al., 2004;Borghans et al., 2004;Stoffels and Spencer, 2008;Trotter andSpencer, 2008, 2013;Ejsmond and Radwan, 2015;Lau et al., 2015).As a result, HA plays only a minor role in current explanations of polymorphism at MHC loci (Hedrick, 1999;Gould et al., 2004;Wegner, 2008;Kekäläinen et al., 2009;Eizaguirre and Lenz, 2010;Lenz, 2011;Loiseau et al., 2011), despite empirical evidence for its existence (Doherty and Zinkernagel, 1975;Hughes and Nei, 1989;Jeffery and Bangham, 2000;McClelland et al., 2003;Froeschke and Sommer, 2005;Kekäläinen et al., 2009;Oliver et al., 2009;Lenz, 2011).Consequently, other mechanisms are suggested to be important for the maintenance of allelic diversity, such as Red-Queen dynamics, fluctuating selection and disassortative mating (Apanius et al., 1997;Hedrick, 1999;Penn, 2002;Borghans et al., 2004;Wegner, 2008;Spurgin and Richardson, 2010;Loiseau et al., 2011;Ejsmond and Radwan, 2015).
Our study challenges this status quo by demonstrating that HA is a potent force that can drive the evolution and subsequent maintenance of more than 100 alleles.The novelty of our approach lies in the fact that we do not rely on hand-picked genotypic fitness values.Instead, these fitness values emerge from our eco-evolutionary models, where the allelic values that allow for extraordinary polymorphism are found by evolution in a self-organized process.We do not claim that HA is the only mechanism responsible for the diversity of MHC alleles in nature.However, our results show that HA can be more important than currently recognized.
In the following, we first outline two key assumptions necessary for high allelic diversity describing how pathogens affect host fitness.Second, we introduce two distinct mathematical models that adhere to these assumptions.Lastly, we present results obtained from mathematical analysis and individual-based simulations of these models.

Model
We investigate the evolution at an MHC locus using mathematical modelling and computer simulations.In the following, we describe how genotypes map to immune response and ultimately survival, and then continue with a description of our evolutionary algorithm.
We assume that the MHC molecules produced by the two alleles at an MHC diploid locus determine the immune response against the multiple pathogens present in the environment.Our approach is based on the following two assumptions about the relationship between pathogen virulence and host fitness: a) Pathogens are assumed to be lethal in the absence of an appropriate immune defence.b) The combined effect of multiple pathogens on host survival exceeds the sum of the effects of each pathogen alone.In other words, the effects of pathogens are not independent of each other.
To incorporate these two assumptions, we assume that the effect of pathogen attacks on host survival acts through the intermediary step of the host's 'condition', which is a proxy for a suit of measurements describing an individual's body composition and physiology (Wilder et al., 2016).In the absence of an adequate immune response, a pathogen attack reduces the condition of a host to zero, causing its death (assumption a).More generally, we assume that the probability to survive is an increasing function of condition.Since the survival probability cannot exceed one, the function that maps condition to survival has to be saturating.Consequently, for high values of conditions, where the survival function has saturated, pathogens reducing condition have small effects on survival.As condition decreases, pathogen induced reductions have larger effects on survival (assumption b).Hence, we assume that a host clearing a pathogen is in a weaker condition afterwards.For example, hosts that have cleared a parasite infection might have lost weight in this process, which affects their ability to cope with future infections.
A further assumption of our model is the existence of a trade-off between the efficiencies of MHC molecules to induce a defence against different pathogens.Thus, no MHC molecule can perform optimally with respect to all pathogens and an improved efficiency against one set of pathogens can only be achieved at the expense of a decreased efficiency against another set of pathogens.Under such trade-offs, an MHC molecule can be specialized to detect a few pathogens with high efficiency, or, alternatively, be a generalist molecule that can detect many pathogens but with low efficiency.There is empirical support for the existence of such trade-offs.First, many MHC molecules can detect only a certain set of antigens (Wakeland et al., 1990;Froeschke and Sommer, 2012;Eizaguirre et al., 2012;Chappell et al., 2015;Pierini and Lenz, 2018) and therefore provide different degrees of protection against different pathogens (Wakeland et al., 1990;Apanius et al., 1997;Eizaguirre and Lenz, 2010;Froeschke and Sommer, 2012;Eizaguirre et al., 2012;Cortazar-Chinarro et al., 2022).Second, it has also been found that specialist MHC molecules are expressed at higher levels at the cell surface while generalist MHC molecules that bind less selectively are expressed at lower levels (Chappell et al., 2015), potentially to reduce the harm of binding selfpeptides.This could explain the lower efficiency of generalist MHC molecules.
We employ two approaches to model this trade-off.First, we use unimodal functions to model the match between MHC molecules and pathogens.This approach has a long history in evolutionary ecology (e.g.Levins, 1968;Sheftel et al., 2018) and, when using Gaussian functions, the model becomes amenable to mathematical analysis.Second, inspired by Borghans et al. (2004), we use a more mechanistic bit-string approach where immune defence depends on the match between two binary strings, one representing the MHC molecule and the other a peptide of the pathogen.By explicitly modelling MHC efficiencies against various pathogens-rather than assuming a fixed proportion of pathogens detected per MHC molecule (as, e.g., de Boer et al., 2004)-our model accounts for the possibility that MHC molecules can have complementary immune profiles.When paired together, complementary alleles lead to fit heterozygotes able to detect an increased number of pathogen peptides (Pierini and Lenz, 2018), exemplifying the concept of divergent allele advantage in the sense of Wakeland et al. (1990).

Gaussian model
In this approach, we use Gaussian functions to model the ability of MHC molecules to recognize different pathogens, as illustrated in Figure 1.Here, MHC-alleles and pathogens are represented by vectors = ( 1 , 2 , … , ℎ ) and = ( 1 , 2 , … , ℎ ), respectively.The MHC-alleles code for MHCmolecules, and the ability of an MHC molecule to recognize the th pathogen is maximal if = .This ability decreases with increasing distance between and .The decrease is modelled using an ℎ-dimensional Gaussian function ( ), as detailed in Equation S14.The nature of the trade-off can be varied by adjusting the positions of the pathogen optima and the shape of the Gaussian

functions.
Without loss of generality, we can reduce the dimension of the vectors and to ℎ = − 1, such that = ( 1 , 2 , … , −1 ) and = ( 1 , 2 , … , −1, ).For example, in Figure 1A, B, where = 2, the x-axis represents the unique line passing through two pathogen optima in a trait space of potentially much higher dimension.Similarly, in Figure 1C, D, where = 3, the two-dimensional coordinate system represented by the gray surfaces describes the unique plane passing through three pathogen optima.Mathematically speaking, linearly independent pathogen optima form the basis of a vector space of dimension − 1, which we choose as the coordinate system for the vectors and .Allelic vectors outside this set are necessarily maladapted for all pathogens along at least one dimension, and owing to our dimensionality reduction we ignore such trait vectors.
We examine two versions of the Gaussian model.The first one is based on two symmetry assumptions and shown in Figure 1: pathogen optima are placed symmetrically such that the distance between any two pathogens equals 1, and the Gaussian functions ( ) are rotationally symmetric and of equal width.This allows to simplify the covariance matrix in the Gaussian function ( ) (Equation S14) such that it can be replaced with a single parameter (SI Appendix 6.2), where the superscript T indicates vector transposition.The parameter , to which we refer as virulence, is the inverse of the width of the Gaussian function.If the Gaussian function is narrow, corresponding to a high virulence , a pathogen causes significant harm if MHC-molecules are not well adapted against it (Figure 1B and D).On the other hand, if the Gaussian function is wide, corresponding to a low virulence , a pathogen causes less harm (Figure 1A and C).
We relax these symmetry assumptions in the second version, where we allow for Gaussian functions with arbitrary shape and position.Since the results for the two versions are similar, we here focus on the case with symmetry and refer to SI Appendix 4, 6.2 and 7 for results based on general Gaussian functions.

Bit-string model
Our second approach is inspired by Borghans et al. (2004) and commonly referred to as a bit-string model.Pathogens are assumed to produce pep peptides, and for a pathogen to cause virulence, all of its peptides have to avoid detection by the host's MHC molecules.We here equate MHCalleles with the MHC-molecule they code for, and both MHC molecules and pathogen peptides are represented by binary strings (or bit-strings) of, following Borghans et al. (2004), length 16.
The probability that an MHC molecule detects a pathogen peptide increases with the maximum match length of consecutive matches between their binary strings.For an MHC molecule and the th peptide of the th pathogen, this match length is denoted ( ), or for short (see Figure 2A).The corresponding detection probability, denoted ( ( )), is then given by the logistic function Here, denotes the required match length for a 50% chance of detection.The parameter has again the interpretation of virulence, with higher values indicating pathogen peptides that are harder to detect by MHC molecules.The positive parameter governs the steepness of the function .We choose = log(9), which results in ( ) equalling 10% when = − 1 and 90% when = + 1 (Figure 2B).Finally, the realised efficiency of an MHC molecule against the th pathogen is given by the probability of detecting at least one of its pep peptides, which equals (3)

From immune defence to survival
For both versions of our model, we assume that MHC alleles are co-dominantly expressed (Eizaguirre and Lenz, 2010;Abbas et al., 2014), and an individual's efficiency to recognize pathogens of type is given by the arithmetic mean of the efficiencies from its two alleles.We want to note that assuming co-dominance gives more conservative results in terms of the number of coexisting alleles, as dominance would increase the degree of heterozygote advantage.
For each pathogen attack, an individual's condition is reduced by a certain fraction that depends on the efficiency of the defence against that pathogen.Since each individual is exposed to all pathogens during their lifetime, the condition is determined by the product of its defences against all pathogens, where and represent the MHC alleles the host carries at the focal locus, and max is the condition of a hypothetical individual with perfect defence against all pathogens (see Supplementary 6.2 for more details).Because ( ) < 1, condition is reduced with each additional pathogen in a proportional manner.The multiplicative nature of Equation 4 has the effect that a poor defence against a single pathogen is sufficient to severely compromise condition, and therefore survival (see next paragraph), fulfilling assumption (a) above.Finally, survival is an increasing but saturating function of an individual's condition , Here, is the survival half-saturation constant, giving the condition required for a 50% chance of survival.This function fulfils assumption (b) above as long as is not too large.Individuals in good health then have a condition far above , and a decrease in condition only has a small effect on survival.If is lower than , then the host is in bad health and any additional pathogen causes a large reduction in survival (orange lines in bottom panel of Figures 3 and 6).In summary, Equation 4 and 5 cause that assumptions (a) and (b), as formulated above, are satisfied.Using two distinct models to describe the interaction of hosts and pathogens, which both impose a trade-off between the ability to detect different pathogens-namely the Gaussian and the bit-string model-we demonstrate below that HA emerges as a potent force capable of driving the evolution of a very high number of coexisting alleles.

Analysis
To study the evolutionary dynamics of allelic values in both the Gaussian and the bit-string model, we simulate a diploid Wright-Fisher model with mutation and selection (Fisher, 1930;Wright, 1931).Thus, we consider a diploid population of fixed size with non-overlapping generations and random mating.Individuals produce, independent of their genotype, a large number of offspring, resulting in deterministic Hardy-Weinberg proportions before viability selection.After viability selection, which is based on Equation 5 and adjusts the proportion of genotypes accordingly, stochasticity is introduced by random multinomial sampling of surviving offspring, which constitute the adult population of the next generation.Using this model, we follow the fate of recurrent mutations that occur with a per-capita mutation probability .The long-term evolutionary dynamics is obtained by iterating this procedure (Figure 3, top panel) until the number of alleles equilibrates.This procedure can result in high numbers of coexisting alleles, where the emerging allelic polymorphism is driven by increasing the alleles' expected survival (or marginal fitness, see Equation S5-S6 in SI Appendix 6.1).
For the Gaussian model, mutations are drawn from a rotational symmetric normal distribution with expected mutational effect-size (SI Appendix 3).We here focus on mutations of small effect ( = 0.016 in Figure 3 and = 0.03 in Figure 4) and thus near-gradual evolution.The effect of smaller and larger effect sizes is investigated in SI Appendix 3. To minimize computation time, simulations (other than those in Figure 3) are initialized at the trait vector that is given by the mean of the vectors describing the pathogens.In the bit-string model, the pathogens are each given pep randomly drawn bit-strings.The host population is initialized with a single MHC allele given by a randomly drawn bit-string, and mutations change a random bit of the MHC allele.The bit-string model can indeed only be analysed with computer simulations.In contrast, for the Gaussian model we can analytically derive conditions under which to expect either a single generalist allele or the build-up of allelic diversity through gradual evolution in a process known as evolutionary branching (Metz et al., 1992;Geritz et al., 1998;Kisdi and Geritz, 1999;Doebeli, 2011) (see SI Appendix 6 and 7 for details).

Gaussian model
In the simulations of the Gaussian model, the evolutionary dynamics first proceed toward a generalist allele with an intermediate efficiency against all pathogens, to which we refer to as * .This generalist allele maximizes the condition for homozygote genotypes (grey lines and cones in Figure 1, SI Appendix 7.2).Once this generalist allele is reached, the evolutionary dynamics either stops (Figure S1), resulting in a population where all individuals are homozygous for * , or allelic diversification ensues (Figure 3), resulting in the coexistence of specialist and generalist alleles.Based on the adaptive dynamics approximation, we show analytically (SI Appendix 7.1) that * is given by the arithmetic mean of the vectors describing the pathogens 1 , … , (see Equation S27), that it is an attractor of any sequence of allelic substitutions, and that, for the symmetric case, diversification occurs if and only if > 2 √ .These predictions are well-matched by our simulations.
Figure 4 presents the final number of coexisting alleles as derived from individual-based simulations.It shows that the number of coexisting alleles increases with the number of pathogens and their virulence , but also depends on the survival half-saturation constant (Equation 5).For a large part of the parameter space, more than 100 (solid contour lines in Figure 4) and up to over 200 alleles can emerge and coexist.
In order to better understand the process of allelic diversification, it is useful to inspect our analytical results in more detail.Evolutionary diversification occurs if mutant alleles ′ exist that can invade a population that is monomorphic for the generalist allele * .Initially, while still rare, such mutant alleles will always occur in heterozygous individuals, where they are paired with the ) that characterize an allele, while the vertical axis shows evolutionary time.The thickness of the blue tubes is proportional to allele frequencies.Allelic values at the last generation are projected as red dots on the top as well as on the bottom plane.Coloured circles represent the contour lines of the Gaussian efficiency functions ( ) shown in Figure 1D.In all simulations, gradual evolution leads toward the generalist allele * = (0, 0) and branching occurs in its neighbourhood, as predicted by our analytical derivations (SI Appendix 7.1).In A there are three consecutive branching events with the second branching event marked by the grey plane.B and C show that, as decreases, the number of branching events increases, resulting in more coexisting alleles.Finally, D reveals that, as decreases even further such that already low condition values result in high survival, the number of branching events decreases again, resulting in a set of alleles closely clustered around the generalist allele.The bottom panel shows survival as a function of condition as defined by Equation 5 on a log-log scale (orange line, left vertical axis) and the frequencies of individual conditions at the final generation (dark blue bars for homozygotes and light-blue bars for heterozygotes, right vertical axis; conditions from 0 to 10 −4 are incorporated into the first bar).Other parameter values: = 2 × 10 5 , = 7, = 10 −6 , and = 0.016.
The left-hand side of the diversification criterion indicates that invasion of more specialized alleles is favoured when pathogen virulence is large (narrow Gaussian functions, see Figure 1A, C).In this case, homozygotes for the generalist allele * are relatively poorly protected against pathogens and more specialized alleles enjoy a fitness advantage while invading.The opposite is true when is small (wide Gaussian functions, see Figure 1A, C).The right-hand side of the diversification criterion indicates that the benefit of specialization decreases with an increasing number of pathogens (compare position of red arrows in Figure 4), because different pathogens require different adaptations.Thus, counter to intuition, initial allelic diversification is disfavoured in the presence of many pathogens.
If initial allelic diversification occurs, it leads to a dimorphism from which new mutant alleles can invade if they are more specialized than the allele from which they originated and two allelic lineages emerge from the generalist allele * and subsequently diverge (Figure 3A, up to = 3 × 10 4 below grey plane).Increasing the difference between the two alleles present in such a dimorphism has two opposing effects.The condition and thereby the survival of the heterozygote genotype in-creases because the MHC molecules of the two more specialized alleles provide increasingly better protection against complementary sets of pathogens, that is, these alleles are subject to a divergent allele advantage (Wakeland et al., 1990;Pierini and Lenz, 2018).On the other hand, survival of the two homozygote genotypes decreases because they become increasingly more vulnerable to the set of pathogens for which their MHC molecules do not offer protection.Note that, due to random mating and assuming equal allele frequencies, half of the population are high survival heterozygotes and the remaining half homozygotes with low survival.Since survival is a saturating function of condition (Equation 5), it follows that the increase in survival of heterozygotes slows down with increasing condition (plateau of the orange curves in Figure 3), and the two opposing forces eventually balance each other such that divergence comes to a halt.At this point, our simulations show that the allelic lineages can branch again, resulting in three coexisting alleles.As a result, the proportion of low-survival homozygotes decreases, assuming equal allele frequencies, from one-half to one-third.Subsequently, the coexisting alleles diverge further from each other because the increase in heterozygote survival once again outweighs the decreased survival of the (now less frequent) homozygotes (see an intermediate condition (middle bar), and all heterozygote genotypes have a survival close to 1 (right bar).
In Figure 3B-D, the process of evolutionary branching and allelic divergence continues to recur.As a consequence, allelic diversity continues to increase while simultaneously the proportion of vulnerable homozygote genotypes decreases (Figure 3, lower panel).Thus, in contrast to prior approaches (e.g.Kimura and Crow, 1964;Wright, 1966;Lewontin et al., 1978;Maruyama and Nei, 1981), we do not rely on hand-picked genotypic fitness values.Instead, in our approach, fitness values emerge from an eco-evolutionary model where evolution can be viewed as a self-organizing process finding large sets of alleles that can coexist (Figure 3, upper panel).
We note that the half-saturation constant does not appear in the branching condition and thus does not affect whether polymorphism evolves.However, does affect the final number of alleles, which is maximal for intermediate values of .This can be understood as follows.If is very large (right-hand side of the panels in Figure 4), then heterozygote survival saturates more slowly with increased allelic divergence so that continued allelic divergence is less counteracted.This hinders repeated branching (compare A and C in Figure 3).On the other-hand, if is very small (left-hand side of the panels in Figure 4), then homozygous individuals can have high survival, which decreases the selective advantage of specialisation, leading to incomplete specialization and a reduced number of branching events (compare D and C in Figure 3).
In summary, a higher virulence promotes allelic diversification.Increasing the number of pathogens has a dual effect: it hinders initial diversification but facilitates a higher number of coexisting alleles if diversification occurs, especially, for intermediate values of the half-saturation constant .
We perform several robustness checks.First, Figure S2 shows simulations in which we vary the expected mutational step size.These simulations show that the gradual build-up of diversity occurs most readily as long as the mutational step size is neither very small, since then the evolutionary dynamics becomes exceedingly slow, nor very large, since a large fraction of the mutants are then deleterious and end up outside the simplex made up of the pathogen optima (e.g., outside the triangle made up by the three pathogen optima in Figure 1C-D) so that they perform worse against all pathogens.Second, the results presented in Figure 3 and Figure 4 are based on the assumptions of equally spaced pathogen optima and equal width and rotational symmetry of the Gaussian functions ( ) as shown in Figure 1.In SI Appendix 4 and SI Appendix 7, we present simulation and analytical results, respectively, for the non-symmetric case.In particular, Figure S3 shows that the predictions presented here are qualitatively robust against deviations from symmetry.

Bit-string model
Evolutionary diversification of MHC-alleles in the bit-string model is analysed with individual-based simulations, and the results are summarized in Figure 5. Similar to the Gaussian model, we find high levels of allelic polymorphism, with over 100 alleles coexisting in a significant portion of the parameter space.Note that in Figure 5, we keep the half-saturation constant fixed at 1.With this choice, the realized conditions occur both in the range where survival changes drastically with condition and where the survival function saturates (Figure 6), fulfilling assumption (b).This allows us to focus on the effect of the number of peptides pep per pathogen.
Our results can be understood as follows.The likelihood that an MHC-molecule can recognize all pathogens is high in the following region of the parameter space.Firstly, if virulence is low, then peptide recognition is more likely (Equation 2).Secondly, if the number of peptides pep per pathogen is high, then the potential for successful pathogen detection increases (Equation 3).Thirdly, if the number of pathogens is low, then detection of all pathogens is a simpler task.Under these circumstances, a single best allele whose MHC molecule recognizes all pathogens with a high probability will be the endpoint of evolution (dark blue regions in Figure 5).
As virulence or the number of pathogens increases, or as the number of peptides decreases, the task of recognizing all pathogens with high probability becomes progressively more challenging.This leaves homozygous individuals vulnerable to an increasing array of pathogens.As homozygotes get more vulnerable, there is a growing advantage for heterozygotes carrying alleles with complementary immune profiles, as these are able to detect up to twice as many pathogens as either homozygote.This increasingly stronger HA, in turn, facilitates coexistence of an increasing number of alleles, illustrated by increasingly warmer colours in Figure 5, and thereby decreases the proportion of vulnerable homozygotes.Thus, similar to the Gaussian model, increasing either the virulence or the number of pathogens enables a higher number of alleles to coexist.However, unlike the Gaussian model, increasing actually facilitates initial diversification rather than hindering it.
Importantly, in the bit-string model, a point mutation, switching the value of an arbitrary bit in the bit-string, can drastically alter the efficiencies against a large set of pathogens.Because of this, and in contrast to the Gaussian model, a polymorphism maintained by HA can emerge from many different alleles.On the other hand, gradual evolution in the Gaussian model is more efficient in finding the evolutionary end-point of complementary alleles (Figure 3), while for the bit-string model, as the number of alleles increases, this becomes slower due to the lack of fine-tuning as mutations have large effect.To compensate for this, we use, compared to the Gaussian model, a higher mutation probability and run simulations for more generations.
Figure 6A shows the build-up of allelic diversity over time in an exemplary simulation run, and B-D show the distribution of condition values at three time points, as indicated by green hatched lines in A. In B the population is dimorphic.Due to random mating, half of the population consists of homozygotes with low condition (dark blue bar), while the remaining half are heterozygotes with high condition (light blue bar).As time proceeds, the number of coexisting alleles increases.
Figure 6C depicts a stage with five coexisting alleles (with at least 1% frequency) and an effective number of alleles ( e ) of 4.4.Ultimately, evolution results in 19 coexisting alleles (with at least 1% frequency), and an e of 16.1, as shown in D. In this process, the number of low-condition homozygotes decreases, as indicated by the dark blue bars.We note that the Gaussian and bit-string model suggest slightly different interpretations about the nature of the involved pathogens.In the bit-string model, the same MHC-molecule can in principle detect several pathogens (especially for pep > 1).This suggests interpreting different pathogens as strains of the same species or different species that are closely related.For the Gaussian model, a single MHC molecule can generally not detect different pathogens with a high probability.This suggests to interpret different pathogens as clearly distinct species, possibly belonging to different higher taxonomic groups.This goes in line with our finding that for a given number of pathogens, allelic diversity in the Gaussian model tends to exceed allelic diversity in the bit-string model.

Discussion
Heterozygote advantage (HA) as an explanation for the coexistence of a large number of alleles at a single locus has a long history in evolutionary genetics.Kimura and Crow (1964) and subsequently Wright (1966) showed that HA can in principle result in the coexistence of an arbitrary number of alleles at a single locus if two conditions are met: (1) all heterozygotes have a similarly high fitness, and (2) all homozygotes have a similarly low fitness.One special class of genes fulfilling these assumptions occur at self-incompatibility loci, where mating partners need to carry different alleles for fertilisation to be successful (Wright, 1939;Castric and Vekemans, 2004), or loci where homozygosity is lethal (Ding et al., 2021).However, more generally these conditions were deemed unrealistic by Kimura, Crow and Wright themselves.This assessment was subsequently confirmed by Lewontin et al. (1978), who investigated a model in which the exact fitnesses are determined by drawing random numbers in a manner that all heterozygotes are more fit than all homozygotes.They found that the proportion of fitness arrays that leads to a stable equilibrium of more than six or seven alleles is vanishingly small.Similarly, the idea that the high allelic diversity found at MHC loci can be explained by HA was initially accepted by theoreticians (e.g.Maruyama and Nei, 1981;Takahata and Nei, 1990), but several later authors studying models based on more mechanistic assumptions were unable to reliably predict the coexistence of significantly more than ten alleles (Spencer and Marks, 1988;Hedrick, 2002;de Boer et al., 2004;Borghans et al., 2004;Stoffels and Spencer, 2008;Trotter andSpencer, 2008, 2013;Ejsmond and Radwan, 2015;Lau et al., 2015).Thus, currently HA is largely dismissed as an explanation for highly polymorphic loci (Gould et al., 2004;Eizaguirre and Lenz, 2010;Lenz, 2011;Hedrick, 2012).
Our study, while not meant to be a highly realistic mechanistic representations of the interaction between MHC genes and pathogens, serves as a proof of principle that a high number of alleles, matching those found at MHC loci in natural populations, can indeed arise in an evolutionary process driven by HA.Our results thus revive the idea that HA has the potential to explain extraordinary allelic diversity.Importantly, and in contrast to several of the above-mentioned studies, this is achieved without making direct assumptions about homozygote and heterozygote fitnesses.Instead, our results emerge from two assumptions about how pathogens affect a host's condition and how this, in turn, affect survival.Assumption (a) states that pathogens are lethal in the absence of an appropriate immune response.This assumption is implemented in our model by assuming that each pathogen decreases a host's condition in a proportional manner (Equation 4), rather than by a fixed amount.Assumption (b) states that the effects of pathogens are not independent of each other.Instead, the combined effect of multiple pathogens on host survival exceeds the sum of the effects of each pathogen alone.Thus, many pathogens against which a host has an imperfect immune response can collectively push a host's condition below a threshold where mortality becomes rather high (orange lines in Figure 3 and Figure 6).In our model, this assumption is fulfilled rather naturally.Since the probability to survive can logically not exceed 1, the function that maps condition to survival has to be saturating (Equation 5).
In the following, we detail how assumptions (a) and (b) can result in the emergence of well over 100 alleles such that heterozygotes have similarly high fitness (condition (1) of Kimura and Crow) and homozygotes have similarly low fitness (condition (2) of Kimura and Crow).We start with the observation that the survival probabilities in evolved polymorphic populations varies between individuals (lower panels in Figure 4 and Figure 5B-D).Part of the population consists of individuals that have very low survival probabilities.These are individuals with a condition value considerably less than and they are almost exclusively homozygotes.This is because, whenever polymorphism is favoured, homozygotes are poorly defended against some pathogens and the fact that pathogens affect condition multiplicatively (Equation 4).The remaining part of the population consists of individuals with condition values considerably above .Although the condition of these individuals can differ by several orders of magnitude, their survival is close to 1, which results from the fact that the function that maps condition to survival is saturating.These individuals are almost exclusively heterozygotes.This is because alleles that protect against complementary sets of pathogens, when paired together, offer at least a decent protection against all pathogens.In summary, our assumptions (a) and (b) lead to a set of alleles such that their survival probabilities fall into two clusters as required for condition (1) and (2) of Kimura and Crow (1964) to be fulfilled.The larger the number of alleles, the lower becomes the proportion of vulnerable homozygotes, and the population consists increasingly of equally fit heterozygotes.2).In contrast to our model, they assume that an individual's condition equals the proportion of detected pathogens, meaning that each pathogen can reduce fitness by only 2% (thereby not fulfilling our assumption a).Additionally, they assume that survival is proportional to the squared condition (not fulfilling our assumption b). Figure S4 in SI Appendix 5 shows a run of our bit-string model with the parameter values used by Borghans et al. (2004), resulting in more than 100 coexisting alleles.In contrast, they find only up to seven coexisting alleles, demonstrating that assumption (a) and (b) in our model drive the high number of coexisting alleles found by us.
Currently, there are several mechanisms proposed to explain the diversity observed at MHC loci.First, in the presence of an HA, each allele has an advantage when rare because it almost always occurs in heterozygotes.Thus, there is negative frequency-dependent selection acting at the level of the allele.In addition, negative frequency-dependent selection can arise from, for example, Red-Queen dynamics, fluctuating selection and disassortative mating (Apanius et al., 1997;Hedrick, 1999;Penn, 2002;Borghans et al., 2004;Wegner, 2008;Spurgin and Richardson, 2010;Loiseau et al., 2011;Ejsmond and Radwan, 2015;Lighten et al., 2017).These mechanisms are similar to HA in the sense that the selective advantage of an allele increases with decreasing frequency.However, they do not result in heterozygotes being more fit than the homozygotes carrying the rare allele.In addition, neutral diversity can be enhanced by recombination (Klitz et al., 2012;Linnenbrink et al., 2018;Robinson et al., 2017).If many individuals are heterozygous, the particularly high levels of gene conversion found at MHC genes can be effective in creating new allelic variants.For instance, for urban human populations with a large effective population size of e = 10 6 and a per-capita gene conversion probability of = 10 −4 an effective number of alleles as high as e = 1 + 4 e = 401 can theoretically be maintained by gene conversion (Klitz et al., 2012).However, it is important to point out that for gene conversion to increase allelic diversity, some genetic polymorphism due to balancing selection has to exist to start with.We do not claim that these mechanisms do not play an important role in maintaining allelic diversity at MHC loci.Rather, our results show that, contrary to the currently widespread view, HA should not be dismissed as a potent force.In any real system, these different mechanisms will jointly affect allelic diversity.For instance, Lighten et al. (2017) present a model in which, for Red-Queen co-evolution to maintain allelic polymorphism, HA in the form of a divergent allele advantage (Wakeland et al., 1990) seems to be a necessary ingredient.Similarly, Borghans et al. (2004) show that pathogen co-evolution can further increase the number of co-existing alleles compared to HA alone.
The aim of our study is to understand how HA on its own can can result in allelic polymorphism.For this reason, we kept all aspects concerning pathogens fixed, thereby excluding Red-Queen dynamics and fluctuating selection.It is, however, known that pathogens evolving to avoid an immune response can increase their virulence (Frank and Schmid-Hempel, 2008), and the transmissionvirulence trade-off hypothesis (Anderson and May, 1982;Frank, 1996;Alizon et al., 2009) predicts that pathogens that harm their host relatively little, i.e., pathogens that have low virulence, evolve towards higher virulence to increase their transmission rate.In line with this hypothesis, we speculate that a model that incorporates pathogen evolution could lead to the evolution of pathogens with higher virulence whenever they inflict little harm on their hosts.This scenario applies in the dark blue parameter regions in Figure 4 and 5, where the host populations possess a single effective generalist allele.In these regions, we hypothesise that pathogen evolution would lead to higher virulence, entering parameter regions where allelic polymorphism becomes adaptive.Hence, with evolving pathogens, not only could the host evolve towards allelic configurations where HA can maintain high allelic diversity (provided pathogens are sufficiently virulent), but pathogens themselves might also evolve levels of virulence that would allow for HA polymorphism to evolve in the first place.
Our Gaussian model is not restricted to MHC genes, but can apply to any gene that affects several functions important for survival.Examples are genes that are expressed in different ontogenetic stages or different tissues with competing demands on the optimal gene product.However, gene duplication is expected to reduce the potential number of coexisting alleles per locus and eventually leads to a situation where the number of duplicates equals the number of functions (Proulx and Phillips, 2006).Under this scenario, the high degree of polymorphism reported here would be transient.However, for MHC genes evidence exist that other forces limit the number of MHC loci (Penn, 2002;Wegner, 2008;Eizaguirre and Lenz, 2010;Spurgin and Richardson, 2010).But it is important to point out that, while our model focuses on evolution at a single MHC locus, many vertebrates have more than one MHC locus with similar functions (Wegner, 2008;Eizaguirre and Lenz, 2010;Spurgin and Richardson, 2010).The diversity generating mechanism described here still applies if the different loci are responsible for largely non-overlapping sets of pathogens, indicating that the mechanism presented here can in principle explain the high number of coexisting MHC alleles.
In summary, our research offers a fresh view that can help us to understand allelic diversity at MHC loci.We identify two crucial assumptions related to pathogen-host interactions, under which we show that heterozygote advantage emerges as a potent force capable of driving the evolution of a very high number of coexisting alleles.
*Corresponding author: mattias@siljestam.comGaussians (v = 2.5, as in Figure 1C).As a consequence, the condition for evolutionary branching (v > 2 √ m) is not fulfilled and the evolutionary dynamics result in a monomorphic population consisting essentially of only the generalist allele x * = (0, 0).This result is independent of the half-saturation constant K, here chosen to be K = 10.

Table of mathematical notation
List of all mathematical symbols used in the Supplementary Information.Bold italic font indicates vectors (e.g., x) while normal italic font indicates numbers or scalar-valued functions.Capital letters in sans serif font indicate matrices (e.g., H).

Varying the expected mutational step size in the Gaussian model
For the Gaussian model, mutations are drawn from a rotational symmetric normal distribution, i.e., a matrix with covariance matrix σµI of dimension h.The expected mutational step-size δ is given by σµ times the expected value of the Chidistribution (Equation 18.14 in Johnson et al., 1994),  E-G).In the extreme case shown in G (pathogen vectors are 1/0.001= 1000 average mutational steps apart), the evolutionary dynamics is strongly limited by the rate of phenotypic change due to the small step size and the number of alleles after 10 7 time steps has reached only 10% the number reached in D. Increasing the average mutational step size also slows down the build-up of allelic diversity (A-C).In the extreme case shown in A (pathogen vectors are 1.25 average mutational steps apart), the evolutionary dynamics are strongly limited by the very large proportion of maladapted mutants.Other parameters (as in Figure 4): N = 10 5 , µ = 5 × 10 −7 .

Deviations from symmetry in the Gaussian model
The number of coexisting alleles for different parameter combinations are shown in Figure 4 in the main text.These results are based on two symmetry assumptions.First, the m points describing by the pathogen vectors are placed equidistantly with d = 1, resulting in a regular (m − 1)-simplex.Second, the multivariate Gaussian functions e k describing the MHCmolecule's efficiencies against the different pathogens are rotational symmetric and have equal width, as shown in Figure 1.Thus, the covariance matrices Σ k in Equation S14 are equal to σ −2 G I, where I is the identity matrix.Here, we test the robustness of the outcome shown Figure 4 with respect to violations of these symmetry assumptions.We focus on the case with eight pathogens, and the results are summarized in Figure S3.Panel (A) is identical to the bottom right panel in Figure 4, and shown here for comparison.Panels (B-D) show the final number of coexisting alleles for increasing deviations from symmetry, as explained in the following.Note that each pixel in the figure is based on a single simulation with a unique random perturbation from symmetry.
In Figure S3B the assumption of symmetrically placed pathogen vectors is perturbed while the Gaussian functions e k are kept rotationally symmetric with equal width.Section 4.1 describes the procedure how the positions of the pathogen vectors are randomized.The similarity between panel (A) and (B) indicates that deviations from a symmetric placement of pathogen vectors has a minor effect on the number of coexisting alleles.The slightly decreased smoothness of the contours corresponding to 50 and 100 coexisting alleles stems from the fact that each simulation (corresponding to a pixel) is based from a unique perturbation.Note that polymorphism can emerge for values of v such that the branching condition v > 2 √ m derived for the symmetric case is not fulfilled (below the red arrow).This can be understood based on the expression for the Hessian matrix given in Equation S43.This Hessian matrix is more likely to be positive definite for asymmetrically placed pathogen vectors.
In Figure S3C and D we, additionally to the non-symmetric placement of pathogen vectors, allow for Gaussian functions e k that are not rotational symmetric.The variances of the perturbed covariance matrices are drawn from the interval ) and constrained such that the average variance is equals 1, and then rotated randomly.Section 4.2 describes this procedure in detail.Panel (C) shows the result for modest (ε = 0.2) and panel (D) for strong (ε = 0.5) deviations from rotational symmetry.Comparing panel (C) to (B) indicates that modest deviations from rotational symmetry have a relatively minor effect on the final number of coexisting alleles.In contrast, in panel (D) configurations exist where significantly fewer alleles are able to coexist.Interestingly, configurations resulting in a high number of alleles are more likely to occur in combination with high K-values.The highly irregular pattern results from each pixel corresponding to a single simulation with a unique random perturbation from Furthermore, the threshold for polymorphism decreases even more because the Hessian matrix given in Equation S9 is even more likely to be positive definite with perturbations in Σ k .

Random placement of pathogen vectors.
We here describe how we randomly place eight pathogen vectors in trait space.In order to keep the results comparable to the symmetric case, we keep the average variance calculated from the position of their mid-points constant.The distribution of eight pathogen vectors can be described by their seven dimensional covariance matrix Σp calculated from the coordinates p1, . . ., p8.Since each diagonal element of Σp describes the variance of the pathogen vectors along a different dimension of the trait space, the average variance equals tr(Σp)/7, where tr(Σp) denotes the trace.This measure is unaffected by rotation of the points p1, . . ., p8.For symmetrically placed pathogen vectors Σp,sym = d 2 /(2m)I, where I denotes the identity matrix, and therefore tr(Σp,sym) = d 2 (m − 1)/(2m).For the pathogens with perturbed placements (with covariance matrix Σp,per), we demand tr(Σp,per) = tr(Σp,sym).We achieve this by first choosing eight preliminary points p1, . . ., p8 that are placed randomly within a unit 7-sphere, having the covariance matrix Σp.By subsequently multiplying the coordinates of these points by a scalar α, the variances in Σp are multiplied by α 2 .By setting with m = 8, we obtain the final set of pathogen vectors p1, . . ., p8 with a covariance matrix Σp,per fulfilling tr(Σp,per) = tr(Σp,sym).

Random covariance matrices for the pathogen efficiencies.
We here describe how we create random covariance matrices Σ k .In order to keep the results between the symmetric and asymmetric case comparable, we fix the mean variance over all Σ k to σ 2 = v −2 .We obtain the eight random covariance matrices Σ1, . . ., Σ8 in the following manner.First, eight random diagonal matrices D1, . . ., D8 are determined (one per pathogen vectors) with entries drawn from a uniform distribution U(1 − ε, 1 + ε).These matrices are then multiplied with the scalar We here present a comparison of our bit-string model with that of Borghans et al. (2004).These authors analyse a bit-string model with m = 50 pathogens (that are allowed to mutate but are not subject to selection), with npep = 20 peptides each, a virulence of v = 7 (with a step function for the probability that an MHC molecule detects a peptide), a population size of N = 10 3 and a per-capita mutation probability of µ = 10 −5 .In contrast to our model, they assume that condition equals the proportion of detected pathogens, such that each pathogen can lower fitness by only 2% (not fulfilling our assumption a) and that survival is proportional to the squared condition (not fulfilling our assumption b).With these parameters and parameter values, their simulation results in up to seven alleles.We note, that the effective number of alleles in these simulations is likely lower, but no allele frequencies are given.
We contrast their results with those from our model, which, as detailed in the main part, fulfils assumptions a) and b).To approximate the step function for the detection probability, we use For this function, v is the required match length L for a 99% chance of detection, while a match length L = v − 1 gives only 1% detection probability.Note, that compared to Equation 2, we here subtract 1/2 in the denominator and a = 2 log(99).
Then, our model with the exact same parameters (omitting pathogen mutations) results in 18 alleles and ne = 16.7,clearly exceeding the number of alleles found by Borghans et al. (2004).
Based on Kimura and Crow (1964), for the above N and µ the effective number of alleles that can be maintained by heterozygote advantage cannot exceed ne = 17.6 at mutation-drift-selection balance.This suggests that the allelic diversity found by Borghans et al. (2004) is likely not limited by the parameters affecting mutation and drift, µ and N .In contrast, our final number of alleles (being 95% of the maximum), is likely limited by these parameters.To demonstrate that this is indeed the case, we simulate our model with N = 10 5 and µ = 5 × 10 −6 , shown in Figure S4.We find well over 100 alleles (n = 157 and ne = 140).This demonstrates that the ecological parameter values used by Borghans et al. (2004), m = 50, npep = 20 and v = 7, under our model allows for more than a 20-fold higher allelic diversity.For the Gaussian model presented in the main part, we investigate with an evolutionary invasion analysis using the adaptive dynamics formalism (Dieckmann and Law, 1996, Geritz et al., 1998, Metz et al., 1992) whether selection favours a single generalist allele or a polymorphic population.In the language of adaptive dynamics, we ask whether a monomorphic population evolves toward an evolutionary branching point, where two coexisting allelic lineages emerge.
Let us consider a large population of N individuals with two segregating alleles x1 and x2 under Wright-Fisher population dynamics (Fisher, 1930, Wright, 1931) is the population mean survival at time t.Note, that the expression within brackets on the right-hand side of Equation S5 describes the marginal fitness of allele xa.Consider a resident population carrying allele x to which a mutant allele y = x + is introduced.In the limit of a mutant allele-frequency close to zero, its marginal fitness is given by We refer to w(y, x) as invasion fitness, which is the expected long-term exponential growth rate of an infinitesimally rare mutant allele y in a resident population with allele x (Metz, 2008, Metz et al., 1992).Allele y has a positive probability to invade and increase in frequency if w(y, x) > 1 and disappears otherwise.denote the gradient of invasion fitness with respect to the mutant allele y, evaluated at y = x, with w(x, x).It has the entries and gives the direction in the h-dimensional allelic trait space in which deviations from x result in the fastest increase of invasion fitness.If mutations rarely occur, a mutant allele y will either go extinct or reach an equilibrium frequency before the next mutant appears.If, additionally, w(x, x) = 0 and mutational effects are sufficiently small (i.e., y = x + for small), then invasion of y implies extinction of x (Dercole andRinaldi, 2008, Priklopil andLehmann, 2020).
In the limit of small mutational steps, the evolutionary dynamics of an allelic lineage becomes gradual and is given by et al., 2006, Dieckmann and Law, 1996, Durinx et al., 2008, Metz and de Kovel, 2013).Here, µ is the per-capita mutation probability and C the covariance matrix for the distribution of mutational effects on the trait x.We note that Equation S8 is structurally similar to the gradient equation of quantitative genetics, which is based on the assumption of weak selection or, equivalently, small genetic variances (Abrams et al., 1993, Débarre et al., 2014, Iwasa et al., 1991, Lande, 1979).In this case, x characterizes the mean of the phenotype distribution, the covariance matrix describes the distribution of the standing genetic variation, and the factor µN is replaced with a constant.
Allelic trait values x where w(x, x) = 0 are of special interest, and such x are referred to as evolutionarily singular points x * .Evolutionarily singular points can be either attractors or repellers of the evolutionary dynamics described by Equation S8.Furthermore, an evolutionarily singular point can be either invadable or uninvadable by nearby mutants.For a resident allele with a one-dimensional trait x = x, a classification of singular strategies is straight forward (Geritz et al., 1998).Evolutionarily singular points that are not approached, irrespective of whether they are invadable or uninvadable, act as repellers, and we do not expect to ever find resident alleles with such values.Evolutionarily singular strategies that are attractors and uninvadable are endpoints of the evolutionary dynamics.Finally, evolutionarily singular points that are attractors and invadable are known as evolutionary branching points.In this case, any nearby mutant can invade the singular point and coexist with it in a protected dimorphism.Further evolution leads to divergence of the alleles present Without loss of generality, c(x * , x * ) is standardized to 1 (by choosing cmax in Equation S13 appropriately).This is helpful because it allows us to choose an interval of K-values where individuals homozygous for the generalist allele have either a condition in the range where survival changes rapidly (K >> 1) or slowly (K << 1) with condition.In Figure S3 and 4, the x-axis can be translated into survival s(x * , x * ) of the generalist genotype using Equation S12, which then varies between 0.01 for K = 10 −2 and 0.99 for K = 10 2 .
We assume that the efficiencies e k (x) of inducing immune defence against the m different pathogens are traded off.This trade-off emerges by describing the efficiencies against different pathogens with multivariate Gaussian functions (see Figure 1) that have pathogen-dependent optima, These function describes how the efficiency of an allele characterized by the h-dimensional vector x decreases with increasing distance from the pathogen vector p k .The closer an allelic trait vector is to a pathogen vector, the higher is the efficiency of the MHC molecule against that pathogen.The magnitude of the decrease in efficiency with increasing distance to the kth pathogen is determined by the shape and width of the Gaussian function as determined by the h-dimensional covariance matrix Σ k .
In the main part, we consider the special case of rotationally symmetric Gaussian functions e k (x).These matrices are thus specified by an inverse matrix-covariance matrix Σ −1 k (see Equation S14) that takes the form of a scalar matrix, that is, a scalar multiple of the identity matrix I. Furthermore, we assume that all Gaussians are of equal width.Hence, we have a common scalar for all Gaussians that we denote with v 2 , that is, v is the inverse of the width of the Gaussian function.We refer to v as virulence (see Section 7.5, below).
7. Analytical results for the Gaussian model 7.1.The evolutionarily singular point.
In this section, we analyse the evolutionary dynamics of a monomorphic resident population in full generality.By subsequently applying several symmetry assumptions, we then derive the analytical results presented in the main text (see Section 7.4 and 7.5).Invasion fitness of a rare mutant allele y in a resident population with allele x is given by its marginal fitness, (see derivation of Equation S6).Note, that smax cancels out.It is therefore omitted from all further calculations.The direction of the evolutionary dynamics is governed by the selection gradient.Its ith entry calculates to Using the definitions for s (Equation S12) and c (Equation S13) and their derivatives, where Equation S19 is obtained by applying the generalised product rule In the next step, we calculate the derivative of the function e k (x) (Equation S14).Applying the chain rule and simplifying results in ) where the entries of the matrix Σ −1 k are denoted by σ −1 kjl .Using that dxj/dxi = 0 for i = j and dxj/dxi = 1 for i = j this further simplifies to as stated in the main text.
7.2.Absolute convergence stability.Below, we prove that the unique singular point x * (Equation S27) is absolutely convergence stable.To this end, we first demonstrate that the gradient of invasion fitness can be expressed as w(x, x) = α(x) c(x, x), where α(x) is a positive function, and then show that x * is a maximum of c(x, x).where the fraction before c(x, x) is positive.Thus, w(x, x) = α(x) c(x, x).where the last simplification uses that dxj/dxi = 0 for i = j and dxj/dxi = 1 for i = j, and which is always negative definite.Thus, c(x * , x * ) is a maximum and x * is absolute convergence stable.

Derivation of the Hessian matrix of invasion fitness.
As stated in Equation S9, the entries of the Hessian matrix of invasion fitness are given by hij = ∂ 2 w(y, x * ) ∂yi∂yj where in the last step we substituted Equation S23.This result can be rewritten as a matrix Finally, substituting x * with Equation S27, we obtain ) 7.4.Special case: Identically shaped Gaussian efficiency functions.
For the special case that the Gaussian covariance matrices Σ k are equal (Σ1 = Σ2 = . . .= Σm = Σ G ), the Hessian matrix simplifies to where we used that Σp = (

Figure 1 .
Figure 1.Efficiency against two pathogens (coloured lines in A-B) and three pathogens (coloured cones in C-D) as a function of allelic values .Efficiencies are modelled with Gaussian functions with pathogen optima at equal distances = 1 (indicated by 1 and 2 in A, B).The width of the Gaussian functions, which determine how severely pathogens affect hosts with suboptimal MHC molecules, is given by the virulence parameter .With high virulence ( = 7, narrow Gaussians in B, D), alleles away from the optima have a low efficiency, while for a low virulence ( = 2.5, wide Gaussians in A, C) efficiency is higher.Gray lines and cones give the condition of homozygote individuals.The generalist allele, maximizing condition, is located at the centre with equal distance to all pathogen optima (indicated by * in A, B).

Figure 2 .
Figure 2. (A) MHC bit-string matching against a pathogen with pep = 10 peptides.Yellow indicates a match between MHC and peptide bits.The longest consecutive match per peptide ( ) is indicated with a black box.The longest match over all peptides occurs for the last peptide, marked in green, with match length = 7. (B) Detection probability for peptides as a function of match length (Equation 2 with = log(9)).The dashed lines indicate, from left to right, 10%, 50% and 90% detection probability.

Figure 3 .
Figure 3. Evolution of allelic values under the Gaussian model in the presence of three pathogens (arranged as in Figure 1D) for three different values of the survival half-saturation constant (A: = 10, B: = 1, C: = 0.1, D: = 0.01; dashed line in lower panel).The top panel shows individual-based simulations.The two horizontal axes give the two allelic values = ( 1 , 2) that characterize an allele, while the vertical axis shows evolutionary time.The thickness of the blue tubes is proportional to allele frequencies.Allelic values at the last generation are projected as red dots on the top as well as on the bottom plane.Coloured circles represent the contour lines of the Gaussian efficiency functions ( ) shown in Figure1D.In all simulations, gradual evolution leads toward the generalist allele * = (0, 0) and branching occurs in its neighbourhood, as predicted by our analytical derivations (SI Appendix 7.1).In A there are three consecutive branching events with the second branching event marked by the grey plane.B and C show that, as decreases, the number of branching events increases, resulting in more coexisting alleles.Finally, D reveals that, as decreases even further such that already low condition values result in high survival, the number of branching events decreases again, resulting in a set of alleles closely clustered around the generalist allele.The bottom panel shows survival as a function of condition as defined by Equation 5 on a log-log scale (orange line, left vertical axis) and the frequencies of individual conditions at the final generation (dark blue bars for homozygotes and light-blue bars for heterozygotes, right vertical axis; conditions from 0 to 10 −4 are incorporated into the first bar).Other parameter values:= 2 × 10 5 , = 7, = 10 −6 , and = 0.016.

Figure 4 .
Figure 4. Number of coexisting alleles under the Gaussian model for pathogens as a function of pathogen virulence and the survival half-saturation constant .Figures are based on a single individual-based simulation per pixel.The clear pattern in the figures indicates a high degree of determinism in the simulations.Results are reported in terms of the effective number of alleles e , which is a conservative measure for the number of alleles, discounting for rare alleles present at mutation-drift balance.Using population size = 10 5 and per-capita mutation probability = 5 × 10 −7 , the expected e under mutation-drift balance alone equals 1.2 (see Appendix 1).Dashed and solid lines give the contours for e = 50 and e = 100, respectively.Red arrows indicate = 2 √ , the threshold for polymorphism to emerge from branching (EquationS45).Accordingly, simulations in the dark blue area result in a single abundant allele with e close to one.Simulations are run for 10 6 generations, assuring that the equilibrium distribution of alleles is reached.Other parameters: expected mutational step size = 0.03.

Figure 5 .
Figure 5. Number of coexisting alleles for the bit-string model for four values of virulence as a function of the number of pathogens (increased in steps of 7) and the number of peptides per pathogen pep .Results are based on a single individual-based simulation per pixel and run for 10 6 generations.Results are reported in terms of the effective number of alleles e , which discounts for rare alleles present at mutation-drift balance (see Appendix 1).Using population size = 10 5 and per-capita mutation probability = 5 × 10 −6 , the expected e under mutation-drift balance alone equals 3. Dashed and solid lines give the contours for e = 50 and e = 100, respectively.Evolution started from populations monomorphic for a random allele, and run for 2 × 10 6 generations, assuring that the equilibrium distribution of alleles is reached.Other parameters: halfsaturation constant = 1.

Figure 6 .
Figure 6.A simulation run showing the evolution of allelic diversity under the bit-string model in the presence of = 12 pathogens.Panel A show the number of alleles and the effective number of alleles e as a function of time (on a log-log plot).Panel B-D gives survival as a function of condition as defined by Equation 5 on a log-log scale (orange line, left vertical axis) and the distribution of conditions at three different time points (B: = 100, C: = 3900, D: = 10 6 ; green dashed lines in A), with dark blue bars for homozygotes and light-blue bars for heterozygotes, right vertical axis (conditions from 0 to 10 −4 are incorporated into the first bar, and conditions from 10 4 and greater are incorporated in the last bar).The black dashed lines indicate the value of = 1.Other parameter values: = 7, = 12, pep = 3, = 10 5 , = 5 × 10 −6 .
Borghans et al. (2004) use a bit-string model similar to ours with = 50 pathogens, pep = 20 peptides, a virulence of = 7 and a step function for the probability that an MHC molecule detects a peptide ( → ∞ in Equation

57
Fig. S1.Evolution of allelic values in the presence of three pathogens.This figure is analogous to Figure3(see that legend for details) but with wider Gaussians (v = 2.5, as in Figure1C).As a consequence, the condition for evolutionary branching (v > 2 √ m) is not fulfilled and the evolutionary dynamics result in a monomorphic population consisting essentially of only the generalist allele x * = (0, 0).This result is independent of the half-saturation constant K, here chosen to be K = 10.

2 G
the Gaussian efficiency function e k for pathogen k (Equation S14) Σ G covariance matrix of the Gaussian efficiency function e k , assuming equal covariance matrices for all pathogens σ variance of the Gaussian efficiency function e k , assuming equal and rotational symmetric covariance matrices for all pathogens Σp covariance matrix of the position of the pathogen vectors (Gaussian model) σ 2 p variance of the position of the pathogen vectors under full symmetry (Gaussian model) v virulence; given by σ −1 G for the Gaussian model and the threshold value of the detection function D in the bit-string model xa allelic trait vector of allele a x allelic trait vector of a resident allele y allelic trait vector of rare mutant allele Siljestam and Rueffler S3 Fig. S2.Number of coexisting alleles as they emerge in individual-based simulations for different expected mutational step sizes δ and eight pathogens (m = 8).Parameters are chosen such that up to 200 alleles can evolve (K = 0.5, v = 20; see bottom right panel in Figure 4 in the main text).Solid orange lines and dotted blue lines give the effective number ne and the absolute number n of alleles, respectively.The number of alleles increases fastest and saturates earliest for an intermediate expected mutational step size of δ = 0.03 (D; pathogen vectors are 1/δ = 1/0.03≈ 30 average mutational steps apart) as used in Figure 4. Decreasing the average mutational step size slows down the build-up of allelic diversity (E-G).In the extreme case shown in G (pathogen vectors are 1/0.001= 1000 average mutational steps apart), the evolutionary dynamics is strongly limited by the rate of phenotypic change due to the small step size and the number of alleles after 10 7 time steps has reached only 10% the number reached in D. Increasing the average Fig. S3.Number of coexisting alleles for eight pathogens as a function of pathogen virulence v and the half-saturation constant K for symmetrically (A) and non-symmetrically placed pathogen vectors (B-D).Results are based on a single individual-based simulation per pixel.These were run for 10 6 generations, assuring that the equilibrium distribution of alleles is reached.Panel (A) shows results for equally spaced pathogen vectors and rotational symmetric functions e k (Equation S14).It is identical to the bottom right panel in Figure 4 and shown here for comparison.Panel (B-D) show the result for increasing perturbations from symmetry.In panel (B), pathogen vectors are placed randomly (see Section 4.1 for details) while the functions e k are kept rotationally symmetric.In panel (C) and (D), additionally to the non-symmetric placement of pathogen vectors, the functions e k are independently perturbed from rotational symmetry (see Appendix 4.2 for details).In panel (C) the deviations from rotational symmetry are moderate, while in panel (D) they are strong.Note that in panel (B-D) pathogen vectors are no longer at a constant distance 1, but instead have the mean variance calculated from the pathogen optima corresponds to the variance of symmetrically placed pathogens optima with distance 1. Results are reported in terms of the effective number of alleles ne, which discounts for alleles arising from mutation-drift balance (see Appendix 1).Dashed and solid lines give the contours for ne = 50 and ne = 100, respectively.Red arrows indicate v = 2 √ 8, the minimal value for polymorphism to emerge from branching under full symmetry (Equation S45).Accordingly, simulations in the dark blue area result in a single abundant allele with ne close to one.Other parameters: population size N = 10 5 , per-capita mutation probability µ = 5 × 10 −7 , expected mutational step size δ = 0.03.
Fig. S4.The number of alleles n and the effective number of alleles ne as a function of time (on a log-log plot) for a simulation run of our bit-string model.Parameters values: N = 10 5 and µ = 5 × 10 −6 .Other parameters as in Borghans et al. (2004): v = 7, m = 50, npep = 20.
into Equation S21 finally results in w(x, x)i = K 2 K + c(x, x) Section 6.1, singular points x * are allelic trait vectors for which Equation S25 equals zero.From Equation S25 follows that in our model singular points have to fulfil unique singular point x * equals the arithmetic mean of the pathogen vectors p1, . . .pm, each weighted by the inverse of their Gaussian covariance matrices Σ1, . . ., Σm.For a one dimensional trait space (h = 1) this simplifies to well-known weighted average for scalars.If Σ1 = . . .= Σm, then Equation S27 simplifies to the arithmetic mean of condition.An individual homozygote for x has a condition given by c(x, x) ith entry of the gradient c(x, x) is given byc(x, x)i = ∂c(x, x) ∂xi = c(x, x) comparing Equation S33with Equation S25 we see thatw(x, x) = K 2 K + c(x, x) m k=1 Σ −1 k (p k − x) = K 2c(x, x) K + c(x, x) c(x, x), (S34) matrix of condition.The Hessian matrix of a homozygote individual's condition, evaluated at the singular point, is given by the second order partial derivative of c(x * , x * ) with respect to the ith and jth entry of x * .We obtain this by differentiating Equation S33 with respect of xj, evaluated at x = x * , resulting in hcij = ∂ 2 c(x * , x * ) product rule and using that the first derivative at x * is zero gives hcij = c(x * , x * )

K
of the function s is obtained by differentiating Equation S18 with respect to yj, resulting in∂ 2 s(y, x * ) + c(x * , x * ) 4 ∂ 2 c(y, x * ) ∂yi∂yj y=x * K + c(x * , x * ) 2 − ∂c(y, x * ) ∂yi y=x * ∂c(y, x * ) 2 ∂yj y=x * = K K + c(x * , x * ) 2 ∂ 2 c(y, x * ) ∂yi∂yj y=x * .(S39)In the final simplification step we use the conclusion drawn from Equation S21 that 0 = ∂c(y, x * )/∂yi|y=x * and therefore the term after the minus sign disappears.The second derivative for the function c is obtained by differentiating Equation S19 with respect to yj, resulting in∂ 2 c(y, x * ) x * ) 2 2 ∂ 2 e k (y) ∂yi∂yj y=x * e k (x * ) − ∂e k (y) ∂yi y=x * ∂e k (y) ∂yj y=x * .(S40)Here, the one but last simplification step again follows from the fact that 0 = ∂c(y, x * )/∂yi|y=x * .The second derivative for the function e k is obtained by differentiating Equation S23 with respect to yj, resulting in∂ 2 e k (y) k il (p kl − y l ) − e k (y) ∂e k (y) ∂yi ∂e k (y) ∂yj − e k (y)σ −1 k ij , (S41)where the last simplification uses that dyj/dyi = 0 for i = j and dyj/dyi = 1 for i = j.By recursively substituting Equations S39-S41 into Equation S9 we obtainhij = K K + c(x * , x * ) k jl (p kl − x * l ) − 2σ −1 k ij , m k=1 (p k − p)(p k − p) T )/m is the covariance matrix of the positions p k of the pathogen vectors.Since the fraction in Equation S43 is positive and Σ G −1 is positive definite, it follows that the definiteness of H is given by the definiteness of the matrix Σp − 2Σ G .Hence, the singular point x * is uninvadable whenever Σp − 2Σ G is negative definite.Panel (B) in Figure S3 is based on this case.

.
The allelic frequencies at time t are denoted fx 1 ,t and fx 2 ,t, respectively.The recurrence equation for the change of frequency of an allele xa ∈ {x1, x2} is then given by fx a,t+1 = fx a ,t fx a ,t ) is the survival of an individual carrying the alleles xa and x b (see Equation S12) and