Mathematical constraints on FST: multiallelic markers in arbitrarily many populations

Interpretations of values of the FST measure of genetic differentiation rely on an understanding of its mathematical constraints. Previously, it has been shown that FST values computed from a biallelic locus in a set of multiple populations and FST values computed from a multiallelic locus in a pair of populations are mathematically constrained as a function of the frequency of the allele that is most frequent across populations. We generalize from these cases to report here the mathematical constraint on FST given the frequency M of the most frequent allele at a multiallelic locus in a set of multiple populations. Using coalescent simulations of an island model of migration with an infinitely-many-alleles mutation model, we argue that the joint distribution of FST and M helps in disentangling the separate influences of mutation and migration on FST. Finally, we show that our results explain a puzzling pattern of microsatellite differentiation: the lower FST in an interspecific comparison between humans and chimpanzees than in the comparison of chimpanzee populations. We discuss the implications of our results for the use of FST. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.


Introduction
Multiallelic loci such as microsatellites and haplotype assignments are used to study genetic differentiation in a variety of fields, ranging from ecology and conservation genetics to anthropology and human genomics. Genetic differentiation is often measured for multiallelic loci using the multiallelic extension of Wright's fixation index F ST [1]:

$$F_{ST} = \frac{H_T - H_S}{H_T}. \qquad (1.1)$$

For a polymorphic multiallelic locus with I distinct alleles in a set of K subpopulations, denoting by $p_{k,i}$ the frequency of allele i in subpopulation k, $H_S = 1 - \frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{I} p_{k,i}^2$ and $H_T = 1 - \sum_{i=1}^{I}\big(\frac{1}{K}\sum_{k=1}^{K} p_{k,i}\big)^2$. F ST values are known to be smaller for multiallelic than for biallelic loci [2]. One reason invoked to explain this difference is that within-subpopulation heterozygosity H S mathematically constrains the maximal value of F ST to be below 1, and the constraint is stronger when H S is high. This phenomenon was noticed concurrently in simulation-based, empirical and theoretical studies [3–7], and the mathematical constraints describing the dependence were subsequently clarified [8,9].
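To make equation (1.1) concrete, the following minimal Python sketch computes H S, H T and F ST from a matrix of allele frequencies. It assumes the frequencies are supplied as a list of K rows (one per subpopulation), each a list of I frequencies summing to 1, with subpopulations weighted equally; the function names are illustrative and not taken from any published package.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def heterozygosities(p: Matrix) -> Tuple[float, float]:
    """Return (H_S, H_T) for a K x I matrix, p[k][i] = frequency of allele i in subpopulation k."""
    K = len(p)
    I = len(p[0])
    # H_S: one minus the mean within-subpopulation homozygosity
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    # H_T: one minus the homozygosity of the pooled allele frequencies,
    # with every subpopulation weighted equally
    pooled = [sum(row[i] for row in p) / K for i in range(I)]
    h_t = 1.0 - sum(q * q for q in pooled)
    return h_s, h_t


def fst(p: Matrix) -> float:
    """F_ST = (H_T - H_S) / H_T; defined only for a polymorphic locus (H_T > 0)."""
    h_s, h_t = heterozygosities(p)
    if h_t <= 0:
        raise ValueError("F_ST is undefined when H_T = 0 (monomorphic locus)")
    return (h_t - h_s) / h_t


# Two subpopulations fixed for different alleles: H_S = 0 and H_T = 1/2, so F_ST = 1.
print(fst([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```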
Studies have found that the maximal value of F ST can be viewed as constrained not only by functions of the within-subpopulation allele frequency distribution such as H S, but alternatively by aspects of the global allele frequency distribution across subpopulations. For a biallelic locus in K = 2 subpopulations, Maruki et al. [10] showed that the maximal F ST as a function of the frequency M of the most frequent allele decreases as M increases from 1/2 to 1 (see also [11]). Generalizing the biallelic case to arbitrarily many alleles, Jakobsson et al. [12] showed that for multiallelic loci with an unspecified number of distinct alleles, the maximal F ST increases from 0 to 1 as a function of M if 0 < M < 1/2, and decreases from 1 to 0 for 1/2 ≤ M < 1 in the manner reported by Maruki et al. [10] for biallelic loci. Edge & Rosenberg [13] generalized these results to the case of a fixed finite number of alleles, showing that the maximal F ST differs slightly from the unspecified case when the fixed number of distinct alleles is an odd number.
Generalizing the simplest case of K = I = 2 in a different direction, Alcala & Rosenberg [14] considered biallelic loci in the case of a fixed number of subpopulations K ≥ 2. We showed that the maximal value of F ST displays a peculiar behaviour as a function of M: the upper bound has a maximum of 1 if and only if M = k/K, for integers k with $\lceil K/2 \rceil \le k \le K - 1$. The constraints on the maximal value of F ST dissipate as K tends to infinity, even though for any fixed K, there always exists a value of M for which $F_{ST} < 2\sqrt{2} - 2 \approx 0.8284$. Relating F ST to its maximum as a function of M helps explain surprising phenomena that arise during population-genetic data analysis. For example, Jakobsson et al. [12] showed that stronger constraints on F ST could explain the low F ST values seen in pairs of African human populations. They also found that such constraints could explain the lower F ST values seen in high-diversity multiallelic loci compared to lower-diversity loci, such as microsatellites compared to single-nucleotide polymorphisms. Alcala & Rosenberg [14] showed that constraints on the maximal F ST could explain the lower F ST values between human populations seen when computing F ST pairwise rather than from all populations simultaneously.
In this study, we characterize the relationship between F ST and the frequency M of the most frequent allele, for a multiallelic locus and an arbitrary specified value of the number of subpopulations K. We derive the mathematical upper bound on F ST in terms of M, extending the biallelic result of Alcala & Rosenberg [14] to the multiallelic case, and providing the most comprehensive description of the mathematical constraints on F ST in terms of M to date (table 1). To assist in interpreting the new bound, we simulate the joint distribution of F ST and M in the island migration model, describing its properties as a function of the number of subpopulations, the migration rate and the mutation rate. The K-subpopulation upper bound on F ST in terms of M facilitates an explanation of counterintuitive aspects of inter-species genetic differentiation. We discuss the importance of the results for applications of F ST more generally.

Model
Our goal is to derive the range of values that F ST can take (the lower and upper bounds on F ST) as a function of the frequency M of the most frequent allele for a multiallelic locus, when the number of subpopulations K is a fixed finite value greater than or equal to 2. We follow previous studies [12–15] in describing notation and constructing the scenario.
We consider a polymorphic locus with an unspecified number of distinct alleles, in a setting with K subpopulations contributing equally to the total population. We denote the frequency of allele i in subpopulation k by $p_{k,i}$, with sum $\sigma_i = \sum_{k=1}^{K} p_{k,i}$ across subpopulations. Each allele frequency $p_{k,i}$ lies in [0, 1]. Within subpopulations, allele frequencies sum to 1: for each k, $\sum_{i=1}^{\infty} p_{k,i} = 1$. Hence, σ i lies in [0, K], and $\sum_{i=1}^{\infty} \sigma_i = K$. We number alleles from most to least frequent, so σ i ≥ σ j for i ≤ j.
Because by assumption the locus is polymorphic, σ i < K for each i. Alleles 1 and 2 have non-zero frequency in at least one subpopulation, not necessarily the same one; we have σ 1 > 0 and σ 2 > 0. We denote the mean frequency of the most frequent allele across subpopulations by M = σ 1 /K. We then have 0 < M < 1. We treat the allele frequencies p k,i and associated quantities M and σ i as parametric values, and not as estimates computed from data.
Equation (1.1) expresses F ST as a ratio involving within-subpopulation heterozygosity, H S, and total heterozygosity, H T, with 0 ≤ H S < 1 and 0 ≤ H T < 1. Because we assume the locus is polymorphic, H T > 0. We write equation (1.1) in terms of allele frequencies, permitting the number of distinct alleles to be arbitrarily large:

$$F_{ST} = \frac{K\sum_{i=1}^{\infty}\sum_{k=1}^{K} p_{k,i}^2 - \sum_{i=1}^{\infty}\sigma_i^2}{K^2 - \sum_{i=1}^{\infty}\sigma_i^2}. \qquad (2.1)$$

Hence, our goal is, for fixed σ 1 = KM with 0 < σ 1 < K, to identify the matrices $(p_{k,i})_{K\times\infty}$, with $p_{k,i}$ in [0, 1], $\sum_{i=1}^{\infty} p_{k,i} = 1$ and $(1/K)\sum_{k=1}^{K} p_{k,1} = \sigma_1/K = M$, that minimize and maximize F ST in equation (2.1).
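Passing from (H T − H S)/H T to an expression in allele frequencies amounts to multiplying numerator and denominator by K². With $S = \sum_i\sum_k p_{k,i}^2$ this gives (KS − Σσ_i²)/(K² − Σσ_i²). The minimal Python sketch below checks numerically that the two forms agree on a randomly generated frequency matrix; the random matrix is an arbitrary illustration, not data from the study, and the function names are hypothetical.

```python
import random


def fst_from_heterozygosities(p):
    """F_ST = (H_T - H_S) / H_T, as in equation (1.1)."""
    K, I = len(p), len(p[0])
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    h_t = 1.0 - sum((sum(row[i] for row in p) / K) ** 2 for i in range(I))
    return (h_t - h_s) / h_t


def fst_from_sums(p):
    """The same quantity written with column sums sigma_i and S = sum of all squared frequencies."""
    K, I = len(p), len(p[0])
    sigma = [sum(row[i] for row in p) for i in range(I)]
    S = sum(x * x for row in p for x in row)
    sum_sigma_sq = sum(s * s for s in sigma)
    return (K * S - sum_sigma_sq) / (K * K - sum_sigma_sq)


def random_frequencies(K, I, rng):
    """Random K x I matrix whose rows are allele frequency vectors (illustration only)."""
    rows = []
    for _ in range(K):
        weights = [rng.random() for _ in range(I)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


rng = random.Random(1)
p = random_frequencies(K=5, I=7, rng=rng)
print(abs(fst_from_heterozygosities(p) - fst_from_sums(p)) < 1e-12)  # True
```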
Note that we adopt the interpretation of F ST as a 'statistic' that describes a mathematical function of allele frequencies rather than as a 'parameter' that describes coancestry of individuals in a population [e.g. 16]. See Alcala & Rosenberg [14] for a discussion of interpretations of F ST when studying its mathematical properties.

Mathematical constraints
(a) Lower bound of F ST
Bounds on F ST in terms of the frequency of the most frequent allele can be written with respect to M or σ 1, noting that M ranges in (0, 1) and σ 1 ranges in (0, K). For the lower bound, from equation (2.1), for any choice of σ 1, F ST = 0 can be achieved. Indeed, consider allele-frequency sums σ 1 ≥ σ 2 ≥ … with $\sum_{i=1}^{\infty}\sigma_i = K$, σ 1 < K, σ 1 > 0 and σ 2 > 0. We set p k,i = σ i /K for all subpopulations k and alleles i; this choice yields F ST = 0.
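F ST = 0 implies that the numerator of equation (2.1), which is proportional to H T − H S, is zero. This numerator can be written

$$K\sum_{i=1}^{\infty}\sum_{k=1}^{K} p_{k,i}^2 - \sum_{i=1}^{\infty}\sigma_i^2 = \sum_{i=1}^{\infty}\Big(K\sum_{k=1}^{K} p_{k,i}^2 - \sigma_i^2\Big).$$

The Cauchy-Schwarz inequality guarantees that $K\sum_{k=1}^{K} p_{k,i}^2 \ge \sigma_i^2$, with equality if and only if $p_{1,i} = p_{2,i} = \cdots = p_{K,i} = \sigma_i/K$. Applying the Cauchy-Schwarz inequality to all alleles i, the numerator of equation (2.1) is zero only if for all i, $(p_{1,i}, p_{2,i}, \ldots, p_{K,i}) = (\sigma_i/K, \sigma_i/K, \ldots, \sigma_i/K)$.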
Thus, we can conclude that the allele frequency matrices in which all K subpopulations have identical allele frequency vectors are the only matrices for which F ST = 0. The lower bound on F ST is equal to 0 irrespective of M or σ 1 , for any value of the number of subpopulations K.
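The lower-bound argument splits the numerator of equation (2.1) into one Cauchy-Schwarz term per allele, $K\sum_k p_{k,i}^2 - \sigma_i^2$. A minimal Python sketch, using two hypothetical example matrices, computes these terms: each is non-negative, and it vanishes exactly for alleles whose frequency is the same in every subpopulation.

```python
def cauchy_schwarz_terms(p):
    """Per-allele terms K * sum_k p[k][i]**2 - sigma_i**2 of the F_ST numerator.

    Each term is >= 0 by the Cauchy-Schwarz inequality and equals 0 exactly
    when allele i has the same frequency in all K subpopulations.
    """
    K, I = len(p), len(p[0])
    terms = []
    for i in range(I):
        column = [row[i] for row in p]
        terms.append(K * sum(x * x for x in column) - sum(column) ** 2)
    return terms


identical = [[0.5, 0.3, 0.2] for _ in range(3)]           # identical rows: F_ST = 0
different = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2]]

print(cauchy_schwarz_terms(identical))   # all zero up to rounding
print(cauchy_schwarz_terms(different))   # about [0.06, 0.06, 0.0]
```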

(b) Upper bound of F ST
To derive the upper bound on F ST in terms of M = σ 1 /K, we must maximize F ST in equation (2.1), assuming that σ 1 and K are constant. The computations are performed in appendix A; we write the main result as a function of σ 1 , noting that it can be converted into a function of M by replacing σ 1 with KM.
In theorem A.1, we treat the case in which σ 1 has an integer value. For non-integer σ 1, theorem A.2 shows that the maximal F ST requires that (i) the sum of squared allele frequencies across alleles and subpopulations, $S = \sum_{i=1}^{\infty}\sum_{k=1}^{K} p_{k,i}^2$, is maximal, and (ii) alleles i = 2, 3, … are each present in at most one subpopulation, but allele 1 might be present in more than one subpopulation. We then separately maximize F ST as a function of σ 1 for σ 1 in (0, 1) and non-integer σ 1 in (1, K). These two cases differ in that allele 1 appears in a single subpopulation in the former case, and it must appear in at least two subpopulations in the latter.
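The maximal F ST as a function of σ 1 for σ 1 in (0, K) is

$$\max F_{ST} = \begin{cases} \dfrac{(K-1)\,[1 - \sigma_1 (J-1)(2 - J\sigma_1)]}{K - 1 + \sigma_1 (J-1)(2 - J\sigma_1)}, & 0 < \sigma_1 < 1,\\[2ex] 1, & \text{integer } \sigma_1,\ 1 \le \sigma_1 \le K-1,\\[1ex] \dfrac{K(K-1) - \sigma_1^2 + \lfloor\sigma_1\rfloor - 2(K-1)\{\sigma_1\} + (2K-1)\{\sigma_1\}^2}{K(K-1) - \sigma_1^2 - \lfloor\sigma_1\rfloor + 2\sigma_1 - \{\sigma_1\}^2}, & \text{non-integer } \sigma_1,\ 1 < \sigma_1 < K, \end{cases} \qquad (3.1)$$

where $J = \lceil \sigma_1^{-1} \rceil$. Here, $\lceil x \rceil$ denotes the smallest integer greater than or equal to x, $\lfloor x \rfloor$ denotes the greatest integer less than or equal to x, and $\{x\} = x - \lfloor x \rfloor$ denotes the fractional part of x. Note that for an integer choice of σ 1, the maximum from equation (3.1) and the limits as σ 1 tends to the integer from above and below all equal 1, so that the maximum as a function of σ 1 is continuous.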
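Equation (3.1) can be evaluated directly. The Python sketch below is a minimal transcription of its three cases, assuming the caller converts M to σ 1 = KM; `fst_max` is a hypothetical helper name, not part of any published software. Because σ 1 = KM is computed in floating point, a value that should be an integer may carry rounding error; the bound is continuous at integers, so the result is then still close to 1.

```python
import math


def fst_max(sigma1: float, K: int) -> float:
    """Upper bound on F_ST given sigma1 = K * M, following equation (3.1).

    Requires K >= 2 and 0 < sigma1 < K (equivalently 0 < M < 1).
    """
    if K < 2 or not 0 < sigma1 < K:
        raise ValueError("need K >= 2 and 0 < sigma1 < K")
    if sigma1 < 1:
        J = math.ceil(1 / sigma1)
        t = sigma1 * (J - 1) * (2 - J * sigma1)
        return (K - 1) * (1 - t) / (K - 1 + t)
    if float(sigma1).is_integer():
        return 1.0
    a = math.floor(sigma1)   # floor of sigma1
    f = sigma1 - a           # fractional part {sigma1}
    num = K * (K - 1) - sigma1 ** 2 + a - 2 * (K - 1) * f + (2 * K - 1) * f ** 2
    den = K * (K - 1) - sigma1 ** 2 - a + 2 * sigma1 - f ** 2
    return num / den


# K = 2 and M = 0.25: sigma1 = 0.5, bound 1/3
print(fst_max(2 * 0.25, 2))
# K = 6: the bound equals 1 when M is a multiple of 1/K, e.g. M = 1/2
print(fst_max(6 * 0.5, 6))  # 1.0
```

For K = 2 and 1/2 < M < 1, the third case simplifies algebraically to (1 − M)/M, which gives a convenient spot check.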
From appendix A, F ST reaches its upper bound for integer σ 1 when allele 1 has frequency 1 in each of σ 1 subpopulations, and when in each of the remaining K − σ 1 subpopulations, an allele other than allele 1 has frequency 1. These alleles of frequency 1 need not be private, although they can be; any identity relationships among them are permissible, provided that when summing frequencies across subpopulations, none of these alleles has a sum that exceeds σ 1. The locus can have as few as $\lceil K\sigma_1^{-1} \rceil$ alleles of non-zero frequency and as many as K − σ 1 + 1.
For σ 1 in interval (0, 1), F ST is maximal when each allele is present in only a single subpopulation, and when each subpopulation has exactly J alleles with a non-zero frequency: J − 1 alleles at frequency σ 1 and one allele at frequency 1 − (J − 1)σ 1 ≤ σ 1 . Because each subpopulation has J distinct alleles and no alleles are shared across subpopulations, this upper bound requires that the locus has KJ alleles of non-zero frequency.
For non-integer σ 1 in (1, K), F ST reaches its maximum when there are $\lfloor\sigma_1\rfloor$ subpopulations in which the most frequent allele has frequency 1, a single subpopulation in which it has frequency {σ 1} and a private allele has frequency 1 − {σ 1}, and $K - \lfloor\sigma_1\rfloor - 1$ subpopulations each with a different private allele at frequency 1. Only the most frequent allele is shared across subpopulations, and a single subpopulation displays polymorphism. At the maximum, $K - \lfloor\sigma_1\rfloor + 1$ alleles have non-zero frequency.

First, we observe that the upper bound has a piecewise structure.
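At M = 1/K, the upper bound has its first transition between cases. For M > 1/K, the upper bound depends on $\lfloor\sigma_1\rfloor = \lfloor KM \rfloor$. As KM increases in [1, K), each increment in $\lfloor KM \rfloor$ produces a distinct piece of the domain: for each integer k with 1 ≤ k ≤ K − 1, $\lfloor KM \rfloor = k$ for M in [k/K, (k + 1)/K). For M in (0, 1/K), the upper bound depends on $J = \lceil (KM)^{-1} \rceil$, which takes every integer value J ≥ 2. Counting the intervals of the domain, we see that an infinite number of distinct intervals occur for M in (0, 1/K), and K − 1 intervals occur for M in (1/K, 1). Within intervals, the function describing the upper bound is smooth.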
This curve is represented in figure 1 as a dashed line. Note that for K = 2, the special case considered by Jakobsson et al. [12], the upper bound reduces to the bound they reported.
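As a consistency check on the maximizing configurations described for each case of equation (3.1), the following Python sketch builds one maximizing allele frequency matrix for given σ 1 and K and compares its F ST with the closed-form bound. It is an illustrative sketch with hypothetical helper names. For integer σ 1 it builds only the configuration in which every allele other than allele 1 is private, one of several permissible choices.

```python
import math


def fst(p):
    """F_ST = (H_T - H_S) / H_T for a K x I frequency matrix p[k][i]."""
    K, I = len(p), len(p[0])
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    h_t = 1.0 - sum((sum(row[i] for row in p) / K) ** 2 for i in range(I))
    return (h_t - h_s) / h_t


def fst_max(sigma1, K):
    """Upper bound on F_ST from equation (3.1), with sigma1 = K * M."""
    if sigma1 < 1:
        J = math.ceil(1 / sigma1)
        t = sigma1 * (J - 1) * (2 - J * sigma1)
        return (K - 1) * (1 - t) / (K - 1 + t)
    if float(sigma1).is_integer():
        return 1.0
    a, f = math.floor(sigma1), sigma1 - math.floor(sigma1)
    num = K * (K - 1) - sigma1 ** 2 + a - 2 * (K - 1) * f + (2 * K - 1) * f ** 2
    den = K * (K - 1) - sigma1 ** 2 - a + 2 * sigma1 - f ** 2
    return num / den


def maximizing_matrix(sigma1, K):
    """One K x I matrix attaining the bound; column 0 is the most frequent allele."""
    rows = [dict() for _ in range(K)]
    new = 1                                   # label of the next unused allele
    if sigma1 < 1:
        J = math.ceil(1 / sigma1)
        for k in range(K):
            freqs = [sigma1] * (J - 1) + [1 - (J - 1) * sigma1]
            for j, q in enumerate(freqs):
                if k == 0 and j == 0:
                    rows[k][0] = q            # allele 1 is private to subpopulation 0
                else:
                    rows[k][new] = q          # every other allele is private
                    new += 1
    else:
        a = math.floor(sigma1)
        f = sigma1 - a
        for k in range(K):
            if k < a:
                rows[k][0] = 1.0              # fixed for allele 1
            elif k == a and f > 0:
                rows[k][0] = f                # the single polymorphic subpopulation
                rows[k][new] = 1 - f
                new += 1
            else:
                rows[k][new] = 1.0            # fixed for a private allele
                new += 1
    return [[row.get(i, 0.0) for i in range(new)] for row in rows]


for K, M in [(2, 0.3), (6, 0.1), (6, 0.45), (6, 0.5), (40, 0.91)]:
    p = maximizing_matrix(K * M, K)
    # K, M, number of alleles, F_ST of the constructed matrix, closed-form bound
    print(K, M, len(p[0]), round(fst(p), 6), round(fst_max(K * M, K), 6))
```

The printed allele counts can be compared with those stated in the text: KJ for σ 1 in (0, 1) and $K - \lfloor\sigma_1\rfloor + 1$ otherwise.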

Joint distribution of M and F ST under an evolutionary model
So far, we have described the mathematical constraint imposed on F ST by M without respect to the frequency with which particular values of M arise in evolutionary scenarios. As an assessment of the bounds in evolutionary models can illuminate the settings in which they are most salient in population-genetic data analysis [9,14,17–20], we simulated the joint distribution of F ST and M under an island migration model, relating the distribution to the mathematical bounds on F ST. This analysis considers allele frequency distributions, and hence values of M and F ST, generated by evolutionary models. The simulation approach is modified from [14,15].

(a) Simulations
We simulated alleles under a coalescent model, using the software MS [21]. We considered a total population of KN diploid individuals subdivided into K subpopulations of size N. At each generation, a proportion m of the individuals in a subpopulation originated outside the subpopulation. Thus, the scaled migration rate is 4Nm, which corresponds to twice the number 2Nm of gene copies in a subpopulation that originate elsewhere in each generation (four times the number Nm of migrant individuals). We considered the island model [22–24], in which migrants have the same probability m/(K − 1) of coming from any other specific subpopulation.
We used an infinitely-many-alleles model; mutations occur at rate μ, and the scaled mutation rate is 4Nμ. We examined three values of K (2, 6, 40), three values of 4Nμ (0.1, 1, 10) and three values of 4Nm (0.1, 1, 10). Note that in MS, time is scaled in units of 4N generations, so that subpopulation sizes N need not be specified. MS simulates an infinitely-many-sites model, in which each mutation occurs at a new site; treating each distinct haplotype as an allele, each mutation creates a new allele. For our analysis, we are concerned only with the allelic categories and not with the simulated sequences; thus, although the simulation follows the infinitely-many-sites model, the analysis treats simulated datasets as having been generated under an infinitely-many-alleles model.
For each parameter triplet (K, 4Nμ, 4Nm), we performed 1000 replicate simulations, sampling 100 sequences per subpopulation in each replicate. We computed F ST values from the parametric allele (haplotype) frequencies. MS commands appear in electronic supplementary material, File S1; note that the simulation approach here uses the standard method of simulating MS with a specified mutation rate θ = 4Nμ, whereas in our previous analyses of biallelic cases [14,15], we had employed the alternative approach of requiring simulated datasets to possess exactly one segregating site. Figure 2 shows the joint distribution of M and F ST for the nine values of (4Nμ, 4Nm) in the case of K = 2. Electronic supplementary material, figures S1 and S2 provide similar figures for K = 6 and K = 40, respectively.
royalsocietypublishing.org/journal/rstb Phil. Trans. R. Soc. B 377: 20200414
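The per-replicate computation of M and F ST from simulated haplotypes can be sketched as follows. This is a minimal Python sketch under stated assumptions: parsing of MS output is not shown, haplotypes are assumed to be available as strings (one list per subpopulation), each distinct string is treated as an allele, and sample frequencies are used as the parametric frequencies, as in the text. The input below is a hypothetical toy example, not simulation output.

```python
from collections import Counter


def frequency_matrix(samples):
    """K x I allele frequency matrix from per-subpopulation lists of allele labels.

    Each distinct label (for example, a haplotype string) is one allele.
    """
    counts = [Counter(sample) for sample in samples]
    alleles = sorted(set().union(*counts))
    return [[c[a] / len(sample) for a in alleles] for c, sample in zip(counts, samples)]


def m_and_fst(samples):
    """Return (M, F_ST): mean frequency of the most frequent allele, and F_ST."""
    p = frequency_matrix(samples)
    K, I = len(p), len(p[0])
    sigma = [sum(row[i] for row in p) for i in range(I)]
    M = max(sigma) / K
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    h_t = 1.0 - sum((s / K) ** 2 for s in sigma)
    return M, (h_t - h_s) / h_t


# Hypothetical toy input: two subpopulations with three sampled haplotypes each
samples = [["0101", "0101", "1100"], ["1100", "1100", "0011"]]
M, F = m_and_fst(samples)
print(M, F)  # M = 0.5 and F_ST = 3/11 (about 0.273)
```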
(b) Impact of the mutation rate

Example: humans and chimpanzees
We now use our theoretical results to examine genetic differentiation in humans and chimpanzees. Because humans and chimpanzees are distinct species, we might expect a genetic differentiation measure such as F ST to produce a greater value for a computation between them than for a computation among populations within one or the other. Indeed, studies of multiallelic loci do find that adding chimpanzees to data on multiple human populations increases the value of F ST [8,25]. However, we will see that F ST has a more subtle pattern when considering data on multiple chimpanzee populations, and that our theoretical computations explain a surprising result.
We examine data on 246 multiallelic microsatellite loci assembled by Pemberton et al. [26] from several studies of worldwide human populations and a study of chimpanzees [27]. We consider F ST comparisons both between humans and chimpanzees and among populations of chimpanzees. For the human data, we consider all 5795 individuals in the dataset, and for the chimpanzee data, we consider 84 chimpanzee individuals from six populations: one bonobo population, and five common chimpanzee populations (Central, Eastern, Western, hybrid and captive).
In the data analysis, we perform a computation to summarize the relationship of F ST to the upper bound. For a set of Z loci, denote by F z and M z the values of F ST and M at locus z. The mean F ST for the set, or $\bar{F}_{ST}$, is

$$\bar{F}_{ST} = \frac{1}{Z}\sum_{z=1}^{Z} F_z. \qquad (5.1)$$

The upper bound on F ST at locus z is given by equation (3.1) with σ 1 = K M z. Denoting this quantity by F max,z, we have

$$\overline{F_{ST}/F_{max}} = \frac{1}{Z}\sum_{z=1}^{Z} \frac{F_z}{F_{max,z}}. \qquad (5.2)$$

We computed the parametric allele frequencies for each subpopulation (the human and chimpanzee groups for the human-chimpanzee comparison, and chimpanzee subpopulations for the comparison of chimpanzees), averaging across subpopulations to obtain the frequency M of the most frequent allele. We then computed F ST and the associated upper bound for each locus, averaging across loci to obtain the overall $\bar{F}_{ST}$ and $\overline{F_{ST}/F_{max}}$ for the full microsatellite set (equations (5.1) and (5.2)).
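Equations (5.1) and (5.2) amount to simple averages over loci, the second being the mean of the per-locus ratios F z / F max,z. A minimal Python sketch, assuming per-locus values F z and M z have already been computed for a comparison with K subpopulations, and reusing a transcription of equation (3.1) for F max,z. The function names are hypothetical, and the numbers in the example are made up, not the microsatellite values reported here.

```python
import math


def fst_max(sigma1, K):
    """Upper bound on F_ST from equation (3.1), with sigma1 = K * M."""
    if sigma1 < 1:
        J = math.ceil(1 / sigma1)
        t = sigma1 * (J - 1) * (2 - J * sigma1)
        return (K - 1) * (1 - t) / (K - 1 + t)
    if float(sigma1).is_integer():
        return 1.0
    a, f = math.floor(sigma1), sigma1 - math.floor(sigma1)
    num = K * (K - 1) - sigma1 ** 2 + a - 2 * (K - 1) * f + (2 * K - 1) * f ** 2
    den = K * (K - 1) - sigma1 ** 2 - a + 2 * sigma1 - f ** 2
    return num / den


def summarize_loci(loci, K):
    """Mean F_ST (5.1) and mean F_ST / F_max (5.2) over (F_z, M_z) pairs."""
    Z = len(loci)
    mean_fst = sum(F for F, _ in loci) / Z
    mean_ratio = sum(F / fst_max(K * M, K) for F, M in loci) / Z
    return mean_fst, mean_ratio


# Hypothetical per-locus values for a pairwise (K = 2) comparison
loci = [(0.2, 0.25), (0.1, 0.75)]
print(summarize_loci(loci, K=2))  # approximately (0.15, 0.45)
```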
Surprisingly, given the longer evolutionary time between humans and chimpanzees than among chimpanzee populations, the F ST value is significantly greater when comparing chimpanzee populations than when comparing humans and chimpanzees. Given the stronger constraint in pairwise calculations than in calculations with more subpopulations, it is not unexpected that pairwise F ST values would be smaller than those in a six-population computation. The higher F ST among chimpanzees than between humans and chimpanzees is thus a by-product of mathematical constraints on F ST. Interestingly, the effect of K on F ST is largely eliminated when each F ST value is normalized by the associated maximum given K and M (figure 4c). The normalization leads to higher values for human-chimpanzee comparisons than among chimpanzee subpopulations (mean F ST/F max = 0.32 and 0.20, respectively; p = 1.1 × 10⁻⁹, Wilcoxon rank sum test), as expected from the greater evolutionary distance between humans and chimpanzees compared to that among chimpanzees.

Discussion
This study provides the most complete relationship between F ST and M obtained to date, generalizing previous results for the case of K = 2 subpopulations [12] and for a restriction to I = 2 alleles [14]. Interestingly, the maximal F ST we have obtained merges patterns observed in these previous studies. Fixing K = 2, we obtain the upper bound on F ST in terms of M that was reported by Jakobsson et al. [12]. As K increases, the piecewise pattern seen by Jakobsson et al. [12] for the maximal F ST in the K = 2 case for M in (0, 1/2) is observed in the multiallelic case for M in (0, 1/K). The decay from (M, F ST) = (1/2, 1) to (M, F ST) = (1, 0) seen by Jakobsson et al. [12] for K = 2 is observed for arbitrary K as a decay from ((K − 1)/K, 1) to (1, 0).
The allele frequency values for which the upper bound is reached for M in (0, 1/K) generalize those seen for the case of K = 2 and M in (0, 1/2) [12]. The upper bound is reached when all alleles are private, each subpopulation has as many alleles as possible at frequency KM, and at most one additional allele. The allele frequency values for which the upper bound is reached for M in ((K − 1)/K, 1) also generalize those seen for K = 2 and M in (1/2, 1): the maximum is reached when the most frequent allele is fixed in all subpopulations except one, and a single private allele is present in this last subpopulation.
The results from Alcala & Rosenberg [14] for I = 2 produce a more constrained upper bound on F ST than for arbitrary I, with the domain of M restricted to (1/2, 1). Nevertheless, many properties of the maximal F ST we observe for unspecified I and M in (1/K, 1) are similar to those seen for I = 2 and M in (1/2, 1): finitely many peaks at points M = k/K, local minima between the peaks, and an increase in coverage of the unit square for (M, F ST) as K increases. The maximal F ST functions for M in ((K − 1)/K, 1) for unspecified I and for I = 2 agree, as the number of alleles required to maximize F ST in this interval in the case of unspecified I is simply equal to 2.
In assuming that the number of alleles is unspecified, we found that the number of distinct alleles needed for achieving the maximal F ST is $K\lceil\sigma_1^{-1}\rceil$ for M in (0, 1/K) and $K - \lfloor\sigma_1\rfloor + 1$ for M in (1/K, 1) other than multiples of 1/K; the maximum can be achieved with each number of distinct alleles in $[\lceil K\sigma_1^{-1}\rceil, K - \sigma_1 + 1]$ for M equal to 1/K, 2/K, …, (K − 1)/K. With a fixed maximal number of distinct alleles, such as in the I = 2 case of Alcala & Rosenberg [14] with K specified and in the K = 2 case with I specified [13], the upper bound on F ST is less than or equal to that seen in the corresponding unspecified-I case. For K = 2, specifying I has a relatively small effect in reducing the maximal value of F ST [13]. As in Edge & Rosenberg [13], specifying I in the case of larger values of K is expected to have the greatest impact on the F ST upper bound at the lowest end of the domain for M.
In coalescent simulations, we found that the joint distribution of M and F ST within their permissible space can help separate the impact of mutation and migration. Although the dependence of F ST on mutation and migration rates has been long documented, the symmetric effects of mutation and migration under the island model [22] illustrate the difficulty in separating their effects. Under the island model, allele frequency M is informative about the scaled mutation rate 4Nμ, and comparing the value of F ST to its maximum given M is informative about the scaled migration rate 4Nm. Adding a dimension that is more sensitive to mutation than to migration-M in our case-enables the separation of their effects. Other statistics, such as total heterozygosity H T or within-subpopulation heterozygosity H S , have the potential to play a similar role [20].
Our results can inform data analyses. In particular, we caution users to examine upper bounds on F ST to assess how mathematical constraints influence observations. As the constraints are strongest for K = 2, this step is valuable in pairwise comparisons; it is also useful when the frequency M of the most frequent allele can be small in relation to the number of populations K, such as for high-diversity forensic [28] and immunological [29] loci in human populations. Visual inspection of the values of M and F ST within their bounds can suggest that constraints have an effect. The ratio F ST/F max can provide a helpful summary by evaluating the proximity of F ST values to their maxima. Further, joint use of M along with F ST could be useful in various applications of F ST, such as in inference of model parameters by approximate Bayesian computation [30] and machine learning [31]. F ST outlier tests to detect local adaptation from multiallelic loci [32] could search for F ST values that are outliers not in the distribution of F ST values, but rather in relation to their associated upper bounds. Computing null distributions for F ST conditional on M could enhance the approach.
In an example data analysis, we have shown that taking into account mathematical constraints on F ST can help understand puzzling F ST behaviour. In our example, F ST at a set of loci was higher when comparing K = 6 chimpanzee populations than when comparing humans and chimpanzees (K = 2), even though the same loci were used and the mean value for M was similar in the two comparisons. A comparison of F ST values to their respective maxima explained these counterintuitive results.
We note that analyses of F ST in relation to M differ from analyses of F ST in relation to within-subpopulation statistics H S and J S = 1 − H S, such as those performed in deriving the influential Hedrick's G′ST [9] and Jost's D [33] statistics. We have previously shown that for biallelic loci in K subpopulations, for fixed M, the statistics F ST, G′ST and D are all maximized at the same set of allele frequency values [15]. Although the normalizations of F ST used to produce G′ST and D lead to statistics that are unconstrained in the unit interval as functions of H S, G′ST and D continue to be constrained as functions of M. A statistic that instead normalizes F ST by its maximum as a function of M, a statistic of the total population, captures aspects of the allele frequency dependence of F ST that differ from those captured by normalizations by functions of within-subpopulation statistics.
In human populations, efforts to understand F ST patterns trace in large part to Lewontin's foundational F ST-like variance-partitioning computation [34], in which it was seen that among-population differences (analogous to F ST) were small relative to within-population differences (analogous to 1 − F ST). Studies using loci with different numbers of alleles, loci with different frequencies for the most frequent allele, and samples with different numbers of subpopulations have varied to some extent in their numerical estimates of F ST [14,35–38]. Mathematical results on F ST bounds provide part of the explanation for these differences: they establish that each dataset differing in the character of its loci and subpopulation set has its own distinctive interval in which its associated F ST calculation could potentially land. Hence, each dataset can give rise to a numerically distinct value not due to features of the underlying human biology, but rather, due to different constraints on the F ST measure itself. F ST bounds contribute to explaining quantitative variation in variance-partitioning computations, in which, although numerical values differ, the within-population component of genetic variation consistently predominates. The mathematics serves to support the qualitative claim that worldwide human genetic differentiation measurements represented by F ST-like statistics have low values, as was argued by Lewontin 50 years ago.
Data accessibility. Data are publicly available as described in the references cited. MS commands are provided in electronic supplementary material [39].
This appendix derives the upper bound on F ST as a function of σ 1 (equation (3.1)). First, we separate the case of integer values of σ 1. Next, for non-integer values of σ 1, we reduce the problem of maximizing F ST to the problem of maximizing the sum of squared allele frequencies across alleles and subpopulations, $S = \sum_{i=1}^{\infty}\sum_{k=1}^{K} p_{k,i}^2$. Finally, we maximize S as a function of σ 1, separately for σ 1 in (0, 1) and for non-integer σ 1 in (1, K).

(a) A useful expression for F ST
Suppose K ≥ 2 is a specified integer. Suppose σ 1 is a fixed value, with 0 < σ 1 < K. We leave the number of alleles I unspecified. For each i ≥ 1, we write $\sigma_i = \sum_{k=1}^{K} p_{k,i}$, with σ i ≥ σ j for each i and j with i ≤ j. For convenience, σ 1 is taken to mean both the function that computes the sum $\sum_{k=1}^{K} p_{k,1}$ for a specified set of values of the p k,i and a fixed value for that sum.
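For each (k, i) with 1 ≤ k ≤ K and i ≥ 1, p k,i lies in [0, 1], and $\sum_{i=1}^{\infty} p_{k,i} = 1$ for all k, 1 ≤ k ≤ K. Define F ST as in equation (2.1). We seek to maximize F ST over all possible sets of values of the p k,i with a fixed value σ 1 for the sum $\sum_{k=1}^{K} p_{k,1}$. Note that because 0 < σ 1 < K and σ i ≤ σ 1 for all i, no allele is fixed in every subpopulation, so that H T > 0 and the denominator of equation (2.1) is positive.

Denote the sum of squared frequencies of allele 1 across subpopulations by $S_1 = \sum_{k=1}^{K} p_{k,1}^2$, and write $S = \sum_{i=1}^{\infty}\sum_{k=1}^{K} p_{k,i}^2$ for the corresponding sum of squared frequencies of all alleles. We express equation (2.1) in terms of σ 1, S 1 and S:

$$F_{ST} = \frac{(K-1)S - (\sigma_1^2 - S_1) - 2\sum_{i=2}^{\infty}\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,i}p_{\ell,i}}{K^2 - S - (\sigma_1^2 - S_1) - 2\sum_{i=2}^{\infty}\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,i}p_{\ell,i}}. \qquad (A\,1)$$

In equation (A 1), the numerator is less than or equal to the denominator, with equality if and only if S = K: the denominator minus the numerator equals K(K − S), and S ≤ K because each subpopulation contributes at most 1 to S. This equality in turn requires that for each k, there exists some i for which p k,i = 1, a condition that can be achieved only if σ 1 is an integer. Note that any set of equivalence relationships can exist among the values of i associated with the K − σ 1 subpopulations in which p k,1 = 0, provided that none of these values of i is associated with more than σ 1 subpopulations. For example, these values of i can be mutually distinct, or groups of them with size as large as σ 1 can be mutually equal.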
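The decomposition in equation (A 1), and the identity that its denominator minus its numerator equals K(K − S), can be checked numerically. The Python sketch below uses a randomly generated frequency matrix as an illustration; column 0 plays the role of allele 1, and the identity does not require it to be the most frequent allele. Function names are hypothetical.

```python
import random


def a1_components(p):
    """Return (S, numerator, denominator) of equation (A 1); column 0 is allele 1."""
    K, I = len(p), len(p[0])
    S = sum(x * x for row in p for x in row)
    S1 = sum(row[0] ** 2 for row in p)
    sigma1 = sum(row[0] for row in p)
    # X = sum over alleles i >= 2 of sum over pairs k < j of p[k][i] * p[j][i]
    X = sum(p[k][i] * p[j][i]
            for i in range(1, I) for k in range(K) for j in range(k + 1, K))
    num = (K - 1) * S - (sigma1 ** 2 - S1) - 2 * X
    den = K * K - S - (sigma1 ** 2 - S1) - 2 * X
    return S, num, den


def fst(p):
    """F_ST = (H_T - H_S) / H_T for a K x I frequency matrix p[k][i]."""
    K, I = len(p), len(p[0])
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    h_t = 1.0 - sum((sum(row[i] for row in p) / K) ** 2 for i in range(I))
    return (h_t - h_s) / h_t


rng = random.Random(7)
K, I = 4, 5
p = []
for _ in range(K):
    w = [rng.random() for _ in range(I)]
    total = sum(w)
    p.append([x / total for x in w])

S, num, den = a1_components(p)
print(abs(num / den - fst(p)) < 1e-12)           # (A 1) reproduces F_ST: True
print(abs((den - num) - K * (K - S)) < 1e-12)     # den - num = K(K - S): True
```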

(c) Non-integer values of σ 1
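Theorem A.2. For non-integer σ 1, the numerator of equation (A 1) is strictly less than the denominator, and

$$F_{ST} \le \frac{(K-1)S - (\sigma_1^2 - S_1)}{K^2 - S - (\sigma_1^2 - S_1)}, \qquad (A\,2)$$

with equality requiring that for each i ≥ 2, there exists at most one value of k for which p k,i > 0.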
Proof. Because $2\sum_{i=2}^{\infty}\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,i}p_{\ell,i}$ is subtracted in both the numerator and the denominator of equation (A 1), and because the numerator is strictly less than the denominator for non-integer σ 1, F ST can be bounded above by minimizing this term (for a < b and 0 ≤ c < b, the ratio (a − c)/(b − c) decreases as c increases). Because p k,i ≥ 0 for all (k, i), each sum $\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,i}p_{\ell,i}$ is bounded below by zero. Setting the sum to 0 for all i ≥ 2 gives the upper bound in equation (A 2).
For the equality condition, the term is zero if and only if all products $p_{k,i}p_{\ell,i}$ with k < ℓ and i ≥ 2 are zero, that is, if and only if for each i ≥ 2, at most one value of k has p k,i > 0. ▪

By theorem A.2, to maximize F ST for fixed non-integer σ 1, we must maximize the quantity in equation (A 2). It suffices to consider sets of values of p k,i in which for each i ≥ 2, at most one value of k has p k,i > 0.
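The proof of theorem A.2 rests on the fact that, for a ≤ b and 0 ≤ c < b, (a − c)/(b − c) ≤ a/b, so that dropping the shared-allele term can only increase the ratio. A minimal Python sketch checks the resulting bound [(K − 1)S − (σ1² − S1)]/[K² − S − (σ1² − S1)] of equation (A 2) on random matrices and shows that it is attained when every allele other than allele 1 is private. The matrices are illustrative only, and the function names are hypothetical.

```python
import random


def fst(p):
    """F_ST = (H_T - H_S) / H_T for a K x I frequency matrix p[k][i]."""
    K, I = len(p), len(p[0])
    h_s = 1.0 - sum(x * x for row in p for x in row) / K
    h_t = 1.0 - sum((sum(row[i] for row in p) / K) ** 2 for i in range(I))
    return (h_t - h_s) / h_t


def a2_bound(p):
    """Right-hand side of equation (A 2); column 0 plays the role of allele 1."""
    K = len(p)
    S = sum(x * x for row in p for x in row)
    S1 = sum(row[0] ** 2 for row in p)
    sigma1 = sum(row[0] for row in p)
    c = sigma1 ** 2 - S1          # equals 2 * sum over k < l of p[k][0] * p[l][0]
    return ((K - 1) * S - c) / (K * K - S - c)


# The bound holds for arbitrary matrices, including ones with shared alleles i >= 2
rng = random.Random(3)
for _ in range(1000):
    K, I = rng.randint(2, 6), rng.randint(2, 6)
    p = []
    for _ in range(K):
        w = [rng.random() for _ in range(I)]
        total = sum(w)
        p.append([x / total for x in w])
    assert fst(p) <= a2_bound(p) + 1e-12

# Alleles 2, 3 and 4 are each private to one subpopulation (sigma1 = 1.9): equality
private = [[0.7, 0.3, 0.0, 0.0],
           [0.4, 0.0, 0.6, 0.0],
           [0.8, 0.0, 0.0, 0.2]]
print(fst(private), a2_bound(private))  # equal up to rounding
```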
(d) The case of (non-integer) σ 1 in (0, 1)
In this section, we find the set of values of the p k,i that maximize F ST for σ 1 in (0, 1). We proceed in two steps. (i) We show that for σ 1 in (0, 1), the maximal F ST occurs at a set of p k,i values for which all alleles are private: that is, for each i ≥ 1, p k,i > 0 for at most one value of k. (ii) We determine the set of p k,i values that, with all alleles private, maximizes F ST.
(i) In equation (A 2), note that $\sigma_1^2 - S_1 = 2\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,1}p_{\ell,1}$. Because $\sigma_1^2 - S_1$ is subtracted from both numerator and denominator in equation (A 2), the quantity in equation (A 2) is maximal when $\sigma_1^2 - S_1$ is minimal. In other words, the upper bound on F ST is maximal if and only if $2\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,1}p_{\ell,1}$ is minimal. Because σ 1 < 1, a minimum of 0 for $2\sum_{k=1}^{K-1}\sum_{\ell=k+1}^{K} p_{k,1}p_{\ell,1}$ is achieved if and only if there is a single value k = k 0 at which $p_{k_0,1} = \sigma_1$, so that p k,1 = 0 for all k ≠ k 0. We then have $\sigma_1^2 = S_1$, and from equation (A 2),

$$F_{ST} \le \frac{(K-1)S}{K^2 - S}. \qquad (A\,3)$$

Each allele is private, and because allele 1 is the most frequent, p k,i lies in [0, σ 1] for all (k, i).
(ii) The problem of finding the set of p k,i values that maximizes F ST has now been reduced to the problem of maximizing the right-hand side of equation (A 3), with the constraint that all alleles are private. Because the numerator in equation (A 3) increases with S and the denominator decreases with S, the maximum is achieved if and only if S achieves its maximal value. In other words, we seek to maximize $S = \sum_{k=1}^{K}\sum_{i=1}^{\infty} p_{k,i}^2$, with the constraints $\sum_{i=1}^{\infty} p_{k,i} = 1$ and p k,i ≤ σ 1 for each (k, i) with 1 ≤ k ≤ K and i ≥ 1. Because each allele is private, the maximum is achieved by separately maximizing each $\sum_{i=1}^{\infty} p_{k,i}^2$ with constraints $\sum_{i=1}^{\infty} p_{k,i} = 1$ and $0 \le p_{k,i} \le \sigma_1$. This maximization is precisely that of lemma 3 of Rosenberg & Jakobsson [40]. Applying the lemma, the maximum is achieved with $p_{k,1} = p_{k,2} = \cdots = p_{k,J-1} = \sigma_1$, $p_{k,J} = 1 - (J-1)\sigma_1$, and p k,i = 0 for i > J, where $J = \lceil \sigma_1^{-1} \rceil$, so that $\sum_{i=1}^{\infty} p_{k,i}^2 \le 1 - \sigma_1 (J-1)(2 - J\sigma_1)$ for each k, with equality at this maximum. In other words, each subpopulation k possesses J − 1 private alleles with frequency σ 1 and one private allele with frequency 1 − (J − 1)σ 1. Hence, S ≤ K[1 − σ 1 (J − 1)(2 − Jσ 1)], so that equation (A 3) leads to equation (3.1) for σ 1 in (0, 1).
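Lemma 3 of Rosenberg & Jakobsson [40] is used twice in this appendix: here with total 1 and cap σ 1, and in section (e) with total K − σ 1 and cap 1. A minimal Python sketch of the greedy maximizer (as many entries at the cap as possible, then one remainder) checks both closed forms and compares against random feasible vectors. The function names are hypothetical, and the small tolerance on the remainder only absorbs floating-point residue.

```python
import math
import random


def max_sum_of_squares(total, cap):
    """Entries in [0, cap] summing to `total` that maximize the sum of squares.

    Greedy rule: as many entries equal to `cap` as possible, then one remainder.
    """
    n_full = math.floor(total / cap)
    values = [cap] * n_full
    remainder = total - n_full * cap
    if remainder > 1e-12:          # ignore floating-point residue
        values.append(remainder)
    return values


def sum_sq(values):
    return sum(v * v for v in values)


def random_feasible(total, cap, n, rng):
    """Random n-vector with entries in [0, cap] summing to `total` (rejection sampling).

    Requires n * cap > total so that feasible vectors exist.
    """
    while True:
        w = [rng.random() for _ in range(n)]
        scale = total / sum(w)
        v = [x * scale for x in w]
        if max(v) <= cap:
            return v


# Section (d): total 1 and cap sigma1; closed form 1 - sigma1 (J - 1)(2 - J sigma1)
sigma1 = 0.3
J = math.ceil(1 / sigma1)
print(sum_sq(max_sum_of_squares(1.0, sigma1)), 1 - sigma1 * (J - 1) * (2 - J * sigma1))  # both about 0.28

# Section (e): total K - sigma1 and cap 1; closed form (1 - {sigma1})^2 + (K - floor(sigma1) - 1)
K, s = 6, 2.4
a, f = math.floor(s), s - math.floor(s)
print(sum_sq(max_sum_of_squares(K - s, 1.0)), (1 - f) ** 2 + (K - a - 1))  # both about 3.36

# No random feasible vector beats the greedy value
rng = random.Random(0)
best = sum_sq(max_sum_of_squares(1.0, sigma1))
print(all(sum_sq(random_feasible(1.0, sigma1, 6, rng)) <= best + 1e-12 for _ in range(1000)))  # True
```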
(e) The case of non-integer σ 1 in (1, K)
This section finds the set of values of the p k,i that maximizes F ST for non-integer σ 1 in (1, K). For non-integer $\sigma_1 = \sum_{k=1}^{K} p_{k,1}$ in (1, K), because 0 ≤ p k,1 ≤ 1 for all k, p k,1 > 0 for at least two values of k. Writing S* = S − S 1, equation (A 2) can be rewritten

$$F_{ST} \le \frac{(K-1)S^* + K S_1 - \sigma_1^2}{K^2 - S^* - \sigma_1^2}. \qquad (A\,4)$$

Because the numerator increases with S 1, and because the numerator increases with S* and the denominator decreases with S*, the upper bound on F ST is greatest when both S 1 and S* are maximized subject to $\sum_{i=1}^{\infty} p_{k,i} = 1$ for each k and $\sum_{k=1}^{K} p_{k,i} \le \sigma_1$ for each i. If S 1 and S* can be simultaneously maximized at the same set of values of the p k,i, then this set of values of the p k,i achieves the maximal F ST.
We proceed in three steps. (i) First, we find the set of values of the p k,i that maximizes S 1 . (ii) Next, we find the set of values that maximizes S*. (iii) We then conclude that because the same set maximizes both S 1 and S* separately, this set achieves the upper bound in equation (A 4), and hence in equation (A 2).
(ii) Next, we maximize $S^* = \sum_{i=2}^{\infty}\sum_{k=1}^{K} p_{k,i}^2$. Because, by theorem A.2, all alleles with i ≥ 2 are private at the set of values of the p k,i that maximizes F ST for fixed non-integer σ 1, each non-zero p k,i for i ≥ 2 is equal to the associated σ i. The sum of the frequencies of all alleles across all subpopulations is $\sum_{i=1}^{\infty}\sigma_i = K$, so that $\sum_{i=2}^{\infty}\sigma_i = K - \sigma_1$. The problem of maximizing S* is the problem of maximizing $S^* = \sum_{i=2}^{\infty}\sigma_i^2$ with the constraints $\sum_{i=2}^{\infty}\sigma_i = K - \sigma_1$ and σ i ≤ 1 for each i from 2 to ∞. This maximization is again that performed in lemma 3 of Rosenberg & Jakobsson [40]. Applying the lemma, the maximum is achieved by setting $\sigma_2 = \sigma_3 = \cdots = \sigma_{K-\lfloor\sigma_1\rfloor} = 1$, $\sigma_{K-\lfloor\sigma_1\rfloor+1} = 1 - \{\sigma_1\}$, and σ i = 0 for $i > K - \lfloor\sigma_1\rfloor + 1$. The maximum is $(1 - \{\sigma_1\})^2 + (K - \lfloor\sigma_1\rfloor - 1)$.
(iii) S 1 is maximized at a set of p k,i for which $\lfloor\sigma_1\rfloor$ subpopulations are fixed for allele 1, allele 1 has frequency {σ 1} in one subpopulation and allele 1 has frequency 0 in all other subpopulations. S* is maximized at a set of p k,i for which $K - \lfloor\sigma_1\rfloor - 1$ subpopulations are fixed, each for a distinct allele i with i ≥ 2, one subpopulation possesses a distinct allele i ≥ 2 with frequency 1 − {σ 1}, and all $\lfloor\sigma_1\rfloor$ other subpopulations possess no alleles i ≥ 2 of non-zero frequency.
The upper bound in equation (A 4) depends on both S 1 and S*, each of which depends on the p k,i. Were the set of values of the p k,i that maximizes S 1 and the set of values of the p k,i that maximizes S* to differ, additional work would be required to find the set of values of the p k,i that maximizes F ST. However, we now observe that S 1 and S* can be simultaneously maximized at the same set of values of p k,i, so that the same set of values of the p k,i maximizes S 1 and S* and hence F ST. In particular, $\lfloor\sigma_1\rfloor$ subpopulations are fixed for allele 1, each of $K - \lfloor\sigma_1\rfloor - 1$ subpopulations is fixed for its own private allele, and a single subpopulation possesses allele 1 with frequency {σ 1} and a private allele with frequency 1 − {σ 1}. The number of alleles of non-zero frequency is $K - \lfloor\sigma_1\rfloor + 1$. Only the most frequent allele is shared by more than one subpopulation, and a single subpopulation possesses more than one allele of non-zero frequency.
Substituting the maximal values of S 1 and S* into equation (A 4), for non-integer σ 1 in (1, K ), we obtain the maximal F ST in terms of σ 1 shown in equation (3.1).