Counting String Theory Standard Models

We derive an approximate analytic relation between the number of consistent heterotic Calabi-Yau compactifications of string theory with the exact charged matter content of the standard model of particle physics and the topological data of the internal manifold: the former scales exponentially with the number of Kähler parameters. This is done by estimating the number of solutions to a set of Diophantine equations representing constraints satisfied by any consistent heterotic string vacuum with three chiral massless families, and has been computationally checked to hold for complete intersection Calabi-Yau threefolds (CICYs) with up to seven Kähler parameters. When extrapolated to the entire CICY list, the relation gives about $10^{23}$ string theory standard models; for the class of Calabi-Yau hypersurfaces in toric varieties, it gives about $10^{723}$ standard models.


Introduction and summary
It is generally believed that string compactifications that have the exact charged matter content of the standard model of particle physics (and no other charged matter except moduli) are few in number. The purpose of this letter is to show that, although such compactifications may be rare and hard to find, their number is substantial. Admittedly, this bias has come in the past from the difficulty of constructing phenomenologically viable compactifications. However, there has been much progress since the birth of string phenomenology in Ref. [1]: from the advent of the first standard-like string model [2], to the first exact particle spectrum directly derived from a string compactification [3][4][5][6], to the first result [7] from algorithmic heterotic compactification [8], to the comprehensive computer scans of Refs. [9][10][11][15], as well as the various statistical perspectives on the heterotic landscape [12,38] (cf. [13] in Type II and beyond [14]).
While it can be specified at different levels of sophistication, for this letter a "string standard model" is a model with a massless spectrum which is exactly that of the minimally supersymmetric standard model (MSSM). Heterotic model building of this kind proceeds in three steps: (1) choose a Calabi-Yau (CY) threefold $X$ from existing databases, most of which are simply connected, then search for discrete, freely acting symmetry groups $\Gamma$ on $X$, and consider the quotient $X/\Gamma$ with fundamental group $\Gamma$; (2) construct and classify families of $\Gamma$-equivariant bundles $V$ on $X$, ensure stability, and then compute the relevant cohomology groups; and (3) scan through the results to look for exact MSSM particle content. Much of this can be implemented on a computer.
The most extensively used databases of manifolds are the roughly 8000 Complete Intersection Calabi-Yau threefolds (CICYs) embedded in products of projective spaces [16,17], as well as the Kreuzer-Skarke (KS) dataset of Calabi-Yau hypersurfaces embedded in around half a billion four-dimensional toric varieties [18][19][20] (the number of manifolds may exceed this number because of triangulations, and a recent estimate was made in [21]). The comprehensive scan of [11] was performed on CICY manifolds with a number of Kähler parameters less than $h^{1,1}(X) = 6$ and vector bundles constructed from sums of line bundles. A total of about 35,000 $SU(5)$ heterotic line bundle models were found, all with the right field content to induce low-energy standard-like models.
It is obviously important to have a count of MSSM models expected within string theory. While ours is still a relatively unrefined notion of the standard model, even counting at this level is not easy, since it requires information on all vector bundles and their cohomology on CYs which is not available in any systematic form. An exception to this rule is the class of vector bundles that split into a sum of line bundles. Holomorphic line bundles are classified by their first Chern class, which can be expressed in terms of $h^{1,1}(X)$ integers. As such, line bundles can be enumerated.
Moreover, there is considerable evidence for the existence of analytic formulae for the ranks of line bundle valued cohomology groups in terms of the line bundle integers [23][24][25][26]. Finally, line bundle sums offer an accessible window into the moduli space of non-abelian bundles [26][27][28]: if a line bundle sum corresponds to a standard-like model, then usually it can be deformed into non-abelian bundles that also lead to standard-like models.
Our model building experience for such (rank five) line bundle models on CICYs suggests that a significant number of consistent models with the correct chiral asymmetry will descend to standard models after dividing by the freely acting discrete symmetry group $\Gamma$. Moreover, by far the most frequent symmetry is $\mathbb{Z}_2$.
This suggests that an indication of the number of standard models should be provided by counting the consistent upstairs line bundle models with chiral asymmetry 6, relevant for $\mathbb{Z}_2$ symmetries.
We start our analysis by outlining the constraints on the compactification data that guarantee an exact MSSM spectrum. For a fixed manifold, most of these constraints take the form of Diophantine equations and inequalities, where the unknown variables are the line bundle integers. For the class of CICY manifolds with fewer than six Kähler parameters this system was solved in Ref. [11] by explicitly checking every possible line bundle sum. We augment this dataset of line bundle models with results for 7 new manifolds with $h^{1,1}(X) = 6, 7$.
These scans suggest a simple rule: the number of line bundle models increases roughly by an order of magnitude with every increment of $h^{1,1}(X)$ by one. However, it is difficult to test this relation for larger values of $h^{1,1}(X)$ due to computer limitations.
Instead, we estimate the number of solutions to the Diophantine system of constraints using a result from the mathematical literature [29]. For this to hold, we define a bound on the line bundle integers in terms of topological data of the manifold.
Finally, we come back to the empirical dataset of line bundles and correlate the number of solutions not only with $h^{1,1}(X)$, but also with a number of topological invariants, constructed from the intersection form and the second Chern class of the CY manifold, that display little variation with increasing $h^{1,1}(X)$. Extrapolating the multi-linear regression to the maximal value of $h^{1,1}(X)$ found in the CICY dataset, we estimate a total of $N_{\rm CICY} \simeq 10^{23}$ line bundle models (we owe the expression "a mole of models" to Tristan Hübsch), while for the manifolds in the Kreuzer-Skarke list we expect $N_{\rm KS} \simeq 10^{723}$ line bundle models.

[...] valuable comments on the draft. AC and AL would like to thank the Mainz Institute for Theoretical Physics for hospitality during part of the completion of this project. AL is partially supported by the EPSRC network grant EP/N007158/1. YHH thanks STFC for grant ST/J00037X/1.

Counting line bundle MSSMs
The models of interest for our count have an exact MSSM particle content and are constructed from heterotic compactifications on smooth, compact Calabi-Yau threefolds $X$ endowed with slope-zero, poly-stable direct sums of line bundles. Let $h$ denote the Picard number of $X$, $h := h^{1,1}(X)$, and choose an integral basis $J_i$ of $H^2(X)$. In this basis, let the second Chern class of $X$ be $c_{2,i}$ and the triple intersection numbers be $d_{ijk}$. The reader is referred to Sec. 4 of Ref. [11] for further details on the constraints.
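As an illustration of how such constraints become Diophantine conditions on the line bundle integers, the sketch below evaluates three of them for a rank-five line bundle sum: vanishing first Chern class, the index, and the anomaly inequality. It is only a sketch under stated assumptions. The formulas are the standard ones for line bundle sums with $c_1(V) = 0$, namely $\mathrm{ind}(L) = \frac{1}{6} d_{ijk} k^i k^j k^k + \frac{1}{12} c_{2,i} k^i$ and $c_2(V)_i = -\frac{1}{2} d_{ijk} \sum_a k_a^j k_a^k$, and are not quoted from Ref. [11]. The slope-zero condition is not checked, the target index of $-6$ (including its sign convention) is an assumption, and all function names are hypothetical. The example data are those of the tetra-quadric, the hypersurface of multi-degree (2,2,2,2) in $(\mathbb{P}^1)^4$.

```python
from itertools import permutations

import numpy as np


def tetraquadric_data():
    """Intersection numbers d_ijk and second Chern class c_{2,i} of the tetra-quadric."""
    h = 4
    d = np.zeros((h, h, h), dtype=int)
    for i, j, k in permutations(range(h), 3):
        d[i, j, k] = 2  # d_ijk = 2 whenever i, j, k are all distinct
    c2 = np.full(h, 24)  # c_{2,i} = 24 for each of the four P^1 directions
    return d, c2


def line_bundle_index(k, d, c2):
    """Index of the line bundle with integers k: d_ijk k^i k^j k^k / 6 + c_{2,i} k^i / 12."""
    k = np.asarray(k)
    cubic = int(np.einsum("ijk,i,j,k->", d, k, k, k))
    linear = int(np.dot(c2, k))
    numerator = 2 * cubic + linear  # equals 12 * index
    assert numerator % 12 == 0, "the index must be an integer"
    return numerator // 12


def check_line_bundle_sum(K, d, c2, target_index=-6):
    """Check c_1(V) = 0, the index and the anomaly inequality for V = sum of line bundles.

    K has shape (5, h); row a holds the integers k_a^i of the a-th line bundle.
    The slope-zero (supersymmetry) condition is NOT checked here.
    """
    K = np.asarray(K)
    c1_zero = bool(np.all(K.sum(axis=0) == 0))
    index = sum(line_bundle_index(k, d, c2) for k in K)
    # 2 * c_2(V)_i = - d_ijk sum_a k_a^j k_a^k, compared against 2 * c_{2,i}(TX)
    twice_c2_V = -np.einsum("ijk,aj,ak->i", d, K, K)
    anomaly_ok = bool(np.all(twice_c2_V <= 2 * c2))
    return {
        "c1_zero": c1_zero,
        "index": index,
        "index_ok": index == target_index,
        "anomaly_ok": anomaly_ok,
    }


d, c2 = tetraquadric_data()
K = [[1, 1, 1, 1], [-1, -1, -1, -1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(line_bundle_index([1, 1, 1, 1], d, c2))
print(check_line_bundle_sum(K, d, c2))
```

The example sum is chosen only to exercise the functions: it has vanishing first Chern class and satisfies the anomaly inequality, but its index is 0, so it is not a viable model. A scan in the spirit of the text would loop such a check over all integer matrices within a bound.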
First, we focus on $SU(5)$ bundles $V$ for the following reason.
From a group theoretic point of view [22], there are many ways to break the GUT group to the MSSM group using an appropriate discrete Wilson line. For example, the exact MSSM spectrum of [4] was achieved with a $\mathbb{Z}_3 \times \mathbb{Z}_3$ Wilson line from an $SO(10)$ GUT group. However, CY manifolds $X$ with a large freely acting discrete symmetry group are quite rare. This can be seen, for instance, from the complete classification of freely-acting [33] and residual [34,35] symmetries on all CICYs, or from the KS dataset of hypersurfaces in toric Fano fourfolds [30,32].
Therefore, generically, Calabi-Yau manifolds with a small fundamental group $\pi_1(X)$ are expected to far exceed in number those with a large $\pi_1(X)$ (this should be contrasted with the relative paucity of Calabi-Yau manifolds of small Hodge numbers [36,37,39]). The smallest possible $\pi_1(X)$ that breaks the $SU(5)$ GUT group to the Standard Model gauge group is $\Gamma = \mathbb{Z}_2$, and this setup is expected to dominate. Now, in $SU(5)$ GUTs (where the GUT group is the commutant in $E_8$ of the $SU(5)$ of the bundle), the $\overline{\mathbf{10}}$ representation corresponds entirely to anti-families, which we desire to be absent. Under the branching of $E_8$ to $SU(5)$, this corresponds to the condition that $h^2(X, V) = 0$, so that stability (implying that [...]

In summary, the counting problem can then be formulated as follows: [...]

We emphasize that $N$ is a function of the prescribed Hodge number $h$, the second Chern class $c_{2,i}$, as well as the triple intersection numbers $d_{ijk}$ of $X$.

Preliminary count
For the subset of favourable CICYs with $h \leq 6$, the number $N$ was determined by the computer scan [11]. We have extended this scan to include manifolds with $h^{1,1}(X) = 7$ as well as some non-favourable manifolds with $h^{1,1}(X) \leq 7$ which were previously discarded.
The cumulative and average number of models found for each Picard number $h$ is summarised in Table 1. Note that there are no viable models for $h = 1, 2, 3$, since the supersymmetry (slope zero) conditions are too constraining in those cases. A simple approach is to assume that the main dependence of $N$ is on $h$, and to neglect the possible effects of $c_{2,i}$ and $d_{ijk}$. The average number of models per CY as a function of $h$, $\bar{N} = \bar{N}(h)$, taken from the last row of Table 1, has been plotted (logarithmically) in Fig. 1.
A linear fit to this data (which corresponds to the red line in Fig. 1) leads to $\log \bar{N}(h) \simeq -5.0 + 1.5\,h$. (2)
The largest known Picard number of any CY threefold is $h_{\max} = 491$, which appears within the KS dataset, and the largest value within the CICY list is $h_{\rm CICY} = 19$. Using (2) to boldly extrapolate to those values we find $\bar{N}(h_{\rm CICY}) \simeq 10^{23}$, $\bar{N}(h_{\max}) \simeq 10^{721}$. (3)
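The extrapolation can be reproduced with a few lines of arithmetic. The sketch below assumes that the logarithm in the fit is base 10 (which is consistent with the quoted $10^{23}$ at $h = 19$) and uses the rounded coefficients $-5.0$ and $1.5$ as printed; the function name is hypothetical.

```python
def log10_avg_models(h, intercept=-5.0, slope=1.5):
    """Linear fit for log10 of the average number of models at Picard number h.

    The default coefficients are the rounded values quoted in the text;
    reading 'log' as base 10 is an assumption.
    """
    return intercept + slope * h


for label, h in [("CICY", 19), ("KS", 491)]:
    print(f"{label}: h = {h}, log10(N) ~ {log10_avg_models(h):.1f}")
```

With the rounded coefficients the exponent comes out as 23.5 at $h = 19$ and 731.5 at $h = 491$. The second value differs from the quoted $10^{721}$ by about ten orders of magnitude, presumably because the quoted figure was obtained from unrounded fit coefficients: a shift of only 0.02 in the slope changes the exponent at $h = 491$ by about 10, which shows how sensitive the large-$h$ extrapolation is.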
Clearly, these numbers are quite dramatic, even if we restrict ourselves to the CICYs. The predicted number of standard models even within this set is significantly larger than can be currently stored, let alone found by a scan. However, the method so far is quite crude and the extrapolation to large $h$ adventurous. To see some of the problems, consider Fig. 2, which shows the number of models as a function of $h$ for each CICY, rather than the average over all CICYs with the same $h$, as in Fig. 1. The variation within a given $h$ can be seen to be considerable, a clear indication that there is a strong dependence of $N$ on $c_{2,i}$ and $d_{ijk}$, in addition to $h$.

Some theoretical considerations
While the computer scan gives a finite number of models in each case, it is actually not easy to prove that $N$ is finite. A succinct argument was presented in Ref. [15], based on the moduli space metric $G$ [...] (4) where $\kappa = d_{ijk} t^i t^j t^k$, $\kappa_i = d_{ijk} t^j t^k$ and $\kappa_{ij} = d_{ijk} t^k$, with $t^i$ being the Kähler moduli. We note that the slope zero conditions can be expressed [...] (5) Then, introducing the scale-invariant modified metric $\tilde{G} = \frac{\kappa}{6|t|} G$, we get the bound $\sum_a k_a^T \tilde{G}\, k_a \leq |c_{2,i}(TX)|$. (6) This by itself unfortunately does not bound the vectors $k_a$ of line bundle integers, since the metric $\tilde{G}$ might become singular at the boundary of the Kähler cone. However, if we require that we stay in a "physical" region of the Kähler cone where all curve volumes are greater than 1 (so that the supergravity approximation is valid) and the volume $\kappa$ is bounded from above (so that we are not de-compactifying), then the eigenvalues of $\tilde{G}$ are bounded from below by a strictly positive number.
Eq. (6) then implies that the length $\sum_a |k_a|^2$ is bounded from above and, hence, that there is only a finite number of possible integer vectors $k_a$. More quantitative statements depend very much on the specific example, but what can be said is that the length of the $k$ vectors is bounded by a radius $R$ which roughly scales as [...] (7) where $c_2$ and $d$ are typical values of $c_{2,i}$ and $d_{ijk}$.
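The step from the bound (6) to a bound on the length of the integer vectors can be made explicit as follows. Here $\lambda_{\min} > 0$ is notation introduced only for this remark: it denotes the assumed strictly positive lower bound on the eigenvalues of the scale-invariant modified metric throughout the "physical" region of the Kähler cone.

```latex
\lambda_{\min} \sum_a |k_a|^2
  \;\leq\; \sum_a k_a^T \tilde{G}\, k_a
  \;\leq\; |c_{2,i}(TX)|
\quad\Longrightarrow\quad
\sum_a |k_a|^2 \;\leq\; \frac{|c_{2,i}(TX)|}{\lambda_{\min}} .
```

The first inequality holds because the modified metric is symmetric, so each quadratic form is at least its smallest eigenvalue times $|k_a|^2$. Since only finitely many integer vectors lie inside a ball of fixed radius, finiteness of the number of models follows.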
Apart from the slope zero conditions, the constraints on the line bundles can be written as a system of Diophantine equations [...] (8) for $i = 1, \ldots, h$. Here we have replaced the inequality in the anomaly condition with an equality, assuming that the bulk of the contribution comes from line bundles with the largest allowed integers. We can homogenise these equations (by introducing one additional coordinate) and think of them as a set of equations in $\mathbb{P}^n$, where $n = 5h$. Assuming they provide a complete intersection, $Z$, its dimension $m$ is given by [...]. Ref. [29] provides an upper bound for the number of rational points $N_Z(B)$ on $Z$ within a box of size $B$ (where the size is measured by the maximum norm), which is given by $N_Z(B) \lesssim B^m$. Using the above radius $R$ as an upper bound for $B$ we find [...] (9)

The comparison between this upper bound and the results from the computer scan on CICY manifolds is shown in Fig. 3. With all the data points well below the red line, it is clear that Eq. (9) indeed provides an upper bound, but it is equally obvious that this upper bound is rather weak. There are a number of possible reasons for this. First, the result of Ref. [29] is only an upper bound (which, in addition, counts rational rather than integer points). Second, the radius $R$ from Eq. (7) is a crude estimate and is, by itself, only an upper bound on the size $B$ of the box considered in Ref. [29].
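To give a feeling for how the number of solutions in a box grows with the box size, the toy sketch below counts integer 5-tuples with entries in $[-B, B]$ that sum to zero. This is a single linear constraint of the type $\sum_a k_a^i = 0$ for one fixed $i$, not the system (8) and not the homogenised variety $Z$ treated in Ref. [29]; the function name is hypothetical. The solutions lie on a four-dimensional hyperplane, so the count should grow roughly like $B^4$.

```python
def count_zero_sum_tuples(B, n=5):
    """Number of integer n-tuples with entries in [-B, B] whose entries sum to zero."""
    counts = {0: 1}  # counts[s] = number of partial tuples with running sum s
    for _ in range(n):
        new_counts = {}
        for s, c in counts.items():
            for k in range(-B, B + 1):
                new_counts[s + k] = new_counts.get(s + k, 0) + c
        counts = new_counts
    return counts[0]


small = count_zero_sum_tuples(50)
large = count_zero_sum_tuples(100)
print(count_zero_sum_tuples(1))
print(large / small)
```

Doubling $B$ from 50 to 100 multiplies the count by a factor close to $2^4 = 16$, in line with growth governed by the dimension of the solution set.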

A more sophisticated count
Our count in §2.1, based on considering only the dependence of $N$ on the Picard number $h$, is clearly somewhat unrefined, while the theoretical upper bound $x_{\rm th}$ from §2.2 is clearly too weak to allow for a meaningful extrapolation to larger values of $h$. In this subsection, we seek a more sophisticated equation for $N$ as a function of $h$, $c_{2,i}$ and $d_{ijk}$, drawing inspiration from the above discussions.
There is an obvious difficulty in writing down even an ansatz for $N$ as a function of $c_{2,i}$ and $d_{ijk}$: both of these quantities are basis-dependent (they depend on a choice of integral basis $J_i$ for $H^2(X)$), but $N$ clearly cannot depend on such a choice of basis. This means we should think about basis-independent quantities which can be constructed from $c_{2,i}$ and $d_{ijk}$.
Unfortunately, both quantities have "all indices down", so there is no invariant which can be obtained by a simple contraction of indices, given that the only available metric, $G$ from Eq. (4), is moduli-dependent. This problem has been encountered before, in the context of practical applications of Wall's theorem, and a solution has been proposed in Ref. [17], p. 174. From the intersection form $\lambda(\alpha, \beta, \gamma) := \int_X \alpha \wedge \beta \wedge \gamma \in \mathbb{Z}_{\geq 0}$, completely symmetric in $\alpha, \beta, \gamma$, the following invariants can be constructed: [...] (10) Furthermore, combining the intersection form and $c_2 = c_2(TX)$, we can define a form in four arguments $(\alpha, \beta, \gamma, \delta)$ as $\lambda(\alpha, \beta, \gamma)\, c_2(\delta) + 3\ \text{permutations}$, which gives rise to the invariants [...] (11) Ref. [17] also provides a practical way of computing these invariants which involves a scan over only a finite subset of $H^2(X, \mathbb{Z})$, so that they can be worked out from $d_{ijk}$ and $c_{2,i}$.
Combining the approaches of §2.1 and §2.2 and using the invariants labelled by $i = 1, \ldots, 7$, a plausible ansatz for $x := \log N$ is [...] (12) The comparison of this fit with the data is provided in Fig. 4. Each point corresponds to a CICY: $x$ is the value computed from the right-hand side of Eq. (12), using the values in (13) for $A_i$, $B_i$, and $\log(N)$ is the logarithm of the number of standard models on this CICY found by the computer scan. The red line is the diagonal, $\log(N) = x$, which represents a perfect fit. It seems from Eq. (13) that $x$ depends most heavily on the invariants with $i = 1, 3, 5$, as the corresponding coefficients dominate in magnitude. Examining these, we see that they essentially come from the basic intersection form (including the self-intersection), which is after all our most fundamental topological quantity.
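Extrapolating this fit to $h = 19$ and $h = 491$ gives $N_{\rm CICY} \simeq 10^{23}$ and $N_{\rm KS} \simeq 10^{723}$, which is not too far away from the earlier result (3).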

Conclusions
The fit illustrated in Fig. 4 looks rather convincing and we believe that an extrapolation to large $h = h^{1,1}$ is trustworthy (since the invariants show little variation relative to $h$), as long as the underlying model-building assumptions continue to be satisfied for large $h$. We believe that this is the case for the CICY dataset and, hence, the number of $10^{23}$ standard models within this set should be taken seriously.
The extrapolation to $h_{\max} = 491$, the maximal known Picard number of any CY, is more questionable. Some of the model building assumptions made here have not yet been checked for the KS set, and there are even indications that they may not be satisfied. First, it is not clear that the KS set contains manifolds that admit freely-acting symmetries with the same frequency as CICYs.
The only systematic checks have been carried out for low $h^{1,1}$, where the frequency of symmetries is lower than for the CICYs [30]. No information is available for large $h^{1,1}$ yet. Another generic feature of CICY models is the frequent absence of large numbers of vector-like pairs, so that checking the index was sufficient to guarantee the correct spectrum for a significant fraction of the models. It is not clear that this feature persists for constructions based on other CY manifolds. In fact, the results of Ref. [31] suggest that the presence of phenomenologically problematic numbers of vector-like pairs might be a generic feature of some other CY constructions. Again, no definite statement on this is available for the KS set.
In summary, the number of $10^{723}$ standard models for the extrapolation to $h^{1,1} = 491$ should be viewed with considerable caution. However, the extrapolation to $h^{1,1} = 19$, the maximal Picard number in the CICY set, has to be taken seriously and leads to $10^{23}$ standard models. Even this number, almost certainly a conservative lower bound, is frighteningly large and beyond current computer storage and systematic search.