Machine Learning Line Bundle Cohomologies of Hypersurfaces in Toric Varieties

Different techniques from machine learning are applied to the problem of computing line bundle cohomologies of (hypersurfaces in) toric varieties. While a naive approach of training a neural network to reproduce the cohomologies fails in the general case, by inspecting the underlying functional form of the data we propose a second approach. The cohomologies depend in a piecewise polynomial way on the line bundle charges. We use unsupervised learning to separate the different polynomial phases. The result is an analytic formula for the cohomologies. This can be turned into an algorithm for computing analytic expressions for arbitrary (hypersurfaces in) toric varieties.


I. INTRODUCTION
The idea of applying concepts from data science to problems naturally appearing in string phenomenology is of course not new. The emergence of the string landscape, the set of effective field theories arising from some consistent string construction, has quickly led people to consider statistical tools to tackle its enormous size [1].
Following early work on genetic algorithms [2,3], and with techniques from data science and machine learning recently becoming important for the solution of many real-world problems, there has been an increased interest in applying machine learning to the exploration of the landscape [4][5][6][7][8][9][10][11][12].
We want to stress here that while e.g. the number of flux vacua is numerically huge (the famous estimated lower bound being $10^{500}$), we are still dealing with a possibly finite and likely countable set whose members can be described by a vector with integral entries. Often the answers to interesting questions about the vacua can also be described by a set of integers, as is the case for yes/no questions of the type "Is my vacuum supersymmetric?" or "Does my vacuum contain a tachyon?", but also for questions such as "How many generations of SM fermions does my vacuum contain?". We want to address the question of whether such (complicated) mappings between vectors of integers can be naturally modelled by neural networks (NNs). One such question is: "Given a (hypersurface in a) toric variety $X$, what are the ranks $h^\bullet$ of the line bundle cohomology groups $H^\bullet(\mathcal{O}_X(D))$ for some toric divisor $D$?" In many cases the answer to this question is provided by the cohomCalg program [13], which supplies us with data sets on which neural networks can be trained.
As a first approach, we try to directly train a neural network to reproduce the cohomologies. We study first whether this approach can work for the toric ambient spaces and also hypersurfaces therein. The possibility of interpolating and extrapolating the data from a training set is then investigated. This approach is very similar to the one adopted in [7], where genetic algorithms were employed to optimise a neural network for regression of line bundle cohomologies.
Our second approach consists of a two step procedure. First we cluster the cohomology data using unsupervised learning. The resulting clusters turn out to have a simple polynomial formula for their cohomologies. The two steps lead to an analytic expression for the rank of the line bundle cohomology groups.
On the way we solve a shortcoming of the cohomCalg algorithm by implementing some of the mappings in the Koszul complex.
After completion of this work we became aware of [14], which deals with the similar problem of computing line bundle cohomologies in the case of CICYs in products of projective spaces.

II. LINE BUNDLES ON HYPERSURFACES IN TORIC VARIETIES
A vast majority of the Calabi-Yau manifolds that are used in string constructions are obtained as complete intersections in toric varieties, the anticanonical hypersurfaces forming a subset of these. Although our techniques are expected to generalise to the case of complete intersections, we will treat only the case of hypersurfaces as a proof of principle.
Toric varieties can be described in many different ways, one of which is the gauged linear sigma model (GLSM) [15]. The GLSM is an $\mathcal{N} = (2, 2)$ SUSY gauge theory in two dimensions, with chiral superfields $x_i$, $i = 1, \ldots, I$, representing homogeneous coordinates of the toric space. The GLSM features $R$ abelian gauge symmetries, and the charge vectors $Q^{(r)}_i$, $r = 1, \ldots, R$, encode the weights under $(\mathbb{C}^*)^R$ rescalings of the homogeneous coordinates. Analogous to the case of projective spaces, the resulting toric variety $X$ is then formed as a quotient of $\mathbb{C}^I$ by the homogeneous rescalings, after cutting out a suitable fixed point set $F$:
$$X = \frac{\mathbb{C}^I - F}{(\mathbb{C}^*)^R}.$$
This fixed point set depends on the choice of the FI parameters in the gauge theory. Solvability of the D-terms will result in the constraint that certain subsets $S_\alpha$ of the full set of coordinates should not vanish simultaneously:
$$S_\alpha = \{x_{\alpha_1}, \ldots, x_{\alpha_{|S_\alpha|}}\}.$$
The extracted set then takes the form
$$F = \bigcup_{\alpha} \{x_{\alpha_1} = \ldots = x_{\alpha_{|S_\alpha|}} = 0\}.$$
The ring-theoretic way of handling the information in the vanishing set is given by the Stanley-Reisner ideal
$$SR = \langle \mathcal{S}_1, \ldots, \mathcal{S}_{|SR|} \rangle.$$
Here the generators $\mathcal{S}_\alpha = \prod_{i=1}^{|S_\alpha|} x_{\alpha_i}$ are monomials constructed out of the coordinates in the sets $S_\alpha$.
The homogeneous coordinates of a toric variety provide us with a natural open covering in terms of the sets $U_i = \{x \,|\, x_i \neq 0\}$ as well as a set of divisors $D_i = \{x \,|\, x_i = 0\}$. Due to the equivalence between line bundles and divisors, line bundles on a toric variety take the form of tensor products of the $L_i = \mathcal{O}_X(D_i)$ and their inverses. We can also classify line bundles in terms of their GLSM charges as
$$\mathcal{O}_X(D) = \mathcal{O}_X(m_1, \ldots, m_R).$$
In a toric variety, the anticanonical hypersurface $H = \sum_i D_i$ has vanishing first Chern class and is thus Calabi-Yau. Line bundles $\mathcal{O}_X(D)$ on the ambient space descend to line bundles $\mathcal{O}_H(D)$ on this hypersurface. The two are related by an exact sequence of sheaves, the Koszul sequence
$$0 \to \mathcal{O}_X(D - H) \xrightarrow{\;m\;} \mathcal{O}_X(D) \xrightarrow{\;\mathrm{res}\;} \mathcal{O}_H(D) \to 0. \qquad (6)$$
Here $m$ is multiplication with the section of $\mathcal{O}_X(H)$ that defines the hypersurface and res is the restriction map to it. Our main interest is in the sheaf cohomology groups $H^\bullet(X, \mathcal{O}_X(D))$ and $H^\bullet(H, \mathcal{O}_H(D))$.

III. THE COHOMCALG ALGORITHM
An elegant and fast way to compute the sheaf cohomology is given by the cohomCalg algorithm, which was conjectured in [16], proven in [17] and implemented in [13]. The algorithm gives generators of the cohomology groups in terms of rationoms, which are just monomials of the form
$$R(x, y) = \frac{T(x)}{\left(\prod_j y_j\right) W(y)}, \qquad (7)$$
where $T$ and $W$ are monomials and the vectors $x$, $y$ refer to a splitting of the homogeneous coordinates as follows. The power set of the (generating set of the) Stanley-Reisner ideal [18] is decomposed into its $k$-element subsets as
$$P(SR) = \bigcup_{k=0}^{|SR|} P_k(SR).$$
One defines index sets $A = \{\alpha_1, \ldots, \alpha_k\} \subset \{1, \ldots, |SR|\}$ which allow us to label the elements of the sets $P_k(SR)$ as $P^k_A = \{\mathcal{S}_{\alpha_1}, \ldots, \mathcal{S}_{\alpha_k}\}$. For a given $P^k_A$, the union of all its associated $S_{\alpha_i}$ is denoted as
$$Q^k_A = \bigcup_{i=1}^{k} S_{\alpha_i},$$
which is just the collection of all coordinates that appear in the set $P^k_A$. To this set, a degree $N^k_A$ is assigned:
$$N^k_A = |Q^k_A| - k.$$
For a given $Q = Q^k_A$ the variables $y$ that appear in the denominator of the rationom (7) are now defined to be those that are contained in $Q$, whereas the $x$ coordinates are taken from the complement. For this given $Q$ we can now construct all possible rationoms that match the GLSM charge of the divisor $D$ that defines the line bundle $\mathcal{O}_X(D)$. Each rationom contributes a generator of the cohomology group $H^N(X, \mathcal{O}_X(D))$, with $N = N^k_A$. In some cases a single rationom will contribute multiple generators to the cohomology. This is associated with the calculation of a certain remnant cohomology, which has been clarified in [17]. Although these multiplicities are implemented in the cohomCalg program, this complication will not appear in the examples that we study.
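The combinatorial bookkeeping of this step can be sketched in a few lines of Python. This is a minimal illustration and not the cohomCalg implementation: the function name `sr_subsets` and the encoding of each Stanley-Reisner generator as a set of coordinate indices are hypothetical, and the degree assignment $N = |Q| - k$ is assumed to follow the standard cohomCalg convention.

```python
from itertools import combinations

def sr_subsets(sr_generators):
    """Yield (A, Q, N) for every subset of Stanley-Reisner generators.

    sr_generators: list of sets of coordinate indices, one set S_alpha per generator.
    A is the tuple of generator labels, Q the union of the chosen S_alpha and
    N = |Q| - k the cohomology degree assigned to Q (assumed convention).
    """
    labels = range(len(sr_generators))
    for k in range(len(sr_generators) + 1):
        for A in combinations(labels, k):
            Q = frozenset().union(*(sr_generators[a] for a in A))
            yield A, Q, len(Q) - k

# Example used later in the text: SR = <x1 x2 x3, x4 x5>
SR = [{1, 2, 3}, {4, 5}]
table = list(sr_subsets(SR))
for A, Q, N in table:
    print(A, sorted(Q), N)
```

For this Stanley-Reisner ideal the four subsets are assigned the degrees 0, 2, 1 and 3, so that rationoms with denominator variables $x_4, x_5$ are the ones contributing to $H^1$.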
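Once the sheaf cohomology of $X$ is computed, one can use the fact that the short exact sequence of sheaves (6) induces a long exact sequence of cohomology groups
$$0 \to H^0(X, \mathcal{O}_X(D-H)) \xrightarrow{\;m_*\;} H^0(X, \mathcal{O}_X(D)) \xrightarrow{\;\mathrm{res}_*\;} H^0(H, \mathcal{O}_H(D)) \xrightarrow{\;\delta\;} H^1(X, \mathcal{O}_X(D-H)) \to \ldots,$$
where $\delta$ is the connecting homomorphism, in order to deduce the sheaf cohomology $H^\bullet(H, \mathcal{O}_H(D))$ on the hypersurface.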
The reference implementation of the cohomCalg algorithm [13] does not implement the maps in the Koszul sequence and hence relies on the exactness of the sequence in order to derive the ranks of the cohomology groups. This works by first cutting the long sequence into shorter sequences at locations where zeros occur and then using the fact that for an exact sequence $0 \to G_1 \to \ldots \to G_n \to 0$ the ranks satisfy $\sum_{j=1}^{n} (-1)^j \, \mathrm{rk}(G_j) = 0$. The above approach works as long as there are sufficiently many zeros in the sequence. In order to train our classifiers we need the cohomology ranks of all line bundles corresponding to a certain interval $[-\delta, +\delta]$ in charge space. Generically only some of those ranks can be solved for by the cohomCalg program, whereas a large portion is left undetermined.
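The rank condition can be turned into a one-line solver whenever a cut piece of the sequence contains exactly one unknown rank. The following sketch illustrates this; the function name `solve_unknown_rank` and the convention of marking the unknown entry with `None` are assumptions made for the example.

```python
def solve_unknown_rank(ranks):
    """Solve for the single unknown rank in an exact sequence
    0 -> G_1 -> ... -> G_n -> 0, using sum_j (-1)^j rk(G_j) = 0.

    ranks: list of non-negative ints with exactly one entry equal to None.
    """
    unknown = [j for j, r in enumerate(ranks) if r is None]
    if len(unknown) != 1:
        raise ValueError("exactly one unknown rank is required")
    j0 = unknown[0]
    partial = sum((-1) ** j * r for j, r in enumerate(ranks) if r is not None)
    value = -partial * (-1) ** j0  # solves (-1)^j0 * x + partial = 0
    if value < 0:
        raise ValueError("inconsistent ranks: the sequence cannot be exact")
    return value

print(solve_unknown_rank([3, None, 5]))  # 0 -> 3 -> x -> 5 -> 0
print(solve_unknown_rank([1, 4, None]))  # 0 -> 1 -> 4 -> x -> 0
```

With two or more unknowns in the same piece the alternating sum gives a single equation and the ranks remain undetermined, which is the situation the additional cuts described next are meant to resolve.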
We improve the algorithm by cutting the sequences also at the multiplication maps $m_*$ as
$$\ldots \to H^i(X, \mathcal{O}_X(D-H)) \xrightarrow{\;m_*\;} \mathrm{im}(m_*) \to 0, \qquad 0 \to \mathrm{im}(m_*) \to H^i(X, \mathcal{O}_X(D)) \xrightarrow{\;\mathrm{res}_*\;} \ldots$$
The price for inserting an additional zero is now that we have to compute the (rank of the) image of the map $m_*$. The induced map $m_*$ on the cohomologies is realised in this setting by multiplication of the rationom representatives of the cohomology generators with the defining section $s \in \Gamma(X, \mathcal{O}_X(H))$ of the hypersurface. If a resulting monomial is not contained in the set of rationoms spanning the codomain, it is equivalent to zero in cohomology.
For definiteness we will always consider the hypersurface to be at the large complex structure point of its moduli space. This means that the map $m$ is just multiplication by the monomial $x_1 \cdots x_I$. In all cases studied the resulting exact sequences could now be solved for the cohomologies on the hypersurface. Had this not been the case, we could also have introduced additional cuts at the restriction maps.
The procedure suggests a natural generalization to the case of CICYs in toric varieties for which there exists a similar Koszul sequence, the mappings of which can be implemented in an analogous way. We leave an implementation of this more general case for future work.
Let us outline the calculation in an example. The anticanonical hypersurface in $\mathbb{P}^3_{1112}$ is a K3 surface. The toric resolution of this is described by the charge vectors (written here in the divisor basis introduced below)
$$\begin{array}{c|ccccc} & x_1 & x_2 & x_3 & x_4 & x_5 \\ \hline Q^{(1)} & 1 & 1 & 1 & 0 & 2 \\ Q^{(2)} & 0 & 0 & 0 & 1 & 1 \end{array}$$
with Stanley-Reisner ideal $SR = \langle x_1 x_2 x_3,\, x_4 x_5 \rangle$. We want to compute the image of the map
$$m_*: H^1(X, \mathcal{O}_X(-3, -4)) \to H^1(X, \mathcal{O}_X(2, -2)),$$
where we have introduced a basis $D_1 = \{x_1 = 0\} \sim \{x_2 = 0\} \sim \{x_3 = 0\}$ and $D_2 = \{x_4 = 0\}$ of divisors such that $\{x_5 = 0\} = 2D_1 + D_2$ and use the corresponding dual basis for the first cohomology. Using the cohomCalg algorithm we determine the generators of both cohomology groups; those of $H^1(\mathcal{O}(-3, -4))$ fall into two classes, distinguished by their denominators. Under the map $m = \cdot \prod_i x_i$ it is clear that only the first class of generators, with denominator $x_4^2 x_5^2$, will be mapped to rationoms that exist in $H^1(\mathcal{O}(2, -2))$. The second class of generators, with denominator $x_4 x_5^3$, is mapped to monomials without $x_4$ in the denominator, which do not have the correct singularity structure to be members of $H^1(\mathcal{O}(2, -2))$ and hence are cohomologous to zero. As a result we find that $\mathrm{rk}(\mathrm{im}(m_*)) = 3$.
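The example can be checked by brute force. The sketch below is an illustration under stated assumptions rather than the implementation used for the paper: the charges of $x_1, \ldots, x_5$ in the basis $(D_1, D_2)$ are read off from the divisor relations quoted above, only rationoms with $x_4, x_5$ in the denominator are enumerated (the subset of degree one for this Stanley-Reisner ideal), and the names `CHARGES`, `charge`, `rationoms` as well as the exponent cutoff `bound` are hypothetical.

```python
from itertools import product

# Charges of x1..x5 in the basis (D1, D2), from {x5 = 0} = 2 D1 + D2 etc.
CHARGES = [(1, 0), (1, 0), (1, 0), (0, 1), (2, 1)]

def charge(exps):
    """GLSM charge of a Laurent monomial given by its exponent tuple."""
    return tuple(sum(e * q[r] for e, q in zip(exps, CHARGES)) for r in range(2))

def rationoms(target, bound=8):
    """Rationoms for Q = {x4, x5}: x1, x2, x3 with exponents >= 0 in the
    numerator, x4, x5 with exponents <= -1, and total charge `target`."""
    found = set()
    for num in product(range(bound + 1), repeat=3):
        for den in product(range(-bound, 0), repeat=2):
            exps = num + den
            if charge(exps) == target:
                found.add(exps)
    return found

# Anticanonical class H = sum_i D_i, and the line bundles of the example
anticanonical = tuple(sum(q[r] for q in CHARGES) for r in range(2))
source = rationoms((-3, -4))
target = rationoms((2, -2))

# Large complex structure point: m multiplies by x1 x2 x3 x4 x5,
# i.e. every exponent is raised by one.
images = {tuple(e + 1 for e in r) for r in source}
rank_image = len(images & target)
print(anticanonical, len(source), len(target), rank_image)
```

Running the sketch gives the anticanonical charge $(5, 2)$, 13 rationoms in the source, 15 in the target, and three images that survive in the target, in agreement with $\mathrm{rk}(\mathrm{im}(m_*)) = 3$. Counting surviving monomials gives the rank here because distinct source monomials are mapped to distinct target monomials.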
For an arbitrary point in the complex structure moduli space the map $m_*$ will of course be more complicated. The polynomials that result from multiplication of the rationoms in $H^1(\mathcal{O}(-3, -4))$ with the defining polynomial of the hypersurface will have to be reduced modulo the rationoms in the target cohomology. While this is straightforward to implement, it is computationally more expensive and we restrict to the large complex structure point to illustrate our methods.

IV. MACHINE LEARNING COHOMOLOGIES
The aim of this paper is to examine the possible application of neural networks in the computation of line bundle cohomologies of toric varieties and hypersurfaces therein. There are different possible approaches. In [7] genetic algorithms were used to evolve neural networks which were then used to perform a regression on the map between the line bundle charges and cohomologies. The resulting NNs reproduced the cohomology ranks with 72%/83% accuracy after training. On the other hand, the authors of [11] used a classification neural network to learn the Hodge numbers of the Kreuzer-Skarke list and achieved an 80% validation rate in predicting the cohomologies. They also used a regression neural network to solve the same problem, with worse results. While these approaches work in their respective areas of application, they require large data sets and fail to extrapolate to large numbers.

A. Neural Networks for Classification
A neural network for a classification problem maps an input vector via several hidden layers, which normally are taken to be ReLU layers, to a fixed number of output nodes representing the classes. The output is normalised to sum to 1 and interpreted as a probability distribution; this is typically implemented by applying a softmax layer. The prediction is the class of highest probability. The loss function has to measure the deviation from the true result and for classification networks is often taken to be the cross entropy. This approach has the severe limitation that one has to fix the possible outcomes a priori, as every possible value of the $h^i$ has a corresponding node. The authors of [11] avoided this problem by declaring all $h^i > 50$ as large and not trying to classify these. In the examples we will be discussing, the ranks can become arbitrarily large and such a cutoff no longer makes sense. While this approach is easy to use, the rather bad results and the limitation to very small ranks render it uninteresting.
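To make the output stage concrete, the following sketch implements a softmax layer, the cross entropy loss and the arg-max prediction for a single data point. It is a generic illustration and not the network used here: the three logits are made-up numbers standing in for the last-layer outputs of a network with one node for each of the rank values 0, 1 and 2.

```python
import math

def softmax(logits):
    """Normalise raw outputs to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross entropy loss for a single sample with a one-hot target."""
    return -math.log(probs[true_class])

logits = [2.0, 0.5, -1.0]  # hypothetical outputs, one node per rank value
probs = softmax(logits)
prediction = max(range(len(probs)), key=probs.__getitem__)
loss = cross_entropy(probs, true_class=0)
print(probs, prediction, loss)
```

The fixed length of `logits` is exactly the limitation discussed above: a rank outside the chosen range has no output node and can never be predicted.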

B. Neural Networks for Regression
Another approach is a regression neural network. Here the input vector is again mapped by several hidden ReLU layers to an output vector. This time the output vector is not normalised but takes any value in $\mathbb{R}^n$ and is interpreted as the ranks by rounding to the nearest integer. The loss function for training is taken to be the mean squared error of the prediction compared to the real ranks. This approach does not put a hard upper bound on possible ranks, but the precision of the result is limited by the number of neurons and the floating point precision used. Most standard implementations of NNs use only single precision, resulting in a relative precision of the ranks of about $10^{-6}$. Thus if the ranks exceed $10^6$, the error becomes of order one and the NN predicts wrong numbers.
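The floating point limitation can be seen without any neural network. The sketch below rounds integers to IEEE-754 single precision using only the standard library; the helper name `to_float32` is hypothetical and the test value is simply chosen to be of the order of the largest ranks quoted below for the octic. In single precision not every integer above $2^{24} \approx 1.7 \cdot 10^7$ is representable, which is consistent with the order-of-magnitude estimate above once the additional errors accumulated inside a network are taken into account.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

small_rank = 1_000_000
large_rank = 20_000_001  # of the order 2 * 10^7

print(int(to_float32(float(small_rank))) == small_rank)  # exactly representable
print(int(to_float32(float(large_rank))) == large_rank)  # no longer representable
```

Even a network with a perfect fit therefore cannot return such a rank exactly if its output is stored in single precision.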
Moreover, the NN only learns an interpolation of the given data. Therefore, if one trains the network on a data set where the entries of the charge vector are in a certain range, the predictions outside of this range are unreliable.
To illustrate these findings, we take the ambient space $dP_3$ and the hypersurface $\mathbb{P}_{11222}[8]$. We randomly generated 50000 data points with line bundle charges in the range $[-50, 50]$. In the case of $dP_3$, the cohomologies can be learned by a NN consisting of 3 hidden ReLU layers with 500 neurons each to an accuracy of 99.85% within one hour. In the case of $\mathbb{P}_{11222}[8]$, this approach fails. Even large nets produce only 0.1% correct results. The reasons are that the ranks in this example already exceed $2 \cdot 10^7$ and that the problem is highly non-linear. Sophisticated preprocessing of the data increased this to 55% accuracy after 10 minutes of training, which is still not satisfactory. Thus for this kind of problem another approach is needed.

V. AN ALGORITHM TO DETERMINE ANALYTIC FORMULAS
The algorithm described in section III allows the determination of the ranks of the cohomology groups for given values of the line bundle charges. In this section an algorithm using unsupervised learning is presented which allows the identification of analytic expressions.
First a data set $S$ of the cohomologies is calculated for all values of the line bundle charges $m$ satisfying $|m_i| \le a$ for all $i$, for a fixed value of $a$. Tests have shown that $a = 25$ is sufficient for the algorithm to find the analytic formulas in the examples studied here.
The algorithm uses the observation that the $h^i$ have a distinct phase structure. In the interior of one phase the $h^i$ are polynomial functions of the line bundle charges of degree at most $d$, where $d$ is the dimension of the variety. If one can identify the phase structure, it is then easy to perform a polynomial fit. Identifying the phases is a classification problem. As one does not know the phase structure a priori, unsupervised learning has to be applied.
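Once a phase is isolated, the fit itself is elementary. The following sketch performs an exact least-squares fit over the rationals; it is a stand-in for the fitting routine used in the paper, all function names are hypothetical, and the test data is the well-known result $h^0(\mathbb{P}^2, \mathcal{O}(m)) = (m+1)(m+2)/2$ for $m \ge 0$ rather than one of the examples treated here.

```python
from fractions import Fraction
from itertools import product

def monomials(num_vars, degree):
    """Exponent tuples of all monomials of total degree <= degree."""
    return [e for e in product(range(degree + 1), repeat=num_vars) if sum(e) <= degree]

def evaluate(exps, point):
    value = Fraction(1)
    for e, x in zip(exps, point):
        value *= Fraction(x) ** e
    return value

def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals (assumes a unique solution)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_polynomial(points, values, degree):
    """Exact least-squares fit; returns {exponent tuple: Fraction coefficient}."""
    basis = monomials(len(points[0]), degree)
    design = [[evaluate(e, p) for e in basis] for p in points]
    size = len(basis)
    normal = [[sum(row[i] * row[j] for row in design) for j in range(size)]
              for i in range(size)]
    rhs = [sum(row[i] * v for row, v in zip(design, values)) for i in range(size)]
    return dict(zip(basis, solve(normal, rhs)))

# Points of a single phase: m = 0, ..., 9 on P^2
points = [(m,) for m in range(10)]
values = [(m + 1) * (m + 2) // 2 for (m,) in points]
coeffs = fit_polynomial(points, values, degree=2)
print(coeffs)
```

Working with `Fraction` rather than floating point numbers means that the coefficients $1$, $3/2$ and $1/2$ come out exactly, which is what allows the fitted formulas to be regarded as exact.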
In unsupervised learning one faces the task of grouping data points into different sets without specifying any conditions. This leads to a clustering of similar data. The only input is the data to classify and the maximal number of sets to be used. We applied the pre-implemented ClusterClassify function of Mathematica 11.3 with 200 classes, "Quality" as optimization goal and "KMeans" as the method to generate the classifiers, and the LinearModelFit function for the polynomial fits.
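For readers without access to Mathematica, the clustering step can be imitated with a plain k-means (Lloyd) iteration. The sketch below is a swapped-in, simplified substitute for ClusterClassify and not a reimplementation of it: the function name `kmeans`, the explicit initial centers and the toy one-dimensional feature values are all assumptions made for the example.

```python
def kmeans(points, centers, iterations=50):
    """Plain Lloyd iteration. points and centers are lists of equal-length tuples."""
    centers = [tuple(map(float, c)) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center in squared Euclidean distance
        labels = [min(range(len(centers)),
                      key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
                  for p in points]
        # Update step: move each center to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # empty clusters are allowed and left in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Toy feature values, e.g. a derivative that is near 0 in one phase and near 5 in another
features = [(0.0,), (0.1,), (5.0,), (5.2,)]
labels, centers = kmeans(features, centers=[(0.0,), (5.0,)])
print(labels, centers)
```

Leaving empty clusters untouched mirrors the role of the class number as an upper bound: asking for many more classes than there are phases is harmless as long as superfluous classes can be merged or ignored afterwards.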
In the interior of one phase, the $d$-th derivatives of the $h^i$ with respect to $m$ are constant and the $(d+1)$-th derivatives vanish. As the $h^i$ are only defined for integer $m$, the data forms a lattice. The derivatives are therefore calculated using the central difference scheme with a lattice spacing of one. This leads to a non-vanishing $(d+1)$-th derivative exactly at the phase boundaries. The first step is to remove the boundaries from the data set $S$. To do so a cluster classifier with a very large number of classes is trained on the data set formed from the $(d+1)$-th derivatives of the $h^i$, where $i = 0, \ldots, d$ runs over all cohomology groups. For a point inside a phase all derivative entries of this set vanish, whereas for a point at a phase boundary at least one of them is non-vanishing. This leads to a classification where all data points which lie in the interior of a phase are classified into one set, accompanied by various sets of boundary points. For large enough line bundle charges the interior will always be the largest set. The boundaries are simply thrown away. Tests show that the classification works better for a low-dimensional space. The number of partial derivatives increases with the degree $d$ and the number of line bundle charges. Therefore this step was divided into several classification steps. First one trains one classifier on a subset of the derivatives of degree $d + 1$ and removes the boundary. Then a second classifier is trained on the next subset and so on. As the training of one classifier takes only seconds, this is not a huge performance loss but drastically improves the result.
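As a worked illustration of the derivative criterion, consider a single charge $m$ and iterate the central difference with unit spacing three times. The example function is the well-known $h^0(\mathbb{P}^2, \mathcal{O}(m))$, with $d = 2$, chosen here only because its closed form is standard; it is not one of the data sets used in this paper.

```latex
\begin{align*}
(\partial f)(m)   &= \tfrac{1}{2}\bigl[f(m+1) - f(m-1)\bigr], \\
(\partial^2 f)(m) &= \tfrac{1}{4}\bigl[f(m+2) - 2f(m) + f(m-2)\bigr], \\
(\partial^3 f)(m) &= \tfrac{1}{8}\bigl[f(m+3) - 3f(m+1) + 3f(m-1) - f(m-3)\bigr].
\end{align*}
% Example: f(m) = (m+1)(m+2)/2 for m >= 0 and f(m) = 0 for m < 0.
% Since the quadratic also vanishes at m = -1, -2, it describes f for all m >= -2.
\begin{align*}
(\partial^3 f)(m) &= 0 \quad \text{for } m \le -4 \text{ and } m \ge 1, \\
\bigl((\partial^3 f)(-3),\, (\partial^3 f)(-2),\, (\partial^3 f)(-1),\, (\partial^3 f)(0)\bigr)
  &= \bigl(\tfrac{1}{8},\, \tfrac{3}{8},\, \tfrac{3}{8},\, \tfrac{1}{8}\bigr).
\end{align*}
```

The stencil reaches three lattice sites in each direction, so the set of points flagged as boundary is a band of finite width around the actual change of polynomial rather than a single site. This is why the values directly at the boundaries have to be checked separately.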
In the examples presented in this paper we used a splitting into two randomly chosen subsets of equal size. From the remaining points, which form the interior of the phases, the set $S_3$ is built using the $d$-th derivatives, and a second classifier is trained on this set. In contrast to the original data set $S$, the set $S_3$ is not connected in the $m$, which improves the classification and is the reason for the two-step procedure. This now classifies the phase structure of the problem. The number of allowed classes is again taken to be very large. While it can happen that one phase is grouped into two classes, this does not pose any problem, as in this case the polynomials obtained will agree and the phases can be merged later on. The final step is to perform the polynomial fit on each set and each $h^i$. Sets with identical polynomials for all $h^i$ are then merged. This concludes the algorithm. To summarise:
1. Calculate a set of data points using the extended cohomCalg.
2. Determine the $(d+1)$-th derivatives of the data points.
3. Classify the data using these derivatives.
4. Determine the $d$-th derivatives of the remaining data points.
5. Classify the data using these derivatives.
6. Perform a polynomial fit of degree $d$ on each set for each $h^i$.
7. Merge sets with identical polynomials.
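The steps above can be sketched end to end on a toy problem with a single line bundle charge. The sketch makes several simplifications that are not part of the procedure described in the text and are labelled in the comments: the data comes from the known formula for $h^0(\mathbb{P}^2, \mathcal{O}(m))$ instead of cohomCalg, the two clustering steps are replaced by grouping points with exactly equal derivatives, and the fit is an exact interpolation through three consecutive points that is then verified on the whole phase. All names are hypothetical.

```python
from fractions import Fraction

D = 2   # dimension of the toy variety P^2, i.e. expected polynomial degree
A = 12  # charges |m| <= A

def h0(m):
    """Stand-in for the cohomCalg data: h^0(P^2, O(m))."""
    return (m + 1) * (m + 2) // 2 if m >= 0 else 0

def central(g):
    """Central difference with lattice spacing one."""
    return lambda m: (Fraction(g(m + 1)) - Fraction(g(m - 1))) / 2

def diff(f, order):
    for _ in range(order):
        f = central(f)
    return f

# Steps 1-3: data, (d+1)-th derivatives, removal of the boundary points
# (simplification: exact comparison with zero instead of a cluster classifier)
grid = range(-A, A + 1)
top = diff(h0, D + 1)
interior = [m for m in grid if top(m) == 0]

# Steps 4-5: group the interior points by their d-th derivative
# (simplification: exact grouping instead of k-means clustering)
low = diff(h0, D)
phases = {}
for m in interior:
    phases.setdefault(low(m), []).append(m)

# Step 6: quadratic through the first three points of a phase (Newton form),
# assumed consecutive, then verified on every point of the phase
def fit_quadratic(ms):
    m0 = ms[0]
    v0, v1, v2 = (Fraction(h0(m0 + i)) for i in range(3))
    d1, d2 = v1 - v0, v2 - 2 * v1 + v0
    coeffs = (v0 - d1 * m0 + d2 * m0 * (m0 + 1) / 2,
              d1 - d2 * (2 * m0 + 1) / 2,
              d2 / 2)
    assert all(sum(c * m ** k for k, c in enumerate(coeffs)) == h0(m) for m in ms)
    return coeffs

# Step 7: phases with identical polynomials are merged by keying on the polynomial
merged = {}
for ms in phases.values():
    merged.setdefault(fit_quadratic(ms), []).extend(ms)

for coeffs, ms in merged.items():
    print([str(c) for c in coeffs], ms[0], ms[-1])
```

On this toy data two phases survive, the vanishing polynomial for $m \le -4$ and $1 + \tfrac{3}{2} m + \tfrac{1}{2} m^2$ for $m \ge 1$, while the points $-3, \ldots, 0$ are removed as boundary.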
We note that this algorithm requires no input besides the geometric data describing the variety and can therefore be completely automatised. The only thing which has to be done by hand is to extract the boundaries of the phases, as the classifier does not encode them in closed form. This is quite tedious, but for practical purposes one does not need the boundaries in closed form. One can use the classifier to identify in which phase a given $m$ lies and apply the polynomial of this phase. For convenience we added the phase boundaries in the tables.
As a non-trivial test of the procedure we calculated the Euler characteristic of the examples by taking the alternating sum of the polynomials and compared it to the Euler characteristic as obtained from the Hirzebruch-Riemann-Roch theorem. The two expressions agree in all examples and phases.
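The same kind of cross-check can be illustrated on $\mathbb{P}^2$, where all ingredients are standard: $h^0(\mathcal{O}(m)) = (m+1)(m+2)/2$ for $m \ge 0$, $h^1 = 0$, $h^2(\mathcal{O}(m)) = h^0(\mathcal{O}(-3-m))$ by Serre duality, and Hirzebruch-Riemann-Roch gives $\chi(\mathcal{O}(m)) = (m+1)(m+2)/2$ for every integer $m$. The sketch below is a stand-in for the check performed on the examples of this paper, not a reproduction of it, and its function names are hypothetical.

```python
from fractions import Fraction

def h0(m):
    return (m + 1) * (m + 2) // 2 if m >= 0 else 0

def h1(m):
    return 0

def h2(m):
    return h0(-3 - m)  # Serre duality on P^2 with canonical bundle O(-3)

def chi_hrr(m):
    """Hirzebruch-Riemann-Roch on P^2."""
    return Fraction((m + 1) * (m + 2), 2)

checks = [h0(m) - h1(m) + h2(m) == chi_hrr(m) for m in range(-20, 21)]
print(all(checks))
```

Because the Euler characteristic is a single polynomial across all phases, an error in the fit of any one $h^i$ in any one phase would show up as a mismatch in this test.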
In the following sections this algorithm is applied to some examples.

VI. LINE BUNDLES ON TORIC VARIETIES
We start with an example where the analytic expressions are well known, the del Pezzo surface dP 1 . This provides on one hand an easy method to cross-check the results and on the other hand is an easy example with only 3 phases.
Using cohomCalg, we generate a data set of the cohomology ranks with the line bundle charges in the range $[-25, 25]$, i.e. $a = 25$. These are shown in figure 1. The application of the unsupervised learning on the third derivatives cuts out two phase boundaries where the underlying function describing the ranks is non-differentiable. The second cluster analysis then classifies the remaining points using the second derivatives into 6 phases, three pairs of which have identical polynomials for $h^1$. The result is shown in figure 2.
Fitting a polynomial of degree 2 to the ranks in each of these phases results in the polynomials listed in table I. These agree with the known analytic expressions, see e.g. [16].
[Table I: polynomials for the cohomology ranks of $\mathcal{O}(m, n)$ on $dP_1$.]

VII. LINE BUNDLES ON HYPERSURFACES
We now turn to the more complicated problem of finding analytic expressions for line bundle cohomologies of hypersurfaces in toric varieties. As an example for a hypersurface we take the K3 space $\mathbb{P}^3_{1112}[5]$. This hypersurface has two line bundle charges, so that $m = (m, n)$. The expected degree of the polynomials is $d = 2$. Figure 3 shows the ranks of the zeroth cohomology for different values of $m$ and $n$.
At first glance this seems to consist of 3 phases. But applying the algorithm described in the last section reveals that there are actually 6 phases. Figure 4 shows the result of the second classification. The fitted polynomials can be found in table II. One nicely sees the cut boundaries and phases. The separation between the orange and brown phases seems redundant from the point of view of $h^0$, but is necessary because of the higher cohomology groups. Especially interesting is the subdivision of the yellow/purple and red/green phases into even and odd $n$, which are described by different polynomials. The phase structure is thus not only defined by some linear functions of $m$ and $n$. If one tried a polynomial fit in the whole of these phases instead of separating into even/odd, one would not obtain simple rational coefficients. E.g. in the yellow/purple phase the polynomials are $\frac{5m^2}{4} + 2$ for $n$ even and $\frac{5m^2}{4} + \frac{7}{4}$ for $n$ odd. If one mixes these phases, the interpolating polynomial obtained is $1.80407 + 0.0131771\, n + 1.24945\, n^2$, which obviously does not reproduce any of the cohomologies correctly and cannot be extrapolated. Another interesting example is the octic $\mathbb{P}^4_{11222}[8]$. Here we expect the polynomials to be of degree $d = 3$. Figures 5 and 6 show again the input data for $h^0$ and the result after classification. The resulting polynomials for $h^0$ are listed in table III. We note that the only disadvantage of this procedure is that the boundaries are cut out and it is not possible to determine the values at the boundaries themselves, which is reflected in the table containing only $>$ statements instead of $\ge$. But as these are only a limited number of points, one can simply compare these with the results from cohomCalg. The tables for the other cohomology groups can be found in appendix A.
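The remark on mixing the even and odd sub-phases of the K3 example can be made quantitative with finite differences. The sketch below rests on an assumption that should be stated clearly: both quoted expressions are read as functions of one and the same integer variable $k$ whose parity selects the branch. The function names are hypothetical. A quadratic has vanishing third finite difference, so a non-zero third difference rules out any single quadratic through the data.

```python
from fractions import Fraction

def rank(k):
    """Quoted polynomials, read as functions of one integer k (assumption):
    5k^2/4 + 2 for even k and 5k^2/4 + 7/4 for odd k."""
    offset = Fraction(2) if k % 2 == 0 else Fraction(7, 4)
    return Fraction(5, 4) * k * k + offset

def third_difference(values):
    """Third forward difference of a sequence sampled at equal spacing."""
    return [values[i + 3] - 3 * values[i + 2] + 3 * values[i + 1] - values[i]
            for i in range(len(values) - 3)]

mixed = [rank(k) for k in range(1, 13)]          # even and odd k together
even_only = [rank(k) for k in range(2, 26, 2)]   # one sub-phase
odd_only = [rank(k) for k in range(1, 25, 2)]    # the other sub-phase

print(third_difference(mixed))
print(third_difference(even_only), third_difference(odd_only))
```

On each parity class separately the third differences vanish identically, while on the mixed sequence they alternate between $+1$ and $-1$. Under the stated assumption a least-squares fit across both classes can therefore only return a compromise polynomial that misses the data, which is the behaviour described in the text.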

VIII. DISCUSSION
We have presented a method for generating analytic expressions for all line bundle cohomology ranks of toric varieties or hypersurfaces therein. The algorithm takes as an input the toric data in form of GLSM charges and the Stanley-Reisner ideal. For the case of hypersurfaces we also need to specify a point in the complex structure moduli space in the form of a polynomial that defines a section of $\mathcal{O}_X(H)$ and hence a specific hypersurface. For demonstrative purposes we calculated at the large complex structure point but the method carries over to other generic and special points in the moduli space. The output is a classifier that separates the space of line bundles into different phases, such that within a phase each cohomology is described by a single polynomial in the line bundle charges. Since the polynomials have coefficients in $\mathbb{Q}$ the result can be considered exact and we obtain a formula for all of the line bundles. As a cross-check we see that the alternating sum of polynomials in each phase reproduces the Euler characteristic as calculated from the Hirzebruch-Riemann-Roch theorem.
[Table III, partially recovered: polynomials for $h^0$.]
$$\begin{array}{l|l} \text{Phase} & \text{Polynomial} \\ \hline m < 0,\ n \in \mathbb{Z} & 0 \\ m > 0,\ n < 0 & \frac{m^3}{3} - 2m^2 + \frac{11m}{3} - 1 \\ m > 0,\ n > \frac{m}{2} & -\frac{8m^3}{3} + 2m^2 n + \frac{2m}{3} + 2n \\ m > 0,\ 0 < n < \frac{m}{2},\ m \text{ even} & \ldots \end{array}$$
It was crucial to realise that we understand the local structure of the data and the problem of patching this to obtain the global structure could be broken down to a simple classification problem.
We expect that our methods carry over to similar problems of this type. For example the case of line bundles on complete intersections in toric varieties should be completely analogous. We leave the interesting case of vector bundles of higher rank in the form of monad bundles for future work.