Machine-learning the string landscape

Abstract
We propose a paradigm to apply machine learning to various databases which have emerged in the study of the string landscape. In particular, we establish neural networks as both classifiers and predictors and train them with a host of available data ranging from Calabi–Yau manifolds and vector bundles, to quiver representations for gauge theories, using a novel framework of recasting geometrical and physical data as pixelated images. We find that even a relatively simple neural network can learn many significant quantities to astounding accuracy in a matter of minutes and can also predict hitherto unencountered results, thereby rendering the paradigm a valuable tool in physics as well as pure mathematics.


Introduction
Whereas theoretical physics now inevitably resides in an Age where new physics, new mathematics and new data coexist in a symbiosis transcending disciplines, string theory has spearheaded this vision. That it engenders the cross-fertilization between physics and pure mathematics is without dispute; that it has also been a testing ground for computational mathematics and "big data" is perhaps less known. With the advent of increasingly powerful computers, from this fruitful dialogue has also arisen a plethora of data, ripe for mathematical experimentation. This emergence of data began with the incipience of string phenomenology [1] where compactification of the heterotic string on Calabi-Yau threefolds (CY3) was widely believed to hold the ultimate geometric unification. A race, spanning the 1990s, to explicitly construct examples of Calabi-Yau (CY) manifolds ensued, beginning with the so-called complete intersection CY manifolds (CICYs) [2], proceeding to the hypersurfaces in weighted projective space [3], to elliptic fibrations [4] and ultimately culminating in the impressive (at least some $10^{10}$) list of CY3s from reflexive polytopes [5].
With the realization that the landscape of stringy vacua might in fact exceed the number of inequivalent CY3s [6] by hundreds of orders of magnitude, there was a veering of direction toward a more multi-verse or anthropic philosophy. Nevertheless, hints have emerged that the vastness of the landscape might well be mostly
* Correspondence to: Merton College, University of Oxford, UK.
All of the above cases are accompanied by typically accessible data of considerable size, representing a concrete glimpse onto the string landscape, to which we shall refer as landscape data. For instance, the heterotic line bundles on CICYs are on the order of $10^{10}$, the spectral-cover bundles on the elliptically fibred CY3, $10^{6}$, the brane-configurations in the CY volume studies, $10^{5}$, type II intersecting brane models, 10  brain-child of the marriage between physicists and mathematicians, especially incarnated by applications of computational algebraic geometry, numerical algebraic geometry and combinatorial geometry to problems which arise from the classification in the physics and are recast into a finite, algorithmic problem in the mathematics (cf. [12]). Obviously, computing power is a crucial limitation. Unfortunately, in computational algebraic geometry (on which most of the data heavily rely, ranging from bundle stability in heterotic compactification to Hilbert series in brane gauge theories) a decisive step is finding a Groebner basis, which is notoriously unparallelizable and double-exponential in running time. Thus, much of the challenge in establishing the landscape data has been to circumvent the direct calculation of Groebner bases by harnessing the geometric configuration, e.g., using the combinatorics when dealing with toric varieties. Even so, many of the combinatorial calculations, be they triangulation of polytopes or finding dual cones, are still exponentially expensive.
The good news for our present purpose is that much of the data have already been collected. Oftentimes, as we shall find out in our forthcoming case-studies, tremendous effort is needed for deceptively simple questions. Hence, to draw inferences from actual theoretical data by deep-learning therefrom would not only help identify undiscovered patterns but also aid in predicting results which would otherwise cost formidable computations. Subsequently, we propose our Paradigm: To set up neural networks (NN) to deep-learn the landscape data, to recognize unforeseeable patterns (as classifiers) and to extrapolate to new results (as predictors).
Of course, this paradigm is useful not only to physicists but also to mathematicians; for instance, could our NN be trained well enough to approximate bundle cohomology calculations? This, and a host of other examples, we will now examine.
Methodology
Neural networks are known for their complexity, involving usually a complicated directed graph each node of which is a "perceptron" (an activation function imitating a neuron) and amongst the multitude of which there are many arrows encoding input/output. Throughout this letter, we will use a rather simple multi-layer perceptron (MLP) consisting of 5 layers, three of which are hidden, with activation functions typically of the form of a logistic sigmoid or a hyperbolic tangent. The input layer is a linear layer of 100 to 1000 nodes, recognizing a tensor (as we will soon see, algebro-geometric objects such as Calabi-Yau manifolds or polytopes are generically configurations of integer tensors) and the output layer is a summation layer giving a number corresponding to a Hodge number, or to the rank of a cohomology group, etc. Such an MLP can be implemented, for instance, on the latest versions of Wolfram Mathematica. With 500-1000 training rounds, the running time is merely about 5-20 minutes on an ordinary laptop. It is reassuring and pleasantly surprising that even such a relatively simple NN can achieve the level of accuracy shortly to be presented.
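To make the architecture concrete, the following is a minimal sketch of the forward pass of such a 5-layer MLP in plain Python. The text fixes only the broad shape (a linear input layer, three hidden layers with sigmoid or tanh activations, and a summation output); the particular ordering linear, sigmoid, linear, tanh, summation, the layer widths, the random initialisation and all function names here are assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Dense layer as (weights, biases) with small random initial weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_linear(layer, x):
    """Affine map x -> W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Logistic sigmoid, written via tanh so that large |z| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def build_network(n_input, width_in=100, width_mid=50, seed=0):
    """Assumed layout: the widths and the seed are illustrative choices only."""
    rng = random.Random(seed)
    return {
        "linear_in": make_linear(n_input, width_in, rng),
        "linear_mid": make_linear(width_in, width_mid, rng),
    }


def forward(net, x):
    """Untrained forward pass: linear -> sigmoid -> linear -> tanh -> summation."""
    h = apply_linear(net["linear_in"], x)    # layer 1: linear input layer
    h = [sigmoid(z) for z in h]              # layer 2: logistic sigmoid
    h = apply_linear(net["linear_mid"], h)   # layer 3: linear
    h = [math.tanh(z) for z in h]            # layer 4: hyperbolic tangent
    return sum(h)                            # layer 5: summation output


if __name__ == "__main__":
    net = build_network(n_input=5)
    print(forward(net, [1, 1, 1, 1, 1]))     # a single real number
```

The single real output is what would be compared, after training and rounding, with an integer such as a Hodge number or with a binary label.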
This letter is a companion summary of the longer paper [42] where the interested reader can find more details of the computations and the data.

Results
With simple NNs, we proceed to analyse our landscape data, a fertile ground constituting more than 2 decades of many international collaborations between physicists and mathematicians. Using 4 concrete case studies, we first "learn" from the inherent structure and then "predict" unseen properties; considering how difficult some of the calculations involved had been in establishing the databases, the usefulness of our paradigm is evident.

Case study 1: CY hypersurfaces in $WP^4$
One of the first datasets [3] to experimentally illustrate mirror symmetry was that of hypersurfaces in weighted projective space $WP^4$. The ambient space $WP^4_{[w_0:w_1:w_2:w_3:w_4]}$ with weights $w_{i=0,\ldots,4} \in \mathbb{Z}_+$ is in general singular, but a generic enough homogeneous polynomial of degree $\sum_{i=0}^{4} w_i$ which misses the singularities defines a hypersurface therein which is a smooth CY3 $X$. There are 7555 inequivalent such configurations, each specified by a 5-vector $w_{i=0,\ldots,4}$. The Euler characteristic $\chi$ of $X$ is easily given in terms of the vector. However, as is usually the case, the individual Hodge numbers $(h^{1,1}, h^{2,1})$ are less amenable to a simple combinatorial formula. The original computation resorted to Landau-Ginzburg techniques to obtain the list of Hodge numbers [3]. One could in principle use adjunction and Euler sequences, and singularity resolution, but this is not an easy task to automate.
Suppose we have a simple question: how many such CY3s have a relatively large number of complex deformations? We can, for instance, consider $h^{2,1} > 50$ to be "large" and let training data be of the form $w_i \to 1$ or $0$ depending on whether $h^{2,1}(X) > 50$. Training the NN, with say 500 rounds, takes under a minute on an ordinary laptop. The result is an optimised continuous real output between 0 and 1, the rounding of which can then be compared with the actual data. An accuracy of 96.2% is achieved almost effortlessly! To appreciate the predictive power of the network, suppose that we only had partial data. This is particularly relevant when, for instance, due to computational limitations, a classification is not yet complete, or when a quantity in question has not been or could not yet be computed.
Therefore, let us pretend that we have only data available for the first 3000 out of the 7555 $(X, h^{2,1})$ pairs. We repeat the procedure on the 3000, and then test against the full 7555. We find that 6078 cases were actually correct. Thus, with rather incomplete training data, the NN has learnt, in under a minute, our question and predicted new results to 80% accuracy. Emboldened, let us move onto another question, of importance to string phenomenology: Given a configuration, can one tell whether $\chi$ is a multiple of 3? In the early days of heterotic string compactification, this question was decisive on whether the model admitted 3 generations of particles in the low-energy effective gauge theory. Again, we can define a binary function taking the value of 1 if $\chi \equiv 0 \pmod 3$ and 0 otherwise. Training with the NN, we achieve 82% accuracy with 1000 training rounds, taking about 2 minutes; these figures are certainly expected to improve with increasing number of training rounds and with more layers or more nodes in the NN.
The astute reader might question at this stage why we have adhered to binary queries. Why not train the NN to answer a direct query, i.e., to try for instance to learn and predict the value of $h^{2,1}$ itself? This is a matter of spread in the present dataset: we have only some $10^4$ inputs yet we can see that the values of $h^{1,1}$ range from 1 to almost 500. We do not have enough data here to make more accurate statements. This is precisely in line with our philosophy: the power of deep-learning the landscape lies in rapid estimates, in identifying patterns and drawing inferences and in avoiding intense computations.
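The binary queries and the accuracy bookkeeping of this case study can be sketched as follows. The function names are hypothetical, and the rule that an output of exactly 0.5 rounds up to 1 is an assumption, since the text says only that the continuous output is rounded. The last lines simply redo the arithmetic behind the quoted 80% figure.

```python
def label_large_h21(h21, threshold=50):
    """Binary target: 1 if h^{2,1} exceeds the threshold, else 0."""
    return 1 if h21 > threshold else 0


def label_chi_multiple_of_3(chi):
    """Binary target: 1 if the Euler characteristic is divisible by 3, else 0."""
    return 1 if chi % 3 == 0 else 0


def accuracy(outputs, labels):
    """Round each output in [0, 1] to 0 or 1 and return the fraction of matches.

    Assumption: an output of exactly 0.5 is rounded up to 1.
    """
    hits = sum(1 for y, t in zip(outputs, labels) if int(y >= 0.5) == t)
    return hits / len(labels)


# Arithmetic behind the partial-data experiment quoted in the text.
training_fraction = 3000 / 7555   # share of the pairs used for training
reported_accuracy = 6078 / 7555   # correct cases over the full dataset

if __name__ == "__main__":
    print(f"trained on {training_fraction:.1%}, accuracy {reported_accuracy:.1%}")
```

Note that the test set in this experiment is the full list of 7555, so it contains the 3000 configurations used for training as well as the 4555 unseen ones.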

Case study 2: CICYs
Having warmed up, let us move onto complete intersection Calabi-Yau threefolds (CICYs) in products of projective spaces. This is both the first Calabi-Yau database (or, for that matter, the first database in algebraic geometry) [2] and the most heavily studied recently for string phenomenology [23][24][25][26][28]. It has the obvious advantage that the ambient space is smooth by choice.
Briefly, CICYs embed as $K$ homogeneous polynomials in $P^{n_1} \times \ldots \times P^{n_m}$, of multi-degree $q^r_j$, with complete intersection meaning that $K = \sum_{r=1}^{m} n_r - 3$ and the CY condition implying $\sum_{j=1}^{K} q^r_j = n_r + 1$ for each $r = 1, \ldots, m$. A configuration is thus an $m \times K$ integer matrix, for instance $[5]$, denoting the quintic hypersurface in $P^4$. It was shown that such configurations are finite in number and the best available computer at the time (1990s), viz., the super-computer at CERN [2], was employed. A total of 7890 inequivalent manifolds were found, corresponding to matrices with entries $q^r_j \in [0, 5]$, of size ranging from $1 \times 1$ to a maximum of 12 rows and 15 columns.
This representation is much in the standard way to represent an image: to pixelate it into blocks of $m \times n$, each carrying colour information, for example, a 3-vector capturing the RGB data. Therefore, we can represent all the 7890 CICYs as $12 \times 15$ matrices over $\mathbb{Z}/6\mathbb{Z}$, embedded starting from the upper-left corner, say, and padding with zeros everywhere else, as illustrated in Fig. 1. To view a CICY as a pixelated image, and indeed, to use image processing to address problems in geometry and mathematical physics, is an entirely new idea worthy of extensive exploration. Can we deep-learn, say, the full list of Hodge numbers? As usual, the Euler number is relatively easy to obtain and there is a combinatorial formula in terms of the integers $q^r_j$, whilst the individual Hodge numbers $(h^{1,1}, h^{2,1})$ involve some non-trivial adjunction and sequence-chasing, which luckily had been performed for us [2]. Again, we set up a list of training rules (padded configuration matrix $\to h^{1,1}$) and find that the NN can be trained to an accuracy of 99.91% in under 10 minutes! What about the NN as a predictor, which is obviously a more salient question? Suppose the NN were trained with the first 5000 of the data; then, checking against the full dataset, which includes configurations/images the NN has never before seen, we achieve 77% accuracy. Considering (1) that we have only trained the NN for a mere 6 minutes, (2) that it has seen only a little over half of the data, (3) that it is rather elementary with only 5 forward layers, and (4) that the variation of the output is integral, ranging from 0 to 19, with no room for continuous tuning, such accuracy with so little effort is quite amazing.
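As an illustration of the "pixelated image" encoding, here is a minimal sketch that embeds a configuration matrix in the upper-left corner of a 12 × 15 array of zeros and flattens it into a network input. The function names are hypothetical. The second helper checks the standard complete-intersection and Calabi-Yau conditions (the number of columns equals the sum of the $n_r$ minus 3, and row $r$ sums to $n_r + 1$), with the list of ambient dimensions supplied by the caller.

```python
MAX_ROWS, MAX_COLS = 12, 15   # largest configuration size quoted in the text


def pad_configuration(matrix, rows=MAX_ROWS, cols=MAX_COLS):
    """Embed the matrix in the upper-left corner of a rows x cols zero array."""
    if len(matrix) > rows or any(len(row) > cols for row in matrix):
        raise ValueError("configuration matrix exceeds the padded size")
    padded = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(matrix):
        for j, q in enumerate(row):
            if not 0 <= q <= 5:
                raise ValueError("entries are expected to lie in 0..5")
            padded[i][j] = q
    return padded


def flatten(padded):
    """Row-major list of pixel values, usable as a network input vector."""
    return [q for row in padded for q in row]


def is_cicy_threefold(matrix, dims):
    """Check K = sum(n_r) - 3 and, for every row r, sum_j q^r_j = n_r + 1."""
    if len(matrix) != len(dims):
        return False
    n_columns = len(matrix[0])
    if any(len(row) != n_columns for row in matrix):
        return False
    if n_columns != sum(dims) - 3:
        return False
    return all(sum(row) == n + 1 for row, n in zip(matrix, dims))


if __name__ == "__main__":
    quintic = [[5]]            # the quintic hypersurface in P^4
    image = pad_configuration(quintic)
    print(len(flatten(image)), is_cicy_threefold(quintic, [4]))
```

A fixed input size is what allows one network to take all 7890 configurations, at the cost of most pixels being zero for the small matrices.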

Case study 3: bundle cohomology
The subject of vector bundle cohomology has, since the so-named "generalized embedding" [1] of heterotic compactification on smooth CY3 $X$ endowed with a (poly-)stable holomorphic vector bundle $V$, become one of the most active dialogues between algebraic geometry and theoretical physics. The realization [9] that the theoretical possibility of [1] can be concretely achieved by a judicious choice of $(X, V)$ to give the exact MSSM spectrum induced much activity in establishing relatively large datasets to see how often this might occur statistically [11,18,22,24,31], culminating in [25,26] which found some 200 out of a scan of $10^{10}$ bundles which have exact MSSM content.
Upon this vast landscape let us take an insightful glimpse by taking the dataset of [31], which consists of $SU(n)$ vector bundles $V$ on elliptically fibred CY3. By virtue of a spectral-cover construction [4,30], these bundles are guaranteed to be stable and hence preserve $N = 1$ supersymmetry in the low-energy effective action, together with GUT gauge groups $E_6$, $SO(10)$ and $SU(5)$ respectively for $n = 3, 4, 5$. We take the base of the elliptic fibration (of which there is a finite list [29]) as the $r$-th Hirzebruch surface ($r = 0, 1, \ldots, 12$ denoting the inequivalent ways in which $P^1$ can itself fibre over $P^1$ to give a complex surface), in which case the stable $SU(n)$ bundle is described by 5 numbers $(r, n, a, b, \lambda)$, with $(a, b) \in \mathbb{Z}_+$ and $\lambda \in \mathbb{Z}/2$ being coefficients which specify the bundle via the spectral cover. This ordered 5-vector will constitute our neural input. The database of viable models was set up in [31], viable meaning that the bundle-cohomology groups of $V$ satisfy two conditions: the first is a necessary condition for stability and the second, that the GUT theory has the potential to allow for 3 net generations of particles upon breaking to MSSM by Wilson lines. Over all the Hirzebruch-based CY3, 14,264 models were found; a sizeable play-ground. Let the output be a 2-vector, indicating (I) what the gauge group is, as denoted by $n$, and (II) whether there are more generations than anti-generations, as denoted by the sign of the difference $h^1(X, V) - h^2(X, V)$; this is clearly a phenomenologically interesting question. With 1000 training rounds on a NN with an output linear layer, and with the dataset consisting of entries in the form $(r, n, a, b, \lambda) \to (n, \mathrm{Sign}(h^1(X, V) - h^2(X, V)))$, in about 10 minutes, we achieve 100% accuracy (i.e., the neural network has completely learnt the data). Training with partial data, say 8000 points, a little over half, achieves 68.9% predictive accuracy over the entire set.
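The input/output encoding of this case study can be sketched as below. The helper names are hypothetical, and the numerical values in the usage example are invented purely to show the shapes and are not entries of the database of [31]. The half-integer $\lambda$ is held exactly as a fraction for validation and converted to a float only when it is placed in the input vector.

```python
from fractions import Fraction

# GUT gauge group for each rank n, as stated in the text.
GUT_GROUP = {3: "E6", 4: "SO(10)", 5: "SU(5)"}


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def training_pair(r, n, a, b, lam, h1, h2):
    """Encode one model as (r, n, a, b, lambda) -> (n, Sign(h^1 - h^2))."""
    if not 0 <= r <= 12:
        raise ValueError("r labels a Hirzebruch surface, r = 0, ..., 12")
    if n not in GUT_GROUP:
        raise ValueError("n must be 3, 4 or 5")
    lam = Fraction(lam)
    if (2 * lam).denominator != 1:
        raise ValueError("lambda must be an integer or a half-integer")
    return (r, n, a, b, float(lam)), (n, sign(h1 - h2))


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    x, y = training_pair(1, 5, 7, 3, Fraction(3, 2), h1=12, h2=3)
    print(x, "->", y, "with gauge group", GUT_GROUP[y[0]])
```

Because $n$ appears in both the input and the output, the first output component is a consistency check on what the network has learnt, while the sign carries the phenomenological information.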

Case study 4: quiver gauge theories
As a final example let us tackle affine varieties in the context of quiver representations. Physically, these correspond to worldvolume gauge theories coming from D-brane probes of geometric singularities in string theory, as well as the space of vacua for classes of supersymmetric gauge theories in various dimensions; they have been data-mined since the early days of AdS/CFT (cf. [33,34]). When the geometry concerned is an affine toric CY variety, the realization of brane-tiling [35] has become the correct way to understand the gauge theory and since then databases have begun to be compiled [36,37].
The input data consists of a quiver (directed graph) and a relation imposed by a polynomial super-potential (q.v. [11] for a rapid review). We can succinctly encode the above information into two matrices, which again can be turned into a pixelated image: (1) the D-term matrix $Q_D$, which comes from the kernel of the incidence matrix $d$ of the quiver, each column of which corresponds to an arrow with $-1$ as head and $+1$ as tail and 0 otherwise; (2) the F-term matrix $Q_F$, each column of which documents where and with what exponent the field corresponding to the arrow appears in $\partial W$. Concatenating $Q_D$ and $Q_F$ gives the so-called total charge matrix $Q_t$ of the moduli space as a toric variety (q.v. §2 of [34] for the precise procedure). The combinatorics and geometry of the above is a long story spanning a lustrum of research to uncover followed by a decade of still-ongoing investigations.
In the first database of [36], a host of examples were tabulated. A total of 375 quiver theories much like the above were catalogued (a catalogue which has recently been vastly expanded in [37]). Though not very large, this gives us a playground to test some of our ideas. The input data is the total charge matrix $Q_t$, whose maximal numbers of rows and columns are, respectively, 33 and 36, with all entries taking values in $\{-3, -2, \ldots, 3, 4\}$. Now, suppose we wish to know the number of points of the toric diagram associated to the moduli space, which is clearly an important quantity. In principle, this can be computed (albeit with intensive computation): the integer kernel of $Q_t$ should give a matrix whose columns are the coordinates of the toric diagram, with multiplicity (associated to the perfect matchings of the bipartite tiling). Training with our NN with the full list achieves, in under 5 minutes, 99.5% accuracy.
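The incidence matrix $d$ described above is easy to build from a list of arrows. The sketch below follows the stated convention ($+1$ at the tail, $-1$ at the head, 0 elsewhere) and uses the familiar two-node, four-arrow conifold quiver as an example; the function name and the zero-based node labelling are assumptions. The further steps that lead from $d$ to $Q_D$, $Q_F$ and $Q_t$ are not attempted here.

```python
def incidence_matrix(n_nodes, arrows):
    """Node-by-arrow incidence matrix of a quiver.

    `arrows` is a list of (tail, head) pairs of zero-based node indices.
    Each column has +1 at the tail and -1 at the head; a loop (tail == head)
    therefore yields a zero column.
    """
    d = [[0] * len(arrows) for _ in range(n_nodes)]
    for col, (tail, head) in enumerate(arrows):
        d[tail][col] += 1
        d[head][col] -= 1
    return d


if __name__ == "__main__":
    # Conifold quiver: two nodes, two arrows 0 -> 1 and two arrows 1 -> 0.
    conifold_arrows = [(0, 1), (0, 1), (1, 0), (1, 0)]
    for row in incidence_matrix(2, conifold_arrows):
        print(row)
```

Every column sums to zero, so the rows of $d$ are linearly dependent, reflecting the fact that an overall $U(1)$ acts trivially on the bifundamental fields.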

A sanity check
Lest the readers' optimism be elevated to unreasonable heights by the string of successes with the NNs, it is imperative that we be aware of deep-learning's limitations. We therefore finish with a sanity check that an NN is not some omnipotent oracle capable of predicting any pattern. An example which must be doomed to failure is the primes (or, for that matter, the zeros of the Riemann zeta function). Indeed, if an NN could learn some unexpected pattern in the primes, this would be a rather frightening prospect for mathematics. We test the sequence of primes (i.e., data of the form i -> Prime[i]) with our NN, and achieve no better than a 0.1% accuracy. Our NN is utterly useless against this formidable challenge; we are better off trying a simple regression against some $n \log(n)$ curve, as dictated by the prime number theorem. This is a sobering exercise as well as a further justification of the various case studies above: it is indeed meaningful to deep-learn the landscape data, and our visual representation of geometrical configurations is an efficient methodology.
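The "simple regression against some $n \log(n)$ curve" can be illustrated with a one-parameter least-squares fit $p_n \approx c \, n \log n$. This is a sketch of one possible such regression and not necessarily the comparison the author had in mind; the function names are hypothetical, and the sieve bound uses the known estimate $p_n < n(\log n + \log\log n)$ for $n \geq 6$.

```python
import math


def first_primes(count):
    """Return the first `count` primes using a sieve of Eratosthenes."""
    if count < 1:
        return []
    if count < 6:
        limit = 15
    else:
        # Upper bound p_n < n (ln n + ln ln n), valid for n >= 6.
        limit = int(count * (math.log(count) + math.log(math.log(count)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Strike out the multiples of p, starting from p * p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag][:count]


def fit_n_log_n(primes):
    """Least-squares coefficient c in p_n ~ c * n * ln(n); needs >= 2 primes."""
    xs = [n * math.log(n) for n in range(1, len(primes) + 1)]
    return sum(p * x for p, x in zip(primes, xs)) / sum(x * x for x in xs)


if __name__ == "__main__":
    primes = first_primes(1000)
    c = fit_n_log_n(primes)
    print(f"p_1000 = {primes[-1]}, fitted c = {c:.3f}")
```

Such a fit captures only the smooth growth of the sequence; it cannot reproduce the individual primes exactly, which is the point of the sanity check.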

Discussion
There are many questions in theoretical physics, or even in pure mathematics, for which one would only desire a qualitative, approximate, or partial answer, and whose full solution would often either be beyond the current scope, conceptual or computational, or would have taken considerable effort to attain. Typical such questions could be "what is the likelihood of finding a universe with three generations of particles within the landscape of string vacua or inflationary scenarios", or "what percentage of known Calabi-Yau manifolds has Hodge numbers within a prescribed range"? Attempts to address these profound questions have, with the ever-increasing power of computers, engendered our community's version of "big data", which though perhaps humble compared to some other fields, does comprise, especially considering the abstract nature of the problems at hand, significant information often resulting from intense dialogue between teams of physicists and mathematicians over many years.
On the still-ripening fruits of this labour the philosophy of the last decade or so, particularly for the string phenomenology and computational geometry community, has been to (I) create larger and larger datasets and (II) scan through them to test the likelihood of certain salient features. Now that the data is augmenting in size and availability, it is only natural to follow the standard procedures of the data-mining community. In this letter, we have proposed the paradigm of applying deep-learning, via neural networks, to such data. The purpose is twofold: the neural network can act as (1) Classifiers, by associating input configurations with a requisite quantity and pattern-matching over a given dataset; and (2) Predictors, by extrapolating to hitherto unencountered configurations, having deep-learnt a given (partial) dataset.
This is, of course, the archetypal means by which Google deep-learns the internet and hand-writing recognition software adapts to the reader's esoteric script. It is intriguing that by going through a wealth of concrete examples from what we have dubbed landscape data, in the creation of some of which the author had been a part, this philosophy remains enlightening. Specifically, we have taken test cases from a range of problems in mathematical physics, algebraic geometry and representation theory, such as CY datasets, classification of stable vector bundles, and catalogues of quiver varieties and brane tilings. We subsequently saw that even a relatively simple NN can deep-learn to extraordinary accuracy.
In some sense, this is not surprising: there is underlying structure to any classification problem in our context, which may not be manifest. Indeed, what is novel is to look at the likes of a CICY or a quiver theory as a pixelated image, no different from a handwritten digit, for whose analysis machine-learning has become the de facto method and a blossoming industry. The landscape data, be they work of human hands, elements of Nature or conceptions of Mathematics, have inherent structure, sometimes more efficiently uncovered by AI via deep-learning. Thereby, one can rapidly obtain results, before embarking on finding a reductionist framework for a fundamental theory explaining the results or proceeding to intensive computations from first principles. This paradigm is especially useful when classification problems become intractable, which is often the case; here a pragmatic approach would be to deep-learn partial classification results and predict future outcomes.
Under this rubric, the possibilities are endless. Several immediate and pertinent directions spring to mind. First, the largest dataset in algebraic geometry/string theory is the Kreuzer-Skarke list [5,20,21] of reflexive polytopes in dimension 4, from each of which many CY manifolds (compact and non-compact) can be constructed. To discover hidden patterns is an ongoing enterprise [14,17] and the help of deep-learning would be a most welcome one. Next, the issue of bundle stability and cohomology is a central problem in heterotic phenomenology as well as algebraic geometry. In many ways, this is a perfect problem for machine-learning: the input is usually encodable into an integer matrix or a list of matrices, representing the coefficients in an expansion into effective divisor classes, and the output is simply a vector of integers (in the case of cohomology) or a binary answer (with respect to a given Kähler class, the bundle is either stable or not). The brute-force way involves the usual spectral sequences and determining all coboundary maps or finding the lattices of subsheaves, expensive by any standards. In the case of stability checking, this is an enormous effort to arrive at a yes/no answer. With an increasing number of explicitly known examples of stable bundles constructed from first principles, to deep-learn this and then estimate the probability of a given bundle being stable would be a tremendous time-saver.
To give an idea of the high non-triviality of our venture, suppose we wanted to know how many CY3 can be constructed from the famous Kreuzer-Skarke (KS) list of 473 million reflexive polytopes. Only recently [21] was a systematic triangulation carried out on a cluster, up to $h^{1,1} = 7$ (above which even the state-of-the-art computer is powerless), and 100,000 manifolds were found from 25,000 polytopes. The KS list has $h^{1,1}$ going up to 496, thus we have not even touched the tip of the iceberg in answering the simplest question of enumerating CY3s. Here, the NN would be extremely useful in predicting an estimate, having learnt the data from [21], which already took $\sim 5000$ core-hours with traditional methods on the cluster; this is currently under investigation.
We hope the reader has been persuaded by not only the scope but also the feasibility of our proposed paradigm, a paradigm of increasing importance in an Age where even the most abstruse of mathematics or the most theoretical of physics cannot avoid compilations of and investigations on perpetually growing datasets. The case studies of deep-learning such landscape of data here presented are but a few nuggets in an unfathomably vast gold-mine, rich with new science yet to be discovered.