Machine-Learning the Classification of Spacetimes

We take a novel perspective on the long-established classification problems in general relativity by adopting fruitful techniques from machine learning and modern data science. In particular, we model Petrov's classification of spacetimes, and show that a feed-forward neural network can achieve a high degree of success. We also show how data visualization techniques with dimensionality reduction can help analyze the underlying patterns in the structure of the different types of spacetimes.


I. INTRODUCTION & SUMMARY
What are the possible structures of spacetime? This is surely one of the most important questions in theoretical physics. Classification problems in general relativity have been an active field since the very beginning and have more recently been a focus of computer algebra systems [1,2]. Fully classifying and comparing Riemannian manifolds can be achieved through the Cartan-Karlhede algorithm [3]. The first step in this algorithm is to determine the Petrov [4] and Segre [5] types of the spacetime [1,6]. These methods analyze algebraic symmetries of the Weyl and Ricci tensor, respectively, and involve detailed study of roots and multiplicities of certain quartic equations. In particular, Petrov's classification of the Weyl tensor has been an integral part of the study of exact solutions to the Einstein equations. Here, we will illustrate a new computational approach that can be used in the Petrov classification problem, which can then be extended for a full classification of gravitational solutions.
Given the recent introduction of machine learning and related techniques of modern data science to study the string theory landscape [7][8][9][10][11], and more generally the vast landscape of pure mathematics [12][13][14][15][16], it is natural to address our present problem of spacetime classification under the auspices of this programme. The reader is also referred to the pedagogical introductions to machine learning in theoretical physics and mathematics in [17,18], as well as references therein. Furthermore, the detection of symmetries in physical systems relevant to our context using machine learning is discussed in [19][20][21][22][23][24][25].
In this letter, we apply some of these machine-learning (ML) techniques to Petrov's classification of spacetimes. Since the original formulation of the problem, many algorithms have been proposed to perform the classification (see, for example, [6][26][27][28][29]). These usually reduce the problem to finding the roots and multiplicities of a quartic equation whose parameters are a set of five complex Weyl scalars $\Psi_i$ ($i = 0, \dots, 4$) in the Newman-Penrose formalism [30]. These Weyl scalars can easily be computed for any spacetime, and the relations between the non-vanishing $\Psi_i$ determine the Petrov type of the manifold. Here we take this approach to build a supervised learning problem fit for ML tools. In Section II we give an overview of the problem and show how to represent the spacetime data in an expedient manner. We artificially generate numerical data to train and validate various ML classifiers. Specifically, we start by building different datasets of Weyl scalars $\{\Psi_0, \Psi_1, \Psi_2, \Psi_3, \Psi_4\}$ with randomly generated entries, and then label each data point with its corresponding Petrov type. These datasets are later used in Section III to train several ML classifiers and compare how well they learn. We find that feed-forward neural networks (NN) are the most accurate classifiers for this problem, obtaining very high precision in only a handful of epochs. Moreover, in Section IV, we use other data science techniques, like Principal Component Analysis (PCA), to further study latent patterns in the data that give rise to the Petrov classification. We show how data visualization tools can illustrate the intrinsic differences between spacetimes of distinct Petrov type. Finally, we discuss the results and future applications of this programme in Section V.

* hey@maths.ox.ac.uk † Juan.PerezIpina@maths.ox.ac.uk

II. THE PETROV CLASSIFICATION
Petrov's classification of the algebraic symmetries of the Weyl tensor can be formulated as an eigenvalue problem for the Weyl tensor evaluated at some spacetime event. Alternatively, one can see it as a characterization of the Weyl tensor in terms of the principal null directions (p.n.d.) at that event [31] (see the Appendix for details of this approach). Depending on the number and multiplicity of the p.n.d.'s, one obtains the classification shown in Figure 1.
As is shown in the Appendix, the determination of principal null directions is equivalent to solving a quartic equation for $z$, equation (II.1), whose coefficients are built from $\Psi_i$ ($i = 0, 1, 2, 3, 4$), the five complex Weyl scalars in the Newman-Penrose formalism defined in (A.9).

a. The n = 32 Cases: The vanishing of one or more of these $\Psi_i$ simplifies (II.1), and this has been a staple of most attempts to determine the Petrov type. This has been taken into account in many of the aforementioned algorithms where, starting with the work of [27], a parameter $n$ was introduced to distinguish the 32 possible combinations of vanishing/non-vanishing Weyl scalars. Each of these 32 classes might have one or more Petrov types assigned to it. For a detailed list of the classes and their Petrov types, see Table I (where we have ordered the cases according to the number of vanishing Weyl scalars, and not by the value of $n$ from [27]).
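For reference, the quartic referred to as (II.1) has the following structure. This is a reconstruction under an assumed sign convention for the parameter $z$, not a quotation of the original display: replacing $z$ by $-z$ alternates the signs of the odd terms and leaves the root multiplicities, and hence the Petrov type, unchanged.

```latex
\Psi_0 + 4 z\, \Psi_1 + 6 z^2\, \Psi_2 + 4 z^3\, \Psi_3 + z^4\, \Psi_4 = 0
```

When $\Psi_4 = 0$ the degree of the polynomial drops, and the missing roots are counted as roots at $z = \infty$, so that the total multiplicity is always four for a non-vanishing Weyl tensor.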
As can be seen from Table I, for the cases with three or more vanishing Weyl scalars the Petrov type can be immediately determined; this is not the case for the rest. When working with an arbitrary null tetrad, the Weyl vector $\{\Psi_i\}$ might be arbitrarily complicated and it takes more work to determine the Petrov type. Of course, the Petrov classification is coordinate independent and, specifically, does not depend on the choice of tetrad, as long as that frame is not singular [32]. To distinguish between the possible types in the bottom half of the table, there have been many analytical results, as in [6,27] (building polynomials out of the remaining non-vanishing Weyl scalars). Since we want our classifier to handle completely general data (and work in any basis), we want to train on all possible cases of Table I.

TABLE I: "Form" refers to the vanishing of the five quantities $\Psi_i$: N signifies a non-vanishing entry and 0 a vanishing one. This table was based on the one in [6], where we also corrected some typos.

FIG. 2: Architecture of the five-layer neural network. The hidden layers alternate between tanh and sigmoid activations; the last layer uses the softmax activation function. The hidden layers contain 500 nodes each, and the softmax layer has 6, corresponding to the six output classes.
b. Data Generation: For this purpose we treat $\{\Psi_0, \Psi_1, \Psi_2, \Psi_3, \Psi_4\}$ as a numerical five-vector, randomly generating the entries for every possible case and subcase in Table I.
We created different databases formed from integer, rational, real or complex entries. The latter two permit the creation of huge datasets uniformly distributed in a specific range (e.g. for the reals, $\Psi_i \in [-10, 10]$). Unfortunately, for the real and complex data points, some subcases were not possible to sample through purely random generation, so the analytical results of [6,27] were used to generate the remaining data.
Specifically, for the real (or complex) dataset, 10,000 points were collected from each $n$ except the last case, NNNNN, where 20,000 points were sampled. This amounts to a total size of 330,000 data points, of different Petrov types. Notice that by doing this we are taking a different number of data points per Petrov type, but this is consistent with how common it is to find each type. For example, for the real dataset the resulting tally of points per type can be seen in Table II. These vectors were then labeled by their corresponding Petrov type, through an implementation of the algorithm of [26] in Mathematica [33].

TABLE II: Tally of data points per Petrov type for the dataset of real entries. The distribution of points per class is not homogeneous and this has to be taken into account when judging the efficiency of a classifier.
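The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' generator: the function names `sample_pattern` and `build_dataset` are hypothetical, only the real-valued case is shown, and labels are not attached. Note that purely random draws land on the generic subcase of each form; the special subcases still need the analytical constructions of [6,27], which are not reproduced here.

```python
import itertools
import random


def sample_pattern(pattern, n_points, rng, low=-10.0, high=10.0):
    """Draw real five-vectors whose vanishing entries follow `pattern`.

    `pattern` is a 5-character string of 'N' (non-vanishing) and '0'
    (vanishing), in the order (Psi_0, ..., Psi_4).
    """
    points = []
    for _ in range(n_points):
        vec = []
        for flag in pattern:
            if flag == "0":
                vec.append(0.0)
            else:
                x = 0.0
                while x == 0.0:  # reject an exact zero for an 'N' entry
                    x = rng.uniform(low, high)
                vec.append(x)
        points.append(tuple(vec))
    return points


def build_dataset(per_pattern=10_000, general=20_000, seed=0):
    """Sample every one of the 32 forms; NNNNN gets `general` points."""
    rng = random.Random(seed)
    patterns = ["".join(p) for p in itertools.product("0N", repeat=5)]
    # Order by number of vanishing scalars, as in Table I.
    patterns.sort(key=lambda s: s.count("0"), reverse=True)
    return {
        pat: sample_pattern(pat, general if pat == "NNNNN" else per_pattern, rng)
        for pat in patterns
    }


# Small demonstration run; the default sizes give 31 * 10,000 + 20,000 points.
demo = build_dataset(per_pattern=10, general=20, seed=1)
print(len(demo), sum(len(v) for v in demo.values()))
```

With the default sizes the total is 31 × 10,000 + 20,000 = 330,000 points, which matches the count quoted in the text.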

III. BUILDING A CLASSIFIER
For our supervised ML paradigm, the above dataset was randomly split into three groups: 70% for the training set, 15% for validation and 15% for testing. The first two are used to train the classifiers, while the testing set is used to evaluate the performance on previously unseen data. Many different types of classifiers were trained and tested, including (boosted) decision trees, random forests, nearest neighbours, and more. While these methods achieved reasonable accuracies, the best results were obtained using feed-forward neural networks (NN), which we detail shortly. The non-linearity in a NN model is obtained through the choice of activation functions. For this problem we found the highest accuracy with hyperbolic tangents and logistic sigmoids. While the problem can be modelled using a single hidden layer, we found higher accuracy in fewer epochs when using multiple hidden layers. In Figure 2 we show the architecture of our NN, which combines both activation functions in multiple alternating hidden layers. It takes as input the five-dimensional¹ vectors of Weyl scalars, then goes through four hidden layers of 500 nodes each, with alternating activation functions: tanh and sigmoid. The specific numbers of nodes and hidden layers were also found to produce the highest accuracy results, but by no means do we claim this to be the most efficient configuration possible. Different choices of these hyperparameters (or other variables such as the optimizer, the label encoding or the learning rate) represent possible directions of improvement for this neural network.

¹ Here and in the following we will use the dataset built from real entries for $\Psi_i$. An analogous analysis was carried out for the complex dataset, where the input vector is ten-dimensional after splitting into real and imaginary parts. Similar results in accuracy and confidence were found for the complex dataset.
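To make the layer structure of Figure 2 concrete, the following is a minimal NumPy sketch of a forward pass through a network of that shape: five inputs, four hidden layers of 500 nodes alternating tanh and sigmoid, and a six-node softmax output. It is an illustration only. The weights are untrained, the Glorot-style initialization is our assumption, and the names `init_params` and `forward` are hypothetical; the framework actually used for training is not specified in the text.

```python
import numpy as np

LAYER_SIZES = [5, 500, 500, 500, 500, 6]
ACTIVATIONS = ["tanh", "sigmoid", "tanh", "sigmoid", "softmax"]


def init_params(rng):
    """Random (untrained) weights; Glorot-uniform scaling is an assumption."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        weights = rng.uniform(-limit, limit, size=(n_in, n_out))
        biases = np.zeros(n_out)
        params.append((weights, biases))
    return params


def forward(params, x):
    """Map a batch of Weyl five-vectors to six class probabilities."""
    h = np.asarray(x, dtype=float)
    for (weights, biases), activation in zip(params, ACTIVATIONS):
        z = h @ weights + biases
        if activation == "tanh":
            h = np.tanh(z)
        elif activation == "sigmoid":
            h = 1.0 / (1.0 + np.exp(-z))
        else:  # softmax, shifted by the row maximum for numerical stability
            e = np.exp(z - z.max(axis=-1, keepdims=True))
            h = e / e.sum(axis=-1, keepdims=True)
    return h


rng = np.random.default_rng(0)
params = init_params(rng)
batch = rng.uniform(-10.0, 10.0, size=(3, 5))  # three random real inputs
probabilities = forward(params, batch)
print(probabilities.shape)
```

Each output row is a probability distribution over the six Petrov classes; the predicted class is the index of the largest entry.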
Finally, since we have here a multiclass classification problem, the last layer is a softmax layer, with 6 nodes for the 6 different classes. As mentioned above, the NN from Figure 2 was trained and optimized using the training and validation sets, and the testing set was used to determine its accuracy. The network was trained for 30 epochs, using a learning rate of $10^{-3}$ and the Adam optimizer [34]. In Figure 3 one can see the steady decrease of the loss function and error rate as the number of training rounds increases. We define accuracy as the percentage agreement of predicted versus actual values. However, when dealing with imbalanced multi-class classification problems, accuracy is not the most useful evaluation metric. To take the class imbalance into account, we define confidence through the Matthews Correlation Coefficient (MCC) $\phi$, generalized to the multi-class case². In all, we achieved an accuracy of 0.979, a confidence of 0.973, and a final loss of $6.61 \times 10^{-2}$.
We can plot the confusion matrix to see the successes and mistakes for each class. This is a 6 × 6 integer matrix of the actual numbers in the Petrov classes (O, I, II, III, D, N) versus the numbers as predicted by the ML classifier.
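The evaluation quantities used in this section can be computed from the confusion matrix alone, as sketched below. We assume here that the multi-class generalization of the MCC meant in the text is the standard one expressed in terms of the trace and the row and column sums of the confusion matrix; the function names are hypothetical and the toy labels are invented for illustration.

```python
import math

CLASSES = ["O", "I", "II", "III", "D", "N"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows index the actual class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def multiclass_mcc(matrix):
    """Multi-class Matthews correlation coefficient (assumed definition)."""
    k = len(matrix)
    s = sum(map(sum, matrix))                       # total samples
    c = sum(matrix[i][i] for i in range(k))         # correctly classified
    t = [sum(row) for row in matrix]                # actual count per class
    p = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t))
    )
    return numerator / denominator if denominator else 0.0


# Toy example: one type-D point misclassified as type II.
actual = ["I", "I", "II", "D"]
predicted = ["I", "I", "II", "II"]
cm = confusion_matrix(actual, predicted)
print(accuracy(cm), multiclass_mcc(cm))
```

Unlike accuracy, the MCC takes the per-class totals into account, which is why it is the more informative number for an imbalanced dataset such as the one tallied in Table II.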

IV. DATA VISUALIZATION
Having successfully trained a neural network (as well as other classifiers) on the Weyl data, it is also interesting to see how our data looks and what patterns we can directly observe. This is very much in the spirit of conjecture formulation via ML [15]: letting ML algorithms detect patterns which might ab initio be hidden.

FIG. 5: A plot of the confusion matrix for our NN classifier; we can see that it is heavily diagonal, signifying that the classification into the 6 Petrov types is extremely accurate.
For this we can follow a standard Principal Component Analysis (PCA) to dimensionally reduce the data to its highest-variance components so we can study the resulting two-dimensional plots.
We first obtain the principal components from the full unlabelled dataset and then reattach the corresponding labels (color-coding each distinct class for visualization). In Fig. 4 we can see the principal component representation for each Petrov type (not type O since it corresponds only to the vector {0, 0, 0, 0, 0} and therefore has no variation). Note that within the populated areas of the plots, some are more densely populated than others, reflecting the specific data generation procedure of Section II.
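The two steps just described (fit the principal components on the unlabelled vectors, then reattach the labels) can be sketched with a plain singular value decomposition. This is a generic PCA sketch, not the authors' code: the function name `pca_2d` is hypothetical and the data and labels below are random placeholders.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their two highest-variance directions.

    Returns the projected points, the two principal axes, and the variance
    explained by each of them.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    explained = singular_values[:2] ** 2 / (len(X) - 1)
    return centered @ components.T, components, explained


rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(300, 5))   # placeholder Weyl five-vectors
labels = np.array(["I", "II", "D"] * 100)     # placeholder Petrov labels

projected, axes, variances = pca_2d(X)        # labels are not used in the fit
by_type = {t: projected[labels == t] for t in np.unique(labels)}
print({t: pts.shape for t, pts in by_type.items()})
```

Each entry of `by_type` can then be drawn as its own scatter plot, one panel per Petrov type, as in the figure discussed in the text.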
One can see how, in the most general case, the data is spread out everywhere with no pattern in sight. As we increase the degeneracy (that is, as we move downward in Figure 1), the data starts settling into definite patterns.
In particular, types D and N have very specific shapes, illustrating the particularity of these cases. One can for instance superpose these figures and see exactly how one Petrov type degenerates into another, but for clarity we do not do this since overlaying will obscure many points in the plot.

V. OUTLOOK
In this work we have shown how to apply techniques from machine learning and data science in classification problems in general relativity. Taking as an example the Petrov classification of the Weyl tensor, we have adapted the problem to fit into the realm of supervised machine learning.
That is, our input consisted of randomly generated five-dimensional vectors representing the Weyl scalars $\{\Psi_i\}$ ($i = 0, \dots, 4$), labelled with their corresponding Petrov type (I, II, III, D, N, O). We generated enough data points to consider all possible cases of non-vanishing Weyl components, as described by Table I, to have a set of basis-independent training data. We designed a feed-forward neural network to train on this data and achieved 98% accuracy with a confidence (MCC) of 0.973. This shows that with a very simple neural network, in only a handful of epochs, one can model the Petrov classification with a high degree of success. We also showed how data visualization and dimensionality reduction can help in analyzing the data itself and the patterns that underlie it. Both these directions can help illuminate the intricacies of the classification of spacetimes, shedding light on problems of numerical relativity or on the general study of solutions to the Einstein equations.
The Petrov classification is only a part of the general programme for classifying and comparing spacetimes. The procedure presented in this paper can easily be extended to model the Segre classification of the Ricci tensor, which is another part of the puzzle. This then constitutes the first step towards a machine-ready setup for the full classification of spacetimes: a machine learning formulation of the Cartan-Karlhede algorithm. With the development of online databases for exact solutions to the Einstein equations, the stage is set for a complete exploration of the power of these techniques in this important field.

VI. ACKNOWLEDGEMENTS
YHH would like to thank STFC for grant ST/J00037X/2.

APPENDIX A
In this appendix we provide the conventions and mathematical background used to define Petrov's classification, basing our analysis on [1]. This also sets the notation for the main text, especially Figure 1.
A complex null tetrad is a choice of two real null vectors $l$, $k$ and two complex conjugate null vectors $m$, $\bar{m}$, normalized so that the metric takes a fixed form in this basis. From this tetrad we can build a basis of bivectors, (A.4), that will be useful in the following.

Recall that the Weyl tensor is the trace-free part of the curvature tensor, given by

$$C_{abcd} = R_{abcd} + \frac{1}{2}\left(R_{bc}\, g_{ad} + R_{ad}\, g_{bc} - R_{bd}\, g_{ac} - R_{ac}\, g_{bd}\right) + \frac{1}{6} R \left(g_{ac}\, g_{bd} - g_{ad}\, g_{bc}\right). \qquad \text{(A.5)}$$

This tensor has the same symmetries as the Riemann curvature tensor, with the added property of tracelessness. In general, it has ten independent components. For the classification it is useful to define the complex tensor $C^*_{abcd}$, which can be expanded in the basis (A.4) with five complex coefficients. The first of these is $\Psi_0 \equiv C_{abcd}\, k^a m^b k^c m^d$, and the remaining four, $\Psi_1, \dots, \Psi_4$, are defined by analogous contractions with the tetrad vectors (A.9). Therefore, determining the ten independent components of the Weyl tensor in (A.5) is equivalent to determining the five complex scalars defined above. With regard to their physical interpretation: $\Psi_0$ and $\Psi_1$ represent transverse and longitudinal waves in the $l$ direction, $\Psi_2$ is a Coulomb-like component, and $\Psi_3$ and $\Psi_4$ are longitudinal and transverse wave components in the $k$ direction.

Penrose's formulation of Petrov's classification [31] characterizes the Weyl tensor according to the principal null directions $k$ that satisfy the condition (A.10). There can be at most four such null vectors (p.n.d.'s). If a spacetime admits four distinct p.n.d.'s it is called algebraically general (type I); otherwise it is algebraically special. If $k$ is a member of the null tetrad, then equation (A.10) is equivalent to $\Psi_0 = 0$. We can rotate to an arbitrary complex null tetrad $(m', \bar{m}', l', k')$, under which the coefficient $\Psi_0$ transforms into a quartic polynomial in a complex number $z$. The determination of principal null directions is therefore equivalent to solving the quartic equation (II.1) for $z$, which has four (complex) roots that need not be distinct.
Depending on the number and multiplicity of the p.n.d.'s, we obtain the classification in Figure 1.
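For exact (integer or rational) Weyl scalars, the link between root multiplicities and Petrov type can be checked directly, as in the sketch below. It is not the algorithm of [26] used for labelling in the main text: it reads off the multiplicity pattern of the quartic from repeated polynomial GCDs with the derivative, and counts a drop in degree as a root at infinity. We assume the standard correspondence of patterns (1,1,1,1), (2,1,1), (2,2), (3,1), (4) with types I, II, D, III, N, a quartic with coefficients $\binom{4}{i}\Psi_i$, and real rational input; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

TYPE_BY_MULTIPLICITIES = {
    (1, 1, 1, 1): "I", (2, 1, 1): "II", (2, 2): "D", (3, 1): "III", (4,): "N",
}


def _trim(p):
    """Drop vanishing leading coefficients (p is ordered low to high degree)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def _derivative(p):
    return _trim([k * c for k, c in enumerate(p)][1:])


def _remainder(a, b):
    """Remainder of the polynomial division of a by a non-zero b."""
    a = list(a)
    while len(a) >= len(b):
        q, shift = a[-1] / b[-1], len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= q * c
        a = _trim(a[:-1])  # the leading term cancels exactly with Fractions
    return a


def _gcd(a, b):
    a, b = _trim(a), _trim(b)
    while b:
        a, b = b, _remainder(a, b)
    return a


def petrov_type(psi):
    """Petrov type from exact Weyl scalars (Psi_0, ..., Psi_4)."""
    p = _trim([comb(4, i) * Fraction(x) for i, x in enumerate(psi)])
    if not p:
        return "O"  # the Weyl tensor vanishes
    multiplicities = []
    if len(p) < 5:  # missing degree counts as a root at infinity
        multiplicities.append(5 - len(p))
    degrees = [len(p) - 1]
    while degrees[-1] > 0:  # p_k = gcd(p_{k-1}, p_{k-1}')
        p = _gcd(p, _derivative(p))
        degrees.append(len(p) - 1)
    # at_least[k-1] = number of distinct finite roots of multiplicity >= k
    at_least = [degrees[k] - degrees[k + 1] for k in range(len(degrees) - 1)]
    at_least.append(0)
    for k in range(1, len(at_least)):
        multiplicities.extend([k] * (at_least[k - 1] - at_least[k]))
    return TYPE_BY_MULTIPLICITIES[tuple(sorted(multiplicities, reverse=True))]


print(petrov_type([1, 0, 0, 0, 1]), petrov_type([0, 0, 1, 0, 0]))
```

Floating-point input is not suitable here, since repeated roots are destroyed by rounding; this is one reason the numerical datasets in the main text are labelled with a dedicated algorithm instead.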