Machine-Learning Lie Structures & Applications to Physics

Classical and exceptional Lie algebras and their representations are among the most important tools in the analysis of symmetry in physical systems. In this letter we show how the computation of tensor products and branching rules of irreducible representations is machine-learnable, and can achieve relative speed-ups of orders of magnitude in comparison to non-ML algorithms.


I. INTRODUCTION & SUMMARY
Lie algebras are an integral part of modern mathematical physics. Their representation theory governs every field of physics, from the fundamental structure of particles to the states of a quantum computer. Traditionally, an indispensable tool for the high-energy physicist has been the extensive tables of [1]. More contemporary usage, with the advent of the computing power of the ordinary laptop, has relied on highly convenient software such as "LieART" [2]. Such computer algebra methods, especially in conjunction with the familiarity of the Wolfram programming language to the theoretical physicist, are clearly destined to play a helpful rôle.
In parallel, a recent programme of applying techniques from machine learning (ML) and data science to the study of various mathematical formulae and conjectures has been proposed [3,4,10]. Indeed, while the initial studies were inspired by and brought to string theory in timely and independent works [3][4][5][6][7], experimentation on whether standard techniques in neural regressors and classifiers could be carried over to diverse problems has taken on a life of its own. These have ranged from finding bundle cohomology on varieties [6,8,9], to distinguishing elliptic fibrations [13] and invariants of Calabi-Yau threefolds [11], to machine-learning the Donaldson algorithm for numerical Calabi-Yau metrics [23], to the algebraic structures of discrete groups and rings [12], to the BSD conjecture & Langlands programme in number theory [14][15][16], to quiver gauge theories and cluster algebras [21], to patterns in particle masses [18], to knot invariants [17], to statistical predictions and model-building in string theory [19,20,25], to classifying combinatorial properties of finite graphs [24], etc. Moreover, the very structures of quantum field theory and holography [26][27][28] have also been proposed to be closely related to suitable neural networks.
In this letter, we continue this exciting programme and apply machine-learning techniques to another indispensable concept for physicists, namely the ubiquitous continuous symmetries as encoded by Lie groups/algebras. Physicists have long used them for classification, from the phases of matter to the spectrum of elementary particles. As listed above, machine-learning techniques have provided a powerful new approach to various classification problems of physical interest (there have also been other interesting works on detecting physical symmetries using machine learning [22,30]). Here we ask whether the essential structures of Lie groups can also be learned by machine. Specifically, we ask whether neural network (NN) classifiers and regressors can, after having seen enough samples of typical calculations such as tensor decompositions or branching rules -- both known to be heavily computationally expensive, as we will shortly see -- predict the result more efficiently.
As a comparison, let us also mention a somewhat surprising result from [12], where some fundamental structures of algebra, viz., certain properties of finite groups and finite rings, seem to be machine-learnable. Difficult problems in representation theory such as recognizing whether a finite group is simple or not by "looking" at the Cayley multiplication table, or whether random permutation matrices (Sudoku) possess group structure, etc., can be classified by a support vector machine very efficiently without recourse to the likes of Sylow theorems which are computationally expensive.
In this letter we are motivated by the question of whether, and how much, one can machine-learn the essential information about classical and exceptional Lie algebras as tabulated in standard texts such as Slansky [1]. Specifically, we address two fundamental problems in the representation theory of Lie algebras that are crucial to physics -- the tensor product decomposition and the branching rules to a sub-algebra -- and show that these salient structures are machine-learnable.
In particular, we show that a relatively simple feed-forward neural network can predict, to high accuracy and confidence, the number of irreducible representations ("irreps") that appear in a tensor product decomposition, which we refer to as the length of the decomposition. Our findings for classical and exceptional algebras are summarized in Tables I and II. We subsequently show that a neural network can also predict, with high accuracy, the presence or absence of a given irreducible representation of a maximal sub-algebra within an irreducible representation of a parent algebra. The neural network is capable of predicting, for example, the presence of bifundamentals of SU(3)×SU(2) in a given representation of SU(5) to an accuracy of 88% and a confidence of 0.735. We remark that our classification problems were also addressed with various standard classifiers, such as naive Bayes, nearest neighbours and support vector machines. It is found that the NN with the architecture shown below in Fig. 2 significantly out-performed them. Whilst these aforementioned classifiers have the advantage of relative ease of interpretability, NNs with similar architectures have been found to perform well for a variety of problems, such as the computation of topological invariants of manifolds [3,19] and finite graph invariants [24].

II. TENSOR PRODUCTS AND BRANCHING RULES AS LEARNED BY A NEURAL NETWORK.
A. Tensor Products

Predicting the length of generic tensor decompositions
Let us begin with a simple ML experiment. One of the most important computations for Lie groups/algebras is the decomposition of the tensor product of two representations into a direct sum of irreducible representations of a given group G,
\[ R_{v_1} \otimes R_{v_2} = \bigoplus_r a_r \, R_r \,, \]
where $a_r \in \mathbb{Z}_{\geq 0}$ are the multiplicity factors. To be concrete, let us first consider $A_m = SU(m+1)$. Every irreducible representation ("irrep") of $A_m$ is specified by a highest-weight vector $\vec{v}$, which is a rank-$m$ vector of non-negative integer components. Throughout this letter, we will use $\vec{v}$ to denote the weight vector for a Lie algebra of rank $r$. When the context is clear, an integer with the vector over-script is understood to be a vector with the same integer in every entry, e.g., $\vec{4} = (4, 4, \ldots, 4)$. As the entries of $\vec{v}$ increase in magnitude, the dimension of the corresponding irrep $R_{\vec{v}}$ can grow dramatically: for instance, for $A_4 = SU(5)$ the irrep with highest weight $\vec{4} = (4,4,4,4)$ already has dimension 9,765,625. This makes the task of identifying the precise irreps contained in a tensor decomposition rather laborious.
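To make this growth concrete, the dimension of an $A_m$ irrep follows directly from its Dynkin labels via the Weyl dimension formula. The following minimal Python sketch (the function name and script are ours, not part of LieART) reproduces, e.g., the dimension 9,765,625 quoted above for the SU(5) highest weight (4,4,4,4):

```python
from fractions import Fraction

def dim_A(labels):
    """Dimension of the A_m irrep with Dynkin labels (a_1, ..., a_m),
    via the Weyl dimension formula specialised to SU(m+1):
    dim = prod over 1 <= i <= j <= m of (a_i + ... + a_j + j - i + 1) / (j - i + 1)."""
    m = len(labels)
    dim = Fraction(1)
    for i in range(m):
        for j in range(i, m):
            dim *= Fraction(sum(labels[i:j + 1]) + (j - i + 1), j - i + 1)
    return int(dim)

print(dim_A([1, 0]))        # 3       -- fundamental of SU(3)
print(dim_A([1, 1]))        # 8       -- adjoint of SU(3)
print(dim_A([4, 4, 4, 4]))  # 9765625 -- the SU(5) example above
```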
We start with two weight vectors $v_1$, $v_2$. Their rank m is chosen randomly from {1, 2, ..., 8}. Then, we randomly generate a pair of quinary vectors $v_1$, $v_2$ of rank m (i.e., with entries in {0, 1, 2, 3, 4}), and compute the decomposition of their tensor product into irreducible representations as above. This computation, although algorithmic, is non-trivial.
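A minimal sketch of how such random input pairs might be drawn (for illustration only; the actual dataset in this letter was generated with LieART in Mathematica):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pair(max_rank=8):
    """Draw a rank m uniformly from {1,...,max_rank} and return two random
    quinary highest-weight vectors of that rank (entries in {0,...,4})."""
    m = rng.integers(1, max_rank + 1)
    v1 = rng.integers(0, 5, size=m)
    v2 = rng.integers(0, 5, size=m)
    return v1, v2

v1, v2 = random_pair()
print(v1, v2)
```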
Even the relatively simple question of how many distinct irreps, counted with multiplicity, appear on the RHS -- what we call the length of a given tensor decomposition -- is not immediately obvious just by looking at the vectors $v_1$ and $v_2$. For example, two pairs of weight vectors of similar appearance can give decompositions of length 3 and of length 5 respectively; it is difficult to see a priori which is which, and one needs to actually compute the respective tensor decompositions to know the answer. It took several hours using LieART to perform five thousand decompositions. To get an idea of their distribution, we show the histogram of the lengths: indeed there is a huge variation, from 1 to over 350. A significant improvement in the running time (from hours to a few minutes) can be attained by capping the maximum dimension of the irreps (say at 10,000). The distribution of the lengths of the decompositions versus frequency is depicted in Figure 1. Let us next consider a simple binary classification problem using the data generated by LieART: can ML distinguish tensor decompositions of length ≥ 70 from those of length < 70? The length 70 is chosen since it splits the data rather evenly into around five thousand in each class. To uniformize the input vectors, for rank m < 8 we pad both $v_1$ and $v_2$ to the right with −1 (a meaningless number in this context) and stack them on top of each other. Thus, our input is a 2 × 8 matrix with integer entries for 1 ≤ m ≤ 8. This step is essential for using a single NN to learn data for Lie algebras of varying ranks (here $A_m$ with 1 ≤ m ≤ 8). For the majority of our experiments, we use a feed-forward neural network classifier built in Mathematica with the architecture shown in Figure 2. We also reproduced these results with a similar 2-layer architecture in Keras [31], with selu-activated neurons, obtaining similar accuracy and confidence.
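A minimal Keras sketch of this setup is given below. It is an illustrative reconstruction of the 2-layer selu architecture mentioned above rather than the exact network of Figure 2; in particular the layer width (here 100) is our assumption.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def pad_pair(v1, v2, max_rank=8, pad=-1):
    """Pad two weight vectors with -1 up to rank 8 and stack them into a 2x8 input."""
    row1 = np.full(max_rank, pad); row1[:len(v1)] = v1
    row2 = np.full(max_rank, pad); row2[:len(v2)] = v2
    return np.stack([row1, row2])

# Binary classifier: is the length of the decomposition >= 70 or < 70?
model = keras.Sequential([
    layers.Flatten(input_shape=(2, 8)),
    layers.Dense(100, activation="selu"),
    layers.Dense(100, activation="selu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: array of shape (N, 2, 8) of padded pairs; y: 0/1 labels (length >= 70 or not).
# model.fit(X, y, epochs=100, validation_split=0.8)  # 20% training, 80% validation
```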
Finally, we need to ensure that the final softmax output is rounded to 0 or 1 according to our binary categories. With learning rate 1/1000, the ADAM optimizer and 100 training rounds, we find that with 20% training and 80% complementary validation we achieve an accuracy of 0.969 with a confidence of 0.930, within one minute; which is excellent indeed. Throughout this letter, we will use "accuracy" to mean the percentage agreement between predicted and actual values. In addition, in discrete classification problems it is also important to have a measure of "confidence", so that false positives/negatives can be detected. A widely used one is the Matthews phi coefficient φ (essentially a signed square root of the chi-squared of the contingency table) [32]. When φ is close to +1, the prediction is made with good confidence; when it is close to 0, the prediction is essentially random and useless; and when it is close to −1, the prediction is anti-correlated with the truth. The results of our training and learning for the $A_m$ and $G_2$ cases are depicted in Figures 3 and 4. The plots show a steady lowering of both the error rate and the loss function as we increase the number of rounds of training and validation. After 100 rounds of training, we achieve a 5% error rate and a loss of 0.1.
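For reference, the Matthews phi coefficient used as the "confidence" measure throughout can be computed from the 2×2 contingency table of a binary classifier; a minimal sketch (the same quantity returned by sklearn.metrics.matthews_corrcoef, with hypothetical example counts):

```python
import math

def matthews_phi(tp, tn, fp, fn):
    """Matthews correlation (phi) coefficient from the binary confusion matrix:
    phi = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Example: a strongly correlated prediction
print(matthews_phi(tp=480, tn=485, fp=20, fn=15))  # ~0.93
```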
The above experiment was also carried out for the other classical as well as the exceptional Lie algebras, with comparable success. For the classical cases $B_m = SO(2m+1)$, $C_m = Sp(2m)$ and $D_m = SO(2m)$ the results are given in Table I. We generated the same data size as in the $A_m$ case, i.e. 5000 decompositions, and used the same cap on the maximum dimension of the irreps (10,000). The accuracy of the ML prediction was above 0.95 for each of these cases. The results for the exceptional algebras are given in Table II; the one noticeably lower accuracy there is caused by the low number of data points available at low dimensions for the algebra of relatively high rank: only 903 data points below a dimension of 120,000. Raising the dimension cap would improve the machine learning, bringing it up to par with the others; however, the corresponding data generation using LieART would take days.

Extrapolating the lengths of tensor products with the NN
We can take the experiment of the previous subsection one step further. This time we would like to train the neural net on low-dimensional tensor decomposition data, and then test its performance on higher-dimensional cases. If successful, this would immensely reduce the computation time. For example, obtaining the length of the decomposition for two $A_6$ weight vectors $v_1 = v_2 = (2, 2, 2, 2, 2, 2)$ by brute force takes over 15 minutes on LieART; machine learning should give the same result within seconds! We trained the NN of Figure 2 on the same data for $A_m$ generated by LieART for the previous experiment. However, the training set is now restricted to have both input weight vectors of dimension less than a certain cutoff value, here taken to be 2,000. The validation was done on the complementary dataset, with input weight vectors of dimension ranging between 2,000 and 10,000. The resultant accuracy was 0.895 with a confidence of 0.779. In conclusion, the NN does indeed seem able to extrapolate fairly accurately the length of a tensor decomposition from lower- to higher-dimensional representations. We repeated this experiment for all the classical Lie algebras, and achieved comparable values of accuracy and confidence. The results of the NN performance are reported in Table III. Note that the splitting point of the length below/above which we classify our data (the classifying length in Table III) was modified with respect to the previous experiment (where it was fixed). This is necessary since for $B_m$, $C_m$ and $D_m$ the lengths of the decompositions below the capped dimension tend to be lower than those for $A_m$.
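A minimal sketch of this dimension-based split (assuming each sample carries the dimensions of both input irreps alongside the padded input matrix and length label; the cutoff 2,000 is as described above):

```python
import numpy as np

def split_by_dimension(X, y, dims, cutoff=2000):
    """Train on pairs whose irreps both have dimension < cutoff;
    validate on the complementary (higher-dimensional) pairs.
    X: (N, 2, 8) padded inputs, y: (N,) binary length labels,
    dims: (N, 2) dimensions of the two input irreps."""
    low = (dims < cutoff).all(axis=1)
    return (X[low], y[low]), (X[~low], y[~low])

# (X_train, y_train), (X_val, y_val) = split_by_dimension(X, y, dims)
# model.fit(X_train, y_train, epochs=100)
# model.evaluate(X_val, y_val)
```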
A refinement with Keras: So far we have split the data into a training and a test/validation set, where the training set was drawn from low-dimensional representations and the test set from high-dimensional ones. Since the experiment was successful, we propose a stronger test. Namely, we draw both the training and test sets from the low-dimensional data, and validate the trained neural network on the high-dimensional data. A good performance on this validation set, which the neural net has never seen during training, is the gold-standard signal that the neural net has indeed learnt the tensor product rules.
On the flip side, now that the training/test set is much smaller, the problem carries a substantial risk of overfitting. Indeed, a naive choice of the cross-entropy loss function leads to overfitting on the training set. However, the loss function may be regularized with L1 and L2 terms to mitigate this problem, and we obtain good results, presented in Table IV: the accuracy for the length of the tensor product decomposition for the $A_m$-$D_m$ and $G_2$ algebras, when training/testing with low-dimensional irreps and validating on high-dimensional irreps. With the exception of the $C_m$ algebras, the validation accuracy is comparable to the train/test accuracy in all cases.
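As an illustration, L1/L2 weight penalties can be added to the dense layers in Keras as below; the specific regularization strengths (1e-4) are placeholder values of ours, not those used in the letter.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Same 2-layer selu classifier as before, now with L1+L2 penalties on the weights
# to curb overfitting when training only on the smaller low-dimensional set.
reg = regularizers.l1_l2(l1=1e-4, l2=1e-4)
model = keras.Sequential([
    layers.Flatten(input_shape=(2, 8)),
    layers.Dense(100, activation="selu", kernel_regularizer=reg),
    layers.Dense(100, activation="selu", kernel_regularizer=reg),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```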

B. Branching Rules
The next task on which we would like to train our neural network is for it to learn the branching rules for Lie algebras. Suppose we take a weight vector of SU(5) and restrict its entries to range from 0 to 4 (i.e., quinary 4-vectors). Even though this may look rather harmless, the dimension of the corresponding irrep ranges from 1 for $\vec{0}$ to 9,765,625 for $\vec{4}$. When we decomposed these irreps of SU(5) into those of its maximal sub-algebra SU(3)×SU(2)×U(1) and found their explicit branching products, the time taken by LieART was easily seen to grow exponentially. (Note that since LieART is only capable of generating branching rule data for maximal subgroups, here we focus on this simplest set of branching training data to illustrate the capability of the neural network.) In Figure 6, we plot the log of the time taken in seconds versus the length of the weight vector; the best fit is the line −5.54361 + 1.69186x. By extrapolation, the single irrep of SU(5) corresponding to the weight vector $\vec{10}$ would take over 20 years just to compute its branching.
In the rest of this section, we show the efficacy of using ML to predict the presence/absence of a given representation of the maximal sub-algebra in a given irrep of the SU(5) and $G_2$ algebras. For concreteness, we look for bifundamental representations of SU(3)×SU(2) (with arbitrary values of the U(1) charge) in any given SU(5) irrep. In the $G_2$ case, we restrict ourselves to the bifundamental representation of the SU(2)×SU(2) maximal sub-algebra.
For the SU(5) branching, we use 1,000 irreps of dimension below 70,000 as input vectors, with a binary YES(=1)/NO(=0) output depending on the presence/absence of a bifundamental rep of SU(3)×SU(2). Training is done on 20% of the data, with validation on the complementary data. We end up with an accuracy of 0.880 and a confidence of 0.735. The performance of the NN improves drastically upon increasing the size of the data set; yet again, the data generation was restricted by the evaluation time of LieART.
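To illustrate how such labels might be constructed, here is a minimal sketch of ours. It assumes the branching of each SU(5) irrep has already been exported (e.g. from LieART) as a list of (SU(3) Dynkin labels, SU(2) Dynkin label, U(1) charge) triples, and checks for a bifundamental, i.e. SU(3) labels (1,0) or (0,1) together with SU(2) label (1), with any U(1) charge:

```python
def is_bifundamental(su3, su2):
    """True if the SU(3) factor is a (anti)fundamental and the SU(2) factor a doublet."""
    return tuple(su3) in {(1, 0), (0, 1)} and tuple(su2) == (1,)

def label_branching(branching):
    """branching: list of (su3_labels, su2_labels, u1_charge) triples for one SU(5) irrep.
    Returns 1 if any bifundamental of SU(3)xSU(2) appears, else 0."""
    return int(any(is_bifundamental(su3, su2) for su3, su2, _ in branching))

# Example: the 5 of SU(5) branches as (3,1) + (1,2) under SU(3)xSU(2)xU(1)
# (U(1) charges -2 and 3 in one common normalization); it has no bifundamental.
five = [((1, 0), (0,), -2), ((0, 0), (1,), 3)]
print(label_branching(five))  # 0
```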
For the $G_2$ branching, we used 400 input weight vectors with dimensions below 4.7 million. Analogously to the SU(5) case, the output was a binary YES(=1)/NO(=0) depending on the presence/absence of a bifundamental rep of SU(2)×SU(2). Here, we achieved an accuracy of 0.75. The performance was somewhat worse than in the SU(5) case, again due to the smaller number of data points.

III. OUTLOOK
Given the ubiquity of Lie algebras and groups in physics, let us end this letter with some comments on the vast possibilities for applying our results to physics, exemplifying with two which immediately come to mind.
In scattering processes, given a pair of incoming particles transforming under irreps of a certain global symmetry group, the outgoing particles can be classified via the tensor decompositions. The tensor decomposition prediction and extrapolation results of section II A thus allow us to efficiently estimate the number of distinct outgoing particles. It would also be exciting to see whether the NN upper-bound estimate of the length of a given decomposition can help the LieART package work out the explicit terms within a significantly shorter period.
Our choice of studying the branching of SU(5) into its maximal subgroup SU(3)×SU(2)×U(1) in section II B was phenomenologically motivated. This can hopefully lead to a useful algorithm for testing whether a field transforming under the SU(5) GUT gauge group yields descendants transforming under the standard model gauge group upon spontaneous symmetry breaking. We hope this will be useful for particle physics model building.