Deep Learning the Hyperbolic Volume of a Knot

Abstract: An important conjecture in knot theory relates the large-N, double scaling limit of the colored Jones polynomial J_{K,N}(q) of a knot K to the hyperbolic volume of the knot complement, Vol(K). A less studied question is whether Vol(K) can be recovered directly from the original Jones polynomial (N = 1). In this report we use a deep neural network to approximate Vol(K) from the Jones polynomial. Our network is robust and correctly predicts the volume with 97.5% accuracy when training on 30% of the data. This points to the existence of a more direct connection between the hyperbolic volume and the Jones polynomial.


Introduction
Identifying patterns in data enables us to formulate questions that can lead to exact results. Since many of these patterns are subtle, machine learning has emerged as a useful tool in discovering these relationships. In this work, we apply this idea to invariants in knot theory.
A knot is an embedding of a circle in the 3-sphere S^3. These objects play important roles in a wide range of fields including particle physics, statistical mechanics, molecular biology, chemistry, sailing, and art [1][2][3][4][5][6]. Figure 1 depicts several well known knots. Knot invariants, which distinguish knots from each other, are independent of how a knot is drawn on the plane (the knot diagram). Determining relationships between these invariant quantities is a central theme of knot theory. See Appendix A for a brief overview of the invariants discussed in this work.
Perhaps the most famous invariant of a knot K is the Jones polynomial J_K(q), which is a Laurent polynomial with integer coefficients. The original definition of the Jones polynomial was combinatorial [7], but an intrinsically geometric definition and generalization was discovered soon thereafter [1]. The generalizations found in [1] are known as "colored" Jones polynomials and represent a family of knot invariants J_{K,N}(q) labeled by a positive integer N called the color. The special value N = 1 recovers the Jones polynomial.
While the Jones polynomial is defined for any knot, some invariants exist only for subsets of knots. An example of such an invariant is the hyperbolic volume of the knot's complement, denoted Vol(K). It is defined only if the manifold obtained by drilling out the knot from its 3-dimensional ambient space admits a complete hyperbolic structure. The vast majority of knots are hyperbolic [8], and we will restrict our attention to this case.
An important open problem in knot theory is to establish a conjecture that relates J_{K,N}(q) to Vol(K). The volume conjecture [9][10][11] asserts that
\[ \lim_{N \to \infty} \frac{2\pi \log \left| J_{K,N}\!\left(e^{2\pi i/N}\right) \right|}{N} = \mathrm{Vol}(K). \tag{1.1} \]
The main idea of the volume conjecture is that the colored Jones polynomial in the large color limit contains information about the volume of K.
One might wonder if this property of the colored Jones polynomials extends to the original Jones polynomial. When the Jones polynomial is evaluated at q = −1, there is a surprising approximately linear relationship between log |J_K(−1)| and Vol(K), but this correlation seems to apply only to a particular class of knots [12]. Additionally, the so-called "volume-ish" theorem [13] gives upper and lower bounds on Vol(K) in terms of certain coefficients appearing in J_K(q). An improved relationship is achieved [14] by replacing J_K(−1) with the reduced rank of the Khovanov homology, a homology theory H_K whose graded Euler characteristic is J_K(q). The cost of this improvement is that the Khovanov homology is a much more refined invariant of K than the Jones polynomial, and one needs to work much harder to compute it [15]. The most optimistic interpretation of these results is that there is a nonlinear relation A mapping J_K(q) to Vol(K) along the lines of Eq. (1.1), but perhaps not quite as simple to write. In this report, we provide evidence for this idea by directly estimating A using a simple two hidden layer fully connected neural network.
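As a small illustration of the quantity log |J_K(−1)| mentioned above, the following Python sketch evaluates the figure-eight Jones polynomial quoted in the caption of Figure 1 at q = −1. The dictionary representation and the helper name `evaluate_laurent` are choices made here for illustration only; they are not part of the original analysis.

```python
import math

# Laurent polynomial stored as {exponent: coefficient} (an illustrative choice).
# Figure-eight knot, as quoted in the Figure 1 caption:
# J(q) = q^-2 - q^-1 + 1 - q + q^2
FIGURE_EIGHT = {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1}

def evaluate_laurent(poly, q):
    """Evaluate sum_k c_k * q**k for a nonzero number q."""
    return sum(c * q ** k for k, c in poly.items())

# At q = -1 every term of the figure-eight polynomial contributes +1.
value = evaluate_laurent(FIGURE_EIGHT, -1)
log_abs = math.log(abs(value))
print(value, log_abs)
```

For this knot the evaluation gives 5, so log |J_K(−1)| = log 5, to be compared with the quoted volume of approximately 2.02988.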

Setup and Result
A neural network is a function which is constructed by training on several examples. Suppose that we have a dataset D = {J_1, J_2, ..., J_m}, and to every element of D, there is an associated element in another set S:
\[ A : \{J_1, J_2, \ldots, J_m\} \to \{v_1, v_2, \ldots, v_m\} \subset S, \qquad J_i \mapsto v_i. \tag{2.1} \]
In our case, the J_i are the Jones polynomials of knots, and the v_i are the volumes of those knots. A neural network f_θ is a function (with an a priori chosen architecture) which is designed to approximate the associations A efficiently; the subscript θ denotes the internal parameters, called weights and biases, on which the neural network depends. In order for the network to learn A, we divide the dataset D into two parts: a training set, T = {J_1, J_2, ..., J_n} chosen at random from D, and its complement, T^c = D \ T. The neural network is taught the associations on the training set by tuning the internal parameters θ to approximate A as closely as possible on T. In general, f_θ(J_i) ≠ v_i unless the network overfits the data. We must instead minimize a suitably chosen loss function that captures the difference between the two. Finally, we assess the performance of the trained network by applying it to the unseen inputs J_i ∈ T^c and comparing f_θ(J_i) to the true answers v_i = A(J_i). See Appendix B for more details about neural networks and our particular architecture and implementation.
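The division of D into a random training set T and its complement can be sketched as follows. This is a minimal Python illustration, not the Mathematica code used for the results; the function name `split_dataset` and the fixed seed are assumptions made for the example.

```python
import random

def split_dataset(pairs, train_fraction, seed=0):
    """Randomly split (J_i, v_i) pairs into a training set T and its complement.

    `pairs` is any sequence of (input, target) tuples; the seed only makes
    the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Toy stand-in data: 100 dummy (input, target) pairs, 30% used for training.
toy_pairs = [(i, float(i)) for i in range(100)]
train_set, test_set = split_dataset(toy_pairs, 0.3)
print(len(train_set), len(test_set))
```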
Neural networks of appropriate size can approximate any function [16] and in general are composed of layers which perform matrix multiplication, bias vector addition, and a nonlinear activation function σ which acts element-wise on vectors. After encoding the Jones polynomial J_K(q) in a vector J_K consisting of the integer coefficients and the maximum and minimum degree of the polynomial, our network can schematically be written as
\[ f_\theta(J_K) = \sum_a \sigma\!\left( W^2_\theta \cdot \sigma\!\left( W^1_\theta \cdot J_K + b^1_\theta \right) + b^2_\theta \right)_a, \tag{2.2} \]
where W^j_θ and b^j_θ are the weight matrices and bias vectors, respectively, of the j-th hidden layer and the summation simply adds up the components of the output vector. The input layer is padded with zeroes so that the vectors are of uniform length. In our case, the inputs are vectors of length 17. The hidden layers have 100 neurons each, and the final output layer is a summation over the output of the second hidden layer. In the language of Eq. (2.2), W^1_θ is a 100×17 matrix, b^1_θ is a length 100 vector, W^2_θ is a 100×100 matrix, and b^2_θ is a length 100 vector, all with variable entries that are determined by training the network on data.
For data, we use a table of Jones polynomials and hyperbolic volumes for 59,928 knots obtained from the online databases Knot Atlas [17] and SnapPy [18]. This includes all hyperbolic knots in Knot Atlas up to 14 crossings. We implement and train f_θ in Mathematica 11 [19] using built-in functions that are completely unoptimized for the problem under consideration. The loss function is proportional to the squared error in volume, and parameters are adjusted by stochastic gradient descent. We follow the usual protocol for neural network training [20]: the network is shown a set of training data, and loss function gradients on this set are used to adjust the network parameters θ via backpropagation.
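To make the architecture concrete, here is a minimal Python sketch of the encoding and of the forward pass with the layer sizes stated above (17 inputs, two hidden layers of 100 neurons, summation output). Several details are assumptions made for illustration: the ordering of the entries in the input vector, the random initial weights, and all function names. The logistic sigmoid is the activation named in Appendix B. The original implementation used Mathematica, so this is not the authors' code.

```python
import math
import random

INPUT_LEN = 17   # input length stated in the text
HIDDEN = 100     # neurons per hidden layer stated in the text

def encode_jones(poly, length=INPUT_LEN):
    """Encode {exponent: coefficient} as a zero-padded vector.

    Assumed layout (not confirmed by the text):
    [min degree, max degree, coefficients from min to max degree, 0, ...].
    """
    lo, hi = min(poly), max(poly)
    vec = [lo, hi] + [poly.get(k, 0) for k in range(lo, hi + 1)]
    if len(vec) > length:
        raise ValueError("polynomial too long for the chosen input length")
    return [float(x) for x in vec] + [0.0] * (length - len(vec))

def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine(W, b, v):
    """Matrix multiplication followed by bias addition: W.v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(params, v):
    """Two sigmoid hidden layers, then a sum over the final components."""
    W1, b1, W2, b2 = params
    h1 = [sigmoid(z) for z in affine(W1, b1, v)]
    h2 = [sigmoid(z) for z in affine(W2, b2, h1)]
    return sum(h2)

def random_params(seed=0):
    """Untrained placeholder weights (illustrative initialization only)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return mat(HIDDEN, INPUT_LEN), [0.0] * HIDDEN, mat(HIDDEN, HIDDEN), [0.0] * HIDDEN

figure_eight = {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1}
x = encode_jones(figure_eight)
y = forward(random_params(), x)
print(x[:7], y)
```

Because the output is a sum of 100 sigmoid values, an untrained network of this shape always returns a number between 0 and 100; training is what moves this output toward the volume.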
With this architecture, our network performs significantly better than any previous method of estimating Vol(K) from J_K(q). Its simplicity and robustness suggest the existence of an almost exact nonlinear relationship between the two invariants which is more complicated than Eq. (1.1), but not by much.
In Figure 2, we show a plot of the accuracy of our trained model compared with the volume-ish bounds and Khovanov homology rank methods. Random selection within the volume-ish bounds leads to an enormous error (Figure 2a). This is because the volume-ish bounds are fairly loose, and the range of allowed volumes is wide enough that large errors become unavoidable due to random selection. The theorem applies only to knots for which crossings alternate between underhand and overhand, so we have restricted to this subset. The Khovanov homology rank, on the other hand, applies more generally and can predict the volume with a mean error of approximately 5.55% (Figure 2b). However, even the Khovanov homology rank predictions show a large spread around the perfect prediction line. In Figure 2c, we show our network's performance. We compute the relative error
\[ \delta = \mathrm{Mean}\left( \frac{\left| f_\theta(J_K) - \mathrm{Vol}(K) \right|}{\mathrm{Vol}(K)} \right), \tag{2.3} \]
where K are knots belonging to the complement of the training set. Averaging over 100 runs, the relative error is 2.53 ± 0.08% when training on 30% of the data. This error increases to 3.0% when training on 5% of the data and rises to only 4.4% even if we train using just 1% of the data. The neural network analysis applies to all knots in the database. We notice that the spread between the prediction and the actual value decreases at larger volumes. In part, this is because there is more data here as the number of possible knots and the mean volume both increase with crossing number. Figure 3 illustrates how little input is needed for the network to learn the correlation between the Jones polynomial and the hyperbolic volume: the horizontal axis gives the size of the training set as a fraction of the complete dataset, and the vertical axis gives the average relative error. This is an instance of probably approximately correct learning [21].
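The relative error used here is the mean, over the test knots, of |predicted volume − true volume| divided by the true volume. A minimal Python sketch follows; the function name and the two sample values are made up for illustration and are not taken from the dataset.

```python
def mean_relative_error(predicted, actual):
    """Mean of |prediction - truth| / truth over paired sequences.

    Assumes every true value is strictly positive, as hyperbolic volumes are.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# Made-up numbers: one exact prediction and one that is 10% too high.
example_error = mean_relative_error([2.0, 11.0], [2.0, 10.0])
print(example_error)
```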
The fact that f_θ can extract very predictive features from small subsets of the data suggests that it is learning something fundamental that connects the Jones polynomial to the hyperbolic volume. Indeed, 1% of the data is already enough to teach our network more (in terms of lower average error) about the hyperbolic volume than is known by the Khovanov homology rank function of [14], despite the fact that H_K is a more refined knot invariant than J_K(q), and therefore intuitively we would expect it to contain more information about the volume. Perhaps a neural network architecture which takes in aspects of the Khovanov homology as an input would perform even better in predicting the hyperbolic volume.
The performance of our very simple network is robust in the sense that adding extra layers, adding or removing a few neurons in each layer, changing the activation functions, and changing the loss function all have negligible effects on the resulting trained network accuracy. The training of f_θ is relatively smooth and occurs quickly. It can be accomplished on a laptop in under 3 minutes. We plot the loss versus the number of training rounds in Figure 4. The neural network learns how to predict the [...] This feature persists for much smaller training set sizes (1%, 5% of data). These observations support our conclusion from the error rate discussion and again suggest that the network can learn robust features after seeing just a small amount of data for a short amount of time.
The training data must be representative, however. The volume of the knot complement serves as a proxy for the complexity of a knot. Training only on the 25% of knots with the smallest volume, the neural network underpredicts the volumes of the remaining knots. The error is 14.0%. Seeding the training set with a small sampling of the higher volume knots restores the performance of the network. See Appendix C for other experiments we performed.
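The non-representative split described above, and the seeding that repairs it, can be sketched in Python as follows. The function names, the toy volumes, and the number of seeded knots are assumptions made for illustration only.

```python
import random

def split_by_smallest_volume(pairs, fraction=0.25):
    """Train on the `fraction` of (knot, volume) pairs with the smallest volume."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    n_train = int(fraction * len(ordered))
    return ordered[:n_train], ordered[n_train:]

def seed_with_high_volume(train, rest, n_extra, seed=0):
    """Move `n_extra` randomly chosen higher-volume examples into the training set."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rest)), n_extra))
    moved = [rest[i] for i in sorted(picked)]
    kept = [rest[i] for i in range(len(rest)) if i not in picked]
    return train + moved, kept

# Toy data: eight hypothetical knots labeled k0..k7 with made-up volumes.
toy = [("k%d" % i, float(v)) for i, v in enumerate([5, 1, 4, 2, 8, 7, 3, 6])]
biased_train, remainder = split_by_smallest_volume(toy)
seeded_train, seeded_test = seed_with_high_volume(biased_train, remainder, n_extra=2)
print(biased_train, len(seeded_train), len(seeded_test))
```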

Discussion
We have shown that a relationship between the (uncolored) Jones polynomial J_K(q) and the hyperbolic volume Vol(K) similar in spirit to the volume conjecture Eq. (1.1) can be learned quickly and robustly using a deep neural network with only two hidden layers. We now comment on some implications of our findings for knot theory and theoretical physics as well as potential directions for future work. Perhaps the most obvious question is whether there is really a not-so-complicated function A which exactly computes Vol(K) from J_K(q) with small corrections coming from other knot invariants. There is some evidence suggesting that underlying the relationship between J_K(q) and Vol(K) is the theory of Khovanov homology [14]. Recent work [22] shows that Hodge numbers of complete intersection Calabi-Yau threefolds can be computed by neural network classifiers and support vector machines in polynomial time, offering a considerable simplification over traditional Gröbner basis methods, which are by comparison doubly exponential in time. The Hodge numbers are dimensions of cohomology groups. The existence of an underlying homology or cohomology theory could be a crucial aspect to machine learning this class of problems.
In theoretical physics, colored Jones polynomials appear as expectation values of Wilson loop operators in Chern-Simons theory [1]. The volume conjecture has an interpretation [11] in this context as a relationship between a double scaling limit of SU(2) and the weak coupling limit of SL(2, C) Chern-Simons theory. In this report we demonstrate a potential connection between the strong coupling limit of SU(2) and the weak coupling limit of SL(2, C) Chern-Simons theory. Can other relationships between coupling regimes of topological quantum field theories be found using these neural network techniques to analyze expectation values? The intimate association between knot invariants and Gromov-Witten invariants [23,24] indicates that new insights about topological strings can also be gained by adapting machine learning techniques. It might also be interesting to apply machine learning techniques to investigate the quantum entanglement structure of links studied in [25,26]. Recently, [27][28][29][30] have pioneered investigations of the string landscape with machine learning techniques. Exploring the mathematics landscape in a similar spirit, we expect that the strategy we employ of analyzing correlations between properties of basic objects can suggest new relationships of an approximate form.
Simons Collaboration, and the DoE contract FG02-05ER-41367. All data used in this analysis is publicly available on the Knot Atlas [17] and SnapPy [18] websites.

A Overview of knot invariants
In this section, we give a brief overview of the knot invariants of direct relevance to this work, namely the Jones polynomial and the hyperbolic volume (and also to some extent the Khovanov homology). All of these are topological invariants of a knot in the sense that they do not depend on a specific two-dimensional drawing of the knot, but depend only on its topology. Let us begin with the Jones polynomial. This is defined using the Kauffman bracket ⟨K⟩, where K is the knot in question. The Kauffman bracket satisfies three conditions: (1) ⟨∅⟩ = 1, (2) ⟨○ ⊔ K⟩ = −(A^2 + B^2)⟨K⟩ where ○ is the unknot, A = q^{1/4} and B = q^{−1/4}, and (3) the smoothing relation shown in Figure 5. These rules allow us to uniquely associate a Laurent polynomial in q to every smoothing of the knot, and the sum of all these terms (i.e., over all the smoothings) is the Kauffman bracket (see [31] for details). The Jones polynomial is then equal to the Kauffman bracket up to an overall normalization constant that depends on w(K), the writhe of K, which is the number of overhand crossings minus the number of underhand crossings. It was famously shown by Witten [1] that the Jones polynomial of a knot K can also be thought of as the expectation value of a Wilson loop operator along K in SU(2) Chern-Simons theory. Since Chern-Simons theory is a (three-dimensional) topological quantum field theory, this gives a manifestly three-dimensional perspective for why the Jones polynomial is a topological invariant of the knot. Interestingly, the Jones polynomial also turns out to be a polynomial (in powers of q) with integer coefficients. This fact was later explained by Khovanov homology. Very briefly, the Khovanov homology can be thought of as a categorification of the Jones polynomial.
In Khovanov homology, one defines a Khovanov bracket in analogy with the Kauffman bracket, but we associate a tensor power of a graded vector space with every smoothing of the knot. By taking certain direct sums of these vector spaces and defining a suitable differential operator between them, we build a chain complex. It can then be shown that the Jones polynomial is the graded Euler characteristic of this complex, and thus the coefficients of the Jones polynomial are the dimensions of the vector spaces which appear in the chain complex. For more details, see [15,32].
The other knot invariant which is central in this work is the hyperbolic volume of a knot. For any knot K in S^3, the knot complement is defined as the manifold M_K = S^3 − K. More precisely, we remove a tubular neighborhood of the knot from S^3. Knots for which the knot complement admits a complete hyperbolic structure are called hyperbolic knots. For such a knot K, the complete hyperbolic structure on the knot complement M_K is unique, and the corresponding volume of M_K is called the hyperbolic volume Vol(K) of the knot. The standard way to compute the hyperbolic volume (following [8]) is to find a tetrahedral decomposition of the knot complement. Each tetrahedron can then be embedded in hyperbolic space, up to the specification of one complex number, often called the shape parameter of the tetrahedron. Requiring that all the tetrahedra in the knot complement fit together without any curvature singularities gives a set of algebraic constraints on the shape parameters, which can then be solved to obtain the shape parameters, and thus the desired hyperbolic structure. The volume of the knot complement is then the sum of the volumes of the individual tetrahedra.
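As a worked instance of summing tetrahedron volumes, the following Python sketch reproduces the figure-eight value quoted in the caption of Figure 1. It relies on two standard facts that are not stated in this text and are used here as assumptions: the volume of an ideal hyperbolic tetrahedron with dihedral angles α, β, γ (summing to π) is Λ(α) + Λ(β) + Λ(γ), where Λ is the Lobachevsky function, and the figure-eight knot complement decomposes into two regular ideal tetrahedra (all angles π/3). The truncated series for Λ and the function names are illustrative choices.

```python
import math

def lobachevsky(theta, terms=200_000):
    """Lobachevsky function via the truncated series (1/2) * sum sin(2 n theta) / n^2."""
    return 0.5 * sum(math.sin(2 * n * theta) / (n * n) for n in range(1, terms + 1))

def ideal_tetrahedron_volume(alpha, beta, gamma):
    """Volume of an ideal tetrahedron with dihedral angles alpha + beta + gamma = pi."""
    return lobachevsky(alpha) + lobachevsky(beta) + lobachevsky(gamma)

# Assumed decomposition: two regular ideal tetrahedra for the figure-eight knot.
regular = ideal_tetrahedron_volume(math.pi / 3, math.pi / 3, math.pi / 3)
figure_eight_volume = 2 * regular
print(round(figure_eight_volume, 5))
```

In general the dihedral angles are determined by the shape parameters obtained from the gluing constraints; the regular case is simply the one where those constraints are solved by symmetry.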
The Jones polynomial by itself is not sufficient to identify a knot uniquely. For example, the knots 4_1 (the figure-eight knot) and K11n19 have the same Jones polynomials but different volumes (the converse can also occur). There are 38,679 unique Jones polynomials in our dataset.

B Neural networks

B.1 Generalities
Our aim is to construct a function f_θ which approximates the relation
\[ A : J_K(q) \mapsto \mathrm{Vol}(K) \tag{B.1} \]
as closely as possible. We use a deep neural network to achieve this. A neural network f_θ is a (generally nonlinear) map from an input data vector v_in ∈ D to an output data vector v_out ∈ S, where θ labels the internal parameters which the map involves. In our case, the input vectors are the Jones polynomials of the knots in our database, while the outputs are their corresponding volumes (so S = R). We divide the input vectors D into the training set T and its complement T^c. Given the relation A : T → S on the training set, the idea is to tune the parameters θ in such a way that f_θ reproduces A on the training dataset as closely as possible. This is typically accomplished by picking some loss function h(θ), such as
\[ h(\theta) = \sum_i \left\| f_\theta\big(v^{(i)}_{\rm in}\big) - A\big(v^{(i)}_{\rm in}\big) \right\|^2, \]
where the sum is over v^{(i)}_in ∈ T, and then minimizing h(θ) in the space of the parameters to find the point in parameter space at which the loss function is minimized. Having done so, we then apply the function f_θ to the set T^c (which is so far unseen by the neural network) to test how well it approximates A on it; this ensures that f_θ is not trivially overfitting the data. It is known that neural networks of appropriate size can approximate any function [16].
Figure 6: An example of a two hidden layer fully connected neural network architecture. Each hidden layer is shorthand for a matrix multiplication followed by a bias vector addition followed by an element-wise activation function; in our network, we use the logistic sigmoid function. The final layer simply sums the components of the second hidden layer's output. Our network f_θ takes an input vector of size 17, has two 100 neuron hidden layers, and a final summation output layer.
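The idea of choosing a squared-error loss and minimizing it over the parameters can be illustrated with a deliberately tiny example. The Python sketch below fits a one-parameter-pair linear model by stochastic gradient descent; the linear model is a stand-in chosen for brevity and is not the network used in this work, and the data, learning rate, and function names are all assumptions made for the illustration.

```python
import random

def squared_error_loss(params, data):
    """h(theta) = sum over the data of (f_theta(x) - y)^2 for f_theta(x) = w*x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data)

def sgd_fit(data, lr=0.05, epochs=400, seed=0):
    """Stochastic gradient descent: update (w, b) after each single example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            err = (w * x + b) - y
            w -= lr * 2 * err * x   # d/dw of (w*x + b - y)^2
            b -= lr * 2 * err       # d/db of (w*x + b - y)^2
    return w, b

# Made-up noiseless data generated from y = 2x + 1.
toy_data = [(x, 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
initial_loss = squared_error_loss((0.0, 0.0), toy_data)
fitted = sgd_fit(toy_data)
final_loss = squared_error_loss(fitted, toy_data)
print(initial_loss, final_loss, fitted)
```

For a multi-layer network the same update rule applies, with the gradients supplied by backpropagation rather than written out by hand.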
Several interesting architectures of neural networks have been studied, but a simple architecture which will suffice for our purposes is the fully connected network (see Figure 6). In this architecture, the network is composed of hidden layers which perform matrix multiplication and bias vector addition followed by element-wise application of an activation function σ. The network can thus be schematically written as
\[ f_\theta(v_{\rm in}) = \sigma\Big( L^n_\theta\big( \cdots \sigma\big( L^2_\theta\big( \sigma\big( L^1_\theta(v_{\rm in}) \big) \big) \big) \cdots \big) \Big), \qquad L^m_\theta(v) = W^m_\theta \cdot v + b^m_\theta, \tag{B.2} \]
where W^m_θ and b^m_θ are the weight matrices and bias vectors (respectively) of the m-th hidden layer, and
\[ \sigma(v)_a = \sigma(v_a), \tag{B.3} \]
with a being the vector index on the appropriate internal state. As stated previously, the idea is then to minimize the loss function on the training data by appropriately tuning the parameters W^m_θ and b^m_θ. This is achieved by using the backpropagation algorithm, which computes gradients of the loss function for each training data point and adjusts the parameters layer by layer in the network.

B.2 Details of the network
As mentioned in the main text, the particular network we used in order to study the hyperbolic volume is of the form
\[ f_\theta(J_K) = \sum_a \sigma\!\left( W^2_\theta \cdot \sigma\!\left( W^1_\theta \cdot J_K + b^1_\theta \right) + b^2_\theta \right)_a, \tag{B.4} \]
where J_K is a vector representation of the Jones polynomial J_K(q). As mentioned before, the specific values of the internal parameters W^j_θ and b^j_θ were found by training the network on a portion of the dataset, which can be implemented in Mathematica by the command NetTrain. The loss function is simply the mean squared error between the predicted volume and the true volume of the training example. We then test the accuracy of our network by applying it to the unseen knots in T^c.
We are being conservative in estimating the error of our trained network. This is because the dataset contains several instances of knots with the same Jones polynomials but different volumes, i.e., the association A we seek to teach the network to reproduce is not a function. Therefore, it may be that the network is taught one of the values of the volumes for such a Jones polynomial and then tested on a different value. We can repeat our analysis by keeping only the set of unique Jones polynomials within the dataset; when a Jones polynomial corresponds to several knots with different volumes, we select the volume of a randomly selected knot among these. In performing this experiment, we find that the relative error is unchanged. This could imply that the volume has the schematic form
\[ v_i = f(J_i) + \text{small corrections which depend on other invariants}, \tag{B.8} \]
and the success of the network is due to it learning f very well. An examination of knots with the same Jones polynomial shows that the volumes tend to cluster; they differ on average by 3.1%. This is consistent with Eq. (B.8) above. The deviation is larger for knots with smaller volumes, which is also consistent with the spread in the network predictions for small volumes in Figure 2c.
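The deduplication step described above (keep each distinct Jones polynomial once, with the volume of a randomly selected knot among those sharing it) can be sketched in Python as follows. The function name, the tuple encoding of the polynomials, and the sample records are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def unique_jones_dataset(records, seed=0):
    """Keep one (jones, volume) pair per distinct Jones polynomial.

    When several knots share a polynomial, the volume of one of them is
    picked at random, as described in the text.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for jones, volume in records:
        groups[tuple(jones)].append(volume)
    return [(jones, rng.choice(volumes)) for jones, volumes in groups.items()]

# Made-up records: two hypothetical knots share the first encoded polynomial.
records = [((1, -1, 1), 2.0), ((1, -1, 1), 2.1), ((1, 0, 1), 3.0)]
deduplicated = unique_jones_dataset(records)
print(deduplicated)
```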
Instead of listing out the weight matrices W^i_θ and the biases b^i_θ, which is difficult and unilluminating because of their size, we will show some of their properties in Figures 7, 8 and 9 below. Notably, from the small error bars on the spectra of the weight matrices, it is evident that these are not random matrices, but are certain specific matrices which are central to the relation between the hyperbolic volume and the Jones polynomial. On the other hand, the large error bars on the biases suggest that they might not play a crucial role in the network. The plots here correspond to training on 30% of the total dataset, but similar plots for other fractions have the same profile. In particular, the largest eigenvalue by magnitude in Figure 8 is essentially unchanged. We also trained f_θ using a rectified linear unit activation function, but this network performed noticeably worse than the logistic sigmoid network. It would be interesting to understand why this occurred, since the rectified linear unit has become relatively standard in the wider machine learning community. It may be that our learning problem does not suffer from the vanishing gradient problem which rectified linear units resolve.

C Other experiments
There are many knot invariants known, and we tried to find other such relationships using similar techniques to those discussed in this paper. These experiments had varying success. We failed to reproduce the hyperbolic volume when training our network on the braid words, which capture all information about the knot in a compressed form. We also failed to predict the Chern-Simons invariant (which is the imaginary part of the integral of the Chern-Simons three-form on the knot complement) from the Jones polynomial. We succeeded up to 10% error in reproducing the minimum and maximum degrees of the Jones polynomial from the braid word. We attempted to learn the volume-ish theorem of [13] but the results were inconclusive. We also attempted to learn a compressed form of the Jones polynomial from pictures of the knots using a convolutional neural network, but this did not work. We did not have enough data to attempt any learning on the A-polynomial of [33], but it may be worth pursuing because it is more obviously connected to the hyperbolic volume. The relationship between the Jones polynomial and volume is particularly striking in light of these failures. Further investigation along these lines is warranted (see for instance [34] for similar ideas).

Figure 1: From left to right: the unknot, trefoil knot, figure-eight knot, and cinquefoil knot. When strands of the knot cross, the diagram keeps track of which strand is on top and which strand is on the bottom, so the diagram captures all information about the 3-dimensional embedding. For the figure-eight knot, the Jones polynomial in our conventions is J_figure-eight(q) = q^{−2} − q^{−1} + 1 − q + q^2, and the hyperbolic volume is approximately 2.02988. Image taken from Wikimedia Commons.

Figure 2: Scatterplots of predicted volume versus actual volume for various prediction methods with dashed black lines denoting perfect prediction. (a) Prediction for 26,259 alternating knots using the volume-ish theorem. The predicted volumes were obtained by selecting a random real number in the allowed range prescribed by [13]. (b) Prediction for 37,667 knots (the subset of knots for which the Khovanov homology rank was readily available). The predicted volumes were obtained by fitting a linear function to the set of points defined by (log(rank(H_K) − 1), Vol(K)) and then applying that function to log(rank(H_K) − 1). (c) Prediction for all 59,928 knots using the neural network f_θ. The predicted volumes were obtained by training f_θ on 30% of the data and then applying f_θ to all of the Jones polynomials.

Figure 3: The neural network quickly converges to optimal performance while training on a given fraction of the total dataset of 59,928 knots. Data points and the associated error bars are computed from averaging over 20 trials, each of which is trained on a randomly selected sample of the dataset.

Figure 4: Average loss versus number of training rounds for both training (orange curve) and test (blue curve) datasets. The training set was 30% of the data, chosen at random, and the test set was the complement. The loss function can be viewed as a proxy for the error rate, and in our setup the two are proportional.

Figure 5: The smoothing relation in the definition of the Kauffman bracket. Each of the terms appearing on the right hand side refers to a choice of smoothing of the crossing on the left hand side.

Figure 7: (Left) The eigenvalues of the matrix (W^1_θ)^T W^1_θ, where W^1_θ is the weight matrix of the first layer. The spectrum was averaged over 10 runs, with the error bars marking the standard deviations. (Right) The biases b^1_θ of the first layer averaged over 10 runs, with the error bars marking the standard deviations.

Figure 8: (Left) The absolute values of the eigenvalues of the weight matrix W^2_θ of the second layer. The spectrum was averaged over 10 runs, with the error bars marking the standard deviations. (Right) The phases of the eigenvalues of the weight matrix W^2_θ. Note that the largest magnitude eigenvalue is always real and negative. This may be a consequence of a generalized version of the Perron-Frobenius theorem.

Figure 9: The biases b^2_θ of the second layer averaged over 10 runs, with the error bars marking the standard deviations.