Unsupervised identification of topological phase transitions using predictive models

Machine-learning driven models have proven to be powerful tools for the identification of phases of matter. In particular, unsupervised methods hold the promise to help discover new phases of matter without the need for any prior theoretical knowledge. While for phases characterized by a broken symmetry, the use of unsupervised methods has proven to be successful, topological phases without a local order parameter seem to be much harder to identify without supervision. Here, we use an unsupervised approach to identify boundaries of the topological phases. We train artificial neural nets to relate configurational data or measurement outcomes to quantities like temperature or tuning parameters in the Hamiltonian. The accuracy of these predictive models can then serve as an indicator for phase transitions. We successfully illustrate this approach on both the classical Ising gauge theory as well as on the quantum ground state of a generalized toric code.


Introduction
Identifying phase transitions is one of the key questions in theoretical and experimental condensed matter physics alike. For the experimental characterization of thermodynamic phase transitions, there exists an extensive array of possible tools, ranging from system-specific probes, like the study of the conductivity in an electronic system, to very generic ones, like the specific heat. The latter is particularly appealing as it does not assume any prior knowledge: for example, structural transitions, the onset of magnetism, or the transition to superconductivity all show up in this generic probe. The study of the specific heat is also a standard tool for the theoretician, especially given its generic power.
For quantum phase transitions [1], an equally generic tool as the specific heat for thermal transitions is the fidelity susceptibility. One investigates the derivative $\partial_{\delta\beta}\langle\psi(\beta)|\psi(\beta+\delta\beta)\rangle$ [2] of the overlap of two infinitesimally separated ground states $|\psi(\beta)\rangle$ as a function of some tuning parameter β. While this probe is in principle very powerful [3][4][5][6], it is typically hard to evaluate as one rarely has access to the full wave function, at least not for most approximate numerical techniques and especially not in experimental studies. This raises the question whether one can replace the fidelity susceptibility with a tool that is equally unbiased, generic, and accessible to typical numerical and experimental techniques. In a recent publication, some of the present authors introduced such an algorithmic method for classical systems with an order parameter signaling an (arbitrary) symmetry breaking [7]. Here we demonstrate that one can successfully generalize this method to problems without a local order parameter, i.e. systems with a topological character. Moreover, we show that one can straightforwardly extend [7] to the quantum realm.
The method is based on the analysis of the accuracy of a predictive model. The central idea is to distill a predictive model that relates input data from numerical or experimental studies to the output in the form of a known tuning parameter such as the temperature or a parameter β in the Hamiltonian. Typically, one infers this predictive model via machine-learning techniques in the form of neural nets. The basic idea, however, is independent of the specific inference technique. In the next step, the accuracy of the predictive model is analyzed via the comparison of the predicted to the known value of the tuning parameter β. In particular, we show the derivative of the prediction accuracy with respect to the tuning parameter to be an equally sensitive indicator of a phase transition as the fidelity susceptibility.
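To make the idea concrete, consider a hypothetical toy predictor (not the paper's actual pipeline): within each phase the predictions cluster toward a phase-averaged value, so β_pred(β_label) is sigmoid-like and its finite-difference derivative peaks at the transition. A minimal numpy sketch, with all numbers chosen purely for illustration:

```python
import numpy as np

# Hypothetical toy: within each phase the predictor collapses toward a
# phase-averaged value, so beta_pred(beta_label) is sigmoid-like.
# Transition placed at beta* = 2.0; all numbers are illustrative.
beta_label = np.linspace(0.0, 5.0, 501)
beta_star, width = 2.0, 0.2
mu_low, mu_high = 1.0, 3.5  # toy phase-averaged predictions
beta_pred = mu_low + (mu_high - mu_low) / (
    1.0 + np.exp(-(beta_label - beta_star) / width)
)

# Indicator: finite-difference derivative of the prediction with respect
# to the tuning parameter, evaluated on the discrete beta grid.
indicator = np.gradient(beta_pred, beta_label)

# The derivative is maximal at the transition (close to beta* = 2.0).
print(beta_label[np.argmax(indicator)])
```

The peak position of the derivative recovers the built-in crossover value, which is the mechanism exploited throughout this work.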
To illustrate our generalization of the methods of [7], we investigate two generic models hosting interesting thermodynamic phases without a local order parameter. First, we investigate the finite-temperature cross-over in Wegner's Ising gauge theory (IGT) [8][9][10] to show that we can analyze an interesting classical problem without a local order parameter. Second, we broaden the scope by taking the step from the IGT to a generalized toric code problem [11,12] showcasing the applicability of the method to quantum problems.

The Ising gauge theory
Wegner's Ising gauge theory (IGT) is a spin model defined on an N×N square lattice with spins placed on the lattice bonds [8][9][10][13]. It is described by the Hamiltonian

$H = -J \sum_p \prod_{i\in p} \sigma_i^z$, (1)

where J is a coupling constant, p refers to plaquettes on the lattice (see figure 1), and $\sigma_i^z$ is the Pauli matrix describing a single spin-1/2. Periodic boundary conditions are imposed. The ground state of this Hamiltonian is a highly degenerate manifold: an arbitrary superposition of all states that meet the condition that the product of spins along each plaquette is equal to 1. At a finite temperature T>0 the local constraints are violated by thermal excitations (see figure 1). The IGT does not have a finite-temperature phase transition. However, for finite system sizes one can find a crossover temperature, T*=1/(kβ*), defined by the appearance of one plaquette with $\prod_{i\in p}\sigma_i^z = -1$ [12,14]. Matters are further complicated by the fact that the ground-state manifold cannot be characterized by a local order parameter [15,10] owing to a local gauge degree of freedom. We come back to this point below.
To check whether a given spin state is in the IGT ground-state manifold, one has to verify that the condition $\prod_{i\in p}\sigma_i^z = 1$ is met for all plaquettes in the lattice. Equivalently, one can use the duality map to analyze the phase transition: we connect the edges of the lattice that contain spins with the same orientation and form loops. The IGT constrained phase then has the property that all these loops are closed. Whenever the constraint is violated it results in an open loop [10,16,13], see figure 1.
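The plaquette check is straightforward to implement. The sketch below assumes a common (but here hypothetical) bond indexing on a periodic N×N lattice: `spins[0, i, j]` is the horizontal and `spins[1, i, j]` the vertical bond attached to vertex (i, j).

```python
import numpy as np

def plaquette_products(spins):
    """Product of the four bond spins (+/-1) around every plaquette.

    spins has shape (2, N, N): spins[0, i, j] is the horizontal bond and
    spins[1, i, j] the vertical bond at vertex (i, j), with periodic
    boundary conditions (an assumed indexing convention).
    """
    h, v = spins[0], spins[1]
    # Plaquette (i, j): bottom/top horizontal and left/right vertical bonds.
    return h * np.roll(h, -1, axis=0) * v * np.roll(v, -1, axis=1)

def in_ground_state_manifold(spins):
    """True iff every plaquette product equals +1 (all loops closed)."""
    return bool(np.all(plaquette_products(spins) == 1))

# The fully polarized state satisfies all plaquette constraints; flipping a
# single bond opens a loop and violates the two plaquettes sharing it.
N = 4
config = np.ones((2, N, N), dtype=int)
print(in_ground_state_manifold(config))               # True
config[0, 1, 2] = -1
print(in_ground_state_manifold(config))               # False
print(int(np.sum(plaquette_products(config) == -1)))  # 2
```

A single violated bond always shows up as exactly two defective plaquettes, which is the open-loop signature mentioned above.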
Distinguishing high and low temperature states of the model (1) is a well studied test case for machine learning recognition of phases of matter [14]. As one can see from figure 1, the IGT constitutes an interesting example where the phases are hard to distinguish visually without being a priori familiar with the local restrictions or the dual map. While a supervised approach is immediately successful at distinguishing the high- and low-temperature phases [14], unsupervised approaches did not succeed without an explicit recipe for what type of restriction to look at. There has been significant progress in this direction, but a fully general approach is yet to be found [17][18][19][20][21][22][23][24]. While methods like principal component analysis, clustering, and variational auto-encoders have proven successful at determining the phase transitions in spin models possessing an order parameter [25], systems without order parameters still represent a challenge.
Here we show how the method introduced by Schäfer et al [7] can be generalized to systems without a local order parameter. One first pre-trains a neural network to relate a spin configuration $\{S\}_{\beta_{\rm label}}$ to the (inverse) temperature $\beta_{\rm label}$ at which the configuration was sampled. After this initial training, the performance of the estimator is assessed with respect to the true value via the derivative

$\partial\beta_{\rm pred}/\partial\beta_{\rm label}$, (2)

which is maximal where the estimator performs worst. In other words, a local maximum in $\partial\beta_{\rm pred}/\partial\beta_{\rm label}$ as a function of $\beta_{\rm label}$ indicates a phase transition or crossover temperature $\beta^*_{\rm label}$. While this method does not in principle rely on a local order parameter, it has been shown that the network picks up on the magnetization pattern [7]. It was therefore unclear whether one can generalize this strategy to the current problem. Here we show that this approach is valid even for phases of matter that contain neither an order parameter nor a finite-temperature phase transition. Our approach differs from prototypical unsupervised machine learning techniques, such as principal component analysis, t-distributed stochastic neighbor embedding (t-SNE), or k-means clustering, since a fully supervised subroutine, namely a regression on the labeled system parameters, is employed. However, we intentionally refer to the approach as an unsupervised learning scheme, as the method ultimately aims to infer the phase diagram of the physical system and not its parameters; the algorithm thereby requires no prior knowledge of the phase labels, the number of different phases, or the character of the phase transition. In fact, the derivative (2) generically shows a stronger signal when the parameters in the supervised part of the protocol are not learned to high precision.
We create sample configurations of the IGT model and label them with β=1/(kT). We train a convolutional neural network to predict β given an IGT configuration as input. Our neural network consists of 2 convolutional and 2 dense layers and was trained on 2×10^5 configurations for 100 different values of β (for details see appendix A).
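As a self-contained stand-in for this supervised subroutine (pure numpy, a one-hidden-layer regressor on synthetic data instead of the 2-convolutional + 2-dense network on IGT samples; all names and sizes are illustrative), the regression setup looks as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for IGT configurations: feature vectors whose mean
# statistic varies smoothly with beta (the paper uses Monte Carlo samples).
betas = rng.uniform(0.0, 5.0, size=512)
X = rng.normal(loc=np.tanh(betas)[:, None], scale=0.3, size=(512, 16))
y = betas

# One-hidden-layer regression network trained with plain gradient descent.
W1 = rng.normal(0, 0.1, (16, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1));  b2 = np.zeros(1)
lr = 1e-2

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)      # ReLU hidden layer
    return h, (h @ W2 + b2).ravel()

_, pred0 = forward(X)
loss0 = np.mean((pred0 - y) ** 2)         # initial mean-squared error

for _ in range(500):
    h, pred = forward(X)
    err = (pred - y)[:, None] / len(y)    # d(MSE)/d(pred), factor 2 in lr
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = (err @ W2.T) * (h > 0)
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss = np.mean((pred - y) ** 2)
print(loss0, loss)  # training reduces the regression loss
```

Any regression architecture minimizing the same loss can be substituted here; the indicator analysis below only consumes the resulting β_pred(β_label) curve.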
In figure 2 we show how the difference between the true and predicted inverse temperatures, β_pred−β_label, behaves as a function of the true β_label for seven different system sizes N=4, 8, 12, 16, 20, 24, 28 (the total number of spins is 2N²). We see that the behavior of the prediction is not uniform for all inputs and, in fact, we observe that for all system sizes there exists a finite β_label above which the network has difficulties identifying the correct β_label. In figure 3 we show $\partial\beta_{\rm pred}/\partial\beta_{\rm label}$, which we evaluated as $(\beta_{\rm pred}(\beta_{i+1}^{\rm label})-\beta_{\rm pred}(\beta_{i}^{\rm label}))/(\beta_{i+1}^{\rm label}-\beta_{i}^{\rm label})$ on the discrete grid $\beta_i^{\rm label}$ at which we sampled. For all system sizes we observe the presence of a peak that indicates the position of the largest change in the difference between true and predicted β. The peak becomes less recognizable with increasing system size, which is consistent with the fact that the critical β* keeps increasing with growing system size and in the infinite-system limit the crossover behavior completely disappears. Figure 2. We show the difference between the network prediction β_pred and the assigned label β_label, β_pred−β_label, as a function of β_label for system sizes N=4 to N=28. The dashed lines denote the position of the crossover inverse temperature β* as determined by the density of states method.
The neural network predicts a continuous parameter (the inverse temperature) for our model and we observe a change of behavior at some critical value. We show in figure 4 the determined crossover temperature β* as a function of system size. For the system sizes we were able to test numerically we recover the logarithmic scaling expected for the crossover temperature [14,26].
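The expected scaling β* ∝ log N can be checked with a simple least-squares fit. A minimal sketch on hypothetical crossover values (the actual β* are read off from the peak positions in figure 3):

```python
import numpy as np

# Hypothetical crossover values for illustration only; the slope 0.5 and
# offset 1.0 are invented, with small perturbations standing in for noise.
N = np.array([4, 8, 12, 16, 20, 24, 28], dtype=float)
beta_star = 0.5 * np.log(N) + 1.0 + np.array(
    [0.02, -0.01, 0.03, 0.0, -0.02, 0.01, 0.0]
)

# Fit beta* = a * log(N) + b and report the slope a (close to 0.5 here).
a, b = np.polyfit(np.log(N), beta_star, 1)
print(a)
```

A slope consistent across system sizes, with residuals showing no trend, is what distinguishes logarithmic scaling from a genuine finite-temperature transition.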
To independently confirm the neural network predictions, we can analyze whether we can identify the physics of what the network is learning and reproduce its predictions with another physical model. From the training set, we can construct a density of states distribution, ϱ. In particular, the density of states can be written as a function of energy, E, and inverse temperature, β:

$\varrho(E,\beta) = \frac{1}{M}\sum_{n=1}^{M} \delta_{E,E_n}\,\delta_{\beta,\beta_n}$. (3)

Here, δ_{a,b} is the Kronecker delta (δ_{a,b}=1 for a=b and δ_{a,b}=0 for a≠b), E_n (β_n) is the energy (label) of the nth configuration in the training set, and M is the number of configurations in the training set. We show the distribution ϱ obtained for the system size N=8 (128 spins) in figure 5.
We use the distribution (3) to calculate the most likely β ≕ β_pred for each configuration at a given energy, which immediately allows us to evaluate the relation between the assigned β and β_pred. Using the density of states we are able to reproduce the behavior in figure 3 (see appendix A). We show the detailed calculation and the dependencies of the predicted β_pred and its derivative $\partial\beta_{\rm pred}/\partial\beta_{\rm label}$ as a function of the true β_label in appendix A. This gives us numerical evidence that the network is learning the density of states distribution shown in figure 5. We identify the same logarithmic scaling (with system size) of the critical β* predicted from the density of states (shown in blue in figure 4) as for the predictions obtained from the neural net model. Figure 4. Positions of the critical β* as a function of the system size N. We show the scaling obtained from the unsupervised learning method and the scaling obtained from the density of states in blue and orange, respectively. The shaded areas represent the error bars. Error bars correspond to the standard deviation from the mean β* evaluated by averaging over the β* predicted by five separately trained neural nets.
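A minimal numpy sketch of this density-of-states predictor; the synthetic (E_n, β_n) pairs and the energy model below are hypothetical stand-ins for the IGT training set, and the lookup implements the energy-conditioned average β of appendix A:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: discrete labels beta_n and integer energies E_n
# whose mean decreases with beta (toy stand-in for IGT Monte Carlo data).
betas = np.repeat(np.linspace(0.0, 5.0, 51), 200)
energies = -np.rint(
    10 * np.tanh(betas) + rng.normal(0, 1, betas.size)
).astype(int)

def beta_av(E, energies, betas):
    """Average beta over all training configurations with energy E."""
    mask = energies == E
    return betas[mask].mean() if mask.any() else np.nan

# Predict beta for a configuration by looking up its (discrete) energy:
# configurations near the saturated energy E = -10 map to large beta.
pred = beta_av(-10, energies, betas)
print(pred)
```

Because both E and β are discrete by construction, this lookup table is the entire "model"; comparing its β_pred(β_label) curve with the network's is what establishes that the net learns the density of states.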
Another unsupervised approach that has proven to be successful for both classical and quantum systems is the confusion scheme introduced in [18]. We compare both approaches in appendix C and show that the confusion scheme is not suitable for the example of IGT studied here.

The toric code and its generalizations
So far we have analyzed the performance of our method on the cross-over of a classical spin-1/2 model. When going to quantum models, two complications arise, related to the input and output of our predictive model. For classical systems, simple spin configurations are the natural input. For quantum systems, generically entanglement in the form of non-classically correlated configurations plays a key role. Consequently, the choice of training data needs to either reflect some prior knowledge of the system, or one has to sample over various classical projections of the entangled wave function. On the output side, one can either target a finite-temperature transition, or investigate a quantum phase transition at zero temperature. In the former, the output of the predictive model stays the same: β pred , the inverse temperature. For zero temperature transitions, one can still investigate a single-parameter family of Hamiltonians H(β). The obvious prediction task is then to reproduce the tuning parameter β, rather than the temperature.
We now turn to a concrete model of a quantum phase transition in a system without a local order parameter. The obvious generalization of (1) is the application of a transverse field [9,13,27],

$H_{\rm TF} = -J\sum_p \prod_{i\in p}\sigma_i^z - g\sum_i \sigma_i^x$. (4)

The model above is very well studied, has a confinement-deconfinement transition at a critical g*, and is a workhorse for the study of $\mathbb{Z}_2$ spin liquids. Instead of directly working with this simple model we go beyond (4) in two ways: (i) we restrict ourselves to a subset of gauge-invariant ground states by moving to the toric code [11]; (ii) we generalize the transverse field to allow for an exact solution. We detail both steps in the following.
The toric code [11] extends the IGT of equation (1) by elevating the generators of the gauge transformation, the vertex operators $A_s = \prod_{i\in s}\sigma_i^x$, to a term in the Hamiltonian,

$H_{\rm TC} = -J\sum_p B_p - J\sum_s A_s$, with $B_p = \prod_{i\in p}\sigma_i^z$. (6)

As a consequence, the ground states of the toric code correspond to the gauge-invariant ground states of $H_{\rm TF}$ [27]. For our numerical purposes below, we largely benefit from the exact solution of the above Hamiltonian: we can write one of the four (unnormalized) ground states as [28]

$|{\rm TC}\rangle \propto \sum_{h\in H} h\,|0\rangle_x$, (7)

Figure 5. Density of states distribution ϱ(β, E) of the training set as a function of inverse temperature β and energy E. The plot above has been generated for system size N=8.
where $|0\rangle_x$ is a reference state with all spins up in the σ^x basis. Then, applying products of Pauli z-matrices along the two non-contractible loops yields the other three orthogonal ground states. We can easily see that the ground states are indeed gauge invariant by applying gauge transformations, obtaining $A_s|{\rm TC}\rangle = |{\rm TC}\rangle$. Applying a transverse field to a spin model typically excludes an exact solution. The present case is no exception. However, in a recent publication, Chamon and Castelnovo introduced a generalization of the toric code [12,26,29,30], a family of Hamiltonians (8) with an exactly known ground state, where λ_i describes the particular configuration of added background fields and β>0 characterizes their amplitude. A transition to a topologically trivial phase occurs at a critical value of the field strength β_c. The field configuration λ_i influences the critical value β_c. A detailed analysis of this phase transition has been provided in [30].
To finish our discussion of these exactly solvable models we write the ground state of (8) as

$|{\rm GS}(\beta)\rangle = \frac{1}{\sqrt{Z}}\sum_{h\in H} e^{\frac{\beta}{2}\sum_i \lambda_i \sigma_i^x(h)}\, h\,|0\rangle_x$. (9)

This ground state is four-fold degenerate when periodic boundary conditions are considered [28]. We denote with H the abelian group whose elements h are all possible operations defined by the action of products of plaquette operators on an initial (reference) spin configuration $|0\rangle_x$. By $\sigma_i^x(h)$ we denote the eigenvalue of the operator $\sigma_i^x$ on the eigenstate $h|0\rangle_x$. As a consequence, the term $\sigma_i^x(h)$ can take the values ±1. The normalization factor Z corresponds to the partition function for this ground state and is given by

$Z = \sum_{h\in H} e^{\beta\sum_i \lambda_i \sigma_i^x(h)}$. (10)

With these considerations we are now in the position to show that the analysis of the predictive model can point out the topological phase transition of this quantum model as well. Unlike in the IGT, discussed in the previous section, the highly entangled ground states of the modified toric code model (8) are not fully characterized by a spin configuration alone. On the other hand, equation (9) provides a closed analytical form for the ground states of the family of Hamiltonians (8). In addition, these ground states are only fourfold degenerate in the topological phase. We take advantage of the knowledge of the modified toric code ground states and show this to be sufficient for identification of the phase transition from the predictive model.

Projection onto spin configurations
We consider a projection of the ground states of the Hamiltonian (8) onto the σ x and σ z bases. These two types of projections correspond to experimentally accessible measurements and we show that both allow us to detect the topological phase transition of the full quantum model. As for the IGT cross-over analyzed previously, we are yet again in the situation where we are able to input a configuration into the predictive model and ask it to predict a continuous parameter.
There are two crucial differences here: first, we are considering a zero temperature topological phase transition that is driven by the applied field strength β. The second difference lies in the behavior of the projected spin configurations in the two phases. In particular, we are able to draw parallels to phase transitions of classical spin models. As we elaborate below, choosing a basis to project on corresponds to mapping the phases of the quantum model to phases of a specific classical spin model.

The σ x -projection
Let us first consider the projection onto the σ^x basis. We notice that the ground state (9) represents a superposition of x-spin configurations $|S_h\rangle := h|0\rangle_x$ for all elements of the group H. All states $|S_h\rangle$ fulfill the so-called closed-loop condition for all values of β. In connection to the IGT, this corresponds to the condition of gauge invariance. More concretely, local constraints are imposed that the product of σ^x eigenvalues around a vertex is equal to one. The value of the field strength, β, influences the weight of a given spin configuration (see equation (9)). Therefore, the probability to obtain a particular configuration $|S_h\rangle$ after projection onto the σ^x basis is given by

$P(S_h) = \frac{1}{Z}\, e^{\beta \sum_i \lambda_i \sigma_i^x(h)}$. (11)

We can understand the physics of the σ^x-projected ground state by first considering limiting cases of the field strength β. When β→0, the ground state (9) corresponds to the ground state of the pure toric code Hamiltonian (6). Therefore, when projected onto the σ^x basis, all possible $|S_h\rangle$ are equally likely (since all $|S_h\rangle$ are weighted equally in the full eigenstate). When β→∞, on the other hand, all configurations but $|S\rangle = |0\rangle_x$ are exponentially suppressed and hence, the projected spin configurations are always ordered.
Thus, what used to be a topological phase transition of the full quantum state is now a transition from disordered spin configurations (β small) to an ordered spin configuration (β large, all spins up). We observe that, provided there is a finite β at which the transition between ordered and disordered configurations manifests itself, we obtain a phase transition that resembles the phase transition of the 2D Ising model. We show that indeed the 2D Ising model and its phase transition can be recovered by a simple change of variables, see appendix C.
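The σ^x-projected configurations can be sampled by a Metropolis walk over the group H, proposing single plaquette flips weighted by $e^{\beta\sum_i\lambda_i\sigma_i^x(h)}$. A minimal sketch, with uniform λ_i = 1 assumed purely for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

N, beta, steps = 8, 1.0, 20000
# sigma^x eigenvalues on the bonds of an N x N periodic lattice:
# s[0, i, j] horizontal, s[1, i, j] vertical bond at vertex (i, j).
s = np.ones((2, N, N), dtype=int)  # reference state |0>_x

for _ in range(steps):
    i, j = rng.integers(N, size=2)
    # Acting with the plaquette operator B_p flips the four sigma^x
    # eigenvalues on plaquette (i, j); this stays inside the orbit of H.
    bonds = [(0, i, j), (0, (i + 1) % N, j), (1, i, j), (1, i, (j + 1) % N)]
    delta = -2 * sum(s[b] for b in bonds)  # change in sum_i sigma_i^x(h)
    # Metropolis acceptance for the weight exp(beta * sum_i sigma_i^x(h)).
    if rng.random() < np.exp(min(0.0, beta * delta)):
        for b in bonds:
            s[b] *= -1

# At beta = 1 (deep in the ordered regime of this toy) the sampled
# configurations remain strongly polarized.
print(s.mean())
```

Because the walk only ever applies plaquette operators, the vertex (closed-loop) constraints stay satisfied throughout, exactly as for the states $|S_h\rangle$ above.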
Let us now explore the topological phase transition in the toric code model using the unsupervised learning method we introduced above. We train a neural network on the projected σ^x configurations labeled with the field strength β. We used a network consisting of two convolutional layers (100 filters, kernel sizes 3 and 2), one dense layer with 100 neurons, and one dropout layer with dropout rate 0.15. We train the neural network on 59 950 configurations containing 100 different values of β between 0 and 1. All the simulations were performed for the system size N=20. It was shown in [26] that the topological phase transition of the generalized toric code model can be determined from the behavior of the fidelity between two ground states with slightly varied field strengths (δβ→0),

$F(\beta,\delta\beta) = |\langle \Psi(\beta)|\Psi(\beta+\delta\beta)\rangle|$. (12)

In other words, we calculate the overlap of two ground-state wave functions with applied fields whose magnitudes are very close to each other. We can indeed observe a change in the behavior of the overlap in the neighborhood of the phase transition. The rate of this change is better analyzed by studying the derivative of the quantity in equation (12), the so-called fidelity susceptibility

$\chi_F(\beta) = \lim_{\delta\beta\to 0} \frac{-2\ln F(\beta,\delta\beta)}{\delta\beta^2}$. (13)

We observe in figure 7 that the dashed lines determined from the fidelity susceptibility calculation are in good agreement with the maxima of the peaks of the derivative $\partial\beta_{\rm pred}/\partial\beta_{\rm label}$ of the predictive model. We show details of the fidelity susceptibility calculation in appendix C.
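The fidelity is particularly convenient here because of the exact ground-state form. Assuming, as in equation (9), Gibbs-like amplitudes $e^{\frac{\beta}{2}\sum_i\lambda_i\sigma_i^x(h)}/\sqrt{Z(\beta)}$ over the orthonormal states $h|0\rangle_x$, the overlap collapses to a ratio of partition functions,

$\langle\Psi(\beta)|\Psi(\beta+\delta\beta)\rangle = \frac{1}{\sqrt{Z(\beta)\,Z(\beta+\delta\beta)}}\sum_{h\in H} e^{\frac{2\beta+\delta\beta}{2}\sum_i\lambda_i\sigma_i^x(h)} = \frac{Z(\beta+\delta\beta/2)}{\sqrt{Z(\beta)\,Z(\beta+\delta\beta)}}$,

and expanding $\ln Z$ to second order in δβ yields

$\chi_F(\beta) = \frac{1}{4}\,\partial_\beta^2 \ln Z(\beta) = \frac{1}{4}\,\mathrm{Var}_\beta\Big(\sum_i \lambda_i\sigma_i^x\Big)$,

i.e. a quarter of the variance of the field term under the projection weights, a quantity directly accessible to Monte Carlo sampling.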

The σ z -projection
We can ask whether a particular projection is necessary to determine the topological phase transition from the spin configurations alone. Let us consider measuring the ground state in the σ^z basis instead of σ^x. In order to simplify mathematical expressions let us, without loss of generality, choose a different state from the ground-state manifold, written as a superposition over the group G (equation (14)). Here, analogously to equation (9), G is the abelian group of possible products of vertex operators and $|0\rangle_z$ is the reference state. Note that we chose a different reference state: as opposed to equation (9), all spins of the reference state are aligned in an eigenstate of σ^z instead of σ^x. The normalization is denoted by Z_z and not elaborated on further here.
Let us again examine the limiting behavior in β if σ^z is measured on every spin of the state (14). If β→0 we obtain the exact toric code ground state. A projective measurement of σ^z on this ground state then results in a configuration $|S_g\rangle = g|0\rangle_z$; hence the closed-loop (plaquette) conditions $B_p|S_g\rangle = |S_g\rangle$ are fulfilled. Every configuration $g|0\rangle_z$ fulfilling these constraints is obtained with equal probability. We note here that the local plaquette constraints are in exact correspondence to the IGT local constraints fulfilled in the zero-temperature phase.
Applying the same logic as in the case of the σ^x projection, we can conclude that in the case β→∞ we arrive at a completely polarized state, where all spins are aligned in the x-direction. If we now project onto a σ^z eigenstate, the plaquette constraints are not preserved. In fact, any configuration in the σ^z basis is obtained with equal probability. Hence, we find that in the σ^z projection the phase transition arises from a quite different process than we observed before: for small β the system is in a state where the loop conditions are preserved, while for large β they are violated. While in the case of σ^x the phase transition simply changes the weights of some states from the set preserving the loop condition, in the case of the σ^z projection we transition from a state where all loop-preserving configurations are weighted equally to a phase where the loop constraints are completely violated. We can therefore draw parallels to the previously examined IGT transition at finite temperature. In particular, in both cases we observe phases that can be distinguished by checking for a violation of the local closed-loop constraints. However, there is a crucial difference between these two transitions. The IGT exhibits a finite-temperature crossover and the violation of local constraints is a result of thermal excitations. Here, we consider a quantum phase transition at zero temperature, where the local constraints are violated due to the interplay with added perturbations. In particular, for the IGT in the thermodynamic limit there is only a transition at infinite inverse temperature β, whereas the quantum phase transition we consider here occurs at a finite field strength β in the thermodynamic limit as well.
We employ the unsupervised learning technique on the σ z projection of the modified toric code ground state (14) with the strength of the background field β as a label for the supervised part of the protocol. This time our neural net model consists of two convolutional layers (with 128 filters and kernel size 2) and three dense layers (with 100, 100 and 50 neurons, respectively).
We show in figure 8 the results for N=4 (32 spins) and two different field configurations of the 6 field configurations defined in appendix C and previously studied on a larger lattice for the x-projections. The reduction of the system size and number of field configurations presented here are a consequence of constructing projections onto the σ z -basis from the ground state containing σ x fields: mixed σ z σ x terms make Monte Carlo update computationally significantly more expensive (for details see appendix C).
We note here, that for both σ x and σ z projections we limit ourselves to a single topological sector with the choice of the ground state in equation (9). Since the other topological sectors exhibit qualitatively the same phase transition at the same transition point the discussion above can be extended to any ground state within the topological sector.

Phase transition determination from the stabilizer expectation values
Finally, we discuss how to obtain the topological phase transition in the toric code model by extracting the necessary information from measurements that can be readily performed on the quantum state at hand and do not require projections onto spin configurations. It was shown in [30] that the behavior of the expectation values of the stabilizer operators is intimately related to the position of the topological phase transition in the toric code model. We use our predictive model to evaluate the position of the phase transition from the expectation values of the stabilizer operators, offering an alternative method to determine the position of the phase transition.
As in the previous sections, we train a neural network to predict the value of the field strength amplitude, β. This time we use as input the expectation value of the plaquette operator, $\langle B_p\rangle$ (with β as a label). Then we use the network to predict the field strength β for the expectation values of B_p evaluated with respect to a new set of quantum states. We use a neural network with two dense layers (with 20 neurons each). The derivative $\partial\beta_{\rm pred}/\partial\beta_{\rm label}$ of the predictive model is shown as a function of field strength for six distinct field configurations in figure 9. We again compare to the position of the phase transition obtained by the fidelity susceptibility method (dashed lines) and observe excellent agreement. As in the case of configurational data, we used the topological sector defined by the ground state in equation (9). Our result is again independent of the state in the ground-state manifold, as all ground states are locally indistinguishable in the topological phase. As a consequence, the local expectation value $\langle B_p\rangle$ does not depend on the topological sector examined. While the connection between the expectation values of stabilizer operators and the position of the phase transition has not been shown analytically, further numerical evidence was provided in [31]. The authors examine direct detection of anyons, a process that can be mapped onto the expectation values we investigated here. The presence of anyons is then immediately tied to the existence of topological order. We elaborate on the connection to the present work in appendix C.

Discussion
Unsupervised machine learning techniques for phase classification in condensed matter physics are potentially powerful tools for the discovery of new quantum phases. Due to the lack of local order parameters, phases exhibiting topological order present a challenging task for unsupervised methods. In this work, we have shown that a novel unsupervised method, namely the analysis of predictive neural network models [7], can reliably detect the violations of topological order, or a topological phase transition should it exist. Topologically ordered states have been particularly challenging for unsupervised learning techniques, because the quantity characterizing topological order is inherently non-local and hard to identify from raw data. In the method presented here, we trained the network on an arbitrary continuous parameter associated to the state and then analyzed the errors in the network predictions. We presented numerical evidence that these prediction errors are signatures of a phase transition. We showed that this conclusion is independent of the particular type of phase transition present in the system and of the type of input data. To determine which type of phase transition is present, our method can be applied in conjunction with principal component analysis [17], variational auto-encoders [25], or confusion schemes [18,21], all of which succeed in determining phase transitions governed by a local order parameter.
Providing the resolution to the problem of finding the cross-over temperature in the IGT and its generalizations in an unsupervised manner is the first step towards developing reliable techniques that can be applied to study the models whose phase diagrams are not yet fully understood.
We created the samples used for training (like those shown in figure 1) of our model using Monte Carlo simulations. We created data for system sizes N×N×2 with N∈{4, 8, 12, 16, 20, 24, 28}. For each system size we created 100 different values of β∈[0, 5]. We generated 20 000 configurations for each pair (β, N). The neural net we used consists of 2 convolutional layers (128 filters, kernel size 3) and 2 dense layers (300 and 100 neurons, respectively). We observed that our method is flexible with respect to the hyperparameters of the neural network. However, too-shallow networks that predict the same average β for all states inside and outside of the topological sector should be avoided.
We trained the network by minimizing the mean-squared-error loss function

$L = \frac{1}{n}\sum_{i=1}^{n} \left(\beta_{\rm pred}^{(i)} - \beta_{\rm label}^{(i)}\right)^2$, (A1)

where β_pred is the β determined by the network, β_label is the label of the given input sample, and n is the batch size. The predictions of β by the network and their derivatives are shown in figures 2 and 3. In order to evaluate the error bars of the neural net predictions, we repeated the training procedure outlined above for 5 separate models (identical construction, separately generated training sets). Then we evaluated the standard deviation of the critical β*.
We can replicate the predictions achieved by a neural network using a density-of-states-based model as explained in the main text. Let us consider lattice configurations (training samples) X_n with their assigned inverse temperature labels β_n=β_label(X_n). We can evaluate the energy E_n of each of these configurations using the formula

$E_n = -J\sum_p \prod_{i\in p} \sigma_i^z(X_n)$, (A2)

where the summation is over all plaquettes p and the product runs over the spins within each plaquette. For convenience we choose J=1. Then we can construct the density of states distribution of the training set,

$\varrho(E,\beta) = \frac{1}{N}\sum_{n} \delta_{E,E_n}\,\delta_{\beta,\beta_n}$, (A3)

where δ_{a,b} is the Kronecker delta (δ_{a,b}=1 for a=b and δ_{a,b}=0 for a≠b), E_n is the energy of the configuration X_n evaluated using formula (A2), and N is the number of configurations X_n in the training set. We can write the energy distribution in the form above because the energy E of a lattice configuration is discrete by construction and β is discretized in steps as explained above. An example of this distribution is shown in figure 5 for the system size N=8.
Having access to the energy E_n of a given configuration X_n, we can then evaluate the average β of all states with energy E, which we denote by β_av for a configuration X_n in the training set:

β_av(E) = Σ_β β P(E, β) / Σ_β P(E, β).    (A4)

The function above predicts the value of β which is most likely for a given energy E, given the energy distribution of the training set. We can use the function (A4) to determine the relation between the assigned labels β_n and the values of β predicted by our model,

β_pred(β_label) = (1/M) Σ_m β_av(E_m),    (A5)

where the sum runs over the configurations carrying the label β_label and M is the number of configurations X_m in an arbitrarily chosen test set. Using equation (A5) we can predict the estimated β for a range of true labels. In figure A1 we show the difference between true and predicted β as a function of the true β; in figure A2 we show the derivative of the estimated β as a function of the true β. Comparing with figure 3, we see that the model based on the density of states of the training set reproduces well the behavior of the neural-net model introduced in the main text. We have used the maxima determined by the density-of-states model as a dashed-line reference for the position of the transition in the main text.

Figure A1. Density-of-states based prediction of β. We plot the difference between true and predicted β, β_pred − β_label, as a function of β_label.
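The averaging step can be sketched directly from the training counts, without forming P(E, β) explicitly (an equivalent formulation of (A4) and (A5), again assuming discrete energies and labels):

```python
import numpy as np
from collections import defaultdict

def beta_av(energies, beta_labels):
    """beta_av(E): average training label among all configurations with energy E."""
    sums, counts = defaultdict(float), defaultdict(int)
    for E, b in zip(energies, beta_labels):
        sums[E] += b
        counts[E] += 1
    return {E: sums[E] / counts[E] for E in sums}

def predict_beta(test_energies, table):
    """Density-of-states prediction for a set of test configurations:
    the mean of beta_av(E_m) over the test set."""
    return float(np.mean([table[E] for E in test_energies]))
```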

B.2. Calculation of the fidelity susceptibility
Figure A2. The derivative of the density-of-states based prediction, dβ_pred/dβ_label, as a function of β_label.

We compare the position of the phase transition found by the neural network to the transition indicated by the fidelity susceptibility [2]. The fidelity susceptibility is defined as

χ_F(β) = lim_{δβ→0} −2 ln|⟨Ψ(β)|Ψ(β + δβ)⟩| / (δβ)²,
where the state |Ψ(β)⟩ is the ground state of the given Hamiltonian at the value β of the tuning parameter. For our particular model, |Ψ(β)⟩ is given in equation (9). It has been shown that a divergence or maximum of the fidelity susceptibility χ_F indicates a second-order symmetry-breaking quantum phase transition [3][4][5], and numerical evidence suggests that topological phase transitions are indicated in the same way [6]. We calculate the fidelity susceptibility for the introduced disordered toric code model and evaluate the resulting expression numerically via Monte Carlo sampling for the different field configurations examined throughout this work, comparing the position of the maximum with the position of the phase transition found by the neural network. In particular, we calculate the fidelity susceptibility for the following 6 different field configurations:
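Numerically, χ_F can be estimated from the overlap of states at two nearby values of β. A finite-difference sketch, demonstrated on a toy single-qubit state (the parametrization is purely illustrative, not the toric-code ground state):

```python
import numpy as np

def fidelity_susceptibility(state, beta, dbeta=1e-3):
    """chi_F(beta) ~ -2 ln|<Psi(beta)|Psi(beta + dbeta)>| / dbeta^2,
    where `state(beta)` returns a normalized state vector."""
    overlap = np.vdot(state(beta), state(beta + dbeta))
    return -2.0 * np.log(np.abs(overlap)) / dbeta**2

# Toy example: a single qubit rotating with beta, |psi(beta)> = (cos beta, sin beta).
qubit = lambda b: np.array([np.cos(b), np.sin(b)])
```

For the qubit example the exact susceptibility is 1 for every β, which the finite difference reproduces up to corrections of order (δβ)².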

B.3. Numerical simulation of projections
The projection of the ground state of the perturbed toric code model onto the σ_x or σ_z basis is in both cases simulated via Monte Carlo sampling. To project onto the σ_x basis, we aim to obtain a configuration S_h sampled from the probability distribution p(S_h) of equation (B3). Such a configuration is reached via a Markov chain. More concretely, we start with a lattice with all spins up in the x basis and construct the Markov chain as follows: in each step (with given spin configuration S_h^i), a random plaquette p is picked and the corresponding update is accepted or rejected such that the chain converges to p(S_h). After a thermalization time, the spin configuration S_h is obtained with probability p(S_h) and a projection is simulated.
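A sketch of such a plaquette-update chain, with the target distribution p(S_h) of equation (B3) left as a user-supplied function. The plaquette move (flipping its four link spins, which preserves all star constraints) and the Metropolis acceptance step are assumptions of this sketch:

```python
import numpy as np

def sample_projection(N, log_p, steps=10_000, rng=None):
    """Markov chain over spin configurations S_h on an N x N x 2 lattice.
    Elementary move: flip the four spins around a random plaquette, so the
    chain stays within the support of the toric-code ground state; accept
    with the Metropolis probability min(1, p(S') / p(S)).  `log_p(S)` is the
    log of the target distribution, e.g. that of equation (B3)."""
    rng = np.random.default_rng(rng)
    S = np.ones((N, N, 2), dtype=int)            # all spins up in the x basis
    logp = log_p(S)
    for _ in range(steps):
        i, j = rng.integers(N), rng.integers(N)
        links = [(i, j, 0), (i, j, 1), ((i + 1) % N, j, 1), (i, (j + 1) % N, 0)]
        for l in links:                          # propose: flip the plaquette
            S[l] *= -1
        new_logp = log_p(S)
        if np.log(rng.random()) >= new_logp - logp:
            for l in links:                      # reject: undo the flip
                S[l] *= -1
        else:
            logp = new_logp
    return S
```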
Projecting onto the σ_z basis follows the same principle, with the caveat that the spin-flip probability is computationally expensive to calculate. We start from a state in the ground state manifold.

In the confusion scheme of van Nieuwenburg et al, one chooses a value β'_c as a guess for the critical point and separates the dataset into two parts: all values smaller than β'_c are labeled with 0, all values larger than β'_c are labeled with 1. A neural network is trained to reproduce the labels. This procedure is repeated for all values β'_c in the interval (β_a, β_b), and the performance P(β'_c) of the trained networks is plotted and analyzed. The main idea is that the network performs best when β'_c is chosen to be the critical point. More concretely, the performance has a W-shape if a phase transition is recognized (see figure C1). As a consequence, the main difference between our method and the confusion scheme is that we use a single network and analyze its predictions, while the method of van Nieuwenburg et al requires separate networks for each point of the phase diagram one wishes to check (which might make it less suitable for high-dimensional parameter spaces).
The confusion scheme has proven to perform well on a variety of phase transitions in classical and quantum systems. We employ the confusion scheme on the IGT crossover analyzed in section 2. The network performance P(β'_c) is shown in figure C1 for the system size N=12. As the typical W-shape is not reproduced, the position of the phase transition is not recognized; instead, we obtain a shifted V-shape. We can understand this result in the following way. The confusion method can easily distinguish between the states with β<β_c on one side of the transition, but it cannot distinguish the states in the ordered sector. More concretely, all samples in the ordered sector (β>β_c) are in the ground state (no local constraints are violated) and are thus indistinguishable to the networks. In the disordered sector, the configurations at different β'_c are distinguishable by different numbers of local plaquette-constraint violations: smaller values of β lead to a larger number of frustrated plaquettes. As a consequence, the network is able to distinguish the states in the disordered phase.
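The shifted V-shape can be reproduced in a toy version of the confusion scheme: replace the network by the best possible threshold classifier acting on a single synthetic feature that mimics the number of frustrated plaquettes (β-dependent and nonzero in the disordered sector, identically zero in the ordered one). All data here are synthetic and for illustration only:

```python
import numpy as np

def best_threshold_accuracy(feature, labels):
    """Best achievable accuracy of a 1-d threshold classifier: try every cut
    (and both orientations) plus the two constant classifiers."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = max(np.mean(labels == 0), np.mean(labels == 1))
    for t in np.unique(feature):
        for rule in (feature > t, feature <= t):
            best = max(best, np.mean(rule.astype(int) == labels))
    return float(best)

def confusion_curve(betas, feature, trial_points):
    """Confusion-scheme performance P(beta'_c): relabel the data as
    0 (beta < beta'_c) / 1 (beta >= beta'_c) and record the best accuracy."""
    return [best_threshold_accuracy(feature, (betas >= bc).astype(int))
            for bc in trial_points]
```

On such data the curve is perfect for every β'_c up to the crossover (the feature separates all disordered samples) and degrades beyond it (all ordered samples look identical), i.e. a V shifted into the ordered region rather than a W.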
For this example, a relatively small network architecture consisting of a convolutional layer (12×12=144 neurons, 5 filters and kernel size 3) and a dense layer with one output neuron was chosen. Tests with larger networks showed a similar accuracy curve, while tests with smaller networks showed a different V-shape that was not shifted towards the disordered phase: those networks were not able to distinguish the configurations in either phase and had to guess randomly. We conclude that the application of the confusion scheme to the IGT is not straightforward.

Figure C1. Accuracy P(β'_c) of the confusion scheme on the IGT problem with N=12 (blue dots). In grey, the ideal W-shape indicating a phase transition is shown for comparison. The crossover from ground state to non-ground state is at around . The error bars are obtained by averaging over ten different and independent Monte Carlo runs used for obtaining the data.