Unsupervised interpretable learning of topological indices invariant under permutations of atomic bands

Multi-band insulating Bloch Hamiltonians with internal or spatial symmetries, such as particle-hole or inversion, may have topologically disconnected sectors of trivial atomic-limit (momentum-independent) Hamiltonians. We present a neural-network-based protocol for finding topologically relevant indices that are invariant under transformations between such trivial atomic-limit Hamiltonians, thus corresponding to the standard classification of band insulators. The work extends the method of "topological data augmentation" for unsupervised learning introduced in Ref. [1] by also generalizing and simplifying the data generation scheme and by introducing a special "mod" layer of the neural network appropriate for $Z_n$ classification. Ensembles of training data are generated by deforming seed objects in a way that preserves a discrete representation of continuity. In order to focus the learning on the topologically relevant indices, prior to the deformation procedure we stack the seed Bloch Hamiltonians with a complete set of symmetry-respecting trivial atomic bands. The obtained datasets are then used for training an interpretable neural network specially designed to capture the topological properties by learning physically relevant momentum space quantities, even in crystalline symmetry classes.


I. INTRODUCTION
Finding and classifying exotic phases of quantum matter is of key importance in modern science. These phases often point to materials with unique properties and high scientific impact. A particular type of quantum matter, the so-called topologically non-trivial quantum phases, is especially at the frontier of current research [2][3][4]. Topological phase transitions lie outside the standard Landau paradigm of symmetry breaking and highlight the topological non-triviality of the underlying state spaces. Despite all the progress made and the understanding gained over the years, the field is still very far from being complete.
Artificial neural networks (NNs), and deep learning in particular, stand out as the premier machine learning (ML) tool, with the power both to find complex patterns in data and to generalize this knowledge to previously unseen data. The capabilities and limitations of NNs applied to the task of classifying phases of topological quantum matter are currently under active research [17][18][19][20][21][22][23][24][25]. A potential limitation of using NNs is that common protocols are based on supervised learning, which requires labeled data (object and corresponding topological marker) from an explicit model. This limits the prospects for using ML to explore unknown topological structures. To begin to address this issue, several unsupervised protocols for topological classification have recently been developed [1][36][37][38][39][40]. The method presented by us in Ref. [1] is based on 'topological data augmentation', where datasets of topologically equivalent children states are created from single parent states. The created data is then used for training a NN, using the unsupervised 'learning by confusion' concept introduced in Ref. [36]: only datasets that have some distinguishing feature can be functionally classified by the network. Thus, any two a priori unknown datasets can be used for training the network with dummy label targets, with a successful training outcome only if the two sets are topologically distinct. The procedure also used interpretable neural networks, designed to learn an intermediate momentum space output which is closely related to integral expressions over the local curvature. A crucial and challenging requirement for the procedure to work is that the data is sufficiently randomized to erase any non-topological information related to the specific parent states. In order to accomplish this randomization without destroying the topological information, we constructed in Ref. [1] explicit topology-preserving transformations, valid in the discretized Brillouin zone. However, constructing such transformations, one might argue, requires a good understanding of the topological structure. Thus, relaxing the specificity of the topology-preserving deformations would be a step forward in the spirit of the outlined objective.
FIG. 1. Protocol for generating the training data. Step I: The parent Hamiltonians $H_i(k)$ with $i = 0, 1$ are extended to $H_i(k) \oplus F_{i,j}(k)$, where $F_{i,j}(k)$ are randomly generated trivial atomic-limit Bloch Hamiltonians. Step II: The stacked parents are augmented via continuous deformations. Step III: The obtained datasets, labeled '0' and '1', are used for training a neural network specifically designed for capturing topological indices.
With this in mind, in this paper we significantly advance the protocol from Ref. [1] and reveal new horizons of its applicability. As before, the protocol is split into two main stages, data generation and training of the network. The training data is generated from two parent states via random topology-preserving deformations.
Here we always require the states to change slowly with momentum, in this way automatically ensuring continuity of the applied deformations. The procedure for doing this, while at the same time ensuring sufficient exploration of the space in order to obscure any non-topological features, is one of the main methodological points. This also allows us to consider more general quantum systems without requiring any external knowledge of the topological structure of the underlying state space. We also modified the network's layout, allowing the net to generically represent a much broader class of topological indices, including those that are sensitive to high-symmetry points, lines, and surfaces in the Brillouin zone.
An additional limitation of the original protocol of topological data augmentation is that the network may learn momentum-space local invariants that are irrelevant for establishing non-trivial topological features. As an example, consider Hamiltonians with inversion symmetry, where in the atomic limit the individual orbitals may be odd or even under inversion. A half-filled 4-band Hamiltonian in the atomic limit may have two occupied bands that are both of even parity, both of odd parity, or one of each. These three sectors of atomic Hamiltonians are topologically trivial and have no edge states [54], but nevertheless cannot be connected by continuous deformations that keep the gap open. Thus, a network trained on datasets based on deformations of single seed objects from one of these three sectors may learn to distinguish different trivial band insulators based on a local index (such as the parity at the Brillouin zone center) rather than trivial from non-trivial, depending on the specifics of the two training datasets.
In order to address this issue we extend the Hilbert space and corresponding matrix dimensions of the Bloch Hamiltonians that we want to classify from $n$ to $2n$ bands, by stacking them with $n$ atomic bands as sketched in Fig. 1: the children states are produced by first stacking the parents with a number of trivial atomic bands and only then deforming them by random topology-preserving deformations. The created data is then used for training an interpretable NN, allowing us to extract the learned topological quantities. Trivial atomic bands here mean bands that respect the symmetries of the corresponding symmetry class but do not vary in momentum; they represent the intrinsically trivial atomic limit where individual atomic bands do not hybridize. Thus, within the new procedure we purposely assign the same labels to any Bloch Hamiltonians differing by trivial atomic bands, in this way penalizing the NN for outputting topologically irrelevant indices that differentiate between a priori trivial objects. This approach is inspired by the commonly employed K-theoretic analytic classification schemes [4,55,56], where the topological equivalence between two band insulators is defined up to stacking of trivial bands. All trivial Hamiltonians here constitute a trivial monoid element and any Hamiltonian can then be stacked with atomic bands without changing the topological class.
The paper is organized as follows: The protocol is described in detail in Sec. II, including how to generate parent states that vary sufficiently slowly on the scale of discretized momentum, how these are stacked with atomic bands, and how they are subsequently deformed in a manner that conserves the discrete measure of continuity of the Bloch Hamiltonians. The protocol is exemplified on one-dimensional (1d) 4-band insulators from three different symmetry classes in Sec. III. This includes examples with particle-hole symmetry and inversion symmetry where the network learns to single out the high-symmetry points in the Brillouin zone and combine these in the appropriate way to produce the topological index. A summary and outlook are given in Sec. IV.

II. METHODS
To set the stage, we describe our protocol in general terms and provide the details applicable to all examples that follow. Any one-dimensional (1d) gapped $n$-band system can be represented using an $n$ by $n$ Hermitian Bloch Hamiltonian $H(k)$ that is a periodic continuous function of momentum $k$. Two Bloch Hamiltonians are said to be topologically equivalent if the occupied spaces can be continuously deformed into each other. Any Bloch Hamiltonian can therefore be continuously transformed to have energies $-1$ and $1$ for occupied and empty bands, respectively, and we consider the space of normalized Bloch Hamiltonians only. The aim is then to find a topological index, call it $\nu$, labeling the Bloch Hamiltonians $H(k)$ by their topological equivalence classes. It is known that the space of all Bloch Hamiltonians in 1d is topologically trivial, but the classification becomes nontrivial after imposing various symmetries on the systems, in this way restricting the allowed deformations. In this paper, we always assume the systems to be half filled and explicitly normalize the Bloch Hamiltonians to have an equal number of positive and negative bands.
[Fragment of a figure caption, beginning lost: Then, we choose any discontinuity neighboring point, say $k_i$, and do one recursive iteration at $k_i$. Next, one recursive iteration at $k_i$ is performed again.]
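The normalization to energies $\pm 1$ at half filling described above can be sketched as follows. This is a minimal illustration and not the authors' code; the function name `flatten_spectrum` is hypothetical, and the input is assumed to be gapped at half filling.

```python
import numpy as np

def flatten_spectrum(H):
    """Return the flattened version of a gapped Hermitian matrix H.

    The lower half of the eigenvalues is set to -1 and the upper half to +1
    (half filling), while the eigenvectors are left unchanged.
    """
    H = np.asarray(H, dtype=complex)
    n = H.shape[0]
    if n % 2 != 0:
        raise ValueError("half filling needs an even number of bands")
    _, vecs = np.linalg.eigh(H)            # eigenvalues come out in ascending order
    signs = np.concatenate([-np.ones(n // 2), np.ones(n // 2)])
    return (vecs * signs) @ vecs.conj().T  # V diag(signs) V^dagger

# Example: flatten a random 4 x 4 Hermitian matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2
Q = flatten_spectrum(H)
```

The flattened matrix squares to the identity and is traceless, which is a convenient numerical check that the half-filling convention is respected.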

A. Prerequisites
For our protocol to be functional we need to choose a distance measure between any two Hermitian matrices $H_1$ and $H_2$, to later establish a notion of continuity for discretized Bloch Hamiltonians. The distance $d(H_1, H_2)$ is here defined through a relation (equation not recovered from the source text) in which $|\psi_{1/2,i}\rangle$ are the normalized eigenstates of $H_{1/2}$, with $i$ running over the $N_{\rm o.b.}$ occupied bands of $H_{1/2}(k)$. Note that the distance $d(H_1(k), H_2(k)) = 0$ for every momentum $k$ if and only if the (normalized) Bloch Hamiltonians $H_1(k)$ and $H_2(k)$ are identical. We also need an efficient algorithm to move numerically between the matrices, and for this purpose we describe a procedure for gradually changing the matrix $H_1$ towards $H_2$. The idea is to numerically minimize the distance $d(H_1, H_2)$ employing the gradient descent algorithm [38]. Explicitly, in each numeric step we perform an update (equation not recovered from the source text) where $\Lambda_j$ is a basis of Hermitian matrices. The phase constants $\phi_j$ are computed using the gradient descent rule with some small learning rate $\eta$. To avoid rapid changes possibly leading to discontinuous deformations, we also restrict $\phi_j$ not to exceed some upper bound value $\phi_{\max}$. The symmetries are maintained by doing the unitary rotations in pairs bonded by the symmetries, in this way forbidding the deformed Hamiltonian to go outside the considered symmetry class.
FIG. 3 (beginning of caption lost). ... where $N_k$ and $n$ are the numbers of momentum points and internal bands, respectively. The net consists of a Conv2D layer with receptive field of size (2,2) (over neighboring $k$-points and the real and imaginary parts of the matrix entries, with a channel depth of 64 corresponding to the matrix entries), several Conv2D layers with receptive field of size (1,1), a Dense layer, and an optional Modulo layer. The last convolution layer outputs a single feature map, corresponding to some momentum-resolved local measure, which is key to the interpretability, as exemplified in Figs. 4b, 5b, and 7. The Modulo layer applies a predefined continuous function coinciding with the modulo operation in some key integer points; an example of such a function mimicking the mod 2 operation is depicted as well.
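As a hedged illustration of the distance measure and the clipped gradient-descent rotations of Sec. II A, the sketch below uses an assumed distance $d = 1 - \frac{1}{N_{\rm o.b.}}\sum_{i,j} |\langle \psi_{1,i} | \psi_{2,j} \rangle|^2$, which vanishes exactly when the occupied subspaces coincide. This form, the update $H_1 \to U^\dagger H_1 U$ with $U = \exp(i\phi\Lambda)$, and all function names are assumptions made for illustration; no symmetry pairing is included.

```python
import numpy as np

def occupied_projector(H):
    """Projector onto the lower half of the spectrum of Hermitian H."""
    _, vecs = np.linalg.eigh(H)
    occ = vecs[:, : H.shape[0] // 2]
    return occ @ occ.conj().T

def distance(H1, H2):
    """Assumed distance: 1 - (1/N_occ) * sum_ij |<psi1_i|psi2_j>|^2."""
    n_occ = H1.shape[0] // 2
    overlap = np.trace(occupied_projector(H1) @ occupied_projector(H2)).real
    return 1.0 - overlap / n_occ

def hermitian_basis(n):
    """n^2 Hermitian matrices spanning all n x n Hermitian matrices."""
    basis = []
    for a in range(n):
        E = np.zeros((n, n), dtype=complex)
        E[a, a] = 1.0
        basis.append(E)
        for b in range(a + 1, n):
            S = np.zeros((n, n), dtype=complex)
            S[a, b] = S[b, a] = 1.0
            A = np.zeros((n, n), dtype=complex)
            A[a, b], A[b, a] = -1j, 1j
            basis.extend([S, A])
    return basis

def rotate_towards(H1, H2, eta=0.1, phi_max=0.1):
    """One sweep of clipped gradient-descent rotations moving H1 towards H2."""
    P2 = occupied_projector(H2)
    n_occ = H1.shape[0] // 2
    for Lam in hermitian_basis(H1.shape[0]):
        P1 = occupied_projector(H1)
        # Derivative of distance(U^dagger H1 U, H2) at phi = 0, U = exp(i phi Lam).
        grad = -np.trace(1j * (P1 @ Lam - Lam @ P1) @ P2).real / n_occ
        phi = np.clip(-eta * grad, -phi_max, phi_max)
        w, V = np.linalg.eigh(Lam)
        U = (V * np.exp(1j * phi * w)) @ V.conj().T   # exp(i phi Lam)
        H1 = U.conj().T @ H1 @ U
    return H1

# Example: move one random Hermitian matrix towards another.
rng = np.random.default_rng(0)
M1 = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M2 = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H_start, H_target = (M1 + M1.conj().T) / 2, (M2 + M2.conj().T) / 2
H_moved = H_start
for _ in range(20):
    H_moved = rotate_towards(H_moved, H_target)
```

Because every update is a unitary conjugation, the spectrum of the moving matrix is unchanged; only its occupied subspace is rotated towards that of the target.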

B. Generating parent states
The training data is generated from two $n$ by $n$ parent states $H_0(k)$ and $H_1(k)$ defined on a grid of $N_k$ momentum points. The baseline trivial state $H_0(k)$ is taken to be any trivial atomic-limit Bloch Hamiltonian satisfying the symmetries. It is produced by repeating a random, properly symmetrized $n$ by $n$ Hermitian matrix at every momentum point. The parent $H_1(k)$ is generated by symmetrizing a Bloch Hamiltonian of the form $\sum_{p \le m} [A_p \sin(pk) + B_p \cos(pk)]$ (3), with $A_p$ and $B_p$ random $n$ by $n$ Hermitian matrices and $m$ the largest harmonic. With this procedure we can efficiently sample the symmetry classes and control the state's continuity by choosing $m \ll N_k$. The continuity here is realized by requiring all neighbor-to-neighbor distances $d(H(k), H(k + \Delta k))$ (with $\Delta k = 2\pi/N_k$) to be below some small threshold $\delta$. Thus, the parent $H_1(k)$ is suggested to be randomly generated via Eq. (3). Note, however, that in general we are mainly interested in picking an $H_1(k)$ that is topologically nontrivial, and one can therefore facilitate the protocol by applying all available external intuition to purposely choose a promising candidate for $H_1(k)$. Completely random guessing is also anticipated to eventually lead to the same results, according to the learning-by-confusion idea, but one will need to make several attempts before training succeeds.
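A parent state built from a few random harmonics, together with the neighbor-to-neighbor continuity check, might be sketched as below. This is an illustration under stated assumptions: no symmetry is imposed, the harmonics are taken to start at $p = 0$, the distance is the assumed projector-overlap form $1 - {\rm Tr}(P_1 P_2)/N_{\rm o.b.}$, and the function names are hypothetical.

```python
import numpy as np

def random_hermitian(n, rng):
    M = rng.uniform(-1, 1, (n, n)) + 1j * rng.uniform(-1, 1, (n, n))
    return (M + M.conj().T) / 2

def occupied_projector(H):
    _, V = np.linalg.eigh(H)
    occ = V[:, : H.shape[0] // 2]
    return occ @ occ.conj().T

def fourier_parent(n, n_k, m, rng):
    """Flattened H(k) = sum_{p=0..m} A_p sin(pk) + B_p cos(pk) on n_k points."""
    ks = 2 * np.pi * np.arange(n_k) / n_k
    H = np.zeros((n_k, n, n), dtype=complex)
    for p in range(m + 1):   # assumption: harmonics p = 0, ..., m
        A, B = random_hermitian(n, rng), random_hermitian(n, rng)
        H += np.sin(p * ks)[:, None, None] * A + np.cos(p * ks)[:, None, None] * B
    # Normalize to eigenvalues -1/+1 at half filling: Q = 1 - 2 P_occ.
    return np.array([np.eye(n) - 2 * occupied_projector(h) for h in H])

def max_neighbor_distance(Hk):
    """Largest assumed distance between H(k) and H(k + dk), periodic in k."""
    n_occ = Hk.shape[1] // 2
    P = np.array([occupied_projector(h) for h in Hk])
    overlaps = np.einsum("kab,kba->k", P, np.roll(P, -1, axis=0)).real
    return float(np.max(1.0 - overlaps / n_occ))

rng = np.random.default_rng(0)
H1 = fourier_parent(n=4, n_k=100, m=4, rng=rng)
H0 = np.repeat(H1[:1], 100, axis=0)      # a momentum-independent (atomic) state
d_max = max_neighbor_distance(H1)        # to be compared with the threshold delta
```

A candidate parent would be accepted only if `d_max` lies below the chosen threshold $\delta$; otherwise one may redraw the coefficients or lower $m$.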

C. Stacking parents with trivial atomic bands
Within our protocol of topological classification, Fig. 1, the parents $H_i(k)$ with $i = 0, 1$ are first extended to $H_i(k) \oplus F_{i,j}(k)$ using random trivial atomic-limit Bloch Hamiltonians $F_{i,j}(k)$, and only then are the stacked parents deformed. The extension of the parents is not generally necessary within our protocol; however, the datasets generated in this way help the NN to look only at topologically relevant indices that do not distinguish between different trivial atomic-limit Hamiltonians. We therefore develop a procedure for randomly generating trivial atomic-limit Bloch Hamiltonians $F(k)$ respecting the relevant symmetries. Each trivial Bloch Hamiltonian $F(k)$ is constructed by repeating some base $n$ by $n$ matrix $\tilde F$. The matrix $\tilde F$ has to respect the relevant symmetries, and within our protocol it is generated by filling the matrix entries with random complex numbers $a + ib$ with $a, b \in [-1, 1]$ (with real numbers $c \in [-1, 1]$ for diagonal entries), symmetrizing the obtained matrix to respect the given symmetries, and then normalizing it to have eigenvalues $\pm 1$. Depending on the symmetry class, the space of all symmetry-respecting matrices can be composed of multiple topologically disconnected sectors, and these sectors can in general be non-uniformly represented in our randomly created data. Note that trivial Hamiltonians $F(k)$ corresponding to $\tilde F$ from disconnected sectors generically cannot be connected using continuous symmetry-preserving deformations. This may hinder our NN in finding topologically relevant quantities, because some connected sectors of trivial atomic-limit Hamiltonians might be underrepresented. To avoid giving preference to a particular disconnected matrix sector, we sort the generated matrices into datasets, each composed of $10^3$ continuously connected random matrices. The connectivity is established by checking whether they can be connected via the gradient descent algorithm described in Sec. II A without breaking the symmetries, and the number of obtained disconnected sectors of trivial Hamiltonians depends on the symmetry class. We then pick a base matrix $\tilde F_{i,j}$ defining $F_{i,j}(k)$ with equal probability from the created datasets of matrices $\tilde F$. In this way all the disconnected sectors are represented equally within the process. Using this approach we produce $j = 1, ..., N_p$ stacked parent states $H_i(k) \oplus F_{i,j}(k)$ corresponding to the two parents $H_0(k)$ and $H_1(k)$. By using these trivially stacked parents we hint the network to learn only quantities that cannot differentiate between two trivial atomic states, even if they cannot be continuously connected.
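The construction of a base matrix and the direct-sum stacking $H(k) \oplus F(k)$ can be sketched as follows. The sketch omits the symmetrization step and the sorting into connected sectors, and it assumes that the normalization places half of the eigenvalues at $-1$ and half at $+1$; the function names are hypothetical.

```python
import numpy as np

def random_base_matrix(n, rng):
    """Hermitian matrix with off-diagonal entries a + ib and real diagonal c,
    where a, b, c are drawn uniformly from [-1, 1]."""
    upper = np.triu(rng.uniform(-1, 1, (n, n)) + 1j * rng.uniform(-1, 1, (n, n)), k=1)
    return upper + upper.conj().T + np.diag(rng.uniform(-1, 1, n))

def normalize(H):
    """Set the lower half of the spectrum to -1 and the upper half to +1."""
    _, V = np.linalg.eigh(H)
    occ = V[:, : H.shape[0] // 2]
    return np.eye(H.shape[0]) - 2 * occ @ occ.conj().T

def atomic_limit(n, n_k, rng):
    """Momentum-independent Hamiltonian: one base matrix repeated n_k times."""
    F = normalize(random_base_matrix(n, rng))
    return np.repeat(F[None, :, :], n_k, axis=0)

def stack(Hk, Fk):
    """Direct sum H(k) (+) F(k) at every momentum point."""
    n_k, n, _ = Hk.shape
    m = Fk.shape[1]
    out = np.zeros((n_k, n + m, n + m), dtype=complex)
    out[:, :n, :n] = Hk
    out[:, n:, n:] = Fk
    return out

rng = np.random.default_rng(0)
H0 = atomic_limit(4, 100, rng)           # trivial parent
stacked = stack(H0, atomic_limit(4, 100, rng))
```

The stacked state has $2n = 8$ bands and remains half filled, since both blocks contribute equally many positive and negative eigenvalues.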

D. Topological data augmentation
The stacked parent states are then augmented by performing random deformations while keeping the states continuous and maintaining the symmetries. The deformations are implemented by doing unitary rotations of discretized $2n$ by $2n$ Bloch Hamiltonians $H(k)$ at distinct momentum points $k_i$ towards randomly generated target matrices $\tilde H$. These rotations are done in small steps by minimizing the distance function $d(H(k_i), \tilde H)$ following the gradient descent algorithm with some learning rate $\eta$ and rotation cutoff $\phi_{\max}$. We also perform the steps in symmetry-bonded pairs to always maintain the symmetries. The unitary matrices are taken to be $U_j = \exp(i\phi_j \Lambda_j)$, with $\Lambda_j$ a set of basis matrices for $2n$ by $2n$ Hermitian matrices. Importantly, the state is kept continuous by recursively moving also the matrices at neighboring momentum sites towards the same target once the distance exceeds the threshold $\delta$, Fig. 2. In this way any deformation at one momentum site pulls the neighbors along with it, resulting in an efficient augmentation procedure that at all times maintains continuity of the state. We terminate the deformation algorithm if the corresponding matrix has reached the target, if the state could not be kept continuous after $M_{\max}$ recursive steps, or if a maximum number $N_{\max}$ of gradient descent steps has been exceeded. The motivation for this procedure, which is similar to a string being pulled over a large distance at one point, is to be able to traverse large distances in the topological sector in an unbiased fashion. This is in contrast to making small random local deformations, which just produce a random walk and in a high-dimensional space provide less efficient exploration.

E. Neural network structure and training
The two datasets corresponding to topologically augmented stacked parents are then used to train a neural-network-based classifier specifically designed to represent generic expressions of topological indices. The NN is trained on the Bloch Hamiltonians $H(k)$ transformed from the $N_k \times 2n \times 2n$ complex-valued format to a $2 \times (N_k + 1) \times 4n^2$ float format, with an extra momentum site added to encode the periodicity. The classifying network consists of several convolution layers, which in the earlier work [1] were followed by a summation layer. The activation functions associated with each convolution layer are rectified linear units (ReLU), except for the last one, which has a linear activation. This network type was designed to perform identical local operations on each site and then sum the obtained outputs over the whole sample, in this way capturing topological numbers that are given by an integral over some local curvature. However, in the presence of certain symmetries, high-symmetry momentum points, lines, or surfaces may be of special importance and the indices are expected to take a different form. We therefore advance the design of the network to cover a much broader class of expressions by changing the last summation layer to a fully connected (dense) layer, see Fig. 3.
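The change of format from $N_k \times 2n \times 2n$ complex to $2 \times (N_k + 1) \times 4n^2$ float can be illustrated with the short sketch below. The ordering of the flattened matrix entries and the choice of repeating the first momentum site at the end are assumptions; the function name is hypothetical.

```python
import numpy as np

def to_network_input(Hk):
    """Map an (N_k, d, d) complex array to a (2, N_k + 1, d * d) float array.

    Axis 0 holds the real and imaginary parts, axis 1 the momentum sites
    (with the first site repeated at the end to encode periodicity), and
    axis 2 the flattened matrix entries used as channels.
    """
    n_k, d, _ = Hk.shape
    periodic = np.concatenate([Hk, Hk[:1]], axis=0)
    flat = periodic.reshape(n_k + 1, d * d)
    return np.stack([flat.real, flat.imag], axis=0).astype(np.float32)

# Example with N_k = 100 and 2n = 8 bands, i.e. 4 n^2 = 64 channels.
Hk = np.arange(100 * 8 * 8).reshape(100, 8, 8) * (1 + 2j)
x = to_network_input(Hk)
```

With this layout a (2, 2) convolution sees two neighboring momentum sites together with the real and imaginary parts of all matrix entries.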
To be concrete, we use a dense layer with a single output node, absolute weight values $\leq 1$, zero bias, and linear activation. The reason for avoiding non-linear activation and large weights is to constrain the preceding feature map (the output from the last convolution layer) to learn relevant and interpretable momentum space quantities. Note that this family of networks, the net in Fig. 3 and its generalizations to 2d and 3d, can adjust to calculate sums of some local functions over arbitrary symmetry-preserving points, lines, or planes, and then add or subtract them in any order. A coarse-grained version of the output before summation in this single node is what is displayed as the momentum-resolved images in Figs. 4, 5, and 7. The network thus processes the local information in $k$-space convolved over all bands in the first convolution layer (2 by 2 filters operate on nearest-neighbor sites in $k$-space and on the real and imaginary parts), which after subsequent non-linear operations outputs the relevant momentum-resolved local quantities.
To capture an even larger class of indices we also suggest using an extra layer that applies a predefined operation on the single-valued output to represent a modulo function over some assumed range of integers. The modulo function itself was found not to be applicable because it is periodic and discontinuous, giving convergence problems during the training. A way around this problem is to use a continuous function coinciding with the modulo operation at some key integer points. An example mimicking the mod 2 operation is given inside a box in Fig. 3: this function outputs 0 for inputs 0 and $-2$, and outputs 1 for input $-1$. The network illustrated in Fig. 3 can be generalized to the 2d case by analogously modifying the 2d network of Ref. [1].
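One simple continuous function with the stated values (0 at inputs 0 and $-2$, 1 at input $-1$) is the parabola $1 - (y + 1)^2$. It is given here only as a possible stand-in; the function actually drawn in Fig. 3 is not recoverable from the text and may differ.

```python
def mod2_like(y):
    """Continuous stand-in for 'y mod 2' on the inputs 0, -1, -2.

    Assumption: a parabola through the three required points; the
    function used in the paper's figure may have a different shape.
    """
    return 1.0 - (y + 1.0) ** 2

values = [mod2_like(y) for y in (0.0, -1.0, -2.0)]
```

Being smooth and non-periodic, such a function avoids the discontinuities of the true modulo operation while agreeing with it on the integers that the pre-modulo output is expected to take.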

III. RESULTS
Here we present results of the analysis for three examples of 4-band Hamiltonians in 1d, having chiral symmetry, inversion symmetry, and particle-hole symmetry, respectively. We show how the network correctly learns to distinguish trivial from non-trivial datasets in a way that gives interpretable information corresponding to relevant momentum-resolved quantities.

A. Hyperparameters
We selected the same hyperparameters throughout all of our examples, and they are listed here. The momentum space was discretized by $N_k = 100$ points. The gradient descent learning rate was taken to be $\eta = 0.1$, the rotation cutoff $\phi_{\max} = 0.1$, and the continuity parameter $\delta = 0.1$. The maximum numbers of performed learning and recursive steps before terminating the algorithm were $N_{\max} = 20$ and $M_{\max} = 20$, respectively. The trivial parent $H_0(k)$ was generated by picking a random symmetry-respecting matrix and duplicating it for all $N_k$ points. The second parent $H_1(k)$ was randomly generated by symmetrizing the output of Eq. (3) with $m = 4$. The two parents were then stacked with $N_p = 10$ randomly chosen symmetry-respecting trivial atomic-limit Hamiltonians to create two sets of $2n$-band Bloch Hamiltonians. Each of the stacked parents was then used to create $10^3$ children. To do so, we first deformed them by 10 independent single-site unitary rotations, and only then saved a child for each new unitary deformation. In this way we produced two datasets of $10^4$ children Bloch Hamiltonians corresponding to the two original parents.
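For bookkeeping, the hyperparameters listed above can be collected in a small configuration object, as in the following sketch. The class and field names are hypothetical; the values are those quoted in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AugmentationConfig:
    n_k: int = 100                 # momentum points N_k
    eta: float = 0.1               # gradient descent learning rate
    phi_max: float = 0.1           # rotation cutoff
    delta: float = 0.1             # continuity threshold
    n_max: int = 20                # max gradient descent steps N_max
    m_max: int = 20                # max recursive steps M_max
    m_harmonics: int = 4           # m in Eq. (3)
    n_stacked_parents: int = 10    # N_p
    children_per_stacked_parent: int = 1000
    warmup_rotations: int = 10     # rotations before the first child is saved

    @property
    def children_per_parent(self) -> int:
        """Total number of children generated from one original parent."""
        return self.n_stacked_parents * self.children_per_stacked_parent

cfg = AugmentationConfig()
```

The derived dataset size, `cfg.children_per_parent`, reproduces the $10^4$ children per parent stated in the text.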

B. Multiband 1d insulators with chiral symmetry
As our first illustration, we implement the protocol for topological classification of 4-band insulators in 1d with chiral symmetry. It is well known that all gapped systems within this symmetry class are distinguished by a $Z$ topological invariant, the so-called winding number [4]. Here we train a NN to fit a representation of the winding number without using any externally labeled data: the training data is entirely produced via the topological data augmentation procedure described in Secs. II B-II D. This symmetry class was also explored for 2-band systems in Ref. [1], using a different protocol for generating the training data, but with similar results. Bloch Hamiltonians satisfying chiral symmetry must obey $H(k) = -U_S^\dagger H(k) U_S$, with a unitary matrix $U_S$ chosen to be a diagonal matrix with the first half of its entries $+1$ and the last half $-1$, so that $H(k)$ consists of off-diagonal blocks. In order to symmetrize randomly generated Hermitian matrices $A$ and Bloch Hamiltonians $H(k)$ we then performed a symmetrization operation on them (equations not recovered from the source text). It was found that all generated symmetrized matrices $A$ could be connected via the gradient descent algorithm described in Sec. II A, creating a single dataset of interconnected matrices, in agreement with our anticipation. The base matrices defining the chiral atomic-limit Bloch Hamiltonians were therefore uniformly picked from that single dataset, taken to consist of $10^3$ matrices. (Note that, because there is only a single trivial atomic sector, the stacking procedure is not strictly necessary for this class of systems.) All other specifications needed for topological data augmentation are listed in Sec. III A.
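A natural way to enforce the chiral constraint is the projection $A \to (A - U_S^\dagger A U_S)/2$, and the winding number can be evaluated on a momentum grid from the phase of the determinant of the off-diagonal block. Both are standard constructions given here as a sketch; they are assumptions and not necessarily the operations used by the authors, and the overall sign of the winding number is a convention.

```python
import numpy as np

def chiral_symmetrize(A, U_S):
    """Project Hermitian A onto matrices obeying A = -U_S^dagger A U_S."""
    return (A - U_S.conj().T @ A @ U_S) / 2

def winding_number(Hk):
    """Discretized winding of det q(k), with q the upper-right block of H(k)."""
    half = Hk.shape[1] // 2
    dets = np.array([np.linalg.det(h[:half, half:]) for h in Hk])
    steps = np.angle(np.roll(dets, -1) / dets)   # phase increments, periodic in k
    return int(np.rint(steps.sum() / (2 * np.pi)))

def chiral_from_q(q_of_k, ks):
    """Build H(k) = [[0, q], [q^dagger, 0]] on the grid ks."""
    half = q_of_k(ks[0]).shape[0]
    Hk = np.zeros((len(ks), 2 * half, 2 * half), dtype=complex)
    for i, k in enumerate(ks):
        q = q_of_k(k)
        Hk[i, :half, half:] = q
        Hk[i, half:, :half] = q.conj().T
    return Hk

U_S = np.diag([1.0, 1.0, -1.0, -1.0])
ks = 2 * np.pi * np.arange(100) / 100
H_nontrivial = chiral_from_q(lambda k: np.diag([np.exp(1j * k), 1.0]), ks)
H_atomic = chiral_from_q(lambda k: np.eye(2, dtype=complex), ks)
```

In this example `H_nontrivial` carries one unit of winding in a single orbital pair, while `H_atomic` is momentum independent and has zero winding.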
At the training stage we employed a NN with the layout described in Sec. II E and depicted in Fig. 3. The convolutional part of the network was explicitly taken to consist of one 2d convolution layer of 512 filters with (2, 2) receptive fields and three 2d convolution layers of 256, 128, and 1 filters with (1, 1) receptive fields. The network was successfully trained without any modulo layer in this case, confirming that the learned topological number is a $Z$ invariant. In total our network has 296 038 trainable parameters.
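The quoted parameter total can be reproduced by a direct count, as sketched below. The count assumes 64 input channels (the $4n^2$ entries of an $8 \times 8$ matrix), a bias per convolution filter, and a dense layer with $N_k = 100$ weights plus one bias slot; the last point is an assumption about how the total was obtained, since the text specifies zero bias for the dense layer. The function name is hypothetical.

```python
def count_parameters(filters, in_channels=64, n_k=100, first_kernel=(2, 2)):
    """Count weights and biases of the convolution stack plus the dense layer."""
    total = 0
    kernel = first_kernel[0] * first_kernel[1]
    channels = in_channels
    for f in filters:
        total += kernel * channels * f + f   # weights + one bias per filter
        channels = f
        kernel = 1                           # all later layers use (1, 1) fields
    total += n_k + 1                         # dense layer: n_k weights + 1 bias slot
    return total

chiral_total = count_parameters([512, 256, 128, 1])
```

The same function applied to the filter lists quoted in Secs. III C and III D, `[128, 64, 32, 1]` and `[256, 128, 64, 1]`, gives 43 366 and 107 110, in agreement with the totals stated there.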
The training was done on $2 \cdot 9500$ samples and tested on $2 \cdot 500$ samples using a mean absolute error cost and the Adam optimizer. We trained over 2000 epochs with learning rate $\eta = 10^{-4}$. Importantly, we also effectively augmented the training dataset: before each epoch every Bloch Hamiltonian was translated by a random number of momentum sites and uniformly rotated using a random symmetry-respecting unitary operator.
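The per-epoch augmentation for the chiral class, a random momentum translation followed by a uniform symmetry-respecting rotation, can be sketched as follows. The block-diagonal form of the rotation (one unitary per chiral sector, so that it commutes with $U_S$) is an assumption about what "symmetry-respecting" means here, and the function names are hypothetical.

```python
import numpy as np

def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(Z)
    return Q

def chiral_unitary(d, rng):
    """Block-diagonal unitary that commutes with U_S = diag(+1, ..., -1, ...)."""
    half = d // 2
    V = np.zeros((d, d), dtype=complex)
    V[:half, :half] = random_unitary(half, rng)
    V[half:, half:] = random_unitary(half, rng)
    return V

def augment(Hk, rng):
    """Translate H(k) by a random number of sites and rotate it uniformly."""
    shift = rng.integers(Hk.shape[0])
    V = chiral_unitary(Hk.shape[1], rng)
    return V @ np.roll(Hk, shift, axis=0) @ V.conj().T

# Example on a simple chiral state H(k) = [[0, q], [q^dagger, 0]].
ks = 2 * np.pi * np.arange(100) / 100
Hk = np.zeros((100, 4, 4), dtype=complex)
for i, k in enumerate(ks):
    q = np.diag([np.exp(1j * k), 1.0])
    Hk[i, :2, 2:], Hk[i, 2:, :2] = q, q.conj().T
H_aug = augment(Hk, np.random.default_rng(0))
```

Both operations leave the winding number unchanged, so the augmented samples keep the label of the original state.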
The results are presented in Fig. 4. The network has successfully learned to distinguish the datasets corresponding to two ensembles of topologically equivalent children, Fig. 4a: the classification accuracy evaluated on the test set is 100%. Importantly, the NN layout allows us to interpret the obtained classification and find a local quantity corresponding to the learned topological index. In Fig. 4b-e we exemplify the learned local quantity on four representative systems from the test dataset.

C. Multiband 1D insulators with inversion symmetry
In our second example we focus on 4-band systems with inversion symmetry and implement our protocol for classifying them. The topological phases within this symmetry class are characterized by a $Z$ invariant $\nu_{IS} = n_0 - n_\pi$, with $n_{0/\pi}$ counting the number of negative-parity occupied eigenstates at the high-symmetry points $k = 0$ and $k = \pi$ [54,57].
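The index $\nu_{IS} = n_0 - n_\pi$ can be evaluated directly from the Hamiltonians at the two inversion-invariant momenta, as in the sketch below. It assumes a parity operator $U_{IS}$ that is Hermitian and squares to the identity (for example a diagonal matrix of $\pm 1$), so that $(1 - U_{IS})/2$ projects onto negative parity; the function names are hypothetical.

```python
import numpy as np

def negative_parity_count(H_sym, U_IS):
    """Number of occupied eigenstates of H_sym with inversion eigenvalue -1."""
    d = H_sym.shape[0]
    _, V = np.linalg.eigh(H_sym)
    occ = V[:, : d // 2]                      # half filling
    P_occ = occ @ occ.conj().T
    P_minus = (np.eye(d) - U_IS) / 2          # projector onto parity -1
    return int(np.rint(np.trace(P_occ @ P_minus).real))

def inversion_index(H0, Hpi, U_IS):
    """nu_IS = n_0 - n_pi from the Hamiltonians at k = 0 and k = pi."""
    return negative_parity_count(H0, U_IS) - negative_parity_count(Hpi, U_IS)

U_IS = np.diag([1.0, 1.0, -1.0, -1.0])
H0 = np.diag([-1.0, -1.0, 1.0, 1.0])    # occupied states have even parity
Hpi = np.diag([1.0, 1.0, -1.0, -1.0])   # occupied states have odd parity
nu = inversion_index(H0, Hpi, U_IS)
```

For a momentum-independent (atomic) Hamiltonian the two counts coincide and the index vanishes, regardless of which parity sector the occupied bands belong to.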
The inversion relation reads $H(k) = U_{IS}^\dagger H(-k) U_{IS}$, where $U_{IS}$ is some unitary matrix, here chosen to be the same as $U_S$, i.e. a diagonal matrix with $+1$ and $-1$ entries. The symmetrization of randomly generated Hermitian matrices $A$ and Bloch Hamiltonians $H(k)$ is done via a corresponding operation (equations not recovered from the source text). The generated symmetrized matrices $A$ were found to form three distinct blocks of matrices, with all matrices within each block connected via the gradient descent algorithm. Analytically, as discussed in the introduction, the disconnected blocks correspond to the three different combinations of inversion eigenvalues $\pm 1$ of the occupied eigenstates. Thus, for producing inversion-symmetric trivial atomic-limit Hamiltonians we uniformly picked matrices from these three blocks. Each block was taken to consist of $10^3$ generated matrices.
Other details of data generation are given in Sec. III A. We then trained a NN with the layout described in Sec. II E and illustrated in Fig. 3. The convolution layers of the net were chosen to have 128 filters for the first layer with (2, 2) receptive field, and 64, 32, and 1 filters for the other ones with (1, 1) receptive field. As in Sec. III B, the network was successfully trained without any modulo layer. In total there are 43 366 trainable parameters. The network was trained on $2 \cdot 9500$ states and tested on $2 \cdot 500$ states employing a mean absolute error cost and the Adam optimizer. The training was done over $10^3$ epochs with learning rate $\eta = 10^{-3}$. Before every epoch we performed a uniform symmetry-respecting rotation of each training state and by this effectively augmented the dataset. The outcome of our protocol is depicted in Fig. 5. The network has learned to differentiate between two ensembles of topologically equivalent children with 100% accuracy, Fig. 5a. By looking at the network's intermediate output we could also retrieve the local quantity corresponding to the learned topological index, as shown in Fig. 5b-e. Strikingly, the network learned the importance of the high-symmetry points $k = 0, \pi$ and simply ignores all other momentum points. It is important to note that this property was not built into the network by hand, and it highlights the flexibility of our NN layout, Fig. 3: the last dense layer in the network allows it to find important points, lines, and/or surfaces in momentum space without any external supervision.

D. Multiband 1D insulators with particle-hole symmetry
For our last example we implement the protocol for 1d topological classification in symmetry class D, composed of systems exhibiting a particle-hole symmetry. It is well known that the topological number distinguishing the phases within this symmetry class is a $Z_2$ number. The topologically nontrivial phase within this symmetry class exhibits very rich physics and opens up a road for the realization of robust Majorana end modes [58].
In first quantization the particle-hole symmetry is an antiunitary, anticommuting symmetry explicitly described by the relation $H(k) = -U_C^\dagger H^*(-k) U_C$ for some unitary operator $U_C$. Here we use the conventional representation of particle-hole symmetry that is basic within the BCS theory of superconductivity: we take $U_C = I \otimes \tau_x$ with a Pauli matrix $\tau_x$ and identity $I$. To symmetrize Hermitian matrices $A$ and Bloch Hamiltonians $H(k)$, a corresponding set of transformations is performed (equations not recovered from the source text). By applying the gradient descent algorithm from Sec. II A, it was found that the generated symmetrized matrices $A$ clustered into two blocks of matrices, and we uniformly picked matrices from them for producing trivial atomic-limit Bloch Hamiltonians respecting the particle-hole symmetry. These two blocks of trivial Hamiltonians would in the Majorana basis correspond to the Pfaffian at $k = 0$ and $k = \pi$ being $\pm 1$ [4,59]. In total there were $2 \cdot 10^3$ matrices generated. All the details on other aspects of the data generation are provided in Sec. III A. Again, in the training stage we employed the neural net layout shown in Fig. 3, with convolution layers of 256, 128, 64, and 1 filters. The first convolution layer operates with a (2, 2) receptive field and all other ones with a (1, 1) receptive field. There are 107 110 adjustable parameters in total. Without a modulo layer the network failed to separate the classes of the training data, but with the modulo layer depicted in Fig. 3 it succeeded. We trained the network on $2 \cdot 9500$ samples and tested it on $2 \cdot 500$ samples using the Adam optimizer and a mean absolute error cost function. The training was done over $10^3$ epochs with learning rate $\eta = 10^{-4}$. As in all previous cases, we effectively augmented the training dataset by doing a uniform symmetry-respecting rotation of every Bloch Hamiltonian before each epoch.
[Figure panel labels: (a) $y = 0.00$, (b) $y = -1.99$, (c) $y = -0.99$, (d) $y = -1.00$]
The network's output evaluated on a test dataset is shown in Fig. 6: two ensembles of topologically equivalent children are successfully differentiated by the network with high precision, Fig. 6a. In Fig. 6b we show the intermediate output produced by the network before applying the mod 2 operation: interestingly, we see the data corresponding to the topologically trivial ensemble clustered around two different values, $-2$ and $0$, confirming the $Z_2$ nature of the topological index. In Fig. 7, four examples of test states show the local quantity corresponding to the learned topological number. Again, the network has adjusted to look only at the high-symmetry points $k = 0, \pi$ and to neglect other momentum points.
These results demonstrate that the network, without explicit guidance, learns to focus on the pertinent information. In fact, it is known that these Hamiltonians can be classified by the product of the signs of the Pfaffian at the high-symmetry points $k = 0$ and $\pi$. Without extending the datasets with different sectors of trivial atomic bands the network would only learn to pick up one of the high-symmetry points, and it would fail to classify all Hamiltonians properly. One might train two networks to learn to separately classify the two symmetry points, but the corresponding topological indices would be a composition of topologically irrelevant and relevant invariants. Another alternative to the mod layer for this problem would be to use an additional dense layer with non-linear activation functions to effectuate an XOR classification of the output from the last convolution layer. However, the drawback of this is that it would allow for a drift of the last feature map output, with a corresponding loss of the interpretability of the results.
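The Pfaffian criterion mentioned above can be made concrete for an unstacked 4-band Hamiltonian with $U_C = I \otimes \tau_x$. The sketch below is a textbook-style construction, not the network's learned quantity: it rotates $H$ at $k = 0$ or $\pi$ into a Majorana basis where $H = iA$ with $A$ real and antisymmetric, and uses the closed-form Pfaffian of a $4 \times 4$ matrix. The particular basis change `W2` and the function names are assumptions.

```python
import numpy as np

TAU_X = np.array([[0, 1], [1, 0]], dtype=complex)
TAU_Z = np.array([[1, 0], [0, -1]], dtype=complex)
# Assumed Majorana rotation per particle-hole pair; it satisfies W2 tau_x W2^T = 1.
W2 = np.array([[1, 1], [-1j, 1j]], dtype=complex) / np.sqrt(2)

def majorana_form(H):
    """Real antisymmetric A with W H W^dagger = i A, for U_C = I (x) tau_x."""
    W = np.kron(np.eye(H.shape[0] // 2), W2)
    iA = W @ H @ W.conj().T
    return (iA / 1j).real

def pfaffian4(A):
    """Pfaffian of a 4 x 4 antisymmetric matrix."""
    return A[0, 1] * A[2, 3] - A[0, 2] * A[1, 3] + A[0, 3] * A[1, 2]

def z2_index(H0, Hpi):
    """0 if the Pfaffian signs at k = 0 and k = pi agree, 1 otherwise."""
    product = pfaffian4(majorana_form(H0)) * pfaffian4(majorana_form(Hpi))
    return 0 if product > 0 else 1

# Example: two decoupled particle-hole pairs with energies e1, e2 at each point.
H0 = np.kron(np.diag([1.0, 1.0]), TAU_Z)
Hpi = np.kron(np.diag([-1.0, 1.0]), TAU_Z)   # one pair is inverted at k = pi
index = z2_index(H0, Hpi)
```

In this example the sign of the Pfaffian changes between $k = 0$ and $k = \pi$, so the index is non-trivial; for a momentum-independent Hamiltonian the two signs agree irrespective of which of the two atomic sectors it belongs to.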

IV. CONCLUSION AND OUTLOOK
We present a novel protocol for using NNs to learn topological indices of Bloch Hamiltonians, extending previous work on 'topological data augmentation'. The protocol is characterized by the following features: i) It is unsupervised, i.e. training data is generated by randomly deforming parent states while ensuring the topological integrity of the datasets, without using any prior knowledge or specific models except a specific representation of the symmetry class to constrain the data. ii) The NNs are specifically designed to be interpretable, in the sense that the single feature map of the last convolution layer gives momentum-resolved information about the learned quantities. In this way the network can, for example, learn to single out relevant high-symmetry momentum points. iii) By extending the training data samples with a complete set of symmetry-respecting atomic insulators (specifically here, stacking 4 bands on 4 bands) the network learns to pick up only relevant topological invariants that do not discriminate between topologically disconnected but trivial atomic-limit Hamiltonians.
The protocol takes a next step towards the goal of using machine learning and NNs to identify unknown topological invariants that go beyond the already well-established non-interacting and translationally invariant systems with spatial or internal symmetries. Areas to explore could be periodically driven (Floquet) topological phases [60,61], non-Hermitian topological matter [45,62], interacting systems [63], and others. The NNs used in this work are very small compared to state-of-the-art deep learning networks, which opens the way to extending the protocol to systems with less symmetry (e.g. disordered systems) that would require dense matrix input which is not in a block-diagonal form. A nice ingredient of the network structure for the examples presented in this work and in Ref. [1] is the interpretability that follows from the construction, where the network has local operations in momentum space until the very last layers. It remains to be explored if and how this can be extended to problems where local measures may not be sufficient. Nevertheless, the interpretability is a bonus which is not strictly necessary for the general topological data augmentation procedure and network classification.