Drawing Phase Diagrams of Random Quantum Systems by Deep Learning the Wave Functions

Applications of neural networks to condensed matter physics are becoming popular and beginning to be well accepted. Obtaining and representing the ground and excited state wave functions are examples of such applications. Another application is analyzing the wave functions and determining their quantum phases. Here, we review the recent progress in using the multilayer convolutional neural network, so-called deep learning, to determine the quantum phases in random electron systems. After training the neural network by supervised learning of wave functions in restricted parameter regions of known phases, it can determine the phases of wave functions over wide parameter regions, including unknown phases; hence, phase diagrams are obtained. We demonstrate the validity and generality of this method by drawing the phase diagrams of two- and higher-dimensional Anderson metal-insulator transitions and quantum percolations as well as disordered topological systems such as three-dimensional topological insulators and Weyl semimetals. Both real-space and Fourier space wave functions are analyzed. The advantages and disadvantages over conventional methods are discussed.


Introduction
More than seven decades have passed since McCulloch and Pitts studied artificial neurons. 1) Using artificial neurons, Rosenblatt proposed a layered neural network called the perceptron, 2) which consists of input and output layers together with intermediate layers. The connections between layers, called weight parameters, can be changed to reproduce the correct input-output relations. This process of tuning the parameters is essentially "machine learning." The idea of the multilayer perceptron 3) is simple and can be applied to solve many problems, but it is only in the last decade that our computers have become powerful enough to solve complicated problems via large-scale multilayer neural networks, called deep learning. [4][5][6][7] Computational physics has successfully solved many problems in solid-state physics, and it is natural to use machine learning, including deep learning, to tackle complicated problems.
Random electron systems show the Anderson-type metal-insulator transition [211][212][213] (so-called Anderson transition, also called delocalization-localization transition), quantum percolation transition, 214) topological-nontopological transitions, 215,216) and semimetal-insulator 217,218) and semimetal-metal transitions. [219][220][221][222] The wave functions of random/interacting quantum systems are difficult to analyze owing to their large fluctuations, but a trained CNN has been shown to detect quantum phase transitions. It can detect topological states in one-dimensional (1D) systems, 223,224) the two-dimensional (2D) Anderson transition and topological transitions such as the band-to-Chern insulator transition, 208,[225][226][227][228] the 2D topological superconductor transition, 229) higher-order topological insulators, 230) 3D Anderson and quantum percolation transitions, 231) as well as 3D topological phase transitions [232][233][234][235] such as topological insulators and Weyl semimetals. Quantum chaos 236) is related to random electron systems and is also studied using neural networks. 237,238) The interplay of randomness and interaction is attracting renewed interest from the viewpoint of many-body localization, [239][240][241][242][243] where the hypothesis of "eigenstate thermalization" no longer applies, and machine learning is again shown to be powerful in recognizing whether the phase thermalizes. [244][245][246][247][248][249][250][251][252][253][254][255][256] In this paper, we review the application of the CNN to draw phase diagrams of random quantum systems. In the next section, we explain the methods, followed by a section on models and results, where the Anderson metal-insulator transitions and quantum percolation transitions in various dimensions, as well as the 3D topological insulator and Weyl semimetal transitions, are discussed. The last section is devoted to a summary and concluding remarks.
An exhaustive overview of the extensive machine learning literature is beyond the scope of this review; here we focus on drawing the phase diagrams of quantum phase transitions in random systems, and we apologize that many other aspects of machine learning approaches in condensed matter physics will not be covered.

Methods
To draw the phase diagrams, we use the CNN consisting of three types of layers: convolutional layers, pooling layers, and fully connected layers. The basic structure of the CNN is illustrated in Fig. 1. This type of CNN has proved to be very powerful for image recognition. A famous example is LeNet. 4) Given the input to the first layer, the output of one layer propagates to the input of the next layer, and finally, the output of the last layer is obtained.
The CNN is therefore a type of feedforward network. The detailed process of the CNN is as follows.
We consider the electron density |ψ(x_i)|^2 at x_i (site index i) as input u^(0)_i. In the first convolutional layer, cells of a certain size are cut out from the input u^(0) (see the small cubes inside "Input: Wave Function" in Fig. 1) and transformed by

u^(1)'_{j,k} = W^(1)_k · u^(0)_j + b^(1)_k ,  (1)

where u^(0)_j denotes the component of u^(0) in the jth cell, which is originally a tensor of rank d (d being the dimensionality of the system) but arranged one-dimensionally, and W^(1)_k and b^(1)_k are the weight and bias parameters of channel k (1 ≤ k ≤ C_1), respectively. The weight parameter W^(1)_k has the same dimension as u^(0)_j and does not depend on the position j at which the cell is cut out. During the training of the CNN, W and b are optimized to reproduce the input (eigenfunction)-output (material phase) relations. The convolution process corresponds to extracting the local features of the input data. We stride the position at which the cell is cut out and obtain the output u^(1)', so that we obtain C_1 images from one input image. We then apply the rectified linear unit (ReLU) to the output,

u^(1)_{j,k} = max(0, u^(1)'_{j,k}),  (2)

where max(0, x) acts as an activation function expressing the firing of neurons. Note that the size of the cell, the stride value, and the numbers of channels are hyperparameters that cannot be optimized by training and need to be chosen appropriately a priori.

[Fig. 1 caption: Convolutional and pooling layers are repeated, and finally the material phase to which the eigenfunction belongs is output through a fully connected layer. Here, the channel number of the first layer, C_1, is 5. d-dimensional inputs (2D, four-dimensional (4D), and higher) are realized with a similar configuration.]
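The first-layer transformation, Eqs. (1) and (2), can be sketched in a few lines of NumPy. The cell size, stride, and channel number below are illustrative hyperparameters, not the values used in this review, and the loop-based implementation stands in for the optimized convolutions of a deep learning framework:

```python
import numpy as np

def conv_layer(u0, W, b, stride=2):
    """First convolutional layer, Eqs. (1)-(2): cut out cells of the input
    density u0 (a 3D array), take the dot product with each channel's weight
    W[k], add the bias b[k], and apply the ReLU activation."""
    C1, cell = W.shape[0], W.shape[1]      # channels, linear cell size
    L = u0.shape[0]
    n = (L - cell) // stride + 1           # number of cells per direction
    Wflat = W.reshape(C1, -1)              # each channel's weights, flattened
    out = np.empty((C1, n, n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # rank-3 cell arranged one-dimensionally, as in the text
                patch = u0[i*stride:i*stride+cell,
                           j*stride:j*stride+cell,
                           k*stride:k*stride+cell].ravel()
                out[:, i, j, k] = Wflat @ patch + b     # Eq. (1)
    return np.maximum(0.0, out)            # ReLU, Eq. (2)
```

For an 8^3 input with cell size 3 and stride 1, this produces C_1 feature maps of size 6^3, all entries non-negative after the ReLU.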
In general, the selection of these hyperparameters affects the learning accuracy achieved in training.
In the second and subsequent convolutional layers, the convolution process is performed over all channels. The transformation from the (n-1)th layer to the nth layer reads

u^(n)'_{j,k} = Σ_{k'=1}^{C_{n-1}} W^(n)_{k,k'} · u^(n-1)_{j,k'} + b^(n)_k ,  (3)

where C_n and C_{n-1} denote the total numbers of channels in the nth and (n-1)th layers, respectively. As with the first convolutional layer, we apply the ReLU to the output as an activation function, u^(n)_{j,k} = max(0, u^(n)'_{j,k}). In the pooling layer, located mainly after a convolutional layer, the maximum value in each cell is chosen,

u^(n)_{j,k} = max_{j' ∈ cell j} u^(n-1)_{j',k} .  (4)

The number of channels is the same before and after this layer, since no sum over channels is taken. The pooling process corresponds to removing noise and is useful for reducing the dimension of the input data.
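The max pooling of Eq. (4) is a standard operation; a minimal NumPy sketch for a 3D feature map (a pooling cell size of 2 is assumed for illustration) is:

```python
import numpy as np

def max_pool(u, cell=2):
    """Max pooling, Eq. (4): take the maximum over non-overlapping cells of
    size cell^3, independently for each channel. Input u has shape
    (channels, L, L, L); the channel number is unchanged."""
    C, L = u.shape[0], u.shape[1]
    n = L // cell
    v = u[:, :n*cell, :n*cell, :n*cell]
    # split each spatial axis into (n, cell) blocks and take the max per block
    v = v.reshape(C, n, cell, n, cell, n, cell)
    return v.max(axis=(2, 4, 6))
```

Each 2 x 2 x 2 cell is reduced to its maximum, halving the linear size of the feature map while keeping the number of channels fixed, as stated in the text.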
In the fully connected layer, located mainly before the final output of the CNN, the multidimensional output of the convolutional or pooling layers is flattened into a 1D vector u. It is then transformed by

u'_q = max(0, w_q · u + b_q),  (5)

u''_r = Σ_q W_{r,q} u'_q + b_r ,  (6)

where q denotes the component of the vector u' and r denotes the index of each material phase. (Following LeNet, we consider a fully connected part consisting of two layers, but in a simple case, it is realized in one layer, u''_r = W_r · u + b_r.) In the case of the Anderson model, r = 0 and 1 correspond to the localized and delocalized phases, respectively.
In the final stage, we apply the softmax function to the last output u''_r,

u^(out)_r = exp(u''_r) / Σ_{r'} exp(u''_{r'}),  (7)

and obtain the final output u^(out)_r, which represents the "confidence" or "probability" P_r that the eigenfunction belongs to the phase of index r.
To obtain a meaningful final output u^(out)_r, it is necessary to optimize W and b in each layer. In classification problems such as quantum phase determination, it is appropriate to update these parameters to minimize the cross entropy,

E = - Σ_i Σ_r P'_{r,i} log P_{r,i} ,  (8)

which is closely related to the maximum likelihood estimation, 7) where P'_{r,i} is the desired output value for sample i (1 for the correct phase and 0 otherwise) and P_{r,i} is the corresponding output of the CNN.
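Equations (7) and (8) together define the training objective. A small sketch follows; the max-shift inside the softmax is a standard numerical-stability detail not spelled out in the text:

```python
import numpy as np

def softmax(u):
    """Eq. (7): map the last-layer outputs u''_r to probabilities P_r."""
    z = np.exp(u - u.max(axis=-1, keepdims=True))  # shift for stability
    return z / z.sum(axis=-1, keepdims=True)

def cross_entropy(P, P_target):
    """Eq. (8): E = -sum_i sum_r P'_{r,i} log P_{r,i}, averaged over samples.
    P_target is one-hot: 1 for the correct phase, 0 otherwise."""
    return -(P_target * np.log(P)).sum(axis=-1).mean()
```

A confident, correct prediction gives a smaller cross entropy than a confident, wrong one, which is what gradient-based training exploits.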

Models and Results
We train and use the CNN to analyze the eigenfunctions obtained by diagonalizing the tight-binding Hamiltonian on a hypercubic lattice,

H = Σ_{⟨x,x'⟩} V_{x,x'} c†_x c_{x'} + Σ_x v_x c†_x c_x ,  (9)

where x indicates the position in d-dimensional space, c†_x (c_x) the creation (annihilation) operator at site x, and V_{x,x'} the transfer between sites x and x', with x, x' restricted to nearest neighbors. v_x is the random potential at site x. In the following, we consider square (2D), cubic (3D), and four-dimensional (4D) hypercubic lattices. When we include spin and orbital degrees of freedom, c_x becomes a vector and V_{x,x'} a matrix.
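As an illustration of Eq. (9), the following sketch builds the Anderson Hamiltonian (orthogonal class, V_{x,x'} = 1) on a small d = 2 lattice with periodic boundary conditions and extracts the eigenstate closest to E = 0. The lattice size is far smaller than those used in this review, and dense diagonalization stands in for the sparse solvers used in practice:

```python
import numpy as np

def anderson_hamiltonian(L, W, d=2, seed=0):
    """Eq. (9) on an L^d hypercubic lattice: V_{x,x'} = 1 between nearest
    neighbors (periodic boundaries), v_x uniform in [-W/2, W/2]."""
    rng = np.random.default_rng(seed)
    N = L**d
    H = np.diag(rng.uniform(-W / 2, W / 2, size=N))
    sites = np.arange(N).reshape((L,) * d)
    for axis in range(d):
        neighbor = np.roll(sites, -1, axis=axis)   # +1 shift with wraparound
        for x, xp in zip(sites.ravel(), neighbor.ravel()):
            H[x, xp] = H[xp, x] = 1.0
    return H

def state_nearest_band_center(H):
    """Return |psi|^2 of the eigenstate whose energy is closest to E = 0."""
    E, V = np.linalg.eigh(H)
    i = np.argmin(np.abs(E))
    return np.abs(V[:, i])**2
```

The resulting density |psi(x_i)|^2 is exactly the quantity fed to the CNN as input u^(0)_i.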
The universality class of the random electron system [264][265][266][267][268][269][270] is determined by the basic symmetries of the Hamiltonian, such as time-reversal symmetry (TRS) and spin-rotation symmetry (SRS). Systems with broken TRS belong to the unitary class. Systems with both TRS and SRS belong to the orthogonal class, whereas those with TRS but broken SRS belong to the symplectic class. In our model, we change the universality class by modifying the transfer V x,x ′ . In the absence of a magnetic field and spin-orbit interaction, we take V x,x ′ = 1, and the system belongs to the orthogonal class. The choice V x,x ′ = exp(iθ x,x ′ ) describes the presence of a magnetic field, which breaks TRS; hence, the systems belong to the unitary class. To discuss the effect of spin-orbit interaction, V x,x ′ is set to SU(2) matrices [271][272][273][274] with the site potential v x independent of spin. In this case, TRS is preserved but SRS is broken; hence, the systems belong to the symplectic class.
Similar tight-binding models are used to discuss topological materials: in the case of the 3D topological insulator, V x,x ′ is set to be proportional to Dirac gamma matrices, 275) whereas in the case of the 3D Weyl semimetal, V x,x ′ is set to be proportional to Pauli matrices. 222,276) See Eqs. (15) and (21).
In the case of quantum percolation, we set the nearest neighbors to be connected randomly, i.e., V_{x,x'} is finite or 0 randomly. In this paper, we consider the site percolation problem, where the sites are occupied randomly with probability p, and nearest-neighbor sites are connected only when both are occupied. The connected sites form clusters, and when p ≥ p_c, a cluster connecting one side of the system to the other appears. This p_c is called the classical percolation threshold. For p < p_c, all the clusters are isolated, and the wave functions on them cannot extend over the whole system; thus, the system is always an insulator. The metal phase, however, does not necessarily appear for p ≥ p_c, because the wave functions on a cluster may remain localized even if the cluster extends over the whole system. The condition p > p_c is, therefore, necessary but not sufficient. Only when p ≥ p_q ≥ p_c does the current flow, where p_q is the quantum percolation threshold. See Fig. 2, where cases of 2D site percolation, whose percolation threshold is p_c = 0.5927…, are shown. Note that all the states are localized in two dimensions 277) except for the symplectic class. We therefore assume SU(2) transfer [see Eq. (13)] to observe the localization-delocalization transition in the 2D quantum percolation problem. In the following subsections, we draw various phase diagrams for many types of disordered systems using the CNN.
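The classical distinction between p < p_c and p ≥ p_c can be checked with a simple sketch: occupy sites with probability p and test, via breadth-first search, whether an occupied cluster connects the left and right edges. The 2D lattice, free boundaries, and breadth-first search are illustrative choices, not the setup of the review:

```python
import numpy as np
from collections import deque

def spans(occupied):
    """True if an occupied cluster connects column 0 to column L-1 of a 2D
    square lattice (free boundaries, nearest-neighbor bonds)."""
    L = occupied.shape[0]
    seen = np.zeros_like(occupied, dtype=bool)
    q = deque((i, 0) for i in range(L) if occupied[i, 0])
    for i, _ in list(q):
        seen[i, 0] = True
    while q:
        i, j = q.popleft()
        if j == L - 1:
            return True                      # reached the opposite edge
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < L and 0 <= b < L and occupied[a, b] and not seen[a, b]:
                seen[a, b] = True
                q.append((a, b))
    return False
```

Averaging `spans` over many random configurations at fixed p estimates the spanning probability, which rises sharply near p_c ≈ 0.5927 for large 2D lattices; the quantum threshold p_q discussed in the text lies above this value.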

3D Anderson model and quantum percolation
We first consider the 3D Anderson model of localization, 211) where v_x is randomly and uniformly distributed in the range [-W/2, W/2], with W the disorder strength. Conventional notation uses "W" both for the weight parameters and for the strength of disorder. In this paper, we follow this convention, but to avoid confusion, the weight parameters are written as vectors or with indices, as in Eqs. (1), (3), and (6), whereas the disorder strength is written as a scalar.
At energy E = 0, i.e., at the center of the band, the wave functions are delocalized when W < W_c and the system is a metal. For W > W_c, the wave functions are exponentially localized and the system is an Anderson insulator (AI). Here, the critical disorder W_c is estimated to be 16.54 ± 0.01 by the finite-size scaling analysis of the Lyapunov exponent calculated by the transfer matrix method. 278,279)

3D quantum percolation model.- We next consider the 3D quantum site percolation model described by the following Hamiltonian, [280][281][282][283]

H = Σ_{⟨x,x'⟩} V_{x,x'} c†_x c_{x'} ,  (10)

where the transfer V_{x,x'} is defined as

V_{x,x'} = 1 if both sites x and x' are occupied, and 0 otherwise.  (11)

We take the energy unit to be the absolute value of the transfer energy between connected bonds. In the present case of site percolation, each site is filled with probability p_s, and a bond is connected only when both nearest-neighbor sites are filled. For each realization of site percolation, we identify the maximally connected cluster and analyze the states on this cluster.
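The identification of the maximally connected cluster can be sketched by a flood fill over occupied sites. Pure Python on a small 3D lattice with open boundaries is used here for transparency; in practice, a union-find structure or a labeling routine such as scipy.ndimage.label does the same job faster:

```python
import numpy as np

def largest_cluster(occupied):
    """Return a boolean mask of the maximally connected cluster of occupied
    sites on a 3D lattice (nearest-neighbor bonds, open boundaries)."""
    seen = np.zeros(occupied.shape, dtype=bool)
    best = []
    for start in zip(*np.nonzero(occupied)):
        if seen[start]:
            continue
        stack, cluster = [start], [start]   # flood fill from a new seed site
        seen[start] = True
        while stack:
            i, j, k = stack.pop()
            for d in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                      (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                a, b, c = i + d[0], j + d[1], k + d[2]
                if (0 <= a < occupied.shape[0] and 0 <= b < occupied.shape[1]
                        and 0 <= c < occupied.shape[2]
                        and occupied[a, b, c] and not seen[a, b, c]):
                    seen[a, b, c] = True
                    stack.append((a, b, c))
                    cluster.append((a, b, c))
        if len(cluster) > len(best):
            best = cluster
    mask = np.zeros(occupied.shape, dtype=bool)
    for site in best:
        mask[site] = True
    return mask
```

The Hamiltonian of Eq. (10) is then restricted to the sites of this mask before diagonalization.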
According to studies of quantum percolation, 283,284) strongly localized states, so-called molecular states, exist at particular energies such as E = 0, ±1, ±√2, ±√3. These states are peculiar to the quantum percolation model and are degenerate, resulting in strong peaks in the density of states. Owing to the degeneracy, any linear combination of them is an eigenstate, which may make it difficult to judge whether a state is delocalized or localized. We therefore add a weak site random potential of the order of 10^-3, namely, we add Σ_x v_x c†_x c_x to the Hamiltonian with v_x ∈ [-10^-3/2, 10^-3/2], to lift the degeneracy. In both the Anderson and quantum percolation models, we consider a 40 × 40 × 40 simple cubic lattice and impose periodic boundary conditions. The position of the maximum modulus of the eigenfunction is shifted to the center of the system to improve the accuracy of the machine learning.
For training the neural network, we vary W in the Anderson model in the range W ∈ [14, 16], where W < W_c = 16.54 ± 0.01 (metal phase, where the wave function is delocalized), and for each W, we diagonalize the Hamiltonian via sparse matrix diagonalization and obtain the eigenfunction closest to the band center, E = 0. We choose 4,000 different W to prepare 4,000 eigenfunctions and label them "delocalized." We then change the range of W to W ∈ [17, 19] (insulating phase, where the wave function is localized), prepare 4,000 eigenfunctions, and label them "localized." We then set u^(0)_i = |ψ(x_i)|^2, feed these to the 3D CNN, and train the neural network so that it recognizes the localized and delocalized states with high accuracy, typically > 99%.

Result for Anderson model.- In Fig. 3(a), we show the probability that the states at E = 0 are localized, E being the Fermi energy. The test data are 100 eigenfunctions with various W, and the average over five samples is taken, as in Fig. 1 of Ref. 232. This figure shows the transition from a delocalized to a localized phase around W_c, from which we confirm that the CNN correctly detected the Anderson transition.
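The data-preparation loop described above can be sketched as follows. A short 1D chain stands in for the 40^3 lattice so that dense diagonalization suffices, and only the disorder windows [14, 16] and [17, 19] are taken from the text; note that in 1D both windows are in fact deep in the localized regime, so this is purely a pipeline illustration:

```python
import numpy as np

def sample(W, L=64, seed=0):
    """One |psi|^2 of the eigenstate closest to E = 0 for a 1D Anderson chain
    (open boundaries, unit hopping, box-distributed site potential)."""
    rng = np.random.default_rng(seed)
    H = np.diag(rng.uniform(-W / 2, W / 2, L)) + np.eye(L, k=1) + np.eye(L, k=-1)
    E, V = np.linalg.eigh(H)
    return np.abs(V[:, np.argmin(np.abs(E))])**2

def make_dataset(n_per_phase=10):
    """Label 1 ('delocalized') for W drawn from [14, 16] and
    0 ('localized') for W drawn from [17, 19], one state per realization."""
    rng = np.random.default_rng(1)
    data, labels = [], []
    for i in range(n_per_phase):
        data.append(sample(rng.uniform(14, 16), seed=i)); labels.append(1)
        data.append(sample(rng.uniform(17, 19), seed=1000 + i)); labels.append(0)
    return np.array(data), np.array(labels)
```

The arrays returned here play the role of the 8,000 labeled eigenfunctions fed to the 3D CNN in the text.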
We then prepare eigenfunctions throughout the energy spectrum with varying W and let the machine (CNN) determine the phase. In Fig. 3(b), we plot 0 × P_loc + 1 × P_deloc = P_deloc as a heat map. The sharp change in color from red to blue indicates the phase boundary, which is in good agreement with previous results. [285][286][287] Near the band edges, even for small W ≈ 0.5 W_c, the machine judged the eigenstates to be localized. These states near the band edges are localized because of potential localization with little quantum interference. We note that the CNN was trained only with eigenfunctions around E = 0, where the localization is caused by quantum interference due to multiple scattering, not by potential localization.
Application to quantum percolation.- Now, we apply the CNN trained for the Anderson model at the band center [green arrows in Fig. 3(b)] to obtain the phase diagram of 3D quantum percolation. Owing to the random connection between the sites, the transfer matrix method is not applicable [see Eq. (A·2)]. In addition, the density of states is spiky for quantum percolation. Drawing the phase diagram is therefore more difficult than in the case of the Anderson model. The resulting phase diagram is shown in Fig. 4, where the dashed line indicates the classical percolation threshold p_c ≈ 0.312, above which the sites percolate. 288,289) We see that the quantum percolation transition occurs well above 0.312, which means that even if the sites percolate, the wave functions on them may remain localized. The quantum percolation threshold p_q depends on the energy nonmonotonically. We emphasize that the CNN used to draw this phase diagram was trained in a small region of the phase diagram of the Anderson model, indicated by the green arrows in Fig. 3(b), with no additional training for quantum percolation.

Generalization capability
As we have seen above, once the CNN is trained in small regions of the phase diagram, it can determine the phase outside the training region [ Fig. 3(b)] as well as the phase of a different model such as the quantum percolation model (Fig. 4). Thus, we have demonstrated the generalization capability of the CNN.
We can further test the generalization capability by changing the site potential distribution of the Anderson model away from the random box distribution, whose probability distribution is

P(v_x) = 1/W for |v_x| ≤ W/2, and 0 otherwise.  (12)

We can also break the TRS by adding random phases to the transfer, V_{x,x'} = exp(iθ_{x,x'}), with θ_{x,x'} uniformly distributed in [0, 2π). 290) Figure 6 shows the phase diagram. The cross (×) indicates the estimate by the transfer matrix method, 291) which is consistent with the present results. To compare with the phase diagram of the orthogonal class, Fig. 3(b), we scale the disorder strength W by the critical disorder of the orthogonal class, W_c = 16.54. The states at E = 0, W/W_c = 1.05 are localized in the orthogonal class [Fig. 3(b)] but are delocalized in Fig. 6. This is contrary to the naive expectation that the addition of random phases results in stronger disorder and hence enhanced localization. In fact, the effect of breaking TRS, which causes delocalization, overcomes the effect of the additional randomness, leading to random-magnetic-field-induced delocalization. This nontrivial feature of the phase diagram is correctly captured by the CNN.

2D SU(2) model and quantum percolation
In random non-interacting electron systems, all the states are localized in two dimensions and there is no metal phase. 277) In the presence of spin-orbit scattering, however, electron states can be extended, 267,268,292) and the system undergoes a metal-insulator transition as the disorder strength or the Fermi energy is changed.
To incorporate the spin-orbit interaction in the tight-binding model, Eq. (9), we choose the transfers V_{x,x'} to be SU(2) matrices. [271][272][273] To analyze the localization-delocalization transition, which is characterized by the divergence of the localization/correlation length ξ, other length scales such as the spin-precession length should be much shorter than ξ. We therefore take V_{x,x'} to be random. 272,273) Of all the choices of the probability distribution of V_{x,x'}, we take the invariant Haar measure,

V_{x,x'} = [ e^{iα} cos β   e^{iγ} sin β ; -e^{-iγ} sin β   e^{-iα} cos β ],  (13)

with α and γ uniformly distributed in the range [0, 2π). The probability density of β is

P(β) = sin(2β), 0 ≤ β ≤ π/2.  (14)

The Anderson transition of this 2D system is well detected by the CNN. In Fig. 7, we plot the probability P_deloc. The sharp change in color at the metal-insulator phase boundary (dashed line, the estimate of the transfer matrix method 274)) indicates that the CNN has correctly detected the Anderson transition. As in the case of the 3D Anderson transition, once the CNN is trained for a regular square lattice, we can apply it to the quantum percolation model, where the lattice is random. Figure 8 shows the phase diagram of the 2D quantum percolation for the symplectic class. As in the 3D case, the 2D quantum percolation threshold is significantly higher than the classical percolation threshold (dashed line) of p_s^classical ≈ 0.5927…. 214) Note that there is no Anderson transition for the orthogonal class. In the case of the unitary class, the quantum Hall transition [293][294][295][296] takes place in high magnetic fields or in a Chern insulator. The supervised training approach is also valid for this quantum Hall transition, 208,225) from which the critical exponent has been extracted. 297)
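Equations (13) and (14) can be sampled directly: α and γ are uniform, and β follows from inverse-transform sampling of P(β) = sin 2β, whose cumulative distribution is sin²β, so β = arcsin √u for u uniform in [0, 1). A sketch:

```python
import numpy as np

def haar_su2(rng):
    """One SU(2) transfer matrix drawn from the invariant Haar measure,
    Eqs. (13)-(14)."""
    alpha, gamma = rng.uniform(0, 2 * np.pi, size=2)
    beta = np.arcsin(np.sqrt(rng.uniform()))   # P(beta) = sin(2 beta)
    return np.array(
        [[np.exp(1j * alpha) * np.cos(beta),  np.exp(1j * gamma) * np.sin(beta)],
         [-np.exp(-1j * gamma) * np.sin(beta), np.exp(-1j * alpha) * np.cos(beta)]])
```

Each draw is unitary with unit determinant, as required for an SU(2) transfer; one such matrix is attached to every nearest-neighbor bond of the lattice.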

Anderson transition and quantum percolation in higher dimensions
It is instructive to discuss the Anderson transition and quantum percolation in higher dimensions. The critical exponent ν is known to be 1/2 in the limit of infinite dimensions, [298][299][300] and the Borel-Padé approximation 301) successfully interpolates the exponents from low to high dimensions. 302,303) The Anderson transition in four dimensions can be studied, for example, using the quantum kicked rotor with amplitude modulation realized in atomic matter waves. 304) For human beings, wave functions in 4D space are difficult to imagine and analyze, since our eyes and brains are trained to observe 2D and 3D images. For a machine, it does not matter whether the images are 3D or 4D. As in the case of the 3D Anderson transition and quantum percolation, we prepare 2,000 4D wave functions in the metal phase and 2,000 in the localized phase, train the CNN, and obtain the phase diagram of the 4D Anderson model, Fig. 9(a), where the disorder strength W is scaled by W_c^4D ≈ 34.62 302) and the green arrows indicate the training region. The CNN trained for the 4D Anderson model is then used to draw the phase diagram of 4D quantum site percolation, Fig. 9(b). Again, the quantum percolation threshold, i.e., the localization-delocalization phase boundary, is well above the classical percolation threshold (dashed line).

3D topological matter
Some of the band insulators are now recognized as topological insulators, [306][307][308][309][310] where the bulk wave functions have nontrivial topology. As a consequence, the interface between the bulk (nontrivial) and vacuum (trivial) shows edge/surface states. Another interesting topological material is the 3D Weyl semimetal, 311,312) where the hybridization of surface states and bulk Weyl nodes appears. See Figs. A·1 and A·2 in Appendix. Here, we use the CNN to detect these novel surface states of 3D topological insulators and Weyl semimetals. One of the advantages of detecting the surface states is that we can detect the topological phase even in the presence of randomness, which breaks translational invariance.

3D topological insulators
We first consider the topological insulators using the Wilson-Dirac-type tight-binding Hamiltonian, 275,313)

H = Σ_x Σ_{μ=x,y,z} [ c†_{x+e_μ} ( (it/2) α_μ - (m_{2,μ}/2) β ) c_x + h.c. ] + Σ_x c†_x [ (m_0 + Σ_μ m_{2,μ}) β + v_x ] c_x ,  (15)

where c†_x (c_x) is a four-component creation (annihilation) operator on a simple cubic lattice at site x, and e_μ is a unit vector in the μ-direction. α_μ and β are gamma matrices defined by

α_μ = τ_x ⊗ σ_μ ,  β = τ_z ⊗ σ_0 ,  (16)

where σ_μ and τ_μ are Pauli matrices that act on the spin and orbital degrees of freedom, respectively. m_0 is the mass parameter, and m_{2,μ} and t are transfer energies. In the absence of randomness, the energy band reads

E(k) = ± sqrt( Σ_μ t² sin²k_μ + m(k)² ),  m(k) = m_0 + Σ_μ m_{2,μ} (1 - cos k_μ).  (17)

The random potential v_x is uniformly and independently distributed in [-W/2, W/2]. Systems of size 24 × 24 × 24 are diagonalized numerically, and the state whose eigenenergy is closest to the band center E = 0 is taken. In the following, we set t = 2 and m_{2,z} = 0.5. In this case, the ordinary insulator (OI) phase appears for m_0 > 0, the strong topological insulator (STI) phase for 0 > m_0 > -1, the weak topological insulator (WTI) phase with weak index (001) for -1 > m_0 > -2, and the WTI phase with weak index (111) for -2 > m_0 > -3. 314,315) The eigenfunctions for the state |ν⟩ have four components due to the spin and orbital degrees of freedom and are denoted as ψ_ν(x, y, z, i) (i = 1, 2, 3, 4). We define a 3D image by

|ψ_ν(x, y, z)|² = Σ_{i=1}^{4} |ψ_ν(x, y, z, i)|² .  (18)

In Ref. 232, the 3D wave function is mapped to a 2D image by integrating |ψ(x, y, z)|² over one direction, for example, the z-direction, and the surface states that extend parallel to the z-direction become edge states in the 2D image. This method, however, has difficulty in distinguishing the STI from the WTI (111). In this paper, we use 3D image recognition to distinguish these different topological phases.
To prepare the training data, we calculate eigenfunctions along 1D lines in the (m_0, W) parameter space where the phases are known, and label them accordingly; the resulting phase diagram is shown in Fig. 10(a). With increasing disorder W, the OI phase can change into the STI phase. This is called the topological Anderson insulator (TAI) transition. [317][318][319] The present method captures the TAI and gives a phase diagram quantitatively consistent with that obtained by the transfer matrix method. 216,316) It should be emphasized that training along a few finite 1D lines in a 2D parameter space enables us to draw the whole phase diagram. We also note that the phase boundary between the OI and the STI is colored red, which indicates that the phase on the phase boundary is a metal. In fact, the Dirac semimetal continues to exist on the phase boundary even in the presence of disorder. 221,320)

So far, we have considered 3D wave functions in real space, but with a small additional numerical cost, the wave functions in Fourier space (k-space) can be calculated,

ψ(k_x, k_y, k_z, i) ∝ Σ_{x,y,z} sin(k_x x) e^{-i(k_y y + k_z z)} ψ(x, y, z, i),  (19)

where the fixed boundary condition in the x-direction is taken into account, and k_y = 2πn_y/L_y, k_z = 2πn_z/L_z, and k_x = πn_x/(L_x + 1), with integers n_μ satisfying 0 ≤ n_μ < L_μ (μ = y, z) and 1 ≤ n_x ≤ L_x. We can also work in the hybrid space, where we Fourier transform the wave functions only in the y- and z-directions,

ψ(x, k_y, k_z, i) ∝ Σ_{y,z} e^{-i(k_y y + k_z z)} ψ(x, y, z, i).  (20)

Now, we can train the CNN by using |ψ(k_x, k_y, k_z, i)|² or |ψ(x, k_y, k_z, i)|² as 3D images and draw the phase diagram in exactly the same way as in the real-space analyses. The obtained phase diagram is shown in Fig. 10(b), where the colors change more sharply and clearly when the phase changes between insulators with different topologies.
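The k-space images of Eqs. (19) and (20) reduce to fast transforms: an FFT along the periodic y- and z-directions, and a type-I sine transform along the fixed-boundary x-direction. A sketch for a single wave-function component, using only NumPy (the sine transform is written as an explicit matrix for clarity; scipy.fft.dst would do the same):

```python
import numpy as np

def hybrid_image(psi):
    """|psi(x, k_y, k_z)|^2, Eq. (20): FFT along the periodic y, z axes."""
    return np.abs(np.fft.fftn(psi, axes=(1, 2)))**2

def kspace_image(psi):
    """|psi(k_x, k_y, k_z)|^2, Eq. (19): sine transform along the
    fixed-boundary x axis (k_x = pi n_x / (L_x + 1)), then FFT along y, z."""
    Lx = psi.shape[0]
    x = np.arange(1, Lx + 1)
    kx = np.pi * np.arange(1, Lx + 1) / (Lx + 1)
    S = np.sin(np.outer(kx, x))                  # sine-transform matrix
    psi_k = np.tensordot(S, psi, axes=(1, 0))    # transform the x axis
    return np.abs(np.fft.fftn(psi_k, axes=(1, 2)))**2
```

The resulting non-negative 3D arrays are fed to the CNN exactly as the real-space densities are.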
Before concluding this subsection, we note that the standard method of using the transfer matrix 216) to determine the phase diagram in the presence of disorder breaks down for the choice of parameters t = m_{2,μ}, where μ is the direction of the transfer matrix multiplication (see Sect. A.3). This is because the transfer matrix connecting a layer to the next layer is not invertible when t² - m²_{2,μ} = 0. 313) The method presented in this subsection, therefore, has wider applicability. 232)

3D Weyl semimetal
We next consider the 3D Weyl semimetal (WSM). 311,312) One way of realizing the 3D WSM is to consider 2D Chern insulators (CIs) 222,321,322) and stack them in the z-direction. 222,323) We begin with a spinless two-orbital tight-binding model on a square lattice, consisting of an s-orbital and a p ≡ p_x + ip_y orbital, 324) and stack the layers in the z-direction to form a cubic lattice [Eq. (21)]. Here, ε_s, v_s(x), ε_p, and v_p(x) denote the atomic energies and disorder potentials for the s- and p-orbitals, respectively. Both v_s(x) and v_p(x) are uniformly distributed within [-W/2, W/2] with independent probability distributions. t_s, t_p, and t_sp are the transfer energies between neighboring s-orbitals, between p-orbitals, and between s- and p-orbitals, respectively. t'_s and t'_p are interlayer transfer energies, i.e., hopping elements in the z-direction.
In the absence of randomness, the Hamiltonian matrix is expressed in k-space in the form H(k) = d_0(k) σ_0 + d(k) · σ, with σ = (σ_x, σ_y, σ_z) the Pauli matrices acting on the orbital degrees of freedom and σ_0 the unit matrix. As in Ref. 222, we set ε_s = -ε_p, ε_s - ε_p = -2(t_s + t_p), t'_s = -t'_p > 0, t_s = t_p > 0, and t_sp = 4t_s/3, and take 4t_s as the energy unit. A dimensionless interlayer coupling β, proportional to the interlayer transfer t'_s, is then defined. In the absence of randomness, this choice of parameters realizes a CI with a band gap in the 2D limit, β = 0. As long as 1/2 > |β| ≥ 0, the energy band remains gapped and the system remains a CI; the system enters the 3D WSM phase for |β| > 1/2. 222) In the presence of randomness, four phases appear: CI, WSM, diffusive metal (DM), and the Anderson insulator. Here, we focus on the first three phases by considering W < 2.5 and 0.3 < β < 0.6. (The Anderson insulator phase appears in the larger-W region.)

Actually, the WSM can be further classified according to the number of pairs of Weyl nodes. For example, WSM(II) has two pairs of Weyl nodes, k = ±k_1, ±k_2, where the energy in a clean system is E(±k_i) = 0 (i = 1, 2). For our set of parameters, we expect two or three pairs of Weyl nodes, so we define P_WSM(II) and P_WSM(III) instead of P_WSM alone; the CNN is trained to output the probabilities of the CI, WSM(II), WSM(III), and DM phases, respectively. We then draw a color map in the W-β plane, as shown in Fig. 11, which quantitatively reproduces the phase diagram obtained by the transfer matrix method. 222) As in the case of the topological insulator, the phase diagram based on the supervised training of real-space wave functions [Fig. 11(a)] is noisy. Again, the situation improves if we work in k-space [Fig. 11(b)].

Summary and Concluding Remarks
In this study, we have shown how a neural network can be used to draw phase diagrams of quantum phase transitions. We have used the wave function as the input and determined the material phase in which the wave function was obtained. Both real-space and k-space wave functions have been used. Note that the numerical diagonalization is carried out in real space, where the Hamiltonian is sparse, and that the k-space wave functions can be calculated at a small extra numerical cost, since we focus on the wave functions closest to the band center for topological systems.
In the case of topological insulators, the phase transitions between different topological phases are more clearly detected by the CNN if we work in k-space. The phase boundary between the metal phase and the ordinary/topological insulators, however, does not agree well with the transfer matrix calculation; the phase boundary between metal and insulators is more accurate if we work in real space. This is also the case for the Anderson metal-insulator transition, where working in real space is better than working in k-space. Whether working in k-space is better than working in real space, therefore, depends on the nature of the transition.

[Fig. 11 caption fragment: the white dashed lines are the estimate by the self-consistent Born approximation, 222,326) and the arrows indicate the parameters along which the training data have been prepared.]
One of the advantages of this approach is its wider applicability: it can be applied to cases where conventional methods, such as the transfer matrix method, are difficult to apply. Another approach to determining the critical point is to assume a critical point x_c, vary x_c, and observe how the training scores change. 177,297) Although the resulting estimate of the critical point is less precise than that of the conventional method, the idea may be applied to problems where the conventional method is difficult to apply.
The accurate estimation of the critical exponent for a quantum phase transition, such as the quantum Hall transition, 297) is still under way but is an important problem left for the future. One might expect that the probability P, as in Fig. 3(a), changes more rapidly around the critical point as the system size L increases, with a slope proportional to L^{1/ν}, ν being the critical exponent for the divergence of the length scale. We, however, do not know how we should change hyperparameters such as the convolution/pooling kernel sizes and the depth of the network as we change L. Take the 3D Anderson transition as an example: for L = 40, we used a convolution kernel size of 5 and a pooling kernel size of 2, and the network consists of 6 convolution, 3 pooling, and 2 fully connected layers (see Table A·1). When we simulate larger systems, say L = 80, should we use the same hyperparameters, or should we increase the kernel sizes and network depth, and if the latter, how? Unless we understand the effect of kernel sizes and network depth on finite-size scaling, a reliable estimate of ν and its error bar is difficult.
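Setting the CNN aside for a moment, the localized/delocalized distinction itself is numerically crisp: for a normalized wave function on N sites, the moment Σ_x |ψ(x)|⁴ equals 1/N for a uniformly extended state and 1 for a state confined to a single site. A minimal sketch of this diagnostic, the inverse participation ratio:

```python
import numpy as np

def ipr(psi):
    """Inverse participation ratio, sum_x |psi(x)|^4, of a wave function;
    normalization is enforced so the result lies in [1/N, 1]."""
    p = np.abs(psi)**2
    p = p / p.sum()
    return (p**2).sum()
```

For example, `ipr(np.ones(100))` gives 0.01 (uniformly extended over 100 sites), while a state with weight on one site gives 1.0.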
One of the important quantities often used in the context of the localization-delocalization problem 328,329) is the inverse participation ratio (IPR), 330)

IPR = Σ_i |ψ(i)|^4 ,

for a normalized wave function ψ. We now take the average ⟨IPR_E⟩, where IPR_E is the average of the IPR over a small energy bin around the energy E, and ⟨· · ·⟩ denotes the sample average. The results are plotted in Fig. 13(a), where the phase transition is still difficult to observe. In Fig. 13(b), we plot ⟨Θ(IPR_E − IPR_c)⟩, where the sample average is taken; this resembles Fig. 3, but the phase boundary is not as sharp as in Fig. 3. Thus, the CNN has some advantage over the IPR analysis. The biggest advantage, however, is that we do not need to discover the IPR to characterize Anderson localization.

So far, we have concentrated on the static properties of wave functions. Another quantity that changes its behavior across the transition is the diffusion property, which is related to the dynamics of wave packets. In the metal (delocalized) phase, initially localized wave packets spread with time t, whereas in the insulator (localized) phase, they remain localized. This diffusion property is characterized by the time evolution of the diffusion length r of wave packets, which behaves as

r(t)^2 ∼ D t^α ,

where D is the diffusion constant and α = 2/d at the Anderson transition. 331) One way to detect such changes in time-dependent behaviors is to use the CNN to analyze r(t) vs t images. Another way is to use a recurrent neural network, which is widely used for analyzing time series. 332,333)

The deep neural network used here is a tool for classifying the phases and is regarded as a black box. The properties of the neural network themselves are also interesting to study from the physics viewpoint, 185,194) especially in relation to the renormalization group [367][368][369][370][371][372][373][374][375] and tensor networks.
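The contrast the IPR is designed to capture can be illustrated with a minimal sketch, assuming a normalized wave function on N sites of a 1D chain (toy states, not the models of the review): an extended state gives IPR ≈ 1/N, while an exponentially localized state gives an N-independent IPR set by the localization length.

```python
import numpy as np

# IPR = sum_i |psi_i|**4 for a normalized wave function psi.
def ipr(psi):
    psi = psi / np.linalg.norm(psi)   # enforce normalization
    return np.sum(np.abs(psi) ** 4)

N = 1000
extended = np.ones(N)                 # uniform amplitude: IPR = 1/N

xi = 5.0                              # assumed localization length (toy value)
sites = np.arange(N)
localized = np.exp(-np.abs(sites - N // 2) / xi)   # exponential envelope

print(ipr(extended))                  # 1/N = 0.001
print(ipr(localized))                 # O(1), set by xi, not by N
```

Doubling N halves the extended-state IPR but leaves the localized-state IPR essentially unchanged, which is the size dependence used to locate the transition in Fig. 13.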
[376][377][378][379][380][381][382][383][384][385] The vulnerability of the phase determination against adversarial perturbations is also an interesting topic. 386,387) Whether the neural network can learn formulas such as the winding number is an interesting question, and in fact, it seems to be the case. 388,389) It is natural to apply machine learning to quantum computers [390][391][392][393][394][395][396] as well as to apply quantum algorithms to machine learning. [397][398][399][400]

Acknowledgements

The authors would like to thank Dr.

A.2 CNN hyperparameters
As mentioned in Sect. 2, the CNN has parameters that are not optimized during the course of supervised training. These parameters, so-called hyperparameters, must be selected in advance so that the CNN determines the quantum phases for the validation set with high accuracy. Our selections of the parameters for Sect. 3 are shown in Tables A·1 to A·4. In the tables, the "kernel size" corresponds to the size of the cell cut out from the images in the previous layer. When the "padding" is True, zero padding, namely, adding zeroes to the periphery of the input, is applied so that the output shape is the same as the input shape, whereas the output shape decreases through the convolution layer when the "padding" is False. To be more specific, when the kernel size is m and the input linear dimension is L, the output size is L − m + 1 for padding False, whereas it remains L for padding True. To avoid overfitting, a dropout process, which randomly drops half of the inputs, is implemented after each pooling layer as well as after each fully connected layer, except for the last layer. For the 4D CNN, the Adam method was used to minimize the cross entropy, whereas the AdaDelta method was used for the others.
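The output-size bookkeeping above can be checked with a two-line helper; a sketch, using the L = 40, kernel-size-5 values quoted for the 3D Anderson case:

```python
# With kernel size m and input linear dimension L, a convolution layer
# returns L - m + 1 without padding and L with zero padding.
def conv_output_size(L, m, padding):
    return L if padding else L - m + 1

# 3D Anderson example from the text: L = 40, convolution kernel size 5.
print(conv_output_size(40, 5, padding=False))  # 36
print(conv_output_size(40, 5, padding=True))   # 40
```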

A.3 Breakdown of transfer matrix method
In this subsection, we explain why the transfer matrix method, which is widely used in the study of Anderson localization, 212,278,401,402) breaks down in certain lattices such as quantum percolation, fractal lattices, 403) and topological insulators 313) as well as the Weyl semimetal. 222) For simplicity, as in the main text, we consider a Hamiltonian in which only nearest-neighbor couplings are allowed, and consider a long bar in the x-direction with cross section L_y × L_z. We denote the values of the wave function on the nth cross section normal to the x-direction as the M-dimensional vector Ψ_n, where M is the number of degrees of freedom on the cross section (L_y × L_z × internal degrees of freedom such as spin and orbital). From the Schrödinger equation, we relate Ψ_{n+1} to Ψ_n and Ψ_{n−1} as

EΨ_n = H_n Ψ_n + V_{n,n+1} Ψ_{n+1} + V_{n,n−1} Ψ_{n−1} ,

which is rewritten as

(Ψ_{n+1}, Ψ_n)ᵀ = T_n (Ψ_n, Ψ_{n−1})ᵀ ,   T_n = [ V_{n,n+1}^{−1}(E I_M − H_n)   −V_{n,n+1}^{−1} V_{n,n−1} ; I_M   0_M ] ,

where H_n is the M × M Hamiltonian matrix on the nth cross section, and I_M and 0_M are the unit and zero matrices of dimension M, respectively. The transfer matrix T_n, therefore, requires the existence of the inverse matrix V_{n,n+1}^{−1}. In quantum percolation, V_{n,n+1} is a diagonal matrix whose elements are zero when the nearest-neighboring sites in the x-direction are disconnected, leading to det(V_{n,n+1}) = 0. In the case of a fractal lattice, V_{n,n+1} can be a nonsquare matrix. In the case of topological insulators, det(V_{n,n+1}) = ((t² − m_{2,x}²)/4)^{2L_y L_z}, so even in the case of a simple cubic lattice, the transfer matrix method does not apply for the parameter choice t = m_{2,x}.
Similarly, in our model of the 3D Weyl semimetal, Eq. (21), the transfer matrix method breaks down in the x- and y-directions when t_sp² − t_s t_p = 0.
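This failure mode can be made concrete with a toy numerical sketch (a hypothetical M = 4 slice with one bond removed, as in quantum percolation; not the actual models of the review): the transfer matrix requires V_{n,n+1}^{−1}, and a single zero hopping element makes V_{n,n+1} singular.

```python
import numpy as np

M = 4
H_n = np.zeros((M, M))                  # toy on-slice Hamiltonian
V_cut = np.diag([1.0, 1.0, 0.0, 1.0])   # third x-bond disconnected -> singular

def transfer_matrix(E, H_n, V_fwd, V_bwd):
    """T_n mapping (Psi_n, Psi_{n-1}) -> (Psi_{n+1}, Psi_n)."""
    M = len(H_n)
    Vinv = np.linalg.inv(V_fwd)          # raises LinAlgError if V_fwd is singular
    top = np.hstack([Vinv @ (E * np.eye(M) - H_n), -Vinv @ V_bwd])
    bottom = np.hstack([np.eye(M), np.zeros((M, M))])
    return np.vstack([top, bottom])

transfer_matrix(0.0, H_n, np.eye(M), np.eye(M))   # regular hopping: fine
try:
    transfer_matrix(0.0, H_n, V_cut, V_cut)       # percolation-like bond cut
except np.linalg.LinAlgError:
    print("V_{n,n+1} singular: transfer matrix undefined")
```

The same inversion step fails for nonsquare V_{n,n+1} (fractal lattices) or whenever the model parameters drive det(V_{n,n+1}) to zero, as in the topological insulator and Weyl semimetal cases above.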