Observing a topological phase transition with deep neural networks from experimental images of ultracold atoms

Although the classification of topological quantum phases has attracted great interest, the absence of a local order parameter generically makes it challenging to detect a topological phase transition from experimental data. Recent advances in machine learning algorithms enable physicists to analyze experimental data with unprecedented sensitivity and to identify quantum phases even in the presence of unavoidable noise. Here, we report the successful identification of topological phase transitions using a deep convolutional neural network (CNN) trained with low signal-to-noise-ratio (SNR) experimental data obtained in a symmetry-protected topological system of spin-orbit-coupled fermions. We apply the trained network to unseen data to map out the whole phase diagram. The predicted positions of the two topological phase transitions are consistent with the results obtained by applying the conventional method to higher-SNR data. By visualizing the filters and post-convolutional results of the convolutional layer, we further find that the CNN makes the classification using the same information as the conventional analysis, namely the spin imbalance, but with an advantage concerning SNR. Our work highlights the potential of machine learning techniques to be used in various quantum systems.


I. Introduction
Since the discovery of the quantum Hall effect in a 2D electron gas [1], topological quantum phases, which are usually characterized by nonlocal invariants, have played a pivotal role in quantum matter research. In contrast to solid-state materials, ultracold atoms provide a clean and highly controllable experimental platform to investigate topological quantum systems [2,3]. To date, many paradigmatic topological models have been realized and explored in experiments with ultracold atoms using artificial gauge fields [4,5], such as the 1D Su-Schrieffer-Heeger model [6], the 1D symmetry-protected topological (SPT) phase [7], the 2D Chern insulator [8], and nodal [9] or Weyl [10] semimetals in 3D. While some topological phases have already been successfully characterized in recent works [7-9,11], identifying a proper observable is generically subtle and highly model-dependent due to the absence of a local order parameter, so detecting topological phases from experimental data obtained with ultracold atoms remains challenging.
Recently, machine learning techniques, such as deep convolutional neural networks, have become an indispensable tool in many technological areas and have also proven their power in quantum many-body physics [12-16]. For example, both supervised and unsupervised learning have been used to determine topological phases [17] and to map out phase diagrams [18,19], owing to their ability to classify or identify massive data sets such as experimental images. Here we use a deep convolutional neural network (CNN) to classify single-shot absorption images of different SPT phases, which are characterized by the $Z_2$ invariant [7]. We train the neural network on experimental images far from the phase transition points, and then use the trained neural network to identify the phase transition points. An exemplary analysis with a conventional method demonstrates that classifying a single pair of images is difficult for conventional techniques, and that a sufficiently large ensemble average is required because of the low SNR of a single pair of images. By visualizing the filters and activation regions of the trained neural network, we further find that spin imbalance information plays an important role when the neural network makes the classification. Our work demonstrates the ability of machine learning techniques to process experimental images, which may provide a deeper understanding of emergent topological quantum matter.

II. Experimental preparation
The topological phase transition to be analyzed by the neural network is experimentally realized with ultracold fermions in an optical lattice with spin-orbit coupling, as described in Fig. 1a [20,21] (see Supplementary materials for more details). We begin with precooled $^{173}$Yb atoms in the intercombination magneto-optical trap, followed by evaporative cooling in a crossed optical dipole trap. After the final stage of optical evaporative cooling, we achieve a two-component Fermi gas [7,22]. The atoms are then loaded into a 1D optical lattice dressed by a Raman coupling potential within 10 ms. A spin-dependent optical ac Stark shift is added to energetically lift all hyperfine levels except the two relevant ones, realizing an effective spin-1/2 subspace. This Stark shift beam is applied 5 ms before the optical lattice potential and the Raman coupling beam are adiabatically ramped up. After holding the atoms in the lattice for 2 ms, all the laser beams are suddenly switched off, and 8 ms spin-resolved time-of-flight (TOF) images are taken after a spin-sensitive blast sequence (see Supplementary materials for more details). For each data point we therefore obtain three raw images: an image $I_{\downarrow,o}$ with $|\uparrow\rangle$ atoms removed, an image $I_{\uparrow,o}$ with $|\downarrow\rangle$ atoms removed, and a background image $I_o$ with both $|\uparrow\rangle$ and $|\downarrow\rangle$ atoms removed. Finally, the distributions of spin $|\uparrow\rangle$ and $|\downarrow\rangle$ atoms are extracted from $I_{\uparrow,o} - I_o$ and $I_{\downarrow,o} - I_o$, respectively (see Fig. 1c,d).
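The background subtraction at the end of this procedure can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: images are represented as plain nested lists of pixel values, and the function names `subtract_background` and `spin_distributions` are hypothetical, not taken from the analysis code used in this work.

```python
def subtract_background(image, background):
    """Return the pixel-wise difference image - background.

    Both arguments are 2D images given as lists of rows (an assumed,
    simplified representation of the raw absorption images).
    """
    same_shape = len(image) == len(background) and all(
        len(row) == len(bg_row) for row, bg_row in zip(image, background)
    )
    if not same_shape:
        raise ValueError("image and background must have the same shape")
    return [
        [pixel - bg for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]


def spin_distributions(img_up_o, img_down_o, img_o):
    """Extract the spin-up and spin-down distributions from three raw images.

    img_up_o   : image with spin-down atoms removed (I_{up,o})
    img_down_o : image with spin-up atoms removed   (I_{down,o})
    img_o      : background image with both spins removed (I_o)
    """
    n_up = subtract_background(img_up_o, img_o)
    n_down = subtract_background(img_down_o, img_o)
    return n_up, n_down
```

The returned pair of images corresponds to one sample, i.e. the two input channels that are later fed to the neural network.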
The realized 1D optical Raman lattice has the Hamiltonian

$$H = \left[\frac{p_x^2}{2m} + \frac{V_\uparrow(x) + V_\downarrow(x)}{2}\right] \otimes \mathbb{1} + \left[\frac{\delta}{2} + \frac{V_\uparrow(x) - V_\downarrow(x)}{2}\right]\sigma_z + M(x)\,\sigma_x,$$

where $p_x^2/2m$ is the kinetic energy along the $x$ direction, $V_{\uparrow,\downarrow}(x) = V_{0\uparrow,0\downarrow}\cos^2(k_0 x)$ are the optical lattice potentials for the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ states, with $V_{0\uparrow,0\downarrow}$ denoting the lattice depths and $k_0 = \pi/a$, $M(x) = M_0\cos(k_0 x)$ is the periodic Raman coupling potential with amplitude $M_0$, $\delta$ is the two-photon detuning, and $\mathbb{1}$ and $\sigma_{x,y,z}$ denote the unit matrix and the Pauli matrices. In the Hamiltonian, the lattice potential $V_{\uparrow,\downarrow}(x)$ induces nearest-neighbor hopping that conserves the spin, while the Raman coupling term $M(x)$ contributes hopping that flips the spin. The combined effect of these hoppings leads to nontrivial topological phases protected by symmetries [7]. According to the Altland-Zirnbauer classification [23] and further calculation, the Hamiltonian satisfies a nonlocal chiral symmetry and a magnetic group symmetry defined as the product of time-reversal and mirror symmetries, and the topological invariant of this system is characterized by an integer winding number [7,20,23]. By varying the two-photon detuning $\delta$, we can realize the topologically nontrivial or trivial phase, corresponding to $\nu = 1$ or $\nu = 0$, as shown in Fig. 1b.

III. Training a neural network to detect the phase transition
We take the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ data for varying two-photon detuning. The pair of $|\uparrow\rangle$ and $|\downarrow\rangle$ images from each experimental run constitutes one sample for the model. The label of the data is determined by the normalized spin polarization $P(q_x) = (n_\uparrow(q_x) - n_\downarrow(q_x))/(n_\uparrow(q_x) + n_\downarrow(q_x))$, averaged over all the data with the same two-photon detuning $\delta$. When $P(q_x) > 0$ or $P(q_x) < 0$ for all $q_x$ in the first Brillouin zone (FBZ), the phase is topologically trivial with $Z_2$ invariant $\nu = 0$, and we label the data Trivial 1 or Trivial 2, respectively. When $P(q_x)$ is only partially spin-up dominated, i.e. it changes sign within the FBZ, the phase is topologically nontrivial with $Z_2$ invariant $\nu = 1$, and we label the data Topological. For each detuning value, five data points are randomly chosen to form the test dataset, and the remaining data with detuning values far from the transition regimes are randomly assigned to a training dataset and a validation dataset with an approximate 4:1 ratio. The resulting training, validation and test datasets contain 140, 31 and 95 data points, respectively. We then establish a CNN architecture that contains a convolutional layer, a Global Average Pooling (GAP) layer and three hidden fully connected layers, as shown in Fig. 2. We use 32 1D filters in the convolutional layer, oriented perpendicular to the lattice direction. This orientation is based on the physical intuition that the system is one-dimensional: each filter correlates data along the vertical direction and passes one-dimensional information along the lattice (horizontal) direction to the rest of the CNN, similar to the integration step that precedes the conventional analysis. During training, the training dataset is fed into the neural network, which outputs a prediction probability $Pr(C_i)$ for each class $\{C_i\}$ = {Trivial 1, Topological, Trivial 2} when a single pair of images is loaded.
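The labeling rule based on the sign of the spin polarization can be written compactly as follows. This is a hedged sketch rather than the code used in this work: it assumes that the ensemble-averaged momentum distributions have already been integrated to 1D and restricted to the FBZ, and the names `spin_polarization` and `label_phase` are hypothetical.

```python
def spin_polarization(n_up, n_down):
    """Normalized spin polarization P(q_x) = (n_up - n_down) / (n_up + n_down).

    n_up, n_down: ensemble-averaged 1D momentum distributions sampled at the
    same q_x points inside the first Brillouin zone (assumed input format).
    """
    polarization = []
    for up, down in zip(n_up, n_down):
        total = up + down
        if total == 0:
            raise ValueError("total density vanishes; polarization undefined")
        polarization.append((up - down) / total)
    return polarization


def label_phase(polarization):
    """Assign a class label from the sign structure of P(q_x) over the FBZ."""
    if all(p > 0 for p in polarization):
        return "Trivial 1"      # spin-up dominated everywhere, nu = 0
    if all(p < 0 for p in polarization):
        return "Trivial 2"      # spin-down dominated everywhere, nu = 0
    return "Topological"        # P changes sign within the FBZ, nu = 1
```

Because the rule only inspects signs, it is meaningful for averaged data; applied to a single noisy shot, the sign of $P(q_x)$ can fluctuate and the label becomes unreliable.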
The parameters of the CNN are tuned iteratively to minimize the sparse categorical cross-entropy loss function using the Adam optimizer [24]. After every epoch of training, the model is evaluated on the validation dataset, and training stops once the validation accuracy no longer improves. The final training and validation accuracies are both 100%, and the total number of epochs is around 120.
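For concreteness, the loss and the stopping rule can be sketched as below. The loss is the standard definition of sparse categorical cross-entropy (mean negative log-probability of the true class). The stopping criterion with a `patience` window is an assumption made for illustration, since the exact criterion is not specified here; both function names are hypothetical.

```python
import math


def sparse_categorical_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean of -log(p[true class]) over all samples.

    probabilities: list of per-sample class-probability lists
    labels: list of integer class indices (0, 1, 2 for the three classes)
    eps: small floor that avoids log(0); an implementation detail
    """
    losses = [
        -math.log(max(probs[label], eps))
        for probs, label in zip(probabilities, labels)
    ]
    return sum(losses) / len(losses)


def should_stop(val_accuracies, patience=10):
    """Return True if the last `patience` epochs brought no improvement.

    `patience` is an assumed hyperparameter, not a value from this work.
    """
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    return max(val_accuracies[-patience:]) <= best_before
```

A training loop would append the validation accuracy after each epoch and break as soon as `should_stop` returns True.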
To identify the transition regime, we let the trained CNN predict the reserved test dataset. Specifically, we sum the predicted probabilities for the Trivial 1 and Trivial 2 classes to obtain the probability for the topologically trivial case (Fig. 2a), since both have the same $Z_2$ invariant $\nu = 0$, while the predicted probability for the Topological class corresponds to $Z_2$ invariant $\nu = 1$, as summarized in Fig. 3a. The output probability for $\nu = 1$ (0) is then plotted as a function of the two-photon detuning $\delta$ (Fig. 3b), revealing two phase transition regimes around $\delta = 0.32E_r$ and $\delta = 1.51E_r$. The prediction is in quantitative agreement with our previous conventional analysis [7] (shown by the pink shadow in Fig. 3b), which was extracted from another dataset with shorter TOF and higher SNR. To highlight the advantage gained with the neural network approach, we show an exemplary analysis using the conventional method in Fig. 4. The spin polarization of the ground band is reconstructed from averaged spin-up and spin-down images (Fig. 4a), leading to a measurement of the topological invariant. However, a sufficiently large ensemble average is required to correctly classify the topological states, as shown in Fig. 4b. For detuning $\delta = 0.7E_r$, the classification from the conventional method hardly converges before averaging over 15 data points, which indicates that even more images are needed for an unambiguous classification. Notably, it is almost impossible to probe the topological phase transition using the conventional analysis with a single pair of images without ensemble averaging (Fig. 4c).
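The post-processing of the network output can be illustrated with a short sketch. Summing the two trivial-class probabilities follows the procedure described above. Locating a transition as the detuning where the $\nu = 1$ probability crosses 0.5, found by linear interpolation between neighboring detunings, is an assumption made for this example and not necessarily the criterion used in this work. Function names are hypothetical.

```python
def invariant_probabilities(class_probs):
    """Map the three class probabilities onto the two values of nu.

    class_probs: (Pr(Trivial 1), Pr(Topological), Pr(Trivial 2))
    """
    p_trivial_1, p_topological, p_trivial_2 = class_probs
    return {"nu=0": p_trivial_1 + p_trivial_2, "nu=1": p_topological}


def find_crossings(detunings, p_topological, threshold=0.5):
    """Detunings at which Pr(nu=1) crosses `threshold` (assumed criterion).

    Uses linear interpolation between adjacent detuning values.
    """
    crossings = []
    points = list(zip(detunings, p_topological))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if (p0 - threshold) * (p1 - threshold) < 0:
            crossings.append(d0 + (threshold - p0) * (d1 - d0) / (p1 - p0))
    return crossings
```

In practice the probabilities at each detuning would first be averaged over the five test samples before searching for crossings.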

IV. Interpreting the neural network
While we have so far focused on the machine-learned analysis of the topological phase transition, it is also interesting to investigate how the neural network extracts physical observables relevant to the topological properties of the system. This understanding may, to some extent, exclude the possibility that the probability curve in Fig. 3b is a simple interpolation, and may also allow the machine learning analysis to be generalized to various many-body quantum systems [16,25-31]. In our work, the machine learning analysis of topological quantum phases can guide us to extract the right features from experimental images.
Here, we investigate how the neural network performs the classification in our system. To this end, we start by visualizing the filters $F_k$ ($k = 1, ..., 32$) of the optimal CNN. Each filter consists of two channels, $F_{k\uparrow}$ and $F_{k\downarrow}$, which act on the spin-up and spin-down images respectively and serve as integration operations along the vertical dimension of the corresponding image. Looking at the filters, we find that approximately 75% of them are trained to have opposite weights for the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ channels (one example with $k = 11$ is shown in Fig. 5a; all filters used in this procedure are available in Fig. S2 of the Supplementary material). This subtraction-like operation indicates that these filters learn the spin imbalance information from the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ data, which is one of the key observables in topological bands for ultracold atoms [7]. In addition, the channels display opposite signs of the weights between the central and background regions, which shows that the filter learns to distinguish the central region from the background noise and to suppress that noise. The weights in the central region have large magnitudes, which emphasizes the central region of the image during the integration. Analogously, in the conventional analysis, we first preprocess the image by cropping the central 141 pixels along the vertical dimension before calculating the spin imbalance.
We further pass three images $I_i$, one from each of the three classes, through the convolutional layer of the neural network to calculate the post-convolution results $R_{ki}$ for this filter $F_k$, also known as feature maps,

$$R_{ki}(x') = \sum_{y'} \left[ F_{k\uparrow}(y')\, I_{i\uparrow}(x', y') + F_{k\downarrow}(y')\, I_{i\downarrow}(x', y') \right] + B_k$$

(Fig. 5b), where $F_{k\uparrow}$ and $F_{k\downarrow}$ denote the weights of the filter for the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ channels respectively, and $B_k$ is the bias corresponding to the filter $F_k$. This type of filter outputs distinguishable feature maps for the three classes, with magnitudes ordered as Trivial 2 > Topological > Trivial 1, resembling the conventionally calculated spin polarization within the FBZ. These results suggest that the behavior of the convolutional layer is analogous to the conventional analysis, in which the spin polarization is first extracted from the spin $|\uparrow\rangle$ and $|\downarrow\rangle$ atoms.
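The action of a single two-channel filter on one pair of images can be sketched in plain Python. The sketch assumes that the 1D filter spans the full vertical extent of the image, so that each horizontal position yields one output value; images are indexed as `image[y][x]`, and the function name `feature_map` is hypothetical.

```python
def feature_map(img_up, img_down, f_up, f_down, bias):
    """Post-convolution result R(x) of one vertical two-channel filter.

    R(x) = sum_y [ f_up[y] * img_up[y][x] + f_down[y] * img_down[y][x] ] + bias

    img_up, img_down: 2D images as lists of rows, indexed [y][x]
    f_up, f_down: filter weights along the vertical direction (assumed to
                  cover the full image height)
    bias: scalar bias of this filter
    """
    n_rows = len(img_up)
    n_cols = len(img_up[0])
    if len(f_up) != n_rows or len(f_down) != n_rows:
        raise ValueError("filter length must match the image height")
    return [
        sum(
            f_up[y] * img_up[y][x] + f_down[y] * img_down[y][x]
            for y in range(n_rows)
        )
        + bias
        for x in range(n_cols)
    ]
```

With `f_up` and `f_down` chosen with opposite signs, the output is proportional to the vertically integrated spin imbalance at each horizontal position, which is the behavior observed for most of the trained filters.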
Finally, we record the averaged GAP score $G_{ki} = \frac{1}{500}\sum_{x'} \mathrm{ReLU}(R_{ki}(x'))$ when all the data in the test dataset are forward-propagated through the first stage of the network. The resulting $G_{ki}$ is then plotted as a function of the two-photon detuning $\delta$, as shown in Fig. 5c. Since $G_{ki}$ can be understood as the average value of the negative-truncated area of $R_{ki}$, it takes a value of nearly zero, a relatively high positive value, and an intermediate value when the input image belongs to the Trivial 1, Trivial 2, and Topological class, respectively. This forms three different plateaus with two boundaries located near the phase transition regimes. Thus, the GAP score output by each filter can be understood as a single-filter phase classifier, and the dense layers further process, compare and summarize the independent viewpoints of all filters to give a final probability with improved validity and fewer outliers. (Visualization of other filters is listed in the supplemental document.)
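The GAP score of a single feature map, i.e. ReLU followed by an average over all horizontal positions, reduces to a one-line computation. The sketch below is illustrative; the function name `gap_score` is hypothetical, and the normalization uses the length of the input rather than a hard-coded width.

```python
def gap_score(feature_map_values):
    """Global average pooling after ReLU: mean of max(0, R(x)) over x.

    feature_map_values: 1D list of post-convolution results R(x) for one
    filter and one input sample.
    """
    if not feature_map_values:
        raise ValueError("feature map must not be empty")
    rectified = [max(0.0, value) for value in feature_map_values]
    return sum(rectified) / len(rectified)
```

A feature map that is negative everywhere gives a score of zero, which matches the near-zero plateau described for the Trivial 1 class.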

V. Conclusion
In summary, we have demonstrated the power of machine learning techniques in processing massive datasets by applying a deep CNN to experimental images of different SPT phases characterized by the $Z_2$ invariant. The trained neural network identifies two phase transition regimes, in good agreement with the previous result obtained with the conventional analysis. Unlike the previous 6 ms TOF dataset in [7], the current dataset has a longer TOF, which potentially leads to a lower SNR. Apart from the training dataset and its labels, only limited prior knowledge is required to establish the network architecture and pre-process the dataset, which provides better generalization capability compared with the conventional analysis. We further visualize the filters and convolution results of the trained neural network and find that spin imbalance information plays a significant role when the neural network makes the classification, which implies that the probability curve represents the phase transition between two distinct phases rather than a simple interpolation. This suggests that the neural network analysis not only processes previously unknown information [16] but also performs a conventional analysis in a noise-resilient manner. It will be interesting to systematically investigate how the neural network can be more resilient than the conventional analysis to systematic noise (e.g. shot-to-shot shifts of the atomic cloud or photon shot noise). Another possible future direction is the use of unsupervised machine learning algorithms, which can identify the phase transition from unlabeled data and can be applied to quantum systems with unknown topological phase diagrams or hidden orders [19].
Our work shows that machine learning techniques, such as CNNs, provide a useful and convenient tool for analyzing experimental data with limited SNR obtained from ultracold atom experiments. This technique would be particularly useful for analyzing spin polarization in topological systems [9,11,32], and it may open an interesting possibility of extracting topological invariants from high-dimensional topological systems [9].