Can a CNN trained on the Ising model detect the phase transition of the $q$-state Potts model?

Employing a deep convolutional neural network (deep CNN) trained on spin configurations of the 2D Ising model labeled with the temperatures at which they are generated, we examine whether the deep CNN can detect the phase transition of the 2D $q$-state Potts model. To this end, we generate binarized images of spin configurations of the $q$-state Potts model ($q\ge 3$) by replacing the spin variables $\{0,1,\dots,\lfloor q/2\rfloor-1\}$ and $\{\lfloor q/2\rfloor,\dots,q-1\}$ with $\{0\}$ and $\{1\}$, respectively. Then, we input these images to the trained CNN and obtain the predicted temperatures. The binarized images of the $q$-state Potts model are entirely different from Ising spin configurations, particularly at the transition temperature. Moreover, our CNN model is not trained on information about whether phases are ordered/disordered but is naively trained on Ising spin configurations labeled with the temperatures at which they are generated. Nevertheless, the deep CNN can detect the transition point with high accuracy, regardless of the type of transition. We also find that, in the high-temperature region, the CNN outputs the temperature based on the internal energy, whereas, in the low-temperature region, the output depends on the magnetization and possibly the internal energy as well. However, in the vicinity of the transition point, the CNN may use more general factors to detect the transition point.

1. Introduction Machine learning (ML) employing an artificial neural network (NN) has seen renewed interest in recent years and has been widely applied in various branches of science owing to its ability to capture features and classify them (see, e.g., Refs. [1,2]). A NN is conceptually inspired by the structure of the brain: it consists of a network of nodes (representing neurons) arranged in layers, with the nodes in adjacent layers connected by links (synapses) through which data are passed to each node. A node is activated (fired) when the sum of the weighted data exceeds a threshold value called the bias. These weights (corresponding to the strengths of synaptic connections) are at first randomly initialized and then trained by repeatedly passing the training data through the network until the NN outputs something meaningful. A NN trained in this way can solve complex real-world problems that conventional approaches have failed to handle, such as object recognition and detection in images. There exist different types of NNs for different tasks. Among them, a convolutional neural network (CNN), which will be used in this study, is particularly suitable for processing 2D data such as images (see Ref. [3] for a recent review).
It is natural that NNs, which are capable of extracting specific features of real-world objects and classifying them, have begun to be used as tools in studies of theoretical physics (see, e.g., Refs. [4,5]). Let us focus our attention on the application of NNs to problems of order-disorder phase transitions [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26], which is the main topic of this letter. More specifically, NNs with supervised [6,7,8,9,10,11,12,13,14,15,16,17,18] and unsupervised [19,20,21,22,23,24,25,26] learning have been used to accurately identify phases and phase transitions. In supervised learning, each training sample (e.g., an image of a spin configuration) is supplemented with labels (e.g., ordered/disordered, temperature, magnetization, etc.), and the NN is trained until it outputs values approximately identical to the labels of the input data. In unsupervised learning, on the other hand, the training data carry no such labels, and the NN is trained until it extracts some discriminative features from the input data.
In supervised learning, there exist several approaches to detect transition temperatures. (i) The first approach relies on binary classification [6,7,8,9,10,11,12]: a NN is trained on a dataset where each sample is labeled with, e.g., 1 (ordered) or 0 (disordered) so that it identifies whether the input is ordered or disordered. Namely, prior information about phases is provided to the NN as labels of the training samples. The trained NN can precisely detect the transition temperature by automatically reading the value of the order parameter of the input sample. (ii) The second approach is more naive and trains a NN only on data labeled with the temperatures at which they are generated [14,15,16,17]. In other words, information about phases is not provided to the NN. Nevertheless, the NN spontaneously captures phase transitions by automatically detecting the internal energy or magnetization [16]. (iii) The third approach utilizes the transferability of NNs: a NN trained on one system can, surprisingly, detect the transition point in similar systems obtained by, e.g., changing the lattice topology or the form of interaction in the Ising model [6,23,11], the filling number in the Hubbard model [7], or the number of spin states in the q-state Potts model [13,10,12].
Despite these successes, in many cases, NNs work as black boxes, and a systematic understanding of how NNs identify and extract features from complex objects (in our case, images of spin configurations) is a crucial ingredient for the universal application of NNs to various problems in the natural/social sciences. See Ref. [27] for an interesting argument in terms of some relation between renormalization groups and deep NNs.
In this letter, combining the second and third approaches described above, we examine whether a deep convolutional neural network (deep CNN) trained on the Ising model can identify phase transitions of the q-state Potts models. Moreover, by analyzing relations between latent variables and physical quantities, we consider how the CNN detects the phase transition of the q-state Potts models. As mentioned above, the transferability of an Ising-trained NN to the Potts models has also been investigated in Refs. [13,10,12] by binary classification or, equivalently, by providing prior information about phases. We would like to emphasize that our CNN model is not trained on information about whether phases are ordered/disordered but is naively trained on Ising spin configurations labeled with the temperatures at which they are generated, and that our Ising-trained CNN is easily applicable to detecting the transition point of the q-state Potts model. Though the physical properties of the Ising model and the q-state Potts model with q ≥ 3 differ, especially at the transition temperature, the CNN can accurately predict the transition temperature, regardless of the type of transition.
2. q-state Potts model Let us summarize the q-state Potts model, concentrating on the phase transition. The model on an $L \times L$ square lattice is defined by
$$H = -J \sum_{\langle j,k \rangle} \delta_{s_j, s_k}, \qquad s_j \in \{0, 1, \dots, q-1\}, \qquad (1)$$
where $\langle j,k \rangle$ denotes nearest-neighbor pairs and $\delta$ is the Kronecker delta. We shall exclusively consider the ferromagnetic model $J > 0$ and set $J = k_B = 1$ ($k_B$: the Boltzmann constant) for convenience. For $q = 2$, the model is equivalent to the Ising model. In the thermodynamic limit $L \to \infty$, the model undergoes a second-order phase transition for $q \le 4$, while for $q > 4$ it undergoes a first-order transition [28] at the transition temperature $T = T_c^\infty(q)$:
$$T_c^\infty(q) = \frac{1}{\log(1 + \sqrt{q})}. \qquad (2)$$
In particular, for $q \le 4$, each scaling behavior is classified into a different universality class: the critical properties of the q = 2, 3, 4 Potts models are characterized by conformal field theory (CFT) with central charge c = 1/2, 4/5, 1, respectively [29].
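For reference, the transition temperature (2) is straightforward to evaluate numerically. A minimal sketch in the Potts normalization $J = k_B = 1$ (in which the q = 2 value is half the conventional Ising critical temperature):

```python
import math

def potts_tc(q: int) -> float:
    """Exact transition temperature T_c(q) = 1/log(1 + sqrt(q)) of the
    ferromagnetic q-state Potts model on the square lattice (J = k_B = 1)."""
    return 1.0 / math.log(1.0 + math.sqrt(q))

# q = 2 reproduces the value used in the text: T_c ≈ 1.1346
for q in (2, 3, 4, 5, 10):
    print(f"q = {q:2d}: T_c = {potts_tc(q):.4f}")
```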
Our purpose is to examine whether the Ising-trained CNN can detect the phase transition of the q-state (q ≥ 3) Potts model. The trained CNN, however, may only adapt to classifying binary images: we must appropriately binarize the images of spin configurations of the q-state Potts model while losing as few of the essential properties of the phase transition as possible. The simplest way is to divide the spin variables in a given configuration into two parts $\{0, 1, \dots, \lfloor q/2 \rfloor - 1\}$ and $\{\lfloor q/2 \rfloor, \dots, q-1\}$ and replace them with $\{0\}$ and $\{1\}$, respectively (see Fig. 1 for the q = 3 and q = 4 cases). Let us denote the resultant binarized configuration by $\{\sigma_j\}$ ($\sigma_j \in \{0,1\}$). Also, we define the internal energy E and the magnetization M for the transformed model as
$$E = -\frac{1}{L^2} \sum_{\langle j,k \rangle} (2\sigma_j - 1)(2\sigma_k - 1), \qquad M = \frac{1}{L^2} \sum_j (2\sigma_j - 1), \qquad (3)$$
respectively. We stress that, under this transformation, the transition temperature and the transition type (i.e., first/second order) are invariant in the thermodynamic limit. Furthermore, some geometric properties at criticality are retained in the binarized models. For instance, the fractal dimensions $d_f$ of the cluster boundaries of the binarized models for q = 3 and q = 4 are, respectively, $d_f = 17/12$ [30] and $d_f = 3/2$ [31], consistent with the prediction of CFT. In Fig. 2, we depict binarized images at the transition points for various spin states.

Our scheme is as follows. (i) First, we train a deep CNN on images of spin configurations of the Ising model labeled with the temperatures at which they are generated. More specifically, using the Wolff algorithm [32], we generate 6,000 images on a 128 × 128 square lattice with free boundary conditions for each temperature ranging from $T^\infty_{c,q=2} - 0.5$ to $T^\infty_{c,q=2} + 0.5$ ($T^\infty_{c,q=2} = 1/\log(1+\sqrt{2}) \approx 1.1346$) in increments of 0.01. To eliminate boundary effects and also reduce the processing time, we actually use images of size 64 × 64, cropped from the centers of the corresponding original images.
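As an illustration, the binarization and the quantities E and M of Eq. (3) can be sketched as follows; interpreting the binarized values via Ising spins $2\sigma - 1$ is an assumed convention here, chosen to be consistent with the ranges $-2 \le E \le 0$ and $-1 \le M \le 1$ quoted later:

```python
def binarize(config, q):
    """Map Potts spins {0, ..., q-1} to {0, 1}: values below floor(q/2)
    become 0, the rest become 1 (the binarization described above)."""
    half = q // 2
    return [[0 if s < half else 1 for s in row] for row in config]

def energy_magnetization(sigma):
    """Internal energy E and magnetization M per site of a binarized
    L x L configuration with free boundaries, interpreting sigma_j in
    {0, 1} as Ising spins 2*sigma_j - 1 (an assumed convention)."""
    L = len(sigma)
    s = [[2 * v - 1 for v in row] for row in sigma]
    E = 0
    for i in range(L):
        for j in range(L):
            if i + 1 < L:
                E -= s[i][j] * s[i + 1][j]   # vertical bond
            if j + 1 < L:
                E -= s[i][j] * s[i][j + 1]   # horizontal bond
    M = sum(sum(row) for row in s)
    return E / L**2, M / L**2
```

A fully ordered configuration then gives E close to -2 (up to boundary terms) and M = ±1, matching the low-temperature regime discussed below.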
(ii) Then, as input images, we prepare binarized images of spin configurations of the q-state Potts model in a manner similar to that used for the training images: we generate images of size 128 × 128 using the Wolff algorithm, binarize them as explained above, and then crop 64 × 64 images from the centers of the originals. (iii) Finally, we input the binarized images to the trained CNN to obtain the predicted temperatures.
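The cropping step can be sketched with a minimal helper, assuming a square image stored as a list of rows:

```python
def center_crop(image, size):
    """Return a size x size patch cut from the center of a square image,
    as used to remove boundary effects from the 128 x 128 samples."""
    L = len(image)
    off = (L - size) // 2
    return [row[off:off + size] for row in image[off:off + size]]
```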
Our deep CNN model comprises 20 convolutional layers and 5 fully connected layers, as depicted in Fig. 3, and is designed to be somewhat deeper than an ordinary CNN so as to increase the accuracy of the output, the versatility, and the flexibility in learning. In each convolutional layer, an image is convolved with 3 × 3 filters, a stride of 1, and zero padding. In each max pooling layer, the filter size and the stride are set to 2 × 2 and 2, respectively. To prevent overfitting and vanishing gradients, batch normalization [33] is applied after the input layer and after each ReLU (Rectified Linear Unit) activation. An identity activation function is used in the output layer. As the loss function and the optimizer, the mean squared error and Adam [34] are adopted, respectively. Our deep CNN model has been implemented using TensorFlow.
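The spatial sizes of the feature maps follow the standard convolution/pooling arithmetic. A minimal sketch (the exact interleaving of the 20 convolutional and the max-pooling layers is not specified above, so the stack below is hypothetical):

```python
def conv_out(n: int, kernel: int = 3, stride: int = 1, pad: int = 0) -> int:
    """Spatial size of a feature map after a convolution."""
    return (n + 2 * pad - kernel) // stride + 1

def pool_out(n: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after max pooling (2x2 filter, stride 2 in the text)."""
    return (n - kernel) // stride + 1

# With zero padding of width 1, a 3x3 convolution preserves the size,
# and each pooling layer halves it (hypothetical stack for a 64x64 input):
n = 64
n = conv_out(n, pad=1)   # 3x3 conv, stride 1 -> 64
n = pool_out(n)          # 2x2 max pool, stride 2 -> 32
```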

Results and Discussion
Now we discuss the results. In Fig. 4, we depict the relation between the temperatures $\tau^{\rm in}_q$ of the input images and the corresponding output temperatures $\tau^{\rm out}$ predicted by the CNN. Here, $\tau^{\rm in}_q$ and $\tau^{\rm out}$, respectively, denote
$$\tau^{\rm in}_q = T - T_{c,q}, \qquad \tau^{\rm out} = T^{\rm CNN} - T^{\rm CNN}_c, \qquad (4)$$
where T is the temperature at which the input images are generated, $T_{c,q}$ is the transition temperature evaluated from the behavior of the internal energy of the original q-state Potts model without binarization (due to finite-size effects, $T_{c,q}$ deviates from $T^\infty_{c,q}$ (2) derived in the thermodynamic limit), $T^{\rm CNN}$ denotes the average of the CNN outputs over the 6000 input samples generated at T, and $T^{\rm CNN}_c$ is the predicted critical temperature corresponding to the Ising spin configurations at $T = T_{c,q=2}$. See Table 1 for the detailed values.
For comparison, the result for the Ising model (q = 2) on which our CNN is trained is also depicted in the same figure. From this, one sees that our CNN is relatively well trained. The predicted temperatures for input images generated at the same temperature, in general, depend on the model, except at the transition temperature $T = T_{c,q}$ (i.e., $\tau^{\rm in}_q = 0$). This result is intuitively consistent, because physical quantities such as the internal energy and magnetization (3), which are considered to be the quantities that the CNN uses to make predictions [16], generally depend on the model (see below for a more quantitative discussion). On the other hand, for the images generated at the transition temperatures $T = T_{c,q}$, the CNN outputs almost the same predicted temperature as $T^{\rm CNN}_c$. This result indicates that the Ising-trained CNN can, surprisingly, detect the phase transition of the q-state Potts model, despite the fact that the transition type, physical quantities (e.g., E and M), and geometric properties such as the fractal dimension of the spin-cluster boundaries are, in general, different from those of the Ising model, as shown in Table 1 (see also Fig. 2). The temperatures $T^*_{c,q}$ of the input images for which the CNN outputs $T^{\rm CNN}_c$ are listed in Table 1. (In other words, $T^*_{c,q}$ is the temperature of the generated image that the CNN predicts to be at the transition point.) One finds that the CNN precisely detects the transition point of the q-state Potts model, regardless of the type of transition.

Figure 4: The relation between the normalized temperatures $\tau^{\rm in}$ (4), at which the input images of the q-state Potts models are generated, and the corresponding normalized temperatures $\tau^{\rm out}$ (4) predicted by the CNN. Each $\tau^{\rm out}$ is the mean value of the output data for the 6000 input images generated at the same temperature. The transition points correspond to $\tau^{\rm in/out} = 0$, which is well predicted by the CNN. For more precise predicted values of the transition points, see Table 1.

Table 1: The transition temperatures of the q-state Potts models for various q. $T^*_{c,q}$, E, and M, respectively, denote the temperature, internal energy, and magnetization of the image for which the CNN outputs $T^{\rm CNN}_c = 1.1209$; that is, these are the quantities of the generated image that the CNN predicts to be at the transition point. $T_{c,q}$ is the transition temperature evaluated from the behavior of the internal energies of the q-state Potts model. $T^\infty_{c,q}$ is the transition temperature in the thermodynamic limit (2). Each datum contains a numerical error in the last two digits. The analytical values of the fractal dimension $d_f$ (q ≤ 4) in the thermodynamic limit are also listed.
Next, let us consider how the CNN predicts the temperature of an input image. In general, as pointed out in Ref. [16], the CNN is considered to detect the phase transition through the magnetization or internal energy. In Figs. 5(a) and (b), we depict the relation between the output temperatures $T^{\rm CNN}$ of the CNN and the internal energies E and the magnetizations M (see Eq. (3) for their precise definitions). One sees that above the transition point, i.e., $T > T_{c,q}$, the CNN outputs almost the same temperature if the internal energies E of the input images are the same, which indicates that the CNN predicts the temperatures based mainly on E in the high-temperature region $T > T_{c,q}$. Note that, in this region, the expectation value of M does not depend on T; correspondingly, the predicted temperatures do not depend on M either. For the low-temperature region $T < T_{c,q}$, on the other hand, the figure may indicate that the CNN outputs the temperature based on both E and M. In fact, as explained below, a more detailed analysis using principal component analysis indicates that the CNN distinguishes between a system and the one obtained by reversing all its spins. Therefore, in the low-temperature region, the CNN makes predictions based on M (and possibly E as well). This dependence of the CNN output on E or M explains why the predicted temperature $T^{\rm CNN}$ generally depends on the model, as shown in Fig. 4. However, in the vicinity of the transition point $T = T^*_{c,q}$, the assumption that the CNN predicts the temperature based only on E and M does not account for the fact that our CNN can accurately predict the transition temperature, for the following two reasons. First, the internal energies E (and possibly M, because it appears to depend on the model) of the input images that the CNN predicts to be at the transition points differ from model to model (see Table 1). Second, as depicted in Fig. 5(c) for q = 3 and q = 10, the distributions of predicted temperatures for input images with approximately the same E and M are model dependent, which contradicts the assumption. In conclusion, the Ising-trained CNN detects the transition point of the q-state Potts model not only through the internal energy and magnetization but possibly also through more general properties, such as the complexity of spin clusters. It remains to be seen which factors the CNN uses to predict the transition point.
To confirm the above results more concretely, we perform principal component analysis (PCA) on the last hidden layer, consisting of 32 nodes, in our CNN model (see Fig. 3). PCA is a multivariate analysis technique that uses an orthogonal transformation to rotate a multidimensional dataset (in our case, 32 dimensions) so that the components of the transformed dataset are uncorrelated and ordered by decreasing variance (i.e., the first principal component has the greatest variance). In Fig. 6(a), we depict the E and M dependences of the first principal component. For the low-temperature region corresponding to E < −1.75 and |M| > 0.75 (see also Figs. 5(a) and (b)), the curves have finite gradients and almost agree with those for the Ising model. Further, the M dependences are single-valued functions, whereas the E dependences are double-valued: the upper (lower) branches correspond to positive (negative) magnetizations. These behaviors show that the CNN distinguishes images with positive magnetizations from those with negative magnetizations. Thus, in the low-temperature region, the CNN outputs the temperature based mainly on M and possibly E as well.
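The PCA step can be illustrated via a singular value decomposition of the centered data matrix; the random array below is a stand-in for the actual 32-dimensional activations (one row per input image), not our CNN data:

```python
import numpy as np

def pca(X):
    """Orthogonally rotate the data so its components are uncorrelated and
    ordered by decreasing variance (principal components), via SVD."""
    Xc = X - X.mean(axis=0)                 # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                      # data in the rotated basis
    variances = S**2 / (len(X) - 1)         # variance of each component
    return scores, variances

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))             # stand-in for the 32-node layer
scores, var = pca(X)
```

The components of `scores` are mutually uncorrelated by construction, and `var` is sorted from the first (largest-variance) principal component downward.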
One finds that the first principal component does not contribute in the high-temperature region corresponding to E > −1. For this region, we also depict the behaviors of the third principal component in Fig. 6(b). In contrast to Fig. 6(a), in the high-temperature region, the E dependences of the third principal component have finite gradients and are consistent with that of the Ising model, whereas the component does not depend on M in this region. Thus, in the high-temperature region, the CNN predicts the temperature mainly based on E. However, in the vicinity of the transition point, corresponding to −1.7 < E < −1.5 and 0.25 < |M| < 0.75, the E dependences of both the first and third components explicitly depend on the spin state number q: on increasing q, the curves gradually shift to the right. Namely, the curves in this region depend on the model, and hence our CNN does not detect the transition by E. On the other hand, the M dependences in the vicinity of the transition point have finite gradients and are similar to the behavior of the curve for the Ising model. The CNN might detect the transition point by M or something related to it, but this is not conclusive from these PCA analyses due to the large variance. As mentioned previously, further clarification of the properties used by our CNN to detect transition points is left for future work.

Conclusion
The Ising-trained deep CNN can precisely detect the phase transition of the q-state Potts model, regardless of the type of transition. Our CNN model has not been trained on information about phases but is naively trained only by Ising spin configurations labeled with temperatures. We find that, above the transition point, the deep CNN outputs the temperature mainly based on the internal energy, whereas, below the transition point, it outputs the temperature mainly based on the magnetization and possibly the internal energy. However, in the vicinity of the transition point, the CNN predicts the temperature not only by the internal energy or the magnetization, but it may detect the transition points by more global features. In view of the fact that NNs have been applied to detect more general types of phase transitions such as topological phase transitions [35,36,37,38,39,40,41,42,43,44,45], a fundamental and quantitative investigation of how NNs capture global features at transition points is highly desired.