Detecting composite orders in layered models via machine learning

Determining the phase diagram of systems consisting of smaller subsystems ‘connected’ via a tunable coupling is a challenging task relevant for a variety of physical settings. A general question is whether new phases, not present in the uncoupled limit, may arise. We use machine learning and a suitable quasidistance between different points of the phase diagram to study layered spin models, in which the spin variables constituting each of the uncoupled systems (to which we refer as layers) are coupled to each other via an interlayer coupling. In such systems, in general, composite order parameters involving spins of different layers may emerge as a consequence of the interlayer coupling. We focus on the layered Ising and Ashkin–Teller models as a paradigmatic case study, determining their phase diagram via the application of a machine learning algorithm to the Monte Carlo data. Remarkably our technique is able to correctly characterize all the system phases also in the case of hidden order parameters, i.e. order parameters whose expression in terms of the microscopic configurations would require additional preprocessing of the data fed to the algorithm. We correctly retrieve the three known phases of the Ashkin–Teller model with ferromagnetic couplings, including the phase described by a composite order parameter. For the bilayer and trilayer Ising models the phases we find are only the ferromagnetic and the paramagnetic ones. Within the approach we introduce, owing to the construction of convolutional neural networks, naturally suitable for layered image-like data with arbitrary number of layers, no preprocessing of the Monte Carlo data is needed, also with regard to its spatial structure. The physical meaning of our results is discussed and compared with analytical data, where available. Yet, the method can be used without any a priori knowledge of the phases one seeks to find and can be applied to other models and structures.


Introduction
Classification of observations into separate categories is certainly one of the most important applications of machine learning [1]. Successful examples range broadly from the detection of exotic particles in experimental high energy physics [2] through learning human actions in movies [3] to dermatologist-grade skin cancer classification [4]. The classification task is very often performed with artificial neural networks, capable of learning even highly complex and elusive patterns in the data, both detectable and invisible to humans. If multilayer image data is in question, convolutional neural networks (CNNs) perform exceptionally well, mimicking human vision by inferring from small portions of the image at a time.
Successful applications of CNNs outside the field of physics are extremely numerous, ranging from the detection of human faces at multiple angles or in partially visible images [5] to the ImageNet large-scale classification challenge [6], just to mention a few.
Translational invariance and adjustable size of the filters, which detect local correlations, make CNNs the ideal candidates for phase diagram reconstruction. The phase diagram is typically reconstructed from a large number of Monte Carlo (MC) snapshots. At first, research efforts revolved around supervised learning on the MC snapshots [7][8][9][10][11][12][13][14], later shifting to fully unsupervised learning on a chosen observable, such as non-local correlators whose behavior is modified by the presence of phase transitions [15]. A number of methods are capable of predicting phase transitions by extrapolation outside the parameter range they have been trained on [16][17][18][19][20][21].
Our goal is to use machine learning techniques to characterize the phase diagram of coupled spin models. We refer to them as layered spin models since it is natural to think of the coupling between two-dimensional spin models considering them separately, say in different layers and involving different spin variables, subsequently turning on the interlayer coupling. The simplest structure of this kind is a bilayer. It is clear that, depending on the form and the strength of the coupling, the phases of the uncoupled models can be altered and new phases, impossible without the interlayer coupling, may emerge. Therefore, it becomes important to devise a general approach that can detect the presence of phases induced by the interlayer coupling.
In this paper we introduce a CNN-based approach capable of fully unsupervised learning of phase diagrams with the network fed exclusively with raw MC snapshots without any a priori knowledge about relevant observables or order parameters. We note that complementary approaches to the unsupervised learning problem have been pursued using principal component analysis and support vector machines [22][23][24][25][26][27], deep autoencoders [28] or discriminative cooperative networks [29]. However, here we show that the task of fully-unsupervised phase diagram reconstruction can also be performed using CNNs, allowing one to apply to physical problems a number of techniques developed in the field of computer vision, a field in which CNNs represents the golden standard.
Our approach is applied to the reconstruction of the phase diagram of layered spin models. Our motivation for such an investigation is three-fold. On one side, when two or more models are coupled, new phases may emerge as a result of the presence and of the form of the coupling. Consider, for instance, two magnetic systems with a tunable coupling between each other and suppose that when the coupling is zero, each system separately undergoes a conventional ferromagnetic phase transition [30]. For finite coupling, on the other hand, the order parameter may involve, in the general case, some non-trivial combination of spins of both systems. Let us consider a specific example, i.e. the Ashkin-Teller model, consisting of two square-lattice Ising models with spin variables σ and τ coupled via a term of the form σστ τ [31,32]. When the interlayer coupling between the variables σ and τ is zero, the phase diagram of the model is characterized only by the order parameters σ and τ . On the other hand, when the interlayer coupling is large enough with respect to the intralayer term, a new non-trivial phase with a composite order parameter στ emerges, even when all couplings are ferromagnetic. Further examples of the occurrence of novel order due to the coupling between different layers include the so-called 'metallic superfluid' phase [33,34], as well as the recently-reported BKT-paired phase in two coupled two-dimensional XY models [35]. At last, let us consider again two square-lattice Ising models with spin variables σ and τ , now coupled via a term of the form στ : is the phase with composite order parameter στ present or not? As discussed in literature for the bilayer configurations and reviewed below, we expect that such phase should not exist. Since the phase diagram of the 2D Ashkin-Teller model and of some its variations can be determined analytically [32,36], and similarly the Ising model is a classical workhorse of statistical mechanics [30,37], they provide an ideal benchmark to look for composite order parameters in an unsupervised way. One could ask whether and what new composite order parameters emerge in multilayer configurations, such as the trilayer one. Although in the two-variable (or, in our language, bilayer) Ashkin-Teller model the composite order parameter can be easily recognized, a more complex spin model with several layers, with both short-and long-range interlayer couplings, could be much more challenging to be addressed with simple physical considerations. Many, possibly competing, composite order parameters may be present and determining the one which actually breaks the symmetry and generates a novel phase is a highly non-trivial task. From this point a view, an unsupervised approach able to correctly reproduce the phase diagram of layered models, regardless of the nature of underlying order parameters, is highly desirable.
Our second motivation is that layered models emerge in a wide range of physical situations. Among them, the bilayer structure in which two two-dimensional systems are coupled has been studied in a number of cases, ranging from graphene [38] to ultracold dipolar gases [39]. Another major example is provided by layered superconductors, that can occur naturally or be artificially created. Among the former class, of primary importance are compounds of transition-metal dichalcogenides layers intercalated with organic molecules [40] and cuprates [41]. Examples of artificial structures are alternating layers of graphite and alkali metals [42] or samples with layers of different metals [43]. Neutral layered superfluids can be engineered with quantum gases by using a deep optical lattice in one spatial direction with ultracold fermions [44] or bosons [45]. It is therefore important to develop general approaches capable of dealing with coupled interacting systems. In particular, given the importance of layered physical systems and their ubiquitous presence in a variety of contexts, one may think for instance of layered superconductors, a general approach to individuate their phase diagram-once that one is able to study the uncoupled counterpart-would provide an important tool of investigation.
Finally, our last motivation is purely methodological and inherent to machine learning. Indeed, in layered models one has a certain degree of arbitrarity in the way the MC data to be analyzed are fed to the neural networks, e.g. one can provide the data in each layer separately, or retaining their spatial structure such as columns and ordering them correspondingly. As an example, in the Ashkin-Teller model one can provide numerical algorithms either with all the σ i 's and then all the τ i 's, or the pairs (σ i , τ i ) according to the index i labeling the position of the spins in the layers. This arbitrarity also reflects itself in the fact that an order parameter which can be clearly identified with a choice can be non-trivial, or 'hidden', with another choice. A special class of hidden order parameters are composite order parameters, i.e. parameters defined across multiple layers of a layered system. To use again the Ashkin-Teller model as an example, the order parameter στ is immediately identified when the choice of the pairs (σ i , τ i ) is done, but not when the data is provided layer by layer. Therefore a natural question is how to identify phase transitions in coupled or layered models driven by order parameters which may be hidden by the codification of the data to be provided to the machine learning algorithm.

Machine learning phase transitions in classical spin models
Let us consider a general case of a spin system whose Hamiltonian is defined by two parameters, J and K. We aim to devise a procedure to depict the phase diagram in the K-J plane. To this extent we discretize a portion of the K-J plane on a grid with steps ΔJ and ΔK. For each point on the grid we generate a number of uncorrelated MC snapshots using standard algorithms [46][47][48]. Unless otherwise specified we shall work on a 32 × 32 × N l square lattice, N l being the number of layers to be specified later, and we shall generate a number of 600 snapshots for each point in the phase diagram. Periodic boundary conditions are used on each layer throughout all the simulations.
The training of the CNN-see appendix A for the details of the architecture-attempts at learning to distinguish snapshots belonging to the two different points, (J 1 , K 1 ) and (J 2 , K 2 ), in the phase diagram. In order to carry out this plan, at first, we divide the data in a standard way, taking a fraction of the snapshots-the 80%-from each of the two points in the phase diagram as training set, while keeping the remaining 20% as validation set. Then, we train the network on the training data and quantify the classification accuracy on the validation set. Given a successful training, one of the two neurons on the output layer will yield a higher value for an input configuration belonging to the validation sample at the point (J 1 , K 1 ), while a higher value of the second neuron will correspond to a configuration sample at the point (J 2 , K 2 ). Accordingly, the accuracy ϕ of the training can be quantified by the ratio of correctly labeled samples N s over the total number of configurations N tot in the validation set [15] Intuitively, when the training fails, the two points present nearly identical features, thus belonging to the same phase, and the accuracy of the training will be low, ϕ ≈ 0.5. On the other hand, if it succeeds, the two points should belong to two different phases and the fraction ϕ of correctly labeled examples from the validation set will be large, ϕ ≈ 1. It is worth to stress that the labels attributed from the network to any MC configuration after the training do not correspond to the physical phases present in the system. Indeed, the algorithm is unsupervised and these are unknown. Each snapshot is just labeled to mark whether it originates from an MC simulation with couplings (J 1 , K 1 ) or (J 2 , K 2 ). In particular, regardless of the number of phases in the system-even this information is unknown to the algorithm-there will be exactly two labels here. Based on the discussion above, the variable ϕ, which quantifies the portion of the validation set correctly labeled by the network, represents the accuracy of the algorithm in distinguishing the two different points (J 1 , K 1 ) and (J 2 , K 2 ) of the phase diagram. Then, we can introduce the following quasidistance 9 between the two phase diagram points (J 1 , K 1 ) and (J 2 , K 2 ): where Θ(x) is the heavyside step function, preventing d from assuming negative values. Perfect discrimination ϕ = 1 corresponds to d = 1, while perfect confusion ϕ = 0.5 corresponds to d = 0. We feed the raw MC snapshots directly to the CNN, with spin down encoded as 0 and spin up encoded as 1, no preprocessing applied. The network architecture is optimized for the task of classifying two phases: after convolutional and fully connected layers the final layer consists of two softmax output neurons outputting the labels. The convolutional filters span both layers, which is the feature enabling the network to learn composite order parameters. Hence, both layers are simultaneously fed into the network. Further technical details on the network architecture and training can be found in the appendix.
At last, we make use of the distances defined in equation (2) to construct a field u(J, K) defined on the phase diagram through its finite-difference lattice gradient Clearly ∇u will be constant in regions of the phase diagram belonging to the same phase, since we expect that the difficulty of telling first neighbors apart should be uniformly quite high. On the other hand, we expect the value of ∇u to abruptly change in the vicinity of a phase transition, suggesting that the phase diagram should be naturally characterized by looking at the finite-difference lattice Laplacian with the n = 2, n = 3 and n = 4 cases corresponding to a 5-point, 9-point or 13-point stencil, respectively. The stencil includes (n − 1) nearest neighbors in the J and K directions. We stress that the summations can be rearranged so that they involve only differences of the u field evaluated between first, second and third neighbors, that can in turn be expressed in terms of the quasidistance d. From the discussion above, it is clear that a sudden rise in the value of ∇ 2 u means that the CNN can distinguish with increased precision arbitrarily close points in the phase diagram, thus signaling a phase transition. We anticipate that including high-order finite-differences besides the obvious 5-point stencil taking into account first-neighbors stencil considerably increases the quality of the reconstructed phase diagram. This point will be analyzed in detail later. Moreover, using the stencil as opposed to always just comparing two neighboring points of the phase diagram immunizes the algorithm in the case of very dense grid. In such a case, it would be progressively difficult to find neighboring points belonging to different phases. With our approach, we are assured that using a large enough stencil will circumvent this problem for any grid density. Following the previous discussion, the quasidistance d((J 1 , K 1 ), (J 2 , K 2 )), which enters the definition in equation (2), quantifies the difference between two points in the phase diagram. Then, the field u(J, K) encodes overall information over the similarity between the distribution functions, which generated the configurations at the point (J, K) and its neighbors. As such, the introduction of the quasidistance d presents some analogies with the fidelity approach used to detect quantum phase transitions [49]. In the latter, the fidelity quantifies the similarity between two different (but close in the phase diagram) ground-state wavefunctions and its derivative abruptly changes in correspondence of a quantum phase transition. Hence, for the classical case we are considering, the quasidistance d provides a counterpart of the fidelity. Consequently its derivative, connected to the field u(J, K) via equation (3), provides a counterpart of the derivative of the fidelity.
Calculation of ∇ 2 u(J, K) for the entire phase diagram is by far the most time-consuming step of the algorithm. Using N nearest-neighbours, i.e. [4(N − 1) + 1]-point stencil, it requires M · 4(N − 1) calculations of the quasidistance. There, is the total number of discretized (J, K) pairs in the phase diagram. We now briefly comment more on the physical relevance of the u field. A second-order phase transition, within the scheme we just described, is identified through the non-analytical behavior-in the continuous limit-of ∇ 2 u. The concept we introduce is general: ∇u correspondes to a similarity measure between different points in the parameter space and ∇ 2 u, then, measures an abrupt change in this similarity. Different models might lead the CNN to establish the similarity measure in different ways. The generality of our approach also means that it can, in principle, recognise first-order and continuous phase transitions equally well.
In conclusion of the present section, we compare our scheme with other related approaches. As opposed to other machine learning schemes, in the present work we do not need the evaluation of any observable quantity to establish a distance [15], rather directly relying on the MC snapshots. Moreover, as opposed to other approaches [15] the scheme we introduce in this paper fully takes advantage of the two-dimensional nature of a two-parameter phase diagram, as the local information is reconstructed by taking into account neighbors in all directions. Extensions to three-or higher-dimensional phase diagrams are straightforward [50]. Finally, our approach requires only the evaluation of a fixed number of neighbors for each point in the phase diagram, ensuring that the computational effort required for training scales linearly with the number of points in the discretized phase diagram.

Multilayer Ising models
We now use the framework described in the previous section to characterize the phase diagram of different coupled spin models.
Let us start from a bilayer Ising system, described by the following Hamiltonian with a quadratic coupling term (sometimes referred to as the Yukawa coupling): where σ i , τ i = ±1 are Ising variables on two-dimensional square lattices, whose sites are denoted by the indices i, j. The sums in equation (6) are over nearest-neighbor sites. When K = 0, the system reduces to two uncoupled Ising models, having a phase transition at the Onsager critical point (βJ) c = ln(1 + √ 2)/2 [37,51], β being the inverse temperature. This critical point is shifted by the presence of a finite interlayer coupling K. The resulting Ising critical line separating the paramagnetic and ferromagnetic phases as a function of K has been studied in the literature [52][53][54]. It is clear that the bilayer system (6) is the classical counterpart of two coupled quantum Ising chains in a transverse field, a system that has been studied both in relation to its spectrum, phase transitions and possibility to determine an integrable line in the space of parameters [55][56][57][58]. The classical bilayer system and the quantum coupled chains can be also related to each other by an exact mapping.
From our point of view, the model described by equation (6) is an excellent starting point for our investigations, especially in order to check the existence of a composite order parameter and its relation to the phase diagram. It is now natural to parametrize the phase diagram in term of the dimensionless combinations βJ and βK, discretizing it for values of βJ ∈ [0, 0.5] and βK ∈ [0, 1.4], with discretization steps ΔβJ = ΔβK = 0.01. We then apply the phase diagram reconstruction procedure described in the Reconstructed phase diagram for the square-lattice Ising trilayer model, showing a phase transition between an unordered, high-temperature phase (U) to an ordered, low-temperature phase (O). Note that as the interlayer interaction βK is increased, the critical temperature decreases from the analytic limit (βJ) c = ln(1 + √ 2)/2 ≈ 0.44, marked by a red solid diamond, to the strong-interlayer-coupling limit βK → ∞ where (βJ) c = (βJ) c /3, marked by a red, dashed line. The hollow red diamonds and the dash-dotted line are obtained from the susceptibility peak in the raw MC data using a conventional approach and show the transition point across the whole phase diagram, while also accounting for the finite size of the system.

Figure 3.
Reconstructed phase diagram for the square-lattice Ashkin-Teller model. Our approach identifies three phases, in agreement with the theory of the Ashkin-Teller model. The red, blue yellow and green solid diamonds show analytically-determined phase transition points at specific limits, see main text, whereas the hollow cyan diamonds show the transition point across the whole phase diagram, also accounting for finite-size scaling, and are obtained by analyzing the susceptibility peak in the raw MC data using a conventional approach. The insets show representative configurations of the σ, τ spins and of the 'composite' spin στ , for each phase: note that the transition between phase II and phase III does not correspond to any apparent difference in the σ and τ layers that we feed to the CNN. We stress that the στ 'composite' variable, marked in red, is not fed to the CNN. previous section to precisely determine the phase boundaries in the βK-βJ phase diagram, shown in figure 1(c). According to analytical results for an infinite square lattice, the phase transition in the uncoupled βK = 0 case should occur at (βJ) c ≈ 0.44; a solid diamond marks this result in figure 1(c). Hollow red diamonds in the same phase diagram are derived using a conventional approach from the susceptibility peak in the raw MC data, therefore marking the critical point also accounting for the finite size (32 × 32 × 2) of the system, across the whole phase diagram. Both the machine-learning reconstruction of the phase diagram and the hollow red diamonds show that the critical temperature gradually decreases to the strong-coupling critical temperature (βJ) c = (βJ) c /2. This result is marked by a red dashed line. The width of the peak is essentially, again, due to the the finite-size of the lattice used for MC simulations, whose snapshots we feed to the neural network. The errors of our method on the determination of transition points are discussed in appendix B. The result is that it appears that only two phases are found, with order parameter σ = τ . From our treatment of data we cannot determine the behavior of the order parameter inside the two phases, whose study would be an interesting future continuation of the present results.
Next, we consider a trilayer system, whose Hamiltonian is a natural extension of the one of equation (6): and the new variable υ i is also an Ising spin. This is the first non-trivial example, and of course representative of properties of the multilayer Ising model with Yukawa coupling. The central natural question is whether a composite order parameter emerges. Moreover the model of equation (7) is interesting since it paves the way to the investigation of the N layers case, which shall be trivial with the method presented here. Indeed the N layer case may serve to investigate how the (three-dimensional) limit of infinite layers is retrieved, an issue in the context of layered models, see e.g. reference [59].
The investigation of the model described by equation (7) follows the same line as the one of the bilayer case, we are able to reconstruct the phase diagram as shown in figure 2. In the βK = 0 case, again, one recovers the critical temperature of the square-lattice Ising model, marked by a solid red diamond. Then, as βK is increased, one recovers the strong-interlayer-coupling critical temperature that in this case is (βJ) c = (βJ) c /3, marked by a red dashed line. As in the bilayer case, hollow red diamonds mark the transition point, as derived from the susceptibility peak in the raw MC data, using a conventional approach. The main result exhibited in figure 2 is that no composite order parameter appears even for the trilayer case. Therefore, our technique has been able to correctly recover the phase diagram of the bilayer Ising model, where we do not expect any additional order to appear [60,61], while it also predicts the same picture for the trilayer case, for which no previous expectation exist up to our knowledge. The generalization to the N-layer case shall be straightforward, but more numerically demanding, while based on the present results no additional phases are expected to appear. Therefore, in the following we are going to investigate a different case where a composite order parameter appears by construction.

Reconstructing composite order parameters: the Ashkin-Teller model
We now turn to the square-lattice Ashkin-Teller model, described by the following Hamiltonian with σ i , τ i = ±1. Compared to Hamiltonians (6) and (7) one sees that the coupling is now quartic in spins.
Since in the Ising model there are only two scaling fields relevant in renormalization group sense [30,37], the magnetization and the energy, one sees that in the models (6) and (8) one has basically the two natural ways of having respectively magnetization-magnetization and energy-energy couplings, higher order coupling terms being irrelevant. The Ashkin-Teller model is also related to the four state planar Potts model, and several variations of it, also in three dimensions, have been investigated [62]. The Ashkin-Teller model features a rich phase diagram, and remarkably in two dimensions can be studied analytically [32,36]. Here we consider the case of ferromagnetic couplings, J, K 0. It is known that three different phases exist [32]. Besides an ordered phase, denoted by I, characterized by σ = 0 = τ and a disordered phase, II, characterized by σ = τ = 0 one also finds the peculiar phase III in which the single spins σ and τ are disordered, whereas a composite order parameter given by their product is ferromagnetically ordered, i.e. στ = 0.
Whereas the previous investigation of Ising-like models makes us confident that the ML procedure we have introduced is able to correctly characterize the transition between phase I and phase II, it is not a priori clear that phase III can be correctly identified. As shown in the small inset of figure 3, MC snapshots show disordered spins both in phase II and in phase III, the transition being determined by the στ composite variable, that we do not directly feed to the CNN. In order to learn the existence of the II-III phase transition the CNN must learn to reconstruct the composite order parameter. We find that our framework successfully performs this task, owing to the convolutional filters which are convolved in 2D spanning across the layers and are able to learn even elusive interlayer correlations.
The reconstructed phase diagram of figure 3 shows that indeed our approach is able to correctly learn the phase transitions in the ferromagnetic Ashkin-Teller model. Whereas the transition line corresponding to the magnetization of σ and τ , as separated variables, corresponds to a prominent peak, whose width is essentially determined by finite-size effects, the line corresponding to the magnetization of the composite στ order parameter corresponds to a smaller peak, displaying that the characterization of this transition line is more demanding to the CNN, but still possible.
We can compare the phase diagram we obtain with some exact results. In the K → 0 the model reduces to a square-lattice Ising model with coupling constant J, with critical temperature (βJ) c = ln(1 + √ 2)/2 ≈ 0.44 [37,51], whereas in the K → ∞ limit the model reduces to a square-lattice Ising model with coupling constant 2J and critical temperature (βJ) c = ln(1 + √ 2)/4 ≈ 0.22. Finally for J = 0 the system again undergoes an Ising-like phase transition for the composite order parameter, at (βK) c = ln(1 + √ 2)/2 ≈ 0.44. These three points are marked by a red, green and blue solid diamond, respectively, in the phase diagram of figure 3, showing a very good agreement between the analytical results and the reconstructed phase diagram, even in the latter case when the composite order parameter στ drives the transition. These three diamonds, showing the results of analytic calculations, are complemented by hollow cyan diamonds, marking the critical point as calculated using a conventional approach from the susceptibility peak in the raw MC data across the whole phase diagram, thus also accounting for the finite size of the system. Finally, the yellow solid diamond marks the bifurcation point as determined analytically in reference [32]; we attribute the difference with respect to the bifurcation point in our reconstructed phase diagram to finite size effects. We also mention that the critical lines separating the different phases are retrieved with a precision up to ∼20-30%, except for vanishing βJ. Again we attribute this to finite size effects; proceeding as extensively discussed in the literature [1] one could obtain a quantitative agreement on the location of the critical lines. Here, our emphasis is on the possibility of retrieving the phases with composite order parameters and to ascertain their existence, as we also did for the trilayer Ising model.

Scaling properties and robustness of the approach
Our results show that with the network and learning parameters that we used we were able to obtain a phase diagram of quality high enough to visually identify different phases. In addition, in this section we characterize our method by quantifying signal to noise ratio (SNR) and studying its behavior when essential parameters are changed. We define the SNR as SNR ≡ log 10 x i being the values of the ∇ 2 u field of equation (4), the summation extending over a region containing N values, ν being the 'noise', i.e. the average value of ∇ 2 u in a subset of the region far away from a phase transition. We evaluate the SNR for the Ising bilayer on a strip centered on βK = 1.1, exhibiting a sharp phase transition at βJ ≈ 0.26 as clear from figure 1. At first, we vary the number of training epochs, observing that the SNR is rapidly increasing before reaching a maximum value at around 5 epochs of training. This indicates that further training brings no benefit while providing a risk of overfitting, justifying our early-stopping approach. Secondly, we vary the number of samples in the training set, showing a rapid increase in the SNR before reaching a plateau at about 400 samples, justifying our choice of using a slightly larger number (600) of samples in the training set. Lastly, we vary the number of convolutional filters in the CNN. Again, the general upwards trend shows that a larger number of convolutional filters helps in enhancing the quality of the reconstructed phase diagrams. However, we stress that in this latter case the SNR is quite high in the whole parameter region we consider. The lowest number of convolutional filters we consider (3) is already enough for achieving a good reconstruction of the phase diagram and a large SNR value. These analyses are shown in figure 4. We have also analyzed how the reconstructed phase transition is affected by the dimension of the stencil in equation (4). Using a 5-point, 9-point or 13-point stencil we have obtained SNR values of −1.36 dB, 0.38 dB and 3.88 dB, respectively. This confirms that the approach we are introducing takes indeed great advantage from the two-dimensional structure of the phase diagram, and information from second-and third-nearest neighbors is being used to sharply characterize the phase transition.

Conclusions
As shown for layered spin models such as the multilayer Ising and Ashkin-Teller models, our work demonstrates that ML approaches are able to learn the order parameter driving a phase transition in layered models, also when this parameter is not immediately apparent from the snapshots without preprocessing. This is directly possible, introducing a suitable quasidistance between different points of the phase diagram, thanks to the convolutional filters which are, without any a priori knowledge, capable of learning even involved algebraic operations that uncover the order parameters from the data. A brief discussion of the analogies of the introduced quasidistance with the fidelity approach used to detect quantum phase transitions is presented. Our work paves the way to the use of ML approaches to investigate the properties of systems of increasing complexity and to characterizing phases of matter described by multiple, possibly non-local order parameters, as the convolution operation can naturally construct features from non-neighbouring spin variables. More generally, the universal approximation theorem [63] ensures that a neural network can, at least in principle, learn to recognize arbitrarily complex order parameters. In particular it would be very interesting to study the multilayer Ising model with an increasing number of layers, the three-dimensional Ashkin-Teller and the trilayer Ahkin-Teller in two dimensions, which can be studied with the techniques introduced in the paper. Non-local couplings among the layers could be added, which would lead to non-local, more composite, operators. These results should be compared with the identification of hidden order done using non-ML techniques [64]. Also, the present approach may be used for other cases in which the identification of the order parameters is not straightforward [65][66][67]. Even if our approach has been devised to deal with coupled spin models and can applied to different geometrical configurations, it is not clear a priori that it would succeed in other more complicated cases of coupled interacting systems, such as multilayer configurations of interacting bosons and fermions or bilayer quantum Hall systems. Of course, in order to study generically coupled models one needs to have an efficient algorithm to simulate the uncoupled systems. Nevertheless, we think the present work provides a methodological basis, highlighting the effect of interlayer coupling on the macroscopic properties and phases of coupled systems.
Naturally, the approach we introduced could also be extended in the future to characterize quantum models, or classical spin models with competition between short-and long-range interactions [68][69][70], or more involved spin models such as the XY model [14,[71][72][73][74][75][76][77][78][79], discretising the continuous degrees of freedom [74]. We expect that by an appropriate choice of the sizes and strides of the filter in the convolutional layer one could characterize antiferromagnetic order parameters, non-local order parameters and exotic order parameters, such as nematic and smectic phases. In this context, current experiments on fermionic dipolar atoms [80,81] promise to open a new window in the physics of competing long-range and short-range interactions [82], clearing the path for the comprehension of modulated phases in strongly interacting quantum systems.
The presence of spatially ordered structures is a leitmotiv for long-range and layered systems such as ultra-thin magnetic films [83][84][85], iron-based superconductors and cuprates [86,87]. The pattern structure normally depends on several experimental conditions and it produces a particularly rich phase diagram. Most of the common features occurring in stripe forming systems and modulated phases remain obscure due to the challenges posed by the complicated order parameters, which occur in these cases [88][89][90][91]. The ML technique introduced in the present paper may serve as an essential prove to finally uncover the complexity of such phases. Our results pave the way for fully automated study of phase diagrams of more general and complicated spin systems. An exciting open problem lying in the realm of so-called explainable artificial intelligence [92] is whether machine learning techniques could not only learn to separate phases differing by a 'hidden' order parameter, but also identify that parameter. Another natural development of the present work is to use our fully-unsupervised technique to learn directly from experimental data [93][94][95]. Finally, it would be interesting to extend the results presented in this paper according to the variational procedure discussed in reference [96]. accuracy of our method, we analyzed the natural width associated to the phase transition in the Ashkin-Teller model for βK = 0.7, determining it by looking at the peak of magnetic susceptibility directly from MC simulations, and determining its width through the full width at half maximum (FWHM). We compare it with the FWHM of the Laplacian peak we reconstruct from our machine learning approach. The results are shown in figure B1; the FWHM of both the magnetic susceptibility (red squares, the red dashed line guides the eyes) and machine learning Laplacian (blue squares, the blue dashed line guides the eye) obey the same ∝1/L scaling with respect to the lattice size L. The constant offset between the two datasets can be understood, as anticipated, as the additional error introduced of our method, due to the discretization of the parameter space, and due to some intrinsic uncertainty associated to the machine learning procedure.