InfoCGAN Classification of 2-Dimensional Square Ising Configurations

An infoCGAN neural network is trained on 2-dimensional square Ising configurations conditioned on the external applied magnetic field and the temperature. The network is composed of two main sub-networks: a generator and a discriminator. The network learns to generate convincing Ising configurations as well as to discriminate between "real" and "fake" configurations while additionally predicting categorical assignments. The predicted categorical assignments show very strong agreement with the expected physical phases in the Ising model: the ferromagnetic spin-up and spin-down phases as well as the paramagnetic phase. Additionally, configurations associated with the crossover region in the non-vanishing field case are identified by the model. The classification probabilities allow for a robust method of estimating the critical temperature in the vanishing field case, showing exceptional agreement with the known physics. This work indicates that an adversarial neural network approach can efficiently identify physical phases with no a priori information beyond raw physical configurations and the physical conditions they are subject to.


I. INTRODUCTION
Recent advances in the field of machine learning have provided many opportunities for application in the sciences. Fundamentally, the machine learning approach is motivated by the use of pattern recognition and statistical inference in a manner capable of isolating statistically significant features contained within input data to produce meaningful predictions. This is of course in contrast to more traditional methods requiring explicit sets of instructions to elicit meaningful predictions from input data. For many systems, such an explicit set of instructions is not necessarily known, which renders the machine learning approach attractive for addressing such systems. Generally speaking, a machine learning approach using a feed-forward artificial neural network involves tuning a computational graph such that the weights of the edges in the graph are modified by numerically solving the saddle point optimization problem to extract desired predictions from input information.
In this work, focus will be placed on the 2-dimensional square Ising model, an oft-employed model for describing magnetic phenomena in the field of statistical physics. The Ising system itself is expressed in the form of a 2-dimensional array of spins σ_i indexed by i that represent a discrete arrangement of dipole moments of atomic spins. 1 The spins are restricted to spin-up or spin-down alignments represented as ±1 such that σ_i ∈ {−1, +1}. The spins are allowed to interact with their nearest neighbors according to the pair-wise interaction strength J_ij for neighbors σ_i and σ_j. Additionally, each spin σ_i will interact with an external magnetic field H_i (where the magnetic dipole moment µ has been absorbed into the magnetic field H_i) at that lattice site. The full Hamiltonian describing this system is thus expressed as

H = -\sum_{\langle i,j \rangle} J_{ij} \sigma_i \sigma_j - \sum_i H_i \sigma_i,

where \langle i,j \rangle indicates adjacent index pairs i and j. For J_ij > 0, the interaction between spins i and j is ferromagnetic, for J_ij < 0, the interaction is antiferromagnetic, and for J_ij = 0, the spins i and j are non-interacting. Furthermore, for H_i > 0, the spin at lattice site i tends to prefer a spin-up alignment, with the strength of the preference determined by the strength of the field. By contrast, for H_i < 0, the spin at lattice site i tends to prefer a spin-down alignment. For H_i = 0, there is no external magnetic field influence on lattice site i. Typically, the Ising model is solved for the case J_ij = J and H_i = H, and this is the convention that will be used in this work.
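For concreteness, the Hamiltonian in the uniform case (J_ij = J, H_i = H) can be evaluated for a periodic square lattice with a few lines of NumPy. This is a minimal sketch, not the paper's implementation:

```python
import numpy as np

def ising_energy(config, J=1.0, H=0.0):
    """Total energy of a square Ising configuration of +/-1 spins with
    periodic boundaries: E = -J sum_<ij> s_i s_j - H sum_i s_i."""
    # np.roll pairs each site with its right and down neighbors,
    # counting every nearest-neighbor bond exactly once
    interaction = config * (np.roll(config, 1, axis=0) + np.roll(config, 1, axis=1))
    return -J * interaction.sum() - H * config.sum()

# all-up ground state on a 4x4 lattice: 2N bonds of energy -J each
ground = np.ones((4, 4))
print(ising_energy(ground))  # -32.0
```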
The vanishing field case H = 0 is of particular interest, as under these conditions the Ising model for dimension d ≥ 2 exhibits a second-order phase transition characterized by a critical temperature, T_C. The 2-dimensional case was originally solved analytically by Onsager in 1944. 2 At low temperatures below the critical point, the system is dominated by nearest-neighbor interactions, which for a ferromagnetic system means that adjacent spins tend to align with one another. However, as the temperature is increased, thermal fluctuations will eventually overpower the interactions such that the magnetic ordering is destroyed and the orientations of adjacent spins can be taken to be uncorrelated. This is referred to as a paramagnet. If an external magnetic field is applied to a paramagnet, however, the spins will tend to align with the field, albeit at sufficiently high temperatures a sufficiently strong external field will be required to overcome the thermal fluctuations. The ferromagnetic and paramagnetic phases are discriminated by an order parameter, which takes the form of the magnetization for the ferromagnetic Ising model. Oftentimes, the average magnetization is used to fulfill this role, which for a system composed of N lattice sites is expressed as

m = \frac{1}{N} \sum_i \sigma_i.

In the vanishing field case for a system in the thermodynamic limit (N → ∞, V → ∞), the average magnetization abruptly vanishes at the critical temperature. The magnetic susceptibility χ and the specific heat capacity C additionally diverge at the critical temperature.
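The order parameter is a one-line computation for a configuration stored as an array of ±1 spins:

```python
import numpy as np

def average_magnetization(config):
    """m = (1/N) * sum_i sigma_i for a configuration of +/-1 spins."""
    return config.mean()

print(average_magnetization(np.ones((4, 4))))  # 1.0 (fully ordered)
print(average_magnetization(np.array([[1, -1], [-1, 1]])))  # 0.0 (disordered)
```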
Thus, the system can be said to be paramagnetic for a vanishing average magnetization and ferromagnetic for a finite average magnetization (with the system categorized as ferromagnetic spin-up or spin-down according to parity). However, in the presence of a non-vanishing external magnetic field, parity is destroyed by the inclusion of an alignment preference. As a result, a first-order phase transition is observed as a discontinuous change in the average magnetization of the system is induced by varying the strength of the external magnetic field from negative to positive values (or vice versa) at low temperatures (T < T_C). Furthermore, there is no longer a second-order phase transition, as the average magnetization no longer abruptly vanishes at a critical temperature. Rather, there is a region in which the system tends from an ordered to a disordered state. This region is referred to as the crossover region. Examples of the types of spin configurations that can be found in the vanishing field case are depicted in Fig. 1. Below the critical temperature, nearest-neighbor interactions dominate the system, causing the spins to align with one another in either the spin-up or spin-down configuration. The behavior of the average magnetization as a function of temperature is additionally depicted for the vanishing field case and the non-vanishing field case in Fig. 2 and Fig. 3 respectively. For the vanishing field case, the average magnetization can be seen to abruptly vanish in the vicinity of the critical temperature. By contrast, the average magnetization tends to smoothly decay to a small value in the non-vanishing field case, with parity determined by the external magnetic field.
Generally speaking, machine learning can be thought of as a framework for utilizing pattern recognition and statistical inference to build a model capable of isolating statistically significant features in input information to produce meaningful output in lieu of an explicit set of instructions to perform the same task, which is useful in situations where the meaningful features are not known a priori. Identifying statistically significant features in physical systems as criteria for phase transitions is a known approach for many systems, such as with the widely-adopted Lindemann parameter, a measure of the deviations of atomic positions in the system from equilibrium positions often used to characterize the melting of a crystal structure. 3 For sufficiently complex systems, the identification of such a criterion can be inaccessible through traditional means, which opens an opportunity for machine learning approaches to accomplish such a task. This concept of hidden orderings has been proposed for some systems. [4][5][6] Although inference methods, such as the maximum likelihood method and the maximum entropy method, 7,8 have been routinely applied to certain physical problems, applications which utilize other machine learning methods have not attracted much attention until recently.
Considerable recent work has been devoted to detecting phase transitions and performing structure classifications in various physical systems using machine learning. [9][10][11][12][13] Additionally, some exciting work has been done to investigate the capacity for machine learning methods to accelerate physical simulations. 14 For the Ising system specifically, there have been many such efforts. Boltzmann machines (BMs), deep belief networks (DBNs), and deep restricted Boltzmann machines (RBMs) have proven to be effective at generating Ising configurations with accurate thermodynamic properties. 15,16 Interestingly, it has been shown that there is an exact mapping between the variational renormalization group (RG) and the RBM, perhaps implying that machine learning approaches may employ a generalized RG-like scheme to extract relevant features from input data. 17 Variational autoencoders (VAEs) have also been used to successfully extract the Ising order parameter in the latent representation in the vanishing field case. 18,19 Additionally, it has been shown that a classification neural network is indeed capable of providing an accurate estimation of the Ising critical point in the vanishing field case. 20,21 Accurate classification can even be obtained with a rather small network, and the order parameter can even be extracted from the decision function of the classification network, albeit through the use of supervised learning utilizing a priori categorization labels.
The purpose of this work is to explore the capability of an information maximizing conditional generative adversarial network (InfoCGAN) to discriminate between the different classes of microstates exhibited by the 2-dimensional square Ising model, including the configurations characterizing the crossover region in addition to the ferromagnetic and paramagnetic microstates, in an unsupervised manner. 22 The infoGAN class of artificial neural networks belongs to the larger group of generative adversarial networks (GANs) and is thus guided by the same principles. 23,24 The network is composed of two primary components, a generator network and a discriminator network. 24 In this framework, the networks are interpreted to be contesting with one another in an adversarial game (albeit not always a zero-sum game) to solve the saddle point optimization problem. The role of the discriminator network is to transform an input configuration into a measure of the probability that it belongs to the same statistical distribution as the training configurations, and the role of the generator network is to transform an input Gaussian noise vector into a configuration that belongs to the same statistical distribution as the training configurations. The output of the generator network is thus of the same form as the training configurations, and the output of the discriminator network is a single value belonging to the interval v ∈ [0, 1], interpreted as the probability of the input configuration belonging to the same statistical distribution as the training configurations and referred to as the validity.
The procedure for training a GAN is done by first optimizing the weights of the discriminator network to predict a batch of the training configurations as "real" (an output of one) and a batch of configurations obtained from the generator network as "fake" (an output of zero). Then, the generator network weights are optimized such that its output for a single batch of random Gaussian latent variables is determined to be "real" by the discriminator network (with the discriminator weights held constant during this step). This procedure is repeated for each batch in the set of training configurations to constitute a training epoch. In this manner, the generator and the discriminator networks "learn" from each other, as the discriminator network is optimized to detect differences between the "real" training configurations and the "fake" generated configurations while the generator network is optimized to respond to information provided by updates to the discriminator network to generate more convincing "fake" configurations that the discriminator network identifies as "real." For the implementation used in this work, the minimax GAN (or MM GAN), the adversarial contest between the discriminator and generator networks is indeed a zero-sum game. As with all artificial neural networks, the tuning of the weights in the networks is done by performing backpropagation (implemented with a chosen optimizer) to solve the saddle point optimization problem by minimizing an objective loss function. The ideal outcome of the training process for an MM GAN is then to produce a stable Nash equilibrium between the discriminator and generator networks in which the state of the objective loss function ensures that one of the networks will not change its behavior regardless of what its opponent may do.
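The alternating update schedule described above can be sketched structurally as follows, with d_step and g_step standing in as hypothetical placeholders for the framework-specific weight updates (this is a sketch of the training schedule, not the paper's TensorFlow implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_gan_epoch(real_data, d_step, g_step, batch_size, latent_dim):
    """One epoch of alternating MM GAN updates: the discriminator is
    trained to label real batches 1 and generated batches 0, then the
    generator is trained (discriminator frozen) to make its samples
    score as 1. Returns the number of batches processed."""
    rng.shuffle(real_data)
    n_batches = len(real_data) // batch_size
    for b in range(n_batches):
        real_batch = real_data[b * batch_size:(b + 1) * batch_size]
        z = rng.standard_normal((batch_size, latent_dim))
        d_step(real_batch, z)  # update D on real (label 1) and fake (label 0)
        z = rng.standard_normal((batch_size, latent_dim))
        g_step(z)              # update G so that D(G(z)) -> 1, D weights fixed
    return n_batches
```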
For the specific case of the MM GAN, the cost function takes the following form:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))],

where x represents a training configuration, z represents latent input to the generator network, P(x) represents the distribution of the training configurations, P(z) represents the distribution of the latent input to the generator network, D represents the discriminator network, and G represents the generator network. This can be reinterpreted as objective loss functions for the respective networks in the following manner:

L_D = -\mathbb{E}_{x \sim P(x)}[\log D(x)] - \mathbb{E}_{\tilde{x} \sim P(\tilde{x})}[\log(1 - D(\tilde{x}))],
L_G = \mathbb{E}_{\tilde{x} \sim P(\tilde{x})}[\log(1 - D(\tilde{x}))],

where \tilde{x} has now been adopted to represent a generated sample drawn from the distribution P(\tilde{x}) provided by the generator network. Training a GAN is notoriously difficult, as the model is sensitive to many major problems, including simple non-convergence to a Nash equilibrium, mode collapse in which the generator fails to produce a variety of samples, diminished gradients in which the discriminator is effective enough that the generator fails to receive enough information to properly update due to vanishing gradients, imbalance between the generator and discriminator resulting in overfitting, and sensitivity to training hyperparameters, to name a few.
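As a sketch, the two objective losses can be written directly from these expectations, with d_real and d_fake denoting discriminator scores on real and generated batches (a minimal illustration, not the paper's implementation):

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-12):
    """L_D = -E[log D(x)] - E[log(1 - D(x~))]: penalizes the discriminator
    for scoring real samples below 1 or generated samples above 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-12):
    """Minimax generator loss L_G = E[log(1 - D(x~))], the zero-sum
    counterpart of the fake-sample term in L_D (many implementations
    instead minimize the non-saturating -E[log D(x~)])."""
    return np.mean(np.log(1.0 - d_fake + eps))

# a perfect discriminator (D(x) = 1, D(x~) = 0) drives L_D to zero
print(d_loss(np.array([1.0]), np.array([0.0])) < 1e-9)  # True
```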
With the motivations behind GANs in mind, it is clear to see the value that such a framework can provide to researching physical systems. Assuming a GAN is sufficiently trained on configurations drawn from a well-simulated physical system, the distribution of the microstates composing the simulation data can be "learned", which can then be used to predict physical properties of interest, not unlike calculating physical properties from the partition function describing a system in statistical physics. However, while the standard GAN framework provides the necessary tools for generating convincing configurations and discriminating between "real" and "fake" samples, this is not necessarily of immediate interest to physical research. Indeed, there is an interpretability problem with GANs: while the important information required to draw a sample from the distribution of the training configurations is contained in the latent input to the generator network, there is not always an obvious way to isolate desired properties of the distribution represented in the latent input. The infoGAN model provides an encouraging attempt to address this interpretability problem by learning disentangled representations of configurations by maximizing mutual information between small subsets of the latent inputs to the generator network and the output of an auxiliary network to the discriminator network. 23 This can be incorporated into the cost function when training the discriminator network with an additional mutual information term I(c; G(c, z)), where c is the chosen subset of latent inputs (often referred to as the control codes) distributed as P(c) with entropy H(c), G(c, z) is the generator network, and Q(c|x) is the auxiliary network. Since the mutual information is intractable to compute directly, it is estimated using an entropic approach, yielding a variational lower bound optimized alongside the GAN objective:

\min_{G,Q} \max_D V_I(D, G, Q) = V(D, G) - \lambda L_I(G, Q), \qquad L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(c,z)}[\log Q(c|x)] + H(c),
where λ is a hyperparameter used to weight the importance of optimizing the mutual information against the generative ability of the model to produce "real" samples and the discriminative ability of the model to distinguish "real" and "fake" samples. The control codes c are free to take on many forms, but are most frequently taken to be uniform, Gaussian, or categorical variables. In this work, categorical variables are employed to allow the network to directly classify different kinds of Ising configurations in an unsupervised manner. The use of conditional variables is not restricted to these control variables, however. The structures described for the GAN and infoGAN models are respectively depicted in Fig. 4 and Fig. 5.
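A minimal sketch of the latent construction and the mutual-information surrogate for the setup used in this work (five categorical codes, two conditional variables, 128 Gaussian noise dimensions; the conditional values below are random placeholders for the normalized temperature and field):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(batch, n_cat=5, n_cond=2, n_noise=128):
    """Generator latent input: one-hot categorical control codes c,
    conditional variables t, and Gaussian noise z, concatenated."""
    c = np.eye(n_cat)[rng.integers(0, n_cat, batch)]  # control codes
    t = rng.uniform(-1.0, 1.0, (batch, n_cond))       # condition placeholders
    z = rng.standard_normal((batch, n_noise))         # Gaussian noise
    return np.concatenate([c, t, z], axis=1), c

def mi_lower_bound_loss(c_true, q_pred, lam=1.0, eps=1e-12):
    """Categorical cross-entropy between sampled codes c and the auxiliary
    predictions Q(c|x); minimizing it maximizes the variational lower
    bound on I(c; G(c, z)) up to the constant entropy term H(c)."""
    return -lam * np.mean(np.sum(c_true * np.log(q_pred + eps), axis=1))

latent, codes = sample_latent(4)
print(latent.shape)  # (4, 135)
```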
Additionally, the generator and the discriminator can be conditioned on the known physical conditions that the Ising configurations are subject to, namely the temperature and the external magnetic field strength. The inclusion of this information involves conditioning both the generator and discriminator networks on these inputs, denoted with t. This type of GAN is referred to as a conditional GAN (CGAN). 25 The cost function reads as

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P(x)}[\log D(x|t)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z|t)|t))].

This can then be freely incorporated into the infoGAN model to produce a so-called infoCGAN model. The structures described for the CGAN and infoCGAN models are respectively depicted in Fig. 6 and Fig. 7.
In summary, categorization of physical configurations can be accomplished using this approach by augmenting the standard GAN approach to add a significant amount of control to the "learned" distribution of the training data in the form of both categorical labels that distinguish different classes of physical configurations and conditional labels that indicate the physical conditions the configurations are subject to. The auxiliary network can then be used as a classifier to partition the physical configurations into physically meaningful classes, which in this work are intended to be the physical phases. In contrast to prior works, the InfoCGAN approach is more flexible and expressive. Such a network is capable of fulfilling the role of a generator similar to the RBM and VAE approaches, albeit with greater fidelity than a VAE, as well as providing either categorical or continuous assignments to the configurations (though only categorical control codes are used in this work) in an unsupervised manner by way of the auxiliary network, whereas a VAE only provides a continuous latent space. This is distinct from typical classification networks that are limited to supervised learning and thus require a priori knowledge of the physical phases in the training samples. InfoCGAN additionally provides the capacity for conditioning the Ising configurations on the physical conditions the Ising system is subject to, which can reasonably be exploited for many other physical simulations.

II. METHODS
The simulated Ising configurations are generated using a typical Monte Carlo simulation across a large range of temperatures and external magnetic fields using a Metropolis criterion for spin-flip attempts. This is implemented using the Python programming language with NumPy operations, Numba JIT compilation, and Dask parallelization. [26][27][28][29] The lattice size chosen for the square 2-dimensional configurations was l = 27. A single spin-flip attempt consists of flipping the spin of a single lattice site, calculating the resulting change in energy ∆E, and then using that change in energy to define the Metropolis criterion exp(−∆E/T). If a randomly generated number is smaller than the Metropolis criterion, the configuration resulting from the spin flip is accepted as the new configuration. The data analyzed in this work consists of 1,024 square Ising configurations of side length 27 with periodic boundary conditions across 65 external field strengths and 65 temperatures respectively sampled from [−2, 2] and [1, 5] on uniformly spaced grids. The interaction strength was set to unity such that J_ij = J = 1. Each sample was equilibrated with 8,388,608 spin-flip attempts before data collection began. Data was then collected at an interval of 8,192 spin-flip attempts for each sample up to a sample count of 1,024. At the end of each data collection step, a replica exchange Markov chain Monte Carlo move, also known as parallel tempering, was performed across the full temperature range for each set of Ising configurations that shared the same external field strength. [30][31][32] In this work, the Ising spins are rescaled such that spin-down atomic spins carry the value 0 and spin-up atomic spins carry the value 1, which is a standard setup for binary-valued features in data science applications. This representation is consistent with the lattice gas model, which is equivalent to the Ising model.
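The spin-flip procedure described above can be sketched in plain NumPy (a minimal sketch; the actual implementation uses Numba JIT compilation and parallel tempering, which are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(config, T, J=1.0, H=0.0):
    """One sweep (N spin-flip attempts) of single-spin-flip Metropolis
    updates on a square Ising lattice of +/-1 spins with periodic
    boundaries."""
    n = config.shape[0]
    for _ in range(n * n):
        i, j = rng.integers(0, n, size=2)
        neighbors = (config[(i + 1) % n, j] + config[(i - 1) % n, j]
                     + config[i, (j + 1) % n] + config[i, (j - 1) % n])
        # energy change from flipping spin (i, j)
        dE = 2.0 * config[i, j] * (J * neighbors + H)
        # accept with probability min(1, exp(-dE / T))
        if rng.random() < np.exp(-dE / T):
            config[i, j] *= -1
    return config
```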
The resulting modified Hamiltonian is represented in the following manner:

H = -J \sum_{\langle i,j \rangle} n_i n_j - \mu \sum_i n_i,

where the external field strength H is reinterpreted as the chemical potential µ, J retains its role as the interaction strength, and n_i ∈ {0, 1} represents the lattice site occupancy. The original Ising Hamiltonian can be recovered using the relation σ_i = 2n_i − 1 up to a constant.
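The σ_i = 2n_i − 1 correspondence between the spin and occupancy representations is a one-line transformation each way:

```python
import numpy as np

def spins_to_occupancy(sigma):
    """Map Ising spins in {-1, +1} to lattice-gas occupancies in {0, 1}."""
    return ((sigma + 1) // 2).astype(np.uint8)

def occupancy_to_spins(n):
    """Inverse map: sigma_i = 2 * n_i - 1."""
    return 2 * n.astype(np.int8) - 1

sigma = np.array([[-1, 1], [1, -1]])
print(spins_to_occupancy(sigma))  # [[0 1]
                                  #  [1 0]]
```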
The infoCGAN model is trained on the collection of Ising configurations using five categorical control variables and two conditional variables, taking the values of the temperature and the applied external magnetic field. The temperatures are normalized to the range [0, 1] and the external magnetic fields are normalized to the range [−1, +1]. The five categorical control variables are intended to represent the two low-temperature ferromagnetic classifications, the paramagnetic classification, and the two crossover regions (reflecting the parity of the external magnetic field). The categorical control variables, conditional variables, and a Gaussian latent space of size 128 are concatenated to produce the latent input vector for the generator network. Every hidden layer uses LeCun normal kernel initializations with scaled exponential linear unit (SELU) activations such that the networks are self-normalizing networks (SNNs). 33 Due to the 2-dimensional structure of the Ising configurations, it is a natural choice to implement the infoCGAN as a convolutional neural network (CNN). 34 The neural network itself was implemented in the Python language using the TensorFlow framework with Keras layers. 26,35,36 The first hidden layer in the generator network is a dense layer of size 256, which is then reshaped to a shape of (1, 1, 256). Two transposed convolutions are then iteratively applied with a number of filters that decreases by a factor of 2 upon each iteration. A final transposed convolution is applied with a single filter such that an output tensor of shape (27, 27, 1) is designated as an output layer subject to a logistic activation function with output restricted to the interval (0, 1), which can be treated as the probability of a spin-up lattice site. All of the transposed convolution layers are implemented with a kernel shape of (3, 3) and a stride of (3, 3). The input conditional variables are additionally fed forward as output.
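The upsampling path can be checked against the standard transposed-convolution output-size formula: with kernel (3, 3) and stride (3, 3), the side length grows 1 → 3 → 9 → 27. The filter counts below are illustrative assumptions consistent with the halving described above, not values stated in the text:

```python
def conv_transpose_out(size, kernel=3, stride=3):
    """Side length after a 2-D transposed convolution with no padding:
    out = (in - 1) * stride + kernel."""
    return (size - 1) * stride + kernel

# generator upsampling path: dense -> reshape (1, 1, 256) -> three
# stride-3 transposed convolutions, the last with a single filter
side = 1
for filters in (128, 64, 1):  # illustrative filter counts
    side = conv_transpose_out(side)
    print(side, filters)
# 3 128
# 9 64
# 27 1   -> final output shape (27, 27, 1)
```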
The discriminator network takes an Ising configuration of shape (27, 27, 1) as well as the conditional variables as input. The first component of the discriminator network performs iterative convolutions on the input Ising configuration, which is the inverse of the iterative transposed convolutions performed by the generator network. The final convolution produces an output shape of (1, 1, 256), which is then flattened to remove the vestigial dimensions. This is treated as a hidden latent space containing structural information about the input Ising configurations extracted by the convolutional layers. Up until this point, all of the layers in the discriminator network are shared with the auxiliary network such that all of the weights and biases are shared. After this point, the discriminator and auxiliary networks possess similar structure, but they are allowed to optimize the weights and biases of these final layers independently such that each responds to different features in the hidden latent space to optimize its respective predictions. The discriminator and auxiliary networks then independently feed the hidden latent space output into dense layers of size 135. The resulting output is concatenated with the input conditional variables and respectively fed into the output layers for each network. The output layer of the discriminator network is a dense layer of size 1 implemented with a logistic activation function. By contrast, the output layer of the auxiliary network is a dense layer of size 5 implemented with a softmax activation function.
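The shared-trunk, split-head topology can be sketched with plain NumPy. Random, suitably scaled weights stand in for trained ones and the SELU hidden activations are omitted; only the shapes and the sigmoid/softmax heads follow the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# shared convolutional trunk output: a flattened 256-feature latent vector
features = rng.standard_normal((4, 256))
cond = rng.uniform(-1, 1, (4, 2))  # temperature and field conditions

# independent heads: each has its own dense layer of size 135, concatenated
# with the conditions before the output layer (weights scaled for stability)
W_d = rng.standard_normal((256, 135)) / np.sqrt(256)
W_q = rng.standard_normal((256, 135)) / np.sqrt(256)
h_d = np.concatenate([features @ W_d, cond], axis=1)
h_q = np.concatenate([features @ W_q, cond], axis=1)

validity = sigmoid(h_d @ (rng.standard_normal((137, 1)) / np.sqrt(137)))    # D: P(real)
categories = softmax(h_q @ (rng.standard_normal((137, 5)) / np.sqrt(137)))  # Q: P(c|x)
print(validity.shape, categories.shape)  # (4, 1) (4, 5)
```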
The loss function is implemented with binary cross-entropy loss for the validity score and categorical cross-entropy loss for the categorical control predictions. Stochastic gradient descent (SGD) is used as the optimizer for the discriminator with a learning rate of 0.01, and adaptive moment estimation (Adam) is used as the optimizer for the generator with β_1 = 0.5 and a learning rate of 0.001 in order to prevent the discriminator from overpowering the generator during learning. 37,38 A value of λ = 1 is employed for the mutual information loss from the auxiliary network.
Training is performed for sixteen epochs with a batch size of 169, resulting in 25,600 batches for each epoch. After sufficient training, the auxiliary network can be used to predict the classification probabilities of the Ising configurations with respect to the five categories. If desired, the generator network can additionally be used to generate new convincing Ising configurations given a categorical assignment, temperature, external magnetic field, and a random Gaussian latent vector.

III. RESULTS
The loss history of the infoCGAN does indicate a tendency towards a Nash equilibrium, though with occasional classification errors in the later epochs that are nonetheless statistically acceptable. The discrimination loss is essentially equivalent for the "real" and "fake" samples and is consistently lower than the generation loss, but the results are well within expectations. Training stability was obtained rather quickly. All of the plots in this section are generated with the MatPlotLib package using a perceptually uniform colormap. 39 The loss history with respect to the batches is depicted in Fig. 8, while the loss history with respect to the epochs is depicted in Fig. 9.

FIG. 10: This diagram depicts the classification probability for the first categorical variable with respect to temperature and the external magnetic field.

The predicted classification probabilities for the first two categorical control variables with respect to the temperature and the external magnetic field are depicted in Fig. 10 and Fig. 11. The results are exceptional, as the regions of high classification probability reflect the expected locations of the low-temperature ordered ferromagnetic phases, complete with symmetry across the vanishing field case. The classification probabilities are very close to 0.5 for each category along the low-temperature regime of the vanishing field, reflecting the fact that in the absence of an external field, there is no imposed preference on the type of magnetic order exhibited by the system. The boundaries of these high-probability regions are rather sharply defined. This is encouraging, as both the training Ising configurations and the generated configurations for these regions possess consistent thermodynamics, showing an average magnetization extremely close to either −1 or +1 across the regions depending on the external field (excluding the vanishing field case, of course).
Outside of these predicted classification regions, there is a consistent departure from the completely ordered states caused by thermal fluctuations. The predicted classification probabilities for the next two categorical control variables with respect to the temperature and the external magnetic field are depicted in Fig. 12 and Fig. 13. These classification regions capture the aforementioned configurations in which there are significant departures from the completely ordered ferromagnetic states, but in which the nearest-neighbor interactions and the interactions with the external field (where applicable) still impose partial magnetic order. Investigation into the configurations contained in these regions shows a smooth decay of the average magnetization with respect to increasing temperature, consistent with the characteristics of a crossover region. Furthermore, as the external field strength increases, the region begins and ends at increasing temperatures, consistent with the expectation that the crossover region will shift to higher temperature ranges as it becomes more difficult for thermal fluctuations to overpower the influence imposed by the external field. Once again, there is an observed symmetry of the two regions across the vanishing field as expected, as well as reasonably sharply defined region boundaries, similar to what was observed with the ferromagnetic regions.
FIG. 14: This diagram depicts the classification probability for the fifth categorical variable with respect to temperature and the external magnetic field.
The predicted classification probability for the final categorical control variable with respect to the temperature and the external magnetic field is depicted in Fig. 14. This classification region is well-characterized by the paramagnetic state, as the configurations contained by it exhibit nearly zero average magnetization. Where the ferromagnetic regions failed to predict the critical temperature due to the finite size of the system, this region compensates, as the region begins in the immediate vicinity of the true critical point of T_C ≈ 2.269 as calculated by Kramers-Wannier duality. [40][41][42] As seen with the prior two sets of classification regions, there is a symmetry across the vanishing field. By contrast with the others, however, there is no need to partition the classification across the vanishing field, as the thermal fluctuations overpower the influence of the external field.
In order to further investigate how well-approximated the critical point actually is, a logistic curve can be fit using orthogonal distance regression to the average paramagnetic classification probabilities with respect to temperature along H = 0, taking the form

P(S_{\mathrm{paramagnetic}} | T) = \frac{1}{1 + e^{-k(T - \hat{T}_C)}},

where k is the logistic growth rate of the probability and \hat{T}_C is the estimated critical temperature.
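One simple way to perform such a fit: the logit transform makes the logistic form linear in T, so an ordinary least-squares line recovers k and the critical temperature estimate. (The paper uses orthogonal distance regression; this transform-and-fit version is a simpler sketch of the same functional form.)

```python
import numpy as np

def fit_logistic_tc(T, p, cut=1e-3):
    """Fit P(T) = 1 / (1 + exp(-k (T - T_C))) by noting that
    log(p / (1 - p)) = k (T - T_C) is linear in T; points with p
    near 0 or 1 are masked to keep the logit well-defined.
    Returns (T_C estimate, growth rate k)."""
    mask = (p > cut) & (p < 1.0 - cut)
    y = np.log(p[mask] / (1.0 - p[mask]))
    k, b = np.polyfit(T[mask], y, 1)
    return -b / k, k

# synthetic check on data drawn from a known logistic curve
T = np.linspace(1.0, 5.0, 65)
p = 1.0 / (1.0 + np.exp(-8.0 * (T - 2.3)))
tc, k = fit_logistic_tc(T, p)
print(round(tc, 3), round(k, 3))  # 2.3 8.0
```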

FIG. 15: This diagram depicts the average classification probabilities of the class assignments predicted by the auxiliary network with respect to temperature in the vanishing field case. Deep purple represents the ferromagnetic classifications, light purple represents the crossover classifications, and pink represents the paramagnetic classifications. The logistic curve is fit to the paramagnetic probabilities and provides a critical temperature estimate of T̂_C ≈ 2.308.

The fit is shown in Fig. 15, where the dotted line indicates the critical temperature estimate at T̂_C ≈ 2.308. This provides an overestimate of ≈ 1.688%. This estimation of the critical point is consistent with the peaks observed in the magnetic susceptibility χ and the specific heat capacity C respectively depicted in Fig. 16 and Fig. 17, which would be divergent in the thermodynamic limit. This provides significant evidence for the estimation provided through the auxiliary network classification being consistent with the known physics. The magnetizations m and energies E of the Ising configurations with respect to temperature are additionally respectively shown in Fig. 18 and Fig. 19. The predicted critical temperature is consistent with expectations, as it is located where the strict separation of the spin-up and spin-down configurations breaks down for the magnetization and where the inflection occurs in the energy.

IV. CONCLUSIONS
The infoCGAN auxiliary classification network has performed extremely well in predicting phases of the 2-dimensional Ising model. It is important to note how well the approach agrees with known properties of the Ising model despite having no access to any physical a priori information beyond the raw configurations, the temperatures, and the external field strengths. From this approach, a clear distinction between the two ferromagnetic phases as well as the paramagnetic phase is readily apparent, alongside a reasonable identification of the crossover region. Furthermore, an exceptional estimation of the critical point is provided by this approach.
These results are encouraging and motivate further development of approaches to machine-learn phases of matter. For many physical systems, order parameters have not been identified through conventional means and the phase diagrams elude discovery. However, simulation data consisting of configurational information with respect to thermal parameters is readily available. It is clear from this work that machine learning approaches are indeed capable of determining a structural criterion to partition physical samples into separate phases of matter and are even flexible enough to allow for conditioning of the model on additional physical parameters that may prove useful.

V. ACKNOWLEDGMENTS
This work is funded by the NSF EPSCoR CIMM project under award OIA-1541079. Additional support (KMT) was provided by NSF Materials Theory grant DMR-1728457. An award of computer time was provided by the INCITE program. This research also used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.