Galaxy Spin Classification I: Z-wise vs S-wise Spirals With Chirality Equivariant Residual Network

The angular momentum of galaxies (galaxy spin) contains rich information about the initial condition of the Universe, yet it is challenging to efficiently measure the spin direction for the tremendous number of galaxies that are being mapped by the ongoing and forthcoming cosmological surveys. We present a machine learning based classifier for the Z-wise vs S-wise spirals, which can help to break the degeneracy in the galaxy spin direction measurement. The proposed Chirality Equivariant Residual Network (CE-ResNet) is manifestly equivariant under a reflection of the input image, which guarantees that there is no inherent asymmetry between the Z-wise and S-wise probability estimators. We train the model with Sloan Digital Sky Survey (SDSS) images, with the training labels given by the Galaxy Zoo 1 (GZ1) project. A combination of data augmentation tricks is used during the training, making the model more robust when applied to other surveys. We find a $\sim\!30\%$ increase of both types of spirals when Dark Energy Spectroscopic Instrument (DESI) images are used for classification, due to the better imaging quality of DESI. We verify that the $\sim\!7\sigma$ difference between the numbers of Z-wise and S-wise spirals is due to human bias, since the discrepancy drops to $<\!1.8\sigma$ with our CE-ResNet classification results. We discuss the potential systematics that are relevant to the future cosmological applications.


INTRODUCTION
In tidal-torque theory, the angular momentum of galaxies (galaxy spin) is generated by the tidal torque due to the misalignment between the protohalo inertia tensor and the local gravitational tidal shear (Peebles 1969; Doroshkevich 1970; White 1984). Cosmological simulations have confirmed that the direction of dark matter halo spin is well described by the tidal-torque theory (Porciani et al. 2002), and that disk galaxies generally follow dark matter and gain similar spin directions as their host halos (Teklu et al. 2015; Jiang et al. 2019). This makes galaxy spin a promising cosmological probe of various parameters including the initial condition of the Universe, primordial chirality and the neutrino mass (see e.g. , 2001; Yu et al. 2019, 2020; Motloch et al. 2021, 2022a). Recently, Motloch et al. (2021) find a correlation between the observed galaxy spins and the initial density field of the Universe. The same galaxy catalog has also been used to search for primordial chirality violations (Motloch et al. 2022a). However, the signal-to-noise ratio is limited by the number of galaxies (∼ 15000) with their spin directions observed. To be able to fully exploit the galaxy spin as a cosmological probe and make good on any of the exciting prospects, it is necessary to develop new methods to measure the direction of galaxy spin accurately and efficiently.
Assuming that spiral galaxies are well approximated by circular disks, their three-dimensional spin directions can be determined through the position angles and axis ratios which are readily available from photometric observations, up to a fourfold degeneracy (see Figure 2 of Motloch et al. 2021, for a visual illustration). Iye et al. (2019) visually inspect 842 spiral galaxies and confirm that (1) all the spirals are trailing, i.e. Z-wise spirals rotate clockwise, and (2) the dark, dust-lane-dominant side of the minor axis is closer to us. Therefore, one can break the fourfold degeneracy if one can determine (1) whether the galaxy is a Z-wise or S-wise spiral, and (2) which side of the minor axis is darker and redder.$^1$ In this paper, we focus on the classification of Z-wise vs S-wise spirals, and leave dark side vs bright side classification for future research.
Galaxy Zoo 1 (GZ1) is a citizen science project in which members of the public classified about $9 \times 10^5$ galaxies (Lintott et al. 2008, 2011).$^2$ It provides the information of clockwise or anticlockwise (Z-wise or S-wise spiral pattern) for galaxies from the Sloan Digital Sky Survey (SDSS, Abazajian et al. 2009; Ahumada et al. 2020) data, which leads to a ∼ 3σ detection of the correlation between the galaxy spin field and cosmological initial conditions (Motloch et al. 2021) and preliminary results for primordial chirality violations (Motloch et al. 2022a). However, the ongoing and forthcoming cosmological surveys, such as the Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al. 2016a,b), will map far more galaxies than those classified in GZ1, too many to be classified again by humans. Machine learning (ML) based classification methods are required to efficiently identify the morphological properties of galaxies.
Deep Convolutional Neural Networks (CNN) have led to a series of breakthroughs in computer vision during the past ten years (LeCun et al. 1989; Krizhevsky et al. 2012; He et al. 2016; Tan & Le 2019). They are now regarded as the state-of-the-art image classification method and are widely used in general astrophysics applications (Banerji et al. 2010; Huertas-Company et al. 2011; Dieleman et al. 2015; He et al. 2019; Abul Hayat et al. 2020; Yao-Yu Lin et al. 2020). In this paper, we present Chirality Equivariant Residual Network (CE-ResNet), a machine learning based classifier for the Z-wise vs S-wise spirals. The remainder of this paper is organized as follows: we introduce the datasets in Section 2 and the architecture of our model in Section 3. We present the training and classification results of our model in Section 4 with SDSS images, and in Section 5 with DESI images. The known asymmetry between Z-Spirals and S-Spirals in GZ1 is discussed in Section 6. We conclude this paper in Section 7. The source code of our CE-ResNet model$^3$ and the classification catalogs$^4$ are publicly available.
$^1$ Alternatively, one can use the information regarding which side of the major axis is approaching us (Han et al. 1995; Pen et al. 2000; Motloch et al. 2021). This may be less ambiguous than visually deciding the dark side of the minor axis, but requires spectroscopic data and thus cannot be directly obtained from photometric surveys.
$^2$ http://zoo1.galaxyzoo.org/

GALAXY SPIN DATASETS
In the GZ1 project, public volunteers are asked to classify the SDSS galaxy images into six categories: ellipticals, Z-wise spirals, S-wise spirals, edge-on spirals, star / don't know, and mergers. The catalog includes the vote counts of the six morphological types for 667,944 galaxies with SDSS spectra data available, and for 225,268 galaxies with no spectra data available. The empirical probability that a galaxy belongs to each category can be estimated by the fraction of votes, while only the probabilities of Z-wise and S-wise spirals, $p_z$ and $p_s$, are relevant for this paper.$^5$ We refer to the empirical probabilities from the GZ1 catalog as the "true" probability, in contrast to the "predicted" probability given by the classifiers. To keep the number of different kinds of galaxies roughly balanced, we downsample the galaxies with $p_m \equiv \max(p_z, p_s) \in [0, 0.1]$ by a factor of 20 (i.e. only keep 1 of 20 such galaxies), the galaxies with $p_m \in (0.1, 0.2]$ by a factor of 5, and the galaxies with $p_m \in (0.2, 0.3]$ by a factor of 2. We then query the SDSS DR16 SQL database with the RA and Dec of each galaxy in the GZ1 catalog, and apply the following cuts to remove the galaxies that are unlikely to be clear enough to identify their morphology:
1. There should be exactly one PhotoObj within 1" of the location in the GZ1 catalog.
2. The error of the r band magnitude should be in (0, 1).
3. The r band half light radius $r_{50}$ should be larger than 1".
4. The r band relative error of radius $\sigma_{r_{50}}/r_{50}$ should be in (0, 0.25).
$^3$ https://github.com/h3jia/galaxy spin classifier/
$^4$ https://zenodo.org/record/7170929/
$^5$ Note that $p_{z/s}$ throughout this paper is the probability that a random volunteer in the GZ1 project will decide that the galaxy is a Z(S)-Spiral, which is not exactly the same as the probability that the galaxy is actually a Z(S)-Spiral. However, these two types of probabilities should be positively correlated, and the galaxies with $p_{z/s} \to 1$ are indeed very likely to be real Z(S)-Spirals (see the sample images in Appendix A). Therefore, the empirical probability estimated by the vote fractions is still a useful quantity for the classification of galaxy morphology.
We find 173,097 galaxies that meet all the criteria (dubbed "Reduced GZ1" catalog henceforth), and use 70% of them for training, 15% for validation, and 15% for testing. We obtain the jpeg images of these galaxies from both SDSS and DESI surveys using the Legacy Surveys Sky Viewer tool (Dey et al. 2019).$^6$ The SDSS DR16 images are generated from the gri bands, while the DESI DR9 images are generated from the grz bands. The numbers of Z-wise and S-wise spirals selected with different cutoff values $p_{\rm cut}$ on the vote fractions $p_z$ and $p_s$ are listed in Table 1. Sample images for galaxies with different GZ1 morphology classification probabilities can be found in Figure 9.
To assess whether our model can be robustly applied to surveys other than SDSS (with which the model is trained), we also collect all the galaxies in the DESI Legacy Survey Sweep Catalogs that are larger than 1" in half light radius and have spectroscopic redshifts available, leading to the "Preliminary DESI" catalog of 1,953,246 galaxies. For our model, the image field-of-view (FOV) is chosen as a multiple of the galaxy size, which however may deviate significantly between different survey measurements (see Figure 8). This requires that our classification model should be insensitive to the image FOV, which will be justified later in this paper (see Figure 4 and the discussions therein).
$^6$ https://www.legacysurvey.org/
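The downsampling scheme and the four quality cuts described in this section can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function and argument names are hypothetical (they are not SDSS column names), and the downsampling is assumed to be random, which the text does not specify.

```python
import random

def keep_probability(p_z, p_s):
    """Keep fraction implied by the quoted downsampling factors (20, 5, 2)."""
    p_m = max(p_z, p_s)
    if p_m <= 0.1:
        return 1 / 20
    if p_m <= 0.2:
        return 1 / 5
    if p_m <= 0.3:
        return 1 / 2
    return 1.0  # galaxies with p_m > 0.3 are all kept

def passes_cuts(n_photoobj, r_mag_err, r50_arcsec, r50_err_arcsec):
    """Quality cuts 1-4; argument names are hypothetical placeholders."""
    return (
        n_photoobj == 1                               # exactly one PhotoObj within 1"
        and 0 < r_mag_err < 1                         # r band magnitude error
        and r50_arcsec > 1                            # half light radius above 1"
        and 0 < r50_err_arcsec / r50_arcsec < 0.25    # relative radius error
    )

def downsample(galaxies, seed=0):
    """Randomly thin a list of dicts with keys 'p_z' and 'p_s' (assumed layout)."""
    rng = random.Random(seed)
    return [g for g in galaxies
            if rng.random() < keep_probability(g["p_z"], g["p_s"])]
```

Thinning by a random draw keeps the expected fraction in each $p_m$ bin at the quoted value; keeping every 20th (or 5th, or 2nd) galaxy deterministically would be an equally valid reading of the text.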

NETWORK ARCHITECTURE
[Table 2, caption partially recovered: ... He et al. (2016), but has four additional fully connected layers. The network predicts the scores of Z-Spirals and Non-Spirals from the original images, and the scores of S-Spirals and Non-Spirals from the flipped images, which guarantees that it is equivariant under a parity inversion. See Figure 1 for a demonstration of the model workflow.]
Empirically, deeper neural networks are more expressive than their shallower analogs, at the cost of being more difficult to train (Mhaskar et al. 2017; Mehta et al. 2019): as the depth of the network increases, its accuracy may saturate and then eventually decrease rapidly, which is known as the degradation problem (He & Sun 2015; Srivastava et al. 2015). This issue can be addressed by the Residual Network (ResNet, He et al. 2016), which allows the construction of extremely deep convolutional networks and is still considered the state-of-the-art method for computer vision tasks. The key insight of ResNet is that, in principle, deeper networks should perform at least as well as their shallower analogs, since if the subsequent layers are all identity, a deep network becomes equivalent to a shallow network. The degradation problem then implies that it is not easy for neural networks to approximate the identity function with nonlinear layers. Therefore, instead of fitting the target function $H(x)$ directly, it is advantageous to have each nonlinear layer fit $F(x) := H(x) - x$, which can be implemented by a simple shortcut connection, such that deeper networks should perform no worse than shallower networks. We implement our Chirality Equivariant Residual Network (CE-ResNet) in PyTorch (Paszke et al. 2019), based on the ResNet-50 model in He et al. (2016). The network structure is shown in Table 2. We use the ReLU activation function for the convolution layers, and the tanh activation function for the fully connected layers. Our model is the same as the original ResNet-50 model, except for the following changes.
(1) The input image size is now 3 × 160 × 160, and the output size for each layer is changed accordingly. (2) We add four additional fully connected layers to improve its expressivity. (3) Each galaxy image is fed into the neural network twice: the same network predicts the scores for Z-Spirals and Non-Spirals given the original image, and the scores for S-Spirals and Non-Spirals given the flipped image. We then average the two estimates of the Non-Spiral score, and apply a softmax function to get the probabilities for the three categories (see Figure 1). This guarantees that the network is purely parity-even, which is crucial for the cosmological galaxy spin analysis. Note that we directly use the network to predict the scores and probabilities; we do not select a cutoff value and divide the galaxies into discrete categories of Z-Spirals, S-Spirals and Non-Spirals before training. Unlike standard ML datasets such as MNIST (LeCun et al. 1998), where it is obvious whether one image is a handwritten number 8 or not, many galaxy images are not clear enough for one to assert that they belong to a certain morphology category. Although galaxies with larger $p_{z/s}$ are more likely to be real Z/S-spirals, there is no simple cutoff value such that all the galaxies with $p_{z/s}$ above it are Z/S-spirals while all the galaxies below it are not: as shown in Figure 9, the images just change continuously with respect to $p_{z/s}$. The value of $p_{z/s}$ in the GZ1 catalog is indeed a stochastic variable: assuming that the vote on one single galaxy from each person follows some i.i.d. distribution based on its morphology and image quality, the fraction of total votes for each category does asymptotically converge with infinite vote size, but will always have finite noise when the vote size is limited.
If one pre-divides the galaxies into discrete categories, there are always mislabelled galaxies around the cutoff probability, which will be confusing to the classifier. Therefore, we stick with estimating the scores directly, and leave the interpretation of the output probabilities to the user.
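The flip-and-share construction in item (3) can be illustrated with a small framework-agnostic sketch. It is written in NumPy rather than PyTorch, and the "backbone" here is a hypothetical toy stand-in for the shared ResNet-50 trunk, so only the wrapper logic (two passes, averaged Non-Spiral score, softmax) mirrors the description above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def make_toy_backbone(seed=0, shape=(3, 8, 8)):
    """Hypothetical stand-in for the shared network: image -> (spiral, non-spiral) scores."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(2,) + shape)
    return lambda img: np.tanh((w * img).sum(axis=(1, 2, 3)))

def ce_predict(backbone, img):
    """Return (p_z, p_s, p_non) for one image of shape (C, H, W)."""
    z_score, non_a = backbone(img)               # original image: Z vs Non-Spiral
    s_score, non_b = backbone(img[:, :, ::-1])   # mirrored image: S vs Non-Spiral
    scores = np.array([z_score, s_score, 0.5 * (non_a + non_b)])
    return softmax(scores)

backbone = make_toy_backbone()
img = np.random.default_rng(1).normal(size=(3, 8, 8))
p = ce_predict(backbone, img)
q = ce_predict(backbone, img[:, :, ::-1])
# Mirroring the input swaps the Z and S probabilities and leaves p_non unchanged.
print(np.allclose([p[0], p[1], p[2]], [q[1], q[0], q[2]]))
```

Because the same weights score both the image and its mirror, the swap holds for any backbone, trained or not; this is what makes the symmetry manifest rather than learned.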

TRAINING AND RESULTS
The inputs of our model are 3-channel RGB images of shape 3 × 160 × 160, with the target galaxy centered and the image FOV equal to five times the galaxy half light diameter. Since the chirality of a spiral galaxy should have no dependence on its location, orientation, size and color, we apply the following data augmentations during training, which are illustrated in Figure 2:
1. The relative location of the galaxy is moved by up to 25% of its half light radius along both directions, with the exact translation randomly sampled from the uniform distribution $U(-0.25\,r_{50}, 0.25\,r_{50})$.
2. The galaxy is rotated by a random angle, sampled from the uniform distribution $U(0^\circ, 360^\circ)$.
3. The exact FOV of training images is sampled from a uniform distribution between four times and six times the galaxy half light diameter.
4. We apply a random permutation to the three color channels, to mitigate the potential overfitting to the correlation between galaxy color and morphology (e.g. Bamford et al. 2009). This will also improve the model generalizability to images from other surveys, with possibly different filters, instrumental responses and color scales.
Figure 4. We check the stability of our model under a translation, rotation, zoom and permutation of color channels. The grey bands indicate the data augmentation used in the training of our network. Our model is completely insensitive to the rotation and color channel permutation of input images. Its performance does degrade when the images are translated by more than 0.7 galaxy radii or when the FOV is smaller than 3 galaxy diameters, probably because some useful information is cut out from the images. On the other hand, using a larger FOV has a smaller impact on the classification results.
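The four augmentations listed above amount to drawing a handful of random parameters per training image. The sketch below only samples those parameters; the helper name and the dictionary keys are hypothetical, and how the parameters are applied to the pixels (resampling, interpolation) is left out because the text does not describe it.

```python
import random

def sample_augmentation(r50, rng=random):
    """Draw one set of augmentation parameters for a galaxy with half light radius r50."""
    channels = [0, 1, 2]
    rng.shuffle(channels)                                   # random color channel permutation
    return {
        "shift_x": rng.uniform(-0.25 * r50, 0.25 * r50),    # translation along x
        "shift_y": rng.uniform(-0.25 * r50, 0.25 * r50),    # translation along y
        "angle_deg": rng.uniform(0.0, 360.0),               # rotation angle
        "fov": rng.uniform(4.0, 6.0) * 2.0 * r50,           # 4 to 6 half light diameters
        "channel_order": channels,
    }

params = sample_augmentation(r50=2.0, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the draws reproducible, which is convenient when comparing runs.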
Here we stick with 3-channel images and do not average the different bands or use only the band with the best signal-to-noise ratio: although the color of a galaxy should contain no chirality information, it is possible that e.g. the spiral arms where more new stars are forming look bluer than the region between the arms, so that the spiral structure is less clear in single-channel images compared with the full 3-channel images. We train our CE-ResNet model for 120 epochs using the cross entropy loss and the AdamW optimizer (Loshchilov & Hutter 2019), which takes about one day on one NVIDIA V100 GPU. We set the weight decay coefficient to 1 and use an initial learning rate of 0.0001, and after every 5 epochs we reduce the learning rate by 15%. The training and validation losses are shown in Figure 3. Sample images of different $p_{z/s,\mathrm{pred}}$ are shown in Figure 10, which are similar to those from the original GZ1 classification (Figure 9). The total number of spiral galaxies is also close to the original human classification (Table 1).
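The learning rate schedule quoted above can be written as a one-line step function. This sketch assumes that the reduction is multiplicative, i.e. the rate is multiplied by 0.85 after every 5 epochs; in PyTorch the same behaviour would correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=5` and `gamma=0.85`, although the source does not state which scheduler was used.

```python
def learning_rate(epoch, initial_lr=1e-4, drop=0.15, every=5):
    """Step schedule: multiply the rate by (1 - drop) after every `every` epochs."""
    return initial_lr * (1.0 - drop) ** (epoch // every)

# Rates at the start of training, after the first drop, and in the last epoch.
schedule = [learning_rate(e) for e in (0, 5, 119)]
```

Over 120 epochs this applies 23 drops, so the final rate is between one and two orders of magnitude below the initial one.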
We test the robustness of our model under the data augmentation transforms in Figure 4, with only one type of transform activated each time. Note that in the first panel, we apply a stochastic transform similar to the one used during training, with the maximum translation indicated by the horizontal axis. For the other three panels, the same deterministic transform is applied to all the images in each case. The loss function is almost constant within the extent of training data augmentation indicated by the grey bands, meaning that our model is indeed insensitive to these transforms. Beyond the grey bands, the loss function does increase when a large translation or a small FOV is used, probably because parts of the galaxies are cut out from the images. This implies that when the model is applied to another catalog without reliable galaxy radius measurements, one should consider setting the image FOV slightly larger rather than smaller, to make sure that the whole galaxy is contained in the image.
Figure 6. The quantiles of $p_m \equiv \max(p_z, p_s)$ prediction errors for galaxies within different bins of redshift $z$, half light radius $r_{50}$, r band magnitude $m_r$, color $m_g - m_r$, aspect ratio $b/a$, and orientation $\varphi$, with the corresponding number counts of galaxies in each bin. The legend shows the four bins for $p_m$. We note that for the same $p_m$ bin, the $p_m$ prediction is roughly unbiased with the variance independent of the six parameters, except that the model slightly underestimates $p_m$ for the galaxies that are distant, visually small, dim, and have a small aspect ratio.
In Figure 5, we show the two-dimensional histograms of the true vs predicted GZ1 vote probabilities for the Z-wise and S-wise spirals, which are roughly diagonal with a small dispersion of 0.1. Is this a good result? Since our training data is from the volunteer vote fractions, the dispersion cannot be smaller than the Poisson noise in the data. Assuming that the votes follow an i.i.d. binomial distribution, the empirical vote fraction will have a standard error of $\Delta p = \sqrt{p(1-p)/n}$, which equals 0.073 if $p = 0.8$ and $n = 30$. Therefore, the performance of our model is already close to the best allowed by the training data. We also note an excess of galaxies in the $0 < p_{z/s,\mathrm{true}} < 0.1$, $0.1 < p_{z/s,\mathrm{pred}} < 0.3$ bins, due to the relatively large number of galaxies with $0 < p_{z/s,\mathrm{true}} < 0.1$. However, such small $p_{z/s}$ galaxies are rarely relevant to practical applications, since their spin directions are mostly undetermined.
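As a quick check of the noise floor quoted above, the standard error of a binomial vote fraction, $\sqrt{p(1-p)/n}$, can be evaluated directly. The function name is illustrative only.

```python
import math

def vote_fraction_std_error(p, n):
    """Standard error of an empirical fraction from n i.i.d. binomial votes."""
    return math.sqrt(p * (1.0 - p) / n)

print(round(vote_fraction_std_error(0.8, 30), 3))  # 0.073
```

With $p = 0.8$ and $n = 30$ votes this reproduces the value of 0.073, which is comparable to the dispersion of 0.1 seen in Figure 5.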
We check whether the classification accuracy systematically depends on certain galaxy parameters in Figure 6. All galaxies in the test dataset are binned according to their redshift $z$, half light radius $r_{50}$, r band magnitude $m_r$, color $m_g - m_r$, aspect ratio $b/a$, orientation $\varphi$, as well as the true $p_m \equiv \max(p_z, p_s)$. We plot the error of the $p_m$ prediction and the number of galaxies for each bin. Generally, the error is the smallest for $p_m \in [0.85, 1.0]$, since the Poisson noise follows $\Delta p = \sqrt{p(1-p)/n}$ and decreases with increasing $p$ when $p > 0.5$. We find that our model is overall unbiased, except for the galaxies with large $z$, small $r_{50}$, large $m_r$ and small $b/a$, whose $p_m$ is slightly underestimated. This is unlikely to be an issue for cosmological applications, however, as the number of such galaxies is relatively small. Actually, such deviation will not complicate the cosmological analysis, since effectively our model just has a different selection function than the humans in GZ1, meaning that a slightly different population of galaxies has its spin directions determined by the classifier. In principle, one should take this into account to avoid making overconfident claims about our Universe, but the treatment of such selection effects should be similar, regardless of whether the galaxies are selected by humans or machines.

APPLICATION TO DESI IMAGES
Having trained our CE-ResNet model with SDSS images, we use the model to classify the same galaxies but with DESI images, which generally look redder than SDSS images as the z band is used instead of the i band for the red channel. It turns out that our model performs well on these DESI images: we find 31.8% more spiral galaxies (with $p_{\rm cut} = 0.7$) relative to the original human classification (Table 1), as the predicted $p_m$ increases for almost all kinds of galaxies except those already clearly classified with SDSS images (Figure 7). Comparing the sample images in Figures 10 and 11, the DESI images are obviously clearer than the SDSS images, enabling better classification of the galaxy morphology.
We also apply our model to the "Preliminary DESI" catalog, which indeed has a large overlap with the GZ1 catalog, as SDSS contributes most of the current galaxy spectra data before DESI spectra become available. We find that 150,283 of the 173,097 galaxies in the Reduced GZ1 catalog can be matched to one galaxy within 1" in the Preliminary DESI catalog. However, these galaxies have slightly different location and radius measurements in DESI Legacy Surveys, which are required to determine the image cut for our model input. We thus use the "Preliminary DESI" catalog to validate the accuracy of our model with DESI photometric measurements, since the future DESI spectra catalog will have the galaxy location and radius measured with the same pipeline.
We compare the predicted $p_m$ for the galaxies matched between the Reduced GZ1 and Preliminary DESI catalogs. In principle, $p_{m,\mathrm{pred}}$ should be close between these two cases, since we are classifying the same galaxies using the same imaging survey but just with different image cuts. As shown in Table 1, the total numbers of Z-Spirals and S-Spirals are similar to those of the Reduced GZ1 catalog with DESI images, with slightly fewer galaxies if one chooses $p_{\rm cut} = 0.9$ but more galaxies if one chooses $p_{\rm cut} \in [0.5, 0.7]$. We find that $\Delta p_{m,\mathrm{pred}}$ has a strong correlation with the difference in the measured galaxy radius: most of the galaxies have their radii measured larger by about 30% in the DESI catalog, which however only leads to a small impact on the predicted $p_m$ (at the level of 0.05) since our model is insensitive to a reasonable amount of change in the image FOV (Figure 4). On the other hand, some large, dim galaxies have a 100% difference in radius measurements, which leads to a larger dispersion in $\Delta p_{m,\mathrm{pred}}$. This should only have limited effects on cosmological applications of our model, however, as the number of such galaxies is relatively small according to Figure 6. In practice, one may further improve the performance by applying an empirical correction for the galaxy radii to compensate for the difference between surveys, based on e.g. the general trend of $r_{\rm DESI}/r_{\rm SDSS}$ in Figure 8; we leave this for future research.
[Figure 8, caption partially recovered: ... Figure 6, but comparing $p_{m,\mathrm{pred}}$ and the measured half light radius, for all the galaxies in the Reduced GZ1 catalog that have one within-1" match in the Preliminary DESI catalog. The predicted $p_m$ is close for most galaxies despite the difference in the radius measurement, except for the large, red galaxies whose radius measurement differs by about 100%.]

CHIRALITY VIOLATION DUE TO HUMAN BIAS
There is a known bias towards S-wise spirals in the GZ1 classification catalog, which has been attributed to human selection effects (Hayes et al. 2017). We check the symmetry between Z-Spirals and S-Spirals using the classification catalogs in Table 1. We quantify the significance of chirality violation with a statistic that should asymptotically follow the standard Gaussian distribution under the null hypothesis of no chirality violation; see Appendix B for the derivation. Note that our Reduced GZ1 catalog contains all the $p_m > 0.3$ galaxies in the full GZ1 catalog, while the $p_m \leq 0.3$ galaxies are downsampled. Similar to Hayes et al. (2017), we find a ∼ 7σ asymmetry in the GZ1 Humans classification catalog, which disappears if the same SDSS images are classified by the parity-even CE-ResNet, implying that the asymmetry is due to a slight underestimate of $p_z$ relative to $p_s$ by GZ1 Humans. However, when DESI instead of SDSS images are used for the Reduced GZ1 galaxies, there are again slightly more S-wise than Z-wise galaxies with $p_{\rm cut} = 0.5$. This is likely because some of the $p_z \sim 0.5$ (with DESI images) galaxies are cut out from the Reduced GZ1 catalog since they may have $p_z < 0.3$ by GZ1 Humans. When applied to the Preliminary DESI catalog, our CE-ResNet finds statistically consistent numbers (< 1.8σ asymmetry) of Z-Spirals and S-Spirals for all different choices of $p_{\rm cut}$, confirming that no real chirality violation between these two types of spirals exists in Nature.
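The significance statistic itself is derived in Appendix B and is not reproduced in this section. As an illustration only, the sketch below uses a simple count-based statistic, $(N_S - N_Z)/\sqrt{N_S + N_Z}$, which is an assumption made here and not necessarily the authors' definition. If each spiral is independently Z-wise or S-wise with probability 1/2, this quantity has zero mean and unit variance and is asymptotically standard Gaussian. The counts in the example are hypothetical and are not taken from Table 1.

```python
import math

def chirality_significance(n_z, n_s):
    """Assumed statistic: (N_S - N_Z) / sqrt(N_S + N_Z).

    Positive values indicate an excess of S-wise spirals, in units of sigma.
    """
    return (n_s - n_z) / math.sqrt(n_s + n_z)

# Hypothetical counts, for illustration only.
significance = chirality_significance(n_z=10_000, n_s=10_300)
```

The denominator follows from the binomial variance: with $N = N_Z + N_S$ spirals and equal probabilities, $N_S - N_Z$ has variance $N$.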

DISCUSSIONS
In this paper, we present Chirality Equivariant Residual Network (CE-ResNet), a machine learning based model for the classification of Z-wise vs S-wise spirals. Trained with Galaxy Zoo 1 (GZ1) data, our model gives similar predictions on the chirality of galaxies as the volunteers in the GZ1 project, but can be efficiently applied to the millions or even billions of galaxies that will be mapped in the near future. Our model is manifestly parity-even, since basically the same estimator is used to predict the probabilities of Z-Spirals and S-Spirals, using the trick that one gets an S-Spiral if one flips the image of a Z-Spiral. We validate our model in Section 4, and verify in Section 5 that our model can be directly applied to DESI images even though it is trained with SDSS images. We confirm in Section 6 that no real asymmetry between the numbers of Z-wise and S-wise spirals exists.
We note a few related works in the literature. Hayes et al. (2017) develop an unbiased selector to demonstrate the origin of the excess of S-Spirals in the GZ1 catalog. However, their Unbiased Machine spirality selector finds significantly fewer spirals than GZ1 Humans, such that it cannot maximally extract the information in the survey data for cosmological analysis. Recently, Tadaki et al. (2020) study the classification of spiral galaxies with CNN. However, their dataset only includes galaxies that are unambiguously Z-spirals, S-spirals and non-spirals, whereas real world survey catalogs also include galaxies with unclear morphological type due to the limitation of image quality. Their model predictions on these unclear galaxies can be undefined as the network has never seen such galaxies during training, making it risky to apply directly to full survey catalogs. Also, the classifier in Tadaki et al. (2020) is not manifestly parity-even, so one should be cautious about the possible inherent asymmetry between Z-Spirals and S-Spirals in their model.
The network used in this paper is equivariant under a reflection of the input image, which eliminates the potential bias caused by the difference between the Z- and S-type estimators. Additionally, we augment the dataset by a random translation, rotation, scaling and permutation of color channels, to mitigate potential overfitting onto the irrelevant position, orientation, size and color information. According to Figure 4, our network is indeed stable under such transforms. We note that in principle, one can implement a more advanced equivariant network that manifestly accounts for all the relevant symmetries (e.g. Zhang 2019; Weiler & Cesa 2019; Sosnovik et al. 2019; Cesa et al. 2022). Also, domain adaptation techniques may help to improve the performance when the model needs to be applied to data from different surveys (Ben-David et al. 2010; Alexander et al. 2021). We leave these directions for future research, since our current architecture already works well in the various benchmarks demonstrated in this paper.
Computations were performed on the Mist supercomputer at the SciNet HPC Consortium and the SOSCIP Consortium's GPU computing platform. SciNet is funded by: the Canada Foundation for Innovation; the Government of Ontario; Ontario Research Fund - Research Excellence; and the University of Toronto (Loken et al. 2010