Abstract

The distribution of the spin directions of spiral galaxies in the Sloan Digital Sky Survey has been a topic of debate in the past two decades, with conflicting conclusions reported even in cases where the same data were used. Here, we follow one of the previous experiments by applying the SpArcFiRe algorithm to annotate the spin directions in an original dataset of Galaxy Zoo 1. The spin directions are annotated after a first step of selecting the spiral galaxies in three different manners: by the manual classifications of Galaxy Zoo volunteers, by a model-driven computer analysis, and with no selection of spiral galaxies. The results show that when spiral galaxies are selected by Galaxy Zoo volunteers, the distribution of their spin directions as determined by SpArcFiRe is not random, which agrees with previous reports. When selecting the spiral galaxies using a model-driven computer analysis or without selecting the spiral galaxies at all, the distribution is also not random. Simple binomial distribution analysis shows that the probability of the parity violation occurring by chance is lower than 0.01. Fitting the spin directions as observed from the Earth to cosine dependence exhibits a dipole axis with statistical strength of 2.33σ to 3.97σ. Regardless of the selection mechanism and the analysis method, all experiments lead to similar conclusions. These results are aligned with previous reports using other methods and telescopes, suggesting that the spin directions of spiral galaxies as observed from the Earth form a dipole axis. Possible explanations can be related to the large-scale structure of the universe or to the internal structure of galaxies. The catalogs of annotated galaxies generated as part of this study are available.

1. Introduction

The distribution of spin directions of spiral galaxies has been a topic of discussion for several decades, with several studies showing conflicting results regarding the same data. In particular, several different studies have shown conflicting results when using galaxies from SDSS and specifically the set of SDSS galaxies with spectra. Namely, some studies suggest that the distribution of the spin direction of SDSS spiral galaxies with spectra is random, while other experiments showed statistically significant asymmetry in the same data. If the distribution of galaxy spin directions does not conform with the parity assumption, the parity violation can be exhibited through a cosmological-scale dipole axis formed by the large-scale distribution of galaxy spin directions.

Several previous studies showed a nonrandom distribution of the spin directions of spiral galaxies in SDSS and suggested that the distribution forms a large-scale dipole axis [1–4]. In fact, claims for nonrandom distribution of galaxy spin directions were reported nearly two decades before SDSS saw its first light [5]. The nonrandomness also showed a statistically significant number of galaxies spinning in opposite directions in opposite hemispheres [3, 4, 6]. That is, the SDSS galaxies can be separated into two hemispheres such that one hemisphere has a higher number of galaxies spinning clockwise, while the opposite hemisphere has a higher number of galaxies spinning counterclockwise, and the differences are statistically significant [3, 4, 6]. In particular, among all SDSS galaxies with spectra, there is a higher number of galaxies spinning counterclockwise compared to the number of galaxies spinning clockwise [1, 3, 4]. Other possibly related studies used a smaller number of galaxies to show alignment in the spin directions of SDSS galaxies that are too far from each other to interact gravitationally [7–9]. These observations were defined as “mysterious,” suggesting that the large-scale structure is linked through galaxy spin directions [8].

In addition to SDSS, other telescopes also showed parity violations in the distribution of the spin direction of spiral galaxies. These include Pan-STARRS [4], DECam [6], Hubble Space Telescope [10], the Dark Energy Survey [11], and the DESI Legacy Survey [12]. These telescopes cover both the northern and southern hemispheres, and the Hubble Space Telescope provides analysis that is not subject to a possible effect of the atmosphere. On the other hand, other experiments showed no statistically significant difference between the number of clockwise and counterclockwise galaxies [13] and suggested that the distribution of spin directions in SDSS is random [14–16].

However, it is difficult to fully prove the absence of nonrandom distribution. Experiments that showed randomness in the distribution of galaxy spin directions do not necessarily prove that the distribution is indeed random, but merely that the signal is not statistically significant. Reasons can include datasets that are not sufficiently large, or biases whose correction requires modifying the data. These reasons are summarized in Section 2, and a more detailed discussion can be found in [11, 17, 18].

While the assumption that the universe is isotropic is part of the cosmological principle, several different probes have shown nonrandom distribution at cosmological scales [19]. These probes include the CMB radiation [20–28], as well as other probes such as LX-T scaling [29], cosmic rays [30], short gamma ray bursts [31], Type Ia supernovae [32, 33], dark energy [34–37], quasars [38–40], [41, 42], and galaxy shapes [43]. It has also been shown that the large-scale structure might have “handedness,” exhibited by asymmetry of the four-point correlation function such that each point is a galaxy [44, 45].

If these observations reflect the real large-scale structure of the universe, they deviate from the cosmological principle and the standard cosmological models [19] and can be related to several alternative theories. These include the ellipsoidal universe [46–49], rotating universe [50–56], or black hole cosmology [57–64].

As discussed in [65, 66], the observed anisotropy in the distribution of galaxy spin directions might also be driven by the internal structure of galaxies rather than the large-scale structure of the universe. In that case, the rotational velocity of the Milky Way relative to the rotational velocity of the observed galaxies would exhibit parity violation, forming an axis that is expected to peak at around the galactic pole [65–68]. More information about the possible link between the anisotropy in galaxy spin directions and the internal structure of galaxies is provided in [66].

2. Summary of Previous Work on Asymmetry in the Spin Direction Distribution of SDSS Galaxies

Early analyses included a small number of galaxies, suggesting the possibility of a nonrandom distribution of the spin directions of spiral galaxies [5]. Analysis with a higher number of more than 6 galaxies from the southern hemisphere showed that the distribution is random [13]. As explained in [17], given the expected magnitude of the parity violation, the number of galaxies used in that study was too small to show a statistically significant parity violation. On the other hand, analysis of a larger number of galaxies from SDSS showed evidence of parity violation that forms a statistically significant dipole axis [1, 69]. In [1], galaxies annotated by five undergraduate students were used to show nonrandomness and a dipole axis in galaxy spin directions with a statistical significance of .

Another attempt to profile the large-scale distribution of galaxy spin directions was Galaxy Zoo 1, where SDSS galaxy images were annotated manually by anonymous volunteers through a web-based user interface [14]. The results showed that according to the manual annotation, galaxies that spin counterclockwise are far more prevalent in SDSS compared to galaxies that spin clockwise. That large difference of 15% was assumed to be the result of bias of the human perception or the user interface, rather than a reflection of the real distribution of spiral galaxies in the sky [14].

When the bias was noticed, a smaller set of galaxies was annotated again, but in that experiment, the galaxies were also annotated after mirroring the images. Annotating both the original images and the mirrored images ensured that the annotation bias of the original images was offset by the annotation of the mirrored images. That experiment showed that indeed the large difference was driven by a certain bias in the annotation. After mirroring the images, 6.032% of the galaxies were annotated as spinning counterclockwise, compared to 5.942% of the mirrored galaxy images that were annotated as counterclockwise. Similarly, 5.525% of the original galaxy images were annotated as spinning clockwise, compared to 5.646% of the mirrored galaxy images that were annotated as spinning clockwise. These numbers are specified in Table 2 in [14].

In both cases, the number of galaxies spinning counterclockwise was 1.5% or 2% higher than the number of galaxies spinning clockwise. That difference agrees in both direction and magnitude with the asymmetry reported in [4], which also used SDSS galaxies with spectra. Because just a small number of the galaxies were mirrored, the dataset contained just galaxies. The binomial statistical significance of the distribution was (0.13) when the clockwise galaxies were mirrored, and (0.21) when the counterclockwise galaxies were mirrored. The numbers of galaxies are shown in Table 1. These probabilities are not considered statistically significant, possibly because of the low number of galaxies, but the direction and magnitude of the asymmetry also do not conflict with the observed distribution of SDSS galaxies with spectra reported in [4].
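The binomial significance used above can be illustrated with a minimal sketch. It assumes that each galaxy has an equal probability of spinning clockwise or counterclockwise under the null hypothesis and computes the one-tailed probability of observing an excess at least as large as the one measured. The function names are hypothetical and are not taken from the code released with this study.

```python
import math


def binomial_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), via lgamma to avoid overflow
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def spin_asymmetry_p_value(n_cw, n_ccw):
    """One-tailed probability of an excess at least this large by chance."""
    n = n_cw + n_ccw
    return binomial_tail(n, max(n_cw, n_ccw))
```

A call such as `spin_asymmetry_p_value(n_cw, n_ccw)` with the counts of one row of Table 1 would give the corresponding one-tailed probability. A two-tailed test would double that value.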

Another study proposed that the nonrandom distribution of galaxy spin directions in SDSS is the result of “duplicate objects” in the data [16]. That study, however, does not refer to a specific paper that both claimed the presence of a dipole axis formed by the distribution of galaxy spin directions and had duplicate objects in the data. Also, a simple analysis showed that the “clean” data used in [16] is in fact not random [17]. Code and data to reproduce the analysis are available at https://people.cs.ksu.edu/~lshamir/data/iye_et_al. The data used in that analysis contain no “duplicate objects” and show that even after all duplicate objects are removed the distribution is not random. The statistical strength of a dipole axis in that specific dataset is and therefore agrees with previous experiments that showed a dipole axis exhibited by the large-scale distribution of galaxy spin directions. More information about experiments and analysis of that dataset is provided in [17].

Another analysis that examined the spin directions of galaxies with spectra in SDSS used the SpArcFiRe method to annotate a large number of SDSS galaxies [15]. That dataset included the original Galaxy Zoo 1 galaxies [70]. SpArcFiRe is a method that works best when applied to spiral galaxies, and therefore a first step of selecting spiral galaxies was applied before the galaxies were annotated by their spin direction using SpArcFiRe. The selection of spiral galaxies was done by two different methods. The first method was based on the manual annotation of the Galaxy Zoo volunteers, who annotated each galaxy as elliptical or spiral. After selecting the galaxies annotated as spiral and applying SpArcFiRe to identify their spin directions, the asymmetry between the number of clockwise and counterclockwise galaxies was statistically significant and ranged between 2σ and 3σ [15]. That led to the conclusion that the selection of spiral galaxies by Galaxy Zoo volunteers was biased in the sense that a galaxy that spins counterclockwise had a better chance of being labeled as spiral compared to a galaxy spinning clockwise. That was a new bias that was not noticed in the initial study of spin direction distribution in Galaxy Zoo [14].

To avoid the effect of a possible bias in the human perception, another analysis was performed by selecting the spiral galaxies by applying a machine learning classifier. The two-way classifier was trained with elliptical and spiral galaxies, and the class of spiral galaxies contained an equal number of clockwise and counterclockwise galaxies. That is, the number of galaxies spinning clockwise in the training set was exactly the same as the number of galaxies spinning counterclockwise. The equal number of spin directions in the class of spiral galaxies ensured that no certain spin direction would have a preference over the other spin direction in the selection of spiral galaxies. Clearly, that is a sound experimental design that ensured that no bias can result from the arbitrary selection of a small set of galaxies in the training set.

But in addition to that careful design of the machine learning system, the machine learning algorithm was applied after identifying and removing the features that can identify to a certain level the spin direction of the galaxy. As stated in the paper, “we choose our attributes to include some photometric attributes that were disjoint with those that Shamir (2016) found to be correlated with chirality, in addition to several SPARCFIRE outputs with all chirality information removed” [15]. While that implementation decision led to a random distribution of the annotated spin directions, it is not clear whether the selection of the spiral galaxies was biased due to the removal of features that correlate with the spin direction, as it is expected that the removal of these features would lead to an even distribution of the annotations [17]. That is, the differences between galaxies with opposite spin directions could also have an astronomical origin, rather than result from a bias in the algorithm. As shown in [17], removing features that correlate with the spin direction can reduce the signal of the asymmetry.

Here, we perform a similar experiment and use the SpArcFiRe (SPiral ARC FInder and REporter) method to annotate the same set of Galaxy Zoo 1 galaxies, but by selecting spiral galaxies in three different manners: by manual analysis of Galaxy Zoo volunteers, by computer analysis, and with no selection of spiral galaxies at all. The selection of spiral galaxies is performed with no a-priori assumptions regarding their expected distribution. The data and code are made publicly available to allow reproduction of the results and further related experiments. The possible scientific implications and the agreement of the observation with several other recent studies that make use of other probes [19, 27, 41, 42, 44, 45] are discussed in Section 5.

3. Data

The galaxies used in this study are SDSS galaxies used in Galaxy Zoo 1 [70]. Images of 666,416 galaxies were downloaded in the JPEG file format using the SDSS cutout service and were converted to PNG for applying the SpArcFiRe (Scalable Automated Detection of Spiral Galaxy Arm Segments) method [15, 71]. The source code of SpArcFiRe is publicly available (https://github.com/waynebhayes/SpArcFiRe). Annotation of a single 128 × 128 galaxy image requires about 30 seconds of processing time when using a single Intel Core-i7 processor, and therefore the analysis was carried out by using 100 cores to reduce the total processing time. Figure 1 shows the distribution of the data into RA bins. As the figure shows, the distribution of the galaxies in the sky is not uniform. That makes the analysis somewhat limited as the footprint of galaxies with spectra is practically smaller than what SDSS can provide, but the dataset is used for the sake of consistency and comparison with the previous studies that also used Galaxy Zoo 1 galaxies or SDSS galaxies with spectra.

SpArcFiRe provides a detailed list of descriptors for each galaxy [71]. The method identifies arm segments in the image and can group the pixels that are part of each segment. That allows fitting the pixels in each segment to a logarithmic spiral arc, from which different descriptors can be extracted. For the spin direction, SpArcFiRe extracts several indicators, which are the longest arc, the majority of the arcs, the length weighted, and the pitch angle sum. The way the galaxy images are analyzed is explained thoroughly in [71].

For the analysis, we used only galaxies for which all four spin direction indicators provided by SpArcFiRe showed the same spin direction. That provided a set of 273,055 galaxies with an annotated spin direction. The SpArcFiRe method was then applied again after mirroring the galaxy images, providing a set of 273,346 galaxies. After removing objects that were within 0.01 degrees or less of each other, the datasets were reduced to 271,063 and 271,308 galaxies, respectively. The slight difference between the results after mirroring the images is mentioned in [15] and will be discussed later in this paper.
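The requirement that all four indicators agree can be sketched as follows. The dictionary keys and the "cw"/"ccw" labels are hypothetical placeholders chosen for illustration and do not reflect the actual column names or values in the SpArcFiRe output.

```python
# Hypothetical names for the four SpArcFiRe spin direction indicators.
INDICATORS = ("longest_arc", "majority_of_arcs",
              "length_weighted", "pitch_angle_sum")


def consensus_spin(row):
    """Return 'cw' or 'ccw' if all four indicators agree, otherwise None."""
    votes = {row.get(name) for name in INDICATORS}
    if len(votes) == 1:
        (vote,) = votes
        if vote in ("cw", "ccw"):
            return vote
    # Disagreement, a missing indicator, or an unrecognized label.
    return None
```

Galaxies for which `consensus_spin` returns `None` would simply be left out of the dataset, which trades dataset size for more consistent annotations.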

To test the consistency of SpArcFiRe, we examined manually 322 galaxies and tested whether the annotation made by SpArcFiRe is in agreement with manual annotation. For that purpose, we identified manually 173 galaxies that by visual inspection seem to spin clockwise, and 149 galaxies that spin counterclockwise. From the clockwise galaxies, 122 galaxies were identified correctly as galaxies that spin clockwise and 26 (15.02%) as galaxies that spin counterclockwise. The rest of the galaxies were not annotated with a spin direction. Among the galaxies that were visually identified as galaxies spinning counterclockwise, 109 were also annotated by SpArcFiRe as counterclockwise and 24 (16.1%) as clockwise. The impact of the error will be discussed and analyzed in Section 4.

Since SpArcFiRe is designed to analyze spiral galaxies, we performed a selection of just spiral galaxies in three manners: the first was selecting spiral galaxies that were annotated as spirals by the manual inspection of the Galaxy Zoo 1 volunteers [70]. In Galaxy Zoo, each galaxy was annotated by several different annotators, who very often disagree with each other. To determine the annotation of a galaxy, a threshold is determined for the agreement between the different annotations. When the threshold is higher, the annotations are expected to be more accurate, but that also reduces the size of the dataset since fewer galaxies meet the higher threshold [70]. Following the study of [15], several experiments were made by selecting several different “debiased” thresholds.
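The trade-off between the agreement threshold and the dataset size can be illustrated with a short sketch. The field name `spiral_fraction` and the sample values are made up for illustration and are not Galaxy Zoo data.

```python
def select_spirals(galaxies, threshold):
    """Keep galaxies whose spiral vote fraction meets the agreement threshold."""
    return [g for g in galaxies if g["spiral_fraction"] >= threshold]


# Made-up vote fractions, used only to show how the selection shrinks.
sample = [
    {"id": "a", "spiral_fraction": 0.97},
    {"id": "b", "spiral_fraction": 0.82},
    {"id": "c", "spiral_fraction": 0.55},
    {"id": "d", "spiral_fraction": 0.30},
]

# Number of selected galaxies at each agreement threshold.
sizes = {t: len(select_spirals(sample, t)) for t in (0.40, 0.80, 0.95)}
```

With this toy sample, raising the threshold from 0.40 to 0.95 reduces the selection from three galaxies to one, mirroring the behavior described above.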

Since the human selection of spiral galaxies can be biased, another method of selecting spiral galaxies was based on computer analysis. That was carried out by using the Ganalyzer method [72]. As a model-driven method, it is not based on any kind of machine learning, and therefore it is not subject to possible biases in the training data. The simple “mechanical” nature of the Ganalyzer allows it to be fully symmetric [6, 11].

In addition to the manual selection and computer selection of spiral galaxies, another experiment was performed by using all galaxies whose spin direction was determined by SpArcFiRe, without a first step of selecting spiral galaxies. While the annotation of galaxies that are elliptical can add noise to the system, it might be expected that the error in the annotation will be distributed equally between clockwise and counterclockwise galaxies. SpArcFiRe also does not force a certain spin direction and can also annotate galaxies as not spinning in any identifiable direction. The list of galaxies and their annotations as assigned by SpArcFiRe is available at https://people.cs.ksu.edu/~lshamir/data/sparcfire/.

4. Results

The first experiment was a simple test of the distribution of spin directions in the entire dataset, without any selection of spiral galaxies before applying SpArcFiRe. Another experiment was performed by selecting spiral galaxies using different levels of agreement between the Galaxy Zoo annotations as thresholds and then applying SpArcFiRe to annotate their spin direction. The agreement levels were between 40% and 95%. For instance, using 95% as the agreement threshold means that only galaxies annotated as spiral by at least 95% of the human annotators were selected. That also includes the “clean” Galaxy Zoo standard of 80%, and the “superclean” [70] standard of 95%. Another method of selecting the spiral galaxies was by using the Ganalyzer model-driven algorithm for classifying between elliptical and spiral galaxies as described in Section 3. The distributions of the spin directions in the entire dataset are shown in Table 1. Table 2 shows comparisons to previous studies using SDSS galaxies with spectra.

As the table shows, all experiments show a higher number of galaxies spinning counterclockwise than clockwise. When mirroring the galaxy images, SpArcFiRe shows a higher number of clockwise galaxies, which are in fact galaxies spinning counterclockwise in the original and nonmirrored galaxy images. When mirroring the images, the results are not completely inverse to the results when using the original images. That is not surprising, since it has been reported that SpArcFiRe has a certain degree of asymmetry in the manner it annotates galaxy images. As explained in Appendix A of [15], SpArcFiRe is not fully symmetric, and therefore, the galaxies were annotated again after mirroring the images. The complexity of SpArcFiRe made it difficult to identify the reasons for the differences between the original and mirrored images [15].

The results are also compared to previous experiments that used SDSS galaxies with spectra and were based on different annotation methods. These experiments use symmetric automatic annotation [4] or manual annotation such that the galaxy images were mirrored [1, 14]. All of these experiments also show a higher number of galaxies spinning counterclockwise. This agreement does not necessarily prove that the observed asymmetry is not driven by a combination of bias and statistical fluctuations, but the results also do not conflict with each other.

The asymmetry shown when the spiral galaxies are selected manually by Galaxy Zoo volunteers is in agreement with the results shown in [15]. For the automatic selection of the galaxies, the results disagree with [15]. A possible reason for that disagreement is that the automatic selection of spiral galaxies performed in [15] was carried out by a machine learning algorithm designed by specifically removing the attributes that correlate with the galaxy spin direction. The spiral selection used for the results shown in Table 1 is carried out by a simple symmetric model-driven algorithm and therefore without selection or removal of specific attributes.

4.1. Identification of a Possible Dipole Axis Alignment

Previous work using different telescopes showed that the spin directions of spiral galaxies form a statistically significant large-scale axis [12]. That was carried out by fitting the spin directions to the cosine of the angle between the galaxies and every possible integer $(\alpha, \delta)$ combination in the sky [11], as shown in the following equation:

$$\chi^2_{\alpha,\delta} = \sum_i \frac{\left(d_i \cdot |\cos(\phi_i)| - \cos(\phi_i)\right)^2}{\cos(\phi_i)}, \qquad (1)$$

where $d_i$ is 1 if galaxy i spins clockwise or −1 if galaxy i spins counterclockwise, and $\phi_i$ is the angular distance between galaxy i and the location of the possible dipole axis $(\alpha, \delta)$.

The statistical significance of the possible dipole axis centered at $(\alpha, \delta)$ is determined by assigning the galaxies with random spin directions, and computing the $\chi^2$ using equation (1). That is carried out 1000 times, and the mean and $\sigma$ of the $\chi^2$ are determined. The difference between the mean $\chi^2$ computed with random spin directions and the $\chi^2$ computed with the observed spin directions, in units of $\sigma$, determines the statistical strength of the axis. Repeating that process for each possible integer $(\alpha, \delta)$ combination shows the statistical signal of a dipole axis at all parts of the sky. Figure 2 shows several examples of applying the analysis to the datasets in Table 1 that had the lowest values.

As the figure shows, the profiles exhibited by the different methods of selecting spiral galaxies are similar to each other. The automatic selection shows results in agreement with the manual selection of spiral galaxies, but the statistical significance is different. Table 3 shows the locations of the most likely dipole axis when using the different methods of spiral galaxy selection. The locations of the dipole axis are somewhat different across different datasets, but in all cases still within 1σ of each other.
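The randomization procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the statistic in `chi2_fit` is an assumed form of equation (1), namely the sum of (d·|cos φ| − cos φ)² / cos φ over all galaxies, and the function names, the input layout of (RA, Dec, spin) tuples, and the handling of near-zero cosines are choices made here rather than details of the code released with this study.

```python
import math
import random


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle angle in radians between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    c = (math.sin(dec1) * math.sin(dec2)
         + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


def chi2_fit(spins, cosines, eps=1e-12):
    """Assumed form of equation (1): sum of (d*|cos| - cos)^2 / cos."""
    total = 0.0
    for d, c in zip(spins, cosines):
        if abs(c) < eps:  # skip galaxies at 90 degrees from the axis
            continue
        total += (d * abs(c) - c) ** 2 / c
    return total


def axis_significance(galaxies, axis_ra, axis_dec, runs=1000, seed=0):
    """Strength, in sigma, of a dipole axis at (axis_ra, axis_dec).

    galaxies: list of (ra_deg, dec_deg, spin), spin 1 for cw and -1 for ccw.
    """
    rng = random.Random(seed)
    cosines = [math.cos(angular_distance(ra, dec, axis_ra, axis_dec))
               for ra, dec, _ in galaxies]
    observed = chi2_fit([d for _, _, d in galaxies], cosines)
    # chi2 values obtained when the spin directions are assigned at random
    randoms = [chi2_fit([rng.choice((1, -1)) for _ in galaxies], cosines)
               for _ in range(runs)]
    mean = sum(randoms) / runs
    std = math.sqrt(sum((x - mean) ** 2 for x in randoms) / (runs - 1))
    return (mean - observed) / std if std > 0 else 0.0
```

Scanning the sky would amount to calling `axis_significance` for every integer RA and declination pair, which is computationally heavy in pure Python and would in practice call for a vectorized implementation.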

An interesting observation is that the statistical significance of the axis is stronger when the galaxy images were mirrored. That agrees with the simple statistical significance shown in Table 1. That can be the result of a certain asymmetric behavior of the annotation algorithm in the case that the distribution of spin directions in the sky is random. If the distribution of galaxy spin directions is not random, the difference can be explained by a certain bias of the algorithm. That is discussed in detail later in this section.

The axes can also be compared to previous experiments with 77,840 SDSS galaxies [73]. That dataset contains galaxies that do not necessarily have spectra, but these galaxies are relatively large (Petrosian radius > 5.5′) and bright (i magnitude < 18). Figure 3 shows the results of previous experiments [73] when using SDSS galaxies that do not necessarily have spectra, as well as another dataset of 13,440 SDSS galaxies with spectra originally used in [74]. In the experiment of [73], the galaxies were annotated automatically by using a model-driven symmetric annotation method. The galaxies in [74] were annotated with manual inspection.

Other experiments that can be used for comparison include other telescopes. These include an experiment with 33,028 galaxies imaged by Pan-STARRS [4] and 807,898 galaxies imaged by DECam [6]. Figure 3 shows that in these experiments the results are similar to the results shown in Figure 2.

The experiment of [1] showed a dipole axis that peaks at . That location is also within 1σ statistical error of the most likely axes shown here, as specified in Table 3. It is also close to the location of the dipole axis observed with Pan-STARRS [4] at , and DECam [6] at .

When using the automatically selected spiral galaxies, the statistical significance of the dipole axis is maximal at around the declination of . Figure 4 displays the statistical significance at different RAs when the declination is set to . As the figure shows, the analysis with the mirrored galaxy images and the original galaxy images show similar profiles, but the statistical signal is significantly stronger when the mirrored galaxy images were used. That shows that the asymmetry of SpArcFiRe as reported in [15] can affect the statistical signal. That is also shown in Table 1.

Figure 5 shows a simple analysis of the simple asymmetry in different RA ranges when the declination range is to . The figure shows a higher number of galaxies spinning counterclockwise in the RA range of , and the asymmetry peaks at . In the other hemisphere there are more galaxies spinning clockwise, but due to the small total number of galaxies in that hemisphere, it is difficult to profile that asymmetry.

4.2. Analysis of Possible Algorithm Bias

One of the explanations for the results shown here is a bias in the SpArcFiRe annotation algorithm, and it has been reported that such subtle asymmetry exists [15]. For instance, if SpArcFiRe tends to annotate galaxies as spinning counterclockwise, such consistent bias can become statistically significant. It has been shown that even a subtle but consistent bias in the annotation algorithm can lead to a dipole axis with an extremely high statistical signal that peaks exactly at the celestial pole [73].

One of the experiments done to study the nature of the bias is repeating the experiments after mirroring the galaxy images. Based on the results shown here, if the algorithm is systematically biased to prefer a certain spin direction, it would have been a preference for galaxies that spin counterclockwise. That is, the results shown here can be explained by a bias in the algorithm, or by a bias in the selection of spiral galaxies.

Assuming an equal number of clockwise and counterclockwise galaxies in the sky and a bias b in the SpArcFiRe software, the number of clockwise galaxies will be lower than the number of galaxies annotated as counterclockwise, which is . The asymmetry A between the number of clockwise and counterclockwise galaxies can be defined as , where cw is the number of galaxies that spin clockwise, ccw is the number of galaxies that spin counterclockwise, and is the bias such that . When mirroring the galaxy images, A can be defined as . Assuming no asymmetry in the sky, cw is equal to ccw, and therefore is equal to , which is equal to . That, however, is not what is observed when mirroring the galaxy images. Mirroring the galaxy images provides different results and also flips the sign of the asymmetry as shown consistently in Table 1.

On the other hand, assuming that the real sky distribution of spin directions of spiral galaxies is not symmetric, . The asymmetry ratio between the number of clockwise and counterclockwise galaxies can be defined as . The asymmetry is positive when the number of galaxies is higher than the number of galaxies and negative if the number of galaxies is larger. The asymmetry of the original nonmirrored images can be defined as

A can be greater or smaller than 1, depending on the values of and . After mirroring the galaxies, the asymmetry of the mirrored galaxies is

If , a is greater than 1. If is positive will necessarily be greater than 1, which is the observation shown in Table 3 for the nonmirrored images. can be either greater or smaller than 1. If , will be greater than 1, otherwise will be smaller than 1. Since the observed is in all cases smaller than 1, the asymmetry of the algorithm is smaller than the asymmetry of the spin directions of the galaxies in the dataset. While this simple analysis is expected, it shows that if the drops from a number greater than 1 when using the original images to a number smaller than 1 when using the mirrored images, that change is not driven by the asymmetry of the algorithm.
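The argument can be illustrated numerically with a toy model. The sketch below assumes a multiplicative bias, in which the algorithm inflates the number of galaxies annotated as counterclockwise by a factor `bias`, and assumes that mirroring the images swaps the true clockwise and counterclockwise counts. Both the model and the numbers are assumptions made for illustration and are not the measured values of this study.

```python
def annotated_ratio(n_cw, n_ccw, bias, mirrored=False):
    """Expected ratio of ccw to cw annotations under a multiplicative bias.

    n_cw, n_ccw: true counts in the original (nonmirrored) images.
    bias: assumed factor by which the algorithm favors ccw annotations.
    """
    if mirrored:
        # Mirroring turns every cw galaxy into a ccw galaxy and vice versa.
        n_cw, n_ccw = n_ccw, n_cw
    return (bias * n_ccw) / n_cw


# Symmetric sky with a 1% algorithm bias: the excess of ccw annotations
# stays on the same side whether or not the images are mirrored.
symmetric = (annotated_ratio(100, 100, 1.01),
             annotated_ratio(100, 100, 1.01, mirrored=True))

# A 2% real ccw excess with the same 1% bias: the ratio drops below 1
# after mirroring, because the real asymmetry exceeds the bias.
asymmetric = (annotated_ratio(100, 102, 1.01),
              annotated_ratio(100, 102, 1.01, mirrored=True))
```

In this toy model the sign of the asymmetry flips after mirroring only when the real asymmetry is larger than the algorithm bias, which is the qualitative behavior that the analysis above relies on.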

Another possible reason for the observed results can be a bias in the selection of spiral galaxies. The selection of spiral galaxies is not necessarily a formally defined task, and the separation between spiral and elliptical galaxies has many in-between cases. If more counterclockwise galaxies are selected as spiral galaxies compared to clockwise galaxies, that will result in a dataset that has a higher number of counterclockwise galaxies.

In previous work, the bias was addressed by comparing the distribution in two opposite hemispheres [4, 6, 11, 12, 73]. In these experiments, the galaxies were separated into two hemispheres such that one hemisphere showed a higher number of galaxies spinning clockwise, and the opposite hemisphere showed a higher number of galaxies spinning counterclockwise. Table 4 shows the distribution of 807,898 galaxies imaged by DECam, as thoroughly described in [6].

If the selection of spiral galaxies was biased, it is expected that the selection would have been consistent in all parts of the sky. That is, if more counterclockwise galaxies are selected as spiral galaxies, that should lead to a higher number of counterclockwise galaxies in all parts of the sky and is not expected to flip in opposite hemispheres of the sky. In the SDSS galaxies with spectra used in Galaxy Zoo 1, the vast majority of the galaxies are concentrated in one hemisphere, with very few galaxies in the opposite hemisphere as shown in Figure 1. The absence of galaxies in two opposite hemispheres makes it difficult to apply the same analysis as done in [4, 6, 11, 12, 73], and the example shown in Table 4.

An attempt to follow that analysis with the SDSS galaxies used in this study is shown in Tables 5 and 6. Table 5 shows the distribution of galaxy spin directions of the SDSS galaxies annotated by SpArcFiRe, such that the RA of the galaxies falls within . Table 6 shows the same analysis for the opposite hemisphere . The results show that in the more populated part of the sky the higher number of counterclockwise galaxies is consistent and statistically significant. In the opposite part of the sky, the number of clockwise galaxies is higher, but the statistical significance is low. That can be the result of the far lower number of galaxies in that part of the sky, not allowing a strong statistical signal.

Table 6 is based on a small number of galaxies and therefore does not allow a statistically significant asymmetry to be determined. The table shows some evidence of a higher number of galaxies that spin clockwise in that hemisphere, but the number of galaxies is not sufficient to establish a statistically significant opposite asymmetry in spin directions.
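The hemisphere comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(ra, spin)` input layout, and the `ra_min`/`ra_max` bounds are hypothetical and are supplied by the caller; they are not the RA ranges used for Tables 5 and 6.

```python
from collections import Counter

def split_by_ra(galaxies, ra_min, ra_max):
    """Count spin directions inside and outside an RA interval.

    galaxies: iterable of (ra_degrees, spin), spin being "cw" or "ccw".
    ra_min, ra_max: hypothetical interval bounds in degrees; when
    ra_min > ra_max the interval wraps through RA = 0.
    """
    inside, outside = Counter(), Counter()
    for ra, spin in galaxies:
        ra = ra % 360.0
        if ra_min <= ra_max:
            in_range = ra_min <= ra < ra_max
        else:
            in_range = ra >= ra_min or ra < ra_max
        (inside if in_range else outside)[spin] += 1
    return inside, outside

def asymmetry(counts):
    """(ccw - cw) / (ccw + cw); positive means a counterclockwise excess."""
    total = counts["ccw"] + counts["cw"]
    return (counts["ccw"] - counts["cw"]) / total if total else 0.0
```

A selection bias that favors one spin direction would be expected to give the same sign of `asymmetry` in both parts of the sky, whereas a dipole-like signal would give opposite signs.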

Because the distribution of the galaxies in the sky makes the separation of the dataset into two hemispheres impractical, two other methods of selecting spiral galaxies were applied in addition to the manual selection of spiral galaxies by Galaxy Zoo volunteers. The first was to apply SpArcFiRe with no selection of spiral galaxies. When SpArcFiRe cannot identify the spin direction of a galaxy, the galaxy is not used. The disadvantage of that method is that without a first step of selecting spiral galaxies, SpArcFiRe might provide less accurate annotations. The second method of selecting spiral galaxies was to use the Ganalyzer [72] algorithm, which is a simple model-driven method that can identify spiral galaxies. Ganalyzer is far less sophisticated than SpArcFiRe and provides less information about the galaxy. On the other hand, its simple “mechanical” nature allows it to be fully symmetric. The symmetric nature of Ganalyzer was tested in previous studies [4, 6, 11, 12, 73]. When selecting spiral galaxies with Ganalyzer, the number of counterclockwise galaxies is higher when analyzing the original images, and the number of clockwise galaxies is higher when analyzing the mirrored images.

The analysis is challenged by the fact that SpArcFiRe is not fully symmetric. The observation that the sign of the asymmetry flips when the images are mirrored indicates that the asymmetry of SpArcFiRe is smaller than the asymmetry between the number of clockwise and counterclockwise galaxies in the set of SDSS galaxies with spectra. While the results shown in this paper might not be sufficient to prove a nonrandom distribution of galaxy spin directions, they show that the distribution of the spin directions of SDSS galaxies with spectra is in agreement with a nonrandom distribution and does not conflict with previous results.
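The mirroring argument can be illustrated with a toy multiplicative model. The model and its names are assumptions made for this sketch and are not notation taken from the analysis above: `sky_ratio` is the true counterclockwise-to-clockwise ratio in the sky, and `alg_ratio` is the ratio of the annotation algorithm's success rates for counterclockwise and clockwise galaxies. Mirroring the images swaps the true spin directions but leaves the algorithm's preference unchanged.

```python
import math

def observed_ratios(sky_ratio, alg_ratio):
    """Observed ccw/cw ratio for (original, mirrored) images.

    Original images: the sky and the algorithm asymmetries multiply.
    Mirrored images: the sky asymmetry is inverted, the algorithm's is not.
    """
    return sky_ratio * alg_ratio, alg_ratio / sky_ratio

def sign_flips(ratio_original, ratio_mirrored):
    """True when one observed ratio is above 1 and the other is below 1."""
    return (ratio_original - 1.0) * (ratio_mirrored - 1.0) < 0

def decompose(ratio_original, ratio_mirrored):
    """Recover (sky_ratio, alg_ratio) from the two observed ratios."""
    sky_ratio = math.sqrt(ratio_original / ratio_mirrored)
    alg_ratio = math.sqrt(ratio_original * ratio_mirrored)
    return sky_ratio, alg_ratio
```

Under this model the observed ratio crosses 1 after mirroring only when |ln(sky_ratio)| > |ln(alg_ratio)|, that is, when the asymmetry in the sky is larger than the asymmetry of the algorithm. If the algorithm's asymmetry dominated, both observed ratios would stay on the same side of 1.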

5. Conclusion

The availability of large digital sky surveys powered by high-throughput robotic telescopes has made it possible to study questions that were not addressable in the preinformation era. The distribution of spin directions of spiral galaxies is a question that has been studied using several sky surveys and several analysis methods. The set of SDSS galaxies with spectra is one of the datasets that has been studied several times in the past, with different conclusions. One of these studies used the SpArcFiRe method to annotate SDSS galaxies with spectra used in Galaxy Zoo 1 [15].

The experiment performed here used the same SpArcFiRe method of annotation that was used in [15]. While SpArcFiRe was used to annotate the spin directions of the spiral galaxies, that annotation was applied after the first step of selecting the spiral galaxies and separating them from the rest of the galaxies. When the spiral galaxies are selected manually by Galaxy Zoo volunteers, the number of clockwise and counterclockwise galaxies is not symmetric, as was also reported in [15]. But when selecting the spiral galaxies automatically, or when not selecting the spiral galaxies at all, the number of galaxies spinning clockwise is also significantly different from the number of galaxies spinning counterclockwise.

The study of [15] also performed an experiment by selecting the spiral galaxies automatically and used a machine learning algorithm for that task. The algorithm was trained with elliptical and spiral galaxies, such that the class of spiral galaxies contained an equal number of clockwise and counterclockwise galaxies. Such construction of the training set can avoid a situation in which more galaxies of a certain spin direction are classified as a spiral. From a machine learning perspective, that is a careful design that aims at reducing the possible biases introduced by machine learning.

But in addition to selecting spiral galaxies, the machine learning algorithm was applied after manually removing all attributes that correlated with the spin direction. As shown in [17], when using machine learning to select spiral galaxies, removing specifically the features that correlate with spin direction leads to a random distribution of the spin directions. In this paper, the experiments were performed by selecting the spiral galaxies without removing specific features, and in fact without using machine learning. The results show that the spin directions of spiral galaxies as seen from Earth form a dipole axis with a statistical significance of 2.33 to 3.97. Some of the experiments were also done by applying SpArcFiRe without a first step of selecting spiral galaxies. In all cases the results are consistent and show a statistically significant dipole axis formed by the spin directions of the galaxies.

Due to the limited footprint size, the results of the SDSS data annotated by SpArcFiRe as shown here cannot provide the comprehensive analysis of a very large footprint such as the DESI Legacy Survey [12]. But the results shown here are in agreement with previous reports that use different telescopes and different analysis methods. These experiments include SDSS galaxies [4] but are also consistent across other telescopes such as Pan-STARRS [4], DECam [6], Hubble Space Telescope [10], the Dark Energy Survey [11], and the DESI Legacy Survey [12].

Because SpArcFiRe has a small but consistent asymmetry, using SpArcFiRe for this task is more difficult compared to fully symmetric methods, as discussed thoroughly in Section 4.2. The simple binomial distribution shows a maximum probability of 1.5 to occur by chance. When mirroring the images, the asymmetry is inverted, and the statistical signal is still significant at 0.01. That shows a statistically significant parity violation in galaxy spin directions when using SpArcFiRe to annotate SDSS galaxies.
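The simple binomial analysis can be sketched as follows, assuming that under the null hypothesis each galaxy is equally likely to spin clockwise or counterclockwise. The function names are hypothetical and only the Python standard library is used; the tail is summed in log space so that it remains usable for large counts.

```python
import math

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(i):
        # log of C(n, i) * p**i * (1 - p)**(n - i)
        return (math.lgamma(n + 1) - math.lgamma(i + 1)
                - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)

    logs = [log_pmf(i) for i in range(k, n + 1)]
    m = max(logs)
    return math.exp(m) * sum(math.exp(x - m) for x in logs)

def spin_asymmetry_p(n_cw, n_ccw):
    """One-tailed probability of an excess at least as large as observed."""
    return binomial_tail(max(n_cw, n_ccw), n_cw + n_ccw, 0.5)
```

Calling `spin_asymmetry_p` with the clockwise and counterclockwise counts of a single experiment gives the one-tailed probability of obtaining such an excess by chance.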

Analysis of a dipole axis formed by the distribution of galaxy spin directions shows different levels of statistical significance that depend on the selection of the galaxies and the size of the dataset, and the experiments agree with the contention that a dipole axis exists with statistical significance as high as . None of the experiments showed disagreement with the presence of a dipole axis. An interesting observation is that the most likely position of the dipole axis is in close proximity to the galactic pole, which might indicate that the dipole axis is not related to the large-scale structure of the local universe but to the internal structure of galaxies, as discussed in [66, 67, 75].
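One way to test a candidate dipole axis by fitting spin directions to cosine dependence is sketched below. It is a simplified illustration and not necessarily the exact procedure used here: the function names, the +1/-1 spin encoding, the use of |cos φ| as the weight, and the shuffle-based significance estimate are all assumptions of the sketch.

```python
import math
import random
import statistics

def cos_angular_distance(ra1, dec1, ra2, dec2):
    """Cosine of the angle between two sky positions given in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    return (math.sin(d1) * math.sin(d2)
            + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))

def chi2_for_axis(galaxies, axis_ra, axis_dec):
    """Misfit between spin * |cos(phi)| and cos(phi) for a candidate axis.

    galaxies: list of (ra, dec, spin) with spin = +1 or -1 (assumed encoding).
    """
    total = 0.0
    for ra, dec, spin in galaxies:
        c = cos_angular_distance(ra, dec, axis_ra, axis_dec)
        if c == 0.0:
            continue  # a galaxy exactly 90 degrees from the axis adds nothing
        total += (spin * abs(c) - c) ** 2 / abs(c)
    return total

def axis_significance(galaxies, axis_ra, axis_dec, trials=1000, seed=0):
    """Sigma difference between the observed misfit and shuffled spins."""
    rng = random.Random(seed)
    observed = chi2_for_axis(galaxies, axis_ra, axis_dec)
    spins = [g[2] for g in galaxies]
    shuffled = []
    for _ in range(trials):
        rng.shuffle(spins)
        trial = [(g[0], g[1], s) for g, s in zip(galaxies, spins)]
        shuffled.append(chi2_for_axis(trial, axis_ra, axis_dec))
    spread = statistics.pstdev(shuffled)
    if spread == 0.0:
        return 0.0
    return (statistics.mean(shuffled) - observed) / spread
```

Evaluating `axis_significance` over a grid of candidate (RA, Dec) positions and taking the maximum would give the most likely axis position under these assumptions.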

Data Availability

Data used in this study were made available publicly. The list of SDSS galaxies and their spin direction annotations used in this study is available at https://people.cs.ksu.edu/~lshamir/data/sparcfire/. Annotations of the mirrored galaxies are also available at the same URL. In addition, a list of galaxies and their spin directions used for creating panel (a) in Figure 3 is available at https://people.cs.ksu.edu/~lshamir/data/assymdup/. A list of the smaller set of galaxies used to create Figure 3 is available at https://people.cs.ksu.edu/~lshamir/data/assym/. The galaxies used in [17] and reproduction of that experiment are available at https://people.cs.ksu.edu/~lshamir/data/iye_et_al.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the knowledgeable reviewer for the insightful comments. The research was funded in part by NSF under Grant no. AST-1903823.