Monochromatic globular clusters as a critical test of formation models for the dark matter deficient galaxies NGC1052-DF2 and NGC1052-DF4

It was recently proposed that the dark matter-deficient ultra-diffuse galaxies DF2 and DF4 in the NGC1052 group could be the products of a "bullet dwarf" collision between two gas-rich progenitor galaxies. In this model DF2 and DF4 formed at the same time in the immediate aftermath of the collision, and a strong prediction is that their globular clusters should have nearly identical stellar populations. Here we test this prediction by measuring accurate F606W-F814W colors from deep HST/ACS imaging. We find that the clusters are extremely homogeneous. The mean color difference between the globular clusters in DF2 and DF4 is $-0.003\pm 0.005$ mag and the observed scatter for the combined sample of 18 clusters with $M_V<-8.6$ in both galaxies is $0.015 \pm 0.002$ mag. After accounting for observational uncertainties and stochastic cluster-to-cluster variation in the number of red giants, the remaining scatter is $0.008^{+0.005}_{-0.006}$ mag. Both the color difference and the scatter are an order of magnitude smaller than in other dwarf galaxies, and we infer that the bullet scenario passes an important test that could have falsified it. No other formation models have predicted this extreme uniformity of the globular clusters in the two galaxies. We find that the galaxies themselves are slightly redder than the clusters, consistent with a previously-measured metallicity difference. Numerical simulations have shown that such differences are expected in the bullet scenario, as the galaxies continued to self-enrich after the formation of the globular clusters.


INTRODUCTION
NGC 1052-DF2 (or DF2) and NGC 1052-DF4 (DF4) share two unusual properties that set them apart from all other known galaxies. First, their globular clusters are, on average, a factor of ≈ 4 brighter and a factor of ≈ 2 larger than canonical values (van Dokkum et al. 2018a; Ma et al. 2020; Shen et al. 2021a). Second, their velocity dispersions are consistent with their stellar mass alone and much smaller than expected from a normal dark matter halo (van Dokkum et al. 2018b; Wasserman et al. 2018; Danieli et al. 2019; Emsellem et al. 2019).
Initially the main question was whether the dynamical masses and distances were measured correctly (see, e.g., Martin et al. 2018; Laporte et al. 2019; Trujillo et al. 2019), but as the anomalous properties of the galaxies were gradually confirmed (and corroborated with independent evidence; see Dutta Chowdhury et al. 2019; Keim et al. 2021) the focus shifted to the question of how they were formed. Proposals include assembly out of tidally-removed gas, stripping of dark matter by close encounters with NGC 1052 (Ogiya 2018; Carleton et al. 2019; Nusser 2020; Ogiya et al. 2022; Moreno et al. 2022) or NGC 1035 (Montes et al. 2020), jet- or outflow-induced star formation, like Minkowski's object (van Breugel et al. 1985; Natarajan et al. 1998), and extreme feedback in low mass halos. Silk (2019) suggested that DF2 could be the result of a "mini bullet-cluster" event, where the dark matter and the baryons became separated in a nearly head-on encounter between two gas-rich progenitor galaxies. Such a collision produces two dark matter-dominated remnants with a globular cluster-rich, dark matter-free object in between them that formed from the shocked gas. This scenario was further explored with simulations in Shin et al. (2020) and Lee et al. (2021), who showed that collisions between an unbound object and a satellite could explain many of the observed properties of the galaxies and occur with some regularity in cosmological simulations. The main issue with this model, as with many of the alternative explanations listed above, is the presence of two dark matter-deficient galaxies, seemingly requiring lightning to strike twice in the same group.
* NASA Hubble Fellow
Recently we suggested that a single bullet dwarf collision may have produced both DF2 and DF4. This joint formation is consistent with the striking similarities between the two galaxies, their radial velocities and line-of-sight distances, the emergence of multiple clumps in at least some bullet collision simulations (Shin et al. 2020), and with the discovery that DF2 and DF4 are part of a remarkable trail of ≈ 10 galaxies in the group.
As noted in van Dokkum et al. (2022) this formation model for DF2 and DF4 is falsifiable, as it makes a very specific prediction for their globular clusters. Both galaxies formed out of the gas that was left behind by the progenitor galaxies, at the same time. This gas mixed efficiently during the collision and should have a uniform metallicity. While overall star formation likely lasted for 500 Myr (Shin et al. 2020; Lee et al. 2021), the globular clusters formed almost instantaneously before further enrichment could substantially change the abundance of the gas (Lee et al. 2021). As a result, the stellar populations of all the globular clusters, in both galaxies, should be extremely similar.[1] This is not what is typically observed in dwarf galaxies. In a comprehensive study of the colors of globular clusters in Virgo dwarf galaxies, Peng et al. (2006) found that there is substantial cluster-to-cluster and galaxy-to-galaxy scatter.
Previous measurements of the colors of the globular clusters in DF2 and DF4 do in fact suggest differences between them, in potential conflict with the bullet model. Tables 1 and 2 in Shen et al. (2021a) imply a mean $V_{606}-I_{814}$ globular cluster color of 0.37 ± 0.01 AB mag in DF2 and 0.41 ± 0.01 in DF4.[2] This difference seems small but it corresponds to ≈ 0.3 dex in age or ≈ 0.4 dex in metallicity. Furthermore, the cluster-to-cluster scatter is non-zero, with σ = 0.04 ± 0.01 mag for both galaxies. There is also a hint that the galaxies themselves might have different $V_{606}-I_{814}$ colors: Cohen et al. (2018) find 0.40 for the diffuse light in DF2 and 0.32 for DF4, albeit with an uncertainty of 0.1 for both.
[1] We note here that simulations have so far only focused on comparing the properties of clusters across a single resulting galaxy fragment, and not yet across several galaxies.
[2] Throughout, $V_{606}-I_{814}$ is used to denote the F606W − F814W color in the AB system. The conversion to Johnson-Cousins $V-I$ in the Vega system is $V - I = 1.312\,(V_{606}-I_{814}) + 0.364$ (Sirianni et al. 2005).
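As a small illustration of the color conversion quoted in the footnote, the sketch below applies the linear relation to the two mean colors from the earlier photometry. The function name is hypothetical and chosen only for this example.

```python
def acs_ab_to_johnson_vega(v606_minus_i814: float) -> float:
    """Convert an AB F606W-F814W color to Johnson-Cousins V-I (Vega).

    Uses the linear relation quoted in the text (Sirianni et al. 2005).
    """
    return 1.312 * v606_minus_i814 + 0.364


# Mean cluster colors implied by the earlier photometry (Shen et al. 2021a).
for name, color in [("DF2", 0.37), ("DF4", 0.41)]:
    converted = acs_ab_to_johnson_vega(color)
    print(f"{name}: V606-I814 = {color:.2f} -> V-I = {converted:.3f}")
```

Because the relation is linear, a color difference in $V_{606}-I_{814}$ is simply stretched by the factor 1.312 in $V-I$.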
These measurements were performed using standard techniques (such as Source Extractor; Bertin & Arnouts 1996) and are largely based on single-orbit HST/ACS images that were reduced with the default STScI flat fields. Here we remeasure the colors of the clusters and the diffuse light in both DF2 and DF4 using custom techniques and well-calibrated, much deeper data, as a stringent test of the bullet dwarf model. Where needed we use a distance of D = 21.7 Mpc for DF2 and D = 20.0 Mpc for DF4 (Danieli et al. 2020; Shen et al. 2021b; Z. Shen et al., in preparation).

DATA
We make use of deep HST/ACS data that were obtained in programs GO-15695 and GO-15851. The aim of these programs was to measure distances to DF4 (GO-15695) and DF2 (GO-15851) from the tip of the red giant branch (TRGB). For DF4 the exposure times were 7 and 3 orbits in $I_{814}$ and $V_{606}$ respectively. In Danieli et al. (2020) these were combined with the 1 + 1 orbits that had been obtained in GO-14644 (see Cohen et al. 2018), and the location of the TRGB was measured from these 8 + 4 orbit depth data. The DF2 observations of GO-15851 were deeper, at 19 orbits in $I_{814}$ and 19 orbits in $V_{606}$. Shen et al. (2021b) used these data to measure the TRGB distance of DF2, again after adding the GO-14644 data for a total exposure of 20 + 20 orbits.
In this study we redrizzle the ACS data, with several changes. First, we discard the 1 + 1 orbits that were obtained for DF2 and DF4 in GO-14644. The orientation of these observations was very different from the more recent data, which means the point spread function (PSF) and charge-transfer efficiency corrections are different, and there is no need to maximize depth: the fluxes of the globular clusters are typically $\sim 30\ e^{-}\,\mathrm{s}^{-1}$, which means that Poisson errors are 0.5 % even in a single orbit. More importantly, we apply a flat-fielding correction to the flc files prior to drizzling. The ACS flat fields were last updated in 2006 and they no longer provide optimal corrections. As described in instrument science report ACS 2017-09, ultra-deep stacks of Frontier Fields images show that there are systematic flat-fielding residuals at the level of 1 %. This effect is also described in ISR ACS 2020-08. The residuals correlate with the local thickness of the CCD, and as the correlation between quantum efficiency and thickness reverses at ≈ 700 nm, the residuals in $V_{606}$ and $I_{814}$ are spatially anti-correlated. As a result, flat-fielding errors in the $V_{606}-I_{814}$ color reach 2 %, even though they are only half that in each filter individually.
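The Poisson argument above can be checked with a back-of-the-envelope calculation. The sketch below assumes roughly 2000 s of on-source exposure per orbit; that number is an assumption made for this illustration and is not taken from the text.

```python
import math


def poisson_fractional_error(count_rate_e_per_s: float, exposure_s: float) -> float:
    """Fractional Poisson uncertainty on a flux: sqrt(N) / N = 1 / sqrt(N)."""
    n_electrons = count_rate_e_per_s * exposure_s
    return 1.0 / math.sqrt(n_electrons)


# Assumed exposure time per orbit (illustrative; not quoted in the text).
ASSUMED_ORBIT_EXPOSURE_S = 2000.0

frac = poisson_fractional_error(30.0, ASSUMED_ORBIT_EXPOSURE_S)
print(f"Fractional Poisson error in one orbit: {100 * frac:.2f} %")
```

Under this assumption the photon noise of a typical cluster is already at the sub-percent level in a single orbit, so systematic effects such as flat fielding dominate the error budget.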
Figure 1. The images span 80″ × 80″. Bottom panels: Photometry procedure for two of the globular clusters, indicated with yellow boxes above. From left to right we show the original $I_{814}$ image (with a size of 7″ × 7″), the final object mask, the PSF-convolved King (1962) model fit with best-fitting background plane, and the cleaned image. The cleaned image is the original image with the masked areas filled in by the King model. Colors and total magnitudes are measured from aperture photometry on the cleaned $V_{606}$ and $I_{814}$ images.
The ACS team at STScI provided us with preliminary correction flats and we applied these to the flc files. The flc files were aligned with each other using tweakreg. The tweakreg algorithm is sensitive to cosmic rays; we addressed this by running the code on versions of the data where cosmic rays were removed (with L.A.Cosmic; van Dokkum 2001). $I_{814}$ images of DF2 and DF4 are shown in Fig. 1. Visually comparing the default and flat-field-corrected images, there is a clear improvement in the flatness of the sky background, particularly for DF2. From the remaining variation in the background we estimate that residual flat fielding errors are 0.5 ± 0.2 %. Assuming that the errors are no longer (anti-)correlated, the uncertainty in the color due to spatially-varying flat-fielding uncertainties is then 0.7 ± 0.3 % in our images.
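The flat-fielding numbers quoted above follow from standard error propagation for a difference of two quantities. As a sketch, write $\sigma_V$ and $\sigma_I$ for the fractional flat-fielding errors in the two filters and $\rho$ for their correlation coefficient (a symbol introduced here for illustration):

$$\sigma_{V-I}^2 = \sigma_V^2 + \sigma_I^2 - 2\rho\,\sigma_V\sigma_I .$$

For fully anti-correlated errors ($\rho = -1$) with $\sigma_V = \sigma_I = 1\,\%$ this gives $\sigma_{V-I} = \sigma_V + \sigma_I = 2\,\%$, the value quoted for the default flat fields. For uncorrelated residuals ($\rho = 0$) with $\sigma_V = \sigma_I = 0.5 \pm 0.2\,\%$ it gives $\sigma_{V-I} = \sqrt{2}\times(0.5 \pm 0.2)\,\% \approx 0.7 \pm 0.3\,\%$. Since a fractional flux error $f$ corresponds to $2.5\log_{10}(1+f) \approx 1.086\,f$ mag, 0.7 % is close to the 0.007 mag flat fielding uncertainty adopted later in the text.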

Sample
At present 24 globular clusters have been spectroscopically-confirmed, 12 in DF2 and 12 in DF4. These are listed in Tables 1 and 2 of Shen et al. (2021a). Three of the confirmed DF4 clusters are fainter than $M_V = -8$. As we show later the errors increase sharply at fainter magnitudes, and our quantitative analysis of the $V_{606}-I_{814}$ scatter focuses on the 11 + 7 = 18 objects that have $M_V < -8.6$. The faint clusters are used to investigate trends with magnitude; to this end we use an augmented sample in § 4 that includes five faint objects without a spectrum. These candidate clusters satisfy the Shen et al. (2021a) color and size criteria and are located within 45″ of the center of DF2 or DF4. The sample is listed in Table 1.

Cleaned Globular Cluster Images
Even with flat fielding errors largely eliminated, it is not straightforward to obtain photometry with the desired accuracy. An investigation into the effects of changes in aperture sizes and the background subtraction shows seemingly random variation in the colors at the 1%-2% level for individual clusters. These variations can be traced to the cumulative effects of contaminating objects, mostly red giants in the main bodies of the galaxies, on both the flux measurements and the estimation of the background.
In shallow data this is an unavoidable source of noise but owing to the depth of the $I_{814}$ images we can mitigate these effects. We use the following procedure for each cluster. We begin by creating a mask for the brightest contaminating objects in a 7″ × 7″ region centered on the cluster. All pixels with a flux in excess of $0.03\ e^{-}\,\mathrm{s}^{-1}\,\mathrm{pix}^{-1}$ are masked, excluding the central 1.0″ × 1.0″ to avoid masking the cluster itself. The purpose of this initial mask is to enable an initial fit to the globular cluster. A PSF-convolved 2D modified King profile with α = 2 (Elson 1999; Peng et al. 2002) is fit to the image, using galfit (Peng et al. 2002). The bright pixel mask is used, the background is modeled as a constant, and for the PSF we choose a non-saturated star in the image.
With a model for the cluster in hand an improved mask can be created by identifying pixels that deviate significantly from the model. Specifically, we create a residual image $R = (I - M)/M_{0.02}$, with $I$ the image, $M$ the model, and $M_{0.02}$ the model with all pixels $\leq 0.02\ e^{-}\,\mathrm{s}^{-1}\,\mathrm{pix}^{-1}$ set to 0.02. After median filtering $R$ by a 3 × 3 pixel filter, all pixels > 0.15 are flagged and added to the initial mask. This process masks objects whose median flux in a 3 × 3 pixel aperture deviates by more than 15 % from the model, as well as objects away from the central regions whose flux exceeds a fixed threshold of $0.003\ e^{-}\,\mathrm{s}^{-1}$ (with the precise threshold depending on the background level). The threshold of $0.02\ e^{-}\,\mathrm{s}^{-1}\,\mathrm{pix}^{-1}$ was chosen to ensure a smooth transition between these regimes. The chosen values are a compromise between masking as many contaminants as possible and leaving sufficient data for reliable flux and background measurements. The masks for two representative clusters, DF2-71 and DF4-4045, are shown in Fig. 1.
The fit to the globular cluster is repeated with this new mask, now modeling the background as a plane with gradients in x and y to properly account for the diffuse light of DF2 and DF4. The models for DF2-71 and DF4-4045 are shown in the third column of Fig. 1. Finally, a cleaned image is created by replacing all masked pixels in the original image by the corresponding pixels in the model (fourth column of Fig. 1).
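The mask refinement and image cleaning steps described above can be sketched in a few lines. This is a minimal illustration in pure Python on a toy cutout, not the pipeline used for the paper: the model here is a flat array rather than a galfit King-profile fit, and all function names are hypothetical. In practice an array library routine such as `scipy.ndimage.median_filter` would normally replace the hand-written filter.

```python
from statistics import median


def median_filter_3x3(img):
    """3x3 median filter; edge pixels use only the neighbors that exist."""
    ny, nx = len(img), len(img[0])
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(ny, y + 2))
                      for i in range(max(0, x - 1), min(nx, x + 2))]
            out[y][x] = median(window)
    return out


def refine_mask(image, model, initial_mask, floor=0.02, threshold=0.15):
    """Flag pixels where the median-filtered R = (I - M) / M_floor exceeds threshold.

    M_floor is the model with all values below `floor` raised to `floor`.
    """
    ny, nx = len(image), len(image[0])
    resid = [[(image[y][x] - model[y][x]) / max(model[y][x], floor)
              for x in range(nx)] for y in range(ny)]
    smooth = median_filter_3x3(resid)
    return [[bool(initial_mask[y][x]) or smooth[y][x] > threshold
             for x in range(nx)] for y in range(ny)]


def clean_image(image, model, mask):
    """Replace every masked pixel by the corresponding model pixel."""
    return [[model[y][x] if mask[y][x] else image[y][x]
             for x in range(len(image[0]))] for y in range(len(image))]


# Toy 7x7 cutout: flat model of 0.01 e-/s/pix plus a 3x3 contaminant of 0.05.
size = 7
model = [[0.01] * size for _ in range(size)]
image = [row[:] for row in model]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 0.05
initial_mask = [[False] * size for _ in range(size)]

mask = refine_mask(image, model, initial_mask)
cleaned = clean_image(image, model, mask)
print(sum(flag for row in mask for flag in row), "pixels flagged")
```

Because the residual image is median filtered before thresholding, an isolated single-pixel spike is not flagged by this step; only deviations that persist over most of a 3 × 3 window are added to the mask.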

Aperture Photometry
Fluxes are measured directly from the cleaned images using simple aperture photometry. Colors are measured in 0.5″ diameter apertures and total $V_{606}$ magnitudes in 1.5″ diameter apertures. The background is determined from an annulus with an inner diameter of 1.5″ and an outer diameter of 3.0″. As noted above the results depend on the precise choice of these parameters when applied to the original images, but we find that they are insensitive to them when applied to the cleaned images.
Figure 2. Colors are corrected for Galactic reddening. Solid symbols are spectroscopically-confirmed and light-colored symbols have uncertainties > 0.015 mag. The parameterized distribution for globular clusters in low luminosity Virgo galaxies is shown in grey (see text). Bottom left panel: Mean de-reddened color of globular clusters with uncertainties < 0.015, compared to Virgo galaxies from Peng et al. (2006). There is no significant systematic color difference between DF2 and DF4. Bottom right panel: Observed and intrinsic scatter in the colors, again compared to Virgo galaxies. The clusters in DF2 and DF4 are extremely homogeneous. The orange line is the minimum possible scatter, arising from stochastic fluctuations in the number of red giants.
Errors on the fluxes were determined empirically by repeating the aperture photometry in a grid of 6 × 6 blank positions within the 7″ × 7″ cleaned images. This procedure ensures that the local environment of each globular cluster is taken into account. As can be seen in the mask images of Fig. 1, the central region (where the globular cluster is) is typically not masked as faint stars do not exceed the thresholds there. This leads to a bias that can be quantified and corrected for in the grid photometry. At each grid position we replaced the pixels within ±0.25″ of that position with those from the original image. The grid photometry now closely resembles the globular cluster photometry. The mean flux from the 36 grid positions is subtracted from the globular cluster measurements and the scatter among them is taken as the uncertainty. Both values were determined with the biweight estimator (Beers et al. 1990) as it is insensitive to outliers.
Aperture corrections are taken from Bohlin (2016)
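The empirical error estimate from the 36 grid positions relies on the biweight location and scale. The sketch below is a minimal pure-Python version of these estimators following the standard formulas (tuning constants 6 and 9), applied to hypothetical grid fluxes that include one strong outlier. The input numbers are invented for illustration; in practice a library routine such as `astropy.stats.biweight_location` and `astropy.stats.biweight_scale` would typically be used.

```python
import math
from statistics import median


def biweight_location(values, c=6.0):
    """Biweight estimate of the central value (after Beers et al. 1990)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        return m
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad)
        if abs(u) < 1:  # points with |u| >= 1 get zero weight
            w = (1 - u * u) ** 2
            num += (v - m) * w
            den += w
    return m + num / den


def biweight_scale(values, c=9.0):
    """Biweight estimate of the scatter (after Beers et al. 1990)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        return 0.0
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad)
        if abs(u) < 1:
            num += (v - m) ** 2 * (1 - u * u) ** 4
            den += (1 - u * u) * (1 - 5 * u * u)
    return math.sqrt(len(values)) * math.sqrt(num) / abs(den)


# Hypothetical fluxes at the 36 blank grid positions, with one contaminated cell.
grid_fluxes = [-2.0, -1.0, 0.0, 1.0, 2.0] * 7 + [100.0]
cluster_flux = 500.0  # hypothetical raw cluster flux in the same units

offset = biweight_location(grid_fluxes)
uncertainty = biweight_scale(grid_fluxes)
corrected_flux = cluster_flux - offset
print(f"offset = {offset:.3f}, uncertainty = {uncertainty:.3f}")
```

The single contaminated grid cell receives zero weight in both estimators, which is the property that makes them preferable to a plain mean and rms here.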

OBSERVED COLOR VARIATION
The distribution of the DF2 and DF4 globular clusters in the color-magnitude plane is shown in the top panel of Fig. 2. Errorbars are a combination of the measurement uncertainty and the 0.007 mag flat fielding uncertainty. For reference, the greyscale background shows the parameterized distribution of globular clusters in low luminosity galaxies in the Virgo cluster. This is a combination of the two-component decomposition of the color distribution for the faintest galaxies in Peng et al. (2006) and the Gaussian fit to the globular cluster luminosity function of the faintest galaxies in Jordán et al. (2007). The latter is taken from Table 3 of Jordán et al. (2007) and converted to $V_{606}$ using $V_{606} = g - 0.40$. The globular clusters in DF2 and DF4 are much brighter than those in Virgo dwarfs, as has been discussed extensively in earlier papers (see van Dokkum et al. 2018a; Trujillo et al. 2019; Shen et al. 2021a), and they are also somewhat bluer. There is no evidence for a systematic trend with magnitude when the full sample of confirmed and candidate clusters is considered. The photometric uncertainties increase sharply at fainter magnitudes, and in the following we only consider the 18 clusters with errors < 0.015. This sample corresponds to the full sample of clusters with $M_{606} < -8.6$.
Note to Table 1. Uncertainties on the colors are ±1σ, with ±meas the measurement uncertainty, ±phot the measurement uncertainty combined with the flat fielding uncertainty, and ±stoch the stochastic uncertainty due to variations in the number of red giants.
The mean color of the DF2 and DF4 clusters is nearly identical: $V_{606}-I_{814} = 0.374 \pm 0.004$ for DF2 and $V_{606}-I_{814} = 0.377 \pm 0.003$ for DF4,[6] as determined with the biweight estimator (Beers et al. 1990). The mean color difference is $\Delta_{\rm DF2-DF4} = -0.003 \pm 0.005$. The mean colors are compared to those of globular clusters in Virgo galaxies[7] in Fig. 2 (lower left). The clusters in DF2 and DF4 are bluer than those in Virgo galaxies with similar $N_{\rm GC}$.[8] Furthermore, in the Virgo sample with $N_{\rm GC} < 20$ the median color difference between any two data points is 0.036, an order of magnitude larger than the difference between DF2 and DF4.
The cluster-to-cluster scatter is also very small, and within the errors is the same for the two galaxies: $\sigma_{\rm obs} = 0.015 \pm 0.003$ for DF2 and $\sigma_{\rm obs} = 0.010 \pm 0.003$ for DF4. The observed scatter in the combined sample of 18 bright globular clusters in DF2 and DF4 is $\sigma_{\rm obs} = 0.015 \pm 0.002$ (all determined with the biweight estimator; the rms is also 0.015). The intrinsic scatter $\sigma_{\rm intr}$ can be determined by constructing the likelihood function
$$\mathcal{L} = \prod_{i} \frac{1}{\sqrt{2\pi}\,\sigma_{\rm eff}} \exp\left[-\frac{(c_i-\mu)^2}{2\sigma_{\rm eff}^2}\right], \qquad (1)$$
with $c_i$ the colors of the individual clusters, $\mu$ the mean, and $\sigma_{\rm eff}^2 = \sigma_{\rm intr}^2 + e_i^2$, where $e_i$ is the uncertainty in the color of cluster $i$. We find an intrinsic scatter of $\sigma_{\rm intr} = 0.012^{+0.004}_{-0.003}$. As shown in Fig. 2 (lower right) the typical scatter in Virgo galaxies is σ ≈ 0.1 mag, and there are no galaxies with σ < 0.05 mag. We conclude that the globular clusters in DF2 and DF4 form a remarkably homogeneous population, as predicted by the bullet dwarf collision model. This level of homogeneity is not observed in normal dwarf galaxies.
[6] The errorbars do not include an uncertainty of ≈ 1% in the absolute calibration of the ACS filters, as this systematic error affects all colors and magnitudes by the same amount.
[7] The Virgo data are the single-component fits in Table 3 of Peng et al. (2006).
[8] The color difference partly reflects an age difference: the DF2 clusters have ages of 9 ± 2 Gyr (van Dokkum et al. 2018a; Fensch et al. 2019) whereas the typical ages of metal-poor globular clusters are similar to those in the Milky Way (see Strader et al. 2005). However, this only explains ∼ 1/3 of the difference. We note that the colors of the DF2 and DF4 clusters are similar to those of clusters in the Milky Way: using colors from Table 2 of Bellini et al. (2015) and metallicities from Harris (1996)
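The intrinsic-scatter estimate described above can be illustrated with a simple grid evaluation of a Gaussian likelihood in which each cluster has variance $\sigma_{\rm intr}^2 + e_i^2$, marginalized over the mean $\mu$. The sketch below assumes flat priors on both grids, and the input colors and errors are hypothetical values chosen for the example, not the measurements of this paper.

```python
import math


def log_likelihood(colors, errors, mu, sigma_intr):
    """Gaussian log-likelihood with per-object variance sigma_intr^2 + e_i^2."""
    total = 0.0
    for c, e in zip(colors, errors):
        var = sigma_intr ** 2 + e ** 2
        total += -0.5 * math.log(2 * math.pi * var) - (c - mu) ** 2 / (2 * var)
    return total


def intrinsic_scatter_posterior(colors, errors, sigma_grid, mu_grid):
    """Likelihood of sigma_intr marginalized over mu, normalized on the grid."""
    logl = [[log_likelihood(colors, errors, mu, s) for mu in mu_grid]
            for s in sigma_grid]
    peak = max(max(row) for row in logl)  # subtract the peak to avoid underflow
    marg = [sum(math.exp(v - peak) for v in row) for row in logl]
    norm = sum(marg)
    return [m / norm for m in marg]


# Hypothetical colors (mag) and uncertainties (mag) for a handful of clusters.
colors = [0.360, 0.372, 0.381, 0.369, 0.390, 0.377, 0.365, 0.384]
errors = [0.010] * len(colors)

sigma_grid = [0.001 * k for k in range(0, 41)]     # 0 to 0.040 mag
mu_grid = [0.350 + 0.001 * k for k in range(0, 51)]  # 0.350 to 0.400 mag

posterior = intrinsic_scatter_posterior(colors, errors, sigma_grid, mu_grid)
best = sigma_grid[posterior.index(max(posterior))]
print(f"most likely sigma_intr on the grid = {best:.3f} mag")
```

Asymmetric uncertainties of the kind quoted in the text can then be read off from the cumulative distribution of `posterior`, for example by locating the 16th and 84th percentiles.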

INTERPRETATION OF THE VARIATION
The color variation is small but not zero. As noted above, observational uncertainties explain some of the observed scatter. This is shown in the top left panel of Fig. 3, where the errorbar for each data point is split into several distinct contributions. Measurement uncertainties are shown in black and the total photometric uncertainties, which include the 0.007±0.003 mag flat fielding errors, are shown in dark grey.
The intrinsic scatter is so small that the effects of stochastic sampling of the isochrone need to be taken into account. Red giants contribute significantly to the integrated light and the Poisson variation in their number causes some clusters to be redder and brighter than others. The same effect causes the well-known pixel-to-pixel surface brightness fluctuations in galaxy images (Tonry & Schneider 1988). We quantify this effect by generating artificial globular clusters with the ArtPop code and measuring their integrated colors. The clusters have log(age) = 9.9, −1.8 ≤ [Fe/H] ≤ −0.8, and cover a factor of 20 in mass (parameterized by $\log(n_{\rm stars})$, which ranges from 5 to 6.3 with steps of 0.2). At each mass 500 clusters are generated with different random seeds, and the rms scatter in the $V_{606}-I_{814}$ colors is measured.
The results are shown in the bottom right panel of Fig. 3. The cluster-to-cluster $V_{606}-I_{814}$ scatter is ∼ 1 % in the relevant luminosity range, of the same order as the intrinsic scatter in the DF2/DF4 globular clusters, with a modest dependence on metallicity. Two example clusters that illustrate the effect are shown at left (see also Fig. 6 in ). A polynomial fit to the luminosity-dependent variation has the form
$$\sigma_{\rm stoch} = 0.019 + 0.0087\,(M_{606} + 7.5) + 0.00153\,(M_{606} + 7.5)^2, \qquad (2)$$
and is shown by the black line in Fig. 3.
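The polynomial of Eq. (2) is straightforward to evaluate. The sketch below also combines it with a measurement error and the 0.007 mag flat fielding term; adding the three in quadrature is an assumption made here for illustration (it presumes independent error sources), and the example magnitude and measurement error are hypothetical.

```python
import math


def sigma_stoch(m606: float) -> float:
    """Stochastic color scatter (mag) from the polynomial fit of Eq. (2)."""
    x = m606 + 7.5
    return 0.019 + 0.0087 * x + 0.00153 * x ** 2


def total_color_error(sigma_meas: float, m606: float, sigma_flat: float = 0.007) -> float:
    """Quadrature sum of measurement, flat fielding and stochastic terms.

    The quadrature combination is an assumption of this sketch.
    """
    return math.sqrt(sigma_meas ** 2 + sigma_flat ** 2 + sigma_stoch(m606) ** 2)


# Hypothetical cluster: M606 = -9.5 with a 0.005 mag measurement error.
print(f"sigma_stoch(-9.5) = {sigma_stoch(-9.5):.4f} mag")
print(f"total error       = {total_color_error(0.005, -9.5):.4f} mag")
```

The fit predicts smaller stochastic scatter for brighter clusters over the magnitude range considered, as expected when more red giants contribute to the integrated light.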
Light grey errorbars in the top panel of Fig. 3 show the effect of including this uncertainty for each cluster. The broken orange lines show where 68 % of the points are expected to fall due to the stochastic sampling effect alone. Solid orange lines include the measurement error; for a normal distribution entirely defined by these lines, 12 out of 18 would fall within them. They encompass ten, only slightly fewer. Using the total errors (that is, the combination of the measurement error, the flat fielding error, and the stochastic variation) in the likelihood analysis gives the "stellar population scatter", $\sigma_{\rm sp} = 0.008^{+0.005}_{-0.006}$. This scatter is not significantly different from zero. The likelihood function, marginalized over µ, is shown by the solid line in Fig. 3.
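The comparison between the expected 12 and the observed ten clusters inside the 68 % band can be put in context with a binomial calculation. The sketch below assumes that each cluster independently has a 68 % chance of falling inside the band; this independence is an assumption of the example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n_clusters, p_inside = 18, 0.68
expected = n_clusters * p_inside
prob_ten_or_fewer = binom_cdf(10, n_clusters, p_inside)

print(f"expected number inside the band: {expected:.1f}")
print(f"P(10 or fewer inside): {prob_ten_or_fewer:.3f}")
```

A probability well above a few percent indicates that finding ten rather than twelve clusters inside the band is an unremarkable fluctuation.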
We use the flexible stellar population synthesis (FSPS) framework (Conroy et al. 2009) with the MIST isochrones (Choi et al. 2016) to determine the (limits on) variation in age and metallicity that is implied by the stellar population scatter. −0.06 if there is no variation in age or $\sigma_{\rm age} = 1.3^{+0.8}_{-1.0}$ Gyr if there is no variation in metallicity. In the simulations of Lee et al. (2021) the globular clusters can have a spread of up to ≈ 0.1 dex in metallicity and ≈ 150 Myr in age, and we conclude that the bullet hypothesis cannot be ruled out.

COLORS OF THE DIFFUSE LIGHT
The bullet model also makes predictions for the global colors of DF2 and DF4, although these are more model-dependent than the predictions for the globular clusters (Shin et al. 2020; Lee et al. 2021). The galaxies are predicted to have higher metallicities than the globular clusters as they form stars over a longer time period (see Lee et al. 2021). Therefore, while the colors of DF2 and DF4 should be very similar to each other, they are predicted to be redder than those of the globular clusters.
We measure the average colors of the galaxies in the following way. An object mask is created by comparing the summed $V_{606}+I_{814}$ image to a binned and median-filtered version of itself. A first model for the galaxy is made by median filtering the $V_{606}$ and $I_{814}$ images, not taking masked pixels into account. This model is subtracted from the data and the object mask is optimized with a lower threshold, taking care not to include giants in the mask. Then the median filtering is repeated to create a final model in each filter. There is a background gradient in all images, and at this stage a surface is fitted to the background and subtracted. The rms of this background model (that is, 68 % of the gradient that is removed) is taken as the uncertainty in each filter. Finally, the $I_{814}$ model is divided by the $V_{606}$ model, multiplied by the object mask, and a flux threshold is applied so that the faint outskirts are excluded.
These color images are shown in Fig. 4, and compared to the mean color of the globular clusters. The variations within  We infer that the colors of the galaxies are identical within the errors, as predicted by the bullet model.
The galaxies are redder than the luminous globular clusters. If the actual color of both galaxies is $V_{606}-I_{814} = 0.375$ and the errors are Gaussian, the probability that we measure $(V_{606}-I_{814})_{\rm DF2} \geq 0.420$ and $(V_{606}-I_{814})_{\rm DF4} \geq 0.436$ is < 1 %.
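The probability quoted above can be reproduced with one-sided Gaussian tail probabilities. The sketch below uses the color uncertainties given in the caption of Fig. 4 (0.024 mag for DF2 and 0.075 mag for DF4) and assumes that the errors on the two galaxies are independent, so that the joint probability is the product of the two tails.

```python
import math


def upper_tail(observed: float, true_value: float, sigma: float) -> float:
    """P(measurement >= observed) for a Gaussian centered on true_value."""
    z = (observed - true_value) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


true_color = 0.375  # assumed common color of both galaxies (cluster mean)
p_df2 = upper_tail(0.420, true_color, 0.024)
p_df4 = upper_tail(0.436, true_color, 0.075)
p_joint = p_df2 * p_df4  # assumes independent errors for the two galaxies

print(f"P(DF2) = {p_df2:.4f}, P(DF4) = {p_df4:.4f}, joint = {p_joint:.4f}")
```

Most of the constraint comes from DF2, whose color uncertainty is about three times smaller than that of DF4.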
The color difference between the galaxies and the clusters of 0.05 mag is qualitatively consistent with the metallicity difference predicted by the hydrodynamical model of Lee et al. (2021). Using the relation in § 5 we find ∆[Fe/H] ≈ 0.5, somewhat larger than the ≈ 0.2 predicted by Lee et al. (2021) but in good agreement with the spectroscopically-determined value of ∆[Fe/H] = 0.56 ± 0.15 of Fensch et al. (2019).

DISCUSSION
The central result of this paper is that the bright globular clusters in DF2 and DF4 have extremely similar colors. The observed scatter is ≈ 0.015 mag, and this can be explained by a combination of measurement uncertainties (≈ 0.01) and stochastic variations in the number of red giants (≈ 0.01). The remaining scatter among the 18 luminous clusters in DF2 and DF4 is $\sigma_{\rm sp} = 0.008^{+0.005}_{-0.006}$, that is, not significantly different from zero. The diffuse light is redder in both galaxies, with again no significant difference between DF2 and DF4. These results are expected in the bullet dwarf scenario (see Silk 2019; Shin et al. 2020; Lee et al. 2021; van Dokkum et al. 2022) and we conclude that this model survives an important falsification test. Returning to the formation models listed in § 1, no other published explanation for the lack of dark matter in DF2 and DF4 also "naturally" produces the extreme uniformity of their globular cluster populations.
Figure 4. Color images of the galaxies, created by dividing binned and median-filtered $I_{814}$ and $V_{606}$ images of the galaxies. Numerical values are simply the ratios of the observed fluxes in $e^{-}\,\mathrm{s}^{-1}$. The mean value for the globular clusters is indicated with the circle. Color variations within each galaxy are caused by surface brightness fluctuations. The average color of DF4 is slightly redder than that of DF2 but the difference is not significant: $(V_{606}-I_{814})_{\rm DF2} = 0.420 \pm 0.024$ and $(V_{606}-I_{814})_{\rm DF4} = 0.436 \pm 0.075$. The galaxies are redder than the globular clusters.
The globular clusters in DF2 and DF4 are different from those in Virgo galaxies; as detailed in § 4 they are brighter, bluer, and have a much smaller scatter. DF2 and DF4 are ultra-diffuse galaxies (van Dokkum et al. 2015) and at least two other UDGs also have very homogeneous globular cluster populations, NGC 5846-UDG1 (Müller et al. 2021; Danieli et al. 2022) and DGSAT I (Janssens et al. 2022). In other respects they are different; NGC 5846-UDG1 has a canonical globular cluster luminosity function, and DGSAT I's clusters are significantly redder than those in DF2/DF4. A small scatter indicates synchronized formation in a dense, homogeneous medium and it may arise generically in any formation scenario that results in the extreme (factor of $10^{9}$) density contrast between the globular clusters and the galaxy light that is seen in UDGs (see, e.g., van Dokkum et al. 2018a; Trujillo-Gomez et al. 2021). Specifically, intense feedback from the formation of the globular clusters may have caused the galaxies to expand and turn into UDGs (Trujillo-Gomez et al. 2021; Danieli et al. 2022). Further deep HST or JWST studies of globular clusters in UDGs in various environments are needed to investigate this further.
Our analysis focuses on the brightest clusters as these have the smallest uncertainties. Shen et al. (2021a) showed that the globular cluster luminosity function in DF2 and DF4 can be modeled as a combination of a bright peak of overluminous clusters plus a "normal" luminosity function with a normalization and mean that are typical for the galaxies' luminosities. In this context it is interesting that the clusters with $M_{606} > -8.6$ are somewhat redder than the brighter ones, with $V_{606}-I_{814} = 0.400 \pm 0.011$. Perhaps the faint clusters formed together with the diffuse light, with a higher mean metallicity than the brighter ones. The observed scatter is also higher for the faint clusters at $\sigma_{606-814} = 0.034 \pm 0.008$, although this can be explained by a combination of measurement errors (0.028) and stochastic sampling (0.016). Extremely deep spectroscopy may shed more light on these questions.
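The statement that the faint-cluster scatter is accounted for by the two error terms can be verified with a one-line calculation, assuming the measurement and stochastic contributions are independent and therefore add in quadrature:

$$\sqrt{0.028^2 + 0.016^2} = \sqrt{0.000784 + 0.000256} \approx 0.032\ \mathrm{mag},$$

which is within the uncertainty of the observed $\sigma_{606-814} = 0.034 \pm 0.008$ mag, leaving no significant residual scatter for the faint clusters.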
Further tests of the bullet model are possible. The most straightforward next step is probably obtaining radial velocities and line-of-sight distances of other galaxies along the trail, as the bullet model predicts that these follow a regular sequence (with some contamination from unrelated objects). Dynamical mass measurements of other trail galaxies will also be highly constraining, and interesting in a broader context as bullet dwarf events can, in principle, constrain the self-interaction cross section of dark matter. Modeling of the bullet cluster has provided an upper limit (Randall et al. 2008), but as self-interacting dark matter was introduced to explain the "cored" dark matter density profiles of low mass galaxies (Spergel & Steinhardt 2000) it is important to measure the cross section on those scales (Tulin & Yu 2018).