Systematically Measuring Ultra-Diffuse Galaxies (SMUDGes). III. The Southern SMUDGes Catalog

We present a catalog of 5598 ultra-diffuse galaxy (UDG) candidates with effective radius $r_e>5.3$ arcsec distributed throughout the southern portion of the DESI Legacy Imaging Survey covering $\sim$ 15000 deg$^2$. The catalog is most complete for physically large ($r_e>2.5$ kpc) UDGs lying in the redshift range $1800 \lesssim cz/{\rm km\ s}^{-1} \lesssim 7000$, where the lower bound is defined by where incompleteness becomes significant for large objects on the sky and the upper bound by our minimum angular size selection criterion. Because physical size is integral to the definition of a UDG, we develop a method {of distance estimation} using existing redshift surveys. With three different galaxy samples, two of which contain UDGs with spectroscopic redshifts, we estimate that the method has a redshift accuracy of $\sim$ 75% when the method converges, although larger, more representative spectroscopic UDG samples are needed to fully understand the behavior of the method. We are able to estimate distances for 1079 of our UDG candidates (19%). Finally, to illustrate uses of the catalog, we present distance independent and dependent results. In the latter category we establish that the red sequence of UDGs lies on the extrapolation of the red sequence relation for bright ellipticals and that the environment-color relation is at least qualitatively similar to that of high surface brightness galaxies. Both of these results challenge some of the models proposed for UDG evolution.


INTRODUCTION
This is the third paper in a series presenting results from our ongoing search for low surface brightness, physically large galaxies. The previous papers, Zaritsky et al. (2019, hereafter, Paper I) and Zaritsky et al. (2021, hereafter Paper II), presented both the scientific motivation and description of our methodology. The principal difference between the earlier papers and the current one is that we have progressed beyond developing and demonstrating how we identify and measure these galaxies and now produce a large catalog. We present 5598 candidate ultra-diffuse galaxies (hereafter UDG candidates, with criteria of µ 0,g > 24 mag arcsec −2 and r e > 5.3 arcsec) covering the southern portion of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys (hereafter referred to as the Legacy Survey; Dey et al. 2019), defined as the portion of the Legacy Survey that uses DECam (Flaugher et al. 2015) images obtained with the Blanco 4m telescope. We refer to these sources as UDG candidates because we do not yet have the dis-arXiv:2205.02193v1 [astro-ph.GA] 4 May 2022 tance measurements needed to determine their physical size. However, we describe below a method by which we do obtain distance estimates for 1079 of the candidates and determine that 514 of those do indeed have r e > 1.5 kpc, thereby confirming their nature as UDGs. Further arguments suggest that an even greater majority of the candidates presented in the catalog are likely to be UDGs. We refer the reader to Papers I and II for a description of our scientific interest in UDGs and historical context, but stress here that the observational definition of UDGs is merely a way of selecting objects that lie on the tails of physical parameter distributions, rather than defining a physically distinct class of galaxy. Nevertheless, such extreme objects have the potential to challenge galaxy evolution models that are tuned to reproduce more typical galaxies.
To illustrate the science potential of this new catalog, we revisit a few of the results described in Paper II based on the much smaller sample available then and we also present results using our new UDG subsample with estimated redshifts. Detailed and complete investigations of the aspects explored here will be provided in future papers and, we hope, by other investigators who exploit this catalog and the upcoming northern extension. We describe the data and reprise the outline of our processing methodology in §2 and §3. Some additional details and the catalog are presented in §4. We describe our method for distance estimation in §5 and a few preliminary results in §6. We use the concordance ΛCDM cosmology when converting to physical units (WMAP9; Hinshaw et al. 2013) and magnitudes are on the AB system (Oke 1964;Oke & Gunn 1983).

THE DATA
We report the results of our analysis of all images obtained with DECam (Flaugher et al. 2015) at the CTIO 4m, or Blanco telescope, that are included in Data Release 9 (DR9) of the DESI Legacy Imaging Surveys . Briefly, this survey was initiated to provide targets for the DESI survey (DESI Collaboration et al. 2016). In addition to the images from DECam (referred to as DECaLS), the complete Legacy Survey incorporates observations obtained with an upgraded MOSAIC camera (Dey et al. 2016) at the KPNO 4m, or Mayall telescope, (MzLS, Mayall z-band Legacy Survey) and the 90Prime camera (Williams et al. 2004) at the Steward Observatory 2.3m, or Bok, telescope (BASS, Beijing-Arizona Sky Survey; Zou et al. 2017) to provide deep three-band (g = 24.7, r = 23.9, and z = 23.0 AB mag, 5σ point-source limits) images. The Legacy Survey encompasses about 14,000 deg 2 of sky visible from the northern hemisphere between declinations approxi-mately bounded by −18 • and +84 • . The DECaLS covers about 9,000 deg 2 and provides the vast majority of observations obtained between −18 • and +32 • . DR9 is augmented with deep DES (The Dark Energy Survey Collaboration 2005) DECam observations 1 obtained to southern declinations of −68 • (Figure 1) of an additional ∼ 6000 deg 2 .
Here we present an analysis of only the DECam data. Extending SMUDGes to the full Legacy Survey footprint will require some adjustment and re-certification of the pipeline for MzLS and BASS data. We will present that portion of our catalog in the next data release paper.
3. PROCESSING All processing and analyses are performed on the Puma cluster at the University of Arizona High Performance Computing center 2 . The compute nodes on this machine currently contain 94 usable CPUs, 512 GB of RAM, and 1640 GB of solid state storage with half of the storage guaranteed to be available during processing. All files and observational data used in this study are publicly available at the Legacy Survey 3 or the NSF's NOIRLab 4 website. We limit observations to those contained in the file survey-ccds-decam-dr9.kd.fits.gz which is included in the Legacy Survey's DR9. This file has information for each CCD image used in the data release and excludes those considered inadequate for further processing. We also make use of the magnitude zero points and image full widths at half maximum (FWHMs) contained in this file which are generated for each CCD by the Legacy Survey's pipeline. The footprint ( Figure 1) contains 80,986 observations with acquisition dates ranging from 31 August 2013 to 7 March 2019 and includes 4,991,222 individual CCD images considered adequate for processing. In compressed form these observations require more than 25 TB of storage which far exceeds the capacity of the processing nodes. Because transferring files between the processing nodes and main storage drives is inefficient, we limit the number of observations that are simultaneously processed to 1,000, which allows all intermediate files needed for processing to be kept on the processing node. This is accomplished by dividing the observation footprint into individual tiles with sizes determined by the number of included observations rather than area. Observation centers extend from about −67.4 • to 34.8 • in Declination and we divide this into 10 equal stripes. We overlap Figure 1. Footprint of the Legacy Survey DR9 observations used in this study in Right Ascension and Declination. In the top three panels, shading denotes the observation density for each band as shown in the top color bar. The bottom panel shows regions with coverage available in all three filters. Footprints for our previous work in Coma (Paper I) and SDSS Stripe 82 (Paper II) are shown in green and red, respectively. The Galactic plane is traced by the orange curve. adjacent stripes by 1.2 • in Declination to account for the 2.2 • DECam field of view. Each stripe is then divided into individual tiles in Right Ascension with an objective of maximizing the number of observations within our imposed limit of 1,000. As with Declination, tiles overlap 1.2 • in Right Ascension to account for the field of view. This process results in 107 tiles which are individually processed.
Other than adaptations made because of the much larger footprint, our processing pipeline is essentially unchanged from that described in Paper II. The major steps involved in creating our catalog include: 1) image processing to create a list of potential UDGs; 2) screening for cirrus contamination which can create false positives; 3) automated classification of remaining candidates; 4) modeling completeness, biases, and uncertainties using simulated sources; and 5) creating the catalog. Each of these steps and prior modifications are described in detail in Paper I and Paper II and here we only briefly summarize them. We also describe further modifications that were implemented because of the expanded footprint. Unless otherwise stated, total numbers provided below and in Table 1 include all tiles. Duplicate UDG candidates from overlapping regions are removed prior to our machine classification of the candidates.

Image processing
Our image processing pipeline for identifying potential UDG candidates consists of the following steps. Rationales for each of these steps are provided in Paper II and are not repeated here.
1. We obtain calibrated images and data quality masks that have been processed with the DECam Community Pipeline (Valdes et al. 2014) from the NOIRLab website. Because of tile overlaps, a total of 99,316 observations consisting of 5,897,921 individual CCD images are reprocessed by our pipeline.
2. We use the data quality masks to assist in identifying and removing CCD artifacts and wings of saturated stars. 3. We model and subtract sources on CCDs that are 2 mag arcsec −2 brighter than a specified threshold in each band (24.0 for g, 23.6 for r, and 23.0 for z), thereby removing objects that are clearly too bright to qualify as UDG candidates.
4. We isolate candidates of different angular scales using wavelet transforms with tailored filters. This results in a total of 453,373,301 detections, or an average of ∼77/CCD, the vast majority of which will not be classified as UDG candidates after further screening. 5. We only retain candidates with at least two coincident detections (defined as lying within 2 of their mean centroid) among different exposures, regardless of which wavelength filter was used in the detection to limit spurious detections. Each group created in this manner is considered to be a unique candidate located at the mean centroid position. This requirement rejects all but 267,952,256 wavelet detections which comprise 82,439,516 groups.
6. To limit the number of detections requiring timeconsuming coaddition and GALFIT (Peng et al. 2002) modeling, we use the LEASTSQ function from the Python SciPy library (Jones et al. 2001) to obtain much faster, rough parameter estimates. We fit an exponential Sérsic model (n=1) to each candidate on a CCD and require that they meet parameter thresholds of r e > 4 and µ 0 values of greater than 23.0, 22.0 and 21.5 mag arcsec −2 for g, r, and z, respectively. These criteria are relaxed relative to those required after our final GALFIT modeling (Step 9) to avoid inadvertently rejecting valid candidates. A total of 11,452,256 detections representing 8,003,489 distinct groups meet these criteria. However, the majority of groups have only a single surviving member with only 1,853,319 having more than one.
7. For further verification, we require that candidates pass the preliminary screening described in the prior step on at least 20% of the available observations or a minimum of two observations for those with less than ten. A total of 752,798 meet this threshold.
8. We perform an initial GALFIT screening of stacked cutouts using a fixed Sérsic index of n = 1, without incorporating the point spread function (PSF) into the model. We again use generous thresholds compared to those of Step 9 and accept candidates with r e > 4 and µ 0,g > 22.95 mag arcsec −2 or µ 0,z > 21.95 mag arcsec −2 if there is no available measurement of µ 0,g . A total of 191,933 candidates survive this stage.
9. We obtain our final GALFIT results using a variable Sérsic index with an estimate of the PSF incorporated into the model. Estimates of morphological parameters (r e , b/a, n, and θ) are derived from a stacked image using all three filters. These morphological parameters are then held fixed when estimating photometric properties. Because our machine learning classifier (Section 3.4) uses information from all three filters, we require that a candidate have at least one observation in each filter. Images from a band are used even if the object was not initially detected on it. The 67,902 candidates that pass our final criteria of r e ≥ 5.3 , µ 0,g ≥ 24 mag arcsec −2 (or µ 0,z ≥ 23 mag arcsec −2 if GALFIT failed to model g), b/a ≥ 0.37, and n < 2 form the population used for further screening as described below.

Screening of Spurious Sources Caused by Cirrus
Our work on Stripe 82 (Paper II) showed that large regions of the survey footprint are contaminated by Galactic cirrus that can result in detections that are sometimes difficult to differentiate from legitimate UDG candidates. To address this challenge, we developed a screening process for probable cirrus contamination that makes use of τ 353 (Planck Collaboration et al. 2014) and WISE 12 µm (Meisner & Finkbeiner 2014) dust maps. In particular, we extract single point values from each dust map located at the coordinates of a candidate and reject those with values exceeding 0.05 for τ 353 or 0.1 MJy/sr for WISE 12 µm. With 43.4% of the Stripe 82 footprint from Paper II exceeding these thresholds, dust contamination was a major factor in determining completeness in that region. In contrast, only 1.7% of the Coma region studied in Paper I exceeds these thresholds, demonstrating the large variations found within the DECaLS footprint. This conclusion is visually confirmed in Figure 2, which shows that ∼31% of the entire DECaLS footprint exceeds our thresholds, primarily in regions immediately adjacent to the Galactic plane. Using our criteria, we reject 54,029 candidates (∼80%) that survived our image processing pipeline as potential false detections caused by dust, leaving 13,873. The fraction of candidates rejected far exceeds the fraction of the DE-CaLS footprint that fails our dust criteria, emphasizing the problem with false positive detections caused by cirrus contamination.

Screening of Duplicates
Before classifying the remaining candidates, we eliminate duplicate entries that result primarily from our defined overlapping tiles. We consider any candidates lying within 10 of each other to to be duplicate detections. While this theoretically could cause the loss of some closely spaced UDGs, we visually inspected all 75 cases of duplicates lying between 5 and 10 and found none containing bona fide separate candidates. The rejected systems consisted either of cirrus that was not rejected by our dust criteria, tidal material, or very large candidates that the processing broke up and identified as separate sources. We discuss our incompleteness in very large sources in §3.6.2.
We follow a two step protocol for handling duplicates. We select whichever of the multiple sources has the greater number of observations and reject the remainder. If two or more have the maximum number of observations, we create a new entry using the median values of all parameters. Finally, because our machine learning classifier uses information from all three filters, we require that a candidate have at least one observation in each of filter, leaving us with 10,892 sources.

Automated Classification
Details on our approach for computer classification are described in detail in the appendix of Paper I with modifications addressed in Paper II. Briefly, we found that our best results are obtained with the TensorFlow Keras version of the convolutional neural network, Effi-cientNetB1 (Tan & Le 2020) trained on 224 × 224 pixel (∼ 59 × 59 ) cutouts downloaded from the Legacy Survey. Using this protocol we achieved an overall accuracy of 96.2% (513/533) on a test set with 8 false positives (specificity of 96.5%) and 12 false negative classifications (sensitivity of 96.1%). As described in Paper II, training and test sets were derived from both our Coma and Stripe 82 data with depth distributions that should approximate those of the current footprint shown in Figure  1. Therefore, we make no changes for the current study and use the prior trained network resulting in 5,860 candidates classified as potential UDGs. As mentioned in Paper II we find occasional sources structurally simi-lar to other candidates but significantly redder than the Coma cluster red sequence. Although some of these may be objects of interest in their own right (e.g., high redshift Ly α nebulae), we conclude based on visual inspection that these are unlikely to be UDGs and so reject those with g − r colors >1.0 mag. A total of 100 candidates fail to meet this threshold, leaving 5,760 catalog entries. Applying the completeness corrections described further below, this number of cataloged candidates corresponds to 15830 candidates covering the ∼ 15000 deg 2 survey footprint, for a surface density of ∼ 1 candidate per deg 2 .

Visual Confirmation
In Paper II we concluded from visual examination that about 2.6% (8/306) of the candidates identified as potential UDGs by our automated classifier in a test set were false positives. To minimize the effects of false positives in our current catalog, objects classified as candidate UDGs were visually reviewed by two authors (DZ and RD) in three steps. To estimate consistency, both reviewers initially classified the same random 10% (576) of the sample. Of these, 10 (1.7%) are classified as false positives by both reviewers with disagreements on another 10. Because UDGs may randomly fall in close proximity to an unrelated normal galaxy along the line of sight, we find that most disagreements involve differentiating candidates from tidal material or spiral arms. Although without additional information (distance measurements, higher resolution images, etc.) it is sometimes impossible to be certain, we attempt to add some objectivity to this decision: for a candidate to be assigned to another galaxy, we require that it have either a clear bridge to the purported parent or appear to be part of a shell surrounding the parent. We then split the remaining candidates into 2 groups of 2,592 each which either RD or DZ visually evaluated. This step is intended to catch those with obviously incorrect designation. Of the 5,184 candidates inspected by either RD or DZ, a total of 79 (1.5%) were classified as false positives and 231 (4.5%) were flagged for further evaluation. The false positives include two large candidates with duplicate detections whose centers are separated by greater than the 10 distance required for our duplicate screening ( §3.3). The final step consisted of evaluating these 231, together with the 10 disagreements from the first set, in more detail by both reviewers. A total of 57 are labeled as false positives by both reviewers and disagreements remain on 16. Because we want to minimize the number of false or ambiguous detections in our catalog, we consider any disagreements to be false positives resulting in 162/5,760 (2.8%) being labeled as such. This fraction is nearly identical to that presented in Paper II suggesting that our earlier training and test sets are appropriate for the current data.

Estimating completeness, biases, and uncertainties using simulated UDGs
In Paper II we planted simulated UDGs at random locations to estimate uncertainties and recovery completeness. Simulated sources were placed at an average surface density of 2000 deg −2 (about 100 per CCD) using Sérsic profiles with random structural and photometric properties. We process the sources separately with the same pipeline as for our real sources, including the automated classification. We estimated completeness and both structural and photometric uncertainties using polynomial models created with the Polynomi-alFeatures function from the Python Scikit-learn library (Pedregosa et al. 2011a) and a four layer neural network implemented with Keras (Chollet, F. and Keras Team 2015). Details about our protocol for developing the models and selection of their parameters are given in Paper II and, other than modifications and information needed for understanding results, will not be repeated here.
We initially used uniform distributions for all parameters but found that the number of faint simulations surviving our pipeline was inadequate for robust statistics and, therefore, augmented the initial run with two more using normal distributions with a mean of µ 0,g = 26.4 mag arcsec −2 and a standard deviation of σ = 1.3 mag arcsec −2 . We now incorporate these into a single run consisting of 1/3 with a flat distribution and 2/3 with normal distributions. Because the full DECaLS footprint is so much larger than that of Stripe 82, we decrease the simulation density to 600 deg −2 or about 30 per CCD. In another departure from Paper II, we now account for tile overlaps by assigning both simulated sources and pipeline survivors to a single tile with borders defined by the midpoints of the overlapping regions. Points lying outside of this region are ignored. This is necessary because tile size is determined by a maximum number of observations and regions with higher observation densities have smaller tiles with a relatively larger fraction overlapping, resulting in a bias towards such regions. Furthermore, because we require our science candidates to be imaged at least once in all three filters, we only include the 7,090,079 simulations meeting this same criterion (see Section 3.3). As described in Paper II, in order to obtain adequate statistics our methodology prevents us from modeling the full range of our simulations. Because we want our models to include the expected ranges of our science candidates, we use ex-panded criteria for simulations (23.5 < µ 0,g < 27.5 mag arcsec −2 , 3.5 < r e < 20 , b/a > 0.25, and 0.1 < n < 2) with 1,142,617 meeting these thresholds.
As with our real candidates, we use cutouts downloaded from the Legacy Survey and centered on the detection when applying our automated classification. Because real sources may occasionally be incorrectly associated with a simulated one, we make two passes through our automated classification network. We initially evaluate cutouts before placing our simulations on them. Any that pass our classifier at this step cannot be simulations and are rejected from further consideration. We then place the simulations on these same cutouts and reclassify them. Using these criteria, a total of 956,604 of the original simulations are classified as candidates and are used for estimating uncertainties and completeness.
As noted in Paper II, our ability to estimate uncertainties and completeness is limited by the differences between our smooth Sérsic models and real UDGs, some of which may have complex morphologies. Nonetheless, they do provide baseline estimates of these parameters which may be augmented in the future by comparing our results to other surveys using different observational and analytic strategies.

Uncertainties
We define the parameter error for a given simulated source as the difference between the final GALFIT value and the value used when creating the simulated source (GALFIT − input). Errors are generally asymmetric and we define the bias as the median difference and the "1σ" confidence limits as the 15.1 and 84.9 percentiles of the distribution for a set of similar simulated objects.
Polynomial models, especially of high order, may extrapolate very poorly for data points lying outside of the fitted range. Other than for r e , which has an upper limit of 20 , our simulation range for individual parameters extends beyond those expected for our science targets. However, the models do not include all possible combinations in the 4-D parameter space, which is limited by the number of simulations surviving our entire pipeline. Therefore, candidates with parameters that are within individual simulation limits may still fall out of the full parameter space when combined. Our models for Stripe 82 used a 9 th degree polynomial which, except for a few candidates, gave us acceptable uncertainty estimates for parameters that fell out of the modeling range. Because of the much larger area of observation and a more diverse population, the range of GALFIT estimates for all parameters in the current study exceed those found in Stripe 82. This problem is particularly acute for r e where GALFIT estimates may produce values several factors greater than our simulation maximum of 20 . We find that using a 9 th degree polynomial results in meaningless uncertainty estimates for a significant number of candidates with parameters lying outside of the fitted model. Because we use a much larger number of simulations in the current study, even low order polynomials fit our model data points better than the 9 th degree polynomial did for Stripe 82 and we select a 2 nd order polynomial for this study. As discussed in Paper II, there is negligible bias in our θ determinations and to avoid adding noise, we set all θ biases to zero in the catalog.

Completeness
Completeness is defined as the probability that a candidate with given structural and photometric parameters will be identified as such after passing through our entire pipeline, and is assessed using four modeled parameters (µ 0,g , r e , b/a, and n). We again use a 2 nd degree polynomial to fit the simulation results rather than the 3 rd degree used for Stripe 82. Bias corrections are applied to our catalog entries before estimating their completeness probabilities.
A limitation of our completeness analysis arises from the training sample used for automated classification. This was drawn from candidates in the Coma and Stripe 82 regions, which contained very few visually confirmed candidates with angular extents >30 and, therefore, the vast majority of "large" candidates were classified as artifacts or tidal material. To estimate the effect of this limitation, we visually examined all 57 candidates with GALFIT r e >25 in the tile containing the Virgo cluster prior to our automated classification. The Virgo cluster is only about one-sixth the distance to the Coma cluster and a candidate with our target minimum physical extent of 2.5 kpc at Coma would have an angular extent of ∼32 in Virgo. Therefore, all the galaxies that we would classify as UDGs in the Coma cluster would be susceptible to misclassification in the Virgo cluster. Of the 23/57 that we visually determine to be legitimate candidates, none (0/10) of those with r e >33 were correctly classified by our automated classifier. The recovery rate improves to 50% (3/6) for candidates with 30 < r e < 33 and to 71% (5/7) for those with 25 < r e < 30 . This incompleteness is unfortunate, but not highly problematic for the survey as a whole. The principal goal of SMUDGes is to explore the nature of large (r e > 2.5 kpc) UDGs across environments. Accurate distances are required to determine physical parameters, and those galaxies that have low recessional velocities are plagued by large distance uncertainties due to unknown peculiar velocities. Our calculated completeness estimates extend to r e = 20 . This angular size corresponds to 2.5 kpc at z ∼ 0.006 or cz = 1800 km sec −1 . This lower bound excludes Virgo (cz ∼ 1100 km sec −1 ), and even so is still somewhat of a low recessional velocity for an accurate (i.e., likely to be good to 10 to 20%) Hubble flow distance determination. As such, we conclude that our catalog best reflects the population of large (r e > 2.5 kpc) UDGs beyond cz ∼ 1800 km sec −1 . The upper bound on cz over which we expect to find large UDGs simply comes from determining when large UDGs begin to fall below our 5.3 arcsec angular size criterion, which occurs at cz ∼ 7000 km sec −1 . Nevertheless, the catalog also contains smaller UDGs, with r e between 1.5 and 2.5 kpc, and UDGs beyond cz ∼ 7000 km sec −1 .

THE CATALOG
To keep our simulation pipeline identical to our science pipeline, we include all 5,760 candidates in our catalog with those visually identified as false positives flagged. Although a relatively small fraction, these should be omitted from any conclusions drawn from our results that do not depend sensitively on the completeness fractions. Descriptions of the catalog entries are presented in Table 2 with the full catalog available in the electronic version of the Table. The parameter entries include their GALFIT estimates as well as their bias and confidence limits produced by our models. We additionally flag any entry where we had to extrapolate the fitted model beyond the range of the constraints. Even when using lower order polynomials, we find that uncertainty and completeness estimates may be unacceptable when a candidate has an estimated r e > 30 and, therefore, we flag these by setting these estimates to −99.9999 for candidates with r e values exceeding this limit. Users are encouraged to apply the bias values by subtracting those presented in the catalog from the corresponding uncorrected measurements when drawing conclusions from the data. However, uncertainties and completeness estimates for flagged entries are suspect and should be used with caution ( §3.6). Note that this limitation results in severe incompleteness in physically large UDGs in nearby clusters such as Virgo.
Parameters are corrected for bias before their completeness values are estimated. Completeness estimates may be suspect (completeness flag = 0) for either of two reasons. The parameters may be outside of the parameter space defined by our completeness model and these have flag = 1. Alternatively, the bias correction derived from the uncertainty model may be unreliable and these have flag = 2. In either case, the results should be used with caution.
Photometric parameters are not corrected for extinction, but extinction values are included in the table for those who wish to use them. Our extinction estimates (A g , A r , A z ) are calculated using the SDSS g, r, and z Legacy Survey extinction coefficients 5 which differ slightly from those in Table 6 of Schlafly & Finkbeiner (2011). E(B-V) SF D is estimated using the dustmaps.py (Green 2018) SFD dust map based on the work of Schlegel et al. (1998).
Finally, we recognize that even in situations where both of our reviewers visually classify a candidate as a potential UDG, in a population of this size some are going to be ambiguous and other observers may think otherwise. This number should be small and not significantly affect any conclusions drawn from analyses of the entire unflagged data set. However, we obviously would recommend that images be reviewed for any studies drawing conclusions based on individual candidates, particularly if those are extreme in any way (e.g., largest, faintest, etc.).

Comparison to Previous Catalogs
In Paper II we presented a comparison to the Greco et al. (2018) and Tanoglidis et al. (2021) catalogs over the Stripe 82 region. Here, with our newly realized greater areal coverage, we expand that comparison and also now include the Prole et al. (2019) catalog. As we stressed in Paper II, catalogs tend to have different strengths and weaknesses so the comparison here is not meant to place any catalog above another, but rather to assess the robustness of the catalogs and highlight where the potential advantages of using the SMUDGes catalog might lie.
In Figure 4 we present the distribution of matched sources from the three catalogs and SMUDGes on the sky. The first clear difference is that SMUDGes covers a much larger area of sky than the Greco et al. (2018) and Prole et al. (2019) catalogs and provides more northern coverage than the Tanoglidis et al. (2021) catalog. In detail the various catalogs differ from SMUDGes in other ways as well. The Greco et al. (2018) catalog includes many more systems of small angular extent (only 23% of the galaxies satisfy the SMUDGes r e > 5.3 criterion) and a number of higher central surface brightness galaxies (only 67% satisfy the the SMUDGes µ 0,g > 24 criterion, assuming the global g − i color is representative of the central color). On the other hand, the Greco et al. (2018) catalog has significantly better representation at the faintest central surface brightnesses where  SMUDGes begins to become significantly incomplete. As such, the two samples, even within the overlapping regions of sky, are nearly distinct with only 62 sources in common. Similarly, the Prole et al. (2019) sample also is dominated by source of smaller angular extent than the SMUDGes criterion and only 57 objects are matched to the SMUDGes catalog. Of the 66 sources in that catalog with r e > 5.3 , 57 are matched, demonstrating that the mutual completeness is high when comparing similar populations and that the low number truly reflects the differences in the selection.
SMUDGes has the greatest overlap with the Tanoglidis et al. (2021) catalog, where we now identify 1261 objects in common. However, that catalog has broader selection criteria and includes many objects with brighter central surface brightnesses than SMUDGes, so care is still warranted before comparing results. Furthermore, as described in Paper II and confirmed here, SMUDGes is especially more sensitive for objects with µ 0,g > 25.5 mag arcsec −2 ( Figure 5).

DISTANCE BY ASSOCIATION
We approach the UDG candidate distance challenge in a similar way as previous investigators have. We identify those UDG candidates projected on overdense physical systems, as defined using galaxies with existing redshift measurements, and presume that the UDG candidates are members of that overdensity. This approach was first adopted by van Dokkum et al. (2015) for candidates projected on the Coma cluster and has been used widely (e.g., Mihos et al. 2015;Yagi et al. 2016;Trujillo et al. 2017;Román & Trujillo 2017; van der Burg  Tanoglidis et al. 2021). At least for those UDGs projected onto the Coma cluster, this approach has been verified to work for most candidates using spectroscopic follow-up (van Dokkum et al. 2016;Kadowaki et al. 2017Kadowaki et al. , 2021. The results are likely to become more questionable at lower overdensities and will depend on the degree of correlation between 'normal' galaxies and UDG candidates. Rather than applying this method only in highly overdense systems, e.g., the Coma or Virgo clusters, we aim to expand the environments in which we can provide estimated redshifts. We define the overdensities ourselves rather than depending on existing group or cluster catalogs. Our sample of 'normal' galaxies with measured redshifts comes from the compilation provided by the on-line database SIMBAD (Wenger et al. 2000). For each galaxy of interest without a spectroscopic redshift, we draw galaxies from SIMBAD that are projected within 5 • and have 0.0017 < z < 0.05. The lower redshift limit, corresponding to cz ∼ 500 km sec −1 , is set to avoid confusion with Galactic sources and, in practice, also removes a few local galaxies from consideration. Although this selection may prevent us from identifying a local UDG, working in a regime where peculiar velocities are comparable to the expansion velocity renders any association and inferred distance highly suspect. Furthermore, our completeness for local UDGs is extremely low ( §3.6.2). The upper limit, z = 0.05, is guided by our understanding that UDGs larger than r e > 6 kpc are quite rare (van Dokkum et al. 2015;Yagi et al. 2016) and often spurious (Kadowaki et al. 2021).
To have roughly equal sensitivity to structures across the relevant volume, we limit ourselves to using bright galaxies, M B or M g < −19 mag. We include in our consideration only such galaxies in SIMBAD that lie at a projected physical distance less than r C,proj from the galaxy of interest, using the angular diameter distance at the redshift of the SIMBAD galaxy to evaluate r C,proj . We explored a range of values for r C,proj ranging from 0.5 to 2.5 Mpc and concluded that 1.5 Mpc provided a compromise between providing an estimated redshift for as many candidates as possible and including contaminating, unrelated overdensities in the line-of-sight galaxy distribution. In practice, the results were not highly sensitive to the choice of r C,proj within this range.
Using galaxies that satisfy all of these criteria surrounding each galaxy of interest, we then drop from consideration those that have 2 or fewer potential companion galaxies. We opted for a minimum of three associated galaxies because we want to measure the mean recessional velocity and its dispersion for the putative group. To help determine whether an association be-tween the UDG candidate and a structure along the line of sight can be plausibly made we divide the galaxies along the line of sight into 30 redshift bins between 0.0017 < z < 0.05 and identify the location of the peak in the number distribution. We fit a Gaussian to the redshift distribution of all the potential companions. We refer to the standard deviation of this distribution as σ ALL . Then we work our way to both lower and higher redshift bins from the peak of the distribution until there are zero galaxies in a bin. We fit a second Gaussian to this trimmed z distribution and refer to it as σ T RIM . The ratio between these two measurements is the measure of the complexity of the line of sight galaxy distribution that we will use to accept or reject an estimated redshift.
We explore this and other potential criteria we might use to help us discriminate between correct and incorrect estimated redshifts using two different sets of galaxies. First, we estimate redshifts for galaxies with spectroscopic redshifts drawn at random from SIMBAD to hone the method and evaluate the resulting estimated redshifts. The advantage of this sample is that it can be quite large, limited only by the number of local galaxies in the SIMBAD database. The disadvantage is that these galaxies do not match the properties of UDG candidates and so may provide somewhat skewed results. Second, we estimate redshifts for a sample of SMUDGes sources with previously-obtained spectroscopic redshifts. We compile this sample by searching for cataloged sources projected within 6 of our candidates and with −300 < cz/km sec −1 < 30000 in NED, SIMBAD, and SDSS. The association is fairly secure because the projection of a low redshift source that is sufficiently bright to have been previously targeted for spectroscopy would have most likely prevented us from detecting the UDG candidate and we have visually examined each SMUDGes candidate during our visual classification process. We complement this list with the 68 redshifts from our own set of compiled spectroscopic measurements from a variety of sources (Kadowaki et al. 2021). The principal disadvantage of the spectroscopic UDG sample is its size, with only 187 sources, although it too can be somewhat non-representative given the heterogeneous selection of spectroscopic targets across different studies.
Using both of these samples, we now aim to understand under what conditions the estimated redshifts are most likely to be accurate. In Figure 6 we show three example of redshift estimation failures using SMUDGes sources with existing spectroscopic redshifts. In the first two cases we show how a difference between σ ALL and σ T RIM reflects additional structure along the line Figure 6. Example of failure cases for three candidate UDGs. Each panel shows the line-of-sight galaxy distribution, as selected for using the description in the text. The dashed blue vertical line shows the calculated mean redshift of the main concentration of galaxies identified along the line of sight and the solid vertical line the spectroscopic redshift of the UDG candidate. The blue histograms highlight the galaxies in the main peak, as defined in the text. The blue dashed curve shows the Gaussian fit to these galaxies, while the red dashed curve shows the fit to all of the galaxies along each line of sight. The left panel shows an example where the estimated and spectroscopic redshifts agree well, but the Gaussian fit to all of the galaxies is much broader, suggesting the possibility for confusion due to other galaxies along the line of sight. Unfortunately, we would reject this estimated redshift even though it is indeed correct. The middle panel shows another case that is likely to be rejected because of the different widths of the fitted Gaussians, but here we see that the estimated redshift is an inaccurate estimate of the spectroscopic redshift. Finally, in the right panel we show a case where the fit is accepted because the full distribution and main peak are identical, but the UDG lies at a redshift with no cataloged galaxies. This is an example of a catastrophic redshift estimation failure that is impervious to our choice of criteria.
of sight, including one case (middle panel) where that structure leads to a catastrophic redshift estimate. In the left panel, we see that we unfortunately reject a case where the estimated redshift was correct due to a complex line of sight. In the right panel of the Figure, we show a case where σ ALL and σ T RIM agree, but the estimated redshift is far from the spectroscopic one. Here we have either identified a SMUDGe that is not associated with other galaxies, we are victims of incomplete redshift catalogs, or the spectroscopic redshift is incorrect. The last option may appear unlikely, but with such low surface brightness objects the spectra are often of very low S/N (e.g., Kadowaki et al. 2017). Assuming that the spectroscopic redshift is correct, then this example demonstrates why this approach will never yield 100% accurate redshift estimates, although it will improve significantly with the increased sampling eventually provided by DESI.
Using randomly selected normal galaxies and defining an accurate estimated redshift as one that is within 3σ of the spectroscopic redshift, we experimented with different thresholds of σ ALL /σ T RIM . We adopt σ as measured for the associated group unless σ < 100 km sec −1 , in which case we set σ = 100 km sec −1 for this purpose. For a sample of 10000 randomly selected galaxies with spectroscopic redshifts, we were able to find overdensities to associate them with in 4021 cases. However, the estimated redshift was accurate in only 49% of those.
Restricting the sample to those with unambiguous lines of sight, done by requiring σ ALL /σ T RIM < 1.5, increases the accuracy to 72%. This cut results in a sample size of 734, from the original sample of 4021 with estimated redshifts.
To see if we can realize any further improvement in the redshift accuracy, we then applied the Scikit-learn random forest classifier (Pedregosa et al. 2011b) with σ ALL , σ T RIM , the number of galaxies in the associated peak, the fraction of galaxies along the line of sight in that associated peak, and the estimated redshift as the relevant features. We find an improvement in the accuracy to 92%, while now having estimated redshifts for only 556 galaxies. The combined procedure therefore yields estimated redshifts for ∼ 6% of this sample, although with reasonably high accuracy (estimated to be ∼ 90%). Bypassing the initial cut on σ ALL /σ T RIM and relying only on the random forest classifier results in lower accuracy.
We now proceed to determine if this accuracy is reproducible with the actual UDG candidates. Of the 187 candidates with spectroscopic redshifts, we are able to associate overdensities along the line of sight for 130. Of these, only 57 are considered as reliable based on the σ ALL /σ T RIM < 1.5 criterion. Applying the further random forest classifier results in a final sample of 52 candidates with estimated redshifts and the resulting accuracy is 76%. We find a far larger return rate of redshift estimates, but a lower accuracy percentage in comparison to the random galaxy sample. We attribute both of these aspects to a feature of this sample that can be seen in the middle panel of Figure 7. A large fraction of the UDG candidate sample with spectroscopic redshifts lie either in the Virgo or Coma cluster regions. This is simply the result of spectroscopic campaigns preferentially targeting those areas (cf. Kadowaki et al. 2017;Chilingarian et al. 2019), but it does lead to both a higher redshift return rate, because candidates lie in a region of sky with overdensities, and a lower accuracy because any candidate projected onto the Virgo or Coma clusters will have the corresponding redshift estimate whether or not it lies in either cluster.
Finally, we also test our method using a distinct sample of low surface brightness galaxies, an HI-bearing sample of ultra-diffuse galaxies (HUDs; Leisman et al. 2017;Janowiecki et al. 2019). These are likely to be the most isolated subset of UDGs and, as such, pose the greatest challenge to our redshift estimation technique. Because of this relative isolation, the redshift yield (8%) is lower than that for the SMUDGes spectroscopic set (28%), but it is nevertheless similar to that of the random sample (7%), while the accuracy (74%; see §5.1) is comparable to that of the SMUDGes spectroscopic set (76%; see §5.1). We conclude that across the range of prospective UDG properties and environments, our redshift accuracy is likely to be ∼ 75% when the method yields a redshift estimate.
Applying the method and criteria to the full sample of available UDG candidates from SMUDGes (5598) we are able to estimate redshifts for 1079. The return rate is 19% and, as expected, lies between the return rates for the random galaxy and spectroscopically-confirmed UDG samples. In the end, the random forest classifier removes about 10% of the estimated redshifts in the SMUDGes training sample, the HUD sample, and the full SMUDGes catalog. Thus, it provides at best a modest improvement given the estimated 25% contamination rate, but it does highlight a potential way forward once the training samples are larger.

Understanding the Redshift Estimates
Figure 7 is our starting point for the discussion of the properties of the estimated redshifts. As illustrated by the left and middle panels, the method appears to produce large fractional errors for galaxies at low cz. For the SMUDGe data this is mostly due to Virgo members, which have a large intrinsic velocity scatter and are at relatively low cz, leading to large fractional errors. This is related to the generic problem of using recessional velocities, which include peculiar velocities, to estimate distances for nearby objects. We exclude objects with an estimated cz < 1800 km sec −1 in our discussion of physical properties ( §6.2). That cut is shown in the shaded region of the upper set of panels in Figure 7 and is constructed primarily to exclude very nearby groups and clusters. Other than this one set of outliers, the method appears to be roughly equally capable across the relevant redshift range. The method, after this cut, yields accurate redshifts for 76% and 74% of the SMUDGes and HUD samples, respectively. Note that even the majority of the Virgo candidates are likely to have been assigned the correct distance, as they are probably truly in Virgo, but given the large angular extent of Virgo there are also a significant number of accidental projections and we opt to exclude the entirety of these low cz objects from our final sample.
In Figure 8 we examine the estimated redshift quality vs. environment. As might be expected and appears evident in the randomized galaxy sample (upper panel), the estimated redshift accuracy and precision are both lower for overdensities with a small number of associated galaxies, N . While the same may be true for the SMUDGes and HUDs samples, it is not clearly the case, perhaps due to small number statistics. Because it is of scientific interest to explore the distribution of UDGs in low N environments, we choose to not require larger N values than we currently do (i.e., N ≥ 3). However, we do recommend additional caution when using the estimated redshifts for systems associated with N ≤ 6 environments. Of our full sample of estimated redshifts, only about 20% are associated with environments of N ≤ 6.
The potential increased uncertainty for systems associated with low N suggests that we examine how the systems with recovered cz are distributed in N . We see in Figure 9 that we are much more successful at recovering systems in high density environments. This is not unexpected because one would expect to assign systems projected near a rich system to that system with relatively little confusion, e.g., the Coma cluster, and have a more difficult time assigning candidates to those projected near poor systems. As such, the sample of UDGs with estimated redshifts is not representative of the overall sample in terms of the numbers of UDGs per environment, but there is no evidence (yet) that it is not representative in terms of other UDG properties. The recovery fraction as a function of N is consistent between our SMUDGes spectroscopic sample and the full sample (solid line in middle panel of Figure 9). We find little variation in the recovery fraction with spectroscopic redshift, except for the features introduced by the Virgo and Coma clusters in the SMUDGes spectroscopic sample (Figure 10). Figure 7. Comparison of spectroscopic and estimated redshifts. Darker symbols represent those objects for which the estimated redshift is within 10% or 3σ of the spectroscopic redshift, while the lighter symbols for those where it is not. In the left panel we show the results for galaxies drawn randomly from the SIMBAD database, in middle panel for UDGs with spectroscopic redshifts, and in the right panel for a sample of HUDs with spectroscopic redshfits. Shaded regions show exclusion zones based on estimated redshifts that eliminate where cz is too low to yield a reliable distance. Finally we examine the estimated redshift quality as a function of a derived parameter, physical r e . We have used the physical value of r e partly as a prior in our methodology by not allowing redshift solutions that result in what we considered to be unphysically large values of r e (i.e., r e ≥ 6 kpc). We examine the resulting distribution for the two UDG samples in Figure 11. We find that the recovered redshift behaves well over the range of inferred sizes that are consistent with a UDG classification (i.e., r e > 1.5 kpc). We find that no further selection on r e would improve the estimated redshifts.
We conclude that while the estimated redshifts do not provide redshifts for a representative subset of the UDG candidates in terms of the numbers across environment, in all other respects they behave as needed to provide a set of redshifts that are likely to be correct in ∼ 75% of the cases and have no apparent biases in terms of redshift or environment. For cases that match the criteria that we will use to select the sample of "redshiftconfirmed" UDGs (σ ALL /σ T RIM < 1.5, r e > 1.5 kpc, cz > 1800 km sec −1 ), the fractional statistical precision on cz for the ∼ 75% of the sample with accurate redshifts is 0.09, 0.12, and 0.11 as calculated from the random, SMUDGes, and HUD samples, respectively. Of course, for the 25% of the sample with catastrophic redshift estimates the errors turn out to be much larger. Because of this hybrid error distribution it is impossible to propagate errors through a frequentist analysis, although one could forward model these uncertainties for a given model and compare to the data.

RESULTS
In Figure 12 we present a basic overview of the parameter distribution of the UDG candidates in the catalog by showing the distributions of r e , µ 0,g , and m g . The Figure 9. The fraction of galaxies with recovered estimated redshifts as a function of the environment they are associated with by our technique. As expected, we tend to accept the redshift association more often in richer systems. Also as anticipated, the HUD sample avoids rich systems. The recovered redshifts for the full SMUDGes sample (solid line in middle panel) follows the trend established with the spectroscopic sample, showing no clear bias in the environments that are sampled relative to the training sets. Shaded regions represent 1σ confidence bounds estimated using Poisson errors.
concentrations of candidates toward smaller, higher surface brightness objects are evident. The surface brightness limits which are vertical in the right panel of the Figure, imposed by definition at the bright end and by the nature of the data at the faint end, lead to effective diagonal bounds in the left panel.
We now describe some preliminary results from an inspection of both the full catalog and the subsample with estimated redshifts. In both cases, we do not consider objects that were visually rejected by at least one reviewer and those which lie in regions of parameter space that are less than 25% complete. The latter criterion is applied to avoid being misled by a few objects with large, and uncertain, correction factors.
We will discuss cases where loosening this criterion makes a qualitative difference to the interpretation. These are all cursory demonstration cases for the catalog and more complete analysis and discussion will follow elsewhere.
6.1. Distance-Independent Results Figure 10. The fraction of galaxies with recovered estimated redshifts as a function of the spectroscopic redshift. We find little dependence of the recovery fraction on redshift across the relevant redshift range for the random and HUD samples. The SMUDGes sample shows strong fluctuations but these map the influence of the Virgo and Coma clusters on this particular data set. Shaded regions represent 1σ confidence bounds estimated using Poisson errors. Figure 11. The estimated redshift errors as a function of inferred re in physical units. Points coded as in Figure  7. Shaded region shows exclusion region for UDGs. Catastrophic errors are scattered along re and scatter among successful redshift estimates is independent of re.
With the much larger sample of candidates in hand we now revisit the results we presented in Paper II. First, we had found no significant decline in the number of UDG candidates as a function of µ 0,g to the limit of our survey (∼ 26.5 mag arcsec −2 ). However, in Figure 13 we now see a significant decline. If we remove from consideration the two faintest bins, as well as the few sources where the measurement bias correction results in µ 0,g < 24, Figure 12. Distribution of basic observational parameters for the cataloged UDG candidate sample. A small number of candidates have µ0,g < 24 mag arcsec −2 once the measurement is bias corrected. Figure 13. Surface brightness distribution of sources in the current SMUDGes catalog. The thin green bars represent the raw distribution, while the broader, darker bars represent the completeness corrected distribution. Errorbars represent Poisson noise, multiplied by completeness fraction. The line is a least squares fit to all bins with > 1000 counts. A small number of candidates have µ0,g < 24 mag arcsec −2 once the measurement is bias corrected.
a linear fit to the distribution, as shown in the Figure, results in a slope that is 7σ away from zero. One concern is that the uncertainties are simply Poisson and do not account for uncertainties in the completeness corrections. Statistically, the uncertainties in the corrections are small as we use many simulated sources to recover the corrections. However, there are possible systematic uncertainties if the nature of the sources differs significantly from what is assumed in those simulations. We have no reason to suspect this is the case, particularly at µ 0,g ∼ 26 mag arcsec −2 where we have many candidates and the sources are well modeled with the same Sérsic models as we use for brighter sources. However, fainter than this we have both fewer sources and the fitting The lower panel shows the distribution of blue candidates (g − r < 0.45) in the blue histogram and red candidates (g − r > 0.55) in the red histogram. Bluer candidates tend slightly toward smaller values of n, while those with the largest n values are red. This large n-red wing of the distribution is sparsely populated and highly incomplete, therefore less certain than the blue-small n wing. The sample is dominated by sources with n ∼ 0.8 and g − r ∼ 0.6 mag.
uncertainties become larger, allowing, perhaps for a different kind of low surface brightness object, for example one that is highly elongated that evades our detection algorithm. On the basis of the observed distribution for µ 0,g < 26 mag arcsec −2 and our fit to that distribution, we conclude that there is a decline in the number of candidates as a function of central surface brightness over the range of surface brightness we explore.
Second, we had found that bluer candidates have smaller Sérsic n. In Figure 14 we confirm that those candidates with the smallest values of n do appear bluer than the bulk of the population and that those with the largest values of n do appear redder. A KS test comparing the n values of blue (g − r < 0.45) and red (g − r > 0.55) candidates finds that there is less than a 10 −5 chance that they are drawn from the same parent distribution. The bulk of the population has n ∼ 0.8 across all colors, with differences only at the extremes. Because this difference is only at the extremes, it requires a large sample to see it at all. This statistical difference may explain why previous studies (Greco et al. 2018;Tanoglidis et al. 2021) did not identify this possi- In the lower panel we compare the distribution of colors for brighter surface brightness (24 < µ0,g/mag arcsec −2 < 24.5) in gray and the somewhat lower surface brightness (25 < µ0,g/mag arcsec −2 < 25.5) candidates in green. Blue UDG candidates are mostly confined to the higher central surface brightnesses. The blue cloud nearly disappears by µ0,g ∼ 25 mag arcsec −2 . The red sequence of objects has a slight tilt that we will return to when discussing the colormagnitude relation in §6.2.2. A small number of candidates have µ0,g < 24 mag arcsec −2 once the measurement is bias corrected. ble behavior or, as we described in §4.1, this difference may simply reflect that fact that these galaxy samples are all somewhat differently selected.
Third we found that the bluest (g − r < 0.45 mag) candidates have µ 0,g ≤ 25 mag arcsec −2 . That result is more striking with the larger sample available now ( Figure 15) and we might even move the limit a bit brighter to 24.5 mag arcsec −2 . A similar trend has been noted before (e.g., Greco et al. 2018). We argued in Paper II that there is no identified low redshift, blue population that could fade to populate the red sequence below ∼26.5 mag arcsec −2 . This argument appears to remain valid and should be explored further to determine at what surface brightness limit we might expect to only find "primordial" UDGs. Examples of red and blue UDGs are provided in Figure 17.
Lastly, we had found that candidates with fainter µ 0,g tended to smaller n. We find that trend here too ( Figure  16), but the slope of the relationship is small and could be the result of selection biases that are not completely captured in our completeness correction procedure. Indeed we find that the bias observed in our recovered simulated sources is in the same sense as and of larger amplitude than that observed, suggesting that the observed one might be due to a slight undercorrection of the bias.

The Fraction of UDGs
A basic question underlying the SMUDGes survey is what fraction of our candidates satisfy the r e > 1.5 kpc criterion. Of the 1079 with redshift estimates, 679 (63%) satisfy r e > 1.5 kpc, although 165 of those have cz < 1800 km sec −2 so we exclude them, leaving us with 514 confirmed UDGs. However, this fractional return (48%) is likely to underestimate the return from the full catalog. The vast majority of candidates that failed the size criterion lie in two nearby clusters, principally Virgo and Fornax (see upper panel of Figure 18). Excluding sources with an estimated cz < 1800 km sec −1 , which effectively excludes these two clusters, of the remaining 563 sources 514 (91%) satisfy the r e > 1.5 kpc criterion. These results are encouraging and indicate that most SMUDGes candidates are likely to be UDGs. For the following discussion, we present results only for the 514 candidates that satisfy this physical size criterion and cz > 1800 km sec −1 , the latter cut imposed to avoid objects with large distance uncertainties. We refer to these as our UDG sample.
In Figure 18, we present the distribution of the UDGs in our sample. Although some of the known clusters are well represented (compare to Figure 3), many UDGs lie Figure 17. Examples of UDGs in the blue (0.2 < g − r < 0.45) and red (0.55 < g − r < 0.7) higher central surface (µ0,g < 24.5) populations. Blue galaxies in the upper two rows, red in the lower two. Some blue UDGs show more internal structure and irregularities. Some red UDGs show what appear to be nuclear star clusters.
outside the clusters. This distribution suggests that our approach for estimating redshifts is able to extend the technique beyond the strongest overdensities and that the sample of UDGs with redshifts may not be grossly distorted from a general one.
Next, we support our claim that the bulk of the candidates are more likely to be UDGs than not, by comparing the properties of the candidates as a whole to those of the candidates that are estimated to have r e > 1.5 kpc and those that have r e < 1.5 kpc in Figure 19. The distribution of the UDGs, those with r e > 1.5 kpc, is a closer match to the overall parameter distribution than that of the non-UDGs. The only systematic deviations in the comparison of the UDGs and the overall sample properties are at large angular size, which reflects the over-representation of the nearby clusters in the sample with estimated redshifts, and the slightly redder colors, which is also related to the over-representation of UDGs in denser environments (see §5.1). We conclude that we have no evident reason to suspect that the redshift distribution of those candidates without redshift estimates is grossly different than those for which we obtained estimated redshifts, and therefore that a large number of the remaining candidates will also eventually be spectroscopically confirmed as UDGs.

Color-Magnitude Relation
In Figure 15 we see a prominent red population of UDG candidates. With the estimated redshifts, we can now place the confirmed UDGs on the color magnitude diagram and determine whether they lie on the wellestablished red sequence of galaxies. We present the results of this exercise in Figure 20. The red sequence is evident and falls closely to the extrapolated red sequence defined using the colors of normal, low luminosity (L < L * ) elliptical galaxies (Schombert 2016). The density ridge of galaxies is nearly indistinguishable from the extrapolated relation. Note that the fractional redshift uncertainties ( §5.1), for the dominant fraction of UDGs with accurate redshift estimates, results in absolute magnitudes errors that are smaller than the bin size (0.3 mag) in Figure 15.
The presence of UDGs on the extension of the colormagnitude relation suggests that UDGs fall on the stellar mass-metallicity relation (see also Barbosa et al. (2020)). It also argues against certain formation scenarios for the majority of UDGs. For example, tidal dwarfs Figure 18. The distribution of candidates that are estimated to have re < 1.5 kpc (upper panel) and those that have re > 1.5 kpc and also have cz > 1800 km sec −1 (lower panel) so that the estimated distances are more reliable. Ellipses highlight positions of two known nearby (cz < 1800 km sec −1 ) clusters (Virgo, upper left; Fornax, lower right). Outside of nearby clusters, we do not find that a large fraction of our candidates are likely to have re < 1.5 kpc, suggesting that most of our candidates (outside of nearby clusters) are likely to be UDGs (see text for numbers). (Hunsberger et al. 1996;Duc 2012;Bennet et al. 2018) would be expected to lie off this relation because their stars would come from a more massive parent galaxy. Even though stars from the outskirts of such a massive galaxy would be lower than the characteristic chemical abundance because of metallicity gradients (Searle 1971), it is unlikely that the metallicity of those stars would be on average a match to the extrapolation of the color-magnitude relation.
These observations can also provide a constraint on an alternative UDG formation model that posits that UDGs are galaxies that have lost a majority of their initial stellar mass (e.g., Conselice 2018). Galaxies with significant stellar mass loss would be expected to have a metallicity above that implied by their current luminosity. This argument is made more clearly in the bottom panel of Figure 20 where we compare the color distribution of UDGs with −15.5 < M r < −15 to the position of the extrapolated red sequence (dotted line) and the corresponding mean color of galaxies on that red sequence that lost either 50 or 90% of their initial stellar mass to match the current M r . Although the scenario where 50% of the mass is lost is marginally consistent with the observed color distribution, given uncertainties in the extrapolation of the red sequence and photometric calibration, models with larger mass loss become increasingly inconsistent with the observations. We close this discussion by noting that DF 44, one of the best studied UDGs and a close analog in total mass to the Large Magellanic Cloud (LMC) with a mass of ∼ 10 11 M (van Dokkum et al. 2019;Erkal et al. 2019), has an I-band mass-to-light ratio of 26 +7 −6 in solar units (van Dokkum et al. 2019) compared to the LMC's of 4.6 (Kadowaki et al. 2021)). If an LMC-like galaxy was the progenitor of DF 44, then it lost ∼ 80% of its stellar mass.

Color-Environment Relation
We quantify the environment of each UDG using the standard deviation of the fitted Gaussian to the associated peak of normal galaxies along the line of sight. This measurement is an estimate of the velocity dispersion of the environment. In Figure 21 we plot color, g−r as a function of environmental velocity dispersion. Red, quenched UDGs are found in all environments, while blue, star-forming ones are more highly represented in the lower velocity dispersion environments. This is highlighted by the rolling mean (solid line) which bends at σ ∼ 400 km sec −1 . The same qualitative behavior is evident when we categorize environment by the number of normal galaxies found in the associated redshift peak. As seen in previous studies (Greco et al. 2018;Tanoglidis et al. 2021;Kadowaki et al. 2021), UDG properties track environment, confirming that models of UDGs in isolation will not fully describe them.

SUMMARY
This paper principally presents a catalog of 5598 ultradiffuse galaxy (UDG) candidates distributed throughout the southern fraction of the DESI Legacy Imaging Surveys , defined as the portion of the survey that used the Blanco 4m telescope and DECam (Flaugher et al. 2015). By focusing on those candidates that are large on the sky (r e >5.3 ) we aim to limit the number of spurious sources and provide those for which the measured structural and photometric parameters are most accurately determined. The catalog, therefore, is most complete for physically large (r e > 2.5 kpc) UDGs lying in the approximate redshift range 1800 cz/km s −1 7000, the lower bound defined by where peculiar velocity uncertainties do not significantly affect the distance estimates and completeness is high and the upper bound by the limits introduced by our angular selection criterion. Figure 19. The comparison of observed properties of the overall sample in grey, the sample of candidates for which we were able to estimate a redshift and the inferred size confirms the candidate as a UDG (re > 1.5 kpc) in green (upper panels), and candidates with an estimated redshift that are rejected as UDGs (re < 1.5 kpc) in red (lower panels). All distributions are normalized to sum to unity for comparison.
Because the definition of UDG incorporates a physical size criterion, we proceed to develop a methodology for estimating distances based on an extension of the way the original UDG samples were constructed, using distance-by-association. In distance-by-association those UDG candidates projected near evident overdensities (e.g., the Coma cluster; van Dokkum et al. 2015) are placed at the distance of the overdensity. We extend the method to lower amplitude, uncatalogued overdensities and obtain preliminary estimated redshifts for 1050 candidates. We find that the redshifts are accurate for between 70 and 90% of the sources. Of those with estimated redshifts, 514 satisfy the r e ≥ 1.5 kpc criterion for UDGs and lie at cz > 1800 km sec −1 . We consider these preliminary redshift estimates because as the sample of UDG candidates with spectroscopic redshifts increases we will be able to train the method further and refine the sample of estimated redshifts that we consider to be accurate.
Finally, we present a sampling of results drawn from the catalog to illustrate its uses. We present results that are distance independent and dependent. In the former we revisit results from Paper II, and in the latter we establish that the red sequence of UDGs follows closely the extrapolation of the red sequence relation for bright ellipticals and that the environment-color relation is at least qualitatively similar to that of high surface brightness galaxies. Both of these results challenge some of the In the left panel, we plot UDG color, g − r, vs. the line of sight velocity dispersion of the associated overdensity (σ) for each UDG with an estimated redshift. In the right panel, we plot color vs. the number of galaxies with known redshift in the associated overdensity for each UDG with an estimated redshift. In each panel, the solid line is the rolling mean of 75 galaxies. The lightly shaded area shows the dispersion among measurements in each rolling bin, while the more heavily shaded area shows the dispersion in the mean. models proposed for UDG evolution, and more detailed examination of the catalog will surely provide further constraints. Modelers now have additional empirical results to target. DZ, RD, JK, and HZ acknowledge financial support from NSF AST-1713841 and AST-2006785. AD's research is supported by NOIRLab, the John Simon Guggenheim Foundation, and the Institute of Theory and Computation at the Harvard-Smithsonian Center for Astrophysics. KS acknowledges funding from the Natural Sciences and Engineering Research Council of Canada (NSERC). An allocation of computer time from the UA Research Computing High Performance Computing (HPC) at the University of Arizona and the prompt assistance of the associated computer support group is gratefully acknowledged.