Photometric Completeness Modelled with Neural Networks

In almost any study involving optical/near-infrared photometry, understanding the completeness of detection and recovery is an essential part of the work. The recovery fraction is, in general, a function of several variables including magnitude, color, background sky noise, and crowding. We explore how completeness can be modeled, with the use of artificial-star tests, in a way that includes all of these parameters simultaneously within a neural network (NN) framework. The method is able to manage common issues including asymmetric completeness functions and the bilinear dependence of the detection limit on color index. We test the method with two sample Hubble Space Telescope data sets: the first involves photometry of the star cluster population around the giant Perseus galaxy NGC 1275, and the second involves the halo-star population in the nearby elliptical galaxy NGC 3377. The NN-based method achieves a classification accuracy of > 94% and produces results entirely consistent with more traditional techniques for determining completeness. Additional advantages of the method are that none of the issues arising from the binning of the data are present and that a recovery probability can be assigned to every individual star in real photometry. Our data, models, and code (called COINTOSS) can be found online on Zenodo at the following link: https://doi.org/10.5281/zenodo.8306488.


INTRODUCTION
In photometric studies it is often necessary to understand the completeness of the measurements, as a function of magnitude, crowding, or other properties of the images.Completeness, defined in rough terms as the fraction f of detected objects relative to the true number present in the image, can be evaluated quantitatively through artificial-star tests (ASTs).The ASTs yield other key characteristics of the data including the internal measurement uncertainties in magnitude and Corresponding author: William E. Harris harris@physics.mcmaster.caposition, systematic measurement bias, and "limiting magnitude", which is normally taken to be the faintness level where the completeness has dropped to some fiducial level such as 50% or 80%.
At the basis of the completeness issue is an extremely simple question: in the first stage of the photometry, the image is searched for any objects brighter than a well defined detection threshold.At that stage, any object i is either detected (d i = 1) or it is not (d i = 0).That is, the detection process is a binary response function.After the detection stage, various downstream selection cuts can then be applied to determine whether or not the objects are of high enough quality (e.g., have a signal-tonoise ratio above a specified threshold) to be included in a final photometric catalog, with objects that pass (q i = 1) being included and objects that do not (q i = 0) being excluded.In the final catalog of objects, we define those with y i = q i × d i = 1 to be recovered.Throughout the remainder of the following discussion, we will try to disambiguate between these notions of detection, which only involve whether or not an object was identified at some stage of a photometric pipeline, versus recovery, which measures whether or not an object made it into the final catalog after additional culling steps were applied.
Stating the problem this way raises the possibility that completeness may be suited for modelling by logistic regression (LR)1 .Additional evidence for this view comes from the dependence of the recovery fraction f (m) on magnitude m, which typically behaves approximately as (Harris et al. 2016) where m 0 is the completeness limit or "limiting magnitude" where f (m) drops to 50%, and α represents the steepness of decline of the curve around m 0 .This empirical interpolation formula is formally identical with the sigmoid function used in LR to describe the probability of recovery.For example, Rosolowsky et al. (2021) use LR to model completeness for recovering extragalactic GMCs (Giant Molecular Clouds) in CO emission from PHANGS-ALMA imaging data.In their work, the recovery probability is correlated with cloud mass, surface intensity, and cloud virial parameters, all of which can be put into a single LR function.
Completeness can therefore be viewed from two different statistical perspectives.For an ensemble of stars, the ratio f (m) represents the fraction of objects present that are successfully recovered in a bin of width dm at magnitude m.For a single star, we can instead think of this curve as representing p(m), the probability of its recovery.While the distinction between these two perspectives may appear somewhat moot if both can be expressed by the same equation (Eadie et al. 2022), it is important to recognize that methods such as LR are ultimately trying to estimate p(m), not f (m), and that stellar recovery or non-recovery can ultimately be estimated (and used) on a star-by-star basis rather than a population average.In particular, we can explicitly relate the two by interpreting f (m) as the long-run expec-tation of recovery y ∈ {0, 1} at a given m with recovery probability p(m) over many data realizations n. 2As will be seen below, however, LR in the simplest form of Eq. 1 may turn out not to be a good match to real data in important ways.The main causes for this shortcoming are biases in photometric measurements and two-band detection requirements, both of which lead to asymmetric and rapidly changing recovery criteria across the color-magnitude diagram that are difficult to model with simple LR methods.These aspects of the problem are discussed in more detail below.As a result, we were led to explore more complex extensions of LR that can reproduce the more complex behaviors seen in the data.We ultimately find that neural networks deliver excellent performance at predicting aggregate f (m) values, along with individual recovery rates y i , and are well motivated by the core concepts behind LR.
The aim of the present paper is to develop new formalism for photometric completeness in the optical/NIR regime, where the measurements of concern may be physically analogous to quantities at other wavelengths (luminosity, surface brightness, crowding/confusion etc.) but are parametrized very differently.
The discussion is organized as follows: In Section 2, some of the background of completeness studies in the literature is summarized.Sections 3 and 4 present the photometry and artificial-star tests for two sets of data (imaging fields targeting the galaxies NGC 1275 and NGC 3377) that will be used as testbeds for development of the methodology.Section 5 describes the underlying parameters that we consider in our modelling.In Section 6, we outline the basic LR model, highlight its shortcomings, and describe its extension into a neural network framework.In Section 7, we demonstrate the excellent performance of the neural network models tested on the artificial star data and its application to the real data.We also discuss some of the potential benefits and shortcomings of our approach.Section 8 finishes with a quick summary.The code for the NN models, along with the data used to train them, are provided online on Zenodo at https: //doi.org/10.5281/zenodo.8306488.

SOME BACKGROUND
There is now a vast literature of photometric studies in which completeness measurements are derived, and the present discussion can hope to mention only a few examples.Soon after the advent of CCD detectors for digital imaging, the procedures that became standard methodology for determining completeness were developed (e.g.Lupton & Gunn 1986;Stetson 1987Stetson , 1991;;Stetson & Harris 1988;Drukier et al. 1988;Mateo 1988;Bolte 1989;Vallenari et al. 1994).Many of these studies particularly involved stellar populations in globular clusters, where a number of classic issues had to be confronted including highly nonuniform crowding levels and luminosity functions that rise steeply toward fainter magnitudes.
Before proceeding, the terms for completeness should be stated more precisely.In a given magnitude bin (m, m + dm), completeness f as determined from the ASTs is normally defined as the number n rec of recovered artificial stars measured in that bin, divided by the number n in of input stars originally inserted into it.This definition implicitly includes the effect of binjumping (i.e.whenever a star inserted into a particular bin is detected and measured, but its measured magnitude is different enough to move it to a neighboring bin).
In addition, f can be plotted versus either input magnitude or measured magnitude (see Stetson 1991).If the mean measurement bias δ = ⟨m rec − m in ⟩ (i.e. the difference between the recovered and input magnitudes) is small, these two versions of f (m rec ) ≈ f (m in ) will be similar.However, for the real stars, the only avail-able information consists of their measured magnitudes m rec .In the following discussion, we will therefore drop the m rec subscript and the conventions used here will be to define f (m i ) as the number of measured stars found in the ith bin (regardless of their input magnitudes) relative to the number inserted into that bin, and to plot f as a function of measured (= recovered) magnitude.
ASTs are the central tool for measuring completeness, and series of tests built on millions of artificial stars are now commonly found in the literature.The default first-order approach to running ASTs is to generate an artificial-star population randomly and uniformly distributed in magnitude, color, or spatial location, and for many situations this approach may be quite adequate.However, several studies place emphasis on generating distributions of artificial stars that mimic the luminosity function of the real objects in the image, or their distribution across the color-magnitude diagram (e.g.Stetson & Harris 1988;Depoy et al. 1993;Bolte 1994;Aparicio & Gallart 1995;Grillmair et al. 1996;Olsen et al. 2003;Harris et al. 2006;Hansen et al. 2007;Monachesi et al. 2011;Alamo-Martínez et al. 2013;Williams et al. 2013;Harris et al. 2016;Cohen et al. 2020).In other studies where the degree of crowding changes across the field, an analogous emphasis is put on matching the spatial distribution of the real objects (e.g.Mateo 1988;Bolte 1989Bolte , 1994;;Gallart et al. 1996;Radburn-Smith et al. 2011).
Aside from magnitude, which of the other features of an image are important depends very much on the particular situation.To mention only a few of many possible examples, for stellar photometry studies in nearby galaxies (e.g.Monelli et al. 2010;Williams et al. 2014;Cohen et al. 2020), much effort is concentrated on the effects of crowding.By contrast, in the ACS Virgo Cluster Survey of globular clusters (GCs) around the Virgo galaxies, Jordán et al. (2007) determine completeness fraction versus GC magnitude, sky background, and object morphology (the cluster scale size r h ), and then interpolate in the resulting 3D lookup table to find the recovery probability for any real GC in the list.Adamo et al. (2017), for the LEGUS program on young star clusters in nearby galaxies, construct similar prescriptions for completeness versus object scale size.Peng et al. (2011), in their study of GC populations in the Coma cluster, derive the completeness at any spatial location in the survey by blanketing the area with artificialstar populations, implicitly accounting for magnitude, color, and background brightness.Cases involving severe crowding have generated a large previous literature by themselves (e.g.Stephens et al. 2001;Olsen et al. 2003Olsen et al. , 2006;;Williams et al. 2014;Cohen et al. 2020;  Schlafly et al. 2019;Saydjari et al. 2023, to mention only a few).In the extreme, photometry built on deconvolution methods has shown improvements in depth and precision over more normal PSF-fitting photometric codes (e.g.Lauer et al. 2012;Dong et al. 2018).
In still other studies, a significant concern is the presence of a strong gradient of local sky brightness across the field (quite independently of crowding), for example in the measurement of star cluster populations around giant galaxies (see, e.g.Whitmore et al. 2010;Faifer et al. 2011;Escudero et al. 2015Escudero et al. , 2018;;Caso et al. 2017Caso et al. , 2019;;Ennis et al. 2020, for recent examples).Here, the issue has traditionally been dealt with by deriving completeness curves versus magnitude in distinct radial zones centered on the galaxy.This approach implicitly but roughly accounts for the radial gradient of sky background noise, which decreases with increasing galactocentric distance.For photometry within star clusters or in nearby galaxies, completeness can be a highly localized function of location, since areas near very bright stars are strongly compromised (e.g.Bedin et al. 2008;Williams et al. 2014).
In almost all of these previous studies, the completeness is estimated from ASTs binned in magnitude.The resulting trend may either be numerically interpolated across bins, or else fitted with some smooth analytic function.Examples of such functions, all of which have basically similar shapes, can be found in Fleming et al. (1995), Puzia et al. (1999), Barker et al. (2004), Alamo-Martínez et al. (2013), Harris et al. (2016), andHarris &Reina-Campos (2023).
For any binned data, however, bin-jumping may need careful attention.Alamo-Martínez et al. (2013) demonstrate especially clearly that in cases where the real objects follow a luminosity function (LF) that rises steeply toward fainter magnitudes, and where the measurement uncertainty is also inevitably rising with fainter magnitude, asymmetric bin-jumping can occur (Eddington bias) along the downturn of the completeness curve.Saydjari et al. (2023) also discuss this effect, which can even produce the interesting anomaly of a completeness level f larger than 1.0 in bins just brighter than the downturn region.As these discussions emphasize, this effect places importance on matching the LF of the inserted artificial stars to the real stars, in order to get correct relative numbers of bin-jumping stars at each magnitude level.
As a basis for developing a new model for photometric completeness, we draw from two different sets of optical/NIR photometry from the Hubble Space Telescope (HST) that are deliberately chosen to bring into play all the parameters described above.These are laid out in the next sections.
NGC 1275 is the giant and extremely active early-type galaxy (ETG) at the center of the Perseus cluster (Harris et al. 2020).As is the case for all giant ETGs, NGC 1275 has a rich population of many thousands of classic old GCs spread throughout its halo (Carlson et al. 1998;Penny et al. 2012;Harris & Mulholland 2017;Harris 2023).But unlike most such systems, its inner ∼ 20 kpc also has an extensive, massive network of gas and dust filaments that hosts a large population of young stars and star clusters (Canning et al. 2010(Canning et al. , 2014;;Lim et al. 2020).At the Perseus distance of 75 Mpc the star clusters are unresolved and appear near-starlike in structure even with the spatial resolution of HST (Harris et al. 2020).This last feature fortunately allows stellar photometry codes like DAOPHOT (Stetson 1987) or DOLPHOT (Dolphin 2016) to be used very effectively (cf. the references cited above).

HST Photometry
The photometry of the NGC 1275 cluster system is taken from the raw data described in Harris et al. (2020) and the field is shown in Figure 1.These HST images include exposures in the filters (F475W, F814W) from the ACS/WFC camera, one orbit per filter and three sub-exposures per orbit, with total exposure times of 2436 sec in F475W and 2325 sec in F814W.Previous photometry of the system is reported in Harris (2023) but the measurements were redone for the present study through the photometric packages in DOLPHOT.The reference image used to detect objects for the photometry was adopted as the deep *.drc F814W image from the MAST Archive, where the three individual exposures were combined through AstroDrizzle.
DOLPHOT is a versatile code especially tuned for HST imaging data, and detailed discussions of its many input parameters can be found in, e.g., Dalcanton et al. (2009), Monelli et al. (2010), Radburn-Smith et al. (2011), Williams et al. (2014), andCohen et al. (2020).A number of the parameters are important particularly for photometry in extremely crowded fields.However, as will be seen below, the star cluster population in and around NGC 1275 is not strongly crowded across most of the field and so the choices are less critical.3Objects were detected on the original reference image, without removal of the overall light profile of the galaxy.
From the raw DOLPHOT output measurements, any objects not detected on both filters were rejected, as were objects with signal-to-noise ratio SNR < 4, or ones that were non-starlike in morphology.The latter culling step was done with the DOLPHOT chi and sharp parameters as illustrated in Figure 2.These figures show the distribution of detected objects before any culling is done.Objects lying outside the boundary lines shown in the figure are mostly faint background galaxies that are visibly nonstellar, extended or asymmetric, and thus with excessively negative sharp values; these were rejected.At fainter magnitudes the sharp values for stars inevitably spread more, so the rejection boundaries were drawn to reflect the observed scatter.The boundary lines were set fairly conservatively because many of the GCs are very slightly more extended than the pure PSF and thus have small negative sharp values and χ > 1; see Harris et al. (2020) for a detailed discussion and measurement of cluster sizes.The brighter objects in the final culled list of objects are marked in Fig. 1, and their distribution in the color-magnitude diagram (CMD) is shown in Figure 3.
Classic old GCs are found in large numbers in all giant galaxies studied to date (e.g.Larsen et al. 2001;Peng et al. 2006;Harris 2009;Harris et al. 2016Harris et al. , 2017;;Harris 2023;Hartman et al. 2023).Their range of colors is primarily driven by GC metallicity, and the color distribution function (CDF) usually can be seen to have distinguishable 'blue' and 'red' vertical sequences in the CMD that correspond to mean metallicity values near [Fe/H] ≃ −1.5 (blue) and −0.5 (red).In Fig. 3 these sequences are centered near the colors F475W − F814W ≃ 1.6 and 2.0.In giant galaxies particularly, these two subgroups each have broad enough internal ranges of metallicity that the sequences overlap in color (see Larsen et al. 2001;Peng et al. 2006;Harris 2023;Hartman et al. 2023, for numerous examples and statistical tests).However, for NGC 1275 an additional feature not seen in most other giant ETGs is the presence of many young blue clusters (color indices ≃ 0.6 − 1.2 in Fig. 3), which are found in the inner halo and are associated with the gas and dust filaments as mentioned above.These clusters increase the total range of colors in the entire sample of objects, which is a useful feature of the data for our testing purposes.

Artificial-Star Tests
To provide the raw data for evaluating completeness, two sets of artificial stars following different input luminosity functions (LFs) were used.The first set consisted of 2.6 × 10 5 artificial stars that randomly and uniformly covered the magnitude and color ranges 23 < F814W < 30, 0.0 < F475W − F814W < 3.0 (i.e. a flat LF).The second set consisted of 2.0 × 10 5 stars  following an exponentially rising LF n(m) ∼ e am .To generate the latter over an adopted magnitude interval (m 0 , m 1 ), first n numbers {q i } were produced following a random uniform distribution from 0 to 1. Then the magnitudes {m i } given by the transformation are distributed exponentially as desired.For this case we use a = 0.7, representing quite a steep rise.
Both input LFs are shown in Figure 4.In each case the fake stars were inserted one-by-one into the images, and then remeasured through DOLPHOT with exactly the same culling criteria as described above.The fake stars were also spatially distributed following a centrally concentrated radial profile that generated roughly the same numbers of stars within radial annuli of the same width.
In Section 2, emphasis was placed on the need to design ASTs that should match the LF of the target objects.The two input LFs were chosen to roughly bracket the expected LF of the real data.For the NGC 1275 GC population, the LF should not necessarily follow rising numbers towards fainter magnitude, because the real data have a completeness limit near the classic turnover of the GC LF at F814W ∼ 27 (Harris et al. 2020), so at fainter magnitudes the numbers of GCs will actually decrease past the completeness limit.However, this decline will be offset by the numbers of faint background contaminants, which rise at fainter magnitudes.
In Figure 5, the CMDs of both the input and recovered measurements are shown for the rising LF.Near a magnitude of F814W ≃ 28, a transition zone is apparent through which the numbers of objects that were successfully recovered gradually die away either because they are too faint to be detected in one filter or the other, or else detected but then rejected on the basis of low SNR, high chi, or high sharp values.The AST data were binned in steps of 0.2 mag, and the completeness fraction f (m) in each bin was plotted versus magnitude.We illustrate the results using the F475W magnitude (though either magnitude could be used equally well).The curves for four different radial zones around the center of the galaxy, and for both input LFs, are shown in Figure 6.For the innermost zone (r < 20 ′′ = 400 px), even moderately bright stars are incompletely recovered because of the very bright and complex background light from the central bulge of the galaxy.Outside this zone, however, the background surface brightness of the galaxy falls away rapidly, and the completeness curves converge more closely together.As seen in the Figure, the completeness curve results are similar for both input LFs.The stronger effect is the change from one radial zone to the next, pointing to the desirability of a general solution depending on local sky brightness or sky noise that will not impose radial binning.
The NGC 1275 data have suitably wide ranges in color and local sky noise, but they do not enable strong tests of the crowding parameter.A useful example of a single field that displays a larger range of crowding is the HST ACS deep imaging of the halo of NGC 3377.This galaxy is an intermediate-sized ETG in the Leo group at d = 11 Mpc, at high Galactic latitude (b = 58 o ) and low foreground extinction (A V = 0.09).The raw HST images comprise total exposures of 38500 sec in F606W and 22260 sec in F814W and were originally used to measure the metallicity distribution of the resolved red-giant-branch (RGB) stars in the halo (Harris et al. 2007).Since NGC 3377 is not a giant galaxy, its scale size is small enough (R ef f ∼ 2 kpc ≃ 38 ′′ ) that the degree of crowding changes strongly across the ACS field of view moving outward from the galaxy center.A 3-kpc portion of the field is shown in Figure 7 where the gradient of surface brightness is clearly visible (see also Figures 1 and 2 of Harris et al. 2007).
Our new DOLPHOT reductions for the NGC 3377 field are shown in Figure 8.At all galactocentric radii the CMD of the recovered objects is completely dominated by many thousands of halo RGB (red-giant-branch) stars with negligible field contamination.In all radial zones, the sharp lower boundary of the CMD is due to the SNR limit imposed in the selection stage.
To set up the data for the subsequent modelling, a set of 5×10 5 artificial stars distributed uniformly across the field was used, covering the magnitude range F475W = 23 to 36 and color range F606W − F814W = −1 to 5, and following an exponentially rising luminosity function with a = 0.7 (Eq.( 2)) to match a normal old RGB LF.
An important difference between the NGC 3377 and NGC 1275 datasets is that for NGC 3377, the RGB star population itself is the dominant source of the local sky noise.Separate parameters for sky noise and crowding are thus not needed and we use only the crowding parameter to represent both.The specific way that crowding is defined is described in the next section.

MODEL PARAMETERS
Recasting the completeness problem in terms of recovery probability p in a (generalized) linear model has two major advantages: any issues connected with binning of the data are bypassed, and as will be seen below, all the physical variables can be treated simultaneously within a single solution.The next step is to define more specifically the variables {x j } determining recovery probability that will actually be used.We will consider the following potential variables: 1.The first variable x 1 , and usually the most important one, is magnitude.For the two sample databases described above, in principle the magnitude in either of the two filters defining the CMD could be used equally well.For NGC 1275 we use the F475W magnitude, while for NGC 3377 we use F814W.
3. The third variable x 3 represents local sky brightness or background noise, which in turn determines the detection threshold.This parameter can be quantified in several ways and is discussed further below.
4. The fourth variable x 4 represents crowding.This parameter can also be defined in different ways and is discussed further below.
5. There is a potential fifth variable x 5 , which represents object morphology or scale size; essentially, objects that are more extended or of lower surface brightness are harder to detect than point sources at the same magnitude and color.This might be quantified in a simple way by (e.g.) the half-light radius or mean surface brightness of an object (e.g.Jordán et al. 2007).In the present study, however, we specifically measure only starlike, unresolved objects, so x 5 is not used.It is worth noting as well that we are working with HST imaging data where the stellar PSF is closely similar for all the images for a given filter in the same observing program, unless extremely different epochs are involved (not the case here).For ground-based imaging, the PSF will often differ between images, and thus even for stellar photometry the parameter x 5 will need to be introduced.In this case it could be quantified as the FWHM of the PSF.

Background noise
The parameter x 3 requires some additional discussion.For the NGC 1275 field and in many previous studies of this type (cf. the references cited earlier), the radius r from the center of the galaxy may be used essentially as a proxy for local sky brightness and thus for the noise σ sky .These quantities may be well correlated (see Figure 9), though not always by a simple scaling.More importantly, defining x 3 in terms of r does not account for morphological asymmetry or irregularities in the central galaxy, or for large satellite galaxies in the field (see Fig. 1) that generate their own local enhancements of surface brightness.Extreme cases where σ sky differs strongly with location but in a complex way, such as photometry of stars on a nebular background or within a star cluster (cf.Bedin et al. 2008), should also be manageable in a general procedure.Defining x 3 in terms of local sky noise goes directly to the essential quantity that determines the detection threshold, and bypasses any of these geometric or location-specific limitations.Fortunately, the relevant measurements can easily be done with standard photometry codes.For the NGC 1275 field, after some experimental trials, we adopted x 3 = ln σ sky , where σ sky is the standard deviation of the sky noise immediately around each object.4Specifically, the deep F814W reference image was used to directly measure the local value of σ sky within an annulus of 4 to 9 px (2.0 to 4.5 times the stellar FWHM) at the location of each object.To minimize any effects of crowding, the value of σ sky was taken to be the mode of the rms scatter of pixel values within the annulus after up to 10 iterations of 3σ clipping.

Crowding
An intuitively clear definition of crowding is simply the local number density of objects brighter than a given magnitude (Williams et al. 2014).Specifically, for the NGC 1275 field the number of real objects with F475W ≤ 27.0 and within r < 60 px = 3 ′′ of each object is counted (including the central object itself), and their density expressed as number per resolution element.We take one resolution element to be a circle of radius equal to the PSF hwhm or 0.05 ′′ = 1 px, so the 60-px circle contains 3600 resolution elements.It should be noted that any star can be crowded not just by other stars but also by any other type of object such as small faint background galaxies, so all detected objects within the 60-px circle brighter than the adopted threshold are counted.King et al. (1968) or Hogg (2001) describe crowding in terms of the fractional area covered by nearby objects, i.e. the inverse of their number density.A problem with either definition is that they will be affected by the photometric completeness itself, which is the very quantity we are trying to measure.Also, at very high crowding levels the number density will enter a nonlinear regime where objects overlap each other.
For the NGC 3377 field where local crowding can reach more extreme levels, an alternate definition is adopted: we define x 4 in terms of the total measured light within a specified annulus around each star.This definition implicitly includes all types of objects and is immune to incompleteness in number counts and overlapping stars.It assumes only that the LF of the RGB stars will be consistent from place to place in the NGC 3377 halo.
In Figure 10, the crowding x 4 as defined this way is plotted versus projected galactocentric radius R gc .Specifically, x 4 equals the light in an annulus from 4 to 40 px (0.2 ′′ to 2 ′′ ) around each star, as measured by the phot module in DAOPHOT, directly from  .38 shown by the magenta line, and is equivalent to the surface brightness profile of the galaxy.The 'spikes' of points extending upward from the main locus represent objects near small satellite or background galaxies, or bright stars, where the local background light is enhanced.The sparse dusting of points below the main locus shows objects near the CCD detector edges, the F814W reference image.For convenience x 4 is expressed in units of surface brightness, x 4 = (SB F814W − SB 0 ) e − s −1 arcsec −2 where SB 0 is the background sky brightness as measured in the uncrowded corners of the image at large radius.
In brief, different ways to define a crowding variable are available but they may not all be equally effective.The best choice will be not just one easily measurable but one most suited to the needs of the individual situation.

DEFINING THE MODEL
With the testbed data in hand, the next stage is to develop a framework for modelling the recovery probability p as a function of the variables listed above.Based on our earlier definition, any star is said to be recovered by the photometry if it is first successfully detected (it lies above the detection threshold) but also then successfully passes any following culling tests like those described above (detected in both filters, more than the minimum SNR, starlike morphology, etc.).Though it will not be our final version, the LR model in its basic form provides a good starting point.If we have a vector of variables X = {x i } that correspond to the physical properties listed above5 , we can model the probability of recovery as a generalized linear model

Logistic Regression
where g(p) is the logit function and β = {β i } are a set of corresponding coefficients that must be solved for (and give the relative importance of each factor).The LR expression for the probability of recovery is then (Eadie et al. 2022) where g −1 (x) is the logistic function (i.e. the inverse of the logit).The simplest possible case would be stellar photometry in a single filter, in an uncrowded field, over a uni-form sky background.In that case the recovery probability reduces to p = 1 1 + e −(β0+β1x1) , (5) leaving only the zero-point β 0 and magnitude coefficient β 1 to be solved for.As noted previously, this simple sigmoid-curve version does provide an approximate match to observed f (m) completeness curves in situations where the other factors are not changing, as in Eq.
The simple LR model in Eq. ( 4) can be used as it stands with the complete set of variables {x i } to obtain approximate solutions that will give the recovery probability p i for each real object i as a function of its magnitude, color, local sky noise, and degree of crowding.However, there are three extra issues in the data that need to be addressed: 1. Sharp CMD boundaries: The first issue is already apparent in the CMDs of both Figs. 3 and 8, which is that the completeness limit (lower boundary to the CMD) is marked by two distinct segments.For bluer objects, the boundary is set by the detection limit of the redder filter, while for redder objects the boundary is set by the limit of the bluer filter.These two boundary lines of different slopes intersect at an intermediate color where both filters have similar limits.The transition between them is sharp.This pair of boundary lines typically appears in any dataset where the exposure times in the two filters are well matched to reach similar overall depths.However, this feature means that the LR color parameter x 2 is bilinear and will not be adequately matched by a single linear function or smooth polynomial across the entire observed range of colors.
2. Nonlinearity: More generally, the simple LR model defined above assumes a logistic function g that is a linear combination of all the parameters.Just as for the color term, the other terms may not be correctly modelled by such an assumption.

Asymmetric shape:
A third issue is illustrated in Figure 11, drawn from the NGC 1275 data.In the upper panel, the completeness in the outer zone (r > 60 ′′ ) is presented in two different forms.
One is f versus the measured, recovered magnitudes (black line) binned in 0.2−mag steps: here, as stated previously f is defined as the number of stars originally put into the given magnitude bin, divided into the number of stars that were recovered in that bin, regardless of which bin those measured stars were originally placed in.That is, bin-jumping is implicitly included.The result is that the downturn portion of the completeness curve is asymmetric and does not match a simple sigmoid-type curve like Eq. 4 very closely.
Note that the other curve in Figure 11 is f plotted versus the input magnitude m in (red line).Here, f is defined differently: it is the number of stars originally put into the given bin, divided into the number of those same stars that recovered, regardless of which bin they reappeared in.The plot of f versus input magnitude is much more nearly symmetric and matches a true sigmoid curve (dashed red line in Figure 11) very well.However, as stated above, for any real data the true magnitudes are unknown and only the measured magnitudes can be used.The effects of bin-jumping are explicitly shown in the lower panel of Figure 11: here, the fractional excess equals the number of recovered stars that entered the given bin from other bins minus the number of stars that were originally in the bin but were recovered in other bins, and then divided by the number of input stars for that bin (i.e. the fractional excess is the relative gain of each bin at the expense of other bins).The large spike near F475W ∼ 28.5 fairly near the 50% completeness level is due to many fainter stars near the completeness limit that are randomly scattered upward to brighter magnitudes and thus recovered.But other such stars that were scattered downward to fainter levels are mostly unrecovered and lost (Eddington bias).This effect distorts the shape of the f −curve and in particular shifts the estimated 50% completeness point (the limiting magnitude) fainter by a quarter of a magnitude or more.
In short, because of these effects, the LR model in its basic form will not account accurately for a more general variety of completeness curves.We show this explicitly in Figure 12, where we highlight the predicted recovery probability for NGC 3377 from the best-fit simple LR model versus the original artificial star tests, binned over the CMD.The results clearly show the key issues at play: simple LR is not able to capture the sharp CMD boundaries or the asymmetric steep dropoff, and instead smoothly averages across both features (highlighted in the red box).
These results motivated us to seek an extended model capable of handling different forms of the completeness curves that do not assume symmetry or other limiting features.

From Logistic Regression to Neural Networks
We will now extend our LR model to build up the foundation of an artificial neural network (NN).In its most basic form, NNs comprise a collection of neurons, each of which takes in a given set of inputs X, transforms them with a set of coefficients γ (often referred to as   to emphasize that the transformation from the input data X to the output value χ depends on both the weights and biases γ and the chosen activation function f (x).
In this single neuron case, if we specifically choose our activation function to be the logistic (i.e.sigmoid) function in ( 4) and input the same data used in Section 6.1, the result is exactly equivalent to performing LR over our data and the output χ will be equivalent to the recovery probability p.In other words, LR can be interpreted as a NN with a single neuron using a sigmoid activation function.
We can now use this intuition to extend LR with NNs.By combining j = 1, . . ., m individual neurons together, we can create a "fully-connected" layer that takes in a given input X and returns a new collection of m outputs via Single Layer : where each χ j is the output from each individual neuron indexed by j.
Unlike the single neuron case, we can no longer interpret these outputs as a simple probability, since we ultimately are trying to predict a single number , the recovery probability p, rather than multiple numbers.To turn this set of new outputs χ = {χ j } m j=1 into a final predicted probability p then, we therefore need to apply a "probability conversion function" f p (x).The input to this function will be a single number x = γ p χ that we will take to be a linear combination of the input χ values based on an additional set of coefficients γ p .The output of this function f p (x = γ p χ) ∈ [0, 1] will then need to be bounded between 0 and 1 so that it can be properly interpreted as a probability.Without loss of generality, we can consider this function to be the sigmoid function, making the connection with LR again explicit. 6utting this together, the m neurons and their corresponding weights/biases Γ = {γ j } m j=1 are often together referred to as a hidden layer (since we do not observe their outputs directly) and the total composition of the NN (the number of neurons it has, the activation functions used, etc.) is often referred to as its architecture.In this particular case, we end up with a single-layer perception (SLP): where Γ is now a matrix whose size is equal to the number of input parameters and the number of output parameters (i.e. the number of neurons in the layer).
In the SLP model, the final classification layer can be directly interpreted as simply performing LR over a new set of inputs χ instead of the original inputs X.The previous hidden layer therefore engages in what is often called feature extraction or representation learning, i.e. learning how to process the data in order to improve the final LR.In other words, NNs can be viewed as applying LR using learned representations of the data rather than over the data directly.
Building on this intuition, we can interpret each neuron (and associated weights/biases) in the hidden layer as learning a particular feature to help interpret the data.Under the assumption that each neuron's activation function f (x) is a logistic, for example, this means that the learned features are based on applications of LR over the original input data X.In other words, we can broadly think of each neuron representing a basis function and the outputs χ representing the representation within that basis.This fact also means there is no longer a requirement that f (x) is a logistic function -as long as f p (x) is logistic, f (x) can be any set of arbitrary functions that can serve as an efficient basis to capture features in our data.This will be crucial to helping to resolve some of the issues with standard LR highlighted in Section 6.1.
As a brief aside, we can understand this general concept of a hidden layer and activation functions as a learned basis by making a broad analogy with Fourier Series.In a Fourier series, we can write down any arbitrary function as a linear combination of sine and cosine functions.However, not all functions can be represented equally efficiently in this basis, with square waves requiring many terms to get to a specified level of accuracy.This behaviour is because localized, sharp edges are difficult to represent with a globally smooth set of basis functions (sines and cosines).Likewise, the types of functions that can be built using of the outputs of previous activation functions χ also depends on the activation function itself.Sigmoid functions, for instance, are smooth and continuous and therefore, just like with Fourier series, serve as more efficient representations of similarly smooth functions.Other activation functions (see Section 6.3) can serve as efficient representations for different types of complex functions.
To continue, we finally introduce the general multilayer perception (MLP) model, where multiple hidden layers are present before the final probability conversion  11).The bottom panels show the behaviour over the measured (known) magnitudes, which show the NNs can reproduce the same complex behaviour highlighted in Figure 11.layer: p Probability Layer (Logistic Regression) (9) Using our intuition from the simpler SLP model with one hidden layer, we can interpret our MLP model as simply having a more involved feature extraction process where, at each layer, a set of basis function representations are learned, those are applied to the input data, and the outputs are passed on to the next layer (where the process is repeated).
Based on the above derivations, the main connections between LR and NNs that motivate our final model can be summarized as follows: 1.Both methods (eventually) apply a logistic function (with learned coefficients) to convert inputs into output probabilities.Indeed, a NN with a single neuron using a logistic activation function is identical to LR.
2. The hidden layers that are present in a NN prior to the final probability conversion (logistic regression) layer allow the NN to learn features from the data in order to improve the final LR operation.More hidden layers allows for learning potentially more complex and/or non-linear features.
3. The types of features NNs can learn at each layer are based on the activation functions used for the neurons in that layer.In other words, each layer can be broadly interpreted as a basis function ex-pansion and the individual neurons as individual basis functions.

Defining Our Neural Network Model
With the motivation from Section 6.2 in hand, we will now proceed to define the exact NN-based model used in this work.After some testing, we found that the bestperforming NN was a simple, fully-connected network with three layers, each with 200 neurons.Due to the sharp and asymmetric recovery boundaries, we found that the overly smooth basis functions such as the logistic or tanh were unable to capture the sharp transitions in the CMD very efficiently.As a result, we opted to use the rectified linear unit (ReLU) as the activation functions across all three layers, which is linear when β i X > 0 and 0 otherwise.This allows the hidden layers of our NN to model the data using the subset of all piece-wise linear functions, which is able to better accommodate rapid local changes in the recovery probability across the CMD seen in Figure 12.We train our neural network using scikit-learn (Pedregosa et al. 2011) to optimize the classification loglikelihood ln L(θ) (i.e.minimize the log-loss LL(θ)) where the log-likelihood is defined as and where y i = 1, 0 states whether the ith object was recovered (1) or not (0), p(X i ; θ) is the recovery probability predicted from the NN given the weights/biases θ, and the sum over i is taken over all objects n in a given training batch.We opt to train using 80% of the AST data (with the remaining 20% reserved for validating/testing performance) with a batch size of n = 256.We apply a small L2 regularization of α = 10 −4 to prefer slightly sparser distributions of weights/biases.We perform the optimization itself using the adam optimizer (Kingma & Ba 2015), which uses a combination of stochastic gradient descent (SGD) and momentumbased strategies.We found that getting the NN model to reproduce the exact recovery fraction across both magnitude and color to high precision required a stepped training procedure involving a learning rate schedule (i.e.time-dependent learning rate) accompanied by different amounts of training data.The learning rate a(t) at a given iteration multiplies the gradient, reducing the overall step taken after processing a particular training batch.We found a multi-step approach where adjusting the learning rate along with the quantity of input data helped the net-  16, but now for NGC 3377.We see the same trends but even stronger signs that the NN model is able to derive accurate estimates of the recovery fraction even in the presence of shot noise.
work learn the correct "rough behaviour" first, followed by slow adjustment period to properly reproduce all the features seen in the model data.We confirmed across a number of ASTs, training data sizes, and random seeds that this approach helped guarantee robust final solutions.
Our final approach involved three main phases: 1. Burn-in phase: In the first "burn-in" phase, we used 5% of the training data accompanied by a learning rate schedule with 10 values logarithmically spaced from 10 −2.5 to 10 −4 .We train for 50 epochs (i.e.full passes through a random reshuffling of the input training data) for each value of the learning rate.This phase helps the NN establish a good approximation to the functional form of the recovery probability.
2. Tuning phase: In the second "tuning" phase, we used 10% of the training data accompanied by a learning rate schedule with 5 values logarithmically spaced from 10 −3.33 to 10 −4 and again training for 50 epochs each.This entire process was then repeated using 20% of the training data, and then finally with 100% of the training data.This phase allows the NN to appropriately improve its predictions based on larger segments of the training data 3. Settling phase: In the final "settling" phase, we used 100% of the training data with a constant learning rate of 10 −4 and ran for 100 epoch.This process helps ensure the final network parameters are fine-tuned and remain stable.We first show the general performance of our NN model as a function of the predicted recovery probability pi = p(X i ; θ) for each object i, where we classify an object as recovered (ŷ i = 1) or unrecovered/lost (ŷ i = 0) following Here p thresh is the minimum probability threshold for an object to be considered "recovered".Based on this, we consider four standard primary metrics that gauge how often our prediction ŷi compares to the truth value y i : where 1(•) is the indicator function which returns 1 when the condition is true and 0 if it is false.In other words: • the accuracy is the total fraction of correct predictions for recovered/non-recovered across the entire sample, • the completeness is the fraction of objects we successfully (i.e."correctly") recover out of all true objects, • the precision is the fraction of objects we successfully recover out of all claimed/predicted objects, and • the F-score is the harmonic mean of the completeness and the precision.We plot these metrics for both NGC 1275 and NGC 3377 given our best-fit NN model in Figure 14.These results show accuracies for both galaxies that reach > 94%, which achieve several percentage points higher than the best-performing LR models.We find the accuracy is maximized for values of p thresh ≈ 0.5.Our model also tends to be more complete than precise at these values, highlighting the robustness of our recovery (i.e. when forced to classify individual stellar predictions, the NNs generally produce 5% to 10% false recovered objects while only "missing" 2% to 5% of true recovered objects).
The magnitude dependence of our best-performing NN models for NGC 1275 (under the ASTs produced under the flat LF) 8 and NGC 3377 is shown in Figure 15.We find that the NNs are able to reproduce very precisely the complex magnitude dependencies for both sets of data.Furthermore, we find that the NN predictions, when plotted against the underlying true (input) 8 We find that the behaviour of the NN model for NGC 1275 under the flat LF is slightly improved compared to the steep LF, mainly because the large number of faint, non-recovered sources (with virtually no chance of every being recovered) in the latter case skews the model towards slightly higher incompleteness estimates in order to more accurately capture this behaviour.None of the results presented in this Section change if this other NN model is used instead.
magnitudes, also correctly reproduce the overall logistic/sigmoid shape seen in Figure 11.Finally, we highlight the behaviour across the measured CMDs from the ASTs for NGC 1275 and NGC 3377 in Figures 16 and 17.We see that the NN models are able to accurately reproduce all of the features that LR struggled with in Figure 12: the strong asymmetric features, sharp, broken boundaries, and additional structure even at brighter magnitudes.Together, these demonstrate that our NN-based approach succeeds where our simple LR failed, and serves as a qualitatively and quantitatively accurate model of the associated selection function.
Based on the results above, in Figure 18 we return to the real photometry for the two fields studied here.The measured CMDs for NGC 1275 (Figure 3) and NGC 3377 (Figure 8) are replotted, but now with each object colored by the predicted recovery probability p(X).What the complete NN model fit has allowed us to do here is to assign a reasonable recovery probability to each individual object in the CMD, depending on its magnitude, color, location in the image field, and local crowding level.All issues arising from binning of the data are avoided.12) is highlighted as horizontal dashed lines.The behavior of the four specific NN models shown in Figure 21 (below) is also highlighted.We find that only a few tens of thousands of artificial star tests are needed to capture the majority of the behaviour seen in the data, although additional ASTs often still help to improve performance.

Reproducing the Completeness Curve
One of the biggest concerns surrounding the use of a more complex model such as NN is that the "black box" nature of the model can substantially hinder interpretability in the output predictions.While we have provided some theoretical arguments illustrating that our NN builds on more interpretable foundations such as LR, now we want to investigate whether the model is behaving appropriately under a variety of conditions.
In Figures 15, 16, and 17, we showed that the NN model predictions agree extremely well with the true recovery fraction as a function of magnitude and color.However, these plots might be hiding other troubling behavior in individual predicted probabilities, and also do not display very clearly the impact of crowding and sky noise as opposed to magnitude and color.
To address these questions, we also investigated the behavior of our NN predictions for the star-by-star probabilities p(X) and the average recovery fraction f (X) across all of our inputs.The results are shown in the panels of Figure 19, which demonstrate that the NN models are able to reproduce the binned average recovery fractions across all parameters extremely well.The figures also demonstrate the extent to which the NNs are able to learn complex dependencies that can be easily overlooked when only binned averages are used.For instance, in the NGC 1275 field the model is able to successfully reproduce the slight incompleteness even at bright magnitudes that is caused by the very high background sky noise in the central bulge of the galaxy (see Figure 6).This effect shows up as a low-probability cloud in the magnitude-based plot (upper left panel of Fig. 19).Similarly, for the NGC 3377 field, we instead see that the NN is able to reproduce a complex distribution for individual recovery probabilities across magnitude, colour, and background density, where two population trends emerge from the more complex selection function across the CMD (see Figure 12).Most of these complexities end up being completely masked if we look, e.g., only at the mean behaviour as a function of magnitude f (m) instead of the distribution of p(X).

Artificial-Star Testing: How Much is Enough?
We now want to use our model to discuss a general and very practical question that is rarely addressed in previous literature: how much is enough?That is, how many ASTs do we need to ensure we can accurately characterize the recovery probability?To investigate this question, we train new NN models with widely differing numbers of input artificial stars for the three sets of ASTs discussed in Sections 3 and 4: ASTs for NGC 1275 drawn from a flat LF, ASTs for NGC 1275 drawn from a steep, rising LF, and ASTs for NGC 3377 drawn from a steep but truncated LF.To ensure stable results, each model was trained with the exact same procedure as described in Section 6.39 and with training sets that were explicit subsets of each other, i.e. the training sets were not shuffled between objects to ensure that (Training Set 1) Our results are summarized in Figure 20, where the overall classification accuracy as defined in ( 14) provides a convenient way to evaluate the effectiveness of each test run.Accuracy improves with larger numbers of input stars, but the rapid gains that occur in the small-N regime where N ≲ 10 4 input stars then start to sat-Figure 21.Similar Figure 15, but now highlighting the behaviour of the four NN models labeled in Figure 20 and with the magnitude distribution of the training (rather than testing) data shown.The 50% recovery limit in the testing set from the prediction and the data are also plotted as red and black vertical lines, respectively.These results help to illustrate how the NN is able to learn from increasing numbers of ASTs, and how many ASTs might be needed for various tasks.
urate and level off for larger N .Overall, we require only a few thousand ASTs for our NN model to exceed the performance of our initial LR benchmarks (see Section 6.1 and Figure 12).The exact number depends on how complex the simulated recovery behaviour is: the more complex the behaviour, the fewer ASTs are needed to get substantial performance gains and to outperform simple LR.
To understand these results more fully, in Figure 21 we show the estimated and true recovery fractions for a subset of the NN models trained over the NGC 1275 ASTs (flat input LF).With an input sample of only a few hundred stars to train over (top panel), we find that although the performance is poor (well below the LR benchmark), the NN is still able to estimate the 50% recovery fraction limit relatively accurately.As we increase our training set to a few thousand stars (top middle panel), we find the NN broadly learns to mimic the behaviour of a simple LR model, such as the one shown in Figure 12.Once we start using tens of thousands of stars (bottom middle panel), we find the NN is successfully able to reproduce almost all major features seen in the data.Finally, once we move into hundreds of thousands of stars (bottom panel), we find the NN is able to reproduce the full complexity of the input ASTs.
While these results have been derived with our NNbased approach, they broadly imply that the size of ASTs one needs depends on how precisely one wants to estimate the recovery fraction.If it is enough just to estimate the 50% recovery limit to within 0.2 mag or so, then only a few thousand input stars might be enough.If one wants to account more accurately for populations all the way down to (or even below) the 20% recovery level, however, then likely N > 10 5 artificial stars will be required.
In addition, we have found that it is not necessary to overload the AST population with stars that are far fainter than the completeness limit where they have no chance of recovery.numbers of such stars will have the unwanted effect of pulling the solutions away from the more important goal of matching the downturn region of the completeness curve.Furthermore, if too many faint unresolved stars are added all at once to the image, they can even change the intrinsic level of sky noise.

Handling Multiple Filters: Extending the Model
Many photometric studies employ just two filters, yielding a magnitude and one color index.This is the case our current discussion focusses on.Magnitude and color are logical choices as input variables to a LR or NN model because they represent two astrophysically different quantities, i.e. stellar luminosity and effective temperature.But a natural question is how the NN model could be extended to photometric studies with three or more filters, such as the SDSS survey in ugriz (York et al. 2000, and a vast series of later papers) as perhaps the most prominent example.
Multiple (N f ≥ 3) filters open up a wider range of possibilities for the choice of variables.From a strict computational viewpoint, the NN algorithm is designed to be able to handle these cases, but it is worth considering carefully what is actually wanted from the solution.One option might be to replace the (magnitude+color) pair with N f magnitudes, treating them all as if they were independent.A practical issue to consider may be that the measured magnitudes from the various filters will often be strongly correlated with each other (unless their central wavelengths are so far apart that they sample widely different parts of the spectrum), so that in fact they are not adding fully independent pieces of information into the solution.(However, although their magnitudes may be correlated, their measurement uncertainties are usually not.)Another option could be to use one magnitude (probably the one giving the deepest data) and (N f −1) color indices.Again, the different colors will often be well correlated with each other if they sample parts of the spectrum that are not too widely separated.Still another option might be to use just one magnitude, and just one color index from the two filters most widely separated in wavelength.
For any of these choices, we assume that the effective limits of the photometry are similar in all the bands, so that most of the original catalog of sources that one is interested in will have useful measurements in every filter.If that is not the case, then the completeness of the data will be dominated by the shallowest of the filters.In that case, careful consideration should be given to which bandpasses should actually be included in the analysis.
There is no ideal single solution to the multiplebandpass situation that would clearly be best for all possible alternatives.Above all, it is important to know the features of one's own database, to have a clear view of what kind of result is desired, and to adapt the choices of input variables accordingly.Given that useful assessments of completeness can be obtained from surprisingly small ASTs (see above), our recommendation would be to use the preliminary testing stage to experiment with different combinations of input variables and settle on a combination that maximizes the classification accuracy.

SUMMARY
In this paper, we explore the application of a neural network to the problem of completeness for optical/NIR photometry.As testbeds for developing the model, we use HST multicolor imaging data for the star cluster population around the Perseus giant galaxy NGC 1275, and for the RGB stellar population in the nearby earlytype galaxy NGC 3377.The main conclusion of this study is that neural networks (NNs) provide an attractive approach by comparison with conventional methods for measuring photometric completeness, such as logistic regression (LR) or binning as a function of magnitude and color.
Within the NN model, the effects of the parameters that govern the recovery fraction -magnitude, color, local sky noise, and local crowding -can all be handled simultaneously within a single model.The degree of completeness can be determined for objects located anywhere in the field, and at any location on the colormagnitude diagram, from a single functional form.The solutions have the additional advantage that they clearly reveal the relative importance of each parameter in the given situation.
It will be apparent to the reader that mapping out the recovery fraction of the photometry as a function of all the variables (magnitude, color, local sky and crowding) is essential for several purposes in later data analysis.For example, comparing the luminosity function of the objects in different places within the field, or more generally their distribution in the CMD or color/color diagram, must account for the way the completeness changes across the distribution.Deducing the range of underlying source parameters such as mass, metallicity, age, or reddening follows as well.
Artificial-star tests continue to be the key to revealing the pattern of recovery completeness.From the experiences gained in this study, we recommend the following approach to running ASTs: 1. Do a initial AST run of ∼ 10 3 input stars, enough to find the key magnitude range over which the completeness drops from one to zero; 2. Carry out a second major run of ∼ 10 4−5 stars that is more carefully designed to cover the range of magnitudes, color, sky noise, and crowding that are present in the real images.
The NN code and the full AST data used in this study can be found at Zenodo at https://doi.org/10.5281/zenodo.8306488under DOI: 10.5281/zenodo.8306488.The code is named COINTOSS (COmpleteness wIth Neural networks TO Simulate Stellar recovery).The HST data presented in this article were obtained from the Mikulski Archive for Space Telescopes (MAST) at the Space Telescope Science Institute.The spe-cific observations analyzed can be accessed via DOI: 10.17909/wb05-2618 and DOI: 10.17909/kxhv-aj70.

Figure 1 .
Figure 1.HST ACS image centered on the Perseus central giant NGC 1275.The field shown is 200 ′′ across, equivalent to 75 kpc at the distance of Perseus.The maroon dots marks positions of star cluster candidates (F814W < 25.5, F475W− F814W < 2.4; see also the color-magnitude diagram in Fig. 3).

Figure 2 .
Figure 2. Left panel: DOLPHOT values of sharp plotted versus magnitude, for each of the two filters.All detected objects are plotted here before any culling.Non-stellar, extended objects scatter downward to lower sharp values.The red lines denote the culling boundaries.Right panel: DOLPHOT values of chi versus magnitude.Normally χ−values near 1 indicate objects that ideally match the point-spread function, while higher χ−values indicate objects that are poorly fit by the PSF.

Figure 3 .
Figure 3. Color-magnitude diagram (CMD) for the star cluster population around NGC 1275.No correction has been made for foreground reddening.The relatively sharp cutoff of the data at the faint end represents the signal-to-noise ratio SNR = 4.The second and third panels show the CMD for inner and outer radial zones: in the inner zone, the bluer clusters with (F485W-F814W) ≲ 1.2 are relatively more prominent, and the limiting magnitude of the photometry is brighter.

Figure 4 .
Figure 4. Luminosity functions (LFs) for the two sets of artificial-star tests run on the NGC 1275 images.The first set (blue) assumes a flat LF while the second set (magenta) assumes an exponentially rising LF towards fainter magnitudes.The vertical dashed line indicates the magnitude level at which the recovery fraction is approximately 50% depending on radial zone and color (see text for additional details).

Figure 5 .
Figure 5. Color-magnitude diagram in F814W versus F475W−F814W for artificial stars following the exponentially rising LF.Left panel: The input magnitudes and colors of the artificial stars.Right panel: Measured values for stars that were successfully recovered (i.e. were detected and passed the culling tests).

Figure 6 .
Figure 6.Completeness as a function of magnitude for the artificial stars in four radial zones around NGC 1275, for the AST data binned in 0.2-mag steps.Solid lines show the results for the exponentially rising input LF, while the dashed lines show the results for the flat input LF.Black lines are for radial zone 1 (r < 20 ′′ ), blue for zone 2 (20 ′′ − 40 ′′ ), green for zone 3 (40 ′′ −60 ′′ ), and red for zone 4 (r > 60 ′′ ).Zone 1 is heavily affected by background sky noise in the central bulge of the galaxy, reducing the completeness fraction even for relatively bright objects.Note that at brighter magnitudes, the curves for the rising LF are statistically noisier because the numbers of input stars are smaller than for the flat LF.

Figure 7 .
Figure 7.A portion of the field in the halo of NGC 3377, imaged with HST/ACS.The field shown is 55 ′′ across (about one-quarter of the width of the ACS field of view), corresponding to a linear distance of 3 kpc.Large numbers of halo red-giant stars are scattered across the image; the galaxy center is off the field to the lower right.See also Figures 1 and 2 of Harris et al. (2007).

Figure 8 .
Figure 8. Color-magnitude diagrams for the NGC 3377 halo, in four different zones of projected galactocentric radius Rgc.The innermost edge of the ACS field is at 1.6 ′ from the galaxy center.

Figure 9 .
Figure9.The standard deviation of local sky noise plotted versus galactocentric radius, for the measured objects around NGC 1275.Here σ sky in adu is measured on the F814W reference image within an annulus of 4 to 9 px around each object.Note the 'spikes' of higher noise at large radius where satellite galaxies are located.

Figure 10 .
Figure 10.Dependence of local crowding level on galactocentric radius Rgc (arcmin), for all measured stars in the NGC 3377 field.The crowding density parameter x4 is defined as the local mean surface brightness in the image around each object after background subtraction, as described in the text.The trend follows a simple power-law form x4 ∼ R −2.38 shown by the magenta line, and is equivalent to the surface brightness profile of the galaxy.The 'spikes' of points extending upward from the main locus represent objects near small satellite or background galaxies, or bright stars, where the local background light is enhanced.The sparse dusting of points below the main locus shows objects near the CCD detector edges,

Figure 11 .
Figure 11.Upper panel: Comparison of completeness fraction plotted versus either the measured magnitudes (black line) or the input magnitudes (red line).See text for the exact definitions of each fraction.The dashed red line shows the best-fit sigmoid function fit to the binned f versus input magnitude.Lower panel: Fractional excess of recovered stars for each bin, gained at the expense of other bins: see text for explanation.

Figure 12 .
Figure12.Demonstration of the inadequacy of a simple LR model for NGC 3377.The left panel shows the average recovery fraction binned across the CMD, which highlights sharp cutoffs in colour and magnitude as well as asymmetric complex behaviour (highlight with a red box).The right panel shows the predictions from a simple LR model which incorrectly smooths over both of these behaviours, leading to an overly broad completeness transition that is "tilted" in the CMD.

Figure 13 .
Figure 13.An example of our multi-step training scheme for our neural network (NN) model (see Section 6.3) over artificial star data from NGC 1275.The black curve shows the average (log-)loss values over the training set as over each epoch, while the red curve shows the learning rate associated with each epoch.The behaviour for the burn-in phase, our three tuning phases, and the final settling phase are highlighted in shades of red, yellow, and blue, respectively, along with the associated amount of training data used in each phase.

Figure 14 .
Figure 14.The accuracy (black), completeness (red), precision (blue), and F-score (purple) for the NN models trained over NGC 1275 assuming a flat LF (top) and NGC 3377 (bottom), plotted as a function of the probability threshold used to decide whether an object is successfully recovered or not.The peak accuracy values are highlighted as an orange star.Both models are able to achieve > 94% accuracy with stable performance over a wide range of values.
weights and biases), and then outputs the results based on an activation function f (x).In essence, each neuron performs the operation f (x = γX), which we will write as Single Neuron : X Input Weights and Biases γ − −−−−−−−−− →

Figure 15 .
Figure 15.The predicted completeness/recovery fraction f (m) from our best-fit NN models binned in magnitude (orange line) versus the true completeness fraction f (m) from the ASTs (black line) for NGC 1275 (left) and NGC 3377 (right).The input magnitude distribution versus the recovered magnitude distribution are shown for reference in light gray and blue, respectively (i.e. the black line is the ratio of the blue and gray).The top panels show the behaviour as a function of the (unknown) input magnitudes min, which behave much like a logistic function (see Figure11).The bottom panels show the behaviour over the measured (known) magnitudes, which show the NNs can reproduce the same complex behaviour highlighted in Figure11.

Figure 16 .
Figure 16.The dependence of stellar recovery over the measured CMD from the NGC 1275 ASTs.Top left: The individual input artificial stars classified by whether they were recovered (purple) or not (gray).Top right: The recovery fraction estimated in bins across the CMD (note that the colour scale is non-linear to emphasize the behaviour closer to 1).Bottom left: As the top left panel, but now coloured by the predicted recovery probability from our NN model.Bottom right: As the top right panel, but now coloured by the predicted recovery fraction from our NN model.We find excellent agreement across the CMD, with the top/bottom right panels providing strong evidence that the NN predictions are able to smoothly interpolate over the true recovery fraction even in the presence of shot noise.

Figure 17 .
Figure17.As Figure16, but now for NGC 3377.We see the same trends but even stronger signs that the NN model is able to derive accurate estimates of the recovery fraction even in the presence of shot noise.

Figure 18 .
Figure 18.The measured CMDs for NGC 1275 (left) and NGC 3377 (right) with individual objects coloured by the predicted recovery probability from our best-fit NN models.As in the previous graphs, note the nonlinear color scale.The behaviour of the loss function over time for one of the networks trained over the NGC 1275 artificialstar data is shown in Figure13, highlighting the varying loss rates and overall performance over the training set.The exact NN models we trained 7 along with the entire collection of ASTs can be found on Zenodo at https: //doi.org/10.5281/zenodo.8306488.

Figure 19 .
Figure 19.Similar to Figure 15, but now plotting the predicted recovery fraction (orange) and the true recovery fraction (black) against all input parameters for NGC 1275 (top) and NGC 3377 (bottom), with the density of individual predictions shown in blue tones.This includes magnitude (far left), colour (center left), sky noise (top center right), crowding (top far right), and background density (bottom right).These individual panels highlight that our NN model predictions behave sensibly in not only magnitude (declining steeply near the detection limit) but also colour (declining outside of the main colour range) and sky noise/crowding (declining in noisier and more crowded regions).They also highlight that mean trends can mask a lot of complex behaviour that the NN model is able to successfully capture in the predicted individual recovery probabilities.Note that in the upper right panel, the vertical streaks of points are due to the quantized nature of the crowding variable for NGC 1275.

Figure 20 .
Figure 20.The change in the classification accuracy (recovered vs lost with a decision boundary of p(X) = 0.5) for our NN models based on the size of the training set for the three AST scenarios explored in this work: ASTs for NGC 1275 drawn from a flat LF (red), for NGC 1275 drawn from a steep LF (purple), and for NGC 3377 drawn from a steep but truncated LF (blue).The accuracy for each scenario from a simple logistic regression model (as shown in Figure12) is highlighted as horizontal dashed lines.The behavior of the four specific NN models shown in Figure21(below) is also highlighted.We find that only a few tens of thousands of artificial star tests are needed to capture the majority of the behaviour seen in the data, although additional ASTs often still help to improve performance.