The Panchromatic Hubble Andromeda Treasury: Triangulum Extended Region (PHATTER). III. The Mass Function of Young Star Clusters in M33

We measure the star cluster mass function for the Local Group galaxy M33. We use the catalog of stellar clusters selected from the Panchromatic Hubble Andromeda Treasury: Triangulum Extended Region (PHATTER) survey. We analyze 711 clusters in M33 with $\rm 7.0$ 3.0 as determined from color-magnitude diagram fits to individual stars. The M33 cluster mass function is best described by a Schechter function with power law slope $\alpha = -2.06^{+0.14}_{-0.13}$, and truncation mass log($M_c/M_{\odot}$) $= 4.24^{+0.16}_{-0.13}$. The data show strong evidence for a high-mass truncation, thus strongly favoring a Schechter function fit over a pure power law. M33's truncation mass is consistent with the previously identified linear trend between $M_c$, and star formation rate surface density, \SigSFR. We also explore the effect that individual cluster mass uncertainties have on derived mass function parameters, and find evidence to suggest that large cluster mass uncertainties have the potential to bias the truncation mass of fitted mass functions on the one sigma level.


INTRODUCTION
Star clusters are fundamental probes of galaxy evolution and the process of star formation. Ancient globular clusters trace the halos and formation histories of galaxies (e.g. Kruijssen et al. 2019), while young clusters trace the quantity and characteristics of on-going and recent star formation. For young clusters, studies have shown a correlation between the star formation rate (SFR) surface density, Σ SFR , and the fraction of stars that form in clusters (Larsen 2009;Johnson et al. 2016;Adamo et al. 2017). More intense star formation leads to a higher fraction of a galaxy's stars being formed in star clusters. Because of this link, star clusters encode a record of galactic star formation activity in their population characteristics (e.g. Johnson et al. 2017). Furthermore, studying the properties of star cluster formation as a function of galactic environment can help us better understand the star formation process -for example, regarding the efficiency of star formation and the role of stellar feedback (e.g., Grudić et al. 2021).
One measurable trait of a star cluster population is the cluster mass function (CMF). The mass function for young star clusters has been observed to be consistent with a power law distribution (dN/dM ∝ M α ) where α ∼ −2 (Portegies Krumholz et al. 2019); this is similar to the giant molecular cloud mass function, but due to hierarchical cluster growth, mapping between the two mass functions is difficult (e.g Longmore et al. 2014). The measurement of CMFs is complicated by a number of factors. First, different mass ranges of clusters are studied in different galaxies (e.g. Bik et al. 2003;Zhang & Fall 1999;Larsen 2009). Second, different methods for identifying, and measuring cluster ages and masses can significantly impact CMF measurements (see recent review Krumholz et al. 2019).
Initially, the observed power law slopes of CMFs suggested that there may be a universal power law CMF. However, increasing evidence suggests that the masses of young clusters deviates from a power law at high masses, and instead young star clusters follow a Schechter (1976) function with an exponential truncation (dN/dM ∝ M α exp(−M/M c ); Larsen 2009;Bastian et al. 2012;Adamo et al. 2015;Johnson et al. 2017;Adamo et al. 2017;Lieberz & Kroupa 2017;Messa et al. 2018).
Constraining the Schechter truncation mass can be difficult due to small number statistics of massive clusters and small predicted differences between Schechter and non-truncated power law models. Some studies continue to favor a power law model or very large Schechter truncation masses (Chandar et al. 2016;Cook et al. 2019;Mok et al. 2019;Whitmore et al. 2020). However, a growing number of truncation detections using high-quality cluster data and strong statistical fitting techniques anchor a growing body of evidence favoring a Schechter CMF. The M c of 8.5 × 10 3 M determination in M31 made using a sample of 840 clusters with ages between 10-300 Myr (Johnson et al. 2017) provides particularly convincing evidence in favor of a truncation (Krumholz et al. 2019).
Like the fraction of stars that form in clusters, the truncation of the CMF also appears to vary with the intensity of star formation. In the observations cited above, clusters in high intensity star formation environments follow power law mass distribution extending up to ∼10 6 M , while clusters in more normal star-forming galaxies have CMFs with highmass truncations at ∼10 5 M . The truncation masses are even lower in relatively quiescent galaxies, which form very few high mass clusters. Using a handful of measurements with reliable truncation detections, Johnson et al. (2017) find that there is a nearly linear relationship between Σ SFR and M c , where M c ∝ Σ SFR 1.1 . A trend between the maximum star cluster mass scale and a galaxy's star formation properties is not surprising. High Σ SFR is physically connected to increasing Σ gas and pressure, both of which are associated with increased star formation efficiency, cluster formation efficiency (or bound stellar fraction), and resulting stellar density (see e.g., Elmegreen 2009;Kruijssen 2012;Reina-Campos & Kruijssen 2017;Elmegreen 2018;Grudić et al. 2021). While the development of theoretical models to predict and understand the maximum cluster mass scale are still on-going, there is increasing consensus that variations and trends like the ones we observe are expected given our current understanding of star formation. However, we are motivated to confirm and quantify such trends to further constrain the process of cluster formation.
One notable strength of the M31 CMF study was its use of high-precision ages and masses that were inferred from resolved color-magnitude diagrams (CMDs) of member stars in each cluster. This methodology is not possible in more distant galaxies, making studies of Local Group galaxies par-ticularly valuable. Beyond the Local Group however, cluster ages and masses can only be inferred from integrated light fitting methods. These methods for determining cluster properties have been proven to be less reliable than traditional CMD isochrone fitting (Krumholz et al. 2019;Johnson et al. 2022). It is imperative to capitalize on the small number of Local Group galaxies, such as M33, where high-precision samples are accessible and obtain high quality cluster mass function determinations for these targets.
In this work, we measure and analyze the CMF for the Local Group galaxy M33. Our work utilizes data from the Panchromatic Hubble Andromeda Treasury: Triangulum Extended Region (PHATTER) survey detailed in Williams et al. (2021), and the cluster sample from Johnson et al. (2022). We determine cluster ages and masses through maximumlikelihood CMD analysis, and we fit the CMF using a probabilistic Bayesian approach. Although M33's Σ SFR is somewhat higher than that of M31, star formation in M33 is relatively quiescent compared to other galaxies with previous CMF measurements. Therefore, M33 occupies a valuable place in parameter space in the investigation of CMF truncation behavior.
We structure the paper as follows. First we present the data in Section 2, and then lay out the probabilistic approach for fitting the cluster mass function in Section 3. We will then present the CMF fitting results Section 4. In Section 5, we will compare our CMF results to other galaxies' published CMFs and further analyze the link between a galaxy's CMF and Σ SFR . We also examine the implications of mass measurement uncertainties on CMF fitting results, and show that high cluster mass uncertainties can bias M c measurements towards high values.

DATA
Our cluster sample is drawn from the Local Group Cluster Search (LGCS) cluster catalog (Johnson et al. 2022), which was created using data from the PHATTER survey (Williams et al. 2021). PHATTER uses the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) and Wide Field Camera 3 (WFC3) to image the inner region of M33's disk in six filters from UV to IR. The Johnson et al. (2022) catalog presents 1214 star clusters identified through a crowdsourced, visual search of the optical PHATTER data (F475W & F814W), facilitated by the LGCS project 1 , a citizen science effort hosted on the Zooniverse 2 platform. We adopt the recommended cluster catalog threshold ( f cluster, W > 0.674), which limits the contamination rate to 4% for our cluster sample.

Derivation of Cluster Ages and Masses
For each cluster in our sample, we extract an optical color magnitude diagram (CMD) in the F475W and F814W filters. These passbands yield the deepest CMDs available from the PHATTER data. Specifically, we use CMDs composed of stars which lie within the photometric aperture (R ap ) derived in Johnson et al. (2022), which corresponds to approximately three times the cluster half-light radius. We assume all stars within R ap are cluster members, and all members are within R ap . We make no correction for mass that lies outside R ap . Based on experiments with synthetic clusters, we expect <30% of light to fall outside our photometric aperture. The median aperture correction for clusters in our sample is -0.04 mags, with the largest correction being -0.69. This suggests we may be losing ∼4% of the light from our clusters in typical cases, and thus underestimating clusters logM by 0.02 dex. This number is much smaller than the uncertainties in our M c values derived below, and we don't correct our masses for this aperture correction. We characterize the surrounding field star population using an annulus which spans 1.2-3.4 R ap . This annulus has 10× the area of the cluster aperture. The background is fit along with the cluster models; the scaling of the background is a free parameter in the fit.
We use the MATCH software package to perform maximumlikelihood CMD fits and derive constraints on cluster properties following techniques described in Dolphin (2002). Unlike its typical use in determining time-resolved star formation histories, we use MATCH in a more limited simple stellar population (SSP) mode. Here, the model CMD is composed of a population drawn from a single time bin, rather than a linear combination of populations from multiple time bins.
The MATCH code uses theoretical isochrones to populate synthetic CMDs according to input parameters of age, dust extinction (A V ), distance, metallicity, stellar initial mass function (IMF), and binary fraction. Synthetic populations are created from unique combinations of input parameters, convolved with a model of observational completeness and noise derived from artificial star tests (ASTs), to produce a simulated CMD. This simulated CMD is combined with a background model created from the CMD of stars lying in an annulus surrounding the cluster, then both the model and background components are scaled to best reproduce the observed cluster CMD. The fit quality is calculated according to a Poisson likelihood statistic and the code iterates through a grid of input parameter value combinations to map the distribution of probability.
We adopt the M33 distance modulus of 24.67 (de Grijs & Bono 2014), and assume a metallicity [M/H] of −0.15±0.25 based on the range of present-day gas phase metallicity in M33 (e.g. U et al. 2009). For young clusters, the age is heavily weighted by the main sequence and thus has little metal-licity sensitivity. For consistency with previous studies, our assumptions in stellar modeling follow the cluster fitting of Weisz et al. (2015); Johnson et al. (2016). Briefly, we adopt a binary fraction of 0.35 with a uniform mass ratio distribution, a Kroupa (2001) stellar IMF between 0.15-120 M , and Padova stellar models (Marigo et al. 2008) with low mass asymptotic giant branch tracks from Girardi et al. (2010).
For each cluster we perform 25,000 ASTs to ensure accurate characterization of photometric completeness and noise, encompassing a wide range of CMD positions and cluster radii. Input positions for ASTs are distributed based on the measured half-light radii of the clusters, and assume a King (1962) profile with a concentration of 10.
We compute CMD fits for a grid of age (6.6 < log(Age/yr) < 9.0) and dust extinction (0 mag < A V < 2 mag), deriving the mass at each grid point from the best fit CMD model. We use relative likelihoods derived across the grid to obtain marginalized probability distribution functions (PDFs) for each parameter. We adopt the best fit model for mass, age, and A V , and assign each an uncertainty defined by the 16th and 84th percentile of the PDF. The masses derived represent the initial masses for the clusters. Figure 1 shows the full sample of best-fit ages and masses for the M33 cluster sample. We find the majority of the cluster age and mass PDFs to be Gaussian. In < 10% of cases the PDFs are bi-modal or have a tail to lower or higher values. The median 1σ uncertainty in age is 0.11 dex, while for mass the median is 0.04 dex. The full catalog of cluster parameters is presented in Table 1. For the remainder of the paper, we use the maximum a posteriori values for the cluster ages and masses; these are contained in the Best-fit column in Table 1.
While our results focus only on clusters <300 Myr, we note that there also exists of a relatively large number of massive ∼1 Gyr old clusters. A similar sample of clusters is not present in the M31 PHAT data (Johnson et al. 2016).

Cluster Selection for Mass Function Analysis
After fitting the cluster CMD's, we perform a visual inspection of each cluster's CMD fits and optical images. From this inspection, we exclude results for 33 clusters that a group of co-authors unanimously agreed were poor fits. These clusters are older clusters with few detected member stars that were poorly and erroneously fit with a young, high A V model. These clusters are denoted by an 'exclude' flag in Table 1, and are represented by the red open circles in Figure  1. We note that excluding these flagged fits does not significantly impact the CMF results (i.e., differences in parameter fits with and without these clusters are much less than 1σ). We also visually identify 13 globular cluster candidates (blue open circles in Figure 1) and exclude these objects from our sample; globular clusters are known failure cases for our CMD analysis due to limits in the age and metallicity range  Table 1 is published in its entirety in machine-readable format. A portion is shown here for guidance regarding its form and content. N MS gives the number of bright main sequence stars in the subimage with the cluster, while the Exclude Flag is set to one for clusters with bad CMD properties (see text for more details). The rest of the columns provide the results of our CMD fitting, including 16th, 50th and 84th percentiles of the marginalized 1D PDFs for each parameter, as well as the best fit parameter values.  We select a sample of clusters for mass function analysis in the age range 7.0 < log(Age/yr) < 8.5. The lower threshold is adopted because during the first 10 Myr of a clusters life, clusters are embedded, making optical observations difficult, and our sample incomplete. In addition, young embedded groupings of stars (< 10 Myr), are still forming through hierarchical merging of sub-clumps, making it unclear whether the resulting structure will be a long-lived, gravitationally bound cluster (Allison et al. 2010;Gieles et al. 2012;Messa et al. 2021). The upper threshold is where CMD fitting becomes less accurate due to the cluster's main sequence turnoff dropping below the 50% detection limit for the stellar photometry.
In Figure 2, we present CMD fitting results for two example clusters that lie at the median age (∼100 Myr; LGCS-M33 110) and maximum age (∼300 Myr; LGCS-M33 156) of the cluster sample. For each cluster, we show the observed CMD, modeled CMD, residuals, and the significance of the residuals.
We find that the uncertainty of our age estimates increase as cluster masses decrease, and that large age uncertainties also translate to less reliable mass estimates. As a result, we adopt a minimum mass of 1000 M for our CMF fitting cluster sample. Below this mass limit, the average 1σ error in log(Age/yr) is 0.30 dex for cluster masses from 2.8 ≤ log(M/M ) < 3.0, where as the average 1σ error improves to 0.21 dex for clusters with masses from 3.0 ≤ log(M/M ) < 3.2. We note that this selection of minimum mass has a less than 1σ variance in the inferred CMF.
Our final cluster sample for CMF fitting includes 711 clusters. The selected age and mass range is denoted by the green dashed lines in Figure 1, and the spatial distribution of the selected sample is shown in Figure 3. For this selected sample of clusters, the median 1σ age uncertainty is 0.10 dex and the median 1σ mass uncertainty is 0.03 dex. The distribution of cluster mass errors is discussed further in Section 5.

Cluster Sample Completeness
Once we obtained measurements of cluster ages and masses, we needed to correct the cluster catalog for com- Figure 2. Quality assessment of CMD fits to two clusters. Panels show the observed F475W-F814W vs F475W CMD, modeled CMD, residuals, and significance of the residuals for cluster IDs 110 (left four panels), and 156 (right four panels). Cluster ID 110 has a best fit age estimate of 100 Myrs, representing the median age of our final sample, while cluster ID 156 shows a somewhat older cluster showing that good age estimates are possible out to 300 Myr. (1) where k sets the slope of the logistic function (fixed to 6.02 as explained in Johnson et al. (2022)) and M 50 is the 50% completeness limit. M 50 is well described by an exponential function: where τ ≡ log(Age/yr) and τ min is the median cluster log(Age/yr) in the youngest bin, which is 7.09. The constants a, b, and c were fit through minimizing the χ 2 on the binned data set of synthetic cluster detections over the age range of 7.0 < log(Age/yr) < 8.5. The best fit M 50 (τ ) parameters are a = 0.0303, b = 1.9899, c = 2.9770, with a reduced χ 2 of 0.82. Johnson et al. (2022) finds environment plays a significant role in completeness, largely due to the density of main sequence stars in search fields that contain a cluster. We characterize the number of main sequence stars per LGCS  search image (∼36×25 arcsec) as log(N MS ), which ranges from 2.0 < log(N MS ) < 3.75 in M33. Johnson et al. (2022) finds that over this range, the 50% mass completeness is impacted by up to 0.6 dex.
We further examine the environmental completeness dependence here and incorporate it into our completeness model, by deriving a relationship between the 50% mass completeness and log(N MS ). We first exclude synthetic clusters that are greater than 2σ outliers in log(N MS ), then split the cluster into three equal bins of log(N MS ), and fit the mass completeness as a function of log(Age/yr) for each bin. We find a linear trend between the average 50% mass completeness values for median values of the three log(N MS ) bins, which we characterize using a linear function m × (log(N MS )) + b nms , where m = 0.7727 and b nms = 0.6674. We adopt the value at the bin edge for log(N MS ) values lower than the lowest bin and higher than the edge of the highest bin to avoid extrapolating where our number statistics are minimal. The 50% mass completeness as a function of log(N MS ) is shown in Figure 4.
We incorporate the environmental impact on completeness from Figure 4 into the sample completeness function by having the c parameter in Equation 2 be dependent on log(N MS ), such that our completeness function becomes, . Observed and completeness corrected mass distributions for our cluster sample. These mass distributions include clusters with mass above 10 3 M and age between 10 and 300 Myr. We present raw number counts in the dashed lines and the completeness-corrected distribution in the solid lines. The gray region represents the range of 50% completeness limits across our M33 sample. and c(N MS ) is described by, 3.47 log(N MS ) > 3.49 (4) In Figure 5, we show the completeness correction with respect to the raw data. We note that this figure is a visual representation of binned data, and the completeness correction is done on a cluster by cluster basis.

PROBABILISTIC ANALYSIS
We follow the statistical methodology of Johnson et al. (2017) and perform probabilistic cluster mass function fitting. We deviate from Johnson et al. (2017) methodology only in the application of completeness as Johnson et al. (2017) evaluates the 50% mass completeness for two age bins, while we improve this treatment to include the 50% mass completeness for each individual cluster. Using the masses of all the clusters, we run the Markov Chain Monte Carlo (MCMC) code, emcee (Foreman-Mackey et al. 2013), which takes advantage of the affine invariant ensemble sampler of Goodman & Weare (2010) to determine the functional form of the mass function through maximizing the likelihood of each cluster belonging to a mass function with given parameters. We then derive the posterior probability distributions of Schechter and power law mass function parameters.
For our MCMC calculation, we use 500 walkers, each performing 500 steps, of which we discard the first 100 burn-in steps. We ensure convergence of our chains according to the autocorrelation time, which we estimate to be 30 steps, far surpassed by our burn in period. For the power law function we report the median value of the marginalized posterior probability distribution function (PDF) for the power law slope parameter α, as well as the 1σ confidence interval representing the 16th and 84th percentile range of the marginalized PDF. For the Schechter functional form we present the median value of the marginalized PDF for the two parameters, the power law slope (α), and cluster mass cutoff (M c ), as well as the 1σ confidence interval for each parameter.

Bayesian Approach
The likelihood function for an observed cluster with mass M is given as where the cluster distribution is represented by p MF (M|θ) defined by parameters θ, and p obs (M|τ ) is the observational completeness given as a function of cluster age τ . In order to have the likelihood integrate to 1, the normalization Z is required and given by We note that this normalization is calculated for each cluster, and does not refer to Bayesian evidence. We test a pure power law distribution where the only parameter is given by θ = (α); α is defined as the power law index. We also adopt a Schechter (1976) functional form that has two parameters given by θ = (α, M c ) where α is the power law index for low-mass clusters, and M c is the characteristic mass defining the exponential high mass truncation. The Schechter (1976) distribution follows the form: as first used in the cluster context by Larsen (2009).
To limit the impact of clusters with low completeness, we cut out any clusters below their local 50% completeness limit, M 50 (τ , N MS ), as expressed in Equation 3. Mathematically, this cluster selection can be described as: We find 533 clusters have masses larger than the local 50% completeness limit.
We use Bayes' theorem to derive the posterior probability distribution function for the mass function parameters given as where p(θ) is the prior probability of the Schechter parameters, M i is a set of 533 clusters, and p cluster (θ|M i , τ ) is the combined likelihood function for the full set of clusters defined below. We adopt a uniform tophat prior for the Schechter parameters that covers the range of published values with plenty of cushion: The assumption with this implementation of a probabilistic approach is that the cluster mass uncertainties are negligible. Weisz et al. (2013) demonstrates that this assumption is valid if the fractional mass uncertainties are smaller than 10%. However, when the fractional error of the masses extends to 50% and beyond, fitting results can become affected (Johnson et al. 2017). We will further discuss this assumption in Section 5.
The likelihood function for a set of clusters is defined as the product of the individual cluster mass probabilities given by, where the normalization becomes Power Law Model: For comparison, we also fit for a pure power law form of the mass function, as opposed to the Schechter form. The corresponding likelihood function for the power law model is given by: where the normalization becomes 4. RESULTS

Schechter Function Fitting Results
Schechter function fitting results for 533 young clusters with masses greater than the local 50% mass completeness limit are shown in Figure 6. The left panel shows the completeness corrected mass distribution for the observed sample and a Schechter function overplotted in blue that uses median log(M c ) = 4.24 +0.16 0.13 α and M c values derived from marginalized posterior PDFs. We draw 100 random (α, M c ) samples from the 2D posterior PDF and plot their corresponding Schechter forms in gray to indicate the variance in the fitted parameters. The binned histogram of observed masses is used only for visualization purposes, and not used for Schechter function fitting. We find the CMF to be well described by a Schechter function with α = −2.06 +0.14 −0.13 , and log(M c /M ) = 4.24 +0.16 −0.13 . These results are based on marginalized one-dimensional posterior PDFs shown in the right panel of Figure 6. We discuss this primary result further and place it into context with previous results in Section 5.

CMF Dependencies on Cluster Properties
We test for age dependence of the Schechter function fits by dividing the cluster sample into age bins of 10-100 Myrs and 100-300 Myrs, as shown in the top panel of Figure 7. A signature of mass dependent cluster destruction would be a flattening of the low mass slope of the CMF with increasing age due to the dissolution of low mass clusters (Gieles 2009). Using clusters with masses greater than the local 50% mass completeness, we fit 254 clusters with ages between 10-100 Myrs and 279 clusters with ages between 100-300 Myrs. We present Schechter function parameters for the two age bins in the bottom panel of Figure 7.
In addition to the median values presented in Figure 7 we find the maximum a posteriori best-fit values to be, α = −2.04, and −2.03, and log(M c /M ) = 4.39 and 4.07 for the young and old samples respectively. Both values well within the 1σ range of the PDF's.
We do not observe any flattening in the slope of the CMF with increased age. There is therefore no clear evidence for mass-dependent cluster destruction being important over time scales of 300 Myrs in the central regions of M33. We do notice an increase in M c for the younger sample. In comparing the PDFs, we find an 87% probability that the younger sample has a higher M c than the older sample. This could be evidence of a non-constant SFH, with enhanced star formation in the last 100 Myr relative to the 100-300 Myr time period. However, we note that the constraints on the best-fit Schechter function parameters are significantly broader than for the full sample due to the decrease in number statistics in each age bin. Therefore, drawing strong conclusions about M33's SFH based on these measurements is difficult.
In galactic radius (Adamo et al. 2015), and in M33 there is evidence of a radial dependence on stellar age, with older ages at the center (Williams et al. 2009;Davidge & Puzia 2011). Unfortunately, this work is unable to answer this question. The region of M33 imaged by the PHATTER survey only extends to galactic radii of ∼5 kpc, and we only find 92 clusters with radii >3 kpc. Fits to radially binned samples yielded no significant trends due to these limited number statistics and the small range of galactic radii probed. Determination of any CMF radial dependence in M33 will require cluster samples that extend to larger radii.

Analyzing our Assumptions in Completeness
The parameters of the Schechter function fits are affected by our adopted completeness model. We examine the size of potential systematic errors based on our completeness corrections and sample selection. We do this by comparing our cluster-by-cluster completeness function to the simpler binned completeness function of Johnson et al. (2017).
First, we directly use the methodology of Johnson et al. (2017), binning the sample by age, and calculating the completeness on a binned basis opposed to a cluster-by-cluster basis. We split the sample into bins of 10-100 Myr and 100-300 Myrs. We find 589 clusters with masses greater than the age bins 50% completeness, which leads to fitted Schechter function parameters α = −1.90, and log(M c /M ) = 4.25. It is interesting to note that the power law slope for the binned correction is slightly outside of the 1σ range of our fitted PDF, while the M c parameter is extremely similar. Further, this power law slope is similar to the value published for M31, suggesting that having M31's completeness calculated on a cluster-by-cluster basis would bring the same power law slope for M31 and M33 into agreement.
We also analyze the impact of our choice to include environmental dependence in our completeness function. To remove this dependence, we use Equation 2, and assume the sample wide c parameter fit in Johnson et al. (2022) to be 2.9770. After recalculating the local 50% completeness limits without the environmental dependence, we find 578 clusters have masses greater than the local 50% completeness limit. Schechter function fitting results in the power law slope α flattened slightly to −1.98, with an log(M c /M ) of 4.31. However, both values are well within the 1σ uncertainty of our fitted PDF.
To conclude the comparison of our completeness function to Johnson et al. (2017), we believe our enhanced completeness function is justified. Even so, the results from a simplified completeness model seem to yield results within the 1σ uncertainty. More importantly, there is no need to refit M31, and we believe we can reasonably compare the results from the two completeness models.
All of these models only include clusters above the local 50% completeness limit. To test how varying this minimum completeness impacts our results, we varied the minimum completeness from 40% to 90%. We find for values >50% (which we tested at 55%, 60%, 75%, 80%, and 90%) that changes are within 1σ of our results presented above. How-ever, at lower completeness, the best-fit parameters started to deviate from our best-fit results. Specifically, if we lower the completeness limit to 45% the best fit parameters are a power law slope α = −2.30 and truncation mass log(M c /M ) = 4.49. If we lower it further to a 40% completeness limit we find the best fit parameters are a power law slope α = −2.49 and truncation mass log(M c /M ) = 4.68. In summary, using clusters above the local 50% completeness limit appears to give robust results consistent with more conservative completeness limits, however, including clusters at lower completeness starts to impact our results significantly.

Power Law Fitting Results and Comparison to Schechter Function Fits
In addition to fitting a Schechter function, we fit the M33 cluster sample for a power law distribution. Power law fitting results are shown in the left panel of Figure 8. We find the distribution of clusters with masses greater than the local 50% mass completeness limit to be best described by α = −2.49 +0.06 −0.06 . This slope is much steeper than the canonical -2 power law in the literature (e.g. Krumholz et al. 2019), but still overpredicts the number of high mass clusters by > 4σ.
Previous studies have used the number of massive clusters to test whether cluster masses follow a power law or Schechter function distribution (Johnson et al. 2017;Adamo et al. 2020). Implementing this same test here, we observe 16 M33 clusters above the fitted truncation mass of log(M c /M ) = 4.24. To compare the models to this observation, we calculate this same number for 10,000 synthetic distributions of clusters drawn from the PDFs of both the Schechter function and power law fits presented above.
The predictions from these synthetic distributions are shown in the right two panels of Figure 8. In both panels it is evident that the Schechter function distribution better matches the actual number of high mass clusters (i.e. 16). The middle panel shows that the median number of synthetic clusters observed above log(M/M ) of 4.24 for the Schechter function is 18 +7 −6 , and 35 +7 −5 for the power law, where uncertainties give the 16th and 84th percentiles of the distributions. All 10,000 draws from the power law distribution result in more than the 16 observed high-mass clusters. The Schechter function provides a much better fit to this population of high-mass clusters. Tests using other threshold masses indicate that these results are not sensitive to our exact threshold choice.
The right panel of Fig. 8 shows the cumulative distribution of clusters above the mass on the x-axis. We perform a Kolmogorov-Smirnov (KS) test on these distributions for both Schechter and power law distributions to assess the overall goodness of fit to observed data. Due to inherent randomness in drawing distributions, we draw 100 distributions and report the median values of cluster counts above a given mass to compare cumulative distributions to observed counts, as shown in the middle of Figure 8. The KS statistic for the Schechter function is 0.04 with a p-value of 0.99 indicating the observed distribution is consistent with the Schechter function. However, for the power law distribution the KS statistic is 0.21 with a p-value of 0.02. Thus, the test suggests that the two samples are not drawn from the same distribution. In addition to the KS test, we run an Anderson-Darling test, which suggests a Schechter function is consistent with our mass distribution with p > 0.25; on the other hand, the power law distribution has a p-value of 0.008, and thus is not consistent with the data.
One final test we use to determine which model best describes the cluster sample is the Bayesian Information Criterion (BIC) test. We find a delta BIC of 6.28. We use the Kass & Raftery (1995) criterion which classifies this delta BIC as strong evidence in favor of Schechter function parameters opposed to the power law model. We also perform this test on the two age samples in Section 4.1.1. Due to the decreased number statistics, the delta BIC is less significant than for the full sample at 5.54 and 5.64 for the young and old samples respectively, but each sample nonetheless provides positive evidence in favor of the Schechter function according to the Kass & Raftery (1995) guidelines.
Both the number of high mass clusters, and mass distribution of clusters is therefore not well described by a power law distribution, thus providing strong evidence for the existence of a truncation at higher masses.

DISCUSSION
In this section we discuss the relation between M c and Σ SFR for M33 and compare it to other galaxies. We also discuss the implications of individual mass uncertainties on CMF results both in our, and previous literature results.

M33 CMF in Context: Comparison to Observational
Results and Theoretical Predictions Johnson et al. (2017) found a clear correlation between the truncation mass of the CMF and a galaxy's SFR surface density, Σ SFR . In this section we assess whether M33 follows this trend, and compare our results to theoretical predictions of mass function truncation.
Following the methodology described in Appendix A of Johnson et al. (2017), we measure a characteristic Σ SFR for M33. We combine GALEX FUV and Spitzer 24 µm images following the prescription of Leroy et al. (2008) to produce a map of Σ SFR . We use this particular SFR prescription for consistency with other galaxy measurements, but note that in the future a CMD-based star formation history estimate will be available for the PHATTER survey region (M. Lazzarini, in prep.). We use an SFR-weighted average surface density, Σ SFR , to summarize the local, kpcscale properties of M33's disk in a way that accounts for Right: the cumulative number of clusters above the mass given on the x-axis. Black represents the observed counts in M33, which we compare to number counts from the Schechter function (blue) and power law (red) models. For the models, we draw 100 random distributions with parameters of the median fitted PDF, and plot the median of the 100 samples in solid lines.
We compare M c and Σ SFR for M33 to a compilation of measurements for other galaxies in Figure 9. We combine results from M31 (Johnson et al. 2017), M51 (Messa et al. 2018), M83 (Adamo et al. 2015), NGC628 (Adamo et al. 2017), the Antennae (Jordan et al. 2007), as well as four galaxies from the HiPEEC survey (NGC 3256, NGC 3690, NGC 4194, and NGC 6054;Adamo et al. 2020). For Σ SFR measurements, we adopt the 80% Σ SFR values tabulated in Adamo et al. (2020) for the HiPEEC galaxies, while we use measurements from Johnson et al. (2017) for M31, M51, M83 and the Antennae. While the HiPEEC 80% Σ SFR values are not a perfect match to the SFR-weighted Σ SFR measurements discussed above, this alternative form of areaweighting tends to produce a similar, focused measure of Σ SFR . Johnson et al. (2017) found a nearly linear relation between M c and Σ SFR represented by, log(M c ) = (1.07 ± 0.10) × log( Σ SFR ) + (6.82 ± 0.20). (14) We plot this relation as the black line in Figure 9. In Johnson et al. (2017), the fit was based on just four data points: M31, M51 3 , M83, and the Antennae. Here, we add six new data points that show remarkable agreement with the original fitted trend from Equation 14, with a mean and maximum resid-3 Note that the M51 Mc measurement used here was updated to the result from Messa et al. (2018) as opposed to Gieles (2009  ual of 0.24 dex and 0.81 dex, respectively. We also use the Python package linmix (Kelly 2007) to fit this new sample of observations and constrain the relation's intrinsic scatter, accounting for uncertainties in M c and Σ SFR . We derive a slope of 0.97 ± 0.13 and an intercept of 6.56 ± 0.18, which are consistent with the values fit by Johnson et al. (2017). We constrain a median intrinsic scatter of only 0.19 +0.39 −0.15 dex around the newly fit relation, which is small relative to the four dex range of both axes.
In addition to the observational results discussed above, Reina-Campos & Kruijssen (2017) present a theoretical model that extends the cluster formation efficiency (Γ) prescription from Kruijssen (2012) and predicts a maximum cluster mass based on three environmental observables: the gas surface density, the epicylic frequency, and the Toomre Q parameter. We use rotation curve and radial surface density profiles of total gas and stars compiled in Utomo et al. (2019) to compute a prediction for the PHATTER M33 cluster sample. We adopt input parameter values 4 that correspond to a characteristic galactic radius of 2.2 kpc, the median galactocentric radius of the PHATTER cluster sample.
The Reina-Campos & Kruijssen (2017) model predicts a feedback-limited maximum cluster mass for M33 of log(M/M ) = 4.06, ∼0.2 dex below our fitted M c value. Similarly, we calculate a model prediction for M31 using observable input values 5 applicable to the dominant 10 kpc "Ring-Total" region from the PHAT Γ study by Johnson et al. (2016), whose properties apply to a vast majority of that sample's clusters. The Reina-Campos & Kruijssen (2017) model predicts a feedback-limited maximum cluster mass for M31 of log(M/M ) = 3.53, 0.4 dex below the fitted M c value.
While the model appears to underestimate the maximum cluster mass in both M31 and M33, the availability of an environmentally-dependent model that is accurate within a factor of 2-3 is a fantastic step toward better understanding cluster formation.
The nearest galaxy to M33 in Σ SFR is NGC628; however there is ∼ 1 dex difference in values of log(M c /M ). Despite the similar Σ SFR , these two galaxies are quite different, and these differences may account for their different M c values. Specifically, NGC 628 has a log(M ) of 10.2 (Cook et al. 2014), nearly an order of magnitude higher mass than M33. Similarly, NGC 628's peak rotation curve velocity is ∼180 km/s (Aniyan et al. 2018), about 60% higher than in M33 (Utomo et al. 2019). Also, NGC 628's central metallicity is [O/H]=+0.1 (Berg et al. 2015), roughly 0.2 dex higher than M33. Further, as we will discuss in Section 5, the large 4 Σgas = 11.0 M pc −2 , Ω = 0.0367 Myr −1 , Toomre Q = 4.21, and φ P = 9.51 (used as part of Γ calculation; see §4.2 in Johnson et al. 2016). 5 Σgas = 10.5 M pc −2 , Ω = 0.021 Myr −1 , Toomre Q = 1.77, and φ P = 1.6. mass errors on the individual cluster measurements could lead to a significant upward bias in the measurement of M c in NGC 628 that may account for some of the observed disagreement with M33.

Effect of Individual Cluster Mass Uncertainties on the CMF
Our CMF fitting method ignores the individual cluster mass errors that results from CMD analysis. For small errors ( 0.05 dex), this simplification has been shown to minimally impact the mass function fitting (Weisz et al. 2013). However, larger errors may result in significant biases in our inferred parameters. In particular, integrated light measurements typically have larger mass errors than the CMD based mass estimates we have used here. In this section, we explore the potential biases in mass function parameters as a function of the cluster mass error distribution by using literature samples of both CMD and integrated light measurements. We expect large mass errors to lead to the overestimation of M c values, due to the steeply declining mass function, which results in more clusters being scattered to higher mass values.

Masses
We first consider data from M31 where cluster masses have been derived with both CMD and integrated light (IL) methods. The CMD masses in M31 (Johnson et al. 2016) were derived using an identical technique to that described in Section 2. The cluster ages used range between 10 and 300 Myr. The integrated light ages and masses were fit by M. Fouesneau (pri. comm.) using the method described in Fouesneau et al. (2014). The authors utilize a large number of simulated clusters that are sampled using MCMC to develop a PDF for each cluster's age, mass and extinction. We show the direct comparison of CMD masses to the integrated light masses in Figure 10. Because we use the same sample of clusters that have CMD ages, for our mass function fitting we use the same completeness function of Johnson et al. (2017) for both cluster samples.
We quantify the distributions of reported uncertainties for the two methods in Figure 11. The median 1σ errors on the cluster masses from the CMD method are just 0.03 dex. However, for the integrated light measurements the median 1σ errors are 0.14 dex. Further, the median ratio of integrated light uncertainty to CMD uncertainty is 3.3. While the difference in uncertainties between the two methods is large, in Figure 10, we do not see a significant bias in the one-to-one comparison of cluster masses, with the median difference of −0.06 dex. However, there is a fair amount of scatter around this median with a standard deviation of 0.59 dex.
The Schechter function fits to the CMD based masses is presented in Johnson et al. (2017). We use an identical method to fit the integrated light ages for the same sample pact the trend of M c with Σ SFR . It is important to note that the only difference between the two measurements is the method in which the ages and masses were derived.

Examining Mc Biases in Published Mass Function Fits
To test if cluster mass uncertainties are the cause of the observed bias in the previous section, in this section we run a series of tests fitting mock samples of clusters drawn from a Schechter function, incorporating the effects of mass uncertainty. In our first experiment we examine the effect of a fixed mass error for each cluster, in the second, we insert single high mass outliers, while the third experiment incorporates the observed error distributions and Schechter function parameters of galaxies with published CMF fits. Overall, we find that large individual cluster mass uncertainties leads to a significant positive bias for inferred M c values.
We supplement clusters from M31 with data from M51, M83 and NGC628. The integrated light cluster mass data from these catalogs are from the M83 results in Adamo et al. (2015), and the LEGUS results (Calzetti et al. 2015): Adamo et al. (2017) for NGC628, and Messa et al. (2018) for M51. A detailed description of the cluster identification and mass measurements is given in Adamo et al. (2017). Due to the distance of these galaxies (∼ 4 Mpc, ∼5× that of M31 and M33), clusters appear only partially resolved, and thus only integrated light measurements are possible. We use the Bayesian mass estimates of the clusters derived using the SLUG code (Krumholz et al. 2015). Their sample includes clusters with masses above 5000 M .
In all cases, we assume the mass uncertainty estimates to be Gaussian in log(M), where the standard deviation of the Gaussian correspond to the (84th percentile -16th percentile) /2 of the PDF. We note that while we refer to these errors as mass errors, they are actually errors in log(M/M ). We incorporate these individual cluster mass errors into simulated mass function fits.
For our first experiment, we create a synthetic distribution of 1000 clusters with α, and log(M c /M ) values of −2, and 5.0 respectively, consistent with the best fit parameters of typical literature galaxies. We fit the initial distribution as a baseline, then add a uniform Gaussian mass error to each cluster. We then refit a Schechter function to the new distribution of scattered masses. Finally, we compare the fitted M c parameter to the value of the original distribution. We run 100 trials of this experiment for a fixed mass error of 0.05 dex, and 0.15 dex, which covers the range of error distributions we observe in the literature.
For a mass error of 0.05 dex, we find a median log(M c /M ) bias of 0.01 dex above the original value, while for the 0.15 dex mass errors we get a log(M c /M ) bias of 0.09 dex. When we compare these biases to the median 1σ range for the derived log(M c /M ) parameters of 0.30 dex, these biases are not outside of normal uncertainty. This base experiment shows that for small errors, the impact on Schechter function parameters is minimal, but that larger errors produce more bias in the Schechter function truncation masses.
During this first experiment we found a handful of our simulated clusters were scattered to very high masses. To understand the effect that a single high mass outlier has on the best-fit CMF, we conduct a second experiment. We use the same setup as in our first experiment with no mass errors added. We then add a single, high mass cluster starting from the maximum mass cluster drawn from the Schechter function (which had log(M/M ) = 5.35), and steadily increasing this mass to log(M/M )=6.35. The results are shown in Figure 13. As expected, the single high mass cluster raises the M c value inferred, with very significant biases seen for maximum mass outliers more than 1 dex higher than the rest of the distribution. We run this experiment for a range of input M c values and total number of clusters that reflects values of the galaxies we discuss here (i.e M31, M33, M51, M83, NGC628) and find these results to be consistent across input parameters. This shows the significant impact cluster mass errors can have for the highest mass clusters. While a single high mass cluster can impact the inferred M c value, the majority of the observed bias is not due to single outliers. For each of the three samples presented in Figure 13, we calculate the delta BIC for Schechter parameters relative to the pure power law model. Not surprisingly, we find that with increased number statistics, the delta BIC favoring a Schechter model is slightly larger. However, regardless of number statistics, the single outlier causes negligible changes to the delta BIC, and in all cases the delta BIC is larger than 6, providing strong evidence for a Schechter function model over a power law model according to the Kass & Raftery (1995) guidelines. While the Schechter function is still preferred, for large bias, the inferred M c values are very poorly constrained with 1σ uncertainties upwards of ∼1 dex. These constraints are worse for the sample with the fewest clusters.
Our third experiment simulates the real distribution of uncertainty in published mass estimates by generating a synthetic cluster distributions from Schechter function parameters published in the literature and assigning each synthetic cluster a mass error of an associated real cluster. We then add a Gaussian random mass error to each cluster, and fit a Schechter function to the new distribution of scattered masses running an MCMC fit. Finally, we compare the fitted M c parameter to the value fitted to the original distribution. We run 100 trials per galaxy. We note that we are able to assign a real cluster mass error to a random synthetic cluster because we do not find a correlation between cluster mass and mass error in any of the cluster samples considered here.
We run this experiment for M83 and NGC628 using the Schechter function fits from Adamo et al. (2015Adamo et al. ( , 2017 (Adamo et al. 2015), and NGC628 (Adamo et al. 2017). The orange star represents the measured delta Mc between the CMD and integrated light measurements in M31 and shows that the observed difference falls within the distribution of the simulated differences. spectively, while for M51 we use the results in Messa et al. (2018). We also run the experiment on our set of CMD derived cluster properties in M31 (Johnson et al. 2017), and M33 (this work), as well as the SED integrated light cluster properties for M31 clusters discussed in Section 5.2.1. For each galaxy we report the median output log(M c ) -input log(M c ) for the 100 trials, along with the 1σ confidence interval.
We present the results of this experiment in Figure 14. We find that when we simulate realistic cluster mass uncertainties and fit the CMF, the M c value recovered is typically higher than the input value. The resulting M c bias is <1 dex in all cases, with the median bias being comparable to the reported 1σ uncertainties in M c in all cases. The median bias is highest in NGC 628 and for the integrated light measurements of M31 due to these samples larger cluster mass errors. NGC 628 also has the smallest number of clusters,  Figure 14, colored by the median bias found in Figure 14.
while the galaxy with the largest number of clusters (M51) shows a very small bias. We also find that the measured bias in log(M c ) for M31 clusters discussed in the previous subsection falls within the distribution of simulated differences. We have seen that both large average cluster mass errors and single mass outliers can significantly impact our estimate of M c . Thus, the distribution of cluster mass errors, not just the average error, is important. To quantify the error distributions of our literature cluster samples, we perform an Anderson-Darling test and find that these distributions can each be described by a log-normal distribution. A log-normal distribution is parameterized by a mean, µ, and standard deviation, σ, as represented by: . For a log-normal's given µ, σ, we calculate the predicted M c bias, shown in Figure 15. As in our first experiment, we assume 1000 cluster masses drawn from Schechter function parameters α = −2 and M c = 5.0 and fit this distribution as a baseline. Here we add cluster mass errors drawn from each log-normal distribution before refitting a Schechter function. We then take the median bias in M c from 100 trials. These values are the circular points in Figure 15. We find the model to have a clear gradient of increased bias with respect to both log-normal parameters. Thus, either having overall larger errors or having a tail of clusters with large errors can lead to significant biases in the inferred M c . We also find the model to be consistent with the biases we observe in published fits.
However, we note that our test grid does not exactly match the number statistics of the individual measurements, which could lead to the small differences we see between the in-ferred values for the individual galaxies and our model grid. We verified that the calculated bias is insensitive to the input M c .
In practice, the M c bias could be avoided by incorporating cluster mass uncertainties into the derivation of the CMF with a hierarchical Bayesian model. For large cluster samples with small uncertainties (like for the M33 CMD mass estimates presented here) this more complicated procedure is not necessary. However, it should be considered for smaller samples, or those that rely on integrated light measurements.

SUMMARY
We have used the star cluster catalog of Johnson et al. (2022) to measure the star cluster mass function within the PHATTER survey region of M33. We find strong evidence for a high-mass truncation, where the data is best represented by Schechter function parameters with power law slope α = −2.06 +0.14 −0.13 , and truncation mass log(M c /M ) = 4.24 +0.16 −0.13 . We also show that the M33 CMF is not well described by a pure power law.
We derive a Σ SFR value of −2.04 +0.16 −0.18 for M33. When we examine the relation between M c and Σ SFR , we find M33 agrees well the relation described in Johnson et al. (2017), where M c ∝ Σ SFR ∼1 . Adding additional literature estimates for a total of 10 galaxies, we find the average residual from this relation is just 0.24 dex. We fit the intrinsic scatter of the relation, and find it to be 0.19 +0.39 −0.15 dex. This data adds evidence that there is a clear correlation between the cluster truncation mass M c and Σ SFR .
Finally, we analyse the effect individual cluster mass uncertainties have on mass function measurements. We find that the truncation mass can be biased to higher values for cluster samples with high mass uncertainties. These biases are similar to the 1σ errors on M c published in the literature. We believe the next clear step in deriving reliable mass function fits is to develop a method for incorporating hierarchical Bayesian modeling into mass function derivations.