EPOCHS IX. When cosmic dawn breaks: Evidence for evolved stellar populations in 7 < 𝑧 < 12 galaxies from PEARLS GTO and public NIRCam imaging

The presence of evolved stars in high-redshift galaxies can place valuable indirect constraints on the onset of star formation in the Universe. Thus we use a combination of PEARLS GTO and public NIRCam photometric data to search for Balmer break candidate galaxies at 7 < 𝑧 < 12. We find that our Balmer break candidates at 𝑧 ∼ 10 . 5 tend to be older (115 Myr), have lower inferred [O iii] + H 𝛽 emission line equivalent widths (120 Å), have lower specific star formation rates (6 Gyr − 1 ) and redder UV slopes ( 𝛽 = − 1 . 8) than our control sample of galaxies. However, these trends all become less strong at 𝑧 ∼ 8, where the F444W filter now probes the strong rest-frame optical emission lines, thus providing additional constraints on the current star formation activity of these galaxies. These weaker trends likely reflect the bursty nature of these Epoch of Reionisation galaxies, which can lead to a disconnect between their current star formation activity and SED profiles, and their more extended star formation history. We discuss how strong emission lines, the cumulative effect of weak emission lines, dusty continua and AGN can all contribute to the photometric excess seen in the rest-frame optical, thus mimicking the signature of a Balmer break. Additional medium-band imaging will thus be essential to more robustly identify Balmer break galaxies. However, the Balmer break alone cannot serve as a definitive proxy for the stellar age of galaxies, being complexly dependent on the star formation history. Ultimately, deep NIRSpec continuum spectroscopy and MIRI imaging will provide the strongest indirect constraints on the formation era of the first galaxies in the Universe, thereby revealing when cosmic dawn breaks.


INTRODUCTION
The James Webb Space Telescope (JWST, Gardner et al. 2023) is forever changing our view of the cosmos.Among its chief scientific goals (Gardner et al. 2006) is to witness the beginning, the emergence of the First Light, through observing the formation of the very first galaxies in the Universe, shortly after the Big Bang.Indeed that first emerging primordial starlight, the break of cosmic dawn, marks a key transition point: from a Universe shrouded in darkness, the 'Cosmic Dark Ages', to a Universe bathed in light, the onset of the Epoch of Reionisation.
Our search for the first stars and galaxies has a rich history (see e.g.Dunlop 2013;Finkelstein 2016), being greatly accelerated in the Hubble + Spitzer era.With the incorporation of increasingly more sensitive imaging instrumentation, which image at increasingly longer wavelengths, the redshift frontier steadily increased, especially so with the introduction of Hubble's Advanced Camera for Surveys (ACS) (see e.g.Bunker et al. 2003;Bouwens et al. 2004;Stanway et al. 2004;Yan & Windhorst 2004) and Wide Field Camera 3 (WFC3) (see e.g.Bouwens et al. 2010;Bunker et al. 2010;Finkelstein et al. 2010;McLure et al. 2010;Oesch et al. 2010).The sensitive, near-infrared (NIR) filters on these instruments enabled galaxies at increasingly higher redshifts, which begin to dropout (due to Ly attenuation by the intergalactic medium (IGM), see e.g.Madau 1995;Inoue et al. 2014) at increasingly longer wavelengths, to be identified, thus bringing us ever closer to the formation epoch of the first galaxies at cosmic dawn.However, this search for the highest redshift galaxies in the HST + Spitzer era hit a natural redshift frontier (see e.g.Bouwens et al. 2011Bouwens et al. , 2019;;Coe et al. 2013;Ellis et al. 2013;Oesch et al. 2012Oesch et al. , 2018)).And this frontier was not because we had already reached the formation epoch of the very first galaxies in the Universe (which we hope to achieve in the JWST era), but rather simply due to running out of sensitive NIR HST filters with which to actually detect these high-redshift dropout galaxies (as well as the low resolution of Spitzer).With WFC3/F160W, this placed the long-standing redshift frontier at ∼10-11, with perhaps the best (i.e. with a likely HST grism detection of the Ly break, see Oesch et al. 2016, and a possible detection of rest-frame UV emission lines, see Jiang et al. 2021) high-redshift candidate in the HST + Spitzer era being GN-z11 (Oesch et al. 2016), a bright  ∼ 11 galaxy in the GOODS North field (though see e.g. also Zheng et al. 2012, Coe et al. 2013, Zitrin et al. 2014 for other ∼10 candidates, with the Ellis et al. 2013  = 11.9 candidate, identified from a singleband WFC3/F160W detection, now confirmed to be at  = 11.5, see Curtis-Lake et al. 2023).With JWST spectroscopy, this galaxy has now been spectroscopically-confirmed to be at  = 10.6 (Bunker et al. 2023), hosting a wealth of emission lines (Cameron et al. 2023;Maiolino et al. 2023) and a clearly detected Ly break (Tacchella et al. 2023a).
Despite the  ∼ 10 redshift frontier for identifying high-redshift candidates, valuable constraints on the formation of  > 10 galaxies were able to be placed in the HST + Spitzer era.Just as old stars in our Milky Way Galaxy enable us to wind back the cosmic clock beyond the epoch of observation, chronicling the formation history of the Galaxy billions of years into the cosmic past (see e.g.Tolstoy et al. 2009;Snaith et al. 2014;Frebel & Norris 2015;Xiang & Rix 2022), so too do the presence of old stars in high-redshift galaxies, when the Universe was very young, enable us to place valuable indirect constraints on the onset of star formation in the Universe.This indirect search for the first starlight, too, has a rich history.With, for example, the identification of a ∼3.5 Gyr old galaxy at  = 1.55 (when the Universe was 4.15 Gyr old, Dunlop et al. 1996;Spinrad et al. 1997), galaxies ∼800 Myr old at  ∼ 3.75 (when the Universe was 1.65 Gyr old, Nayyeri et al. 2014), galaxies several hundred Myr old at  ∼ 6 (when the Universe was 930 Myr old, Mobasher et al. 2005; Yan et al. 2006) and most recently with six galaxies inferred to be > 250 Myr old at  ∼ 9 (when the Universe was 540 Myr old, Laporte et al. 2021).
Indeed, in the epoch of reionisation (EoR), the most readily identifiable and accessible indicator of evolved stellar populations in the HST + Spitzer (and now JWST) bands is the Balmer break, a sudden jump in the continuum level at 3646 Å rest-frame, which manifests itself as a photometric excess in the 3.6 µm (at  ∼ 8) and 4.5 µm (at  ∼ 10) Spitzer/IRAC bands (see e.g.Roberts-Borsani et al. 2020;Laporte et al. 2021).Serving as an indicator for intermediately aged stellar populations (several hundred Myr), the Balmer break is attributable to A-type stars, which are sufficiently hot such that the majority of the hydrogen gas in their atmospheres is in the  = 2 state, but not too hot that the hydrogen gas is ionised (such as for the hotter, brighter, shorter-lived O-and B-type stars).These stars are therefore very effective at absorbing their own stellar radiation at wavelengths capable of shifting their electrons from the  = 2 state to a higher energy level ( ≥ 3), resulting in Balmer absorption lines (for an early analysis of galaxy stellar ages, see e.g.Couch & Sharples 1987), as well as radiation that is capable of ionising ( = ∞) the  = 2 electrons, resulting in a strong attenuation of light more energetic than the Balmer limit (i.e. < 3646 Å) and therefore an increased (decreased) continuum level redward (blueward) of the resulting Balmer break.We note that the 4000 Å break also serves as a stellar age indicator, though this builds up over longer (compared to the Balmer break), ∼Gyr timescales (see e.g.Kauffmann et al. 2003;Gallazzi et al. 2005), and thus is likely negligible for galaxies in the EoR.
However, galaxies in the epoch of reionisation have also been thought (and now confirmed by JWST, see e.g.Curti et al. 2023;Fujimoto et al. 2023;Matthee et al. 2023b;Sanders et al. 2023;Tacchella et al. 2023b;Trussler et al. 2023) to be highly star-forming, with the strong (i.e.high equivalent width) emission from the [O iii] + H rest-frame optical lines possibly contributing substantially to the bandpass-averaged flux density measured in broadband photometry (see e.g.Smit et al. 2014Smit et al. , 2015;;Roberts-Borsani et al. 2016;Endsley et al. 2021), boosting the observed fluxes well above the continuum level and thus mimicking the photometric excess signature of a Balmer break (Roberts-Borsani et al. 2020;Laporte et al. 2021).Thus, in order to distinguish between Balmer breaks and strong line emission from photometric data, Laporte et al. (2021) required their Balmer-break candidate galaxies to be at sufficiently high redshift ( > 9) such that the strong rest-frame optical emission lines are redshifted out of the filter exhibiting the photometric excess (IRAC 4.5 µm).In principle, this redshift cut implies the observed IRAC excess cannot be attributable to strong line emission, but is instead (more likely to be) due to a Balmer break.In this way, Laporte et al. (2021) derive stellar ages and star formation histories for their Balmer-break candidates, with a significant fraction (70%) of stellar mass forming before  = 10.
JWST, with its unique set of imaging and spectroscopic capabilities in the near-and mid-infrared, is already beginning to transform our understanding of the formation and evolution of the first galaxies in the Universe.Indeed, in terms of the direct search for high-redshift galaxies, the long-standing  ∼ 10 frontier has already been broken, with the identification of e.g. ∼ 12 candidate galaxies from the very first Near Infrared Camera (NIRCam) data (Pontoppidan et al. 2022;Treu et al. 2022;Finkelstein et al. 2023), as reported in analyses (Castellano et al. 2022;Finkelstein et al. 2022;Naidu et al. 2022;Adams et al. 2023;Atek et al. 2023;Donnan et al. 2023) just weeks after the commencement of scientific operations with JWST.Furthermore, as of the time of writing this article, the highest spectroscopically-confirmed galaxy is at  = 13.2 (Curtis-Lake et al. 2023), i.e. 320 Myr after the Big Bang, already pushing 100 Myr earlier in cosmic history than the limits of HST and Spitzer (GN-z11, Oesch et al. 2016).
The transformative capabilities of JWST are also beneficial to indirect analyses on the onset of star formation in the Universe, such as the one conducted in this work.Of course, for a given inferred stellar age, a galaxy detected at higher redshift naturally places stronger constraints on the formation era of the first galaxies.Furthermore, the great sensitivity of JWST enables the fainter and much more numerous galaxy population to be uncovered, which may display different star formation histories to their brighter counterparts.Indeed, sources that once took several hundred hours to detect with Spitzer/IRAC (see e.g.Stefanon et al. 2021a,b) and are well-beyond the capabilities of existing K-band imaging from the ground, can now be detected with JWST in a matter of minutes.Hence much larger samples, of high S/N Balmer-break candidates can now be assembled.Perhaps most importantly however, are the new photometric bands that JWST brings, which will allow for a much clearer characterisation of the physical properties of these high-redshift galaxies.This is partially due to the new F277W filter, but especially through the extensive set of medium-band filters (see e.g.Roberts-Borsani et al. 2021), which allow for the most reliable identification of a Balmer break, as the separate contributions from emission lines and the underlying continuum can be effectively disentangled (see e.g.Laporte et al. 2023) In this work we therefore make use of these excellent new capabilities of JWST, utilising a combination of NIRCam imaging from the Prime Extragalactic Areas for Reionisation and Lensing Science Guaranteed Time Observations (PEARLS GTO) programme (Windhorst et al. 2023) and public surveys (CEERS, GLASS and NGDEEP, see Bagley et al. 2023b, Treu et al. 2022and Bagley et al. 2023a), to search for Balmer-break galaxy candidates at 7 <  < 12.More specifically, we apply colour cuts, which probe the strength of the (supposed) Balmer break, to identify candidates at  ∼ 10.5 with a F444W excess, as well as (thanks to the new F277W filter) candidates at  ∼ 8 with a F356W excess.Using the full available wide-band NIRCam photometry (and the medium-band F410M when available), we aim to determine the stellar ages and star formation histories of these Balmer-break galaxies, to (in principle) place new indirect constraints on the onset of star formation in the Universe.However, as we will discuss in detail in this paper, additional medium-band imaging will be essential to reliably identify Balmer breaks and thus better constrain the star formation histories in these high-redshift galaxies.Indeed, both strong emission lines, weak emission lines, dusty continua and a (dusty) AGN can all contribute to the photometric excess seen in our wide-band NIRCam filters, thus mimicking the signature of a Balmer break.With these degeneracies in mind, we also comment on the (extremely) large stellar masses (log ( * /M ⊙ ) > 10) inferred for some red rest-frame optical Balmer-break candidate galaxies at  ∼ 8 (Labbé et al. 2023b).Finally, we discuss the limitations of the Balmer break strength as a proxy for the stellar age of a galaxy, advocating for deep follow-up NIRSpec continuum spectroscopy and MIRI imaging to derive the best possible star-formation history constraints for these galaxies.
This paper is structured as follows.In Section 2, we discuss the NIRCam data which underpins our analysis, the methodology for constructing the EPOCHS parent sample of high-redshift galaxies (see Adams et al. 2023 and Conselice et al., in prep.), as well as our use of the Bagpipes (Carnall et al. 2018) SED-fitting code to place constraints on the star formation properties of these galaxies.In Section 3, we discuss the redshift and colour cuts we apply to identify Balmer-break candidates.In Section 4, we discuss the stellar ages, star formation histories and other physical properties inferred for these high-redshift galaxies.We also comment on the magnitudes (i.e.brightnesses) and stellar masses of the Balmer-break candidates to establish what is likely driving the (extremely) large stellar masses inferred in the literature for some of these systems.In Section 5 we discuss caveats regarding the identification of Balmer breaks from photometric data, as well as the constraints these Balmer breaks can place on the presence of evolved stars in high-redshift galaxies.Finally, in Section 6 we conclude.We use the AB magnitude system throughout and assume a Planck Collaboration (2020) cosmology.

DATA
In this section we describe the NIRCam data underpinning our analysis, the methodology for constructing the EPOCHS high-redshift galaxy parent sample (see Adams et al. (2023) and Conselice et al., in prep.), as well as the procedure for using the SED-fitting code Bagpipes (Carnall et al. 2018) to derive constraints on the stellar ages, star formation histories and star formation properties of these high-redshift galaxies.

NIRCam data
We use a combination of PEARLS GTO (Windhorst et al. 2023) and public NIRCam (Rieke et al. 2005) imaging data (Treu et al. 2022;Bagley et al. 2023b,a).For full details of these datasets and the reduction process, we refer the reader to Adams et al. (2023) and Conselice et al. (in prep.).Briefly, we use a modified version of the JWST official pipeline, utilising pipeline version 1.8.2 and Calibration Reference Data System (CRDS) context jwst_0995.pmap,which includes wisp subtraction, 1/f noise correction, background subtraction and alignment onto GAIA WCS.We briefly outline the key aspects of the NIRCam data we use in our analysis below.

PEARLS imaging
The PEARLS (Prime Extragalactic Areas for Reionization and Lensing Science, Windhorst et al. 2023) GTO is a 110 hour Cycle 1 programme, which is primarily obtaining medium-deep (∼28-29 AB mag depth) NIRCam imaging across a variety of separate extragalactic fields.These fields can be divided into two distinct classes: blank fields and cluster fields.
In terms of the blank fields, we make use of the 4 pointings (spokes) obtained in the North Ecliptic Pole Time Domain Field (NEP-TDF, Jansen & Windhorst 2018), an extragalactic field that is always accessible year-round by JWST.
In terms of the cluster fields, we make use of the single pointing on El Gordo (see e.g.Diego et al. 2023;Frye et al. 2023), together with the three epoch (pointing) data on the MACS 0416 cluster.In this work, in order to avoid having to apply magnification corrections/perform an intracluster light background subtraction, we do not make use of the data from the NIRCam module centred on the galaxy clusters themselves.Instead, we solely use the data from the other available NIRCam module, which is essentially imaging the blank field parallel (3 arcmin away) to the cluster itself.

Public imaging
We also make use of NIRCam imaging from a number of public surveys.Namely, CEERS, GLASS and NGDEEP.CEERS (Bagley et al. 2023b) has 7 imaging bands (lacking F090W, compared to PEARLS), GLASS (Treu et al. 2022) has 7 imaging bands (lacking F410M) and NGDEEP (Bagley et al. 2023a) has 6 imaging bands (lacking F090W and F410M).We do not use the SMACS 0723 ERO data (Pontoppidan et al. 2022) in this work, as the lack of F115W imaging can make it challenging to properly distinguish between  ∼ 10 Balmer-break galaxies and  ∼ 8 strong [O iii] + H emitters (see e.g.Adams et al. 2023;Trussler et al. 2023), thus being particularly detrimental for our analysis.

EPOCHS high-redshift parent sample
Here we briefly describe the methodology for constructing the EPOCHS parent sample of high-redshift galaxies.For full details we refer the reader to Adams et al. (2023) and Conselice et al. (in prep.).
We use SExtractor (Bertin & Arnouts 1996) to identify sources in the F444W band.We then carry out forced aperture photometry across the various NIRCam bands, adopting a 0.32 arcsec diameter circular extraction aperture.The resulting flux densities are then aperture-corrected using corrections based off of the simulated WebbPSF (Perrin et al. 2012(Perrin et al. , 2014) ) point-spread functions for each band.
We fit the aperture-corrected 0.32 arcsec diameter aperture photometry using both the Le Phare (Arnouts et al. 1999;Ilbert et al. 2006) and EAZY (Brammer et al. 2008) SED-fitting codes.We note that for EAZY we use a combination of the default plus Larson et al. (2022) spectral templates.We impose a maximum SNR of 20 per NIRCam filter when SED-fitting the photometry, to account for remaining uncertainties in the photometric flux calibrations, as well as template mismatch between the observed data and the models.In order to increase the likelihood of selecting true high-redshift candidates, we apply the following quality cuts.
Firstly.we require the resulting redshift probability distribution functions to have an integrated probability  > 0.8 of the galaxy being at  > 5, both for Le Phare and for EAZY.Furthermore, the integrated probability within 10% of the peak photometric redshift should be greater than 0.6.Additionally, we require the peak probability amplitude in this redshift range to be at least twice as large as the probability amplitude of the secondary, low-redshift solution (if present).Secondly, we require the galaxy to (begin to) drop out in one or more of the NIRCam bands.For fields with F090W imaging, this criterion translates into a rough lower redshift limit of  ∼ 6.5.For fields without F090W imaging, but that include F115W (i.e.CEERS and NGDEEP), this criterion translates into a rough lower redshift limit of  ∼ 8.5.Thirdly, given the assigned median photometric redshift, we require the candidate to be > 5 detected in the first two bands redward of the (supposed) Lyman break.We further require the candidate to be < 3 non-detected in the bands blueward of the Lyman break.This cut increases the confidence in the location of the (supposed) Lyman break, and reduces the chance of contamination by low-redshift interlopers (which typically have weaker Balmer/4000 Å breaks).Fourth, we require a good quality of fit, i.e. with both the Le Phare and EAZY reduced  2 < 6.
Following this procedure, we obtain the EPOCHS parent sample, which contains 204  > 6.5 galaxy candidates.

SED-fitting with Bagpipes
We fit the photometric data with the SED-fitting code Bagpipes (Carnall et al. 2018) to derive constraints on the star formation histories of these galaxies.As with Le Phare and EAZY, we set a minimum 5% error (i.e. a maximum SNR of 20) per band.We use the default Bagpipes stellar and nebular templates, fitting the data assuming an exponential star-formation history.
We adopt uniform priors on the age (i.e.time elapsed since onset of star formation) 0.001 <  (Gyr) < Age Univ (), -folding time 0.1 < (Gyr) < 10, stellar mass formed 1 < log(/M ⊙ ) < 15, metallicity 0 < /Z ⊙ < 2.5, ionisation parameter −4 < log  < −2 and V-band dust attenuation 0 <  V < 2, assuming a Calzetti et al. (2000) dust attenuation law.Finally, following the methodology in our Austin et al.. (in prep.)companion paper, we adopt a Gaussian prior for the redshift, set by the mean and standard deviation of the EAZY redshift probability distribution function.The redshift is instead fixed at the known spectroscopic redshift when this is available.
By default, Bagpipes outputs constraints on e.g. the stellar mass, mass-weighted stellar age, star-formation history and specific star formation rate of the galaxy being fit.We make use of these quantities in our analysis.
Furthermore, we use the Bagpipes median-fit to derive the Balmer break strength (see definition in Section 4), [O iii] doublet + H equivalent width, absolute UV magnitude  1500 and the UV slope .We compute  1500 from the median flux density over 1400 <  rest (Å) < 1600, while  is computed by fitting a power-law over 1250 <  rest (Å) < 2600 (i.e. the same wavelength range as in Calzetti et al. 1994).The errors on these quantities, for each galaxy, are determined by repeating the computation procedure for a set of 100 randomly selected posterior fits generated by Bagpipes, and determining the (16, 84) percentiles of the resulting quantity distributions.

BALMER BREAK SELECTION
In this section we describe the principles behind our selection of Balmer-break candidates, outlining the redshift and colour cuts we apply to generate our sample.We also show example SEDs and cutouts for both Balmer break and non-Balmer-break candidate galaxies.

Redshift cuts
As described earlier, both strong line emission and Balmer breaks can give rise to elevated flux densities (i.e.'excesses') in filters covering the rest-frame optical, potentially making it difficult to distinguish between these two effects (see also Roberts-Borsani et al. 2020;Laporte et al. 2021Laporte et al. , 2023)).Thus, in order to disentangle line emission from Balmer breaks, we follow the methodology first adopted by Laporte et al. (2021), by requiring the galaxies to be at sufficiently high redshift such that the (potentially) strong line emission is redshifted out of the filter of interest (i.e. the filter in which we wish to measure the photometric excess).Provided that the photometric redshift is correct, this then implies that the photometric excess observed is (mostly) attributable to a Balmer break, possibly providing evidence for evolved stellar populations in these high-redshift galaxies (though we will discuss various caveats regarding this procedure in Section 5).
More specifically, we require both the [O iii] 5007 and H restframe optical lines to be redshifted out of the filter of interest (given the galaxy's photometric/spectroscopic redshift), as both of these In order to distinguish between line emission and Balmer breaks, we require the galaxies to be at sufficiently high redshift such that the [O iii] and H lines are redshifted out of the red filter in the filter pair defining the colour selection (left vertical dashed lines).We also require the galaxies to be at sufficiently low redshift such that the Balmer break is at most two-thirds of the way through the red filter (right vertical dashed lines).Our flat colour selection (red horizontal dashed lines) and redshift-dependent colour selection (orange) which accounts for the fact that the perceived strength of the Balmer break depends on where the break resides within the filter pair.Our strict (red squares/upward triangles), relaxed (orange diamonds/downward triangles) and control (blue circles) sample of galaxies.Crosses denote galaxies which do not satisfy our > 68% integrated redshift probability criterion.Individual error bars are shown for our strict and relaxed Balmer-break candidates.The median error bars for the full sample of galaxies are shown in the corner of each panel in black.Left panel: Our Balmer break selection at  ∼ 10.5, for galaxies which exhibit a flux density excess in the F444W filter.Right panel: Our Balmer break selection at  ∼ 8, for galaxies which exhibit a flux density excess in the F356W filter.
lines can have very large equivalent widths (up to ∼4000 Å rest-frame for [O iii], see e.g.Matthee et al. 2023b) and thus can contribute substantially to the photometric excess seen in broadband photometry.Indeed, in contrast to Laporte et al. (2021), we also require H to be redshifted out as we have found (see e.g.Trussler et al. 2022) that the H rest-frame equivalent width can in principle also be very high (∼500 Å) immediately (< 3 Myr) after a starburst, producing a photometric excess Δ (= 2.5 log 10 [1 + EW rest (1 + )/Δ], where Δ is the filter width) of up to 0.4 mag.Even if such extremely strong H line emission is not present, at the very least the H emission can contribute a non-negligible amount of flux to the filter of interest, thereby reducing the accuracy with which the strength of the Balmer break can be determined.In our analysis we will focus on galaxies at  ∼ 10.5 that exhibit a photometric excess in the F444W filter, as well as galaxies at  ∼ 8 that first exhibit a photometric excess in the F356W filter.We note that the [O iii] 5007 and H lines start to get redshifted out (i.e.where the filter transmission drops to 50% of its maximum) of the F444W filter at  = 9.0, 9.3, and out of the F356W filter at  = 7.0, 7.2, respectively.Thus the lower redshift limits for our F444W-excess and F356W-excess samples are  = 9.3 and  = 7.2, respectively.
In addition to a lower redshift limit, we also impose an upper redshift limit for our F444W-and F356W-excess samples, to account for the Balmer break itself redshifting out of the filter of interest.Indeed, just as the perceived photometric strength of the Lyman break depends on where the break resides within the wide-band filter of interest (being a maximum when the light blueward of the Lyman break fully spans a given filter and only ∼0.75 mag if it falls halfway through a filter), so too does the perceived strength of the Balmer break also depend on where the break resides within the filter.The perceived Balmer break strength (i.e. the magnitude difference be-tween the two neighbouring filters spanning the Balmer break) is at a maximum when the Balmer break is exactly between the two neighbouring filters, with the lower-lying continuum level blueward of the break fully in the bluer filter and the higher-lying continuum level redward of the break in the redder filter.The perceived Balmer break strength gradually weakens with increasing redshift as the lower-lying continuum blueward of the break begins to occupy an increasingly larger fraction of the redder filter in the filter pair, causing the bandpass-averaged flux density in this filter to decrease toward the lower continuum level.Hence the sensitivity to the Balmer break drops.To account for this, we therefore set an upper-redshift limit, which we deem to be when the Balmer break is two-thirds of the way through the filter of interest.This corresponds to  = 11.65 and  = 9.15 for the F444W and F356W filters, respectively.
Hence when investigating galaxies with a potential Balmer break in the F444W filter, we restrict the redshift range to 9.30 <  < 11.65 (i.e.our  ∼ 10.5 sample).For galaxies with a potential F356W excess, we instead restrict the redshift range to 7.20 <  < 9.15 (i.e.our  ∼ 8 sample).
In order to increase the degree of confidence in the redshifts for our selected samples (and thus minimise contamination from e.g.strong line emission), we require both the Le Phare and EAZY photometric redshift probability distributions to have an integrated probability greater than 68% over the aforementioned redshift ranges.
In addition to the potential difficulty in distinguishing between Balmer-break galaxies and strong line emitters from photometry, there can also be general challenges in telling apart high- galaxies from low- interlopers, due to confusion between the Lyman and Balmer breaks.In principle the SEDs for these systems should be distinct, with the Lyman break strength essentially being infinite, while the Balmer break is finite.In practice however, due to the finite depth of observations, the SEDs of these systems can be comparable, especially for fainter systems close to the detection limit (e.g.5 detected).Our high-redshift Balmer-break candidates should not suffer from this potential high-low redshift ambiguity, as by the nature of their selection, they all exhibit two prominent breaks in their photometry: the Lyman break and (supposedly) the Balmer break.
It is difficult to reproduce this distinct double break SED profile through a low- solution.Firstly, because this low- solution would have to have both a strong Balmer break (mimicking the Lyman break) and a second break (mimicking the high- Balmer break) seemingly powered by very strong line emission, which seems incompatible.Secondly, because the two breaks in the SED have to be at very specific wavelengths.The wavelength ratio between the Lyman- break and the Balmer break is 3, while the ratio between the Lyman- break and the [O iii]+H line complex is approximately 4. If the dual break SED is attributable to a low- Balmer break interloper, then its second break must be attributable to a prominent feature either at 1.09 µm or 1.46 µm rest-frame, of which there are none.Hence we can be confident in the high-redshift nature of our Balmer-break candidates.

Colour cuts
In addition to a redshift cut, we also apply a colour cut.Since we are interested in identifying potential Balmer break systems, we aim to find galaxies which have an excess in F444W (for  ∼ 10.5 galaxies) or an excess in F356W (for  ∼ 8 galaxies).This excess requirement translates into a sufficiently red F356W−F444W (=  34 ) or F277W−F356W (=  23 ) colour .We consider two different colours cuts in our analysis.
Firstly, we adopt a flat colour cut, requiring the colour  > 0.4 mag (i.e. 34 = F356W−F444W > 0.4 mag at  ∼ 10.5, or  23 = F277W−F356W > 0.4 mag at  ∼ 8).The galaxies that satisfy this colour cut will be referred to as our strict sample.The motivation behind adopting the 0.4 mag threshold is firstly to reduce the chance of the colour selection being driven purely by noise, as if e.g. both the F356W and F444W fluxes are measured at 5, the magnitude uncertainty in each filter is ≈ 0.2 mag, so the F356W flux would have to be scattered down by 1 and the F444W flux would have to be scattered up by 1 to spuriously satisfy our colour requirement.Secondly, this threshold also reduces the chances of the red colour being driven by something other than a Balmer break (e.g.dust or cumulative weak line emission, see Section 5).However, we acknowledge that the colour threshold is somewhat arbitrary.As we will discuss in more detail in Section 5, the greater the colour threshold that is adopted, typically the greater the age of the galaxies that are selected.
Secondly, we adopt a redshift-dependent colour cut, that accounts for the fact that the perceived strength of the Balmer break depends on where the break resides within the filter pair comprising the colour selection.The galaxies that satisfy this colour cut (assuming the EAZY redshift), but not the flat colour cut, will be referred to as our relaxed sample.
The redshift-dependent colour threshold was determined by assuming that the Balmer break in the galaxy SED is given by a step function, which is comprised of a flat (in terms of   ) continuum blueward of the break, and another flat continuum that is 0.4 mags brighter redward of the break.In other words, we assume the Balmer break has an intrinsic (i.e.rest-frame) Balmer break strength of 0.4 mag.We then determine, as a function of redshift, the perceived strength (i.e.observed-frame) of this Balmer break, namely the colour  in F356W−F444W (for  ∼ 10.5) and F277W−F356W (for  ∼ 8).As shown in Fig. 1, this redshift-dependent colour selection (orange) thus matches our flat colour selection (red) when the Balmer break is exactly in between the F356W and F444W filters (i.e. = 9.8) or between the F277W and F356W filters (i.e. = 7.6).However, the colour requirement becomes smaller as the redshift deviates from these values.Indeed, the main drawback of our flat colour selection is that a given observed colour  demands an increasingly large intrinsic Balmer break strength as one deviates from the aforementioned redshifts.For reference, the  = 0.4 mag criterion requires an intrinsic Balmer break strength of 0.7 mags when the break is halfway through either the blue or red filter.The drawback of our redshift-dependent colour cut is that the smaller colour requirement means that we are more likely to select spurious objects.It is for this reason that we set a minimum colour threshold  of 0.2 mags.
Finally, galaxies that satisfy neither our flat colour selection, nor our redshift-dependent colour selection, but are in the appropriate redshift range (i.e.9.30 <  < 11.65 or 7.20 <  < 9.15 for the  ∼ 10.5 and  ∼ 8 samples respectively), will be referred to as the control sample.In principle these should be galaxies with weak Balmer breaks and thus, on average, younger stellar populations.

Samples
We show our redshift-and colour-selections applied to our set of  ∼ 10.5 and  ∼ 8 galaxies in the left and right panels of Fig. 1, respectively.The vertical dashed lines represent the lower and upper limits of our redshift selection.The flat red horizontal dashed line represents our fixed colour cut, while the orange curve represents our redshift-dependent colour criterion.The red, orange and blue data points corresponds to our strict, relaxed and control samples respectively.The F444W-excess (F356W-excess) Balmer-break candidates are shown in squares and diamonds (upward and downward triangles), control galaxies are shown as circles.The points denoted by crosses fail to satisfy our integrated redshift probability > 68% criterion.
We show example SEDs for strict, relaxed and control galaxies in the top (left and right), bottom-left and bottom-right panels of Fig. 2 (for our  ∼ 10.5 sample) and Fig. 3 (for our  ∼ 8 sample), respectively.For each object, we display the NIRCam photometry in the top subpanel (in black), together with the best-fit Le Phare (light blue) and EAZY SEDs (green), as well as the median-fit Bagpipes SED (magenta, which has been fit assuming either the EAZY redshift PDF as a prior, or fixed at the known spectroscopic redshift).In the bottom subpanels, we show the position of these galaxies in our colour-redshift selection plane, as well as the Le Phare and EAZY redshift PDFs.
We show the photometry and best-fitting SEDs for the  ∼ 10.5 Balmer-break candidate displaying the largest F444W excess in our sample in Fig. 4. As can be seen from Fig. 4, our standard model fits (blue, green, purple) provide a reasonable match to the wide-band NIRCam photometry.However, the exception lies with the mediumband F410M filter, whose bandpass-averaged flux density is strongly overpredicted by the models.Owing to the relatively low SNR measurement in this band (4.3), this may perhaps be due to noise, though this is still somewhat unlikely, as the F410M measurement deviates from the model-predicted value by >3, which has a likelihood of only ∼0.1% of occurring.
The alternative explanation (missed by our Le Phare and EAZY    fit (shown in yellow), with the redshift held fixed at 7.88.We see from Fig. 4 that this yields a satisfactory fit to the observed data, with notably better agreement with the (somewhat noisy) F410M measurement.
As will be advocated throughout this paper, additional mediumband photometry is essential to reliably distinguish between photometric excesses driven by Balmer breaks and strong line emission, and hence is needed to definitively establish the nature of the source in Fig. 4. Thus strong photometric excesses should be interpreted with particular caution in high-redshift galaxies with only wideband photometry.Such a system may be exhibiting a very prominent Balmer break and thus harbouring particularly old stellar popula-tions, thus (in principle) placing valuable constraints on the onset of cosmic star formation.At the same time, the 0.88 mag excess in Fig. 4 demands a stellar population age of >150 Myr (see Fig. 14), which is a substantial fraction of the age of the Universe at that epoch (∼450 Myr) and thus perhaps unlikely.On the other hand, strong line emission is ubiquitous in the EoR, with the 0.88 mag excess requiring a combined [O iii]+H rest-frame equivalent width of ∼1400 Å, a condition readily met in these high-redshift systems (e.g.

RESULTS
In this section we aim to establish how effective our Balmer break colour selections are at preferentially identifying galaxies with evolved stellar populations.We investigate and compare the physical properties of our Balmer-break candidates to those of the control sample, to establish whether there are any inherent differences between these two potentially distinct populations of galaxies.Additionally, we also discuss the qualitative constraints the inferred star formation histories for these Balmer-break candidates place on the onset of star formation in the Universe.
We note that when we refer to 'old' stellar populations or 'old' stellar ages in this analysis, this is with respect to the age of the Universe at the epoch of observation.Since the Universe is still very young at these high redshifts (e.g.440 Myr at  = 10.5),we consider galaxies with mass-weighted stellar ages of ∼100 Myr to be old in the following discussion.Of course, such galaxy ages would be considered to be very young at e.g. = 0, where ages of several Gyr are the norm (see e.g.Gallazzi et al. 2005Gallazzi et al. , 2021;;Trussler et al. 2020Trussler et al. , 2021)).

Magnitudes and stellar masses
We begin by comparing the magnitudes (i.e.brightnesses) and stellar masses of our Balmer-break candidates against the control sample to determine whether there are any notable differences between the two, as well as to better understand the origin of the potentially (very) large stellar masses (log ( * /M ⊙ ) > 10) inferred for some red rest-frame optical galaxies in the EoR (see e.g.Labbé et al. 2023b).
We show the rest-frame UV (at 1500 Å) absolute magnitudes (top panels), absolute magnitudes immediately blueward of the supposed Balmer break  3500 (middle) and apparent magnitudes in the restframe optical (i.e.redward of the Balmer break, bottom) against the Bagpipes-derived stellar masses in Fig. 5, with the  ∼ 10.5 and  ∼ 8 samples shown in the left and right panels, respectively.We have chosen to use apparent magnitudes rather than absolute magnitudes for the rest-frame optical, as we found that it can be difficult to accurately assess the continuum level redward of the Balmer break given our available data.As will be discussed later, this difficulty arises for the  ∼ 10.5 sample simply due to the lack of photometry redward of the break.However, even for the  ∼ 8 sample, with the F410M and F444W filters now probing beyond the Balmer break, there can be a degeneracy, with Balmer breaks, Balmer breaks plus line emission, and pure line emission sometimes being comparably compatible with the data.Hence a conclusive assessment of the continuum level can- not be made, as we cannot satisfactorily discriminate between these different scenarios without additional medium-band NIRCam and longer wavelength MIRI photometry.Note that we show the 'total' magnitudes and 'total' stellar masses, obtained by converting the  1500 and  3500 aperture magnitudes and stellar masses derived in the SED fitting process to 'total' (i.e.AUTO) magnitudes/masses.For this, we scale the inferred luminosity/mass by the ratio of the fluxes measured by the Kron elliptical aperture and the 0.32 arcsec diameter circular aperture in the appropriate filter.Namely, the F200W (F150W), F356W (F277W) and F444W (F356W) filters, for  1500 ,  3500 and the stellar mass, respectively, for the  ∼ 10.5 ( ∼ 8) sample.
From an inspection of Fig. 5, the strict + relaxed Balmer-break candidates and the control sample appear to be comparably distributed in terms of rest-frame UV magnitudes, as well as magnitudes blueward and redward of the Balmer break.Thus our Balmer-break candidates likely have comparable intrinsic brightnesses to the overall galaxy population at these redshifts, rather than being e.g.systematically brighter.For reference, we also show the magnitudes and stellar masses (black hexagons) for the 7.4 <  < 9.1 candidate massive galaxies from Labbé et al. (2023b), in the right panel of Fig. 5. Whilst these objects are comparably bright to our full  ∼ 8 sample in the rest-frame UV (as traced by  1500 ), they do tend to be slightly brighter immediately blueward of the Balmer break (as traced by  3500 ), and much brighter redward of the Balmer break (as traced by  F356W ).This is not too surprising, given the Labbé et al. (2023b) criteria for selecting these massive candidates.
Now, what we do find for our own sample, is that the Balmerbreak candidates do tend to have larger stellar masses than the control sample for a given brightness.Thus the larger stellar masses found for our Balmer-break candidates are an outcome of the SED-fitting process, rather than being due to their inherent brightness.As we only include the light contributions from stellar populations and nebular emission in our Bagpipes fitting process (i.e.excluding an AGN component), these larger stellar masses are attributable to requiring an older stellar population component, with higher mass-to-light ratios, to best reproduce the data.Although the Labbé et al. (2023b) sample consists of brighter galaxies than our sample, it is still possible that the large stellar masses inferred for their systems (shown in Fig. 5) are perhaps also still an outcome of the SED-fitting process, rather than being due to (and thus demanded by) the inherent brightness of these galaxies.
As we will discuss in more detail in Section 5, strong emission lines (i.e.[O iii] and H), the cumulative effect of weaker emission lines, a dusty continuum and a (dusty) AGN can all contribute to the red excess seen in the supposed Balmer-break candidates.All of these effects alleviate the need for a substantially old (∼100 Myr) stellar population, i.e. reduce the inferred stellar mass-to-light ratio, thus bringing the resulting (possibly extremely large) stellar masses inferred in Labbé et al. (2023b) down to be more in line with the expectations from simulations of galaxy formation (see e.g.Lovell et al. 2023).

Ages and star formation histories
In order to establish how well our Balmer break colour selections are at identifying galaxies with evolved stellar populations, we show the measured F444W excess (i.e.F356W−F444W for our  ∼ 10.5 sample, top-left) and F356W excess (i.e.F277W−F356W for our  ∼ 8 sample, top-and bottom-right) against the inferred massweighted stellar ages (derived using Bagpipes, see Section 2) in Fig. 6.As before, the strict, relaxed and control samples are shown in red, orange and blue, respectively.We also show the median excess colour and age for each sample, with the endpoints of the error bars representing the 16-84 percentile range.Note that the median and percentile range for the relaxed sample represent the median and percentile range for the combined strict and relaxed samples (rather than just the relaxed sample alone).
From the left panel of Fig. 6, we see that the observed F444W excess appears to be a good predictor of the mass-weighted stellar age of the  ∼ 10.5 galaxies, with the inferred age generally increasing with increasing F444W excess.Indeed, our Balmer-break candidates have a median inferred age of 115 Myr versus the 50 Myr of the control sample.From this panel alone, this result suggests that the observed strength of (what we believe to be) a Balmer break places tight constraints on the age and star-formation history of a galaxy.We note that the four control galaxies with ages > 90 Myr all narrowly miss our redshift-dependent colour cut.
However, from the top-right panel of Fig. 6, where we now show the F356W excess for our  ∼ 8 sample, we see that this potential connection between the (supposed) observed Balmer break strength and galaxy age is much less clear.Galaxies with a strong F356W excess (our strict sample) do not necessarily have old stellar ages.Additionally, galaxies with a weak F356W excess (our control sample) can still have old stellar ages.What is true, however, is that our flat colour selection (red) and redshift-dependent colour selection (orange) do indeed, on average, select older galaxies (70, 100 Myr Bottom panels: Rest-frame optical (i.e.redward of the Balmer break) apparent magnitudes, using F444W (left) and F356W (right).Our Balmer-break candidates exhibit comparable intrinsic brightnesses to the control sample, but do tend to have larger inferred stellar masses for a given brightness.Thus the larger stellar masses found for our Balmer-break candidates are an outcome of the SED-fitting process (and the assumptions therein), rather than being due to their inherent brightness.As strong emission lines, the cumulative effect of weaker emission lines, a dusty continuum and a (dusty) AGN can all contribute to the red excess seen in these Balmer-break candidates, there is certainly room for bringing the (sometimes extremely large) inferred stellar masses in Labbé et al. (2023b) down to be more in line with the expectations from simulations of galaxy formation.
vs. the 30 Myr for the control sample).Though this trend is less strong than what was seen at  ∼ 10.5.
The key difference, from the observational perspective, for our  ∼ 8 sample is that, being at lower redshift, the F410M and F444W filters now probe at longer rest-frame wavelengths than for our  ∼ 10.5 sample.More specifically, these filters now probe the rest-frame optical beyond the Balmer break (as F356W probes the break itself), crucially covering the rest-frame optical emission lines (such as [O iii] and H).As the strength (i.e.equivalent width) of these emission lines is closely connected to recent star formation (or shocks and AGN activity, see e.g.Baldwin et al. 1981;Kewley et al. 2019), the F410M and F444W filters thus provide valuable additional constraints on the current star formation rates in these galaxies, thus likely refining the inferred star formation histories and stellar ages (though we do note that there can still be degeneracies, with Balmer breaks, breaks plus line emission and pure line emission sometimes being comparably compatible with the data).
To establish whether this is indeed the case, or whether the different results at  ∼ 10.5 and  ∼ 8 are driven by a fundamental difference between the star formation histories of galaxies at these two redshifts, we repeat our Bagpipes SED-fitting procedure for our  ∼ 8 sample.We now intentionally omit the F410M and F444W photometry, with F356W (probing the Balmer break) thus being the longest available filter, hence mimicking the rest-frame coverage for our  ∼ 10.5 .Photometric excess (a proxy for the Balmer break strength) against the inferred mass-weighted stellar ages of the galaxies in our sample.Colour coding and symbols are the same as in Fig. 1.The median excess and age are also shown for each sample, with the endpoints of the error bars representing the 16-84 percentile range.The median and percentile range for the relaxed sample combines both the relaxed and strict sample galaxies.The red dashed vertical line denotes our flat colour threshold for selecting strict Balmer-break candidates.Top-left panel: F444W excess for our  ∼ 10.5 sample of galaxies.Top-right panel: F356W excess for our  ∼ 8 sample of galaxies.Note the weaker correlation between the observed excess and the inferred stellar age than for the  ∼ 10.5 sample.Bottom-right panel: F356W excess for our  ∼ 8 sample of galaxies, but now omitting the F410M and F444W photometry in the SED-fitting with Bagpipes (mimicking the rest-frame coverage for the  ∼ 10.5 sample).A stronger correlation between observed excess and age emerges, as with the  ∼ 10.5 sample.Thus, longer wavelength photometry, which probes the (stronger) emission lines in the rest-frame optical beyond the Balmer break, are likely essential to derive more accurate stellar ages and star formation histories for galaxies (such as MIRI F560W for our  ∼ 10.5 sample).
sample (with F444W probing the Balmer break).We show the results of this revised SED-fitting procedure in the bottom-right panel of Fig. 6.We find, similar to our  ∼ 10.5 analysis, that the excess correlates more strongly with the inferred stellar age.This is in contrast to the weaker correlation seen in the middle panel where the added constraints from F410M and F444W are included.Hence longer wavelength photometry, which probes the (stronger) emission lines in the rest-frame optical beyond the Balmer break, are likely essential to derive more accurate stellar ages and star formation histories for galaxies.Thus, for the highest redshift galaxies (such as our  ∼ 10.5 sample), additional MIRI imaging (e.g.F560W) will be crucial to place the tightest possible indirect constraints on the onset of star formation in the Universe.Now, the weakened connection between the (supposed) observed Balmer break strength (i.e.F356W excess) and the inferred galaxy stellar age seen in the top-right panel of Fig. 6 should not necessarily be surprising.As has already been known for dwarf galaxies in the local Universe (see e.g.Emami et al. 2019), and is now becoming more definitely established at high-redshift with JWST, the star formation histories of galaxies in the epoch of reionisation are likely bursty (see e.g.Endsley et al. 2023;Looser et al. 2023a,b).Indeed, galaxies in this era can likely display both elevated star formation activity and a (temporary) cessation (Looser et al. 2023a) of star formation.Thus there can be a disconnect between the current star formation activity (and hence key SED properties) and the extended star formation histories of these galaxies.As we will discuss in more detail in Section 5, it is for this reason that a (currently observed) Balmer break need not necessarily imply a particularly old galaxy (see also Wilkins et al. 2023).Likewise, a galaxy harbouring evolved stars may simply have its older stars outshone (see e.g.Narayanan et al. 2023) by the brighter younger stars (thus resulting in a weak Balmer break, if any).
With the lack of MIRI F560W constraints for our  ∼ 10.5 sample in mind, together with the possible degeneracy between line emis-  We display both the median (solid) and 16, 84 percentiles (dotted) SFH percentiles, determined by taking the appropriate percentile per lookback time bin of the normalised SFH for each galaxy in the sample.The age of the Universe on the upper axis assumes the galaxy is at  = 10.5 (left) and  = 8 (right).We stress that we just focus on the qualitative aspects of the inferred star formation histories, due to the insufficient constraining power from the current data, as well as the dependency on the assumed SFH parameterisation and priors on the associated parameters.Ultimately, deep continuum spectroscopy with the NIRSpec PRISM, together with MIRI imaging, will be needed to provide the best possible indirect constraints on the onset of star formation in the Universe.
sion and breaks even for our  ∼ 8 sample, we now turn to the inferred star formation histories for these high-redshift galaxies.In principle, the presence of evolved stars (inferred through their unique spectral signatures) in high-redshift galaxies observed shortly after the Big Bang places valuable constraints on the formation history of the first galaxies in the Universe.Indeed, taken at face value, the mass-weighted stellar ages of ∼150 Myr and ∼275 Myr at  ∼ 10.5 (age Universe = 440 Myr) and  ∼ 8 (age Universe = 640 Myr) correspond to redshifts of  = 14.2 and  = 12, respectively.In practice however, due to the aforementioned caveats, together with the fact that the star formation histories derived likely depend on the SFH parameterisation assumed in the SED-fitting process, as well as the priors placed on these parameters, strict caution should be applied in interpreting these results.
Hence we just focus on the qualitative aspects of the inferred star formation histories, which we show in Fig. 7.For each sample, we determine the normalised (to the current SFR) star-formation history (inferred using the SED-fitting code Bagpipes), by taking the median (or 16, 84 percentile) per lookback time bin of the normalised SFH for each galaxy in the sample.In line with their older inferred mass-weighted stellar ages, our strict and relaxed Balmer-break candidates tend to have more extended star formation histories than the control sample.As will be discussed in more detail in Section 5, additional NIRCam medium-band imaging will be essential to better disentangle the effects of line emission and Balmer breaks, as well as to discriminate between evolved stellar populations and dusty galaxies/AGN.Ultimately, deep continuum spectroscopy with the NIRSpec ∼100 PRISM (the NIRISS and NIRCam grisms lack the necessary continuum sensitivity), together with MIRI imaging, will be needed to provide the best possible indirect constraints on the onset of star formation in the Universe.

Further star formation properties
We now turn to further inferred SF-related properties for our  ∼ 10.5 and  ∼ 8 samples.Indeed, prior we examined the observational proxy for the Balmer break strength, namely the colours  34 = F356W−F444W and  23 = F277W−F356W.However, these measured colours are subject to observational errors, with the perceived colour also being dependent on the redshift of the source, and can also be affected by the flux from weak emission lines (which we will discuss in more detail in Section 5).Thus, here we instead focus on the Balmer break strength inferred from a full SED fit to the data, which should in principle account for these effects.
In Fig. 8, we show the inferred Balmer break strength (in magnitudes) against the combined rest-frame equivalent width of the [O iii] 4959, 5007 doublet and H (inferred from SED fitting).
Here we define the Balmer break strength to be the ratio of the median flux density (in terms of   ) between 4000-4300 Å (restframe) and 3400-3600 Å, i.e. redward and blueward of the Balmer break.Thus our definition of the Balmer break strength spans both the Balmer break itself (at 3646 Å rest-frame), as well as the wellknown 4000 Å break.We have chosen to use this broad wavelength range to mimic the rest-frame wavelength ranges typically probed by e.g. the (F356W, F444W) or (F277W, F356W) filter pairs in our observational analysis.
As can be seen from the left panel of Fig. 8, our strict and relaxed Balmer break colour selection does tend to select galaxies with stronger inferred Balmer break strengths at  ∼ 10.5 (with a median of 0.35, 0.32 mag vs. 0.09 mag for the control sample).From the SED-fitting these galaxies are expected to have weaker (i.e.lower equivalent width, with a 120 Å vs. a 160 Å median) line emission though note that the [O iii] and H emission lines are (by design of our redshift cut) redshifted out of the F444W filter, so no direct/reliable emission line constraints are in principle available from the photometry.Indeed, at  ∼ 8 (right panel), where F410M and F444W now ] and H lines are redshifted out of the NIRCam range for our  ∼ 10.5 sample, with the inferred equivalent widths therefore being based off an extrapolation with no direct/reliable emission line constraints in principle being available from the photometry.Our strict and relaxed colour selections do tend to select galaxies with larger Balmer breaks, though these galaxies can have comparable emission line equivalent widths to the control sample.Furthermore, galaxies that do not meet our colour selection criteria are still found to have large (>0.4 mag) Balmer breaks.Hence, ideally the full available photometry should be used to identify galaxies with strong Balmer breaks/evolved stellar populations (as opposed to the single colour cut in this work), though additional medium-band NIRCam imaging will be essential to reliably make such inferences from the photometric data (see Section 5).do provide direct constraints on the emission line strengths, we find that our strict and relaxed Balmer-break candidates, together with the control sample, have comparable inferred line equivalent widths.On average, our strict and relaxed samples do tend to exhibit stronger (0.33, 0.35 mag vs. 0.11 mag) inferred Balmer breaks (though again we stress that with the available photometry, there can still be degeneracy between the relative contributions from emission lines and the Balmer break).We note that there are also control galaxies which ex-hibit large Balmer breaks (>0.4 mag).These objects either narrowly miss our redshift-dependent colour cut, or are near the upper edge of our redshift cut, where the sensitivity (through the F277W−F356W colour) to the Balmer break drops as the break gets redshifted out of the F356W filter.
This therefore highlights the need to, ideally, use the full available photometry to identify galaxies with strong Balmer breaks and evolved stellar populations (as opposed to a single colour cut as in this and  ∼ 8 samples, respectively).Bottom panels: Rest-frame optical sizes (F444W and F356W for our  ∼ 10.5 and  ∼ 8 samples, respectively).We find that the strict+relaxed Balmer-break candidates are roughly comparably distributed in effective radii to the control galaxies, suggesting that the typical relative contribution of AGN emission (if any) is likely similar in these systems.Some of our Balmer-break candidates (as well as control galaxies), particularly in the rest-frame optical, have small effective radii that are only slightly above the filter half width half maximum (dashed horizontal lines).Thus it is plausible that these systems may possibly harbour an AGN component, which have been suggested to potentially contribute to the flux excess seen in the longer wavelength NIRCam bands.
work).However, as will be discussed in more detail in Section 5, additional medium-band NIRCam imaging will be essential to reliably make such inferences from the photometric data.
Furthermore, we note that, upon omitting the F410M and F444W photometry (as before) for the  ∼ 8 sample and refitting with Bagpipes, we obtain a strong anticorrelation between Balmer break strength and the emission line equivalent width (as for the  ∼ 10.5 sample, not shown).
As mentioned in Section 1, the 4000 Å break also serves as a stellar age indicator, typically building up over ∼Gyr timescales (compared to the ∼several hundred Myr timescales for the Balmer break).Hence the 4000 Å break strength is expected to be small for the high-redshift, epoch of reionisation galaxies considered in this work.Indeed, we find that the median 4000 Å break strength for our  ∼ 10.5 sample of strict Balmer-break candidates is only 0.14 mag and 0.04 mag when using the original (Bruzual A. 1983) and narrow (Balogh et al. 1999) definitions of the 4000 Å break, respectively.These are considerably smaller than our  > 0.4 mag requirement for selecting strict Balmer-break candidates.Hence we advise against using the 4000 Å break strength (alone) as an age indicator for EoR galaxies, due to it being so small (at these early epochs) and therefore difficult to measure both through spectroscopy and through photometry.Rather, we instead advocate for using the stronger Balmer break, or the combined Balmer + 4000 Å break, as a stellar age indicator, though see Section 5.4 for caveats regarding this.
Finally, we briefly comment on the inferred specific star formation rates (sSFR, using the average SFR over the past 10 Myr) and UV slopes  for these galaxies, which we show in Fig. 9. Again, although we find a clear connection between the sSFR and  UV slope at  ∼ 10.5, with our strict and relaxed Balmer-break candidates tending to have lower specific star formation rates (6 Gyr −1 vs. 12 Gyr −1 ) and redder  slopes (−1.7, −1.9 vs. −2.2),we find that the  trends are less strong at  ∼ 8.

Galaxy sizes
Here we comment on the apparent sizes (measured using the SExtractor FLUX_RADIUS parameter) of our high-redshift galaxies, aiming to establish whether any of our Balmer-break candidates have compact morphologies compatible with a potentially dominant AGN component (thus disfavouring the Balmer break scenario).We show the rest-frame ultraviolet effective radii (probed by F200W for  ∼ 10.5 and F150W for  ∼ 8, top panels), as well as the rest-frame optical effective radii (probed by F444W and F356W respectively, bottom panels) in Fig. 10.The horizontal dashed lines denote the half width half maximum of the filters.For both our  ∼ 10.5 and  ∼ 8 samples, we find that the strict+relaxed Balmer-break candidates are roughly comparably distributed in effective radii to the control galaxies, rather than being e.g.preferentially more compact if a dominant AGN component was often present in these systems.Thus, the typical relative contribution of AGN emission (if any) is likely similar for the Balmer-break candidates and control galaxies.We do note that some of our Balmer-break candidates (as well as control galaxies), particularly in the rest-frame optical, have small effective radii that are only slightly above the filter half width half maximum.Thus it is plausible that these systems may potentially harbour an AGN component, which have been suggested (see e.g.Furtak et al. 2022;Barro et al. 2023;Kocevski et al. 2023;Labbe et al. 2023a;Matthee et al. 2023a) to contribute to the flux excess seen in the longer wavelength NIRCam bands for some red-looking galaxies in other samples (Labbé et al. 2023b).

RELIABLY IDENTIFYING BALMER BREAKS AND THEIR CONSTRAINING POWER
In this section we discuss caveats regarding the identification of Balmer breaks from photometric data, as well as the constraints these Balmer breaks can place on the presence of evolved stellar populations in high-redshift galaxies.

Strong line emission
The basis behind the lower limit for our redshift cut is to ensure that the strong rest-frame optical lines [O iii] and H are both redshifted out of the filter of interest, thus increasing the likelihood that the photometric excess seen is attributable to a Balmer break.However as the bulk of our redshifts are based off of photometric redshifts, there can be some uncertainty, especially with the (limited) available photometry, that this is indeed the case.The two defining features in the SEDs of these high-redshift Balmer-break candidates are the Lyman break in the shorter wavelength bands, together with the (supposed) Balmer break excess seen in the F444W or F356W bands.However, while the perceived photometric strength and location of the Lyman break depend on the source redshift (and depth), galaxies can also begin to dropout in these same filters due to strong dust attenuation in the rest-frame UV.In order to reproduce the same observed dropout strength, dropouts that are driven purely by IGM Ly attenuation will therefore be assigned a higher photometric redshift than galaxies that dropout by a combination of IGM and dust attenuation.If there is insufficient flexibility in the dust models used in the SED-fitting process to mimic the dust attenuation laws seen in these high-redshift galaxies, this could cause the redshift to be overestimated.
In addition, insufficient flexibility in the emission line templates can also cause the observed red F444W (or F356W) excess to be preferentially fit by a higher-redshift Balmer break, causing the lowerredshift [O iii]+ H solution to be overlooked entirely (thus yielding close to zero probability in the resulting photometric redshift PDFs).We found that this could be the case with e.g. the SED-fitting code SEDs and photometry for two galaxies from the FLARES simulations.The magnitude difference  356 −   on the vertical axis is the magnitude offset relative to F356W, to highlight the strength of the (supposed) Balmer break.We show both a  = 8 strong line-emitting galaxy (blue) and a  = 10 Balmer-break galaxy (red and orange).These two galaxies were selected to have very similar NIRCam wide-band photometry (dark blue and red circles, below 5 µm), hence it would be difficult to discriminate between these two scenarios, given only this data.However, with the addition of NIRCam medium-band photometry (light blue and orange squares), one can readily distinguish the two, as the signature of the  = 10 Balmer break is revealed through the consistently elevated flux density in the medium-band filters, while the medium-band photometry for the  = 8 line emitter varies between being high or low depending on whether it is boosted by line emission or just tracing the continuum level.Furthermore, MIRI F560W imaging (dark blue and red circles, above 5 µm) can in principle also help to discriminate between Balmer breaks and line emission, though can sometimes be inconclusive (see text).
Le Phare, where the emission line strengths (i.e.equivalent widths) in the models are often too low to reproduce the observed data, thus heavily favouring the Balmer break solution (see also Adams et al. 2023;Trussler et al. 2023).
To further highlight the potential difficulty in distinguishing between line emitters and Balmer-break galaxies (given the available data), we show the SEDs and photometry for two galaxies from the FLARES simulations in Fig. 11.The FLARES simulations (Lovell et al. 2021;Vĳayan et al. 2021;Wilkins et al. 2023) are a suite of hydrodynamic simulations of galaxy formation and evolution, which use the physics of EAGLE (Crain et al. 2015;Schaye et al. 2015) and self-consistently models nebular line and continuum emission, as well as the effect of the 3D star-dust geometry on the overall attenuation.These two galaxies were selected to have very similar NIRCam wide-band photometry (dark blue and red circles below 5 µm), yet starkly different origins for the F444W excess, namely  = 8 [O iii]+ H line emission vs. a  = 10 Balmer break, respectively.Note that the magnitude difference Δ on the vertical axis refers to the magnitude offset relative to F356W, i.e.Δ =  356 −   .Thus it would in principle be difficult to distinguish between these two scenarios, given only the NIRCam wide-band data.The only possible disciminating factor would be the flux in the F115W filter, which is non-zero for the  = 8 line emitter, but is consistent with zero for the  = 10 galaxy.However, if the observed data has limited depth, or the rest-frame UV dust attenuation is particularly strong in this band, then the SEDs in practice appear to be identical.
The key differentiating factor between these two SEDs is revealed with the addition of medium-band NIRCam photometry (shown in light blue and orange squares): F410M, F430M, F460M and F480M.
Here the signature of the  = 10 Balmer-break galaxy is revealed through a consistently elevated flux density across the four mediumband filters, while the nature of the  = 8 line emitter becomes apparent through the high and low flux densities measured by the medium-band filters, which are boosted by line emission and tracing the continuum level, respectively.MIRI F560W imaging (the dark blue and red circles above 5 µm) can in principle also help to discriminate between Balmer breaks and line emitters.However, the main caveat here is that this is not guaranteed.Depending on the ionisation parameter  and metallicity, which control the ratio between the [O iii] (in F444W) and H (in F560W) equivalent width, the H-boosted flux density in the F560W filter can be comparable to that in the [O iii]-boosted F444W, thus mimicking the signature of a Balmer break.
Thus medium-band imaging with all four of the F410M, F430M, F460M and F480M filters, which collectively span the entirety of the F444W filter range, is essential to confidently distinguish between Balmer breaks and strong line emission.Moreover, the inclusion of these medium-bands enables one to disentangle the contributions from line emission and continuum breaks, thus alleviating the need to apply redshift cuts (such as those in this work) to isolate the contributions from Balmer breaks to broadband photometry (see also Laporte et al. 2023).Indeed F410M imaging has already been taken for many of the fields studied in this work.Additional imaging with all three of F430M, F460M and F480M (such as for the JEMS survey, Williams et al. 2023) would take roughly half the time per pointing than MIRI F560W.This shorter exposure time, combined with the fact that the NIRCam imaging footprint (∼9.7 arcmin 2 ) is roughly 4× the MIRI footprint (∼2.35 arcmin 2 ) solidifies follow-up medium-band imaging as the ideal next step for identifying Balmerbreak galaxies with JWST.
Indeed, we note that follow-up spectroscopy with the NIRSpec G395M grating could very rapidly (i.e. with an exposure time comparable to a single medium-band NIRCam filter) identify strong [O iii]5007 emission (if there), thus ruling out the Balmer break scenario.However, a lack of strong line emission (due to [O iii] and H being redshifted out) need not necessarily imply the F444W excess (or F356W excess) seen is attributable to a Balmer break, as we will now discuss in the following two sections.

Cumulative weak line emission
Having discussed the strong emission lines ([O iii] and H), we now shift our attention to the weaker (i.e.lower equivalent width) emission lines in the rest-frame optical (e.g.[O ii], [Ne iii], H etc.).The main point we wish to make in this section is that, although these lines may (perhaps) be individually weak, when combined together, these lines can have a non-negligible contribution to the photometric excess seen in broadband photometry, thus mimicking the Balmer break signature and hence reducing the accuracy with which the Balmer break strength can be determined.
We show this more clearly in Fig. 12, where we show the restframe equivalent width for various weak rest-frame optical emission lines, against ionisation parameter , both 1 Myr (solid) and 10 Myr (dashed) after an instantaneous starburst, for a range of metallicities (various colours).We use the Nakajima & Maiolino (2022) spectral models for galaxies, which incorporate both stellar emission as well as the contributions from nebular line and continuum emission (incorporated using CLOUDY, Ferland et al. 1998Ferland et al. , 2013)).We note that we obtain similar qualitative results when using the Zackrisson et al. (2011) models and default stellar+nebular Bagpipes templates.
We show the [O ii] doublet equivalent widths in the top-left panel of Fig. 12.The [O ii] equivalent widths can in principle be very high (∼500 Å rest-frame), provided that the galaxy is observed shortly after a starburst, the metallicity is high (≥ 0.2 Z ⊙ ) and the ionisation parameter is low (log  ≤ −3.0).However, while such starburst activity may be likely in the EoR (see e.g.Endsley et al. 2023;Looser et al. 2023a,b), with the young stellar populations potentially outshining the older underlying starlight (see e.g.Narayanan et al. 2023), galaxies at these high-redshifts are expected (and indeed have been found) to be metal-poor (Maiolino et al. 2008;Maiolino & Mannucci 2019;Curti et al. 2023;Nakajima et al. 2023) and to have relatively high ionisation parameters (see e.g.Cameron et al. 2023;Curti et al. 2023;Matthee et al. 2023b).Hence the [O ii] equivalent widths may, in practice, still be relatively low.However, unless there is photometry directly probing the [O iii] strength (such as F410M and F444W for our  ∼ 8 sample), this cannot be definitively ruled out from the photometric data alone.At the very least, this [O ii] emission can have a non-negligible contribution to the photometric excess Δ = 2.5 log 10 (1 + EW rest (1 + )/Δ) observed in either F444W or F356W, as shown on the right axis of the figure.Here we assume a source at  = 10, with Δ representing the width of the F444W filter ≈ 10000 Å.
Furthermore, as shown in the top-right panel, the [Ne iii] doublet can also contribute to the photometric excess seen in broadband photometry, with the equivalent width increasing with increasing ionisation parameter.Shortly after a starburst, these equivalent widths are ∼150 Å rest-frame, producing a photometric excess Δ ∼ 0.15 mag.The H equivalent widths and photometric excesses (shown in the bottom-left panel) are comparable to this.Indeed, when combined with the emission from the rest of the Balmer series (i.e.H, H, H8, etc.), the combined Balmer rest-frame equivalent width can be 300-400 Å, i.e.Δ ∼ 0.3.Thus, when combining the contributions from all of these emission lines together (i.e.[O ii], [Ne iii], H and the rest of the Balmer series, see bottom-right panel), the net equivalent width and thus effect on broadband photometry can be considerable, generating a photometric excess Δ that is at the very least comparable to our colour criterion for selecting strict/relaxed Balmer-break candidates (0.2-0.4 mag).See also the elevated flux density in F356W (compared to F277W) for the  = 8 line emitter in Fig. 11 for a visual SED example of this.Thus it can be difficult to distinguish between a Balmer break and the cumulative effect of these weak emission lines, using wide-band NIRCam photometry alone.
Hence, additional medium-band imaging (F410M, F430M, F460M and F480M) will be essential to definitively disentangle the contributions of these weak rest-frame optical emission lines from that of the Balmer break (at  ∼ 10.5).Indeed, these four mediumband filters collectively span the width of the F444W filter, thus enabling one to isolate the (slight) emission from individual lines from the underlying continuum level.At  ∼ 8, the (F300M, F335M, F360M, F410M) filter set achieve the same result for the F356W filter.

Dusty continuum or a (dusty) AGN
In addition to line emission, a steeply rising continuum level in the rest-frame optical can mimic the photometric excess signature of a Balmer break.As has been discussed in the literature, this steeply rising continuum level can be driven by a dusty continuum (see e.Although these lines may individually be weak, when combined together, these lines can have a non-negligible effect on broadband photometry, producing a photometric excess Δ = 2.5 log 10 (1 + EW rest (1 + )/Δ) (right axes) that is comparable to our colour criterion for selecting Balmer-break candidates (0.2-0.4 mag).Additional medium-band imaging will be essential to disentangle the contributions of these weak rest-frame optical emssion lines from that of the Balmer break.et al. 2023;Kocevski et al. 2023;Labbe et al. 2023a;Matthee et al. 2023a).We briefly explore these two scenarios in this section.
Firstly, we note that our Balmer-break candidates tend to have relatively flat (or at most slightly red) SEDs in the rest-frame UV (see Fig. 2 and Fig. 3), with the SED suddenly jumping in flux density in the rest-frame optical, due to the (supposed) Balmer break.Hence this SED profile is inconsistent with a single component galaxy that is affected by dust attenuation, as we would expect (given the wavelength dependence of dust attenuation laws) the attenuation to be strongest in the rest-frame UV and weaker in the optical (i.e. the opposite of what is seen in the data).Hence we assume that our galaxy consists of two components: a relatively flat component (in terms of   ) that dominates in the rest-frame UV, and a second, dust-reddened component that begins to dominate in the rest-frame optical.If the photometric excess seen at  ∼ 10.5 in the F444W band is driven by a steeply rising dusty continuum level, and we assume that all of the measured F356W and F444W flux originates from this dusty emission, then our > 0.4 mag colour requirement for Balmer-break candidates translates into a minimum E(B−V) of 0.37, i.e.  V ≈ 1.5 (assuming a Calzetti et al. 2000 dust attenuation law).Although galaxies at these high-redshifts are not expected to be particularly dust-rich (with  1600 ≈ 2.5 V ≈ 1 for bright  ∼ 7 Lyman break galaxies, see Bowler et al. 2018Bowler et al. , 2022)), at the very least a dusty continuum can partially contribute to the photometric excess seen in the F444W (or F356W) bands.
Alternatively, the photometric excess seen may be driven by the steeply rising continuum level of a (dusty) AGN component, which begins to dominate the SED in the rest-frame optical.Assuming that this AGN emission follows a power law, with   ∝  −  , and is contributing all of the measured F356W and F444W flux, then our > 0.4 mag colour requirement for Balmer-break candidates translates into a rest-frame optical power law index requirement of  optical > 1.67.In practice, the minimum power law slope will likely have to be even steeper than this, due to the UV-bright component still contributing a non-negligible (or even substantial) amount of flux to the F356W and F444W bands (see Fig. 13).We note that in the Nakajima & Maiolino (2022) models, AGN spectra with UV power law indices of [1.2, 1.6 and 2.0] are considered.Although the power law slope tends to be less steep (i.e. a smaller ) in the rest-frame optical (which is what our photometric excess is probing), with the inclusion of additional dust attenuation the  optical > 1.67 requirement seems within reach of the expectations for AGN models.) that dominates in the rest-frame optical represents a dusty galaxy or (dusty) AGN component.The double power law fit provides a reasonable match to the photometry for this particular Balmerbreak candidate (though this is not the case for all of our candidates), yielding a comparable quality of fit to the stellar+nebular Bagpipes result (purple).Thus a steeply rising dusty/AGN continuum level in the rest-frame optical can in principle mimic (or at least contribute to) the photometric excess signature of a Balmer break, making it difficult to properly distinguish between these scenarios with NIRCam wide-band imaging alone.As before, additional medium-band imaging (as well as MIRI imaging) enables one to break this degeneracy, as these filters would trace this rapidly rising continuum (if present).
To highlight this further, we show an example of a double component galaxy fit in Fig. 13, with a blue/flat power law component that dominates in the rest-frame UV (shown in blue) and a red power law component that begins to dominate in the rest-frame optical (red).Combined (shown in green), this double power law fit provides a reasonable match to the data for this particular ∼10.5 strict Balmer-break candidate (though this is not the case for all of our candidates), yielding a comparable quality of fit to the stellar+nebular Bagpipes result.We note that the inferred power law slope for the optically-bright component is very red, with  = 4.1, which is comparable to the typical optical slopes inferred for extremely red dusty galaxy/AGN candidates at  ∼ 7.5 (Barro et al. 2023).
Thus it is at the very least plausible that a dusty continuum or a (dusty) AGN can contribute (somewhat) to the photometric excess seen in the F444W or F356W bands.Indeed, with just the NIRCam wide-band imaging alone, there is likely insufficient information to properly distinguish between these two scenarios and a Balmer break (see also Barro et al. 2023).As before, additional medium-band imaging again enables one to break this degeneracy and place tighter constraints on the Balmer break strength.In contrast to a Balmer break, where the continuum level redward of the break is relatively flat, the continuum level in the two aforementioned scenarios rises much more steeply, which can readily be identified using the (F410M, F430M, F460M, F480M) filter set, which would trace this rapidly rising continuum level (if present).In a similar vein, longer wavelength imaging with e.g.MIRI F560W would also aid in the identification of dust/AGN, again due to the steeper SEDs expected in these scenarios.However, as outlined before, the medium-band imaging is both more time-and area-efficient, and thus remains the ideal follow-up strategy also in this scenario.

The Balmer break strength as a proxy for stellar age
Even if both strong and weak line emission can be ruled out (or constrained), as well as the contributions from a dusty continuum and/or (dusty) AGN, the Balmer break strength alone likely cannot provide definitive constraints on the stellar ages of galaxies (see also Wilkins et al. 2023).This is because the Balmer break strength depends on the full star-formation history of the galaxy, which can be complex, given the likely bursty SFHs of galaxies in the EoR (see e.g.Endsley et al. 2023;Looser et al. 2023a,b).
We show this more clearly in Fig. 14, where we show how the Balmer break strength (in magnitudes, following our definition in Section 4) evolves with time, for a range of different metallicities (various colours).For this analysis we use the default stellar and nebular templates from the SED-fitting code Bagpipes, and assume zero dust attenuation.
In the left panel of Fig. 14, we see that the Balmer break strength rapidly rises following an instantaneous starburst, passing our 0.4 mag colour criterion in less than 50 Myr.Hence a moderatelysized Balmer break in this scenario does not imply particularly old stars, nor does it therefore place particularly strong constraints on the onset of star formation in the Universe in this case.On the other hand, we see that the Balmer break strength rises much more slowly in the case of a constant star-formation history (right panel).In the more metal-enriched scenarios (≥ 0.2 Z ⊙ ), it takes ∼250 Myr of constant star formation to pass our 0.4 mag threshold, and even takes up to the age of the Universe at  = 10.5 and  = 8 (vertical dashed lines) at even lower metallicity.Hence a moderately-sized Balmer break implies very old stellar ages in this scenario, and would therefore place very strong indirect constraints on the onset of star formation in the Universe in this case.
Furthermore, we note that, from our investigations, a 1 Myr starburst is able to mask out the Balmer break signature of e.g. a 200 Myr old starburst, even if the young population only has 3% the mass of the older stars.Hence the Balmer break strength is very dependent on the star-formation history of the galaxy.
Thus we wish to make three key summarising points regarding the use of the Balmer break strength as a proxy for the presence of old stars in high-redshift galaxies.Firstly, that a moderately-sized Balmer break (such as the 0.4 mag for our colour criterion) need not imply a particularly old stellar population.Secondly, that a galaxy with old stellar populations does not necessarily have a particularly strong Balmer break.Thirdly, that it is only the strongest Balmer breaks (perhaps 0.8 mag or so) that likely imply the presence of relatively old stars (taking e.g.∼100 Myr after an instantaneous starburst and unachievable within the age of the Universe at these redshifts for a constant SFH).However, such large Balmer breaks in the observed data should be interpreted with caution, as they may be attributable to strong line emission ([O iii] and H) that is misidentified as a Balmer break (see also Adams et al. 2023;Trussler et al. 2023).
Thus the Balmer break strength alone cannot provide definitive constraints on the age of the stellar population.Indeed, Wilkins et al. (2023), who investigate the Balmer break strength in the FLARES simulations in greater detail, arrive to a similar conclusion.However, as we have seen in Section 4 and will discuss in more detail in the next two subsections, a galaxy with a larger Balmer break does, on average, have an older stellar age.As outlined earlier, additional medium-band imaging will be essential to robustly identify Balmer-break galaxies.redward of the Balmer break at  = 8 (and  = 7, 9) should have comparable flux densities, as they both probe the continuum level.In other words, the F410M−F444W colour should be close to zero.Crucially, these two filters also cover the strong [O iii] and H lines at these redshifts, and owing to the fact that the F410M filter is narrower than the F444W filter, it is more sensitive to and thus more strongly boosted by line emission.Hence at  = 7, where the [O iii] line is both in the F410M and F444W filters, any ongoing (strong) star formation would manifest itself as a blue F410M−F444W colour.Similarly, at  = 8, where [O iii] is still in the F444W filter (but not in F410M), any ongoing star formation would result in a red F410M−F444W colour.Thus our F410M−F444W colour cut requires this colour to be close to zero, to rule out strong line emission, i.e. −0.1 < F410M−F444W < 0.2.However, similar to the caveats with the previous colour selection, galaxies with no ongoing star formation need not necessarily be old.Furthermore, galaxies with old stars can still currently be actively star-forming.
We show these two colour selections, together with our Balmer break colour selection, against the mass-weighted stellar ages of FLARES galaxies in Fig. 16.We note that only galaxies with stellar masses log( * /M ⊙ ) > 9 are shown in this diagram.
In the top panel of Fig. 16, we find that, as has been discussed before, that the Balmer break strength does not correlate very well with the stellar age, provided that the Balmer break strength is small.It is only for the largest Balmer break values, that a larger break strength generally is associated with an older stellar population.
Similarly, in the middle panel of Fig. 16, the rest-frame UV colour does not generally correlate very well with the galaxy age.It is only for the reddest, most extreme colours, that the age tends to be higher than for a random selection, thus motivating this additional colour selection.
Finally, in the bottom panel of Fig. 16, we show our F410M−F444W colour selection.As can be seen, the  = 7 galaxies with flat (i.e.close to zero) colours and presumably a lack of line emission tend to be older than the rest of the population.Likewise, the older  = 8 galaxies tend to have more flat F410M−F444W colours, though the trends are clearly not as strong as for  = 7.
We remark that these additional colour selections appear to become less effective at lower stellar masses (8 < log( * /M ⊙ ) < 9).These lower mass galaxies tend to have bluer UV slopes, with now only a single galaxy satisfying our UV colour requirement.
Thus ideally (and ultimately) the full photometry should be used to infer the presence of evolved stars in high-redshift galaxies.However, medium-band imaging is essential for the reliable identification of Balmer breaks and thus for placing definitive constraints on the stellar populations in these galaxies.In the absence of such data, applying single (or multiple) colour cuts increases the likelihood of selecting galaxies with evolved stellar populations.

CONCLUSIONS
In this work we utilised a combination of PEARLS GTO and public NIRCam photometric data to search for Balmer-break candidate galaxies at 7 <  < 12.In principle, the detection of a Balmer break implies the presence of evolved stars in these high-redshift galaxies, thus placing valuable indirect constraints on the onset of star formation in the Universe.
As the signature of a Balmer break in photometry is a photometric excess seen in the rest-frame optical, i.e. the longer wavelength NIRCam bands, we use the measured F444W excess or F356W excess as the basis for our identification of Balmer-break candidates at  ∼ 10.5 and  ∼ 8, respectively.We investigate and compare the physical properties and star formation histories of our Balmer-break candidates against a control sample of galaxies (i.e.galaxies at the same redshifts which do not exhibit a F444W or F356W excess).
Regarding the magnitudes (i.e.brightnesses) and stellar masses of our Balmer-break candidates, we find the following: • Our Balmer-break candidates are similarly distributed in brightness compared to the non-Balmer break control sample.
• However, we do find that our Balmer-break candidates tend to be more massive for a given brightness than the control sample.
• Therefore, our results suggest that the (very) large stellar masses inferred for some red rest-frame optical galaxies in the EoR (log ( * /M ⊙ ) > 10, Labbé et al. 2023b), may be driven by their higher inferred mass-to-light ratios (a result of the SED-fitting process and assumptions therein), rather than these galaxies being inherently ultra-bright.
• We find that strong line emission, the cumulative effect of weak emission lines, a dusty continuum and a (dusty) AGN can all contribute to the photometric excess seen in broadband photometry.This somewhat alleviates the need for evolved stellar populations, bringing the inferred stellar mass-to-light ratios down, thus likely lowering the (extremely) large stellar masses inferred for some high-redshift galaxies to be more in line with the predictions from theoretical simulations of galaxy formation.Furthermore, utilising the Bagpipes SED-fitting code to derive constraints on the stellar ages, star formation histories, emission line equivalent widths, specific star formation rates and UV slopes , we find the following: • Our  ∼ 10.5 sample of Balmer-break candidates (i.e.galaxies with a F444W excess) tend to be older (with a median age of 115 Myr), have more extended star formation histories, have lower inferred [O iii] + H equivalent widths (120 Å rest-frame), have lower specific star formation rates (6 Gyr −1 ) and redder UV slopes ( = −1.8)than the control sample of galaxies.This suggests that the observed strength of (what we believe to be) a Balmer break places tight constraints on the age and star-formation history of a galaxy.
• However, these trends all become less strong for our  ∼ 8 sample of Balmer-break candidates.Now it is only true that, on average, the Balmer-break candidates display older ages (100 Myr) than the control sample (30 Myr), i.e. a Balmer break alone does not guarantee an old stellar population.
• Indeed, the F444W filter now probes the strong rest-frame optical [O iii] and H emission lines in these  ∼ 8 galaxies, thus providing valuable additional constraints on their current star formation activity.
• Thus longer wavelength photometry which probes beyond the Balmer break, such as MIRI F560W, will likely be essential to place stronger constraints on the star formation histories on the highest redshift galaxies, such as our  ∼ 10.5 sample.
• The weakened connection between the (supposed) observed Balmer break strength (i.e.F356W excess) and the inferred galaxy stellar age may be attributable to the (likely) bursty nature of these EoR galaxies, causing more of a disconnect between their current star formation activity and SED profiles, and their more extended star-formation history.Finally, we discuss caveats regarding the identification of Balmer breaks from photometric data, as well as the constraints these Balmer breaks can place on the presence of evolved stellar populations in high-redshift galaxies.We find the following: • Despite applying redshift cuts to remove strong [O iii] + H line emitters from our Balmer-break candidate sample, such line emitters can still be confused for Balmer-break galaxies due to the limited constraining power of wide-band photometric data, thus resulting in inaccurate photometric redshifts.
• The cumulative emission from the weak rest-frame optical emission lines ([O ii], [Ne iii], H, H and the remainder of the fainter Balmer series) can contribute a non-negligible amount of flux density to broadband filters (with a photometric excess Δ that can be a few tenths of a mag), thus mimicking the photometric excess signature of a Balmer break.
• Both a dusty continuum, as well as a (dusty) AGN, which begin to dominate in the rest-frame optical, can contribute to the photometric excess seen in broadband filters.
• All of the above processes can therefore (somewhat) masquerade as a Balmer break in the wide-band photometric data, thus reducing the accuracy with which the Balmer break strength can be inferred.
• Additional medium-band imaging, which can clearly discriminate between these scenarios, will therefore be essential for the reliable identification of Balmer breaks.At  ∼ 10.5 the (F410M, F430M, F460M, F480M) filter set collectively span the full width of the F444W filter, thus enabling one to disentangle the contributions from line emission, a sharply rising dust/AGN continuum and a Balmer break.The (F300M, F335M, F360M, F410M) filter set achieves the same purpose at  ∼ 8.
• Even if these other processes can be ruled out or accounted for, the Balmer break strength alone cannot serve as the definitive indicator of the stellar age of the galaxy.This is because it exhibits a complex dependence on the star-formation history of the galaxy.Thus, ultimately, deep NIRSpec continuum spectroscopy, together with MIRI imaging, will be needed to provide the strongest possible constraints on the star formation histories of these high-redshift galaxies.In turn, these observations will place the strongest indirect constraints on the onset of star formation in the Universe, thereby revealing when cosmic dawn breaks.

Figure 1 .
Figure1.Our colour-redshift selection for identifying Balmer-break candidates.In order to distinguish between line emission and Balmer breaks, we require the galaxies to be at sufficiently high redshift such that the [O iii] and H lines are redshifted out of the red filter in the filter pair defining the colour selection (left vertical dashed lines).We also require the galaxies to be at sufficiently low redshift such that the Balmer break is at most two-thirds of the way through the red filter (right vertical dashed lines).Our flat colour selection (red horizontal dashed lines) and redshift-dependent colour selection (orange) which accounts for the fact that the perceived strength of the Balmer break depends on where the break resides within the filter pair.Our strict (red squares/upward triangles), relaxed (orange diamonds/downward triangles) and control (blue circles) sample of galaxies.Crosses denote galaxies which do not satisfy our > 68% integrated redshift probability criterion.Individual error bars are shown for our strict and relaxed Balmer-break candidates.The median error bars for the full sample of galaxies are shown in the corner of each panel in black.Left panel: Our Balmer break selection at  ∼ 10.5, for galaxies which exhibit a flux density excess in the F444W filter.Right panel: Our Balmer break selection at  ∼ 8, for galaxies which exhibit a flux density excess in the F356W filter.

Figure 4 .
Figure 4.The  ∼ 10.5 Balmer-break candidate displaying the largest F444W excess in our sample.Whilst the Balmer break solution (blue, green, purple) yields a satisfactory fit to the NIRCam wide-band data, the F410M flux density is strongly overpredicted.A  = 7.88 Bagpipes fit (yellow) better matches the medium-band measurement, attributing the lack of F410M flux density to the low-lying Balmer jump continuum, with the F444W excess now being driven by strong [O iii] + H line emission.Additional medium-band photometry is needed to definitively distinguish between these two scenarios (see also Fig.11).

Figure 5 .
Figure 5. Magnitudes (i.e.brightnesses) against stellar masses for our  ∼ 10.5 (left panels) and  ∼ 8 samples (right), as well as theLabbé et al. (2023b) sample of red massive candidate galaxies at 7.4 <  < 9.1 (dark red hexagons).The median error bars for our samples are shown in the corner of each panel in black.Top panels: Rest-frame UV absolute magnitudes  1500 .Middle panels: Absolute magnitudes immediately blueward of the Balmer break ( 3500 ).Bottom panels: Rest-frame optical (i.e.redward of the Balmer break) apparent magnitudes, using F444W (left) and F356W (right).Our Balmer-break candidates exhibit comparable intrinsic brightnesses to the control sample, but do tend to have larger inferred stellar masses for a given brightness.Thus the larger stellar masses found for our Balmer-break candidates are an outcome of the SED-fitting process (and the assumptions therein), rather than being due to their inherent brightness.As strong emission lines, the cumulative effect of weaker emission lines, a dusty continuum and a (dusty) AGN can all contribute to the red excess seen in these Balmer-break candidates, there is certainly room for bringing the (sometimes extremely large) inferred stellar masses inLabbé et al. (2023b) down to be more in line with the expectations from simulations of galaxy formation.
Figure6.Photometric excess (a proxy for the Balmer break strength) against the inferred mass-weighted stellar ages of the galaxies in our sample.Colour coding and symbols are the same as in Fig.1.The median excess and age are also shown for each sample, with the endpoints of the error bars representing the 16-84 percentile range.The median and percentile range for the relaxed sample combines both the relaxed and strict sample galaxies.The red dashed vertical line denotes our flat colour threshold for selecting strict Balmer-break candidates.Top-left panel: F444W excess for our  ∼ 10.5 sample of galaxies.Top-right panel: F356W excess for our  ∼ 8 sample of galaxies.Note the weaker correlation between the observed excess and the inferred stellar age than for the  ∼ 10.5 sample.Bottom-right panel: F356W excess for our  ∼ 8 sample of galaxies, but now omitting the F410M and F444W photometry in the SED-fitting with Bagpipes (mimicking the rest-frame coverage for the  ∼ 10.5 sample).A stronger correlation between observed excess and age emerges, as with the  ∼ 10.5 sample.Thus, longer wavelength photometry, which probes the (stronger) emission lines in the rest-frame optical beyond the Balmer break, are likely essential to derive more accurate stellar ages and star formation histories for galaxies (such as MIRI F560W for our  ∼ 10.5 sample).

Figure 7 .
Figure7.Normalised (to the current SFR) star formation histories for our  ∼ 10.5 (left panel) and  ∼ 8 samples (right).We display both the median (solid) and 16, 84 percentiles (dotted) SFH percentiles, determined by taking the appropriate percentile per lookback time bin of the normalised SFH for each galaxy in the sample.The age of the Universe on the upper axis assumes the galaxy is at  = 10.5 (left) and  = 8 (right).We stress that we just focus on the qualitative aspects of the inferred star formation histories, due to the insufficient constraining power from the current data, as well as the dependency on the assumed SFH parameterisation and priors on the associated parameters.Ultimately, deep continuum spectroscopy with the NIRSpec PRISM, together with MIRI imaging, will be needed to provide the best possible indirect constraints on the onset of star formation in the Universe.

Figure 9 .
Figure9.UV slope  against specific star formation rate (sSFR, using the average SFR over the past 10 Myr).Our strict and relaxed Balmer-break candidates at  ∼ 10.5 tend to have lower specific star formation rates and redder  slopes than the control sample, though the  trends seem less strong at  ∼ 8.

Figure 10 .
Figure10.Effective radius  e (in arcsec) against absolute UV magnitude  1500 .Top panels: Rest-frame ultraviolet sizes (F200W and F150W for our  ∼ 10.5 and  ∼ 8 samples, respectively).Bottom panels: Rest-frame optical sizes (F444W and F356W for our  ∼ 10.5 and  ∼ 8 samples, respectively).We find that the strict+relaxed Balmer-break candidates are roughly comparably distributed in effective radii to the control galaxies, suggesting that the typical relative contribution of AGN emission (if any) is likely similar in these systems.Some of our Balmer-break candidates (as well as control galaxies), particularly in the rest-frame optical, have small effective radii that are only slightly above the filter half width half maximum (dashed horizontal lines).Thus it is plausible that these systems may possibly harbour an AGN component, which have been suggested to potentially contribute to the flux excess seen in the longer wavelength NIRCam bands.
Figure11.SEDs and photometry for two galaxies from the FLARES simulations.The magnitude difference  356 −   on the vertical axis is the magnitude offset relative to F356W, to highlight the strength of the (supposed) Balmer break.We show both a  = 8 strong line-emitting galaxy (blue) and a  = 10 Balmer-break galaxy (red and orange).These two galaxies were selected to have very similar NIRCam wide-band photometry (dark blue and red circles, below 5 µm), hence it would be difficult to discriminate between these two scenarios, given only this data.However, with the addition of NIRCam medium-band photometry (light blue and orange squares), one can readily distinguish the two, as the signature of the  = 10 Balmer break is revealed through the consistently elevated flux density in the medium-band filters, while the medium-band photometry for the  = 8 line emitter varies between being high or low depending on whether it is boosted by line emission or just tracing the continuum level.Furthermore, MIRI F560W imaging (dark blue and red circles, above 5 µm) can in principle also help to discriminate between Balmer breaks and line emission, though can sometimes be inconclusive (see text).

Figure 12 .
Figure 12.The rest-frame equivalent widths for various weak rest-frame optical emission lines, as a function of ionisation parameter , shown both 1 Myr (solid) and 10 Myr (dashed) after an instantaneous starburst, for a range of metallicities (various colours).We use the Nakajima & Maiolino (2022) spectral models for galaxies.Top-left panel: [O ii] doublet.Top-right panel: [Ne iii] 3869.Bottom-left panel: H.Bottom-right panel: The combined equivalent width of all of the weak emission lines ([O ii], [Ne iii],H, H  and the rest of the fainter Balmer series).Although these lines may individually be weak, when combined together, these lines can have a non-negligible effect on broadband photometry, producing a photometric excess Δ = 2.5 log 10 (1 + EW rest (1 + )/Δ) (right axes) that is comparable to our colour criterion for selecting Balmer-break candidates (0.2-0.4 mag).Additional medium-band imaging will be essential to disentangle the contributions of these weak rest-frame optical emssion lines from that of the Balmer break.

Figure 13 .
Figure13.Double power law fit (green) to the NIRCam photometry (black) of a  ∼ 10.5 strict Balmer-break candidate.The relatively flat/blue power law (shown in blue, with  = −0.1)that dominates in the rest-frame UV represents a young/unobscured component, while the (very) red power law (shown in red, with  = 4.1) that dominates in the rest-frame optical represents a dusty galaxy or (dusty) AGN component.The double power law fit provides a reasonable match to the photometry for this particular Balmerbreak candidate (though this is not the case for all of our candidates), yielding a comparable quality of fit to the stellar+nebular Bagpipes result (purple).Thus a steeply rising dusty/AGN continuum level in the rest-frame optical can in principle mimic (or at least contribute to) the photometric excess signature of a Balmer break, making it difficult to properly distinguish between these scenarios with NIRCam wide-band imaging alone.As before, additional medium-band imaging (as well as MIRI imaging) enables one to break this degeneracy, as these filters would trace this rapidly rising continuum (if present).
Examples of galaxies that satisfy our strict Balmer break colour criterion and redshift requirement (top-left and top-right panels), a galaxy that satisfies our relaxed colour criterion (bottom-left) and a galaxy that does not satisfy either colour criterion (i.e. the control sample, bottom-right), for our  ∼ 10.5 sample.Top subpanels: NIRCam photometry (black) and SNR for each band (non-detections are displayed with downward arrows at the 2 depth), together with the best-fit Le Phare (light blue) and EAZY (green) SEDs, as well as the median-fit Bagpipes SED (magenta, which has been fit assuming either the EAZY redshift PDF as a prior, or fixed at the known spectroscopic redshift).The locations of the Balmer breaks (3646 Å rest-frame) for the Le Phare and EAZY fits are denoted by the colour-coded dashed vertical lines.Bottom subpanels: our colour-redshift selection, as well as the Le Phare and EAZY photometric redshift probability distribution functions, with median redshifts displayed.
SED fitting procedure) is that the F444W excess is attributable to strong [O iii] + H line emission, with the faint F410M flux density tracing the lowered rest-frame optical continuum level associated with the Balmer jump (essentially the inverse of the Balmer break, being driven by ionised hydrogen recombining to the  = 2 state).This scenario requires the [O iii] + H lines to reside within the spectral range of the F444W filter, constraining the redshift to 7.0 <  < 9.3.Furthermore, due to the dense packing of rest-frame optical emission lines, the F410M filter can only provide a clean (i.e.unaffected by emission lines) measurement of the continuum when it falls between the H and H emission lines, which further constrains the redshift to 7.85 <  < 7.91.Hence we perform another Bagpipes

Table 1 .
Trussler et al. 2023).Coordinates (Field, Right Ascension, Declination), EAZY photometric redshifts, F444W excess (F356W−F444W) and inferred physical properties (stellar mass  * , SFR and mass-weighted stellar age) for our  ∼ 10.5 sample of strict and relaxed Balmer-break candidates.Note that the physical properties were derived using Bagpipes, with the masses and SFRs being corrected by the ratio of the fluxes measured within a Kron elliptical and 0.32 arcsec diameter circular aperture (see Section 4.1).
Balmer break strength (in magnitudes, see text for definition) against the combined rest-frame equivalent width of the [O iii] 4959, 5007 doublet and H, both inferred from SED fitting.It should be noted that the [O iii