The sizes of bright Lyman-break galaxies at 𝑧 ≃ 3 − 5 with JWST PRIMER

We use data from the JWST Public Release IMaging for Extragalactic Research (PRIMER) survey to measure the size scaling relations of 1668 rest-frame UV-bright Lyman-break galaxies (LBGs) at 𝑧 = 3 − 5 with stellar masses log 10 ( 𝑀 ★ / 𝑀 ⊙) > 9. The sample was selected from seeing-dominated ground-based data, presenting an unbiased sampling of the morphology and size distributions of luminous sources. We fit Sérsic profiles to eight NIRCam bands and also measure a non-parametric half-light radius. We find that the size distributions with both measurements are well-fit by a log-normal distribution at all redshifts, consistent with disk formation models where size is governed by host dark-matter halo angular momentum. We find a size-redshift evolution of 𝑅 𝑒 = 3 . 51 ( 1 + 𝑧 ) − 0 . 60 ± 0 . 22 kpc, in agreement with JWST studies. When considering the typical (modal) size over 𝑧 = 3 − 5, we find little evolution with bright LBGs remaining compact at 𝑅 𝑒 ≃ 0 . 7 − 0 . 9 kpc. Simultaneously, we find evidence for a build-up of large ( 𝑅 𝑒 > 2 kpc) galaxies by 𝑧 = 3. We find some evidence for a negatively sloped size-mass relation at 𝑧 = 5 when Sérsic profiles are used to fit the data in F200W. The intrinsic scatter in our size-mass relations increases at higher redshifts. Additionally, measurements probing the rest-UV (F200W) show larger intrinsic scatter than those probing the rest-optical (F356W). Finally, we leverage rest-UV and rest-optical photometry to show that disky galaxies are well established by 𝑧 = 5, but are beginning to undergo dissipative processes, such as mergers, by 𝑧 = 3. The agreement of our size-mass and size-luminosity relations with simulations provides tentative evidence for centrally concentrated star formation at high-redshift.


INTRODUCTION
The diversity of galaxy shapes and size have fascinated astronomers for decades.Early visual classification of structure within galaxies established the Hubble tuning fork, a classification system still used to this day (Hubble 1926).In the distant Universe, where galaxies are still evolving into the structures we see locally, their sizes are a highly useful probe into the fundamental physics of their growth (e.g.see Conselice 2014).
Luminous, massive (log 10 ( ★ / ⊙ ) > 9) galaxies at highredshift ( ≳ 3) are excellent laboratories for galaxy physics.These galaxies populate the most massive dark matter haloes (Wechsler & Tinker 2018) and must be more evolved than their low-mass counterparts to accumulate such large stellar mass.This means they have had time to undergo major mergers, and there may have been enough time for star-forming clumps to coalesce and form bulges and disks (e.g.Dekel et al. 2009;Rujopakarn et al. 2023).Massive galaxies are amongst the first to undergo a quenching of their star formation (e.g.Cowie et al. 1996;Peng et al. 2010;McLeod et al. 2021).Studying the resolved properties of these galaxies can thus enable us to learn about ★ rohan.varadaraj@physics.ox.ac.uk the earliest structural formation and feedback processes.The search for these rare massive galaxies was revolutionised by the advent of degree-scale ground-based imaging (e.g.Miyazaki et al. 2002;Lawrence et al. 2007;McCracken et al. 2012;Jarvis et al. 2013), necessary due to the low surface densities of high-redshift Lyman break galaxies (LBGs).Since then, various number statistics of these rare high-redshift galaxies have been well-constrained at  ≃ 3 − 10, such as the rest-UV luminosity function (e.g.Bowler et al. 2014;Ono et al. 2018;Stefanon et al. 2019;Bowler et al. 2020;Bouwens et al. 2021;Harikane et al. 2022;Donnan et al. 2023;Adams et al. 2023c;Varadaraj et al. 2023) and the two-point correlation function (clustering, e.g.Adelberger et al. 2005;Bian et al. 2013;Durkalec et al. 2015;Hatfield et al. 2018;Harikane et al. 2022).However, since ground-based images are often seeing-dominated (e.g.Bowler et al. 2017), resolved properties have hitherto been difficult to measure.
Space-based telescopes such as the Hubble Space Telescope (HST) do not suffer from atmospheric seeing, providing a clear view of galaxy morphology (e.g.Ferguson et al. 2004;Trujillo et al. 2006;Mosleh et al. 2011;Law et al. 2012;Grazian et al. 2012;van der Wel et al. 2014;Shibuya et al. 2015;Whitney et al. 2019), but rest-frame optical-wavelength shapes were only observable out to  ≃ 3.This limited the comparison of high-redshift morphologies with local optical observations, as well as the comparison between young and old stellar populations (as probed by the rest-UV and rest-optical respectively) in high-redshift galaxies.JWST's infrared coverage and unparalleled resolution has opened the window to rest-UV and rest-optical for the first time at  ≳ 3 (e.g.Ferreira et al. 2023;Kartaltepe et al. 2023;Leethochawalit et al. 2023;Ormerod et al. 2023) revealing early mass assembly and a significant fraction of disk galaxies by  ≃ 6.
Models describe galaxy disk formation by creating rotationally supported exponential disks from baryons.These disks have mass and angular momentum that are some fixed fraction of the host dark matter (DM) halo.The disk scale radius is proportional to a dimensionless 'spin parameter' , which is related to the angular momentum, mass and energy of the system (Fall & Efstathiou 1980;Mo et al. 1998;Bullock et al. 2001).These models can be tested with various size scaling relations.First, if the size of galaxies are determined from their host halo angular momentum, then we expect the size distribution of galaxies to follow a log-normal distribution (de Jong & Lacey 2000).Second, the disk model of Fall & Efstathiou (1980) predicts that the disk size of a galaxy scales with redshift as  ∝ (1 + ) −1 in the case of a fixed circular velocity of the host DM halo, and as  ∝ (1 + ) −1.5 in the case of fixed virial mass.Prior to JWST, various studies have supported each relation, as well as for values in-between (e.g.Hathi et al. 2008;Oesch et al. 2010;Mosleh et al. 2011).Furthermore, Curtis-Lake et al. (2016) argue that if one considers the typical (modal) sizes of galaxies then no strong evolution is seen, with the effect driven by incompleteness in HST samples at  ≳ 3. Third, the size-mass relation (and the related sizeluminosity relation) gives insight into quenching mechanisms.In the local Universe, quiescent galaxies follow a steeper size-mass relation than star-forming galaxies (SFGs, van der Wel et al. 2014).The onset of quenching may thus be seen in an evolution of the high-redshift size-mass relation.Additionally, within the framework of these disk formation models, van der Wel et al. (2014) find no evolution in the size-mass relation of SFGs, suggesting that their sizes are still controlled by their host DM halos.Note however that large scatter and uncertainty exists in these measurements (e.g.Huang et al. 2013;Curtis-Lake et al. 2016;Shibuya et al. 2015;Whitney et al. 2019).
JWST has opened a new dimension to the study of size scaling relations via their wavelength dependence.Since rest-optical and rest-UV observations probe stellar populations of different ages, we may expect differences based on how and where stars form in high-redshift galaxies.Simulations of LBGs at a broad range of wavelengths have made enticing predictions for the JWST era of astronomy.García-Argumánez et al. (2023) simulate observations from the Cosmic Evolution Early Release Science (CEERS) Survey (Finkelstein et al. 2017) with Illustris-1 (Vogelsberger et al. 2014) and find that progenitors of local galaxies with  > 10 11  ⊙ form ∼ 25% of their  ∼ 1 stellar masses by  ≃ 2.7, suggesting significant mass buildup at high redshift.Shen et al. (2024) use the THESAN simulation (Kannan et al. 2022a) to show that massive ( ★ > 10 9  ⊙ ) galaxies become increasingly compact over  = 10 to  = 6.Additionally, their sizes roughly agree with analytical predictions from disk formation models, governed by the spin parameter  (Fall & Efstathiou 1980;Mo et al. 1998;Bullock et al. 2001).
In the First Light And Reionization Epoch Simulations (FLARES, Lovell et al. 2021), Roper et al. (2022) (hereby R22) find intrinsically negative size-luminosity relations.At  = 5 − 10, the inclusion of dust attenuation predicts observed size-luminosity relations with increasingly positive slopes at shorter wavelengths, building on similar predictions by Marshall et al. (2022).Costantin et al. (2023) (hereby C23) also simulate CEERS observations with Illustris TNG50 (Pillepich et al. 2019) at  = 3 − 6 and predict a size-redshift evolution in the rest-optical consistent with fixed circular velocity of DM haloes, as well as negative observed size-mass relations with wavelength at  = 5 − 6.Both R22 and C23 explain their findings by requiring that massive galaxies undergo episodes of centrally concentrated star formation, leading to small sizes at high mass and high dust attenuation in the cores of massive galaxies.Studying massive galaxies with JWST allows us to test these predictions.
Having established the versatility of galaxy size measurements for understanding galaxy evolution, we now discuss how sizes are measured.Galaxy sizes can be complex to define depending on the morphology, so comparing like-for-like with other observations and simulations can often be complicated by the choice of definition.The Sérsic (1963) brightness profile is a common functional form fit to galaxies following  1/ for a given 'Sérsic index' .This profile assigns galaxies an effective radius   which contains half the total light.Various values of  are effective at fitting both late-and early-type massive galaxies in the local Universe (e.g.Lange et al. 2015).Additionally, JWST has continued to find that galaxies at highredshift which appear disky are well-fit by a Sérsic profile with  ≃ 1 (Adams et al. 2023b;Kartaltepe et al. 2023;Ormerod et al. 2023).Alternatively, non-parametric measurements using some form of a circularised half-light radius are a popular choice in simulations at high redshift (Roper et al. 2022;Katz et al. 2023).Such definitions make no assumptions on the shape of the galaxy, allowing a measurement to be made for disturbed and clumpy morphologies which are commonly seen at high redshift (Shibuya et al. 2016;Huertas-Company et al. 2023;Kartaltepe et al. 2023;Jacobs et al. 2023).It is thus vital that studies on galaxy size consider various definitions to test differences between parametric and non-parametric fitting, allow for a diverse range of galaxy morphologies, and provide like-for-like comparison with simulations and between redshifts.
In this work we measure the sizes of massive, rest-UV bright galaxies at  = 3−5 selected from ground-based imaging.We use NIRCam imaging from Public Release IMaging for Extragalactic Research (PRIMER, Dunlop et al. 2021) that overlaps with the ground-based imaging from Visible and Infrared Survey Telescope for Astronomy (VISTA) Deep Extragalactic Observations (VIDEO, Jarvis et al. 2013), UltraVISTA (McCracken et al. 2012) and the Hyper-Suprime Cam Subaru Strategic Program (HSC-SSP, Aihara et al. 2022) to measure sizes across a broad wavelength range, 0.9 − 4.4m.We use Sérsic and non-parametric size fitting to measure the redshift evolution of sizes, as well as the size-mass and size-luminosity relations.We also test the wavelength dependence of the size-mass and sizeluminosity relations, comparing with predictions from simulations.
The layout of this paper is as follows: in Section 2 we outline the ground-selected catalogues and JWST imaging used in this work.Section 3 describes the creation of postage stamp images and the size fitting of the sample.Our results are presented in Section 4, where we discuss the size-redshift evolution, the size-mass relations and the size-luminosity relations.In Section 5 we discuss our results in the context of disk formation models.We also explore difficulties in measuring sizes and the size-redshift evolution.We then examine the comparison of our results to simulations, after which we discuss the implications of our results and simulation comparison for the scenario of compact central star formation at high-redshift.We finally conclude in Section 6.We use a Λ cold dark matter (CDM) cosmology with H 0 = 70 km s −1 Mpc −1 , Ω M = 0.3, Ω Λ = 0.7.Magnitudes are reported in the AB system (Oke & Gunn 1983).

DATA AND SAMPLE
In this section we outline the deep degree-scale ground-based imaging and derived catalogues, as well as the JWST imaging used in this work.Although the area covered by PRIMER is much smaller than ground-based surveys, the improvement in resolution allows us to measure resolved properties in what was previously a seeing-limited sample.We do not conduct a new selection with the JWST data to make use of this fact -a seeing-dominated sample is less biased by the sizes, with galaxies being purely selected by being bright in the rest-frame UV, providing a better representation of typical galaxy sizes and avoiding selection effects.

Ground-based catalogues
We make use of rest-UV selected catalogues in the 1.5 deg 2 COS-MOS field and 4.3 deg 2 XMM-Large Scale Structure (XMM-LSS) field created by Adams et al. (2023c).The full details of the selection can be found within their section 2, however we summarise it here.Ground-based near-infrared imaging from the UltraVISTA survey (McCracken et al. 2012) in COSMOS and VIDEO (Jarvis et al. 2013) et al. 2007).We update the photometry with the most up-to-date survey data, including HSC-SSP Data Release 3 (Aihara et al. 2022) and UltraVISTA Data Release 5. We show an example galaxy at  = 4.58 in these filters in Fig. 1.Fluxes are measured in 2 arcsec diameter circular apertures.The LePhare spectral energy distribution (SED) fitting code (Arnouts et al. 1999;Ilbert et al. 2006) is used to calculate photometric redshifts and classify objects into galaxies, active galactic nuclei (AGN) and Milky Way stars.Objects are split into redshift bins at  ≃ 3, 4 and 5 (2.75 <  < 3.5, 3.5 <  < 4.5 and 4.5 <  < 5.5 respectively).They are then required to be detected at a 5 threshold in the deepest band containing the rest-frame ultraviolet continuum emission.These are HSC-, HSC- and HSC- for each redshift bin in the regions containing the JWST imaging, outlined in Section 2.3.The 5 depths of the ,  and  bands are 27.1,26.9 and 26.5 in COSMOS and 26.4,26.3 and 25.7 in UDS based on a 2 arcsec diameter circular aperture.For the  ≃ 5 sample, a < 3 limit is imposed on the CFHT- band to ensure a strong Lyman break and to remove further low-redshift interlopers.A cut of  2 < 100 is applied to the best-fitting SED solution to remove artefacts and contaminants.Spitzer/IRAC data are not used due to blending issues introduced by the large point spread function (PSF).The catalogues are then cross-matched with a spectroscopic sample to test the reliability of the photometric redshifts.The outlier rate (the fraction of photometric redshifts which differ from the spectroscopic redshift by more than 0.15 × (1 + )) is no more than 4.5% in any of the redshift bins.The recovery rate of objects in the spectroscopic sample is also greater than 80% for the redshifts of interest.The COSMOS and XMM-LSS catalogues contain 175,508, 47,978 and 14,875 objects in the  = 3, 4, 5 bins respectively.

Photometry and SED fitting
Some objects, particularly at  ≃ 3, have effective radii significantly larger than the 2 arcsec diameter aperture used to measure their photometry, resulting in underestimates of the total flux.We use SExtractor (Bertin & Arnouts 1996) to measure FLUX_AUTO (which uses a flexible elliptical aperture to estimate total flux) in each ground-based filter.We then apply a correction to the aperture fluxes in each band based on the ratio of FLUX_AUTO to the aperture flux in   (Adams et al. 2023a).We find that the ratio is similar in the other bands.We then use this corrected ground-based photometry to estimate properties of the sample using a SED fitting analysis with Bayesian Analysis of Galaxies for Physical Inference and Parameter EStimation (bagpipes, Carnall et al. 2018).We use a fiducial constant star formation history (CSFH).Although this choice may not capture complex star formation histories and episodes, it has been found that a CSFH finds similar stellar masses to non-parametric SFHs for very high-redshift galaxies (Whitler et al. 2023).We set the redshift to be the photometric redshift found by Adams et al. (2023c) with LePhare, allowing it to vary by ±0.1.We adopt a uniform prior on the ionization parameter, −4 ≤ log ≤ −2.The metallicity is allowed to vary in the range 0.2 ≤ / ⊙ ≤ 1, and we assume  ⊙ = 0.02.Dust reddening was prescribed by the Calzetti et al. (2000) dust law, with attenuation in the range 0.0 ≤   ≤ 4.0.The time since star formation switched on varies between the start of the Universe and the redshift being considered.Fig. 2 shows the mass and photometric redshift distribution of the sample.For our analysis we take objects with stellar masses log 10 ( ★ / ⊙ ) > 9 to focus on the massive objects whilst balancing against incompleteness towards lower mass, particularly at higher redshift.By comparing to the stellar mass distribution in a wider area around the PRIMER imaging, we estimate that the  = 5 bin is 85% complete.Additionally, the log 10 ( ★ / ⊙ ) > 9 is the same cut used by C23, allowing for a direct comparison to their simulation results.This leaves 1429, 366 and 236 objects in the  = 3, 4, 5 bins.We then estimate the restframe magnitudes of objects in each NIRCam band by convolving the SED found from the ground-based photometry with the NIRCam filter transmission curves.This method is used to remain consistent with the determination of  UV by Adams et al. (2023c) of placing a top-hat filter at 1500Å with width 100Å on the rest-frame SED, and provides robust errors.

JWST PRIMER imaging
Public Release IMaging for Extragalactic Research (PRIMER) is a JWST Cycle-1 program (GO PID: 1837, PI: J. Dunlop) focusing on two key HST CANDELS (Grogin et al. 2011;Koekemoer et al. 2011) fields, COSMOS and UDS.This work focuses on the overlap between the COSMOS and XMM-LSS catalogues produced by Adams et al. (2023c) and the 340 arcmin 2 CANDELS-COSMOS and CANDELS-UDS fields imaged by JWST, respectively.These fields contain 1572, 1056 and 275 objects from the ground-based catalogues in the  = 3, 4, 5 bins.Imaging is taken in ten NIR-Cam+MIRI bands.We make use of the eight deep NIRCam filters, namely F090W, F115W, F150W and F200W from the shortwavelength detector and F270W, F356W, F410M and F444W from the long-wavelength detector, allowing us to probe the rest-frame UV and optical emission of  ≃ 3 − 5 galaxies.We show an example galaxy at  = 4.58 in these filters in Fig. 1, and show a comparison to the ground-based filters outlined in Section 2.1.In this work we primarily make use of F115W, F200W, F356W and F444W.The rest-frame wavelengths covered by these filters at  = 3 − 5 are  ≃ 0.2 − 0.3m, 0.3 − 0.5m, 0.6 − 0.9m and 0.7 − 1.1m respectively.We use version 0.5 of the PRIMER reduction, which contains observations between Dec. 2022 and Jan. 2023.The NIRCam data is reduced using the "jwst_1039.pmap"calibration reference file, and the JWST Pipeline version 1.9.4 (Bushouse et al. 2023).Custom

METHODS
In this section we describe how the NIRCam imaging is used for the morphological analysis.The longest wavelength NIRCam filters (F356W, F410M and F444W) provide us with a new view into the rest-frame optical emission of  ≃ 3 − 5 galaxies previously only covered by Spitzer/IRAC, opening up a new resolved rest-optical view of high-redshift LBGs.

NIRCam stamps and PSFs
In order to make use of the superior resolution of JWST compared to ground-based imaging, we crossmatch the ground-based COSMOS and UDS catalogue with the PRIMER imaging to extract 20 × 20 arcsec postage stamps of the objects.This corresponds to 667 × 667 pixels.We then use Photutils (Bradley 2023) to perform a 2D background subtraction with a box size of 50 pixels, masking objects and bad pixels at a 2 threshold.We convolve the error images with a Gaussian filter (with standard deviation of two pixels) to remove erroneous very low values which cause the morphological fits to fail.The JWST PSF is required for measuring the sizes of galaxies in the sample.A popular tool for modelling the JWST PSF is WebbPSF (Perrin et al. 2014).However, it has been found that empirical PSFs constructed from NIRCam imaging often have broader FWHMs than constructed by WebbPSF (e.g.Ding et al. 2022;Ono et al. 2023;Tacchella et al. 2023), likely because PSFs generated by WebbPSF are not drizzled.Additionally, Zhuang & Shen (2023) find strong variation in the NIRCam PSF over the field of view, with variations increasing at shorter wavelengths.They recommend using PSFEx (Bertin 2011) to construct a grid of empirical PSFs across the image.We use a global PSF model due to the limited number of stars in the field, identified from the FWHM-magnitude diagram with FWHMs taken from SExtractor.Zhuang & Shen (2023) find that a global PSF still leads to good results, with an average ≲ 0.02 mag systematic offset and ≲ 0.05 mag random scatter when later attempting AGN-host decomposition (see Section 3.2.1).The PSF FWHMs as measured by PSFEx are 0.07 ′′ , 0.09 ′′ , 0.16 ′′ and 0.17 ′′ in F115W, F200W, F356W and F444W respectively (the filters used in this work).These are shown for F200W and F356W on Fig. 6.

Size fitting
In order to compare directly with other studies and with simulations, we use both a parametric and non-parametric method to measure the sizes of galaxies in our sample.

Parametric sizes
We make use of PyAutoGalaxy1 (version 2022(version .07.11.1, Nightingale et al. 2023) ) to measure the sizes of galaxies in the NIRCam filters.This takes in the image, noise map and PSF and uses fully automated Bayesian model fitting of galaxy two-dimensional surface brightness profiles to return the best-fit parameters, allowing us to extract the Bayesian Information Criterion (BIC) and assess model fits.We first mask the postage stamp of each object to a circular region of radius 0.7 arcsec, expanding to 1.8 arcsec for the largest objects at  ≃ 3. We find that these sizes are sufficient for including the entire object whilst balancing against the number of pixels that are fitted for.These choices also do not affect the results as the best-fit sizes are much smaller than these circular regions.We use a simple Sérsic profile to measure the effective radius   .We employ a Gaussian prior on the Sérsic index , centred at  = 1 with a standard deviation of   = 2 and truncated at an upper limit at  = 4 (Kartaltepe et al. 2023).We place a uniform prior on the centre of the object, allowing it to vary within the central 0.2 arcsec to account for any differences in the centroid between the ground-based and JWST imaging.The effective radius is allowed to vary between 0 − 3 arcsec, and we allow the ellipticity to vary freely.We make use of the linear light profiles available in PyAutoGalaxy where the intensity of the profile is solved for implicitly, reducing the dimensionality of the fitting and removing degeneracies that can occur between intensity and other profile parameters.However, we note that degeneracies can still occur between Sérsic index and size.We input the empirical PSF described in Section 3.1 to convolve with the model.We discard bad fits by inspection of residuals in F115W and F444W to check the most extreme wavelengths probed in this work.These bad fits almost always arise in the case of clumpy or irregular morphologies, leading to very large residuals.This leaves 1203, 278 and 187 objects in the  = 3, 4, 5 bins.This means we remove 16%, 24% and 21% of the objects that remain from the mass cut in each redshift bin respectively.We show four examples of the Sérsic fitting in Fig. 3, including one case of a poor fit to a clumpy galaxy.
Since the sample selected in Adams et al. (2023c) included both AGN and SFGs, we need to account for objects that may contain accretion-dominated emission and contaminate the total flux, leading to overestimates of stellar mass and inaccurate size measurements.We run a second round of fitting with a PSF+Sérsic profile to account for this, using the initial round of fitting to further inform the priors.We place a Gaussian prior on the centre of the PSF and Sérsic components with a standard deviation of 0.05 arcsec, centred on the position found by the Sérsic-only fitting.Similarly, we employ a Gaussian prior on the ellipticity and Sérsic index, with standard deviations of 0.2 and 0.5 respectively, centred on the values found in the previous round of fitting.These are generally wide enough to account for large differences in the best-fit parameters.We no longer use linear light profiles for the fitting, instead allowing the intensity of both profiles to vary freely so that the contribution from each component can be measured.We apply this fitting in F115W and F444W in order to check for a PSF contribution in both the rest-frame UV and the rest-frame optical, motivated by recent discoveries of highly reddened AGN found by JWST (e.g.Kocevski et al. 2023;Matthee et al. 2023;Labbe et al. 2023;Juodžbalis et al. 2023;Scholtz et al. 2023).We compute the BIC for both rounds of fitting for each object in order to compare the models.We find that when the PSF+Sérsic fit is preferred according to the BIC, it is preferred in both NIRCam filters, revealing no hidden, highly reddened AGN candidates in this UV-selected sample.Objects with a preferred PSF+Sérsic fit (2% at  = 5, 6% at  = 4, 1% at  = 3) are discarded from the sample.We find that the fraction of sources with a PSF component increases towards brighter  UV .They are interesting in their own right as Type I AGN candidates, but in this work the sample size of such candidates is small.
We also fit a Sérsic model to the PSFs from the filters used in this work to test the smallest size measurable by PyAutoGalaxy.The Sérsic indexes are typically ∼ 0.2.The modal effective radius is 0.15 ′′ and they range from 0.013 ′′ to 0.021 ′′ , comparable to the pixel scale of the imaging (0.03 ′′ ).These sizes are significantly smaller than the PSF FWHM (see Section 3.1).At  = 3, 4, 5 the maximum size of 0.021 ′′ corresponds to log 10 (  /kpc) = −0.79,−0.84, −0.88 respectively (see Fig. 6).

Non-parametric sizes
We measure a non-parametric size following the pixel-based method of R22 to compare like-for-like size-luminosity relations.This method is better suited to deal with clumpy and irregular morphologies.We select pixels in the image at a 5 threshold and then place a circular aperture with a diameter of 2 arcsec on each object, which is large enough to capture the entire object.We then order the pixels in this aperture from most to least luminous.We take the number of pixels containing half the luminosity, and use the pixel area  to compute the radius using  = √︁ /.R22 use an aperture size enclosing 30 pkpc which is too large for our objects (∼ 4 − 5 arcsec diameter) as neighbouring objects occasionally enter the aperture.The number of pixels containing half the luminosity is generally much smaller than the total number of pixels in the aperture, so our choice of aperture radius does not impact the non-parametric size measurements.The size measured is also broadened by the PSF.To allow for comparison with simulations, we need to account for this effect.We first scale the PSFs constructed in Section 3.1 to match the luminosity of the object in each filter.We then measure the size of the PSF in the same manner as before, and then deconvolve assuming the PSF is a Gaussian (by subtracting the PSF size from the initial size in quadrature).In Appendix A we present a brief comparison of the Sérsic and non-parametric sizes.

RESULTS
In this section we present the results of the size fitting and the dependence of size on redshift, stellar mass and luminosity.The sizes reported in the log-normal distributions and the size-redshift relations are measured in F356W to compare directly with the predictions of C23, probing 0.6 − 0.9 m in the rest frame.Similarly, the Sérsic size-mass relations are reported in F200W (probing rest-frame 0.3 − 0.5 m) and F356W to compare directly with C23.For the size-luminosity relations (with both size measurement methods) we use F115W and F444W, which probe 0.2 − 0.3 m and 0.7 − 1.1 m respectively, to maximise the difference in wavelength to provide a clearer comparison to the predictions of R22.

Log-normal distribution fitting
In the top panel of Fig. 4 we show histograms of Sérsic sizes in each redshift bin.We also show the log-normal fit to each distribution.We use SciPy (Virtanen et al. 2020) to fit a log-normal function to the size distributions in each redshift bin.Errors on the peak of the fit are estimated with bootstrapping.We find that the size distributions are well-described by the log-normal function, with reduced  2 values of  2 red = 1.16, 1.11, 1.08 in the  = 3, 4, 5 bins respectively.Using the peak as a measure of typical (modal) galaxy size in each bin, we find sizes of R = 0.79 ± 0.08 kpc at  ≃ 5, R = 0.88 ± 0.07 kpc at  ≃ 4 and R = 0.86 ± 0.03 kpc at  ≃ 3.These values are consistent with no evolution in the typical size of galaxies over  = 3 − 5.This weak evolution extends the findings of Curtis-Lake et al. ( 2016) at  = 4 − 8 to lower redshifts (although we note the different size definition of circularised half-light radii in that work).Shibuya et al. (2015) also find compact Sérsic sizes (  < 1kpc) at  > 3 using the peak of the log-normal distribution.They find this for a bright ( UV = 0.3 − 1 * =3 ) HST-selected sample, where  *

𝑧=3
is the characteristic luminosity of LBGs at  = 3, corresponding to  UV = −21 (Steidel et al. 1999).There appears to be some evolution in the tail of the distribution via emergence of objects with   ≳ 2 kpc between  ≃ 3 − 4. To test whether this is a genuine build-up in the tail or due to the limited volume probed and the evolving rest-UV luminosity function over  = 3 − 5 (Adams et al. 2023c), we scale the  = 3 distribution by the total number of galaxies at  = 4 and 5 and calculate the expected number of large galaxies by integrating under the curve at   > 2 kpc.If the intrinsic distributions are the same across redshift and only changes due to the limited volume, then the expected numbers from the scaled distribution should match the observed distributions.
At  = 4 and  = 5 we expect 5 +12 −4 and 6 +21 −5 galaxies with   > 2 kpc respectively by integrating the tail of the curve.The scaled  = 3 distribution predicts 133 +17 −16 and 77 +10 −9 galaxies with   > 2 kpc at  = 4 and  = 5.This points to a genuine build-up in the tail of the distribution in the high-mass end, suggesting a transition between  = 3 and  = 4, where much larger, extended structures begin to develop within galaxies.
A source of bias in measuring this tail of large galaxies comes from cosmological surface brightness dimming.This makes lowsurface brightness regions of galaxies more difficult to detect at higher redshift.To test this, a sub-sample of the  = 3 sources (including   > 2 kpc galaxies) were redshifted out to  = 5.The scatter in the redshifted sizes is 20%, consistent with the error on the size measure.There is no dependence on the initial size at  = 3, i.e. the scatter is not larger for galaxies with extended morphologies.We conclude that cosmological dimming does not significantly bias our results.

Irregular galaxies
We note that the Sérsic fitting employed here fails on disturbed/irregular and clumpy galaxies, and such fits are discarded.Huertas-Company et al. ( 2023) conduct a morphological analysis of log 10 ( ★ / ⊙ ) > 9 galaxies in CEERS at  = 0 − 6.They find that the disturbed fraction of galaxies in F150W at  ≃ 3 − 6 remains high at ∼ 70% for log 10 ( ★ / ⊙ ) ∼ 9−10.5.Although we measure sizes in F356W where we find a lower fraction of disturbance, we may still underestimate a tail of large objects due to the failure of the Sérsic fitting for irregular morphologies.We therefore also carry out the log-normal fitting on our non-parametric sizes, shown in the bottom panel of Fig. 4. The distributions are still well-fit by the log-normal distribution, with reduced  2 values of  2 red = 1.11, 1.13, 1.07 in the  = 3, 4, 5 bins respectively.There appears to be a mild evolution in the typical size when clumpy/irregular galaxies are included, with R = 0.86 ± 0.11 kpc at  ≃ 5, R = 1.09 ± 0.07 kpc at  ≃ 4 and R = 1.15±0.03kpc at  ≃ 3. The distributions are also broader than their Sérsic counterparts due to the contribution by large (  > 1 kpc) clumpy/irregular systems.We can use the fraction of bad Sérsic fits as a proxy for the irregular fraction.Here, irregular includes disturbed and clumpy morpholo- We show the distributions as histograms, and then plot the log-normal fit to the distributions.The  = 4 and 5 distributions are scaled by a factor of two and three respectively to emphasise the relative positions of the peaks.This peak is taken as the size of a typical galaxy in each redshift bin.
gies.In Fig. 2 we show the irregular fraction binned in both redshift and stellar mass.The irregular fractions drops from 0.25 at  = 5 to 0.20 at  = 3.This agrees with visual classification results from Ferreira et al. ( 2023), who find that galaxies with log 10 ( ★ / ⊙ ) > 9 at  > 3 are not dominated by irregular structures.Our results are also in agreement with Kartaltepe et al. (2023), who find that the irregular fraction increases from 0.2 at  = 5 to 0.3 at  = 3.Our results differ to the visual classification by Jacobs et al. (2023), who find a very high peculiar fraction of ∼ 0.9 at  ≈ 4.9, dropping to nearly zero at  ≈ 2.5 for log 10 ( ★ / ⊙ ) > 9.5, although the number of galaxies in their high-mass bins is small.They see a constant peculiar fraction of ∼ 0.6 over  = 3−5 in their lower-mass bin, log 10 ( ★ / ⊙ ) < 9.5.This higher value compared to our fraction may be caused by the inclusion of low-mass galaxies (log 10 ( ★ / ⊙ ) < 9) where we apply a cut.In terms of stellar mass, our irregular fraction decreases slightly between 10 9  ⊙ and 10 10  ⊙ .This trend broadly agrees with Jacobs et al. ( 2023), who find that the fraction of peculiars is higher at log 10 ( ★ / ⊙ ) < 9.5.However, our irregular fraction spikes up to larger values at  > 10 10.5  ⊙ .In the high-mass regime, a visual inspection reveals that the irregular galaxies in our sample appear to be merging systems, such as the bottom-right galaxy in Fig. 3.

Size-redshift evolution
In Fig. 5 we show the size evolution of our sample with redshift.
The left panel shows the individual measurements for each LBG, as well as various measures of the average size (mean, median and log-normal peak) in each redshift bin.These values are reported in Table 1.The mean and median values in each bin are significantly larger than the log-normal peaks (based on Sérsic sizes), due to being skewed by large objects.The black dashed line shows the best fit derived from fitting to the full sample (all galaxies with log 10 ( ★ / ⊙ ) > 9) using a power law parameterisation   = (1+ )  .Our best fit finds   = 3.51(1 + ) −0.60±0.22kpc.Comparing this fit to the results binned by redshift suggests the evolution is more rapid than implied by the log-normal peaks by  ≃ 3.However, due to large scatter in the sizes there is large uncertainty in the power-law fitting.
In the right-hand panel we compare our results to other studies using JWST.Our power-law fit and mean sizes in each redshift bin are remarkably consistent with the power-law fitting of Ormerod et al. (2023), who measure the size evolution of galaxies across  = 0 − 8. Their relation is based on the mean size of galaxies with redshift.We note that they use a slightly smaller area than this study (64 arcmin 2 ).Their power-law index is in agreement with ours with  = 0.71±0.19.Our results also agree with the evolution determined by Ward et al. (2023), who find a size evolution with power-law index of  = 0.63 ± 0.07 using 97 arcmin 2 of JWST imaging.We note that their determination differs slightly -they find the evolution at a fixed stellar mass of 5 × 10 10  ⊙ .
The build-up in the tail of the log-normal distribution (see Section 4.1) is complemented by a size-redshift relation that finds larger sizes at lower redshift.Together, they suggest an emergence of large (  > 2) galaxies at  = 3.We can complement this result with a measure of the evolution of typical galaxy sizes by focusing on our log-normal peaks.Our use of log-normal distribution fitting is based on the results of Curtis-Lake et al. (2016), which is physically motivated (see Section 5.1) and better accounts for the typical size of galaxies by incorporating a tail for the largest objects.Additionally, using the log-normal peak of the distribution allows for a determination of the typical size of the galaxy in samples that are down to 50% complete.Curtis-Lake et al. ( 2016) mention that selection effects need to be accounted for carefully with certain flux cuts to ensure the typical sizes of galaxies can be recovered.They find that selection effects in previous works (e.g.Shibuya et al. 2015) preferentially biases towards smaller sizes due to their choice in selection methodology.They require a signal-to-noise of 15 in a 0.35 arcsec diameter aperture, tending to select smaller galaxies.We do not expect to be preferentially skewed to smaller galaxies because the 5 depths of the PRIMER imaging is ∼ 0.3 − 3 mag deeper than the ground-based filters used to select the sample.Additionally, the ground-based selection is seeing-dominated and therefore less affected by size.Curtis-Lake et al. ( 2016) find a redshift evolution   ∝ (1 + ) −0.20 at  = 4 − 8 when fitting to their log-normal peaks, a much weaker evolution than Shibuya et al. (2015) and C23.If we fix the exponent  = −0.20,also shown in Fig. 5 and fit for the power law coefficient to our log-normal peaks, we find a relation that agrees with our typical galaxy sizes, extending the results of Curtis-Lake et al. ( 2016) to  = 3.   2023) and predictions from C23.We also plot the median and mean values when the sample is split into two mass bins at log 10 (  ★ / ⊙ ) = 9.6.Note the narrower   axis for clarity.

Comparison with other studies
The illustrisTNG simulation (Pillepich et al. 2018) is used by C23 to measure the sizes of log 10 ( ★ / ⊙ ) > 9 galaxies with artificial noise added to match the depth of the Cosmic Evolution Early Release Science survey (CEERS, Finkelstein et al. 2017).Our mean sizes are 15% smaller than the predictions of C23 at  = 3.They conclude that their upturn in the size evolution is attributed to more massive galaxies becoming larger at  ≃ 3, and our best-fit derived from the full sample recovers a similar trend, but not as steep (although agreeing within the errors).Similarly, Ormerod et al. (2023) do not observe a redshift evolution as steep in their HST+JWST sample.
The CEERS fields have an area of ∼ 100 arcmin 2 , a factor of ∼ 3.8 less than PRIMER.Although scaling the log-normal distributions (see Section 4.1) suggests a genuine build-up of large galaxies, we caution that conclusions regarding the growth of high-mass galaxies require a similar analysis on larger volumes.
C23 measure median values in three mass bins centred at log 10 ( ★ / ⊙ ) = 9.1, 9.4 and 10.0.In Figure 5 we show our mean and median sizes when split into two mass bins at log 10 ( ★ / ⊙ ) = 9.6.We use two bins because there are too few objects in the lowand high-mass bins when using the same binning as C23.Additionally, this value approximately splits the sample in half.We find that at  = 5, both the mean and median in the low-mass bin are larger than in the high-mass bin.At  = 3 − 4 the mean and median are larger in the high-mass bin.A similar trend is seen in C23, and this reflects the flat/negative size-mass relations (see Section 4.4).C23 also see a broadening in their median sizes towards lower redshift and a convergence to similar values at higher redshift when binning in mass (see their fig.8).Although we see this to some extent, the broadening of sizes between the bins by  = 3 is not as pronounced -they differ by no more than 0.15 kpc at any redshift (compared to > 0.5 kpc in C23).This suggests a weak mass dependence of the size-redshift evolution in our sample.To summarise, we find that whilst the typical size of galaxies does not evolve significantly over  = 3−5 (as measured by the log-normal peaks of Sérsic sizes in each redshift bin), there is evidence for an emergence of large (  > 2 kpc) galaxies when we fit for the entire sample (i.e.all galaxies with log 10 ( ★ / ⊙ ) > 9).

Size-mass relations
In Fig. 6 we present the Sérsic size-mass relations in each redshift bin for both F200W and F356W in order to compare directly with the predictions of C23.They use the illustrisTNG simulation (Pillepich et al. 2018) to measure the Sérsic sizes of log 10 ( ★ / ⊙ ) > 9 synthetic galaxies matching those found in the Cosmic Evolution Early Release Science survey (CEERS, Finkelstein et al. 2017).We fit straight lines of the form log 10   =  × log 10 ( ★ / ⊙ ) + .The independent variables in our size-mass and size-luminosity relations themselves possess errors, complicating the task of fitting a straight line.Bartlett & Desmond (2023) find that common approaches to dealing with both  and  errors can ignore the underlying distribution of the true independent variable values, leading to biased results (see references therein).They find that using a single Gaussian prior (with mean and variance to be determined as part of the inference) on 'latent variables' describing the true values of the independent variables performs best, and they dub this 'Marginalised Normal Regression'.It is key that this works in the presence of intrinsic scatter, an additional property we want to measure.We therefore use their Regression and Optimisation with X and Y errors (ROXY) package2 to fit the size relations.We measure the intrinsic scatter and compare it to the statistical error of the fitting.We present the best-fit values in Table 2.In order to assess the robustness of measuring very small sizes, we compare the PSF sizes (see Section 3.1 and Fig. 6) to the sizes of our sample.The PSF FWHMs correspond to sizes of 0.38 kpc (0.63 kpc) at  = 3 in F200W (F356W), 0.34 kpc (0.56 kpc) at  = 4 in F200W (F356W) band, and 0.31 kpc (0.50 kpc) at  = 5 in F200W (F356W).In these bands, 95% of the sample is larger than the PSF FWHM, meaning the majority of sources are resolved.We verified that removing these sources from the sample does not significantly change the size-mass relation.We note that any observational biases towards larger sizes in this work due to the PSF minimum size are also present in C23, still allowing for a direct comparison.IllustrisTNG indeed predicts sizes smaller than the PSF FWHM (see their fig.7), which are not measurable in the mock observations.Interestingly, at  = 5 we see some evidence for a negative sizemass relation in F200W, consistent with the prediction of negative slopes by C23.A negative slope is allowed in F356W within errors.At  = 4, the slopes in F200W and F356W are both positive and consistent with one another.We also find that the slopes are consistent with C23.At  = 3, the slope in F356W appears to be flatter than that in F200W.We test whether the fit is being skewed by the relatively compact objects at log 10 ( ★ / ⊙ ) ≳ 10.5 in F356W by re-fitting without them, and find that the results do not change significantly.The relation in F200W is consistent with C23, and the F356W relation appears flatter but is consistent within 2.Our slopes are flatter than those found by Ward et al. (2023), who measure the rest-5000Å sizes of star-forming galaxies in 97arcmin 2 of CEERS imaging.They find a gradient of  = 0.25 in a redshift bin of  = 3.0−5.5,and a gradient of  = 0.15 in a redshift bin of  = 2.0−3.0.We note some differences between our methodologies -our redshift bins span almost the whole range of their highest redshift bin, and their  = 2.0 − 3.0 bin will contain objects at lower redshifts than we can select for, complicating the comparison.Broadly speaking, Ward et al. (2023) find a flattening of the size-mass relation at lower redshifts, and we see hints of this from  = 5 to  = 4. Similarly, Pandya et al. (2024) find some evidence for a flattening size-mass relation over  = 3 − 8.

Intrinsic scatter
One notable result is that the intrinsic scatter of objects about the bestfit relation increases with decreasing redshift, and is lower in F356W than in F200W by 0.03-0.07dex.The values are reported in Table 2.By plotting both the intrinsic scatter and statistical error in Fig. 6, we can see that the intrinsic scatter dominates over measurement errors in the fitting.Morishita et al. (2023) measure the size-mass relation in star-forming galaxies at 5 <  < 14 and find a redshiftcorrected intrinsic scatter of  log  = 0.30 ± 0.01 dex, consistent with our findings at  = 3 − 4 in F200W but higher than our findings in F356W (see Table 2).Our intrinsic scatters increase by 0.04 dex from  = 5 to  = 3 in both F200W and F356W.The values in F200W are larger than low-redshift ( < 1) results for both star-forming and quiescent galaxies, being more consistent with our resuts in F356W ( log  ≃ 0.20 dex, Kawinwanichakĳ et al. 2021).Whilst our values in F356W broadly agree with Ward et al. (2023), the increase we find is in disagreement with their constant intrinsic scatter (∼ 0.2 dex) out to  = 5.5.This is likely due to the fact that their highest redshift bin nearly encapsulates all three of our redshift bins.

Comparison with quiescent galaxies
The infrared coverage provided by JWST has allowed for the selection of massive quiescent galaxies at  > 3, implying extremely early quenching episodes at  ≳ 5 (e.g.Carnall et al. 2023a,b;Long et al. 2023;Ito et al. 2023;Ji et al. 2024).We can thus compare our results for star-forming LBGs to the quiescent population in the rest-optical at  = 3 − 5.In Fig. 6 we show the findings of Ito et al. (2023) to compare to our F356W results.Although our number counts are low at log 10 ( ★ / ⊙ ) > 10.5, extrapolating our results would suggest that quiescent galaxies tend to be more compact than star-forming galaxies, as seen at lower redshifts (van der Wel et al. 2014).Additionally, Ji et al. ( 2024) also find compact sizes for massive (log 10 ( ★ / ⊙ ) > 10) quiescent galaxies at  = 1.6 − 5. Ito et al. (2023) note that AGN may result in apparent compact sizes for some objects, but do not remove any objects since none have X-ray detections.They also find that quiescent galaxies are more compact with increasing wavelength, which could be caused by inside-out quenching that produces compact morphologies for evolved stellar populations.Although wider areas are needed to better constrain the positions of the most massive objects in the size-mass plane, perhaps this explains the flatter relation we find using F356W compared to F200W at  = 3 where massive galaxies are still undergoing intense star formation, but have begun to quench in their centres.Ormerod et al. (2023) find a similar trend for both star-forming and passive galaxies at  < 3, also interpreting this as evolved stars occupying the central regions of galaxies with star formation occuring at larger radii.Similarly, at  = 3 − 4 Allen et al. (in prep.)find a flattening of the size-mass relation at longer wavelengths.Correspondingly, at  = 1 − 2.5 Suess et al. (2019a) find that most star-forming and quiescent galaxies have negative colour gradients (i.e.redder in their centres and bluer at larger radii).They also find that the colour gradient evolves rapidly over this redshift range (Suess et al. 2019b), broadly fitting into the picture of inside-out growth and quenching.
Overall, we find good agreement of our results with predictions from simulations.Although the slope of the relations appear to transition from positive to negative at  = 4 − 5, a lack of dynamic range in mass limits this conclusion.However, we do find an increase in the intrinsic scatter with increasing redshift.Better dynamic range in massive objects is required to better constrain and confirm the negative slopes at  = 5, the flattening of the F356W slope, and the evolution of the gradients.We note that in order to compare directly with C23 we use the same filters across the full redshift range, meaning there is a shift in the rest-frame wavelengths probed ( ≃ 0.3−0.5m,0.6−0.9mfor F200W and F356W respectively).A good alternate probe of the evolution would be with consistent restframe wavelengths (Allen et al., in prep.).

Size-luminosity relations
In Fig. 7 we present the results of our size-luminosity fitting in the F115W and F444W filters to the functional form log 10   =  ×  rest frame +, where  rest frame is the rest-frame absolute magnitude in each filter.We choose these filters to maximise the difference in wavelength between the relations, providing a better test of the wavelength dependence predicted by R22.These filters probe 0.2 − 0.3m and 0.7 − 1.1m in the rest frame, respectively.The top panel shows the results for the Sérsic fitting, and the bottom panel shows the non-parametric results.The best-fit values are presented in Table 3.As in Section 4.4, we briefly compare our non-parametric sizes with the PSF sizes.At  = 3 − 4, 97% of the sizes are larger than the PSF FWHMs in F115W and F444W.At  = 5, nearly 93% of the sizes are larger.The majority of the sample is therefore resolved.We verified that removing sources consistent with the PSF FWHM from the sample does not significantly change the size-luminosity relations.First looking at the Sérsic results, at  = 4 − 5 we find that the relations are consistent with one another.It also appears that for a given magnitude, galaxies become larger between  = 4 − 5.At  = 3 the F444W relation becomes negative and the sizes are more compact than in F115W.
The results from non-parametric size fitting described in Sec- The rest-frame absolute magnitudes calculated for each of F115W and F444W (see Section 2.2) are shown on the x-axis, with the equivalent luminosity on the top axis.We also plot the 1 error contours.Bottom: results from our non-parametric size fitting.The plots are the same as the top panel, but we also show predictions from the FLARES model of R22 at  = 5 with the dashed lines.tion 3.2 are slightly different.The non-parametric size-luminosity relations generally show a flattening of the relation with decreasing redshift, and the sizes in F115W are more compact than in F444W for any given luminosity.We do not find significant differences between the gradients in these bands at a given redshift.
Comparing our two size-fitting methods, we find that at  = 4 − 5 the non-parametric sizes have slightly steeper relations than the Sérsic sizes.This is likely due to the inclusivity of the non-parametric method to massive, clumpy/irregular galaxies.Whilst the Sérsic fitting may fit to an individual clump or fail altogether (see Fig. 3), the non-parametric method better accounts for this, meaning we do not have to discard any fits.Indeed, if we remove clumpy galaxies the non-parametric size-luminosity relations flatten slightly, but the results are not changed significantly.
Massive galaxies can thus contribute to the population with larger sizes.By  = 3 the relations for both methods are relatively flat.The offsets between the bands differs across the two methods.In the Sérsic fitting, the offset between the red and blue bands disappears by  = 4.For the non-parametric relations a significant offset remains between the two bands at  = 3 − 4. Since the non-parametric method is agnostic to where the light in the galaxy lies, it may perform better on disturbed profiles in F115W where the Sérsic fitting converges, but may not be the most appropriate size measure.A visual inspection of the galaxies in each NIRCam filter reveals that objects which appear disk-like at long wavelengths can often have slightly irregular morphologies at short wavelengths.For example, in Fig. 1 the object appears to be disk-like in the long-wavelength filters, whereas in the short-wavelength filters it appears to be comprised of two components.In these cases we find that the Sérsic fitting ignores the clumps and still fits a disk, similar to the bottom right panel in Fig. 3 but to a less extreme extent.This may slightly overestimate these sizes, leading to the agreement in the Sérsic relations across wavelength.We thus conclude that galaxies tend to be more compact in their rest-UV light compared to their rest-optical light.There also appears to be a flattening of the relations towards lower redshift across both measurement methods.In Appendix A we compare the Sérsic and non-parametric sizes for galaxies that have a good Sérsic fit.

Comparison with the literature
We can also compare our results from both size-fitting methods to the literature.Assuming  ∝  − , Huang et al. ( 2013) find power laws  of  = 0.22 +0.058 −0.056 and 0.25 +0.15 −0.14 at  = 4 and  = 5 respectively for LBGs using Sérsic fitting.Converting our relations from magnitude to luminosity, the non-parametric relations in F115W have power laws with exponent  = 0.15 ± 0.10 and 0.20 ± 0.08 at  = 4 and  = 5 respectively.The Sérsic relations in F115W have exponents  = 0.00 ± 0.08 and 0.15 ± 0.08 at  = 4 and  = 5 respectively.Our Sérsic results are thus slightly shallower in slope than Huang et al. (2013) (although the errors are large), and our non-parametric results are consistent within the errors.The non-parametric results are also consistent with Shibuya et al. (2015) and Curtis-Lake et al. ( 2016) who both find values spanning  ≃ 0.1 − 0.27 over  = 3 − 5. We also note that these results are consistent with size-luminosity relations of bright galaxies at  > 5 found by, e.g., Holwerda et al. (2015); Shibuya et al. (2015).However, results from lensed galaxies at  > 5 (thus probing intrinsically fainter galaxies) tend to show much steeper slopes around  ∼ 0.5 (e.g.Grazian et al. 2012;Kawamata et al. 2018;Bouwens et al. 2022;Yang et al. 2022).Yang et al. (2022) find that different lensing models all produce the same steeper relation, and Bouwens et al. (2021) still find steep slopes after accounting for surface-brightness selection effects.Kawamata et al. (2018) recover this steeper relation in a simple analytic model where smaller galaxies have lower specific angular momenta.

Sérsic
Our results are complementary to those of Suess et al. (2022).For a sample of 1179 galaxies at  = 1 − 2.5 with log 10 ( ★ / ⊙ ) ≥ 9 selected from CEERS, they compare the observed 1.5m ( rest ∼ 0.55m) and 4.4m ( rest ∼ 1.6m) sizes using the F150W and F444W NIRCam filters, respectively.This work is similar to their results in that the rest-frame wavelengths vary over the redshift range due to the fixed choice of filters ( ≃ 0.2 − 0.3m, 0.7 − 1.1m in F115W and F444W), thus the focus is on the relative differences between the relations.They find that the observed 4.4m sizes are more compact than at 1.5m, and that the size difference is more pronounced at higher stellar mass.This is in agreement with our Sérsic size-luminosity relation at  = 3 -sizes in F444W are more compact than in F115W, and this size difference increases towards brighter magnitudes.Roper et al. (2022) We also compare our results to predictions from R22, who use the FLARES simulation (Lovell et al. 2021) to predict the sizeluminosity relation of massive galaxies at  = 5 − 10 as a function of wavelength.The non-parametric size measurement we use follows their method, and hence they can be directly compared (although see the discussion in Section 5.3).Note that we can only compare our results against their  = 5 predictions.The closest matching synthetic filter to F115W in the rest-frame is their MUV-band, and the closest to F444W is their R-band.Our gradients are consistent with R22, with the relation in F115W being steeper than in F444W.However, there is a significant offset between our relations and their predictions.Their relations also cross, leading to objects appearing larger in F115W at the bright end (see Fig. 7).This behaviour is not seen in our non-parametric results, but is seen in the Sérsic results.We note that the Sérsic results use a sub-sample of all galaxies in the PRIMER imaging, as poor fits are discarded.We test the impact of this by rerunning the non-parametric size-luminosity relations on this sub-sample.The relations flatten slightly but it does not change the results significantly.

DISCUSSION
Our results point to an emergence of large (  > 2 kpc) galaxies by  = 3, whilst simultaneously pointing to little evolution in the typical size of galaxies over  = 3 − 5. Our Sérsic size-mass relations show hints of a negative slope at  = 5, intrinsic scatter that increases towards lower redshift and is larger in bluer bands, and some consistency with the predictions by C23.Our size-luminosity relations exhibit a flattening towards lower redshift in both the Sérsic and non-parametric sizes.The slopes of our non-parametric relations at  = 5 are consistent with the predictions by R22 from FLARES, with an offset in the intercept.In this section, we discuss and interpret these results in the context of disk formation models and explore the implications of the agreement of our results with simulation works.

Formation of disks at high-redshift
The use of log-normal fitting to the size distribution of galaxies is motivated by the disk formation model of Fall & Efstathiou (1980).In this theoretical framework the size of a galaxy is governed by its angular momentum, given to the system by tidal torques with neighbouring objects.Peebles (1969) express the total angular momentum in terms of a dimensionless spin parameter  =  1/2  −5/2  −1 where  is the angular momentum of the system,  is the total energy,  is the total mass (all dominated by the DM halo before collapse) and  is the gravitational constant.In the picture of hierarchical structure formation, the distribution of spin parameters follows a log-normal distribution.Assuming the disk is relaxed, we then expect   ∝  vir where  vir is the virial radius of the DM halo (Bullock et al. 2001;Curtis-Lake et al. 2016).As described in Section 4.3 and shown in Fig. 4, our size distributions are well-described by a log-normal distribution, consistent with these theoretical models out to  = 5.Previous studies with HST (Shibuya et al. 2015;Curtis-Lake et al. 2016) also found strong agreement with a log-normal distribution out to  ≃ 6 (beyond which sample sizes are small), but this was limited to rest-UV emission at  ≳ 3.These results are thus complementary to HST results, revealing consistency between rest-UV and rest-optical size distributions at these redshifts.
The disk model of Fall & Efstathiou (1980) and Mo et al. (1998) also predicts that the size of a galaxy scales with redshift as  ∝ (1 + ) −1 in the case of the host halo having a fixed circular velocity, and as  ∝ (1+ ) −1.5 for a fixed halo mass.However, as discussed by Curtis-Lake et al. ( 2016), rest-UV selected samples do not necessarily also select for DM halos with constant mass or circular velocity.Therefore fitting this relation only reveals whether the rest-optical light of the UV-selected sample in this work best traces one of these scenarios, if certain assumptions hold.For example, the disk mass and angular momentum are assumed to be fixed fractions  d and  d of the total mass  and total angular momentum , which are dominated by the dark matter halo.Mo et al. (1998) assume  d is the same for all disks, and they also largely assume  d =  d .Curtis-Lake et al. ( 2016) discuss results from Danovich et al. (2015), where high-redshift massive galaxies accrete angular momentum via cold gas streams, resulting in a high value of  d and breaking the assumption that the disk is relaxed.This would drive up the value of the spin parameter  for our most massive sources, and this may partially explain the build-up of objects at   > 2 kpc at  = 3 relative to  = 4 − 5. Prior to this build-up, within the THESAN simulation at  ≳ 6 Shen et al. (2023) find that the sizes of massive galaxies agree better with model predictions where  d / d is small, keeping sizes compact at earlier times.However, this does not explain the shallower redshift evolution we find compared to the predictions.Our value of   = 3.51(1 + ) −0.60±0.22 is in tension with the fixed circular velocity scenario at the 2 level.Legrand et al. (2019) find that for the least massive haloes (log 10 ( ℎ / ⊙ ) ≲ 12.5), the stellarmass to halo-mass ratio decreases towards higher redshift.Since we cut at a constant mass over all redshifts, by  = 3 we may be sampling lower-mass haloes compared to  = 5.Thus in our sample for a given low-mass halo, at  = 3 the halo is more likely to be occupied by a more massive galaxy than at  = 5.We therefore may be selecting lower-mass haloes towards lower redshift, which could explain the shallower evolution that we find.
We can gain further insight into disk formation via the evolution of the intrinsic scatter in the size-mass relations.In disk formation models, the intrinsic scatter is related to the distribution of spin parameters (Bullock et al. 2001).The larger scatter we see at lower redshift may thus reflect that galaxies have had more time to undergo processes that drive them away from the standard disk evolutionbased relation, such as mergers.This is also higher in the rest-frame UV than the rest-frame optical, suggesting that by  = 5 the restoptical stellar populations are well established, but rest-UV emission exhibits a greater diversity in size perhaps caused by gas accretion driving star formation.This fits well with the build-up of large (  > 2 kpc) galaxies seen in the tail of the log-normal distributions by  = 3 -mechanisms forming much larger galaxies begin to dominate between  = 3 − 4, driving up the intrinsic scatter.We note that the Universe increases in age by ∼ 40% between  = 3 − 4, a time step of ∼ 600 Myr.
Our findings of the fraction of irregular galaxies are consistent with an emerging picture of rest-optical morphology via visual classification with JWST (see Section 4. 3, Kartaltepe et al. 2023;Ferreira et al. 2023;Jacobs et al. 2023).Whilst our Sérsic fitting fails on these objects, the evolution of the irregular fraction provides insight into processes that drive galaxies away from relaxed disks.A large increase in the irregular fraction towards higher mass (see Fig. 2) could reflect a high fraction of merging systems.Mergers may be one cause of driving galaxies away from the size-mass relation at We also note the rest-frame absolute magnitude and the filter the object is injected to in the bottom row, where we re-measure the size and plot the corresponding aperture.The modal 5 depth of the PRIMER imaging measured in a 0.3" diameter aperture is 27.4 in F115W and 28.5 in F444W. = 3, breaking the relaxed disk assumption in the formalism of Mo et al. (1998) and causing the increase in intrinsic scatter.Essentially, whilst disky galaxies are well-established by  = 5 as shown by the well-fit log-normal distributions, by  = 3 they begin to undergo processes such as mergers which disturb the disks (e.g.Ventou et al. 2017), and also push them to larger sizes as seen by the build-up in the tail of the  = 3 log-normal distribution.

Difficulties in size measurements
Many studies attempt to fit a  ∼ (1 + )  relation to the average or peak size to test different disk formation model scenarios, and agreement has been found with both  ∝ (1 + ) −1 in the case of the galaxy sample following a fixed circular velocity of DM halos, and  ∝ (1 + ) −1.5 for a fixed halo mass.(e.g Hathi et al. 2008;Oesch et al. 2010;Mosleh et al. 2011).The most recent determinations with JWST are by Ormerod et al. (2023), who use 64 arcmin 2 to measure the size evolution of galaxies across  = 0 − 8, and Ward et al. (2023) who use 97 arcmin 2 to measure sizes from  = 0.5 − 5.5.Both studies find a weaker evolution in size than predicted by the disk formation models,  = 0.71 ± 0.19 and  = 0.63 ± 0.07.They agree with our value of  = 0.60 ± 0.22, which would appear to rule out the fixed virial mass scenario.However, as we have shown in Section 4.3 and discussed in the previous section, there are several competing effects governing the strength of this evolution.First, we do see evidence for a larger proportion of galaxies at   > 2 kpc at  = 3 relative to  = 4 − 5. Second, from the mass-size and massluminosity relations, we generally expect the largest galaxies to be the brightest and most massive, which are the rarest (Adams et al. 2023c).Thus due to our limited area of 340 arcmin 2 , coupled with an evolving luminosity function, it is difficult to draw conclusions on our value of the exponent (derived from fitting to the full sample) which is strongly affected by the largest galaxies.Third, the constant stellar mass cut with redshift may result in a non-constant cut in DM halo masses.Fourth, Ward et al. (2023) find ∼ 1000 more galaxies than Ormerod et al. (2023) in a narrower redshift range (with fairly similar areas), suggesting large differences in the way these galaxies are selected in HST and JWST data.Arguably, our sample avoids issues surrounding the selection function with JWST by using ground-selected galaxies.For example, Curtis-Lake et al. ( 2016) discuss size-dependent incompleteness and under-estimation of the sizes of the largest HST-selected galaxies in Shibuya et al. (2015) caused by the use of high signal-to-noise required in small aperture sizes.Due to the ground-based imaging being seeing-dominated, the selection of our sample is largely agnostic to size and morphology.We thus can avoid these biases that may be introduced in a JWST-selected sample.Finally, large scatter in size leads to large uncertainty on our exponent of the size-redshift relation derived using the full sample.However, based on the build-up of large (  > 2 kpc) galaxies at  = 3, we may conclude that whilst the typical size of galaxies does not change, the same processes that increase the intrinsic scatter towards decreasing redshift also result in the establishment of a population with large sizes.Further constraining their evolution with redshift requires a wide-area study such as Euclid.
An additional complication in size measurements arises from the various definitions used in the literature (e.g.Sérsic 1963;Petrosian 1976;Kron 1980) as well as multiple non-parametric measurement methods (e.g.Conselice et al. 2000;Oesch et al. 2010;Roper et al. 2022).Comparing like-for-like with other studies and with simulations is thus rendered more difficult.This motivated our choice of using a parametric and non-parametric size fitting method in this work, allowing us to directly compare to current JWST studies and simulations.The Sérsic profile choice is natural following from disk formation models.However, many fits must be discarded due to irregular morphology.The non-parametric method deals with this in a better manner, but arguably gives us less insight into disk formation at high-redshift.These two choices allow us to compare directly to predictions from R22 and C23, who each use different size definitions to make predictions on wavelength dependence of size scaling relations.

Comparison with simulations
The two key simulation studies we compare to in this work are that of C23 and R22.C23 use the illustrisTNG simulation (Pillepich et al. 2018) to measure the sizes of log 10 ( ★ / ⊙ ) > 9 galaxies with noise added to match the depth of CEERS (Finkelstein et al. 2017).R22 use the FLARES simulation (Lovell et al. 2021) to predict the size-luminosity relation of massive galaxies at  = 5−10 as a function of wavelength.We note that our measurements provide a like-for-like comparison with these two studies, avoiding possible systematics that arise from, for example, comparing half-mass and half-light radii (e.g.Wu et al. 2020).Our size-redshift evolution is slightly weaker than predicted by C23, and reasons for this have been discussed in Sections 5.1 and 5.2.Looking to the size-mass relations, our results are in good agreement with the predictions, including a possibly negative relation at  = 5.One reason for this is that the depths of CEERS are not too dissimilar to PRIMER.In a similar vein, our derived gradients of the size-luminosity match predictions from R22, but our relations are offset to smaller sizes at the 2 level.In this case, they do not simulate mock images at typical JWST depths.When using the non-parametric size-fitting method, the addition of noise dominates over the low-surface brightness outer regions of galaxies, reducing the total light measured for the galaxy.This leads to a smaller number of pixels used to calculate the effective half-light radius.We generate mock galaxies with Sérsic profiles using Synthesizer (Vĳayan et al. 2021), which uses the same FLARES simulation as in R22.In Fig. 8 we show the impact of this effect.We use the same simple stellar population model as R22 -v2.2.1 of the Binary Population and Spectral Synthesis (BPASS) stellar population synthesis models (Stanway & Eldridge 2018) and assume a Chabrier (2003) initial mass function.We measure the non-parametric size of the mock galaxy, inject it into empty regions of our F115W and F444W imaging, and then re-measure the size.We overplot an aperture corresponding to the measured size.The sizes are clearly smaller after injection into our noisy images, and this may explain the bulk of the offset.In Fig. 9 we show how seven objects on the R22 size-luminosity relations behave after they are each injected 20 times into different empty regions of our images.We generate mock galaxies at rest-frame absolute magnitudes of  F115W = −21.8,−22.1, −22.4,−22.8 and  F444W = −21.5,−22, −22.5.The 5 depths of the PRIMER imaging measured in a 0.3 arcsec diameter aperture are 27.4 in F115W and 28.5 in F444W.For brighter objects, the sizes are reduced onto our relations.Fainter objects fall below our relation, but noise in the images causes large scatter in the derived size since the depth of the imaging varies over the COSMOS and UDS fields.Additionally, there is a large difference in depth between F115W and F444W, so the impact noise has on measured size is stronger in F115W.This explains slightly larger offsets seen in F115W compared to F444W between the clean mock galaxy sizes and the sizes after injection into noisy images, and suggests an ideal depth of ∼ 28 for such analyses, where our faintest sources are not washed out completely by the noise.This simple simulation has highlighted that direct comparisons between observational data, influenced by background noise, and clean simulated galaxies are difficult, limiting what can be learned about the physics governing size scaling relations.Comparisons to simulations where JWST-like noise, PSFs and pixel scales have been applied may enable more reliable comparisons to be undertaken.

Compact central star formation?
Considering our tentative agreement with the simulation works of R22 (once instrumental noise is accounted for) and C23 (a possible negative size-mass relation at  = 5), it is natural to ask: what physical mechanisms drive the evolution of these simulated galaxies?To explain the negative size-mass relations at  = 5 and steepening size-luminosity slopes with decreasing wavelength, both studies appeal to compact central star formation in the most massive galaxies.Specifically, Roper et al. (2022Roper et al. ( , 2023) ) find that the most massive galaxies at  ≳ 5 develop dense cores leading to the negative slopes and to higher star formation.This enriches the cores of the galaxies, allows for metal-line cooling and inhibiting stellar/AGN feedback, driving further star formation, akin to the beginnings of inside-out galaxy formation and quenching (e.g.Pichon et al. 2011;Nelson et al. 2016;Baker et al. 2023;Shen et al. 2023).It is difficult to ascertain from our limited sample at  = 5 whether there is a true negative relation.However, we note that the non-parametric size-luminosity relations show more compact sizes in the rest-UV compared to the rest optical.This is not seen in the Sérsic sizes, but as discussed in Section 4.5 the blue Sérsic sizes could be slightly over-estimated due to a larger profile being fit to more clumpy morphologies.Further studies constraining the size-luminosity relations at higher redshift and allow for a wider dynamic range in mass and luminosity will provide stronger tests for these scenarios.We note that other physical mechanisms may affect the sizes seen in R22.For example, the H The open circles represent mock galaxies created as in Fig. 8, and the filled circles show their size after injection into an empty region of our imaging.We offset the injected sizes in magnitude within 0.2 mag for clarity.The colours of the circles match the filters used.emission line crosses into F444W at  = 5 (Davis et al. 2023) and approximately traces star formation.If not accounted for, this will act to decrease   .Additionally, the manner in which dust is modelled in these simulations is important.R22 use a dust-to-metal ratio to calculate line-of-sight attenuation.They do not directly simulate dust production and destruction, instead using proxies from observed UV luminosity functions.Direct simulation of dust production/creation, such as with THESAN (Kannan et al. 2022b), may be required if dust is the primary driver behind these size scaling relations.The agreement of slopes with R22 is encouraging, but stronger differences between size-luminosity gradients with wavelength are predicted at  > 5 which remain to be tested with observations.Despite this, we have provided strong initial tests of predictions of the size-mass and size-luminosity relations by using size measurements that enable direct comparisons, providing tentative evidence for centrally compact star formation at high-redshift.

CONCLUSIONS
We present size measurements of 1668 rest-frame UV selected galaxies with stellar masses log 10 ( ★ / ⊙ ) > 9 using data from PRIMER.These sources were selected from the ground-based study of Adams et al. (2023c).The sample was selected from seeingdominated data, and hence presents an unbiased sampling of the morphology and size distributions of luminous sources.Additionally, the wavelength coverage of JWST NIRCam allows us to measure rest-optical sizes at  > 3, previously not possible with HST.We measure Sérsic and non-parametric sizes to test models of disk formation, measure the size-mass and size-luminosity relations, and compare directly to predictions of these relations from simulations.
• Galaxy sizes in F356W follow a log-normal distribution in both Sérsic and non-parametric sizes, following predictions from Fall & Efstathiou (1980) and Bullock et al. (2001), implying disk formation as early as  = 5.Whilst this has been seen in rest-UV morphologies out to  = 6 with HST, we confirm this result for rest-optical morphologies at  = 4 − 5.
• We find evidence for the build-up of large (  > 2 kpc) galaxies in the tail of the log-normal distribution at  = 3, implying a transition from compact to extended sizes from  ≃ 3−4 for rarer objects.This is supported by a size-redshift evolution of   = 3.51(1 + ) −0.60±0.22kpc derived from fitting to the full sample, consistent with other JWST findings (Ormerod et al. 2023;Ward et al. 2023).
• Simultaneously, following Curtis-Lake et al. ( 2016), if we fix the exponent of the evolution to −0.20, we find a size-redshift relation consistent with the peaks of our log-normal distributions.This suggests that the size of a typical galaxy does not evolve significantly over  = 3 − 5, remaining compact with   ≃ 0.8 − 0.9 kpc.
• We measure Sérsic size-mass relations in F200W and F356W and find agreement with predictions from C23 who use Illustris TNG50 to predict the size-mass relation in CEERS.We find the intrinsic scatter is larger in F356W than in F200W in each redshift bin, and the intrinsic scatter is larger at lower redshift.We speculate that this is because galaxies at  = 3 have had more time to undergo processes such as mergers and gas accretion that drive them away from the relation.
• The gradients of our non-parametric size-luminosity relations agree with predictions from the FLARES simulation of R22, but with an offset that we believe is driven by larger sizes measured for the simulations when instrumental noise is unaccounted for.
• Simulations suggest the scaling relations are driven by compact central star formation at high-redshift, and our results provide tentative evidence for this mechanism.Further observations that improve the dynamic range of this sample are needed to place stronger constraints on interesting cases such as negative size-mass relations and more compact size-luminosity relations in the rest-UV compared to the rest-optical.
Additional results from PRIMER (Allen et al. in prep.) will produce a multi-wavelength view of galaxy sizes over a wide mass range, allowing us to understand size evolution in a diverse population.Furthermore, a similar analysis in a larger volume will allow for stronger constraints on the emergence of large (  > 2kpc) galaxies at  ≃ 3 − 4. Just as degree-scale ground-based imaging revolutionized constraints on high-redshift number statistics such as the bright-end of luminosity functions (e.g.Bowler et al. 2014;Stefanon et al. 2019;Harikane et al. 2022), missions such as Euclid, which will achieve HST resolution over degree-scale fields (Euclid Collaboration et al. 2022), will revolutionize our understanding of resolved properties of the very same galaxies.as a function of stellar mass for galaxies with a good Sérsic fit.Individual galaxies are coloured by their photometric redshift.Bottom: now the ratio is plotted as a function of photometric redshift, with galaxies coloured by their stellar mass.In both panels, the black points show the results of binning the size ratio.

Figure 1 .
Figure 1.Stamps of an example galaxy at  = 4.58.Top: the ground-based imaging from HSC and VISTA used to select this object, as well as Spitzer/IRAC imaging of this galaxy at 3.6m and 4.5m.Bottom: the JWST NIRCam imaging of the same object.The stamps are ordered in increasing wavelength.The resolution is ∼ five times better with NIRCam for the seeing-dominated      bands, and the large FWHM of the Spitzer point spread function (PSF) in the [3.6] and [4.5] bands means no morphological information could previously be interpreted.With JWST, we can qualitatively see a disk-like morphology in the reddest bands, and a decrease in size to shorter wavelengths.Note that the stamps above/below one another do not necessarily correspond to the same wavelength.The stamps are 4 × 4 arcsec.North is up and east is to the left.All stamps are scaled with a lower limit at 2 below the noise level (white).The galaxy (black) saturates at 5 above the noise level in the top row, 4 in the short-wavelength NIRCam bands and 10 in the long-wavelength NIRCam bands.

Figure 3 .
Figure 3. Four examples of Sérsic fitting to different galaxies in the sample.In each plot we show the data, model and the residual (data minus model).The ID, photometric redshift and stellar mass of the object are labelled on the data stamp of each galaxy.The top row and bottom left plots show examples from the  = 3, 4, 5 bins respectively.The bottom right plot shows an example of a clumpy galaxy where the Sérsic fitting does not perform well, leading to large residuals.The stamps are 4 × 4 arcsec.North is up and east is to the left.The scaling is set to a lower limit of −0.1 MJy/sr and saturates at 0.3 MJy/sr, which highlights the residuals in the case of poor fits.

Figure 4 .
Figure 4.The size distribution of galaxies in each of our redshift bins measured in F356W.Top: Sérsic size fitting.Bottom: non-parametric size fitting.We show the distributions as histograms, and then plot the log-normal fit to the distributions.The  = 4 and 5 distributions are scaled by a factor of two and three respectively to emphasise the relative positions of the peaks.This peak is taken as the size of a typical galaxy in each redshift bin.

Figure 5 .
Figure 5.The size-redshift distribution of our galaxies as measured in F356W.Left: Individual galaxy size and photometric redshift measurements are shown by the grey circles.The typical error is shown in the top left.At the mean redshift of each redshift bin ( z = 2.96, 4.04, 4.91) we plot different measures of the typical size and their errors.Circles represent the fit of the log-normal fit to the Sérsic size distribution, squares represent medians and crosses represent means (values are reported in Table1).The black dashed line represents the best-fit redshift evolution to all the data without binning,   = 3.51(1 + ) −0.60±0.22kpc, and the shaded region shows the error.The solid black line shows the best-fit line if we fix the power  in this relation to  = −0.20,following the evolution found by Curtis-Lake et al. (2016) at  = 4 − 8. Right: we compare our best-fit evolution, median and log-normal peak measurements as in the left panel to results fromOrmerod et al. (2023) and predictions from C23.We also plot the median and mean values when the sample is split into two mass bins at log 10 (  ★ / ⊙ ) = 9.6.Note the narrower   axis for clarity.

Figure 6 .
Figure6.The Sérsic size-mass relations of our sample.Each row shows a redshift bin (labelled in the top-left of each panel).The left column shows size measurements made in F200W, and the right column shows F356W.The points and errors show individual measurements, and the black solid line shows the best-fit line.The light-gray shaded region represents the intrinsic scatter, and the dark-gray shaded region represents the statistical error.The dashed black line shows predictions for the size-mass relations in each filter and redshift bin from C23.The horizontal dash-dotted line represents the PSF FWHM of each filter, converted to an equivalent radius at each redshift.The horizontal dotted gray line at the bottom of each panel represents the size inferred for a point source by PyAutoGalaxy (see Section 3.2.1).On the F356W results we also show the rest-optical size-mass relations at  = 3 − 5 for massive quiescent galaxies found byIto et al. (2023).

Figure 7 .
Figure 7.The size-luminosity relations in each redshift bin (labelled in the top-left of each panel) in F115W and F444W.Top: results from our Sérsic fitting.The rest-frame absolute magnitudes calculated for each of F115W and F444W (see Section 2.2) are shown on the x-axis, with the equivalent luminosity on the top axis.We also plot the 1 error contours.Bottom: results from our non-parametric size fitting.The plots are the same as the top panel, but we also show predictions from the FLARES model of R22 at  = 5 with the dashed lines.

MF444WMF115WFigure 8 .
Figure8.Results of non-parametric fitting on synthetic galaxies in FLARES(Vĳayan et al. 2021).The top row shows the synthetic galaxy image, and a circular aperture corresponding to the size measured in this noiseless image.We also note the rest-frame absolute magnitude and the filter the object is injected to in the bottom row, where we re-measure the size and plot the corresponding aperture.The modal 5 depth of the PRIMER imaging measured in a 0.3" diameter aperture is 27.4 in F115W and 28.5 in F444W.

Figure 9 .
Figure9.The size-luminosity relations at  = 5, as shown in Fig.7but with a comparison to simulated mock galaxy images.The open circles represent mock galaxies created as in Fig.8, and the filled circles show their size after injection into an empty region of our imaging.We offset the injected sizes in magnitude within 0.2 mag for clarity.The colours of the circles match the filters used.

Figure
Figure A1.Top: the ratio of Sérsic size to non-parametric size in F356W as a function of stellar mass for galaxies with a good Sérsic fit.Individual galaxies are coloured by their photometric redshift.Bottom: now the ratio is plotted as a function of photometric redshift, with galaxies coloured by their stellar mass.In both panels, the black points show the results of binning the size ratio.
in XMM-LSS is combined with deep optical data from the Canada-France-Hawaii-Telescope Legacy Survey (CFHTLS, Cuillandre et al. 2012) and the Hyper Suprime-Cam Subaru Strategic Program Data Release 2 (HSC-SSP DR2, Aihara et al. 2019) to create multi-wavelength photometric catalogues spanning 0.3−2.4m.The XMM-LSS field contains the Ultra-Deep Survey (UDS, Lawrence

Table 1 .
The mean, median and modal (log-normal peak) Sérsic sizes of galaxies in this work in redshift bins at  = 3, 4, 5.
Table1).The black dashed line represents the best-fit redshift evolution to all the data without binning,   = 3.51(1 + ) −0.60±0.22kpc,and the shaded region shows the error.The solid black line shows the best-fit line if we fix the power  in this relation to  = −0.20,following the evolution found by Curtis-Lake et al. (2016) at  = 4 − 8. Right: we compare our best-fit evolution, median and log-normal peak measurements as in the left panel to results fromOrmerod et al. (

Table 2 .
The best-fit values for the size-mass relation given by log 10   =  × log 10 (  ★ / ⊙ ) +  and the intrinsic scatter  log in each redshift bin and for each filter used.

Table 3 .
The best-fit values for the size-luminosity relation givens by log 10   =  ×  rest frame +  in each redshift bin and for each filter used.The top table shows the results from Sérsic size fitting, and the bottom table shows results from the non-parametric sizes.