JADES: Detecting [OIII]$\lambda 4363$ Emitters and Testing Strong Line Calibrations in the High-$z$ Universe with Ultra-deep JWST/NIRSpec Spectroscopy up to $z \sim 9.5$

We present 10 novel [OIII]$\lambda 4363$ auroral line detections up to $z\sim 9.5$ measured from ultra-deep JWST/NIRSpec MSA spectroscopy from the JWST Advanced Deep Extragalactic Survey (JADES). We leverage the deepest spectroscopic observations yet taken with NIRSpec to determine electron temperatures and oxygen abundances using the direct T$_e$ method. We directly compare against a suite of locally calibrated strong-line diagnostics and recent high-$z$ calibrations. We find the calibrations fail to simultaneously match our JADES sample, thus warranting a self-consistent revision of these calibrations for the high-$z$ Universe. We find weak dependence between R2 and O3O2 with metallicity, thus suggesting these line-ratios are ineffective in the high-$z$ Universe as metallicity diagnostics and degeneracy breakers. We find R3 and R23 still correlate with metallicity, but we find tentative flattening of these diagnostics, thus suggesting future difficulties when applying these strong-line ratios as metallicity indicators in the high-$z$ Universe. We also propose and test an alternative diagnostic based on a different combination of R3 and R2 with a higher dynamic range. We find a reasonably good agreement (median offset of 0.002 dex, median absolute offset of 0.13 dex) with the JWST sample at low metallicity. Our sample demonstrates higher ionization/excitation ratios than local galaxies with rest-frame EWs(H$\beta$) $\approx 200 -300$ Angstroms. However, we find the median rest-frame EWs(H$\beta$) of our sample to be $\sim 2\text{x}$ less than the galaxies used for the local calibrations. This EW discrepancy combined with the high ionization of our galaxies does not present a clear description of [OIII]$\lambda 4363$ production in the high-$z$ Universe, thus warranting a much deeper examination into the factors affecting production.

One of these JWST observed rest-frame optical emission lines was [OIII]λ4363.[OIII]λ4363 is a so-called auroral line, which are collisionally excited emission lines originating from higher energy levels compared to the typical nebular lines observed in galaxy spec-tra.Auroral lines are emitted by different ionic species and at different wavelengths (e.g., OIII]λλ1661, 1666, [OIII]λ4363, [OII]λλ7320, 7330, [SII]λ4069, [NII]λ5755, and [SIII]λ6312) (Castellanos et al. 2002;Maiolino & Mannucci 2019).However, [OIII]λ4363 has become the most sought-after due to its strength compared to other auroral lines and its proximity to restframe optical emission lines.If observed, the ratio of [OIII]λ4363 to the stronger, lower energy level lines of [OIII]λλ4959, 5007 acts as an exceptional electron temperature diagnostic.If the electron temperature can be determined then gas-phase ionic abundances, i.e., metallicities, can be derived directly from the strengths of common emission lines.This method of determining electron temperatures/metallicities is known as the "direct T e method" (T e ) due to the direct comparison of energy levels of a single species.The main disadvantage of employing T e is the intrinsic faintness of [OIII]λ4363, which can be 10-100 times fainter than the neighboring oxygen and Balmer lines (Maiolino & Mannucci 2019).As such, observations of [OIII]λ4363 have been restricted predominately to low-z, low metallicity individual galaxies or to stacked spectra of several hundreds of galaxies (e.g., Izotov et al. 2006;Hirschauer et al. 2016;Curti et al. 2017;Hsyu et al. 2017;Izotov et al. 2021a;Aver et al. 2022;Laseter et al. 2022), with sparse detections at z ≥ 1 (e.g., Christensen et al. 2012;Maseda et al. 2014;Patrício et al. 2018), thus limiting our measurements of galaxy metallicities in the high-z Universe.
Measuring gas-phase metallicities is vital: Metallicity is sensitive to many physical processes driving the baryon cycle in galaxies as it is the result of the complex interplay between gas flows, star formation, and ISM enrichment (Matteucci 2012;Maiolino & Mannucci 2019).Massive effort has been committed to modeling the chemical evolution of galaxies and their surroundings to provide information into the relative importance of such processes.However, such models require tight observational constraints, which can be established by investigating the metallicity over cosmic time.At z = 0, there is a well constrained relationship between stellar masses and metallicity known as the mass-metallicity relation (MZR) (Tremonti et al. 2004;Kewley & Ellison 2008;Mannucci et al. 2010).Evolution in the MZR has been shown to exist up to z ∼ 3 in the sense that galaxies at higher z have lower metallicity at a given stellar mass.However, statistical studies of the MZR based on large samples of galaxies do not typically determine metallicities by the direct T e method due to the difficulties in detecting [OIII]λ4363, especially at higher z and in higher metallicity galaxies.Most studies derive metallicities through strong-line diagnostics.
Strong-line calibrations typically exploit optical nebular lines (e.g., [OIII]λ5007, [NII]λ6584, [SII]λ6717, Hβ, etc.) that are calibrated against metallicities derived through the direct T e method (e.g., Curti et al. 2017Curti et al. , 2020;;Bian et al. 2018;Sanders et al. 2021;Nakajima et al. 2022), with photoionization models (e.g., Pérez-Montero 2014;Dopita et al. 2016), or a hybrid combination of the two (e.g., Pettini & Pagel 2004;Tremonti et al. 2004;Maiolino et al. 2008).However, it has been shown that even for the same galaxy population different calibrations can disagree by up to 0.6 dex (Kewley & Ellison 2008).Curti et al. (2017Curti et al. ( , 2020) ) improved calibrations by stacking Sloan Digital Sky Survey (SDSS) galaxies to provide a full empirical calibration for a suite of optical nebular emission lines.However, the properties of the high z universe differ from the local universe, so it is highly uncertain whether locally calibrated strong line diagnostics are appropriate to use in the early Universe.
The pivotal change in this predicament is the observational ability of JWST combined with the near-infrared spectrograph NIRSpec (Böker et al. 2022;Jakobsen et al. 2022;Ferruit et al. 2022;Böker et al. 2023).NIRSpec has opened the capability of obtaining multiobject spectroscopy in the near-IR from space with unmatched sensitivity compared to any current or past facility.JWST/NIRSpec has already observed a number of [OIII]λ4363 emitters (e.g., Schaerer et al. 2022;Taylor et al. 2022;Curti et al. 2023a;Trump et al. 2023;Rhoads et al. 2023), though all these previous works were based on observations from Early Release Observations (ERO) data obtained by targeting galaxies lensed by the cluster SMACS J0723.3-7327(Repp & Ebeling 2018)  However, all of these observations were obtained with relatively shallow spectroscopy.For example, the CEERS observations across 6 pointings totaled ∼ 5 hours of integration (Finkelstein et al. 2022) and the ERO observations across 2 pointings totaled ∼ 5 hours of integration (Carnall et al. 2023).Here we utilize deep spectroscopic data taken from the JWST Advanced Deep Extragalactic Survey (JADES), the deepest spectroscopic observations yet taken with NIRSpec, to provide a more detailed look at [OIII]λ4363 detections and assess locally derived strong line calibrations up to z ∼ 9.5.These NIRSpec/JADES observations obtained exposure times of up to 28 hours in the PRISM/CLEAR (R∼ 100) and up to 7 hours in each of the 3 medium resolution gratings (R∼ 1000) and the G395H/F290L high resolution grating (R∼ 2700), providing unprecedented new insights into chemical evolution and ISM properties of galaxies within the first Gyr of the Universe's history.
The structure of this paper is as follows: In Section 2 we describe the JADES observations, data reduction and emission line flux measurements; in Section 3 we present our [OIII]λ4363 detections; in Section 4 we compare our direct metallicity measurements to strong line calibrations calibrations; in Section 5 we discuss our findings; and finally in Section 6 we present our conclusions.For this work we adopt the Planck Collaboration et al. ( 2020) cosmology: H 0 = 67.36km/s/Mpc, Ω m = 0.3153, Ω λ = 0.6847.

Observations
The data presented in this paper were obtained via multi-object spectroscopic observations from JWST /NIRSpec using the micro-shutter assembly (MSA).Observations were carried out in three visits between Oct 21-25, 2022 (Program ID: 1210; PI: N. Luetzgendorf) in the Great Observatories Origins Deep Survey South (GOODS-S) legacy field as part of JADES.Each visit consisted of 33,613 s integration in the PRISM/CLEAR low-resolution setting and 8,403 s integration in each of G140M/F070LP, G235M/F170LP, G395M/F290LP, and G395H/F290LP filter/grating settings.Across three visits, this totals 28 hours of integration in the PRISM, which provides continuous spectral coverage from 0.6 -5.3 µm at R ∼ 30 − 300, and ∼ 7 hours in each of the medium resolution gratings, which combine to provide R ∼ 700 − 1300 across the full spectral range of NIRSpec, plus 7 hours in the highresolution grating which provides R ∼ 2700 from ∼2.8 -5.1 µm , though the exact wavelength coverage depends on the target location in the MSA.Observations within each visit were performed as a 3-shutter nod.The central pointing of each visit was dithered (by < 1 arcsec) such that common targets were observed in different shutters and different detector realestate.Thus, each visit had a unique MSA configuration, although target allocation (performed with the eMPT 1 ; Bonaventura et al. ( 2023)) was optimised for maximising target commonality between all three dither positions.
A total of 253 unique targets were observed in the PRISM configuration with the three dithers featuring 145, 155, and 149 targets respectively.All targets are observed with non-overlapping spectra in the PRISM mode.However, in the medium and high resolution gratings, individual spectra are dispersed over a larger number of detector pixels, and thus there is a possibility of spectral overlap.To minimize contamination overlap, we isolate our highest priority targets by closing the shutters of low-priority targets on the same row (i.e.targets that would cause overlapping spectra) during observations.Thus, for our grating spectra we observe 198 unique targets (119, 121, and 111 in each dither).

Data Processing
The JWST/NIRSpec observations have been processed by adopting algorithms developed by the ESA NIRSpec Science Operations Team (SOT) and the NIR-Spec GTO Team, and the details of the data-processing workflow will be presented a the forthcoming NIRSpec 1 https://github.com/esdc-esac-esa-int/eMPTv1 GTO collaboration paper.Once we retrieved the level-1a data from the MAST archive, we estimated the count rate per pixel by using the unsaturated groups in the ramp and removing jumps due to cosmic rays identified by estimating the slope of the individual ramps.During this first stage, we also performed the master bias and dark subtraction, corrected snowball artifacts, and flagged saturated pixels.
We then performed the pixel-by-pixel background subtraction by combining the three nod exposures of each pointing.We note that for some targets we excluded one of the 3-shutter nods in the background subtraction stage as a serendipitous source contaminated the open shutters.We then created 2D dimensional (2D) cutouts of each 3-shutter slit and performed the flat-field, spectrograph optics, and dispersers corrections.Then we run the absolute calibration stage and corrected the 2D spectra for the path-losses depending on the relative position of the source within its shutter.We computed and applied the path-losses correction for a point-like source as the size of our targets are smaller or comparable to the spatial angular resolution of the telescope at the redshifted wavelength of the optical nebular lines at z > 7.
We rectified and interpolated the 2D continuum map onto a regular grid for all medium/high-resolution gratings and an irregular grid for the PRISM/CLEAR to avoid an oversampling of the line spread function at short wavelengths.Finally, the 1D spectra were extracted from the 2D map adopting a box-car aperture as large as the shutter size and centered on the relative position of the target in the shutter.For each target, we combined all 1D spectra and removed bad pixels by adopting a sigma-clipping approach.

PPXF
Emission-line measurements and continuum modelling are made simultaneously using the penalised pixel fitting algorithm, ppxf (Cappellari 2017(Cappellari , 2022)).ppxf models the continuum as a linear superposition of simple stellar-population (SSP) spectra, using non-negative weights and matching the spectral resolution of the observed spectrum.As input, we used the high-resolution (R=10,000) SSP library combining MIST isochrones (Choi et al. 2016) and the C3K theoretical atmospheres (Conroy et al. 2018).The flux blue-ward of the Lyman break was manually set to 0. These templates are complemented by a 5 th -degree multiplicative Legendre polynomial, to take into account systematic differences between the SSPs and the data (e.g., dust, mismatch between the SSP models and high-redshift stellar populations, and residual flux calibration problems).The emission lines are modelled as pixel-integrated Gaussians, again matching the observed spectral resolution.To reduce the number of degrees of freedom, we divide all emission lines in four kinematic groups, constrained to have the same redshift and intrinsic broadening.These are UV lines (blueward of 3000 Å), the Balmer series of Hydrogen, non-Hydrogen optical lines (blueward of 9000 Å), and NIR lines.The stellar component has the same kinematics as the Balmer lines.Furthermore, we tie together doublets that have fixed ratios, and constrain variable-ratio doublets to their physical ranges.In particular, we fit for the following lines of interest: [OII] λ4363 detection in our sample with a S/N = 9.8.However, [OIII]λλ4959, 5007 fell within the detector gap for this object, so we instead use the [OIII]λλ4959, 5007 fluxes from our PRISM observations.We correct for reddening in our measurements from the available Balmer lines adopting a Calzetti et al. (2000) attenuation curve.We assume the theoretical ratios of Hα/Hβ = 2.86 and Hβ/Hγ = 2.13 from Case B recombination at T= 1.5 × 10 4 K.We default to correcting with respect to Hα/Hβ, but we use Hβ/Hγ when Hα is not available.
We can now determine electron temperatures and oxygen abundances through T e .However, we note it is customary to take oxygen abundances as representative of the total gas-phase metallicity, which has implicit assumptions that all other chemical elements scale proportionally and that individual galaxies are a single HII region comprised of a high-ionization zone traced by O ++ and a low-ionization zone traced by O + , which ignores the underlying temperature distribution and ionization structure.A detailed discussion of the nuance of these assumptions is outside the scope of this work (see Stasińska (2002) and Maiolino & Mannucci (2019) for a review), but there is novel work testing the significance of these assumptions (e.g., Cameron et al. 2022) that we are expanding upon.
Nonetheless, we derive the electron temperature for O ++ (t 3 ) by taking flux ratio of the [OIII]λλ4959, 5007 doublet to the [OIII]λ4363 thermal line.We used Pyneb (Luridiana et al. 2015) with O2+ and O+ collision strengths from Aggarwal & Keenan (1999) & Palay et al. (2012) and Pradhan et al. (2006) & Tayal (2007), respectively.A more problematic step is determining the electron temperature for O + (t 2 ).Only t 3 is derived directly here as we do not have spectral coverage of [OII] auroral lines at 7320 Å and 7330 Å.In situations where [OII] auroral lines are not detected, it is common to interconvert between t3 and t2 using modeled relations.One such t 3 -t 2 relation is presented by Curti et al. (2017) (originally presented in Pilyugin et al. (2009)), in which they relate directly derived t 3 and t 2 temperatures to obtain the relation: t 2 = 0.264 + t 3 × (0.835).
However, t 3 -t 2 relations have not been explored in the high-z Universe.Yates et al. (2020) found local t 3 -t 2 relations have difficultly in matching large samples of local galaxies with T e derived metallicities.Fortunately, there is typically little change in the total derived metallicity when adding O+ to O2+ as O2+ dominates the ionization state of oxygen in galaxies with direct [OIII]λ4363 detections (Izotov et al. 2006;Andrews & Martini 2013;Curti et al. 2017Curti et al. , 2020;;Laseter et al. 2022;Curti et al. 2023a).Nonetheless, there is a clear need for future investigation of t 3 -t 2 relations in the high-z Universe.
We determine ionic oxygen abundances using Pyneb with the same collision strengths as before.We assume an electron density of N e = 300cm −3 since this is representative of the ISM electron density of z ∼ 2-3 galaxies (Sanders et al. 2016a,b).The choice of electron density does not significantly affect the temperature results.For example, when assuming N e = 1, 000cm −3 , there is ∼ 0.1% change in the derived t 3 (Izotov et al. 2006).We determine the total oxygen abundance for each galaxy by taking ( O H = O+ H + O2+ H ). We do not detect any HeII λ4686 in our sample, so we do not apply an ionization correction factor to account for O3+ since HeII has an ionization potential of 54.4 eV and O3+ has an ionization potential of 55 eV.Even if O3+ is present, a correction would have nominal change for the total oxygen abundance (Izotov et al. 2006;Berg et al. 2021;Curti et al. 2023a).
To calculate the uncertainties of our measurements we use a Monte Carlo technique.We evaluate the electron temperature and oxygen abundance 10,000 times using values drawn randomly from normal distributions for the measured fluxes of [OIII]λλλ5007,4959,4363,Hβ,Hγ,and [OII]λλ3727, 3729, centered at the measured flux values, and with standard deviations corresponding to the 1σ flux errors from PPXF.Our final reported electron temperatures and metallicities are taken as the median value of the propagated normal distributions with the standard deviation of the distributions being the 1σ error.
In addition to our [OIII]λ4363 emitters, Curti et al. (2023a) measured the chemical abundances of three z ∼ 8 galaxies behind the galaxy cluster SMACS J0723.3-7327 during the initial ERO data release.A number of studies investigated the same objects (e.g., Schaerer et al. 2022;Taylor et al. 2022;Rhoads et al. 2023;Trump et al. 2023).However, Curti et al. (2023a) reprocessed the data through the NIRSPec GTO pipeline.We include these three galaxies (ID: 4590, 6355, and 10612) after reprocessing the initial data from Curti et al. (2023a)  with the updated NIRSpec GTO pipeline (Carniani et al., in preparation) and determining oxygen abundances as described above.We find nominal changes in the total metallicities: 0.24 dex for 4590, −0.1 dex for 6355, and 0.04 dex for 10612.For our combined sample we report the line fluxes in Table 1 and electron temperatures/metallicites in Table 2.
Recently, Bunker et al. (2023) provided the first JWST/NIRSpec spectrum of GN-z11 (Oesch et al. 2016) from the JADES collaboration.Bunker et al. (2023) reports a detection of [OIII]λ4363, but there was insufficient wavelength coverage to observe [OIII]λλ4959, 5007, thus we cannot use the T e method.The proceeding analysis and subsequent discussion in Sections 4 and 5 require a self-consistent metallicity prescription.Therefore, we do not include GN-z11 in our sample, but we highlight the detection of [OIII]λ4363 in the most luminous Lyman break galaxy at z > 10 for context in our discussion in Section 5.

Comparison
Recently, Sanders et al. (2023) identified [OIII]λ4363 in 16 galaxies between z ≈ 2.0 − 9.0, measured from JWST/NIRSpec observations obtained as part of the Cosmic Evolution Early Release Science (CEERS) survey program.They further consolidated 9 objects with [OIII]λ4363 detections between z ≈ 4 − 9 from the literature using JWST/NIRSpec along with 21 galaxies between z ≈ 1.4 − 3.7 with detections from ground-based spectroscopy.Sanders et al. (2023) determined metallicities with T e through PyNeb (Luridiana et al. 2015) for their entire sample to construct empirical T e -based metallicity calibrations for strong-line ratios such as R2, O3O2, R3, and R23 in the high-z Universe, which we investigate in Section 4. As such, we include the 16 discovered galaxies with [OIII] 2023a), with overall conclusions being that analyses and interpretations should avoid absolute flux calibrations and using widely separated line ratios (Trump et al. 2023).Recently, Maseda et al. (2023) provided deeper insight into these discrepancies.However, GTO flux calibrations have improved since these studies, though a full description will be presented in Bunker et al. (in preparation).A full comparison between the current strengths and weaknesses of the pipelines are outside the scope of this work, but for the current comparison between our JADES sample and Sanders et al. (2023), systematics could exacerbate or diminish offsets between metallicity determinations and strong-line ratios.

Metallicity Prescription Choice
In addition to systematics introduced through data reduction and the choice in collisional strengths, the decision to use a given metallicity prescription will introduce systematics, amongst other choices (e.g., the t 3 -t 2 relation).We demonstrate these systematics by re-deriving  2006) prescription.We use the atomic data listed in Stasińska (2005) to determine t 3 in an iterative manner (Izotov et al. (2006) equations 1 and 2).We derive t 2 using equation 14 from Izotov et al. (2006), which was obtained by relating t 3 to temperatures of other ions from photoionization models that best fit HII emission line observations (Izotov et al. 2006).We present in Figure 3 the systematic offsets between Izotov et al. (2006) and PyNeb derived metallicities for our sample.We find a median offset of ∆12 + log(O/H) of −0.11 dex when using PyNeb instead of Izotov et al. (2006).A critical assessment of the advantages and limitations of T e metallicity prescriptions is outside the scope of this work.However, it is clear that choice does matter, thus demonstrating the need for self-consistency in metallicity studies and comparisons as [OIII]λ4363 samples in the high-z Universe continue to grow.We continue with the analysis using metallicities derived with Pyneb.We include in Appendix A the Figures presented in Section 4 for Izotov et al. (2006) derived metallicities.Nonetheless, the main results discussed in Section 5 remain unchanged irregardless of the T e method employed.

Comparison to Locally Derived Strong Line Calibrations
As mentioned in Section 1, there are a number of strong nebular emission-line ratios calibrated against T e derived metallicities to act as metallicity diagnostics (e.g., Pettini & Pagel 2004;Maiolino et al. 2008;Marino et al. 2013;Pilyugin & Grebel 2016;Curti et al. 2017; Schaefer et al. 2020Schaefer et al. , 2022)).Another major uncertainty is the applicability of these strong line calibrations for high-z galaxies.An evolution in the ISM conditions of high-redshift galaxies compared to the local Universe might impact the intrinsic dependence of strong-line ratios on gas-phase metallicity, potentially hampering their use as abundance diagnostics at high redshift, and thus biasing the assessment and interpretation of the chemical evolution history of galaxies.Already, Cameron et al. (2023b), using the same parent data set as the current work, found z ∼ 5.5 − 9.5 galaxy emission line ratios are generally consistent with galaxies with extremely high ionization parameters (log(U) = -1.5)and are traced by the extreme ends of z ∼ 0 ionization-excitation diagrams of R23-O3O2 and R23-Ne3O2.In addition, Cameron et al. (2023b) found more than an order of magnitude of scatter in line ratios such as [OII]λλ3727, 3729/Hβ and [OIII]λ5007/[OII]λλ3727, 3729 while simultaneously not observing any [NII]λ6583, indicating significant diversity in metallicity and ionization within the ISM conditions of the sample.To complicate the landscape, recent JADES/NIRSpec observations of the GN-z11, which is also an [OIII]λ4363 emitter, revealed rarelyseen NIV]λ1486 and NIII]λ1748 lines that may imply an unusually high N/O abundance (Bunker et al. 2023;Cameron et al. 2023a;Senchyna et al. 2023).
Here, we utilize the T e derived abundances and emission line ratios delivered by the 'Deep' spectroscopic tier of JADES to provide a more detailed look at strong line calibrations in the high-z Universe.We include the aforementioned ERO objects from Curti et al. (2023a) 2023) sample plotted against metallicity and an array of locallyderived strong-line calibrations.Specifically, we include Maiolino et al. (2008), Curti et al. (2017Curti et al. ( , 2020)), Bian et al. (2018), andNakajima et al. (2022).In brief, Curti et al. (2017Curti et al. ( , 2020) ) provided calibrations based on T e metallicity measurements derived from SDSS stacked spectra and direct [OIII]λ4363 detections.Maiolino et al. (2008) combined a sample of T e derived low metallicity galaxies from Nagao et al. (2006) with predictions from photoionization models in the highmetallicity regime.Bian et al. (2018) 2 constructed calibrations from a sample of local [OIII]λ4363 emitters selected to match the location of z ∼ 2 star-forming sources in the [NII]-BPT diagram (Kewley et al. 2013).Finally, Nakajima et al. (2022) extended the Curti et al. (2017Curti et al. ( , 2020) SDSS stacks to the extremely metal poor regime by including XMPGs identified from the EM-PRESS survey (Kojima et al. 2020).Nakajima et al. (2022) further subdivided their calibrations characterized by high and low EW(Hβ) (i.e.EW(Hβ) > 200 Å and < 100 Å, respectively).Overall, the metallicity range for these calibrations differ, but we extrapolate each calibration over 6.9 ≤ 12 + log(O/H ≤ 9.0).We indicate calibrated ranges as reported in the original papers as solid lines, whereas extrapolations as dotted lines in Figures 4 -7.We stress that extrapolating calibrations past their defined range can lead to nonphysical behaviours; however, we are extrapolating to examine the limitations of the calibrations.
We determine the significance of deviation (in units of σ) for our JADES+ERO+CEERS sample to the pre- dictions of each of the strong-line calibrations presented in Figures 4 -7.We determine the total deviation of our sample from the calibrations through a Monte Carlo technique.We evaluate the difference between our data points and the calibration values 10,000 times using values drawn randomly from normal distributions for the measured line ratios, metallicities, and calibrations.We include the line uncertainties, metallicity uncertainties, and the intrinsic dispersion of the calibrations (σ cal ) as the standard deviation for the respective distributions3 .We present in Table 3 the total deviation between our sample and the respective calibrations.However, the sensitivity to metallicity varies over metallicity space for each strong-line diagnostic.For example, R23 has a weak dependence on metallicity at the turnaround point between 8.0 12 + log(O/H) 8.5, but a stronger dependence at lower metallicity (12+log(O/H) 7.65).A primary concern for studies investigating the MZR is its slope, which is dependent upon how well the metallicities of galaxies, especially at the lower-mass end (lower metallicity), are determined.We therefore investigate how well each calibration does in predicting the T e de-Deviation of Local Calibrations from our JADES Sample (in units of σ) rived oxygen abundances for each galaxy in our sample.
We determine the offset between derived oxygen abundances by performing the same MC technique as above and then taking 12+log(O/H) Te -12+log(O/H) cal .We present at the bottom of Figures 4 -7 the offset to each respective calibration for our individual galaxies.The vertical lines represent the strong-line calibration failing for that object due to the calibration never reaching the measured line ratio at the given relation.

R2
There is approximately an order of magnitude scatter in the R2 ratio from Figure 4, suggesting there is notable diversity in the ISM conditions of our sample since R2 is highly dependent on the ionization parameter and hardness of ionizing spectrum.In comparison, we find a median R2 value of −0.38 with a standard deviation of 0.41 while Cameron et al. (2023b), using the same parent sample of this work but with selection criteria of 5.5 ≤ z spec ≤ 9.5 and S/N of Hβ ≥ 5, found a median R2 value of −0.28 with a standard deviation of 0.38.We find the high-EW R2 calibration from Nakajima et al. ( 2022) has the smallest significance of deviation to our sample with a 1.09σ deviation, though there are metallicity offsets over ∼ −0.5 dex and 11 of our objects cannot be accounted for.
R2 is rarely used in isolation, but is often employed to break degeneracies of other calibrations.However, for the high-z Universe we clearly see there is significant scatter, thus suggesting the use of R2 as a degeneracy breaker in the high-z Universe is problematic.We perform a Spearman correlation test on our JADES sample and find ρ s = 0.58 with a p-value of 0.001, thus demonstrating a monotonic relationship with a low probability of an uncorrelated system reproducing the distribution.However, we see see similar R2 values across ∼ 1 dex in metallicity.This insensitivity of R2 ratios to metallicity is possibly due to the ionization parameter-metallicity relation at these epochs, i.e., the ionization parametermetallicity relation is not constant or has other dependencies (e.g., Reddy et al. 2023).Overall, our sample demonstrates R2 is a poor metallicity diagnostic in the high-z Universe, but the diversity in R2 values of our sample warrants a deeper investigation that is currently outside the scope of this paper.

O3O2
O3O2 also acts as a degeneracy breaker for other strong-line calibrations (Maiolino & Mannucci 2019) as it primarily traces the ionization parameter with the metallicity dependence being secondary due to the ionization parameter-metallicity relation.We find a median O3O2 value of 1.08 with a standard deviation of 0.36, while Cameron et al. (2023b) found a median O3O2 value of 1.03 with a standard deviation of 0.36.Nearly our entire sample exhibits high O3O2 values with the smallest deviation calibrations (0.96σ) from Bian et al. (2018) and the high-EW O3O2 calibration from Nakajima et al. ( 2023) still failing to account for 22 of our galaxies and producing metallicity offsets ∼ 0.6 dex.
We find a Spearman correlation of ρ s = −0.44 with a p-value of 0.02, thus demonstrating a correlation, albeit weak.However, we find similar O3O2 values across ∼ 1 dex in metallicity similar to R2.Therefore, although our sample is small, this finding suggests that O3O2 is neither a good O/H diagnostic nor an appropriate degeneracy breaker for other strong-line diagnostics in the high-z Universe.A more detailed picture of O3O2 was presented by Cameron et al. (2023b), in which they compared O3O2 against R23 (their Figure 5), which is ultimately comparing tracers of ionization parameter and total excitation, respectively.Cameron et al.  (2023b) found the JADES sample to exhibit much higher O3O2 values at a given R23 value compared to z ∼ 2 MOSDEF galaxies, which already traced the extremes of SDSS z ∼ 0 populations.Cameron et al. (2023b) concluded that galaxies across the sample exhibit very high ionization parameters.This high ionization is reflected in Figures 5 and Table 3 as the majority of calibrations fail to return a 12 + log(O/H) value at O3O2 ratios we measure.An explanation for this high ionization would be simple if our sample had lower O/H values since that would suggest the ionization-metallicity relation is constant.However, ionization is generally higher at fixed metallicity in our sample, thus suggesting a physicallydriven change, though a full characterization will be explored in forthcoming work.In contrast to R2 and O3O2, we see little scatter in our sample for R3.We find a median R3 value of 0.81 with a standard deviation of 0.12.Cameron et al. (2023b) also found a median R3 value of 0.74 with a standard deviation of 0.86.We find the calibration from Bian et al. (2018) has the smallest significance of deviation for our sample with a 0.50σ deviation, though four of our galaxies cannot be predicted by the calibration, metallicity offsets are up to ∼ −0.6 dex, and we are ultimately comparing against the extrapolation.Nonetheless, the R3 calibration from Bian et al. (2018) best traces our sample out of the local calibrations.
We find a Spearman correlation of ρ s = 0.62 with a pvalue of 0.0004, thus demonstrating there is still a strong relationship between R3 and metallicity.However, R3 has a characteristic turnover locally, which requires identifying which of the two branches applies.Interestingly, we see an apparent flattening of our sample across the double-valued R3 sequence.R3 is similar to R2 in that it is highly degenerate with the ionization parameter, the hardness of the ionizing spectrum, and the relation between metallicity and ionization parameter (Kewley & Ellison 2008;Maiolino & Mannucci 2019).As such, the flattening of our objects across the double-valued sequence, in addition to the large scatter in R2 and O3O2, suggests significant ionization across ∼ 1 dex in metallicity in our sample.Without probing higher metallicities it is difficult to conclude whether the characteristic turnover is present in the high-z Universe.If R3 is confirmed to have a minimal turnover then R3 as a metallicity diagnostic is not viable in the high-z Universe.Overall, forthcoming work will investigate whether R3 turns over and the origins of the excess R3 values.

R23
R23 is the most widely used strong-line calibration in determining metallicity because, unlike R2 and R3, R23 is an indication of the total excitation of a galaxy as it combines the different ionization states of oxygen.There is still a high dependence on the ionization parameter, however, along with a double branching that requires employing other strong-line diagnostics, such as R2 or O3O2, to break the degeneracy.R23 has already been employed in the high-z Universe (e.g., Nakajima et al. 2023); however, we find moderate deviation from our sample for the R23 calibrations.Specifically, we find the calibration from Bian et al. (2018) to have the smallest significance of deviation to our sample with a 0.58σ deviation, though as can be seen in Figure 7, the majority of our points do not fall within the calibrated range of Bian et al. (2018), 24 of our galaxies cannot be predicted, and metallicity offsets up ∼ 0.5 dex exist.From Figure 7, however, we see visually the large EW sample from Nakajima et al. (2022) best traces the upper envelope of our objects for a calibrated range, though metallicity offsets range between ∼ −0.5 and 0.5 dex.We find a median R23 value of 0.97 and a standard deviation of 0.13.Cameron et al. (2023b) also found a median R23 value of 0.90 with a standard deviation of 0.10.Overall, the R23 ratios of our JADES sample suggests significant excitation across ∼ 1dex in metallicity than what is typically seen in local galaxies.
It is clear that a self-consistent calibration of R23 is needed for the high-z Universe, but it is difficult to conclude whether R23 is appropriate for the high-z Universe.We find a Spearman correlation of ρ s = 0.68 with a p-value of 4.4 × 10 −5 , which indicates there is a strong correlation of R23 with metallicity.However, similar to our R3 ratios, we cannot determine whether R23 turns over or not.We cannot probe past the low-z turnover point (8.0 12 + log(O/H) 8.5) with our limited sample, but visually and with the Spearmen Rank correlation/p-value, the metallicity dependency of R23 is possibly inadequate for a high-z metallicity indicator, especially if this trend continues past the low-z turnover point.A stacking procedure, similar to Curti et al. (2017Curti et al. ( , 2020)), is necessary to probe past the low-z turnover point.

Comparison to High-z Calibration
In addition to a high-z [OIII]λ4363 sample, Sanders et al. (2023) provided the first high-z strong-line calibrations.Accordingly, we compare their calibrations for R2, O3O2, R3, and R23 to our sample in Figures 4  -7.We determine the significance of deviation as described in Section 4.1 for each calibration from Sanders et al. (2023).We find our sample to be 1.24σ, 1.17σ, 0.77σ 0.81σ away for R2, O3O2, R3, and R23, respectively.The R3 and R23 calibrations from Sanders et al. (2023) do visually trace the upper envelope of our sample where other local calibrations underestimate.However, at 12 + log(O/H) 8.0 the extrapolation of the calibration from Bian et al. (2018) predicts higher R3 ratios at a given metallicity than Sanders et al. (2023), thus leading to the higher deviation reported for the Sanders et al. (2023) calibration.For R2 and O3O2, the deviations reported for the Sanders et al. (2023) calibration are due to the significant scatter in our sample.
We note here and demonstrate in the Appendix A that there would be a systematic offset introduced when comparing a calibration and a sample with different metallicity prescriptions(e.g., Pyneb and Izotov et al. (2006), thus emphasizing the importance of self-consistency before systematics T e choice are better constrained.Nonetheless, the high-z calibration from Sanders et al. (2023) visually traces our sample well in the strong-lines investigated in the current work, but as discussed in Section 5, larger [OIII]λ4363 samples are clearly needed for future high-z Universe strong-line calibrations.

Photoionization Models
A common alternative to determining metallicities through the T e method or strong-line calibrations is the use of photoionization models due to the range of properties that can be explored (e.g., Tremonti et al. 2004;Pérez-Montero 2014;Dopita et al. 2016;Vale Asari et al. 2016).However, this approach is currently limited as it is difficult to capture the complexity of HII regions and a number of assumptions are employed (e.g., planeparallel atmospheres, the ionizing spectrum, and dust depletion) (Maiolino & Mannucci 2019).This area has improved with certain frameworks introducing Bayesian approaches where multiple emission lines are used to identify the best corresponding model returned from a grid (e.g., PyNeb (Luridiana et al. 2015), CLOUDY (Ferland et al. 2013), etc.) while minimizing assumptions.One such code is HII-CHI-Mistry from Pérez-Montero (2014).
Pérez-Montero (2014) used the synthesis spectral code CLOUDY v13.03 (Ferland et al. 2013) using POPSTAR (Mollá et al. 2009) stellar evolutionary models assuming an instantaneous burst with an age of 1 Myr with an initial mass function from Chabrier (2003).They range the ionization parameter between −1.50 ≤ log(U) ≤ −4.00 in steps of 0.25 dex, the oxygen abundance between 7.1 ≤ 12 + log(O/H) ≤ 9.1 in steps of 0.1 dex, and consider variations in the N/O ratio between 0.0 ≤ N/O ≤ −2.0 in steps of 0.125 dex, thus totaling 3927 models.It would be excessive to compare all the models, so we compare against the full metallicity range for N/O values of -2.0 (purple), -1.0 (green), and 0.0 (red) and log(U) values of -1.5 (dashed), -2.5 (solid), and -3.5 (dotted).We present in Figure 8 our JADES sample and the grid models returned from Pérez-Montero (2014).Our JADES sample is best traced by the log(U) = −1.5 models, though our most metal-poor galaxies require a higher ionization parameter while our least metalpoor galaxies fall close to log(U) = −2.5 models.The N/O models are indistinguishable as the values converge for our sample range.As such, it is still unclear whether we are dealing with extremely nitrogen poor systems.Nitrogen enrichment could be moderate yet exist in higher ionization states that we are unable to probe with [NII].As mentioned, Cameron et al. (2023b) found no detections of nitrogen even with 7 hour deep G395M/F290LP spectra, indicating future difficulty in examining N/O abundance ratios in metal-poor galaxies.Yet, GN-z11 revealed rarely-seen NIV]λ1486 and NIII]λ1748 lines (Bunker et al. 2023), with subsequent explanations implying unusually high N/O abundance (Cameron et al. 2023a;Senchyna et al. 2023).N/O trends at high-z are outside the scope of the current work, but our JADES sample demonstrates the importance constraining N/O trends in the high-z Universe and how nitrogen is handled in photoionization models.

A new projection in the R2-R3-O/H space
The set of calibrations presented by Sanders et al. (2023) (in particular those related to the R3 and R23 diagnostics) are starting to provide a more accurate representation of the distribution of galaxies with direct metallicities in the high-z Universe.Nonetheless, the calibration curves are still poorly sampled at both the low-and high-metallicity end, with the majority of galaxies with T e measurements distributed within the 7.6 < 12 + log(O/H) < 8.2 abundance range, close to the plateau of the calibrations.Moreover, given the relatively high-excitation properties of these sources (which boosts R3 and R23 at fixed O/H), the slope of the calibration curves appears to flatten further compared to most of the low-z calibrations, the plateau is hence wider, and the dynamic range in which these line ratios are sensitive to a variation in metallicity is reduced: this means that, for instance, at a value of R3 = 0.8 (above which more than 50 per cent of the currently available calibration sample resides) the 'gap' between the low-and high-metallicity solutions of the calibration is ∼ 0.6 dex.
Here, we attempt to provide a novel calibration based on a similar sample as described in Nakajima et al. (2022), but that however involves a different projection in the space defined by log([OII]λ3727, 29/Hβ), log([OIII]λ5007/Hβ), and metallicity.More specifically, such new diagnostic, which we here label as R, is defined as R = 0.47 R2+0.88 R3.As described more in detail in Appendix B, such linear combination corresponds to a rotation of 61.82 degrees around the O/H-axis in the R2-R3-O/H space, a projection that minimizes the scatter of our calibration sample in R at fixed metallicity over the full O/H range spanned by the galaxy calibration sample.We fit a fourth order polynomial to the R vs O/H relation as shown in Figure 9, with the best-fit coefficients that are provided in Appendix B. Compared to R23, this diagnostic has a wider dynamic range in its low-metallicity branch, spanning an interval of values between −0.2 and 0.8 between 7.0 <12+log(O/H)< 8.0, and shows a narrower turnover and plateau region.
We compare our observed JWST sample with the R diagnostic in Figure 9.We find a reasonably good agreement between R-predicted and observed metallicities for the high-z sample, with no systematic offset above or below the calibration curve: the points scatter around the best-fit relation with a median offset in R of 0.002 dex at fixed O/H, a median absolute deviation of 0.13 dex, a dispersion of 0.19 dex, and a significance of 1.00σ 4 given an intrinsic dispersion of the calibration of 0.058 dex. 4 The dispersion of the R calibration is lower than all local calibrations.A lower intrinsic dispersion can increase the significance of deviation since the calibration varies less compared to a calibration with higher dispersion that is able to "roam" closer to more distant points when performing a Monte Carlo procedure.

Strong-line Diagnostics
From Figures 4 -7 and Table 3, we see clear discrepancies between locally-derived strong-line calibrations and our JADES sample.We find that a single calibration cannot simultaneously account for all galaxies across all diagnostics.
The largest discrepancies between local-calibrations and our JADES sample are for the R2 and O3O2 diagnostics, which is most likely caused by R2 and O3O2 being insensitive to metallicity at these redshifts, i.e., R2 and O3O2 are not appropriate metallicity indicators or degeneracy breakers for the high-z Universe.Recently, Reddy et al. (2023) concluded that electron gas density potentially has a larger responsibility than metallicity in modulating the ionization parameter in these early epochs.We are potentially observing this result in Figures 4 and 5 where we have consistently high ionization ratios over our metallicity space, but further investigation is needed.
For our sample, R3 and R23 still indicate a dependency on metallicity at these high redshifts.Spearman correlations of ρ s = 0.62 & 0.68 with p-values of 0.0004 & 4.4 × 10 −5 , respectively, further corroborate this finding.However, we do observe flattening of our sample compared with local R3 and R23 calibrations, possibly suggesting future difficulty when applying these diagnostics in the high-z Universe, especially at moderate metallicites.This flattening is potentially a result of an evolution in the ionization parameter-metallicity relation that has a higher dependency on electron densities (Reddy et al. 2023), though a much more detailed analysis on a larger sample size of high-z [OIII]λ4363 emitters and stacked spectra of several hundreds of galaxies to probe to higher metallicities (12 + log(O/H) 8.0 − 8.5) is required to examine the physical origins and establish whether there is a turnover for R3 and R23.
Overall, any local calibration for R2, O3O2, R3, and R23 clearly fails to simultaneously match our sample: There is a clear need for a self-consistent revision of the calibrations in the high-z Universe using JWST, and we caution against the use of locally derived calibrations being applied to high-z Universe.We postpone deriving new R2, O3O2, R3, and R23 calibrations for the high-z Universe as our sample is limited and it is best to remain self-consistent until systematics between spectroscopic reduction pipelines are better characterized.As such, it is essential to continue constructing samples of [OIII]λ4363 in the high-z Universe with JWST.
While [OIII]λ4363 sample sizes increase and calibrations improve, the R projection presented in this paper and the high-z calibrations from Sanders et al. (2023) provide the best match to high-z [OIII]λ4363 derived metallicities.
To determine EW 0 (Hβ) for our JADES objects we interpolate the best-fit continuum to our PRISM data from PPXF over a 60 Å bin around the Hβ line center in our R1000 data, divide the measured flux of Hβ from the R1000 fits by the interpolated best-fit continuum, and then divide by (1 + z).We include EW 0 (Hβ) Å for our objects in Table 2.
Although our JADES sample demonstrates excitation ratios higher than any local R3 and R23 calibration (excluding the extrapolation of Bian et al. (2018)), the high EW 0 calibration (EW 0 (Hβ) > 200 Å) from Nakajima et al. (2022) lies closest to the upper envelope of our sample.However, we find the median EW 0 (Hβ) for our JADES sample to be ∼ 170 Å, with the minimum being ∼ 70 Å and the max being ∼ 550 Å.Interestingly, we find the median EW 0 (Hβ) becomes ∼ 120 Å when excluding galaxies in our sample beneath z = 4.0.As such, there is an apparent decrease in rest-frame EWs(Hβ) of high-z [OIII]λ4363 emitters compared to local metalpoor objects with [OIII]λ4363 detections, even though we find higher ionization/excitation ratios for our sample.
An increase in the luminosity of [OIII]λ4363 in the high-z Universe could account for the EW 0 (Hβ) disparity in that galaxies in earlier epochs have intrinsically brighter [OIII]λ4363 at a fixed EW 0 (Hβ).However, it is difficult to characterize whether there is a physically driven increase in the luminosity of [OIII]λ4363 for our sample due to limited z > 1 [OIII]λ4363 samples, lack of flux calibrations for most studies, and undetermined mass completion limits.Nonetheless, a line luminosity increase is expected due to the FMR.At lower metallic-ities and/or masses we expect an increase in the SFR, and thus luminosity.However, it is debated whether the FMR evolves with redshift, though Curti et al. (2023b), using the same parent data set as the current work, demonstrates galaxies sit preferentially below local FMR predictions with increasing redshift (z 6), such that these galaxies are significantly less enriched at a given SFR and stellar mass.
In general, [OIII]λ4363 would be more luminous with an increase in sSFR and/or a decrease in metallicity.However, we would expect an increase in sSFR to be associated with higher rest-frame EWs(Hβ) relative to local counterparts, but for our JADES objects we find rest-frame EWs(Hβ) lower than local galaxies that have reduced ionization/excitation ratios at similar metallicities compared to our sample.Therefore, we expect the [OIII]λ4363 luminosity of our JADES sample to be driven by lower metallicities, thus reflecting a number of possible processes such as pristine gas accretion (Mannucci et al. 2010) and efficient metal removal from stellar winds that are expected to increase with a top-heavy IMF (Palla et al. 2020).However, as mentioned, Cameron et al. (2023b) found our parent sample exhibits excitation ratios resembling extreme star-formation galaxies, such as blueberries (Yang et al. 2017a) and blue compact dwarf galaxies (Sargent & Searle 1970;Cairós et al. 2010) that are known to have high sSFRs (10 −7 yr −1 sSFR 10 −8 yr −1 ).In addition, Curti et al. (2023b) found our parent sample occupies the same region of the MZR as these extreme star-forming galaxies.
Overall, the picture is opaque.It is peculiar that we are simultaneously observing galaxies with lower restframe EWs(Hβ) and higher excitation values relative to local analogs that have high sSFRs.In addition, a number of possible processes, such as an evolving FMR, variations in metal-cooling due to elemental production time scales (e.g., oxygen being enriched rapidly due to the production from core-collapse supernovae, compared to similar cooling curves from nitrogen and carbon that are enriched by massive stars and type Ia supernovae), or more extreme, poorly understood thermal and density structure variations in the emitting nebulae (Cameron et al. 2022;Reddy et al. 2023), could all affect the luminosity of [OIII]λ4363, metallicity determinations, and ionization/excitation values.In addition, Reddy et al. (2023) proposed that electron density plays a larger role in regulating the ionization parameter, which in return would affect the temperature distribution of HII regions where [OIII]λ4363 originates from.Our sample clearly demonstrates the necessity for a deeper investigation into the production of [OIII]λ4363 in the high-z Universe.

SUMMARY AND CONCLUSIONS
We have identified 10 [OIII]λ4363 detections discovered from ultra-deep JWST/NIRSpec MSA spectroscopy from the JADES DEEP survey, which is only a small fraction of the final JADES spectroscopic dataset .We applied the T e -method to determine gas-phase oxygen abundances to examine how well local strong-line calibrations match a robust high-z [OIII]λ4363 sample.Our main findings are summarised as follows: 1.The local strong-line metallicity calibrations investigated do not provide good simultaneous predictions for the metallicities across our sample as seen in Figures 4 -7.Specific calibrations have smaller deviations for various diagnostics while completely failing for lower metallicity galaxies, thus demonstrating the necessity for a systematic re-calibration of R2, O3O2, R3, and R23 strongline diagnostics in the high-z Universe.We caution against employing locally derived calibrations in the high-z Universe.
2. There is weak correlation between R2 and O3O2 with metallicity.If larger samples with higher metallicity galaxies support this finding then R2 and O3O2 would be inadequate diagnostics for deriving metallicities or breaking degeneracies in the high-z Universe.There is also an order of magnitude scatter at fixed metallicity in our sample for R2 and O3O2 diagnostics, further demonstrating ISM diversity that is potentially diminishing the dependency of R2 and O3O2 with metallicity.R3 and R23 correlate with metallicity, but elevated, comparable line-ratios across ∼ 1 dex in metallicity demonstrates a flattening of the strong-lines with metallicity.If this trend continues past the turnover point between 8.0 12 + log(O/H) 8.5 then R3 and R23 would be problematic to use in the high-z Universe as metallicity would be indistinguishable without a substantial degeneracy breaker.
3. The new R projection ( R = 0.47 R2 + 0.88 R3) and high-z calibrations (R3 & R23) from Sanders et al. (2023) provide the best match to our sample overall.However, larger high-z [OIII]λ4363 sample sizes are needed that extend to higher metallicities past the plateaus of the calibrations.
4. The rest-frame Hβ EWs of our JADES sample are moderate with the median being ∼ 170 Å.However, excluding galaxies lower than z = 4 in our JADES sample yields a median of ∼ 120 Å, which contrasts local galaxies with rest-frame EWs(Hβ) ∼ 300 Å used to derive local calibrations that still fall beneath the ionization/excitation ratios of our sample.In addition, our elevated excitation values, along with the findings of Cameron et al. (2023b) and Curti et al. (2023b), demonstrates our sample closely matches extreme star-formation galaxies, such as blueberries (Yang et al. 2017a) and blue compact dwarf galaxies (Sargent & Searle 1970;Cairós et al. 2010) that are known to have some of the highest sSFRs (10 −7 yr −1 sSFR 10 −8 yr −1 ).The combination of these findings does not present a clear description of [OIII] In Section 3.2.2we examined systematic offsets introduced when changing metallicity prescriptions between PyNeb and the empirical relations from Izotov et al. (2006).We demonstrated there is a −0.11 dex offset between PyNeb and Izotov et al. (2006) for our sample.The median error of PyNeb derived abundances for our sample is 0.12 dex, so the systematics introduced when choosing a metallicity prescription are comparable to the associated error with our measurements.We demonstrate these systematics further in Figures 10 and 11 by deriving metallicities for our sample using the Izotov et al. (2006) prescription and comparing against strong-line calibrations as we did in Figures 4 -7.We clearly see our sample more closely matches R3 and R23 local calibrations when using Izotov et al. (2006).However, more recent local calibrations, such as Curti et al. (2017Curti et al. ( , 2020) ) and Nakajima et al. (2023), along with the high-z calibrations from Sanders et al. (2023), employed PyNeb for their T e determinations and therefore their calibrations.As such, if we were to determine the respective PyNeb calibrations using Izotov et al. (2006) instead then the main findings of the paper remain.It is clear that choosing a metallicity prescription matters, and thus future studies combining multiple samples should consistently re-derive metallicities for each respective sample to remain self-consistent.
In addition to metallicity prescription choice, the atomic data used, such as the options provided in PyNeb or the CLOUDY configurations used in Izotov et al. (2006), can introduce systematic offsets.For example, we use the O2+ collision strengths from Aggarwal & Keenan (1999) & Palay et al. (2012) when determining our metallicities, but Sanders et al. (2023) used O2+ collision strengths from Storey et al. (2014) (the default of PyNeb) when deriving their metallicities, hence why we re-derived metallicities from Sanders et al. (2023) for our sample.Similar to Figure 3, we present in Figure 12 the systematic offsets in metallicity for our sample introduced when choosing to use O2+ collision strengths between Storey et al. (2014) and Aggarwal & Keenan (1999) & Palay et al. (2012) internal to PyNeb.We find a median metallicity offset of 0.02 dex, but there are offsets between ∼ −0.1 and 0.1 dex in our sample.It is clear the systematic offsets between metallicity prescriptions are overall larger, but the offsets introduced when choosing collisional strengths can be non-negligible.
Overall, it is clear that choosing a metallicity prescription, and to a lesser extent the collisional strengths, matters.The systematics introduced with choice will affect future studies investigating the MZR and FMR, especially as we begin establishing these principal scaling relations in the high-z Universe (e.g., Curti et al. 2023b).The slope and normalization of these scaling relations are essential in constraining galaxy chemical evolution models and interpreting the driving mechanisms behind their respective existence, shape, and evolution, and thus self-consistency is key before the systematics and their effects are more closely examined.

B. CALIBRATION OF THE NEW R DIAGNOSTIC
In Section 4.8 we provide the calibration to a new metallicity diagnostics based on a combination of log([OIII]λ5007/Hβ) and log([OII]λ3727, 29/Hβ) which differs from the standard R23, and we test it against galaxies with direct metallicities at high-z (z > 2) from ERO, CEERS, and JADES.Here, we provide a more detailed description of the calibration sample and rationale.
In the top-left panel of Figure 13, we plot the distribution of this sample in the log([OIII]λ5007/Hβ) vs log([OII]λ3727, 29/Hβ) diagram; each data point is color-coded by its metallicity derived with the T e method, with squared symbols representing stacked spectra from Curti et al. (2017) and circles marking individual galaxies from the literature.The distribution of points in the diagram reflects the well known sequence in metallicity and ionisation parameter observed in large local surveys like SDSS; however, several among the most extremely metal poor galaxies deviate from the sequence in its upper-left branch, while preferentially occupying a region of significantly lower R3, at fixed R2.This makes it difficult to find a parametrisation in such a 2D space that correctly predicts the metallicity over the entire range spanned by the sample.We therefore search for a re-projection of the axis that facilitate the metallicity prediction over the whole abundance scale.Ideally, such projection should incorporate the different dependence between line ratios, ionisation parameter, and metallicity seen in many metal-poor galaxies of the sample, whose ISM properties more closely resemble those of high redshift objects also observed with JWST/NIRSpec Cameron et al. (2023b).The projection is shown in the top-right panel of Figure 13.More specifically, we search for a linear combination of R2 and R3 in the form R = cos(φ)R2 + sin(φ)R3 (B1) which is equivalent to a rotation of the R2-R3 plane around the O/H axis.We then fit a fourth-order polynomial to the resulting R ratio versus the metallicity, in the form of R = n c n x n where x = 12+log(O/H) − 8.69 , and indentify the angle φ that allows to minimize the scatter in metallicity from the best-fit relation.This procedure leads to a best-fit φ = 61.82deg, which translates into R = 0.47R2 + 0.88R3, i.e., the best possible projection of the R2 vs and a number of extraction and metallicity prescriptions were employed.Recently, Nakajima et al. (2023) reanalyzed 4 sources from ERO and 4 sources from GLASS, along with identifying a new [OIII]λ4363 source from CEERS in the EGS.Sanders et al. (2023) also identified 16 galaxies with [OIII]λ4363 detections from CEERS.In addition, Übler et al. (2023) identified [OIII]λ4363 in a low metallicity AGN at z ∼ 5.55 with the JWST/NIRSpec Integral Field Spectrograph.

Figure 1 .
Figure 1.Redshift distribution of our parent JADES sample of 198 objects with both PRISM and grating spectra and our 10 novel [OIII]λ4363 emitters.No pre-selection was performed on the parent JADES sample as we visually inspected all objects.

Figure 3 .
Figure 3. Deviation between metallicites derived by Izotov et al. (2006) and PyNeb.The solid line represents unity, whereas the dashed line represents the median offset between Izotov et al. (2006) and PyNeb.

Figure 4 .
Figure4.The relationship between Te metallicity and R2 for our JADES sample compared with strong-line calibrations fromMaiolino et al. (2008),Curti et al. (2017Curti et al. ( , 2020)), and the "All", "Large Equivalent Width (EW)", and "Small EW" calibrations fromNakajima et al. (2022).Bian et al. (2018) does not include a calibration for R2, but we include their calibrations for O3O2, R3, and R23 in Figures5 -7.Solid lines indicate calibrated ranges whereas dotted lines indicate the extrapolation of the calibration over the metallicity range 6.9 ≤ 12 + log(O/H) ≤ 9.0.The six subplots demonstrate the change between Te derived metallicities and calibration derived metallicities for our individual galaxies.The vertical lines represent the failure of a strong-line calibration to account for the measured line ratios at the given metallicity.

Figure 5 .
Figure 5. Identical to Figure 4 except the relationship is between Te metallicity and O3O2.

Figure 6 .
Figure 6.Identical to Figure 4 except the relationship is between Te metallicity and R3.

Figure 7 .
Figure 7. Identical to Figure 4 except for the relationship between Te metallicity and R23.

Figure 8 .
Figure 8. R3 vs 12 + log(O/H) and R23 vs 12 + log(O/H) with photoionization models from Pérez-Montero (2014).Various line styles and colors represent different nitrogen/oxygen abundance ratios and ionization parameters, respectively.Symbols are the same as in Figure 4.

Figure 9 .
Figure9.The JWST sample with auroral lines measurements analysed in this work is compared against the R diagnostic presented in Section 4.8.Symbols are the same as in Figure4.The high-z sample with Te metallicities is predicted by the R calibration with a median absolute offset of 0.13 dex and a standard deviation of 0.19 dex.

Table 1 .
Measured fluxes and errors of emission lines of interest from PPXF in units of 10 −20 erg/s/cm 2 .

Table 3 .
Significance of deviation (in units of σ) for the expected line ratios from each strong-line calibration to our JADES sample presented in Figures4 -7.The metallicity dependency varies across each strong-line diagnostic examined (e.g., the turnover points in R3 and R23), thus we include (12 + log(O/H)T e ) − (12 + log(O/H) cal ) in Figures4 -7to demonstrate offsets with the respect to 12 + log(O/H)T e for our individual galaxies.
λ4363 production in the high-z Universe, thus warranting a much deeper examination into the possible processes.at UC Santa Cruz, funded by NSF MRI grant AST 1828315.The research of CCW is supported by NOIR-Lab, which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation.This research is supported in part by the Australian Research