EMPRESS. V. Metallicity Diagnostics of Galaxies over 12+log(O/H)=~6.9-8.9 Established by a Local Galaxy Census: Preparing for JWST Spectroscopy

We present optical-line gas metallicity diagnostics established by the combination of local SDSS galaxies and the largest compilation of extremely metal-poor galaxies (EMPGs) including new EMPGs identified by the Subaru EMPRESS survey. A total of 103 EMPGs are included that cover a large parameter space of magnitude (Mi=-19 to -7) and H-beta equivalent width (10-600 Ang), i.e., wide ranges of stellar mass and star-formation rate. Using reliable metallicity measurements of the direct method for these galaxies, we derive the relationships between strong optical-line ratios and gas-phase metallicity over the range of 12+log(O/H)=~6.9-8.9 corresponding to 0.02-2 solar metallicity Zsun. We confirm that R23-index, ([OIII]+[OII])/H-beta, is the most accurate metallicity indicator with the metallicity uncertainty of 0.14 dex over the range among various popular metallicity indicators. The other metallicity indicators show large scatters in the metal-poor range (<0.1 Zsun). It is explained by our CLOUDY photoionization modeling that, unlike R23-index, the other metallicity indicators do not use a sum of singly and doubly ionized lines and cannot trace both low and high ionization gas. We find that the accuracy of the metallicity indicators is significantly improved, if one uses H-beta equivalent width measurements that tightly correlate with ionization states. In this work, we also present the relation of physical properties with UV-continuum slope beta and ionization production rate xi_ion derived with GALEX data for the EMPGs, and provide local anchors of galaxy properties together with the optical-line metallicity indicators that are available in the form of ASCII table and useful for forthcoming JWST spectroscopic studies.


INTRODUCTION
Before the next generation of powerful telescopes such as the James Webb Space Telescope (JWST) and the 30 m-class extremely large telescopes will come online, there is an increasing awareness of the importance of low-redshift, young, and low-mass star-forming galaxies as probes of systems in the early universe. Although such local galaxies would have different characteristics and formation processes from high redshift galaxies, they are still useful to discuss which characteristics we can use to address big questions such as the star-formation of galaxies in the early phase of galaxy evolution, their subsequent evolution, and the role of galaxies during the reionization process.
An important knowledge we can learn from the lowredshift galaxies is the emergent emission-line spectrum as a function of galaxies' key properties such as metallicity and ionization parameter in the inter-stellar medium (ISM) as well as the shape of ionizing spectrum. Once the relationships are confirmed in conjunction with theoretical models, we can extend the knowledge toward higher-redshifts to address early galaxies' properties via spectroscopic studies. Gas-phase metallicity is one of the most crucial quantities, as it reflects the star-formation/explosion history (Maiolino & Mannucci 2019 for a review). It has thus long been studied how to estimate gas-phase metallicity in distant galaxies.
The most accurate gas metallicity estimate is provided if the electron temperature (T e ) is directly measured via auroral lines such as [O iii]λ4363. This is called the direct T e method, and most reliably used to identify the primitive galaxies (e.g., Kojima et al. 2020;Izotov et al. 2018b) and examine the scaling relations such as the stellar mass -metallicity relation (e.g., Andrews & Martini 2013). The direct T e method is not always available, however, due to the faint nature of the auroral lines. Instead, stronger metal lines are focused and substituted to estimate metallicities for faint galaxies especially at high redshift. Strong metal lines divided by a hydrogen line, such as ([O iii]λλ5007, 4959+[O ii]λ3727)/Hβ (R23-index) and [N ii]λ6584/Hα (N2-index), as well as other metal line ratios have been historically investigated and proposed as gas-phase metallicity indicators both observationally (empirically; Pagel et al. 1979;Edmunds & Pagel 1984;van Zee et al. 1998;Pettini & Pagel 2004;Pilyugin & Thuan 2005;Stasińska 2006;Nagao et al. 2006;Maiolino et al. 2008;Marino et al. 2013;Pilyugin & Grebel 2016;Curti et al. 2017Curti et al. , 2020 and theoretically with photoionization models (McGaugh 1991;Kewley & Dopita 2002;Blanc et al. 2015;Strom et al. 2018; see Maiolino & Mannucci 2019 for a comprehensive review). Maiolino et al. (2008) make use of individual low-metallicity galaxies as compiled in Nagao et al. (2006) and derive the strong lines' diagnostics in an empirical manner. However, the authors have to rely on the photoionization model fitting for the high metallicity galaxies, because the detection of auroral lines becomes more challenging from galaxies with a higher metallicity and a lower gas temperature. This would cause a systematic change in line ratios as a function of metallicity between the low and high metallicity regimes (see also Nagao et al. 2006). Recently, Curti et al. (2017) and the subsequent study of Curti et al. (2020) obtain the relations at such a highmetallicity regime (12 + log(O/H) ≃ 8.1 − 8.9) based on the direct T e method by stacking spectra of Sloan Digital Sky Survey (SDSS). Nevertheless, the authors do not fully compile rare, individual metal-poor objects for developing the diagnostics (12 + log(O/H) 7.7). Accordingly, a comprehensive study of metallicity diagnostics using a larger sample at low-metallicity and covering the full metallicity range is now required.
Moreover, a possible dependence of the empirical metallicity diagnostics on the ionization state needs to be examined. Pilyugin & Grebel (2016) discuss the effect of ionization state on the metallicity calibrators (see also Pilyugin & Thuan 2005;Kewley & Dopita 2002;Izotov et al. 2019a). It is suggested to use several metal lines probing both high and low ionization gas to correct for the effect especially at the low-metallicity regime. Such a correction, which is ideally feasible even when limited sets of emission lines are available, is particularly important if the diagnostics are applied to high redshift objects, because a typical ionization parameter is suggested to become higher in galaxies at a higher redshift (e.g., Nakajima & Ouchi 2014).
From another viewpoint of what can be learned about early galaxies from local observations, recent work has been particularly focusing on the properties in the rest-frame ultra-violet (UV) wavelength. For example, Izotov et al. (2016a,b) observe z ≃ 0.3 galaxies with an extremely large equivalent width (EW) of [O iii]λ5007 called green pea galaxies in the rest-frame below the Lyman limit, and confirm they present Lyman continuum (LyC) leakage with a moderate escape fraction (f esc = 0.02 − 0.7, median is f esc ∼ 0.1; see also Izotov et al. 2018a,c). Because such intense emission lines are thought to be more common in galaxies at higher redshift (e.g., Smit et al. 2014;Khostovan et al. 2016;Reddy et al. 2018a;Tang et al. 2019;Nakajima et al. 2020), the correlation between f esc and emission lines' visibility needs to be further investigated and clarified (see also Nakajima & Ouchi 2014;Faisst 2016;Plat et al. 2019;Ramambason et al. 2020). An expanded survey at low-redshifts is thus on-going with Hubble Space Telescope (HST; GO 15626, PI: A. Jaskot; see also Flury et al. 2022).
Especially, intense high ionization emission lines such as Heii and Civ are often identified in young, metalpoor galaxies ). This also supports the importance of local young, metal-poor galaxies as analogs of high redshift systems since intense high ionization UV emission lines are more frequently found at high redshift remarkably in the reionization era (Erb et al. 2010;Stark et al. 2014;Berg et al. 2018;Stark et al. 2015aStark et al. ,b, 2017Laporte et al. 2017;Mainali et al. 2018;Hutchison et al. 2019;Jiang et al. 2021;Topping et al. 2021). Detailed modelings are also emerging to accurately interpret the UV emission lines (e.g., Feltre et al. 2016;Gutkin et al. 2016;Jaskot & Ravindranath 2016;Nakajima et al. 2018b;Byler et al. 2018;Hirschmann et al. 2019). The investigation of the UV spectrum in galaxies will be further explored in the local universe with HST/COS (GO 15840, PI: D. Berg; see also Berg et al. 2022).
Despite the importance, many low redshift sources studied in details (i.e., followed-up with the UV instruments) are relatively massive with a stellar mass M ⋆ = 10 7 -10 9 M ⊙ and bright, evolved galaxies whose metallicity are modest, as low as the sub-solar metallicity value (Z = 0.1-0.3 Z ⊙ ). The next focus in the community is thus to explore the properties of further young, low-mass, and metal-poor galaxies, ideally as primordial as first-generation galaxies in the early universe (e.g., Wise et al. 2012). These will also enable us to determine the metallicity diagnostics at the lowmetallicity end. In this paper, we investigate such metalpoor galaxies, especially highlighting those with a metallicity below 10 % of the solar value (Z< 0.1 Z ⊙ , equivalently 12 + log(O/H) < 7.69 based on the solar chemical composition of Asplund et al. 2009) which are called extremely metal-poor galaxies, or dubbed "EMPGs". This paper is a part of a program named EMPRESS  to systematically sample EMPGs and examine their detailed properties. In EMPRESS, we use the deep and wide multi-wavelength imaging data of Subaru/Hyper Suprime-Cam (HSC) Subaru Strategic Program (SSP; Aihara et al. 2018). At z 0.03, low-mass, actively star-forming galaxies present g-and r-band excesses with intense nebular emission lines of [O iii]λλ5007, 4959 and Hα, respectively. In particular, a metal-poorer galaxy tends to present a redder (g − r) color with a weaker contribution of metal line of [O iii] relative to Hα. Although a similar approach using a broadband excess is successfully developed to select intense [O iii] emitting galaxies such as green pea galaxies (Cardamone et al. 2009) at z ∼ 0.3 and blueberry galaxies (Yang et al. 2017) at z ∼ 0.02, they should not be extremely metal-poor by definition. We adopt a machine learning technique to reliably search for EMPGs whose strong hydrogen lines while moderateto-weak metal lines are imprinted on the photometric broadband SEDs. A similar idea is also adopted by  to search for EMPGs based on the photometric data. A series of spectroscopic followup has been conducted for the EMPG candidates of EMPRESS to confirm their metallicities and spectroscopically characterize the properties particularly about the presence of massive stars in metal-poor galaxies (Kojima et al. 2021;Isobe et al. 2021), stellar feedbacks (Xu et al. 2021), and the shape of ionizing spectrum to explain the high ionization emission lines (Umeda et al. 2022). This paper presents another spectroscopic observation recently conducted to enlarge the EMPG sample, and develop the metallicity diagnostics at the lowest metallicity range. Furthermore, we present the fundamental UV properties such as the UV continuum slope β and the ionizing photon production efficiency ξ ion to examine the young stellar population at the extremely metal-poor regime.
The paper is organized as follows. We describe our sample of EMPGs in §2. This includes newly-identified EMPGs by EMPRESS, and compiles previously known metal-poor objects from the literature. In §3, we present the metallicity diagnostics over the metallicity range 12 + log(O/H) ≃ 6.9 − 8.9 and discuss the scatters in the metal-poor regime in detail. Prescriptions are also proposed to improve the accuracy of the metallicity indicators by correcting for the variation of ionization state. In §4, we present the UV properties of the compiled EMPGs using the GALEX photometric data, and then examine the dependencies of the UV properties on metallicity and so on. Finally, we summarize our findings in §5. Throughout the paper we assume a solar chemical composition following Asplund et al. (2009), and adopt a concordance cosmology with Ω Λ = 0.7, Ω M = 0.3 and H 0 = 70 km s −1 Mpc −1 . All magnitudes are given in the AB system (Oke & Gunn 1983).

Individual Metal Poor Galaxies
This section introduces the EMPG sample. We first explain EMPGs selected by EMPRESS, and provide some new galaxies recently identified by our own spectroscopic observation. In addition, we compile metalpoor galaxies from the literature to build the largest, up-to-date EMPG sample as detailed below.

EMPRESS EMPGs: Earlier data
Early spectroscopic observations undertaken for the EMPRESS sample were described in Kojima et al. (2020) and Isobe et al. (2021). As a pilot spectroscopic observation, Kojima et al. (2020) presented data for 10 EMPG candidates, three of which were selected from the HSC-SSP catalog and 7 were from the SDSS. Using [O iii]λ4363 as an electron temperature probe, the EMPRESS team confirmed 2 new EMPGs including HSC J1631+4426 showing the lowest metallicity ever reported. Of the other eight candidates, seven were also confirmed to be actively star-forming galaxies with emission lines as intense as the two EMPGs but slightly metal-enriched (0.1-0.5 Z ⊙ ). Hereafter such moderately metal-poor galaxies are simply called "MPGs" for short. The remaining single candidate was faint and not used in this paper due to a lack of [O iii]λ4363 detection, although its strong emission lines indicated a very low metallicity. In Isobe et al. (2021), another spectroscopic follow-up for 13 EMPG candidates, all of which were HSC-selected, was performed with Keck/LRIS. Ten were confirmed to be intense emission line galaxies, four of which were EMPGs and five were MPGs (0.1-0.2 Z ⊙ ). The other object was suggested to be an EMPG but excluded in this paper's compilation due to its large uncertainty of metallicity (∆ log(O/H) > 0.2). Taken together, the 6 EMPGs as well as the 12 MPGs are added in the compilation from the early EMPRESS work.

EMPRESS EMPGs: New data
Following the success of the machine-learning selection of EMPGs by EMPRESS , we have expanded spectroscopic follow-up observations to increase the numbers of confirmed EMPGs and characterize their properties in a statistical manner.
The third spectroscopic observation was conducted with Magellan/MagE on UT 2021 February 10 in clear conditions with a seeing of 0.5-0.9 ′′ . Ten EMPG candidates were selected for the follow-up from the HSC candidate catalog. We chose 7 from the catalog originally provided by Kojima et al. (2020), and the remaining 3 from a newly provided, enlarged EMPRESS sample. The new sample was based on an upgraded machine learning classifier with further metal-poor EMPG training models (Z = 10 −3 -0.01 Z ⊙ ; based on photoionization models from Nakajima et al. 2018b) and the latest HSC-SSP DR (S20A; Aihara et al. 2018). In addition to the HSC candidates, we observed 4 previously known (E)MPGs (12+log(O/H) = 7.4-8.0; Kniazev et al. 2003;Izotov et al. 2012b) selected from SDSS, to examine the kinematics of low-mass galaxies (Xu et al. 2021). Full details of the observation and data reduction procedures are presented in a companion paper of (Xu et al. 2021). Briefly, we found nine out of the ten HSC candidates in which we identified multiple emission lines over the wavelength range of λ obs = 3500-10000Å that are suggestive of metal-poor galaxies at z 0.05. In this paper, we report the properties of the nine new galaxies of HSC as well as the four SDSS galaxies (hereafter, MagE sources). Table 1 lists the key line intensities normalized by Hβ and their 1σ errors for the 13 MagE sources. We measure the flux of each of the lines by fitting a Gaussian profile plus a constant continuum. The measured fluxes are then corrected for Galactic extinction based on the Schlafly & Finkbeiner (2011)'s map as well as the extinction curve of Cardelli et al. (1989). We use the Balmer lines of Hα, Hβ, Hγ, and Hδ to estimate the dust attenuation by simultaneously fitting the fluxes assuming the Calzetti et al. (2000)' attenuation curve. We do not correct for the potential stellar absorption around the Balmer lines which would be negligible (∼ 1-2Å in metal-poor galaxies; e.g., Izotov et al. 2012a) for our sources with very large EWs of emission lines (EW(Hβ) ∼ 100-550Å).
One of the key properties to characterize our sources is gas-phase metallicity. We identify the temperaturesensitive line of [O iii]λ4363 in 10 of the 13 sources. For the 10 objects, 6 of which are HSC-selected, we determine the oxygen abundance using the direct T e method and use it as a proxy for gas-phase metallic- ity. We first estimate electron number densities and electron temperatures using getTemDen, a package of a Python tool PyNeb (Luridiana et al. 2015 Table 2. Five of the 6 HSC-selected EMPG candidates have a metallicity below the 10 % solar value (12 + log(O/H) < 7.69) and are thus confirmed being EMPGs. The spectra and optical images for the newly identified EMPRESS-EMPGs are illustrated in Figure 1. One source, J1411-0032, is turned out to be slightly metal-enriched system as similarly identified in earlier studies of EMPRESS Isobe et al. 2021), and added to the MPG catalog. For the other three HSC-selected EMPG candidates, the non-detection of [O iii]λ4363 is still consistent with a metal-poor (i.e., high electron temperature) gas due to the rather faint nature. Indeed, their metallicities 5" are indirectly inferred to be as low as those of EMPGs (12 + log(O/H) = 7.1-7.4) based on the empirical metallicity indicators of R23-index and the upper-limit of N2index (optimized for sources with EW(Hβ)=100-200Å; see §3.2). Future deep spectroscopy will reveal the properties in more details for the unconfirmed candidates.
In addition, a further spectroscopic follow-up has been conducted under EMPRESS and will be detailed in a series of forthcoming papers (e.g., Nishigaki et al. 2022 in prep.). In this paper, we additionally exploit the results of 3 new EMPGs from the latest observations. In summary, there are 8 EMPGs and one MPG newly identified in the EMPRESS sample. These 9 EMPRESS galaxies as well as the 4 SDSS galaxies (2 EMPGs and 2 MPGs) whose properties are re-measured with the MagE spectra are added to our sample and used for the following analysis. We do not use the remaining 3 faint HSC galaxies in this paper, although their strong emission lines indicate they are as metal-poor as EMPGs. This spectroscopic observation thus re-confirms the ability of EMPRESS to select EMPGs in the local universe from the photometric catalog.

Other EMPGs
In addition to the (E)MPGs provided by EMPRESS, we compile observations of comparably low-metallicity objects from the literature that have a firm determination of gas-phase metallicity based on [O iii]λ4363. We find 154 objects in total, without any duplication, in the literature (Kniazev et al. 2003;Thuan & Izotov 2005;Pustilnik et al. 2005;Izotov et al. 2006aIzotov et al. , 2009Izotov et al. , 2012aIzotov et al. ,b, 2018bIzotov et al. , 2019bIzotov et al. , 2020Izotov et al. , 2021Izotov & Thuan 2007;Guseva et al. 2007;Pustilnik et al. 2010;Skillman et al. 2013;Hirschauer et al. 2016;Sánchez Almeida et al. 2016;Hsyu et al. 2017) which include isolated galaxies such as emission line galaxies and blue compact dwarfs as well as metal-poor clumps and Hii-regions in a nearby galaxy. Only the sources with a reliable determination of oxygen abundance have been collected, with the direct T e method using a suite of the optical oxygen and hydrogen emission lines as for the EMPRESS sources ( §2. are classified as EMPGs. The remaining, relatively lowmetallicity objects (12 + log(O/H) = 7.7-8.4) are MPGs that are supplementarily provided from the same literature as used for the EMPG compilation. The MPG subsample is therefore not complete in terms of metallicity but just served as a reference sample.
In summary, our sample contains 103 EMPGs and 82 MPGs (N o = 185 in total), increasing the sample size of EMPGs by a factor of three as compared to that of previous work (e.g., Nagao et al. 2006). We note that no obvious active galactic nuclei (AGN) activity is identified in the compiled (E)MPG sample based on the BPT diagram (Kauffmann et al. 2003). Although we may not fully exclude the possibility of metal-poor AGN in the sample with the BPT diagram according to the photoionization predictions (Kewley et al. 2013), we assume no AGN in the sample in the following analysis. Figure  2 (a) presents the distributions of metallicity, EW(Hβ), and i-band absolute magnitude M i for the compiled sources. We derive M i based on the i-or z-band broadband photometric data of HSC or SDSS, in addition to the luminosity distance. We choose the broadband which is free from an intense Hα emission for each of the objects (see §4.4 for more details). The EMPGs cover large parameter spaces of M i from −19 to −7 and EW(Hβ) from 10 to 600Å, i.e., wide ranges of stellar mass and specific star-formation rate, respectively. Assuming a mass to optical luminosity relation typically seen in (E)MPGs Xu et al. 2021), the Kennicutt (1998) relation, anda Chabrier (2003) initial mass function, the ranges correspond to the stellar masses of 10 8 M ⊙ down to below 10 4 M ⊙ , and to the specific star-formation rates of a few Gyr −1 up to ∼ 200 Gyr −1 . Notably, the EMPRESS survey enlarges the sample of the largest EW(Hβ) (E)MPGs (e.g., see EW(Hβ) of the HSC-selected objects in Table 2). Figure 2 (b) shows the redshift distribution of the sample confirming that most of the (E)MPGs (95%) are found at z < 0.05. Their UV properties are also derived and discussed later in this paper ( §4).

Stacked SDSS Galaxies
To compensate the high-metallicity range in determining the metallicity diagnostics ( §3), we exploit the analyses of Curti et al. (2017Curti et al. ( , 2020 which are based on 120, 000 galaxies' spectra from SDSS. Sources that are dominated by an AGN activity are excluded from the sample based on the BPT diagram (Kauffmann et al. 2003). The authors stack SDSS spectra in bins of strong line ratios to detect the [O iii]λ4363 in a statistical manner and determine the metallicity based on the direct T e method. Each bin contains 10− ∼ 6000 galaxies for the stacking analysis. The resulting stacked spectra show metallicities ranging from 12 + log(O/H)= 8.1 to ∼ 8.9.

STRONG LINE DIAGNOSTICS OF METALLICITY
3.1. Revisiting Diagnostics over 12 + log(O/H) ≃ 6.9 − 8.9 Our compiled sample containing quite a few numbers of EMPGs (×3 larger than the previous work; e.g., Nagao et al. 2006) with an accurate metallicity measurement serves as a good reference for checking and improving the empirical metallicity diagnostics at the lowmetallicity regime. Figure 3 illustrates the relationship of R23-and N2-index as a function of metallicity. Our compiled sources (gray circles) shape a tight relationship for each of the indices, remarkably for R23-index, at the low metallicity regime. The short-dashed and long-dashed curve is provided by Maiolino et al. (2008) and Curti et al. (2017Curti et al. ( , 2020, respectively, and the green curve shows our best-fit function (as detailed below). The two previous studies are complementary to each other; Maiolino et al. (2008) make use of the individual low-metallicity galaxies but lack data-points at high metallicity range with the direct T e method. On the other hand, Curti et al. (2017Curti et al. ( , 2020 obtain the relations at such a high-metallicity regime based on the direct T e method by stacking SDSS spectra ( §2.2), but lack enough metal-poor galaxies. . Gray filled squares denote stacked SDSS spectra in the high-metallicity regime (Curti et al. 2017(Curti et al. , 2020. Small gray diamonds are metal-poor galaxies compiled by Nagao et al. (2006) just for a reference (i.e., not used for the following best-fit function). All the measurements of metallicity are done in a consistent manner based on the direct Te method. Green curve presents our best-fit function, whereas the black short-dashed and long-dashed curve illustrates the function of Maiolino et al. (2008) and Curti et al. (2017Curti et al. ( , 2020, respectively. Therefore, as a first step in this section, we present the best-fit functions over the full metallicity range of 12 + log(O/H) ≃ 6.9 up to 8.9 based fully on the direct T e method. We use our compiled sample of EMPGs + MPGs (12+log(O/H) = 6.9-8.2; §2.1) in addition to the Curti et al. (2017Curti et al. ( , 2020)'s stacked result to compensate the high metallicity regime (12 + log(O/H) = 8.1-8.9; §2.2). We confirm in Figure 3 that the two independent samples are smoothly connected at the intersection of 12 + log(O/H) ∼ 8.0. Following the previous studies, we adopt the functional form: to derive the best-fit, where log R is the logarithm of the strong line ratio (e.g., R23 and N2-index), and x is the metallicity relative to solar (i.e., x = log(Z/Z ⊙ ) = 12 + log(O/H) − 8.69). We bin the compiled (E)MPGs to obtain average line ratios for a given metallicity range (red open circles in Figure 3; Table 3 for the "All" sample) and use them to perform the least squares fitting with the stacked data-points of Curti et al. (2017Curti et al. ( , 2020 without any weighting. For the binning we only use the individual (E)MPGs with a firm measurement of line ratio and do not count those with a lower-/upper-limit of line ratio of interest. We employ the polynomial orders of N = 2, 3, and 4 in Equation (1), and choose the best-fit for each of the indices from the three func-tional forms which minimizes the dispersions along with both the 12 + log(O/H) and the line ratio directions. Note that we use the individual (E)MPGs rather than the binned average ones for calculating the dispersions. The coefficients as well as the 1σ uncertainties of the best-fit are given in Table 4 for each indices based on the "All" sample. As expected, our best fit-functions show good agreements with those of Maiolino et al. (2008) in the low-metallicity and Curti et al. (2017Curti et al. ( , 2020 at the high-metallicity regime.

Scatters in the diagnostics
R23-index is confirmed to work as a good metallicity indicator with the metallicity uncertainty of 0.14 dex over the range of 12 + log(O/H) = 6.9 − 8.9 (Table 4). Although N2-index works as accurately as R23-index for galaxies in the high-metallicity range, we recognize large scatters as large as ∆ log(O/H) ∼ 0.4 dex in the relation in the metal-poor regime. A more or less similar scatter of the line ratio is also found in Figure 4 when other famous indicators are chosen as a function of metallicity:  index illustrate their weak dependence on metallicity in the low metallicity regime with the current function forms (see also their large uncertainties of ∆ log(O/H) in Table 4). We note that this paper cannot fully address scatters in the high-metallicity range, as we only have the averaged line ratios based on the stacked SDSS spectra (see also §3.4). In the following, we explore and discuss the scatters in the low-metallicity regime using our compiled (E)MPG sample.
As found in N2-index, a large scatter appears particularly associated with the line ratios using singlyionized, low-ionization lines. We thus speculate this would be caused by a variation of ionization state in the ISM for a given metallicity, such as ionization parameter and ionizing radiation field. Indeed, Pilyugin & Grebel (2016) demonstrate that, for a given metallicity, N2-index tends to become lower with the excitation parameter ( ). The excitation parameter and O32-index are famous probes of the ionization parameter (e.g., Kewley & Dopita 2002;Nakajima & Ouchi 2014), and useful to correct for such an effect of ionization state on the metallicity indicators (see also Pilyugin & Thuan 2005;Kewley & Dopita 2002;Izotov et al. 2019a). Although it is recommended to adopt these direct indicators of ionization parameter as a correction factor, it will not be always possible to obtain such multiple emission lines probing both high and low ionization gas from distant galaxies. Because our main purpose of this work is to provide metallicity diagnostics that will be practical and useful even when limited sets of emission lines are available, we re-examine the scatters in the indices using a different observable quantity, which is easy-to-obtain and a good probe of the ionization state. Among several observables, we consider EW(Hβ) (or equivalently EW(Hα)) is best-suited for this purpose for the following reasons. First, O32index is known to be positively correlated with specific star-formation rate (e.g., Nakajima & Ouchi 2014;Sanders et al. 2016), which should be proportional to EWs of Hβ and Hα by definition. Indeed, our (E)MPGs shape a tight positive correlation between O32-index and EW(Hβ). Second, as will be illustrated in §4.4, ξ ion , a gauge of the hardness of the ionizing radiation field, is primarily controlled by EW(Hβ), such that it becomes larger as EW(Hβ) increases. Finally, as compared to the ionization/excitation parameters, O32-index, and ξ ion , EW(Hβ) is a direct observable quantity and easy to get from high-redshift galaxies with JWST, and reliably measured without being affected by dust extinction and other assumptions. We expect at least either of Hβ and Hα is available when using the metallicity indicators, and EW(Hα) can be translated into EW(Hβ), such that EW(Hα) = 5.47× EW(Hβ) following a typical relationship . Therefore, we expect we can use EW(Hβ) to probe the degree of ionization state in ISM of distant galaxies in a practical manner. Moreover, a presence of diffuse ionized gas (DIG) can influence the line ratios, especially of the low-ionization lines Sanders et al. 2017). This effect can also be quantified by EW(Hβ), as the fraction of emission from DIG is thought to decrease with increasing star-formation intensity (Oey et al. 2007).   Figure 3 but our compiled EMPGs + MPGs are color-coded with EW(Hβ). To clarify the dependence on EW(Hβ), we split the sample into three subsamples, "Small EW" (< 100Å), "Medium EW" (100-200Å), and "Large EW" (> 200Å), and give the binned average results for the Small and Large EW subsamples in the right panels. Note again that this paper explores the EW(Hβ) dependence only in the low metallicity regime (12 + log(O/H) 8.0). We see a clear trend that the scatter in the indices is correlated with EW(Hβ). For a given metallicity, sources with a larger EW(Hβ) tend to present a larger R23 and a smaller N2 value. This can be interpreted as a result of large variation of ionization condition in metal-poor clouds, i.e., the ionization parameter and/or the hardness of the ionizing spectrum among galaxies with a comparable metallicity (Pilyugin & Grebel 2016). The trend in N2-index is straightforward, while that of R23 needs more careful explanations because it uses a sum of singly and doubly ionized lines in the numerator. Figures 6 and 7 summarize the dependence of the other metallicity indicators on EW(Hβ). We confirm the correlation between the scatters in the metallicity indicators and EW(Hβ),  Figure 5). The black grid illustrates the photoionization models based on the binary evolution SEDs (BPASS-300bin; Nakajima et al. 2018b). The dashed-curves present model tracks with an ionization parameter log U from −3.5 (thin) to −0.5 (thick) with a step of 0.5 dex. Only the gas-phase oxygen abundance is counted for the models (i.e., after removing the depleted component onto dust grains) in the abscissa axis to be directly compared with the observations. The gas density is fixed to 100 cm −3 . The models explain the scatter and its correlation with EW(Hβ) by changing the ionization parameter in a consistent manner for each of the indicators. Maiolino et al. 2008), the weak metallicity-dependence we obtain for the "ALL" sample, in contrast to the earlier work, are thought to be due to the presence of objects with a low ionization condition. The correlation between the metallicity and ionization state will be further discussed later ( §3.4). As for R23-index, it is relatively strong against the variation of the ionization state by tracing both low and high ionization gas, resulting in the smallest metallicity dispersion (0.14 dex). In the low metallicity regime, R23-index becomes dominated by R3-index as O32 becomes larger than unity. We thus see a similar trend of EW(Hβ) as seen in R3-index but slightly milder in R23-index.
The binned average relationships of the metallicity indicators for each of the subsamples are summarized in Table 3, and their best-fit functions can be reproduced within the metallicity range and with the coefficients given in Table 4. We fix N = 2 for the best-fit functions for the subsamples. Table 4 also lists the standard deviations of 12 + log(O/H) as typical uncertainties for each of the indicators and subsamples. For example, below 12 + log(O/H) ≤ 7.69, the uncertainty of metallicity using R3-index (All) is 0.21 dex, which becomes as small as 0.10 dex (i.e., by a factor of 2) if the subsample's best-fits are employed. In a similar way, we confirm the accuracy of the metallicity indicators becomes significantly improved if the best-fit function depend-ing on EW(Hβ) is used. Although the indicators using a high ionization line divided by a low ionization line, O32, O3N2, and Ne3O2, show a smaller improvement in the standard deviation than the other indicators, it is mostly due to the larger impact of ionization condition on these line ratios even within the subsamples of EW(Hβ) > 200Å and < 100Å (Fig. 7). This is the current limitation. Another practical note is that we recommend to use the best-fit relationships of the Large and Small EW subsample for sources with EW(Hβ) > 200Å and < 100Å, respectively, and interpolate the two solutions for sources with EW(Hβ) in between 100 and 200Å.

Comparisons with photoionization models
To confirm the variation of ionization condition on the strong line diagnostics, we compare the results with photoionization models. We exploit the CLOUDY photoionization modeling as detailed in Nakajima et al. (2018b). Briefly, we refer to the models of star-forming galaxies using binary evolution SEDs of "BPASS-300bin" (v2; Eldridge et al. 2017). For this comparison, we slightly expand the original BPASS model grid of Nakajima et al. (2018b) to cover the extremely low metallicity down below the 0.01 Z ⊙ adopting the same assumptions such as the abundance ratios and the relationship between stellar and gas-phase metallicities. The ionization parameter is varied to assess the impact of changing the ionization condition on the emergent strong line ratios. The gas density is fixed to a fiducial value of 100 cm −3 . One caveat here is that the metallicity in the original CLOUDY models counts elements depleted onto dust grains in addition to those in gas-phase in the ionized nebula. On the other hand, the metallicities observationally derived based on the direct T e method trace the oxygen abundances only in gas-phase To directly compare the models with the observations, we subtract the depleted component from the total oxygen abundance for each of the models using the adopted depletion factor of 0.6 for oxygen.
In Figures 8 and 9, we present the comparisons of the metallicity diagnostics between the photoionization predictions and the observations. The models reproduce the scatters as well as the trend observed with EW(Hβ) for a given metallicity by changing the ionization parameter within a reasonable range of log U from −3.5 to −0.5. The comparison demonstrates that we can sub-stitute EW(Hβ) as a gauge of the degree of ionization in ISM, and that the metallicity indicators that do not use a sum of low and high ionization lines can be weak without any correction for the ionization state, as they cannot trace both low and high ionization gas, latter of which particularly exist widely in metal-poor clouds.

Implications
The non-negligible correlations seen in the metallicity indicators in terms of EW(Hβ), as a result of the difference in the degree of ionization state in the ISM, have some implications. First, it would be important to adopt the appropriate relationship following the EW of Hβ (or Hα), rather than using the best-fit for the "All" sample blindly, to improve the accuracy of the derived metallicity based on these empirical relationships. Particularly, this would influence discussions of mass-metallicity relation and its star-formation rate dependency (e.g., Mannucci et al. 2010;Lara-López et al. 2010), because EW(Hβ) is a strong function of stellar mass and star-formation rate. Although R23-index is the best indicator which is the least affected by the difference of the ionization state among various popular indicators, there still remains a factor of ∼ 1.5 difference in R23-index between the Large and Small EW subsamples for a given metallicity below 12 + log(O/H) 8.0. This corresponds to a systematic offset of metallicity as large as ∆ log(O/H) ∼ 0.25 dex for a given R23 value between the large (> 200Å) and small (< 100Å) EW(Hβ) objects 2 . A worse case is to blindly use a ratio of high to low ionization emission line such as O32, O3N2, and Ne3O2 indicators which show a weak dependence to estimate the metallicity below 12 + log(O/H) 8.0 especially if EW(Hβ) is not available (Figure 4). Another important practical caveat when using R23index and R3-index is that two solutions would be arithmetically obtained for a given index value. One would need additional single-valued functions such as N2-and S2-index to resolve the degeneracy, and be recommended to correct for the ionization condition using our prescription if the low-metallicity value (12 + log(O/H) 8.0) is the likely solution. O32-index can also provide a discriminatory power to prefer the low-metallicity solution if O32-index is larger than ∼ 3. Moreover, the two solution nature of R23-index and R3-index results in the plateau region around 12 + log(O/H) ∼ 8.0 ± 0.2 which results in a difficulty in pinpointing the metallicity when the index value is close to the maximum value of ∼ 9. Indeed, the uncertainties of metallicity indicator of R23index and R3-index (EW-corrected) are 0.13 − 0.25 dex and 0.18−0.30 dex, respectively, in the metallicity range of 12 + log(O/H) ≤ 8.1 (Table 4). These values get significantly smaller, 0.09 − 0.12 dex and 0.10 − 0.18 dex, in the metallicity range of 12 + log(O/H) ≤ 7.69, i.e., only considering the low-metallicity branch apart from the plateau region. The single-valued functions (e.g., N2and S2-index), with a correction of ionization condition, are suggested to be used together with R23-index and R3-index to help ease the difficulty around the plateau region.
A second implication is about the large variation found in the indicators using low-ionization lines and high-to-low ionization line ratios. As clarified in the comparison plot of photoionization models and O32, O3N2, and Ne3O2-index (Figure 9), the variation suggests a diverse ionization nature of ISM in metalpoor galaxies. This appears in contrast to the SDSS's high-metallicity regime where a relatively tight anticorrelation exists between metallicity and ionization parameter (e.g., Andrews & Martini 2013;Sanders et al. 2020). Because the SDSS stacking in Curti et al. (2017) is performed by dividing the sample based on the location on the R3 vs. R2 plot, it implicitly follows the anti-correlation between the ionization parameter and metallicity typically found in the local universe, resulting in a tight relationship between metallicity and O32, O3N2, and Ne3O2-index (Curti et al. 2017). This is why the Large EW subsample looks more smoothly connected with the SDSS sample at 12 + log(O/H) ∼ 8.0, as the highly-ionized galaxies are preferentially included in the lowest-metallicity bins of SDSS. If a similar anticorrelation typically exists between the metallicity and ionization parameter in the low-metallicity regime, the indicators that present weak dependences on metallicity such as O32, O3N2, and Ne3O2-index would become more helpful as seen in the Large EW subsample and as proposed in the earlier work (Maiolino et al. 2008). At the moment, however, our compilation demonstrates that outliers do exist as found in the Small EW subsample, such as EMPGs with a modestly ionized ISM having a low EW(Hβ). A presence of DIG can also play a role in scattering the low EW(Hβ) objects toward larger values of the low-ionization line indices Sanders et al. 2017). Our prescription will be useful to alleviate these uncertainties and derive metallicities for objects including such outliers. Because the Small EW subsample tends to contain individual stellar clumps and Hii-regions resolved in near-by galaxies, our current compilation may bias the sample following the detection of [O iii]λ4363. We would need a larger and more well-constructed (e.g., mass-limited) EMPG sample to address the typical ionization condition as a function of metallicity. At the high-metallicity regime, we also expect some deviations from the typical ionization parameter -metallicity relationship and hence some improve-ments of diagnostics by using EW(Hβ) (cf. Brown et al. 2016;Cowie et al. 2016;Sanders et al. 2017). However, this is beyond this paper's scope as we would need different analyses including how the SDSS sample is binned and stacked.
Finally, the metallicity prescriptions with the EW(Hβ) dependence would be essential for highredshift galaxy studies. Because higher-z galaxies are thought to present a higher ionization parameter (e.g., Nakajima & Ouchi 2014), the typical relationships (i.e., without the EW(Hβ) correction) constructed based on the local galaxies' ionization parameter -metallicity relation may cause a systematic uncertainty of metallicity at high-redshift. To test the utility of the new metallicity diagnostics for high-redshift galaxies, we use the z = 2 − 3 galaxies whose metallicity is determined with the electron temperature. As compiled in Sanders et al. (2020), we collect 8 galaxies in total at z = 1.7-3.6, five of which present a detection of [O iii]λ4363 (Christensen et al. 2012;Sanders et al. 2020), and the remaining three's electron temperatures are determined with O iii]λλ1661, 1666 (Erb et al. 2010;Steidel et al. 2014). Figure 10 shows the z = 2 − 3 galaxies, color-coded with EW(Hβ), superposed on the metallicity indicators found in the local universe. Albeit with the small sample size, the z = 2 − 3 galaxies appear to support the same dependence of the relationships on EW(Hβ). The single galaxy having a very large EW(Hβ) (340Å) notably follows the functions constructed with the Large EW(Hβ) subsample in the local universe (> 200Å). The other 7 galaxies have modest values of EW(Hβ) (80 − 170Å), and indeed fall in between the Large and Small EW subsamples on the metallicity indicators' plots, with a tendency that the second largest EW(Hβ) galaxy (170Å) prefers the relationship of the Large EW(Hβ) subsample. We admit the current sample size and the individual metallicity measurement uncertainties for the existing data-points do not permit a conclusive discussion. Still, we can argue that we do not see any clear contradiction of the metallicity indicators and the EW(Hβ) dependence at different redshifts up to z ∼ 3.
The result is also consistent with the tendency found in Bian et al. (2018). The authors derive the empirical relationships between metallicity and strong line ratios for the typical local star-forming galaxies of SDSS as well as for analogs of z ∼ 2 galaxies which are selected in the local universe but based on the offset location on the N2 vs. R3 plot (i.e., [N ii] BPT diagram) as typically seen at z ∼ 2 (see also Steidel et al. 2014;Shapley et al. 2015). Using stacked spectra for each of the z = 0 and 2 samples, the authors suggest similar systematic offsets in the metallicity indicators between z = 0 and ∼ 2 as identified in this study. This makes sense as the z = 2 analogous sample is constructed based on the elevated R3 value for a given N2-index, and preferentially contain galaxies with a higher ionization parameter and/or a harder ionizing spectrum for a given metallicity which can be characterized by a large EW(Hβ). The evolution of galaxies on the [N ii] BPT diagram would thus be largely due to the evolution of ionized ISM conditions and caused by the evolution of EW(Hβ) with redshift (see also the discussion of Reddy et al. 2018a

based on EWs of [O iii]).
In brief, we can argue that a correction of ionization condition would be important in determining metallicities based on the strong line ratios, and the new metallicity prescriptions we develop using the local galaxies can be applicable even for high-redshift galaxies. We believe our prescriptions using EW(Hβ) are practically useful as it is easy to get as compared to ionization parameter and ξ ion . This work demonstrates careful applications of the strong line ratios are necessary to discuss the chemical evolution of galaxies as a function of stellar mass and star-formation activity. Nevertheless, we understand the current sample size is too small to confirm the applicability of the indicators at high-redshift. The relationships at high-redshift as well as local universe need to be further tested and improved, if necessary, with the forthcoming large and sensitive spectroscopic surveys such as Subaru/PFS (Takada et al. 2014) and VLT/MOONS (Cirasuolo et al. 2014).  Table 3 continued  Table 3 continued  Table 3 continued  Table 3 continued Note-The polynomial order is either N = 4, 3, or 2 which is determined for each of the indices to minimize the dispersions. The subsamples  Note-( †) Limiting magnitude at the 3σ level. The depths for AIS, MIS, and DIS are given in (Morrissey et al. 2007), and those for GII and NGS are estimated by random aperture photometry.

UV PROPERTIES OF EMPGS
The EMPGs compiled in this study also provide important anchors of properties in the early phase of galaxy evolution that will be useful for high-redshift galaxy studies. In this section, we present the properties of EMPGs in the rest-frame UV wavelength to gain insights into the early star-formation and production of ionizing photons.

GALEX data
In order to characterize the UV properties of (E)MPGs in the local universe, we utilize the FUV (λ eff ∼ 1540Å) and NUV (λ eff ∼ 2320Å) photometric data taken by GALEX (Morrissey et al. 2007). The data are collected from the Mikulski Archive for Space Telescopes (MAST) portal 3 . For each of the objects in our compiled sample ( §2.1) we search for the deepest NUV and FUV imaging data that are available at the spatial position of the object. Eight objects are not covered with any pointings of GALEX. For the remaining 177 objects, we carefully check the downloaded GALEX images to remove sources that are highly contaminated by the neighboring objects due to the low image resolutions (FWHM ∼ 4 ′′ in FUV and ∼ 5 ′′ in NUV). This procedure is particularly important for the nearby stellar clumps/Hii regions where multiple clumps are found in a galaxy, as well as EMPGs that are associated with a bright extended tail (see e.g., Isobe et al. 2020). By comparing with the higher-resolution optical images, we label 62 objects, all of which are nearby stellar clumps, as blended in the GALEX images. The 70 (= 8 + 62) objects are not used in the analyses of UV properties (but used in the analyses of emission line ratios; §3).
The GALEX imaging observations from five types of surveys are used for the 115 objects. These surveys are the All-sky Imaging Survey (AIS), the Medium Imaging Survey (MIS), the Deep Imaging Survey (DIS), the Guest Investigators Survey (GII), and the Nearby Galaxy Survey (NGS). The numbers of sources and the typical depths for each surveys and for each bands are given in Table 5. We use the SExtractor software (Bertin & Arnouts 1996) to perform source detection and photometry. We identify sources with five adjoining pixels and brightness above > 2σ of the background, and then cross-match with each of our sources to find a counterpart in the FUV/NUV images within 5 ′′ (≃ FWHM) from the source position. We adopt MAG_AUTO for the total magnitude if the cross-matched object is brighter than the 3σ limiting magnitude. We find a high detection rate (> 3σ) in the GALEX images (103/115 in FUV and 111/115 in NUV) for the compiled (E)MPGs. The magnitudes are corrected for Galactic extinction in the same way as detailed in §2.1.2. Figure 11 presents the distributions of GALEX FUV and NUV photometry for our sources (N o = 115) as a function of metallicity.
In the GALEX FUV and NUV photometric bands, the UV emission lines such as Civλ1549, Heiiλ1640, and Ciii]λ1909 stay and can contribute to the observed GALEX photometry. Nevertheless, the UV emission lines would not have a significant impact on the photometry and the resulting UV properties below. Even the strongest UV emission lines present the maximum EWs as large as ∼ 20Å for Ciii] and ∼ 10Å for Civ for starformation dominated systems (Nakajima et al. 2018b). Even with such extreme EWs, the photometry would be boosted by only a negligibly small amount ( 0.03 mag for an average brightness galaxy (∼ 19.3 mag) in the sample at z = 0.03). We therefore do not correct for any possible contribution of the UV emission lines to the observed GALEX photometry.

UV absolute magnitude M UV
We start with deriving a key fundamental property of UV absolute magnitude, M UV for the compiled metalpoor objects. Here the absolute UV magnitudes are derived from the GALEX FUV band photometry, which probes the rest-frame ∼ 1500Å emission, in addition to the luminosity distance. We do not take into account any k-correction for the M UV estimations as the sample is almost built at the similar redshift of z < 0.05 (Figure 2b). Moreover, the errors in the luminosity distance (or redshift) are not available for the compiled sample, Figure 11. The distributions of GALEX FUV-(left) and NUV-band (right) photometry for the compiled EMPGs and MPGs as a function of metallicity. The 3σ upper-limits are adopted on the photometry for the sources without a significant detection. Figure 12. UV absolute magnitude MUV as a function of metallicity (left) and EW(Hβ) (right), and its distribution (at the most right end; the undetected sources are not included in the histogram) for the compiled (E)MPGs in this paper. and not included in the M UV calculation. A correction for dust reddening is applied according to the degree of attenuation for nebular emission. We utilize the Balmer decrements and the Calzetti et al. (2000)'s attenuation curve to obtain E(B−V) for the nebular emission, divide it by 0.44 for stellar emission (Calzetti et al. 2000), and correct for the reddening of the FUV stellar light. The corrections are generally small for the metal-poor objects studied in this paper, and our results are not significantly affected by the choice of the attenuation curve and the relationship of E(B−V) between stellar and nebular emission. Figure 12 shows the distribution of M UV for the compiled objects as functions of metallicity and EW(Hβ). The compiled sample extraordinarily reaches the faintest UV magnitude of M UV ∼ −9.

UV slope β
Our next interest is to determine the rest-frame UV continuum slope β (f λ ∝ λ β ) for EMPGs and examine the distribution of β toward the lowest metallicity and faintest UV luminosity. We estimate the UV continuum slope β using two wavelength photometric points of GALEX FUV and NUV band. We note again that the k-correction would be negligible in our β measurements ( §4.2). The wavelength range of ∼ 1500-2300Å is consistent with those used for β measurements in higher- Figure 13. Relationships of UV slope β as a function of MUV (top), EW(Hβ) (middle), and metallicity (bottom). In the left panels, our compiled EMPGs (red small circles) and MPGs (blue small circles) are presented on an individual basis, while the binned average relationships for EMPGs (red open circles) and MPGs (blue open circles) are presented with larger symbols in the right panels using the individual data points shown in the left-hand panels. The errorbar represents the standard deviation of each of the binned distribution. The points enclosed with a green circle are z ≃ 0.3 green pea galaxies whose fesc is directly measured (Izotov et al. 2016a,b). The other large open symbols denote average relationships at higher-redshifts of z ∼ 2.5 (triangles), 4 (squares), 5 (pentagons), and 6 (hexagons) (Bouwens et al. 2009(Bouwens et al. , 2014). The open star shows a stacked result of faint LAEs at z = 4 − 5 (Maseda et al. 2020). Like the Maseda et al. (2020)'s data-point, the orange-markers represent galaxies whose Lyα is observed to be strong (LAEs with EW(Lyα) > 20Å; see also Izotov et al. 2020). For high-redshift galaxies whose EW(Hβ) is unknown, we substitute EW(Hα) using the empirical conversion of EW(Hα)/EW(Hβ) = 5.47 ). redshift galaxies (e.g., Bouwens et al. 2009Bouwens et al. , 2012. For each object we fit the FUV and NUV magnitudes with a power-law function at the measured redshift and determine β. We also repeat the fitting to randomly fluctuated photometry following the errors as Gaussian distributions and obtain the 1σ uncertainty of β which encompasses 68 percent of β values drawn from the procedure. Figure 13 presents how the UV slopes of (E)MPGs are distributed as a function of absolute UV magnitude, EW(Hβ), and metallicity. The left panels show the individual objects, while the right panels present an average value of β for a bin of the abscissa quantity, with red symbols for EMPGs and blue for MPGs. The relationship between β and M UV (top panels) reveals that EMPGs overall present a small value of β from ∼ −1.9 to ∼ −2.3 in the range of M UV = −18 to −10 on average, mostly independent of the UV luminosity. The reference sample of MPGs in the local universe covers a slightly bright but still faint range of M UV = −20 to −12, showing a similarly flat relationship of β as a function of M UV and showing a similarly small value of β from ∼ −1.8 to ∼ −2.0. The difference between the EMPG and MPG samples is therefore not significant on the β vs. M UV plane. At higher-redshifts (z 3), using continuum-selected galaxies such as Lyman break galaxies (LBGs), earlier studies have indicated a decreasing trend of β toward a fainter M UV on average in the bright end of M UV = −22 to −17 (e.g., Bouwens et al. 2009Bouwens et al. , 2014. On the other hand, some studies find a rather flat trend of β ∼ −2 irrespective of M UV (Finkelstein et al. 2012;Dunlop et al. 2013) as similarly seen in the local (E)MPG sample. The relationship between β and M UV remains controversial at high-redshift, and will be revisited below.
The middle panels of Figure 13 show the distribution of β as a function of EW(Hβ). We are interested in EW(Hβ) to use it as a proxy for stellar age as it corresponds to the current star-formation activity divided by the stellar mass (i.e., specific star-formation rate; sSFR). We find a very weak trend of decreasing β toward a larger EW(Hβ) in the EMPG sample. Even the largest EW(Hβ) objects (> 200Å) present a large scatter of β. At first sight, it is weird to find such a weak correlation because a dust content is thought to be tightly linked with the past star-formation history and hence the stellar age. We speculate this would be due to the fact that EW(Hβ) is a measure of the stellar age based on the current star-formation activity. Past star-formation episodes would play a role in dust production even for the largest EW(Hβ) galaxies that are experiencing another phase of active star-formation. Ac-cordingly, EW(Hβ) would not be the primary governing the UV continuum slope especially in the local universe.
At the bottom, the relationship between β and metallicity is presented. We clearly see a trend such that the UV continuum slope gets smaller at a lower metallicity, particularly at 12 + log(O/H) 7.3. At the lowest metallicity, the β value reaches β ∼ −2.6, which is as small as the intrinsic slope β 0 theoretically predicted (Reddy et al. 2018b). This supports that EMPGs, especially with 12 + log(O/H) ≃ 7.0 or less metallicity, are almost dust-free systems as the productions and abundances of dust and metals are closely related to each other, both following the past star-formation history. A relatively large scatter of metallicity for a given M UV and EW(Hβ) (Figures 2a(top) and 12(right)) would weaken the relationship and result in a flat trend of β as seen in the top and middle panels. On the other hand, the high-redshift samples are usually continuumselected and hence magnitude-limited. Following a relatively tight luminosity-metallicity relationship and its redshift evolution (e.g., Zahid et al. 2011), fainter and higher-redshift galaxies tend to be metal-poorer. The negative relationship between β and M UV observed in high-redshift UV-selected LBGs and its possible redshift evolution toward smaller β as seen in the top panels would thus be caused by such a luminosity-metallicity relation.

Ionizing photon production efficiency ξ ion
Another key UV property is the ionizing photon production efficiency, ξ ion . This quantity represents the number of hydrogen-ionizing photons per UV luminosity; where Q H 0 is the ionizing photon production rate below 912Å, and L UV is the intrinsic UV continuum luminosity typically at around 1500Å. The number of ionizing photons, Q H 0 , can be determined via hydrogen recombination lines such as Hα and Hβ (e.g Leitherer & Heckman 1995), and the UV luminosity, L UV , is now derived from the GALEX FUV band photometry for the nearby galaxies (as M UV estimated; see §4.2). Note that the conversion of Q H 0 assumes no escaping ionizing photons, i.e., all are converted into recombination radiation unless otherwise specified. We will revisit the assumption later. A crucial step is to match the apertures for the measurements of hydrogen recombination lines and FUV band photometry. This is complicated by the spectroscopic measurements which were taken heterogeneously. We then use the total Hα flux estimated from the ex-  Maseda et al. 2020, at z = 4.9 (hexagon;Harikane et al. 2018), and at z = 6 − 7 (triangle; Matthee et al. 2017). We note that the ξion is corrected for the escaping ionizing photons for the sources with an fesc-measurement (green-enclosed; (Izotov et al. 2016a,b;Schaerer et al. 2016;Izotov et al. 2018a;Schaerer et al. 2018;Nakajima et al. 2020)). Two individual galaxies at z = 7 − 8 are also presented with magenta circles whose Lyα and UV emission lines are detected (Stark et al. 2015b. In the middle panels, an average EW(Hβ)-ξion relationship seen in extreme emission line galaxies at z = 1.3−2.4 is illustrated with a gray dashed line . Our best-fit relationship of Equation (3) is presented with a green solid line, confirming the previously known relationship but over a wider range of EW(Hβ) of ∼ 10 − 600Å. cess in the broadband photometry, which are measured in a consistent way as the total magnitude of GALEX FUV. As illustrated in Figure 12, our EMPG sample present an intense hydrogen recombination lines of EW(Hβ) (∼ 10-600Å at the rest-frame, ∼ 130Å median). Given the typical EW ratio of EW(Hα)/EW(Hβ) = 5.47 , Hα EWs are supposed to be very large on average for the EMPGs (∼ 700Å) and boost the photometry of the broadband where the Hα is redshifted and falls in. We can estimate the strength of Hα by evaluating the excess in the photometric broadband SED. We retrieve the broadband photometric data from the archives of the SDSS Data Release 16 and the HSC-SSP S20A Data Release. We use the HSC riz band photometry for those in the HSC-SSP footprint. Otherwise we use the SDSS riz band photometry. Fifteen sources, most of which are in the MPG sample from Guseva et al. (2007), are not cross-matched with anything on the two catalogs, and thus not used in the following ξ ion analyses. We adopt the photometry of cmodel (HSC) and ModelMag (SDSS) which delivers the total magnitude for each of the broadbands. At z 0.055 with HSC, the r-band captures the Hα line, and the i-and z-bands are used for the continuum estimation. At a higher-redshift (up to the highest redshift of z = 0.13 in our sample), we use the i-band to probe the Hα (+continuum) and the z-band for the continuum. We use a BPASS SED with a stellar metallicity of Z ∼ 0.07 Z ⊙ , constant star-formation, and with an age of 50 Myr to assume the shape of the continuum around iz-bands. A different demarcation redshift of 0.040 is adopted for the SDSS photometry. For each of the objects we use the systemic redshift determined from the optical spectroscopy as well as the actual transmission curves of the broadbands to translate the broadband excess into the flux of Hα. We ignore the small contributions ( 10 %) of the weaker emission lines such as [N ii]λ6584 and [S ii]λλ6717, 6731 for the EMPG sample. The flux of Hα is then reddening-corrected with the E(B−V) value estimated from the Balmer decrements of the spectroscopy data with the Calzetti et al. (2000)'s attenuation law. Finally, as a sanity check, we compare the dust-corrected fluxes of Hα derived from the above method (i.e., the broadband excess) and spectroscopy. If the spectroscopic measure of Hα is not available, we substitute the dust-corrected Hβ multiplied by the intrinsic Hα/Hβ ratio (2.86 in the Case B recombination) to be compared with the Hα strength from the broadband excess. The difference corresponds to the aperture correction for the slit-or fiber-loss in the spectroscopic observation. The difference typically varies from 1.0 to 2.2 (1.4 median), which are reasonable as aperture corrections for EMPGs (e.g., Izotov et al. 2011). Figure 14 provides the distribution of ξ ion as a function of various key properties of M UV , EW(Hβ), and metallicity. We have now explored the ξ ion for the EMPGs over the wide ranges of properties, down to M UV ∼ −10, EW(Hβ) ∼ 10-600Å, and 12 + log(O/H) down to ∼ 6.9. At the faint end of the UV luminosity M UV −15, a large variation of ξ ion is interestingly recognized especially for the EMPG sample (top panels of Figure 14). This is in contrast to the previous studies at high-redshift where an almost flat, or a weakly increasing trend of ξ ion is suggested with UV luminosity decreasing in the range of M UV from −22 to ∼ −19 (e.g., Bouwens et al. 2016;Shivaei et al. 2018;Nakajima et al. 2018a). A stacking of faint, strong Lyα emitters at z = 4 − 5 infers a very large value of log ξ ion of 26.3 albeit with a large uncertainty, and supports an elevated ionizing photon production efficiency in faint galaxies (Maseda et al. 2020). LAEs are generally thought to be more efficient producers of ionizing photons at a given UV luminosity compared to continuum-selected LBGs (Trainor et al. 2016;Nakajima et al. 2016;Matthee et al. 2017;Harikane et al. 2018;Nakajima et al. 2018aNakajima et al. , 2020. The EMPGs examined in this study do not apparently follow the trend. Although the typical ξ ion matches with those reported in high-redshift galaxies at M UV ∼ −19, it rather decreases with M UV with a tendency that MPGs have a more or less larger ξ ion value than EMPGs for a given UV luminosity. The bottom panels of Figure 14 clarify the metallicity dependence of ξ ion . A large variation remains at the lowest metallicity of 12 + log(O/H) ∼ 7.0 whose ξ ion varies from log ξ ion ∼ 24.0 to 26.0. Faint UV luminosity and/or low metallicity in gasphase would not be the primary properties that govern the efficient production of ionizing photons. We will turn back to these puzzling trends later.
On the other hand, the middle panels of Figure 14 illustrate that EW(Hβ) is well-correlated with ξ ion in a positive manner with a relatively small uncertainty. Both the EMPG and the MPG samples fall on the same relationship, with the following equation form: log ξ ion = 23.53 + 0.92 × log EW(Hβ). ( Such a tight correlation has been suggested in earlier studies in the local universe (Chevallard et al. 2018) 4 , at z ∼ 1 − 2 , at z ∼ 3 (Nakajima et al. 2020), and even at z ∼ 4 − 5 (Harikane et al. 2018;Lam et al. 2019). This work confirms the same relationship for metal-poor galaxies, and also suggests it holds over the wider range of EW(Hβ) than previously explored, down to ∼ 10Å and up to ∼ 600Å. The stacked faint LAEs at z ∼ 4 − 5 interestingly appears to fall above the relationship (Maseda et al. 2020), although the large uncertainty remains both in ξ ion and EW(Hα).
Following the apparently universal trend as a function of EWs of Hydrogen Balmer lines, the production efficiency of ionizing photons would be mainly governed by the stellar age of the current star-formation. This is reasonable as it probes the relative abundance of the youngest, most massive stars (age of 10 Myr) to that of less massive, long-lived stars that can contribute to the nonionizing UV radiation (age of 100 Myr). Theoretically a metal-poorer stellar population would present a harder ionizing spectrum due to several effects such as metal blanketing, rotational hardening, and binary evolution (e.g. Levesque et al. 2012;Kewley et al. 2013;Eldridge et al. 2017). In the current sample, however, we do not see any significant secondary dependence of metallicity on the relationship, albeit with a caveat that the metallicity we refer to is the gas-phase oxygen abundance while the stellar metallicity, especially the iron abundance, controls the hardness of ionizing spectrum (e.g., Steidel et al. 2016;Cullen et al. 2019). We need such metal absorption studies for EMPGs to discuss the stellar metallicity dependences. The lack of metallicity dependence on ξ ion in our sample can be partly because we explore only the lowest metallicity range below the sub-solar value. Indeed, a weak-but-decreasing trend of ξ ion with metallicity is seen in the z = 2 − 3 sample on average in the high metallicity range of 12 + log(O/H) > 8.2 (Shivaei et al. 2018), although this may be a result of the correlation between EW(Hβ) and metallicity in the MOSDEF sample (Reddy et al. 2018a). Moreover, the assumption of zero escape fraction may not be realistic for extremely faint, metal-poor systems (see below).
We now revisit the puzzling trends found in the M UV vs. ξ ion plot following the strong dependence of ξ ion on EW(Hβ). At the bright end of Figure  14, the relatively large ξ ion values reported on average in high-redshift galaxies are primarily due to a large EW in galaxies typically found at high-redshift (Khostovan et al. 2016;Faisst et al. 2016;Reddy et al. 2018a). Among these high-redshift galaxies, less massive galaxies with a fainter M UV tend to be younger and present a larger EW, and hence have a harder ξ ion to set up the weakly increasing trend. In particular, LAEs with a stronger EW(Lyα) show a larger EW(Hβ), and are associated with a harder ionizing spectrum as marked with orange symbols in Figure 14. (Nakajima et al. 2016;Matthee et al. 2017;Harikane et al. 2018;Nakajima et al. 2018aNakajima et al. , 2020Maseda et al. 2020). At the fainter end of M UV ( 15) in the current plot, half of the EMPGs in the faint M UV range have a low value of EW(Hβ) below ∼ 80Å ( Figure  12) and thus lower the typical value of ξ ion in the current compilation. The MPGs, which are compiled in a less complete manner ( §2.1.3) show a biased distribution of EW(Hβ) toward larger values ( Figure 2a) and thus tend to have a larger ξ ion than the EMPGs for a given M UV . Finally, we note again that the current ξ ion measurements for the compiled galaxies assume no escaping ionizing photons. In Figure 14, we plot the intrinsic ξ ion values only for some green pea galaxies at z ≃ 0.3 and LAEs at z ≃ 3 whose escape fraction of ionizing photons, f esc , is directly measured (f esc ∼ 0.05 to ∼ 0.5; Izotov et al. 2016a,b;Schaerer et al. 2016;Izotov et al. 2018a;Schaerer et al. 2018;Fletcher et al. 2019;Nakajima et al. 2020). The intrinsic values are greater than the observed ones by a factor of 1/(1−f esc ), which ranges from ∼ 1.05 to as large as 2 in the above cases. In this sense, the ξ ion values currently estimated for the remaining (E)MPGs serve as lower-limits, to be precise. This could be part of another reason why the ξ ion of the (E)MPGs appear not so large at the extremely faint and low-metallicity regimes. Indeed, galaxies with a bluer UV continuum slope, as blue as the intrinsic slope β 0 , tend to be more associated with a LyC leakage (e.g., Zackrisson et al. 2013Zackrisson et al. , 2017. Following the trend between β and metallicity (bottom panels in Figure 13), EMPGs are thought to have a higher chance to present a non-zero f esc . On the other hand, the f esc uncertainty would not break the relationship between ξ ion and EW(Hβ) because EW(Hβ) is observed to become small by the same factor of 1/(1 − f esc ) as ξ ion (see also the slope of Equation (3) is almost unity). Although direct LyC observations for the local (E)MPGs are challenging because they are too near-by (z 0.05) to be observed with HST/COS in the wavelength below the Lyman limit, indirect methods such as using the Lyα's spectral profile and the flux at the systemic velocity (e.g., Verhamme et al. 2017;Vanzella et al. 2019;Naidu et al. 2021) could help understand the local (E)MPGs' intrinsic nature of ionizing photon production/escape and its dependence on the galaxy's properties.

SUMMARY
We investigate the optical-line metallicity indicators together with the fundamental ultra-violet (UV) properties of low-mass, extremely metal-poor galaxies (EMPGs) to provide useful anchors for forthcoming spectroscopic studies in the early universe. We make use of the EMPRESS sample ) and perform a follow-up spectroscopic observation to enlarge the EMPG sample. Compiling previously known metalpoor galaxies from the literature, we build a large sample of EMPGs (N o = 103) covering a large parameter space of magnitude (M i from −19 to −7 and M UV from −20 to −9) and Hβ equivalent width (EW(Hβ) from 10 to 600Å), i.e., wide ranges of stellar mass and starformation rate. Our main results regarding the metallicity indicators are summarized as follows.
1. Utilizing the largest EMPG sample as well as the stacked spectra of 120, 000 SDSS galaxies (Curti et al. 2017(Curti et al. , 2020, we derive the relationships between strong optical line ratios and gasphase metallicity over the range of 12 + log(O/H) = 6.9 to 8.9 corresponding to 0.02 to 2 solar metallicity Z ⊙ fully based on the reliable metallicity measurements of the direct T e method.

We confirm that R23-index, ([O iii]+[O ii])/Hβ,
shows the smallest scatter in the relation with the metallicity measurements (∆ log (O/H) = 0.14) over the full metallicity range, suggesting that R23-index is most reliable among various metallicity indicators over the wide range of metallicity. Unlike R23-index, the other metallicity indicators do not use a sum of singly and doubly ionized lines and cannot trace both low and high ionization gas. A caveat is an R23-based metallicity becomes less accurate around 12 + log(O/H) ∼ 8.0 ± 0.2 as compared to the lower and higher metallicity ranges due to the two-branch nature.
3. We find that the accuracy of the metallicity indicators, including the famous ones such as R3-index and N2-index, becomes significantly improved (by a factor of as large as 2), if one uses Hβ equivalent width measurements that tightly correlate with ISM ionization states. This application is supported by our CLOUDY photoionization modeling, and suggested to work irrespective of redshift. We argue that it is crucial to correct for the ISM ionization condition in estimating the metallicity if the strong line methods, especially when using only low-or high-ionization lines are used. Such a correction would be of particular importance when discussing the mass-metallicity relation, its dependence on star-formation activity, and its cosmic evolution.
Moreover, the analysis of GALEX FUV and NUV band photometry for the EMPGs allows us to characterize the UV properties of UV absolute magnitude M UV , UV continuum slope β, and ionizing photon production efficiency ξ ion for the extreme population of EMPGs. The main findings are as follows.
4. We identify the UV slope β is best-correlated with metallicity below 12+log(O/H) 7.4. The most metal-deficient galaxies with 12 + log(O/H) ∼ 7 on average show the lowest β value almost close to the intrinsic UV continuum slope β 0 = −2.6. The negative correlation between β and M UV known in high-redshift UV-selected galaxies would be caused by a luminosity-metallicity relation. 5. We confirm the ionizing photon production efficiency ξ ion is best-correlated with EWs of Hydrogen Balmer lines of Hα and Hβ over a wide range from EW(Hβ) = 10Å to 600Å. A large variation of ξ ion is recognized even for galaxies with the faintest UV luminosity (M UV −15) and the lowest metallicity (12 + log(O/H) 7.69). The variations of ξ ion as well as EW(Hβ) for a given metallicity worsen the accuracy of the metallicity diagnostics as discussed above.
The metallicity-sensitive emission line ratios and the UV properties for the compiled 103 EMPGs are publicly available in the form of ASCII table as partly listed in Table 6. Note- Table 6 is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.