Titans metal-poor reference stars II. Red giants and CEMP stars

Representative samples of F-, G-, and K-type stars located outside the Solar Neighbourhood have started to become available in spectroscopic surveys. The fraction of metal-poor ([Fe/H]~$\lesssim -0.8$~dex) giants becomes increasingly relevant at large distances. In metal-poor stars, effective temperatures ($T_{\mathrm{eff}}$) based on LTE spectroscopy and on older colour-$T_{\mathrm{eff}}$ relations that are still widely used have been reported to be inaccurate. It is necessary to re-calibrate the chemical abundances based on these $T_{\mathrm{eff}}$ scales in the multiple available surveys, to bring them to the same standard scale for their simultaneous use. For that, a complete sample of standards is required, which so far is restricted to a few stars with quasi-direct $T_{\mathrm{eff}}$ measurements. We aim at providing a legacy sample of metal-poor standards with proven accurate atmospheric parameters. We add 47 giants to the sample of metal-poor dwarfs of Giribaldi et al. (2021), thereby constituting the Titans metal-poor reference stars. $T_{\mathrm{eff}}$ was derived by 3D non-LTE H$\alpha$ modelling, whose accuracy was tested against interferometry and the InfraRed Flux Method (IRFM). Surface gravity (log $g$) was derived by fitting Mg~I~b triplet lines, whose accuracy was tested against asteroseismology. Metallicity was derived using Fe II lines and was verified to be identical to the [Fe/H] derived from non-LTE spectral synthesis. $T_{\mathrm{eff}}$ from 3D non-LTE H$\alpha$ is equivalent to interferometric and IRFM temperatures within a $\pm$46~K uncertainty. We achieved a precision of $\sim$50~K for the 34 stars with the highest-S/N spectra. For log $g$, we achieved a total uncertainty of $\pm$0.15~dex. For [Fe/H], we obtained a total uncertainty of $\pm$0.09~dex. We find that the ionization equilibrium of Fe lines under LTE is not valid in metal-poor giants.


Introduction
Spectroscopic surveys are nowadays the main data sources to study the formation and evolution of the Milky Way via kinematic, dynamic, and chemical abundance analyses when combined with Gaia astrometric data (Gaia Collaboration et al. 2016, 2022). Gaia-ESO (Gilmore et al. 2012; Randich et al. 2013; Gilmore et al. 2022; Randich et al. 2022), Galactic Archaeology with HERMES (GALAH, De Silva et al. 2015; Buder et al. 2021), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST, Cui et al. 2012), the Apache Point Observatory Galactic Evolution Experiment (APOGEE, Majewski et al. 2017; Jönsson et al. 2020), and the Radial Velocity Experiment (RAVE, Steinmetz et al. 2020) are projects currently providing stellar parameters and updating them in subsequent catalogue releases. Supported by Gaia distances, they have surveyed the Galactic plane from nearly the bulge centre to about 20 kpc away, within about 10 kpc height off the plane. Future projects such as the 4-metre Multi-Object Spectroscopic Telescope (4MOST, de Jong et al. 2019) and WEAVE (Dalton 2016; Dalton et al. 2016) are expected to register spectra of many distant stars, so that studies will be performed with representative samples outside the Solar Neighbourhood. This will naturally cut off distant dwarf stars, which are relatively faint, whereas red giants will be the main, if not the only, objects from which reliable abundances associated with a certain point in the timeline of the Galaxy's evolution can be obtained, provided that their ages can be estimated (e.g. Valentini et al. 2019; Montalbán et al. 2021). Maximising the return on the efforts made to implement spectroscopic surveys depends on our ability to retrieve accurate stellar parameters, abundances, and ages (e.g. Jofré et al. 2019).
⋆ Based on observations collected at the MERCATOR telescope installed at Roque de los Muchachos Observatory and ESO archival data.
Generally, it is assumed that main-sequence dwarfs and red giants share the same abundance scale. However, it has been shown that, under local thermodynamic equilibrium (LTE), the metallicity ([Fe/H]) scale of giants is consistent with that of dwarfs only with meticulously adapted line lists, and when their effective temperature ($T_{\mathrm{eff}}$) and surface gravity (log $g$) are not constrained by assuming excitation and ionization equilibrium, but via independent methods (Dutra-Ferreira et al. 2016). Further, it has been observed that at low metallicity, diffusion and mixing phenomena may induce substantial variations in Fe and other elements between the turnoff and the red giant branch (RGB) (Korn et al. 2007; Lind et al. 2008; Nordlander et al. 2012; Gruyters et al. 2014). Moreover, tests have shown that classical spectroscopic methods may produce highly discrepant abundance outcomes for red giants (e.g. Lebzelter et al. 2012; Jofré et al. 2017; Casali et al. 2020).
It is therefore reasonable to expect that scale incompatibilities, that is to say, inaccuracies, also arise for elements other than Fe when abundances are based on inaccurate $T_{\mathrm{eff}}$ and log $g$. For example, Giribaldi & Smiljanic (2023b) found an offset of −0.07 dex in the [Mg/Fe] ratios of metal-poor dwarfs induced by temperatures underestimated by ∼150 K. Here we show that in metal-poor red giants, the [Mg/Fe] offset can be exacerbated to about −0.1 dex by an underestimation of only 100 K, which is the typical $T_{\mathrm{eff}}$ offset of the literature values we compiled.
In addition to these difficulties, we must also consider that observational line profiles are more accurately reproduced by three-dimensional (3D) hydrodynamic model atmospheres that take non-LTE effects into account (e.g. Bergemann et al. 2017a; Amarsi et al. 2018; Gallagher et al. 2020; Wang et al. 2021; Amarsi et al. 2022), which inherently lead to more accurate abundance determinations. However, 3D non-LTE abundance determination is not yet feasible for the large number of survey spectra; thus, 1D LTE abundances converted to the 3D non-LTE scale using correction grids will be the most efficient option for some years to come (e.g. Amarsi et al. 2019, 2022; Wang et al. 2021). Therefore, for a proper application of 3D non-LTE corrections, 1D LTE abundances free from potential $T_{\mathrm{eff}}$, log $g$, and [Fe/H] systematics are paramount. These corrections have already shown how strong their impact is on the chemodynamic analyses employed to identify stellar populations of the primitive Milky Way and to analyse their evolution (e.g. Bergemann et al. 2017b; Amarsi et al. 2019; Giribaldi et al. 2019b).
Standard stars have historically been tracked and studied to calibrate acquisition instruments and astrophysical models. For instance, most photometric systems adopted Vega as their spectrophotometric standard. Its brightness and good observability for northern telescopes supported this choice. However, its rapid rotation and the presence of a debris disc around it produce a spectrum that requires different sets of atmospheric parameters to be reproduced in the infrared and in the visible (e.g. Casagrande et al. 2006; Gray 2007). This led to small but significant corrections in the zero-points of several photometric systems (Bohlin 2007; Maíz Apellániz 2007), so that widely used colour-dependent relations to infer the $T_{\mathrm{eff}}$ of F-, G-, and K-type stars were shown to be substantially biased towards cooler determinations (Casagrande et al. 2010).
Solar twins$^{2}$ have also been widely tracked as standard stars (e.g. Cayrel de Strobel 1996; Cayrel de Strobel & Bentolila 1989; Porto de Mello & da Silva 1997; Porto de Mello et al. 2014; Meléndez & Ramírez 2007; Giribaldi et al. 2019a; Yana Galarza et al. 2021). Among several scientific purposes, solar twins are needed to infer the solar colours and magnitudes in different photometric systems (e.g. Neckel 1986; Pasquini et al. 2008; Casagrande et al. 2010, 2021), given that the Sun's proximity prevents their direct measurement. Those magnitudes, as well as twins' spectra, are required to update our knowledge of [...]
$^{2}$ Stars with atmospheric parameters matching the solar ones within uncertainties (e.g. Porto de Mello & da Silva 1997; Porto de Mello et al. 2014) or within small arbitrary differences, typically 100 K, 0.1 dex, and 0.1 dex in $T_{\mathrm{eff}}$, [Fe/H], and log $g$ (e.g. Ramírez et al. 2009).
Standard F-, G-, and K-type stars have been compiled within the so-called Gaia Benchmarks (Jofré et al. 2014; Heiter et al. 2015; Hawkins et al. 2016), mainly for the practical purpose of calibrating Gaia's parameter database, which is currently in its third data release (DR3, Gaia Collaboration et al. 2022). The atmospheric parameters $T_{\mathrm{eff}}$ and log $g$ of most of these benchmarks are highly reliable in terms of accuracy because they have been inferred quasi-directly via interferometric measurements of their angular diameter and a modified version of the Stefan-Boltzmann relation. Therefore, they are ideal objects for diagnosing the accuracy of model atmospheres and the completeness of physical models of line formation (e.g. Amarsi et al. 2016, 2018, 2022). However, despite the collective effort made by all authors who provided parameters for the Gaia Benchmarks (see references in the papers) and the use of the most sophisticated computational tools, biases in calibrated survey stellar parameters imply that more standard stars are required. Examples are the offsets present in the GALAH DR2 parameter calibrations, where the Gaia Benchmarks are the standards (Buder et al. 2018, Fig. 14), and the possible metallicity systematics reported in the GALAH internal DR2 data set (Wheeler et al. 2020, Fig. 11). This is a relevant, although not evident, problem for users of catalogued stellar parameters, as they are often interested in element abundances, which are unavoidably affected by the parameter biases. In addition, parameter biases could be transferred to new spectroscopic and photometric surveys, as they start to use previously released spectroscopic surveys as references for validation (e.g. Wheeler et al. 2020; Steinmetz et al. 2020; Andrae et al. 2022).
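To illustrate the quasi-direct route mentioned above, the textbook form of the Stefan-Boltzmann relation links the angular diameter θ and the bolometric flux received at Earth, F_bol, to the effective temperature through T_eff = (4 F_bol / σθ²)^(1/4). The following is a minimal sketch under that assumption only; the function name and the solar check values are choices of this example and are not taken from the Gaia Benchmark papers, whose modified relation may differ in detail.

```python
import math

SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)  # milliarcseconds to radians


def teff_from_angular_diameter(theta_mas, f_bol):
    """Effective temperature (K) from an angular diameter (mas) and a
    bolometric flux at Earth (W m^-2): Teff = (4 F_bol / (sigma theta^2))^(1/4)."""
    theta_rad = theta_mas * MAS_TO_RAD
    return (4.0 * f_bol / (SIGMA_SB * theta_rad ** 2)) ** 0.25


# Sanity check with the Sun seen from 1 au (illustrative values):
# angular diameter of about 1918.5 arcsec and a solar constant of 1361 W m^-2.
teff_sun = teff_from_angular_diameter(1918.5e3, 1361.0)
print(f"Teff(Sun) ~ {teff_sun:.0f} K")
```

Because the temperature depends on the fourth root of the flux and on the square root of the angular diameter, a 1% error in θ translates into only about 0.5% in T_eff, which is why this route is regarded as nearly model-independent.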
The Gaia Benchmarks presented a paucity in the metal-poor range, approximately for metallicities lower than −1 dex, which was later mitigated by Hawkins et al. (2016), who provided parameters for eleven moderately metal-poor stars (−1.5 < [Fe/H] < −1 dex), of which only two were giants. Karovicova et al. (2020) improved the parameters of two already studied Benchmarks and provided new ones for six other stars, all of them red giants. These studies provide eighteen metal-poor stars, of which only twelve have reliable parameters according to the authors themselves.
Carbon-enhanced metal-poor (CEMP) stars are characterized by a carbon overabundance, with [C/Fe] usually larger than 0.7 (e.g. Beers & Christlieb 2005). Their fraction increases with decreasing metallicity: they represent 10-30% of stars with [Fe/H] < −2 but up to 80% of stars with [Fe/H] < −4 (Lucatello et al. 2006; Placco et al. 2014; Yoon et al. 2018). Those that are also enriched in heavy elements are further separated into CEMP-r, CEMP-s or CEMP-rs stars, and bear the signature of the rapid (r-), slow (s-) or potentially intermediate (i-) processes of nucleosynthesis. Their atmospheric parameters are frequently derived by traditional methods such as the excitation and ionization equilibrium of Fe lines, and sometimes by further constraining $T_{\mathrm{eff}}$ with colour-$T_{\mathrm{eff}}$ relations (e.g. Hansen et al. 2018; Karinkuzhi et al. 2021). One of the objectives of this work is to ascertain the correct $T_{\mathrm{eff}}$-log $g$-[Fe/H] scale of CEMP stars, and in particular to use spectroscopic indicators of log $g$ free from evolutionary modeling, such as the Mg I b triplet lines. Since the abundances of heavy elements are quite sensitive to stellar parameter uncertainties, it is important to constitute a sample of benchmark CEMP stars as well.
Giribaldi et al. (2021), referred to as Paper I henceforth, provided atmospheric parameters for 41 metal-poor dwarfs (48 dwarfs in total including seven Gaia Benchmarks) named the Titans I metal-poor reference stars. They substantially fill the paucity of the Gaia Benchmarks in the range −3 < [Fe/H] < −1 dex. Their parameters are not based on quasi-direct $T_{\mathrm{eff}}$, as are those of most of the Gaia Benchmarks, but on Hα Balmer profiles synthesised with 3D non-LTE models (Amarsi et al. 2018). In Paper I, the outcomes of both methods were proven compatible for F-, G-, and K-type stars over a wide range of metallicity, although metal-poor red giants remained to be tested. Here we provide accurate atmospheric parameters for 47 metal-poor red giants with metallicity values between −3.2 and −0.5 dex and log $g$ between 1 and 3.5 dex; we refer to them as Titans II henceforth. We employed the same 3D non-LTE Hα models as in Paper I, and we included tests with asteroseismic log $g$ and non-LTE [Fe/H] to scrutinise rigorously the accuracy of our atmospheric parameters.
This paper is organised as follows. Section 2 describes the acquisition and selection of our observational data. Section 3 describes the data reduction. Section 4 describes the determination of the atmospheric parameters. In Sect. 5 we scrutinise our atmospheric parameters and provide accuracy diagnostics of various widely used methods for deriving $T_{\mathrm{eff}}$, log $g$, and [Fe/H]. Section 6 describes our determination of Mg, C, N, and O abundances. Finally, in Sect. 7 we list our conclusions in such a way that a reader interested in a certain accuracy test can be immediately directed to the related section and figure.

Sample selection and observational data
CEMP giants and their spectra were selected from the list of stars analysed by Karinkuzhi et al. (2021). The spectra were acquired with the HERMES spectrograph (Raskin et al. 2011) mounted on the 1.2 m Mercator telescope at the Roque de los Muchachos Observatory on La Palma, Canary Islands. The spectrograph covers the wavelength range 3800-9000 Å at a nominal resolution R ∼ 86 000. We selected stars whose Hα line profiles show wavelength bins free of metal or molecular line contamination, as those bins are required to derive $T_{\mathrm{eff}}$ by fitting observational profiles with synthetic ones; details of this procedure are provided in Sect. 4.
To select non-CEMP giants, we initially searched the European Southern Observatory (ESO) archive, using the science portal of processed data, for UVES (Dekker et al. 2000) and HARPS (Mayor et al. 2003) spectra covering the Hα line (6562.797 Å) with a signal-to-noise ratio (S/N) higher than 200 and a resolution higher than R = 40 000. We searched the SIMBAD database to recover the Gaia parallax (ϖ) and the B and V magnitudes. We selected stars that appear to lie on the RGB in the $M_V$ vs. $B-V$ space, as Fig. A.1 shows (reddening effects were ignored). Once these candidates were pre-selected, we excluded the metal-rich stars ([Fe/H] > −0.8 dex) after cross-matching the resulting list with the PASTEL catalogue (Soubiran et al. 2016). In the same sub-sample, we included one star with an optimal-quality spectrum in the HERMES archive, HD 115444, with parameters and abundances in Westin et al. (2000), Carrera et al. (2013), and Hansen et al. (2015b). Lastly, we removed stars with evident emission close to the Hα core, since this line is used as a temperature indicator in the present work.
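The pre-selection quantities described above can be sketched as follows, assuming the standard distance-modulus relation $M_V = V + 5 + 5\log_{10}(\varpi/1000)$ with ϖ in milliarcseconds and no extinction term (reddening was ignored in this step). The function names are hypothetical, and the RGB cut itself is not coded because the text defines it only through Fig. A.1.

```python
import math


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from an apparent magnitude and a parallax in mas,
    M = m + 5 + 5 log10(parallax / 1000), with extinction ignored."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return apparent_mag + 5.0 + 5.0 * math.log10(parallax_mas / 1000.0)


def is_metal_poor(feh, limit=-0.8):
    """Keep stars that are not more metal-rich than the adopted limit (dex)."""
    return feh <= limit


# Illustrative values only: a V = 10 star at 10 mas (100 pc) has M_V = 5.
m_v = absolute_magnitude(10.0, 10.0)
keep = is_metal_poor(-1.3)
```

Inverting the parallax directly is adequate only for precise parallaxes; for stars with large fractional parallax errors a proper distance inference would be needed.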
For validation purposes, three additional star samples were considered. First, we included a sub-sample of stars with asteroseismic measurements from the Transiting Exoplanet Survey Satellite data (TESS, Ricker et al. 2014; Stassun et al. 2018) available in Hon et al. (2021), so that asteroseismic log $g$ can be derived (see Sect. 4.4). We cross-matched the TESS Input Catalogue (TIC, Stassun et al. 2018) with the PASTEL database (Soubiran et al. 2016), requiring [Fe/H] < −0.8 dex.
Third, we searched the ESO archive for stars with effective temperatures directly derived by the InfraRed Flux Method (IRFM, Blackwell & Shallis 1977; Blackwell et al. 1979, 1980) in the catalogues of Casagrande et al. (2010), Hawkins et al. (2016), and Casagrande et al. (2021), which share the same absolute photometric calibration scale.
Fig. 1 presents the map in equatorial and Galactic coordinates of the Titans benchmarks, including the Titans I sample of metal-poor dwarfs presented in Giribaldi et al. (2021). The distribution on the sky is quite homogeneous, with a small deficiency at higher declinations. All dwarfs have $B-V$ < 0.6 while all giants have $B-V$ > 0.4. The slight overlap is due to the presence of a few subgiants in the Titans I sample. The complete sample covers the G magnitude range [6, 14], and the median G magnitude is 9.6. In the Galactic map, the Titans II giants are all systematically further away and higher above the Galactic plane than the Titans I dwarfs.
In total we obtained a sample of 47 stars, whose literature parameters are listed in Table 1, where the red giants are classified according to the $T_{\mathrm{eff}}$ or log $g$ determination method: interferometry and IRFM for $T_{\mathrm{eff}}$, and asteroseismology for log $g$. CEMP stars are listed separately at the end of the table. Among all stars, four have interferometric $T_{\mathrm{eff}}$ determinations, six have IRFM $T_{\mathrm{eff}}$ determinations (one in common with interferometry), and four have asteroseismic log $g$. These stars are the standards required to assess the accuracy of the parameter determinations by the methods presented in this work and further applied to the stars labelled as "Other giants" or "CEMP giants" in Table 1. Seven stars are CEMP, whereas 28 stars are not carbon-enriched. The Kiel and HR diagrams of the stellar sample are presented in Fig. 2, where the parameters are those derived in this work (Table 2).
Fig. 2. Kiel and HR diagrams of the stellar sample. [...] 2, while those of Titans I dwarfs come from Giribaldi et al. (2021). CEMP stars are marked with a cross symbol. The Kiel diagram displays Yonsei-Yale isochrones (Kim et al. 2002; Yi et al. 2003) as reference. HR diagrams display STAREVOL evolutionary tracks (Siess & Arnould 2008) for [Fe/H] = −2 dex and different masses, as labelled (mid panel), and for a mass M = 0.9 $M_{\odot}$ and various metallicity values, as labelled (right panel). The track representing CEMP stars (blue line) has the abundance ratio C/O = 2 (details are given in Karinkuzhi et al. 2021). Tracks were constrained to maximum ages lower than the age of the universe, 13.8 Gyr. Candidates to have started the Horizontal Branch are tagged by their identifiers in Table 2.
Table 1. Preliminary parameters from the literature. The "Ref" column indicates the stellar parameter reference. The "Inst." column specifies the origin of the spectra used in the present paper: HERMES (HE), HARPS (HA), UVES (UV) or FEROS (FE). The E(B − V) column provides reddening values extracted from Capitanio et al. (2017) when available, or alternatively from Schlafly & Finkbeiner (2011). CEMP-s and -rs stand for carbon-enriched metal-poor stars enriched in either s elements or a mixture of s and r elements, following the classification in Karinkuzhi et al. (2021).
Notes (Table 2). The second column lists $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ with its internal errors, i.e. the method precision. For the total uncertainties, the estimated model accuracy error of ±48 K (as discussed in Sect. 5.1) must be added in quadrature. The third column lists alternative temperatures consistent with the $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ scale. The method by which they were derived and their corresponding sources are indicated with the following codes. IntK20: direct interferometry in Karovicova et al. (2020). Ph: IRFM-based photometric calibrations in Casagrande et al. (2021) using Gaia $B_p - R_p$ colours; the errors are those of the calibration in the corresponding source. Hk: direct IRFM in Hawkins et al. (2016). Ca10: direct IRFM in Casagrande et al. (2010). Ca21: direct IRFM in Casagrande et al. (2021).
The fourth column lists surface gravities derived from the Mg triplet lines (average of the values in Table A.1), shown to be accurate in Sect. 5.4. The fifth column lists iron abundances from neutral lines under LTE; they are shown to be biased towards low values in Sect. 5.5, thus they must be used only for spectral modeling under LTE. The sixth column lists iron abundances from ionised lines under LTE; they are shown to be equivalent to abundances derived under non-LTE in Sect. 5.6, thus they should be preferred for calibration purposes, as they are closer to the real values. The seventh column lists luminosities. The eighth column lists non-LTE corrected Mg abundances, to which −0.07 dex must be added to reproduce LTE spectra; see Sect. 6.1.
The star HD 175305, which has both interferometric and IRFM temperature measurements, is repeated for compatibility with Table 1. The symbol § indicates that lines (log (EW/λ) > −5.05) were included to derive [Fe I/H] and [Fe II/H]. [Fe I/H] values indicated with an asterisk (*) were not derived here but were extracted from Buder et al. (2021). log $g$ values in brackets are offset-corrected log $g_{\mathrm{iso}}$ values; see Sect. 5.4. log $g$ values accompanied by the symbol (⋆) were derived from asteroseismic measurements in Sect. 4.4.
Article number, page 6 of 27 R. E. Giribaldi et al.: Titans metal-poor reference stars II.

Data reduction
High-resolution spectra were acquired with the HERMES, FEROS, HARPS or UVES instruments, as indicated in the seventh column of Table 1. For HERMES, the Doppler correction was performed by cross-correlating the stellar spectra with a mask covering the wavelength range 4800-6500 Å and mimicking the spectrum of Arcturus (K1.5 III). The wavelength span is restricted to avoid both the telluric lines at the red end and the crowded and poorly exposed blue end of the spectra. The HERMES spectra were reduced with an automated pipeline, which merges the different orders and corrects for the blaze function of the echelle grating as well as for the Earth's motion around the solar-system barycentre.
UVES, FEROS, and HARPS 1D spectra were Doppler corrected to the rest-frame wavelength scale using iSpec (Blanco-Cuaresma et al. 2014); see details in Paper I. It was not possible to correct the most metal-poor spectra with this tool because they contain very few of the metal lines that the algorithm uses to cross-correlate with the template. They were thus corrected using the IRAF tasks fxcor and dopcor, taking as template a spectrum of BPS CS 31082−001 corrected with iSpec. All spectra were globally normalised by spline polynomials with iSpec. More precise local re-normalizations around the lines of interest were applied later, during the element abundance determination process.
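Once a radial velocity has been measured by cross-correlation, the shift to the rest frame is a simple rescaling of the wavelength axis. The snippet below is a minimal sketch of that last step, assuming the non-relativistic Doppler formula; it does not reproduce the iSpec or IRAF implementations, and the function name is hypothetical.

```python
C_KMS = 299792.458  # speed of light in km/s


def to_rest_frame(wavelengths, rv_kms):
    """Shift observed wavelengths to the rest frame for a radial velocity
    rv_kms (positive = receding), using lambda_rest = lambda_obs / (1 + v/c)."""
    factor = 1.0 + rv_kms / C_KMS
    return [w / factor for w in wavelengths]


# Illustrative check: H-alpha observed in a star receding at 100 km/s.
halpha_rest = 6562.797
observed = [halpha_rest * (1.0 + 100.0 / C_KMS)]
corrected = to_rest_frame(observed, 100.0)
```

At 100 km/s the Hα line is displaced by about 2.2 Å, which is comparable to the width of the temperature-sensitive wing regions, so the correction has to be in place before any profile fitting.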

Atmospheric parameters
The atmospheric parameters $T_{\mathrm{eff}}$, [Fe/H], and log $g$ were derived by the method described in Paper I. It consists of iterative loops with the following steps: derivation of $T_{\mathrm{eff}}$ through Hα profile fitting (hereafter $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$); [Fe/H] determination through spectral synthesis; and determination of log $g$ through isochrone fitting (hereafter log $g_{\mathrm{iso}}$). Each parameter is derived while keeping the other ones fixed, and the procedure is iterated until the $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ variation does not exceed its fitting error.
Hα is virtually insensitive to typical log $g$ and [Fe/H] errors (see Sect. 4.1); therefore, we used log $g$ and [Fe/H] values from the literature as priors in the first loop to obtain the first $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ guess. This first $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ can be very different from the temperature values in the literature (see Sect. 5.3) because customary methods are prone to biases due to their high parameter interdependence. Therefore, log $g$ and [Fe/H] in the first loop are likely to vary considerably with respect to the literature values. The temperatures constrained in subsequent loops vary little, by no more than a few tens of kelvin; thus, the loops are basically run to fine-tune the parameters. In Sect. 5.4 we verify that log $g$ from the Mg I b triplet lines is more reliable than the isochrone fitting outcomes; for this reason, we run a final loop using the former method instead of the latter. Table 2 lists our final parameters, the determination of which is fully described in this section, whereas their scrutiny is performed in Sect. 5.

Effective temperature
We described in detail the method to derive $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ in Paper I, where we also determined the accuracy of 3D non-LTE theoretical Hα profiles (Amarsi et al. 2018) in metal-poor dwarf stars. We refer the reader to Giribaldi et al. (2019b) for technical aspects of the normalization-fitting procedure of Hα line profiles. Here we employ the same method on our sample of giants.
CEMP stars challenge the applicability of the Hα fitting because they have narrow line profiles blended with CN and C$_2$ molecular features. To apply the method properly, windows free from molecular contamination and tellurics were manually identified in the neighbourhood of the Hα line for each spectrum. We synthesised Hα profiles at two temperatures separated by 200 K to assess the sensitivity of the selected windows, as displayed in Figs. A.2, A.3, A.4, A.5, and A.6. For CEMP stars, the window widths were optimised using the C, N, O, and Fe abundances derived from the atmospheric parameters in every loop. Figs. 3 and 4 show the performance of this procedure for the CEMP stars HD 76396 and HD 26, respectively. The former presents a small number of narrow fitting windows in the most sensitive wavelength ranges of its Hα profile; note the location of the wavelength regions marked in green within the pink shades in the second panel of Fig. 3. The latter presents only three narrow windows, of which only one lies in a very sensitive region; see the second panel of Fig. 4. When the windows are too narrow, a bootstrap method is applied to assess the effective temperature robustly. Starting from the literature atmospheric parameters, the effective temperature is iteratively derived until the temperature difference between the last two iterations does not exceed the estimated uncertainty. When several spectra are available for a given star, the fitting windows are selected independently for each spectrum to avoid the frequent telluric contamination and artifacts.
As done in Paper I, the uncertainty of $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ is given by the following expression:
$$\delta T_{\mathrm{eff}}^{\mathrm{H}\alpha} = \sqrt{\delta_{T_{\mathrm{eff}}-\mathrm{model}}^{2} + \delta_{T_{\mathrm{eff}}-\mathrm{fit}}^{2} + \delta_{T_{\mathrm{eff}}-\mathrm{inst}}^{2} + \delta_{T_{\mathrm{eff}}-\log g}^{2} + \delta_{T_{\mathrm{eff}}-\mathrm{[Fe/H]}}^{2}},$$
where $\delta_{T_{\mathrm{eff}}-\mathrm{model}}$ is the uncertainty of the synthetic profile, $\delta_{T_{\mathrm{eff}}-\mathrm{fit}}$ is the uncertainty of the fitting, $\delta_{T_{\mathrm{eff}}-\mathrm{inst}}$ is the uncertainty induced by an instrument residual pattern, $\delta_{T_{\mathrm{eff}}-\log g}$ is the uncertainty related to the interdependence between $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ and log $g$, and $\delta_{T_{\mathrm{eff}}-\mathrm{[Fe/H]}}$ is the uncertainty related to the interdependence between $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ and [Fe/H]. According to the analysis in Sect. 5.1, we consider that the temperature uncertainty due to errors in the model synthetic spectrum can be estimated from the dispersion of the difference between $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ and the $T_{\mathrm{eff}}$ derived from IRFM and from interferometry: it amounts to $\delta_{T_{\mathrm{eff}}-\mathrm{model}}$ = 48 K. $\delta_{T_{\mathrm{eff}}-\mathrm{fit}}$ is given by the internal uncertainty of $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$. The compatibility between observational and synthetic profiles is evaluated by a temperature histogram, whose frequencies are associated with the temperatures of the synthetic profiles that best match every wavelength pixel within the fitting windows; see Figs. A.2, A.3, A.4, A.5, and A.6, where all fits of the stars in this work are compiled. When only one spectrum is available, $\delta_{T_{\mathrm{eff}}-\mathrm{fit}}$ is given by the 1σ dispersion of a Gaussian fitted to the histogram of temperatures (see details in Sect. 3.1 of Paper I), whereas when more than one spectrum is available, $\delta_{T_{\mathrm{eff}}-\mathrm{fit}}$ equals the 1σ dispersion of the temperatures associated with the individual spectra. For $\delta_{T_{\mathrm{eff}}-\mathrm{inst}}$, we adopted the value of 33 K determined with UVES spectra in Paper I.
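The per-pixel temperature histogram can be illustrated with a toy example. The sketch below assigns to each pixel in the fitting windows the temperature at which a linearly interpolated grid of synthetic fluxes matches the observed flux, and then summarises the resulting distribution. It is a simplification under stated assumptions: the sample mean and standard deviation stand in for the Gaussian fit to the histogram, the function names are hypothetical, and the "synthetic" fluxes are invented linear functions rather than 3D non-LTE profiles.

```python
from statistics import mean, stdev


def pixel_temperature(obs_flux, synth_fluxes, grid_teffs):
    """Temperature at which the synthetic flux of one pixel matches the observed
    flux, by linear interpolation between grid models (clamped at the grid edges)."""
    pairs = sorted(zip(synth_fluxes, grid_teffs))  # sort by flux to interpolate
    if obs_flux <= pairs[0][0]:
        return pairs[0][1]
    if obs_flux >= pairs[-1][0]:
        return pairs[-1][1]
    for (f0, t0), (f1, t1) in zip(pairs, pairs[1:]):
        if f0 <= obs_flux <= f1:
            if f1 == f0:
                return 0.5 * (t0 + t1)
            return t0 + (t1 - t0) * (obs_flux - f0) / (f1 - f0)


def fit_teff(obs, synth, grid_teffs):
    """obs[j]: observed flux in pixel j; synth[i][j]: flux of grid model i in pixel j.
    Returns the mean temperature, its 1-sigma scatter, and the per-pixel values."""
    temps = [pixel_temperature(obs[j], [row[j] for row in synth], grid_teffs)
             for j in range(len(obs))]
    return mean(temps), stdev(temps), temps


# Toy grid: wing flux decreases linearly with Teff, with a pixel-dependent slope.
grid = [4800.0, 5000.0, 5200.0]
slopes = [0.05, 0.10, 0.20]
synth = [[1.0 - s * (t - 4000.0) / 1000.0 for s in slopes] for t in grid]
# "Observed" pixels scattered around 5064 K to mimic noise.
obs = [1.0 - s * (t - 4000.0) / 1000.0
       for s, t in zip(slopes, [5040.0, 5064.0, 5088.0])]
teff_fit, sigma_fit, per_pixel = fit_teff(obs, synth, grid)
```

In this toy case the recovered temperature is the mean of the three pixel values and the scatter plays the role of the fitting term; with real spectra, fitting a Gaussian to the histogram is less sensitive to outlying pixels than a plain standard deviation.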
Fig. 3. Observed spectra of the CEMP star HD 76396 compared with its modeled spectra (panel label: [...] = 5064 K). The upper and lower panels are used to ensure that a proper normalization of the observed spectrum is performed in the Hα region illustrated in the middle panel. Pink shades represent variations of up to ±200 K to provide a view of the flux sensitivity to $T_{\mathrm{eff}}$ along the wavelength range. Gray shades indicate transitions that do not appear in the observational spectrum and were eliminated from the synthesis line lists, according to the procedure described in Giribaldi & Smiljanic (2023a). The fitting windows used are highlighted in green in the residual plot in the second panel. Some atomic and molecular features are labeled in the plots.
We evaluated $\delta_{T_{\mathrm{eff}}-\log g}$ as follows. We simulated observational profiles by adding noise equivalent to S/N = 300 to our grid of synthetic 3D non-LTE Hα profiles. We derived their temperatures in the same way as for authentic observational profiles. The exercise was done by fitting profiles with [Fe/H] equal to −1, −2, and −3 dex, separately. Since similar results were obtained in each analysis, we present here that of [Fe/H] = −2 dex, to be used as a proxy for the entire metallicity range covered by the stars in the present study. The fits were done within wavelength regions with high sensitivity to $T_{\mathrm{eff}}$. For line profiles associated with $T_{\mathrm{eff}}$ < 5400 K we used the intervals [6557.8:6561.5] Å and [6564.5:6567.8] Å for the blue and red wings, respectively. For those associated with $T_{\mathrm{eff}}$ ≥ 5400 K we restricted the regions to avoid the influence of intense line cores; thus, the intervals used are [6557.8:6560.0] Å and [6565.5:6567.8] Å for the blue and red wings, respectively. Here we did not use the actual log $g$ values of the profiles as inputs for the fits, but values modified by +0.1 dex. The fits produced biased temperatures, which are mapped as a function of $T_{\mathrm{eff}}$ and log $g$ in Fig. 5. We conclude that the temperature offsets (Δ$T_{\mathrm{eff}}$) due to 0.1 dex errors in log $g$ are typically within 30 K, as illustrated by the representative case log $g$ = 2.5 dex and $T_{\mathrm{eff}}$ = 5000 K, identified with the black lines in Fig. 5. The highest offsets are present for the hottest stars. This outcome occurs because degeneracy appears approximately for $T_{\mathrm{eff}}$ > 5500 K with log $g$ < 2.5 dex, the effects of which become stronger as the $T_{\mathrm{eff}}$-log $g$ pair moves further away from the RGB evolutionary path in the Kiel diagram, that is, towards the horizontal branch. The top plot in the figure shows that for the $T_{\mathrm{eff}}$ range in this work (hotter than 4500 K), stars with low log $g$ tend to be less sensitive to potential input log $g$ biases. Namely, Δ$T_{\mathrm{eff}}$ ∼ 15 K corresponds to log $g$ < 2.5 dex, and Δ$T_{\mathrm{eff}}$ ∼ 25 K corresponds to log $g$ ≥ 2.5 dex. We use this estimate as a practical rule for $\delta_{T_{\mathrm{eff}}-\log g}$. In Sect. 5.4 we estimate that the typical uncertainty of our surface gravities is 0.15 dex; thus, the values above must be multiplied by 1.5 to obtain the corresponding errors: $\delta_{T_{\mathrm{eff}}-\log g}$ = 23 K for stars with log $g$ < 2.5 dex and $\delta_{T_{\mathrm{eff}}-\log g}$ = 38 K for stars with log $g$ ≥ 2.5 dex.
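The arithmetic behind this practical rule is a linear rescaling of the offsets measured for a 0.1 dex perturbation to the adopted 0.15 dex uncertainty. A minimal sketch follows; the function name is hypothetical, linearity of the offset with the log $g$ error is an assumption, and log $g$ = 2.5 dex is assigned to the higher-gravity group.

```python
import math


def teff_error_from_logg(logg, sigma_logg=0.15, step=0.1):
    """Teff uncertainty (K) induced by the log g uncertainty, scaling the offset
    measured for a `step` dex perturbation linearly to `sigma_logg`."""
    offset_per_step = 15.0 if logg < 2.5 else 25.0  # K per 0.1 dex, from the text
    scale = round(sigma_logg / step, 6)  # 1.5; rounding removes float noise
    return math.floor(offset_per_step * scale + 0.5)  # round half up to whole K


low_gravity = teff_error_from_logg(1.8)   # 15 K x 1.5 = 22.5, quoted as 23 K
high_gravity = teff_error_from_logg(3.0)  # 25 K x 1.5 = 37.5, quoted as 38 K
```

The explicit half-up rounding is deliberate: Python's built-in `round` rounds halves to the nearest even integer and would return 22 K for the first case.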
To evaluate $\delta_{T_{\mathrm{eff}}-\mathrm{[Fe/H]}}$ we ran the same procedure as above, but replacing log $g$ by [Fe/H]. Figure 6 shows the Δ$T_{\mathrm{eff}}$ induced by adding +0.07 dex to the actual metallicity of the simulated observational profiles. This quantity, assumed as the typical metallicity uncertainty, is obtained from the top plot in Fig. 14, where the dispersion (±0.14 dex) is attributed evenly to the [Fe I/H] and [Fe II/H] measurements. Δ$T_{\mathrm{eff}}$ in Fig. 6 is slightly shifted to negative values, typically not lower than −10 K; see, for example, the case of [Fe/H] = −2.0 dex and $T_{\mathrm{eff}}$ = 5000 K represented by the black lines. The $T_{\mathrm{eff}}$ dispersion in the plots is, however, dominated by the spectral noise, as the $\delta_{T_{\mathrm{eff}}-\mathrm{fit}}$ associated with the fits in this test is typically ±40 K. Therefore, $\delta_{T_{\mathrm{eff}}-\mathrm{[Fe/H]}}$ is hereafter neglected.
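As a worked example of the full error budget, the terms evaluated in this section can be combined under the assumption that they add in quadrature (the notes to Table 2 state this explicitly for the model term). The function name is hypothetical, and the 40 K fitting term is only an illustrative value for a single star, not a number quoted for any object in the sample.

```python
import math


def total_teff_uncertainty(fit, logg_term, model=48.0, inst=33.0, feh_term=0.0):
    """Quadrature sum (K) of the Teff uncertainty terms. Defaults follow the text:
    model = 48 K, instrument = 33 K, and a neglected [Fe/H] term."""
    terms = (model, fit, inst, logg_term, feh_term)
    return math.sqrt(sum(t ** 2 for t in terms))


# Illustrative low-gravity giant: 40 K fitting error and 23 K from log g.
total = total_teff_uncertainty(fit=40.0, logg_term=23.0)
print(f"total Teff uncertainty ~ {total:.0f} K")
```

In this example the model and fitting terms dominate; dropping the log $g$ term altogether would lower the total by less than 4 K, which shows how weakly Hα temperatures depend on the adopted gravity.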

Metallicity and microturbulence
To derive metallicity, the spectral fitting was done using the iSpec package (Blanco-Cuaresma et al. 2014) running the radiative transfer code Turbospectrum (Plez 2012) with spherical MARCS model atmospheres (Gustafsson et al. 2008), adopting the atomic parameters (excitation potential and oscillator strength) of Heiter et al. (2021). As in Paper I, we first derived the broadening parameters (microturbulence $v_{\mathrm{mic}}$ and macroturbulence $v_{\mathrm{mac}}$). For that, the projected rotational velocity $v \sin i$ was fixed to 1.6 km s$^{-1}$ while $v_{\mathrm{mic}}$, $v_{\mathrm{mac}}$, and [...] were compiled to derive these parameters: Fe I and Fe II for giants from Jofré et al. (2014), the "ASPL" and "MASH" lists of Fe I from Dutra-Ferreira et al. (2016), and the Fe II line list from Meléndez & Barbuy (2009). Since our method does not assume ionization equilibrium, the abundance biases associated with some of these line lists and discussed in the papers mentioned above do not affect our results. In fact, the capacity of our method to recover accurate $T_{\mathrm{eff}}$ and log $g$ was evaluated in Paper I by comparing its outcomes against those inferred from interferometric measurements.
As demonstrated in Sect.5, our set of T eff and log g are statistically accurate, therefore the accuracy of our set of iron abundances is mainly subject to potential systematic errors resulting from our Fe ii lines modeling 6 , carried out with 1D, spherically symmetric model atmospheres, assuming LTE.Iron abundances were determined from Fe ii lines, although abundances from Fe i lines were also derived to quantify the offsets induced by the LTE modeling; hereafter we refer to [Fe ii/H] when mentioning metallicity.Only weak lines were considered in our procedure to minimise 1D modeling the defects.Namely, we restricted lines with reduced equivalent width (REW = log(EW/λ) 7 ) lower than −5 to make sure we work in the linear part of the curve of growth.For CEMP stars, we extended this upper limit to −4.80 because very few weak and unblended lines were available.
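The weak-line criterion can be illustrated with a minimal sketch. The function names and the example equivalent widths are hypothetical and serve only to show how the REW limits of −5 (general case) and −4.80 (CEMP stars) act on a line list.

```python
import math


def reduced_equivalent_width(ew_angstrom, wavelength_angstrom):
    """REW = log10(EW / lambda), with EW and lambda both in Angstrom."""
    return math.log10(ew_angstrom / wavelength_angstrom)


def select_weak_lines(lines, rew_limit=-5.0):
    """Keep (wavelength, EW) pairs whose REW is below the adopted limit."""
    return [
        (wavelength, ew)
        for wavelength, ew in lines
        if reduced_equivalent_width(ew, wavelength) < rew_limit
    ]


if __name__ == "__main__":
    # Hypothetical lines: (wavelength in Angstrom, EW in Angstrom).
    example = [(5000.0, 0.04), (5000.0, 0.06)]
    print(select_weak_lines(example))                  # general limit of -5
    print(select_weak_lines(example, rew_limit=-4.8))  # relaxed limit for CEMP stars
```

With these illustrative numbers, the 40 mÅ line passes the general limit while the 60 mÅ line is kept only under the relaxed CEMP limit.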
We verified the absence of correlation between [Fe/H] and REW for the determined v mic values using Fe i and Fe ii lines altogether.In Sect.5.5 we find that Fe i abundances are generally lower than Fe ii abundances under LTE.This implies that v mic may vary depending of which group of lines are used for its determination: either neutral, ionised, or neutral and ionised lines.On the other hand, it reinforces the justification of using only weak lines to derive Fe abundances, as they are little sensitive to v mic , as already mentioned above.We exemplify this concept with the star HD 122563 in Fig. 7. Top panel shows no correlation between [Fe/H] and REW when both Fe i and Fe ii are considered (solid red line).Only a small correlation appears when [Fe/H] is computed from Fe i lines (black dashed line).The bottom panel in the figure show that this slope may be eliminated by increasing v mic to ∼ 3.30 kms −1 or more.With v mic = 3.30 kms −1 , the average [Fe i/H] abundance would decrease from −2.81 to −2.84 dex, whereas [Fe ii/H] would remain equal with −2.71 dex.The total uncertainty of [Fe/H] is estimated in Sect.5.5.

Surface gravity from Mg triplet lines
Conventional isochrones have a standard (solar) chemical composition, scaled to the considered stellar metallicity. They are not adapted to determine log g of stars with peculiar surface compositions, because the isochrone position is sensitive to the photospheric CNO abundances, as illustrated in Figs. 24 and 26 of Karinkuzhi et al. (2021). Indeed, a modified composition can affect the opacities, and thus the bolometric magnitude and the effective temperature.
Anticipating this problem, we derived surface gravities from the 5172 and 5183 Å magnesium triplet lines, which, unlike the line at 5167 Å, are reasonably free from blends for most metal-poor stars. To determine the surface gravity from Mg lines, we synthesised spectra adopting the following parameters for each star: (i) T Hα eff and (ii) [Fe ii/H] as listed in Table 2, and (iii) log g in the range [0.5-3.5] with a step of 0.5 dex. These grids were interpolated in log g with a step of 0.01 dex. We considered two fitting windows located far from the line cores, in order to avoid chromospheric effects. Wavelength regions with line blends were avoided.
6 Neglecting systematic errors from spectral acquisition and reduction.
7 EW is the equivalent width in Å and λ is the wavelength in Å.
For stars with [Fe/H] ≲ −2, Mg lines are relatively narrow (coverage less than 1 Å) and display asymmetries that 1D LTE models cannot reproduce. For them, we fixed the fitting windows to [5172.0-5172.5] Å and [5172.8-5173.4] Å for the line at 5172 Å, and to [5182.8-5183.45] Å and [5183.8-5184.1] Å for the line at 5183 Å. For CEMP stars, we set the limits of the fitting windows to the regions without blends, always avoiding the line core (at a wavelength distance of at least ±0.3 Å). An example of these windows for the CEMP star HD 76396 is illustrated in Fig. 8.
The fitting procedure is similar to that of Hα. Each wavelength pixel inside the fitting windows is associated with the log g of the most compatible interpolated synthetic profile. This generates a dispersion of log g values, the histogram of which represents a probability distribution; see, for example, the right panels in Fig. 8. Since in many cases only a few wavelength bins can be fitted, we do not directly fit Gaussians to the histograms, as done with Hα lines (see right panels in Figs. A.2, A.3, A.4, A.5, and A.6). Instead, we estimate the most probable log g and its error by bootstrap. That is, the set of log g values associated with the wavelength bins is randomly re-sampled (bootstrapped dataset), and its median and standard deviation are computed. This procedure is repeated 1000 times, and the mean of all computed medians is considered the most probable log g, whereas the mean of all computed standard deviations is considered its error. The total uncertainty of log g is estimated in Sect. 5.4.
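The bootstrap step described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the paper: the function name, the fixed random seed, the use of resampling with replacement, and the choice of the population standard deviation are all assumptions, and the per-pixel log g values in the usage example are invented.

```python
import random
import statistics


def bootstrap_logg(pixel_logg, n_resamples=1000, seed=0):
    """Most probable log g and its error from per-pixel log g values.

    Each resample draws len(pixel_logg) values with replacement; the mean of
    the resample medians is the estimate and the mean of the resample
    standard deviations is its error.
    """
    rng = random.Random(seed)
    medians = []
    stdevs = []
    for _ in range(n_resamples):
        resample = rng.choices(pixel_logg, k=len(pixel_logg))
        medians.append(statistics.median(resample))
        # Assumption: population standard deviation of each resample.
        stdevs.append(statistics.pstdev(resample))
    return statistics.mean(medians), statistics.mean(stdevs)


if __name__ == "__main__":
    # Hypothetical log g values associated with six wavelength bins.
    logg, logg_err = bootstrap_logg([2.40, 2.50, 2.60, 2.50, 2.45, 2.55])
    print(f"log g = {logg:.2f} +/- {logg_err:.2f}")
```

Because the estimate relies on medians, it remains usable when only a handful of wavelength bins are available, which is the situation that motivates the bootstrap in the text.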

Asteroseismic surface gravity
We cross-matched the ESO archive with the TESS catalogue (Stassun et al. 2018) searching for stars with asteroseismic measurements of maximum frequency (ν max ) in Hon et al. (2021).
We identified four stars with archived spectra of good S/N: HD 3179, HD 13359, HD 17072, and HD 221580. Their T Hα eff , log g iso , log g Mg , and [Fe/H] were derived as described in previous sections. We also included the stars TIC 168924748 and TIC 404605506 in this analysis, although we did not find archival spectra of them. They have IRFM T eff (Casagrande et al. 2021) listed in the GALAH DR3 catalogue (Buder et al. 2021), which is shown here to be consistent with T Hα eff (Sect. 5.1); thus, along with the four stars above, they are appropriate to determine the accuracy of the log g iso and log g Mg scales. These stars appear as dwarfs in our initial cross-match according to their parameters in Stassun et al. (2018), where their log g values are 4.49 and 4.48 dex, respectively. However, the GALAH catalogue indicates they are giants; its log g values, given in Table 1, were taken as preliminary. We verified in the StarHorse catalogue (Anders et al. 2022) that these stars are most likely red giants: it provides log g values of 2.46 and 2.80 dex for TIC 168924748 and TIC 404605506, respectively.
We determine the asteroseismic surface gravity (log g seis ) by the expression

$$\log g_{\mathrm{seis}} = \log g_{\odot} + \log\left(\frac{\nu_{\mathrm{max}}}{\nu_{\mathrm{max},\odot}}\right) + \frac{1}{2}\log\left(\frac{T_{\mathrm{eff}}}{T_{\mathrm{eff},\odot}}\right),$$

where we adopted the values ν max ⊙ = 3090 µHz, ∆ν ⊙ = 135.1 µHz, log g ⊙ = 4.44 dex, and T eff ⊙ = 5777 K (Huber et al. 2011). log g seis is compiled in Table 3 along with ν max derived by Hon et al. (2021). We used ∆ν ⊙ along with the T Hα eff errors in Table 2 and ν max errors in Table 3 to obtain log g seis errors. We included in Table 3 log g Mg and log g iso for comparison.
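As an illustration, the widely used ν max scaling relation, g ∝ ν max T eff^(1/2), can be evaluated with the solar reference values quoted above. This sketch assumes that this standard form is the expression intended in the text; the function names are hypothetical, and the error formula is a first-order propagation that ignores uncertainties in the solar reference values.

```python
import math

NU_MAX_SUN = 3090.0  # muHz (Huber et al. 2011, as quoted in the text)
LOGG_SUN = 4.44      # dex
TEFF_SUN = 5777.0    # K


def logg_seis(nu_max, teff):
    """Asteroseismic log g from nu_max (muHz) and Teff (K)."""
    return (
        LOGG_SUN
        + math.log10(nu_max / NU_MAX_SUN)
        + 0.5 * math.log10(teff / TEFF_SUN)
    )


def logg_seis_error(nu_max, nu_max_err, teff, teff_err):
    """First-order error on log g from the nu_max and Teff errors only."""
    ln10 = math.log(10.0)
    term_nu = nu_max_err / (nu_max * ln10)
    term_teff = 0.5 * teff_err / (teff * ln10)
    return math.hypot(term_nu, term_teff)


if __name__ == "__main__":
    # Hypothetical red giant: nu_max = 30.9 muHz, Teff = 5000 +/- 50 K.
    print(round(logg_seis(30.9, 5000.0), 3))
    print(round(logg_seis_error(30.9, 0.5, 5000.0, 50.0), 3))
```

The weak dependence on T eff (a factor 1/2 inside a logarithm) is why a 50 K temperature error contributes only a few thousandths of a dex to log g seis.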

Luminosity and radius
We computed luminosities (L) by the relation $L_{*} = L_{0}\,10^{-0.4\,(X + BC_{X})}$, where L * is the luminosity of the star, L 0 is the zero-point luminosity 3.0128 × 10 28 W (Mamajek et al. 2015), and X and BC X are the absolute magnitude in a given photometric band and its corresponding bolometric correction. Absolute magnitudes were computed from apparent magnitudes (extinction corrected as described in Paper I) and distance estimates of Bailer-Jones et al. (2018), where the Gaia parallax zero-point correction (+0.021 mas, Lindegren et al. 2021) was considered. Bolometric corrections were computed for each Gaia magnitude G, B P , and R P , extinction corrected, with the routine bcutil.py 8 (Casagrande et al. 2014; Casagrande & VandenBerg 2018). Luminosities in Table 2 are average values obtained from the three Gaia magnitude bands; values relative to the Sun on a logarithmic scale (log L/L ⊙ ) are listed. Their uncertainties were computed by adding in quadrature the errors induced by the T eff , log g, and [Fe/H] errors given in the table. The stellar radius (R) was computed from log (L/L ⊙ ) and T eff in the table by means of the Stefan-Boltzmann relation; values relative to the Sun (R/R ⊙ ) are listed. Its errors were propagated from the log (L/L ⊙ ) and T eff errors through the formula.
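The two relations in this paragraph can be sketched numerically. The zero-point luminosity is the value quoted in the text; the nominal solar luminosity (3.828 × 10^26 W) and nominal solar effective temperature (5772 K) are IAU 2015 values adopted here as assumptions, since the text does not state which solar constants enter the radius computation. Function names are hypothetical.

```python
import math

L0_WATT = 3.0128e28    # zero-point luminosity (Mamajek et al. 2015)
LSUN_WATT = 3.828e26   # assumption: IAU 2015 nominal solar luminosity
TEFF_SUN = 5772.0      # assumption: IAU 2015 nominal solar Teff (K)


def log_luminosity(abs_mag, bol_corr):
    """log10(L/Lsun) from an absolute magnitude X and its bolometric correction BC_X."""
    luminosity_watt = L0_WATT * 10.0 ** (-0.4 * (abs_mag + bol_corr))
    return math.log10(luminosity_watt / LSUN_WATT)


def radius(log_lum, teff, teff_sun=TEFF_SUN):
    """R/Rsun from the Stefan-Boltzmann relation L = 4 pi R^2 sigma Teff^4."""
    return math.sqrt(10.0 ** log_lum) * (teff_sun / teff) ** 2


if __name__ == "__main__":
    # Hypothetical giant: absolute magnitude 0.5 mag, BC = -0.3 mag, Teff = 5000 K.
    log_l = log_luminosity(0.5, -0.3)
    print(round(log_l, 3), round(radius(log_l, 5000.0), 2))
```

As a sanity check, a bolometric magnitude of 4.74 mag (the nominal solar value) returns log L/L ⊙ ≈ 0, as expected from the definition of the zero point.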

Accuracy of Hα effective temperature
Model-based T eff determinations can be assessed with the help of stars with angular diameters (θ) inferred via interferometry. Their T eff are considered to carry marginal model influence, and therefore to lie on a scale close to accurate. The set of Gaia benchmarks compiled in Heiter et al. (2015) and Jofré et al. (2014) includes 34 nearby stars with interferometric T eff . Among them only four have [Fe/H] < −1 dex, of which one is a red giant, HD 122563. This paucity was later mitigated by the incorporation of a sub-sample of ten stars with −2 ≲ [Fe/H] ≲ −1 dex in Hawkins et al. (2016), two of which are red giants: HD 175305 and HD 218857. Their temperatures were however determined with the IRFM (Blackwell et al. 1979, 1980), rather insensitive to model inaccuracies as well, as it makes use of photometry in the Rayleigh-Jeans spectral region. We demonstrated in Paper I that T eff determinations from Hα fitting and from interferometric measurements are compatible for a wide range of atmospheric parameters of F-, G-, and K-type stars. Furthermore, we demonstrated an excellent agreement with IRFM in the metallicity range −3 ≲ [Fe/H] ≲ −1 dex for dwarf and turnoff stars. The validity of T eff determinations using Hα was however not tested for metal-poor giants, given the paucity of benchmark stars of this category. Karovicova et al. (2020) recently provided interferometric T eff determinations for ten stars with [Fe/H] < −0.7 dex. Two of them are Gaia benchmarks, and the remaining eight are new standards, six giants and two dwarfs.
Concerning IRFM, Casagrande et al. (2021) expanded the applicability range of the implementation in Casagrande et al. (2010) to the RGB.Some of these stars have been studied in the context of the "First Stars" large programme 9 and are available in the ESO archive.
Figure 9 shows the comparison between T Hα eff and what we call "standard scale", which represents either interferometric temperatures in red symbols, temperatures based on θ determined by calibrations in blue symbols (Cohen et al. 1999;Kervella et al. 2004), or IRFM temperatures in gray symbols.The stars tested in the present work are indicated by their labels in the top and bottom panels; the additional points are the benchmark stars of Paper I. For the sake of completeness of the comparisons in the following discussion we include the subgiant metal-poor star HD 140283 with T Hα eff = 5810 ± 32 K derived in Paper I. Its interferometric T eff is 5792 ± 55 K (Karovicova et al. 2020) and IRFM T eff is 5777 ± 55 K (Casagrande et al. 2010).
The agreement between Hα and interferometric temperatures is excellent for HD 140283 and HD 122563, with negligible differences of ∼20 K. For HD 2665 the difference is larger (124 K), but the interferometric and Hα temperatures are still in agreement within 1σ. For HD 175305 and HD 221170 the differences are 249 and 319 K, respectively, between 1 and 2σ errors. However, their reported interferometric errors are relatively large (2-3%, see Table 1). HD 175305 also has an IRFM T eff (Hawkins et al. 2016), which is in much better agreement (40 K difference) with T Hα eff (grey symbol linked to the red one by the dashed line in Fig. 9). Therefore, the interferometric temperature of HD 175305 seems to be underestimated. We now discuss whether this could also be the case for HD 221170, which unfortunately does not have an accurate IRFM determination to evaluate this possibility. Some clue is however provided in Casagrande et al. (2014), where systematics in interferometric temperatures are observed towards low θ values. In the analysis related to their Fig. 4, these authors explain that such systematics may arise for θ ≲ 0.9 mas due to insufficient power of beam combiners for sampling the visibility curve of the star disc at its border, which requires high spatial frequencies. The bottom panel of our Fig. 9 shows similar systematics to those in Fig. 4 of Casagrande et al. (2014). Examples of how the sampling at high frequency can dramatically improve angular diameter measurements were provided by White et al. (2013) and Boyajian et al. (2012), where the latter presents significant systematics. Although the angular diameter measurements of HD 2665, HD 175305, and HD 221170 were acquired with PAVO (Karovicova et al. 2020), which offers sampling at the highest frequency, it is possible that their visibility curves are still biased to high counts at high frequencies. If so, this would have affected the θ determination and could be at the origin of the discrepancies between T Hα eff and the interferometric temperatures of the three stars above.
9 Large Program 165.N-0276, P.I.: R. Cayrel.
We find that T Hα eff and IRFM T eff agree for all stars in our red giant sample within 1σ individual errors. This includes HD 140283 and HD 175305, whose interferometric T eff were compared above; we obtain for them T Hα eff − IRFM T eff values of +33 and +40 K, respectively. As Fig. 9 shows, no difference exceeds 76 K, which corresponds to a star with a spectrum of moderately low quality in our sample, BPS CS 22892−052 with S/N ∼139. The median of the differences between T Hα eff and the IRFM T eff is +28 K (grey horizontal line in the upper panel of Fig. 9), close to the IRFM T eff zero-point uncertainty of 20 K estimated in Casagrande et al. (2010). For stars with interferometric data, we obtain a median difference of −11 K when the outliers HD 2665, HD 175305, and HD 221170 are dismissed.
We conclude that Hα and IRFM temperatures are compatible for giants and dwarfs and show no significant systematics with any atmospheric parameter nor with angular diameter; therefore the outcomes of both methods are equivalent and can be averaged to improve imprecise T Hα eff determinations due to low S/N spectra, such as that of BPS CS 22892−052.
Fig. 9. Comparison between T Hα eff and the standard T eff listed in Table 1. The plots include the Gaia benchmarks (see Fig. 7 of Paper I). The stars for which T Hα eff were derived in this work can be identified by their name labels in the top and bottom panels. The stars are color-classified according to the method used to infer their standard T eff in the literature: direct application of interferometry (red), indirect interferometry based on color calibrations (blue), and IRFM (gray). The symbol sizes are inversely proportional to the log g values. The two temperature measurements of HD 175305 (see Table 1) are connected by a vertical dashed line. The red (dashed grey) line at −11 K (+28 K) indicates the median of the differences between Hα temperatures and those derived from interferometry (from IRFM), computed omitting HD 221170, HD 175305, and HD 2665. Their dispersion of 42 K (46 K) is indicated by the error bar with the same color.
Interferometric T eff should be used with caution for stars with angular diameters lower than 1 mas, since in some cases it can be underestimated by a few hundred kelvin. For such stars, effective temperatures are best derived using Hα fitting or IRFM.
We adopt the largest dispersion in Fig. 9, ±46 K, as the formal uncertainty of the Hα 3D non-LTE model. However, we emphasise that the IRFM and interferometric T eff errors of the stars in the comparison are similar to or even larger than this quantity, thus they dominate the dispersion of the comparison. For this reason, the internal uncertainties in Table 2 are good estimates to be propagated when deriving dependent parameters and abundances.

Validation of IRFM-based Gaia photometric calibrations
Casagrande et al. (2021) provide colour-T eff calibrations based on the IRFM for about 590 000 stars in the GALAH DR3 catalogue with Gaia and 2MASS photometry. The zero-points were finely tuned with solar twins, thus accurate T eff are expected for stars with parameters close to solar. The calibrations with the Gaia colours BP − RP, G − BP, and G − RP are of special interest for automatic T eff determinations of the large number of giants in the Gaia catalogue. They are expected to provide precise T eff for metal-poor stars with preliminary [Fe/H] and log g (such as those in Andrae et al. 2022), as they show a small response to typical offsets in these parameters. Typically, ±40-50 K is the combined effect of ∆log g = 0.2, ∆[Fe/H] = 0.1, and ∆E(B − V) = 0.1. On the other hand, the performance of this calibration in the metal-poor range has not been rigorously quantified because of the lack of an extended sample of metal-poor reference stars; see Fig. 6 of Casagrande et al. (2021).
Here we test the performance of this calibration in the parameter range covered by our sample of metal-poor red giants. We derived colour-calibrated temperatures 10 from Gaia colours by running the script colte 11 , using [Fe ii/H] and log g in Table 2, and E(B − V) in Table 1; the last was extracted from Stilism (Capitanio et al. 2017) when available, or from Schlegel et al. (1998) otherwise. To obtain dereddened colours, the script colte converts E(B − V) to the Gaia system using extinction coefficients of either Schlafly & Finkbeiner (2011) or Cardelli et al. (1989) and O'Donnell (1994). We chose to use the former in the analysis below; we verified that using the latter we obtain temperatures that differ by only a few kelvin. Figure 10 compares the colour-calibrated temperatures with T Hα eff as a function of the atmospheric parameters. For completeness, dwarf stars from Paper I are included in the analysis and represented in blue. We observe that the calibrations using BP − RP provide the best agreement with T Hα eff . As shown by the plots in the left column of Fig. 10, there is perfect agreement across the entire T eff -[Fe/H]-log g parameter space. The calibrations with the G − BP color agree with T Hα eff for giants, whereas they produce temperatures ∼60 K hotter for dwarfs. The calibrations with the G − RP color produce temperatures ∼70 K cooler than T Hα eff for giants, whereas for dwarfs they can be ∼100 K cooler. We remark that although small offsets for dwarfs between T Hα eff and the colour-calibrated temperature appear for G − BP and G − RP colours, we demonstrated in Paper I that IRFM temperatures (the base of these color calibrations) are fully compatible with T Hα eff . As far as CEMP stars are concerned, Fig. 10 illustrates the large discrepancies between their colour-calibrated temperatures and T Hα eff , with differences larger than 2σ. Indeed, the C 2 and CN absorption bands produce strong flux absorption, which is not taken into account by photometric calibrations that consider solar-scaled chemical abundances. Standard photometric calibrations can thus produce strong biases when applied to CEMP stars.

Comparison of literature temperatures with accurate T eff
Disregarding the first three sections of Table 1, which contain standard stars for accuracy tests, several stars have temperatures derived with the color-T eff relations of Alonso et al. (1996) in their respective source papers: Cayrel et al. (2004) and Carney et al. (2008). Preliminary temperatures of a few stars were derived by spectral fitting (Arentsen et al. 2019; Koleva & Vazdekis 2012; Beers et al. 2017), and of a few others by assuming excitation and ionization equilibrium of Fe lines (Johnson 2002; Hansen et al. 2018; Karinkuzhi et al. 2021).
Figure 11 shows the offsets of the literature temperatures with respect to T Hα eff , shown to be accurate in Sect. 5.1. No general correlations with the atmospheric parameters are observed. Average differences, indicated by the horizontal lines, show that the three scales underestimate T eff by ∼100 K. Casagrande et al. (2010, Fig. 3) diagnosed the same bias for the scale of Alonso et al. (1996) with dwarf stars; here we verify that it remains the same for red giants. Casagrande et al. (2010, Fig. 11) also found temperature underestimations similar to ours for the spectral fitting method when revising the catalog of Valenti & Fischer (2005). Preliminary temperatures derived by the excitation and ionization equilibrium of Fe lines display the largest dispersion. Among them, the temperatures of four stars (HD 26, HD 5223, HD 201626, and HD 224959) differ from T Hα eff by less than 120 K. These stars, however, have literature log g and [Fe/H] that differ from ours by up to 0.7 and 0.4 dex, respectively, in the worst cases (HD 5223 and HD 224959). Thus, it is plausible that, due to the strong parameter interdependence that this method involves, biases permute among T eff , log g, and [Fe/H] in a complicated manner that we cannot disentangle with our small and selection-biased sample. In Sects. 5.5 and 5.6 we demonstrate that the ionization equilibrium of Fe is not satisfied under LTE for giants. Thus, this assumption is likely the main source of the biases displayed here.

Accuracy of surface gravity
Customary methods to constrain log g, such as forcing the ionization balance of Fe lines and isochrone fitting, may lead to unreliable outcomes for giant stars. In Sects. 5.5 and 5.6 we demonstrate that the problem with the former is that Fe i and Fe ii in red giants are generally unbalanced under LTE. When isochrone fitting is applied, it is generally assumed that log g (or luminosity) varies with T eff , age, metallicity, and mass. Other parameters such as the mixing length, α-element enhancement, atomic diffusion, and the helium abundance are also relevant on the RGB (e.g. Song et al. 2018; Cassisi 2017; Cassisi & Salaris 2020), and may vary from case to case in complicated and diverse manners not accounted for in ready-made isochrone grids such as those used in this work. As a consequence, the accuracy of log g from isochrone fitting for field stars is uncertain, and the values are likely imprecise.
Fig. 11. Difference between T Hα eff and preliminary temperatures from the literature (T lit eff ) compiled in Table 1, as a function of the atmospheric parameters. The size of the symbols is inversely proportional to log g in Table 2. The symbols distinguish the method by which T lit eff was derived: gray circles for the color-T eff relations of Alonso et al. (1996), black contour circles for spectral fitting, and blue circles for excitation and ionization equilibrium of Fe lines. The gray dashed, black dotted, and blue lines represent the mean offsets of each scale: −93, −122, and −105 K, respectively. Error bars are 1σ dispersions and correspond to symbol colors: 72, 86, and 169 K, respectively. The bottom panel includes labels with the names of the stars with discrepancies higher than 150 K.
Fig. 12. Surface gravity difference for each star in Table 3. log g iso and log g Mg relative to log g seis are displayed in red and gray, respectively.
We evaluate the accuracy of log g iso and log g Mg using log g seis in Table 3 as the reference scale. Figure 12 shows that log g Mg and log g seis are compatible within 1σ errors for three stars out of four with measurements available, whereas log g iso deviates significantly to higher values for every star. Accordingly, we accept log g Mg as the accurate surface gravity determination of our sample. Table 2 lists, among our recommended parameters, values averaged over both Mg triplet lines employed (5172 and 5183 Å); their errors are estimated below. We note that the stars in this test are constrained within a narrow log g range between 2.40 and 2.50 dex. Thus, the accuracy of our log g in Table 2 is guaranteed within this range and its near surroundings, say ±0.5 dex approximately. Assessing the accuracy of our recommended log g values lower than ∼2 dex requires comparison with outcomes from fundamental methods on eclipsing binaries (e.g. Hełminiak et al. 2019; Ratajczak et al. 2021; Miller et al. 2022; Maxted 2023) or well calibrated asteroseismology.
The determination of log g iso is straightforward and can still be useful when the log g Mg determination is unfeasible, as long as its offsets are well characterised. This is the case of the stars BPS CS 22186−025, BPS CS 22189−009, BPS CS 22891−209, BPS CS 22949−048, and BPS CS 22956−050, the Mg lines of which are too narrow to be used as log g indicators. In Fig. 13, we compare the log g Mg values obtained from each Mg line with log g iso ; Table A.1 lists all values in the comparison. We recall that the latter may provide reasonable outcomes only for non-enriched stars, therefore CEMP stars are excluded from Fig. 13. The plots show no obvious correlation with any atmospheric parameter, but display systematic offsets. Even though the internal precision of the individual log g Mg measurements worsens as metallicity decreases, as shown in the middle plot, the dispersions of the distributions remain roughly constant across the considered parameter range. As shown in the top panel of the figure, both Mg lines provide similar log g Mg , their offsets with respect to log g iso being nearly identical. Their average, −0.34 dex, is here adopted as the correction factor for log g iso . To determine the log g Mg error, we assume that the dispersion of the difference between log g iso and the averaged log g Mg values, 0.17 dex, is the sum in quadrature of the corresponding errors. Since the typical log g iso error is 0.08 dex, the typical log g Mg error results as 0.15 dex.
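The last step is a subtraction in quadrature, which a few lines make explicit. The function name is hypothetical; the inputs are the 0.17 dex dispersion and the 0.08 dex typical log g iso error quoted in the text.

```python
import math


def subtract_in_quadrature(total, known):
    """Return the component x such that total^2 = known^2 + x^2."""
    if known > total:
        raise ValueError("known component exceeds the total dispersion")
    return math.sqrt(total**2 - known**2)


if __name__ == "__main__":
    # Dispersion of (log g_Mg - log g_iso) and typical log g_iso error, in dex.
    print(round(subtract_in_quadrature(0.17, 0.08), 2))
```

This reproduces the 0.15 dex quoted as the typical log g Mg error.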

Fe ionization balance under LTE
The so-called "excitation and ionization balance of Fe lines" has been widely used as a standard method to determine atmospheric parameters; see a few recent examples in Hema et al. (2018), Hill et al. (2019), and Karinkuzhi et al. (2021). It assumes that the abundances derived from neutral and ionised Fe lines are identical. Such an assumption leads to reliable log g only where LTE is valid. Tsantaki et al. (2019) selected Fe ii lines so that the iron ionization balance is satisfied in their sample of F, G and K solar-metallicity dwarfs; whether this configuration can be extrapolated to other spectral types and metallicities remains to be verified.
Ionised Fe lines have been observed to be virtually insensitive to departures from LTE (e.g. Fabrizio et al. 2012; Lind et al. 2012; Mashonkina et al. 2011; Sitnova et al. 2015) in F-, G- and K-type stars. Moreover, Amarsi et al. (2016) have shown that Fe ii abundances remain nearly the same when derived from 1D LTE or 3D non-LTE models. In the same work, it has been observed that 3D non-LTE Fe i abundances closely approach those of Fe ii. Therefore, Fe ii lines can be safely used when fast 1D LTE calculations are performed.
Given the relatively large sample of metal-poor stars with accurate T eff and log g compiled in Paper I and here, we examine the compatibility of Fe i and Fe ii abundances under LTE along the T eff -log g-[Fe/H] parameter space. The following analysis is relevant because, in many cases, only a few Fe ii lines may be available in observational spectra due to severe blending.
Fig. 13. Comparison of log g Mg with log g iso . The size of the symbols is inversely proportional to log g iso . The points are coded in red or grey depending on the Mg i line (5172 Å or 5183 Å) used to compute log g Mg − log g iso . In the top panel, the offsets and ±1σ dispersions are represented by the dashed lines and error bars, the values of which are −0.35 ± 0.21 for the line at 5172 Å (in red) and −0.33 ± 0.19 for the line at 5183 Å (in gray). Individual log g values are listed in Table A.1. Error bars in the middle and bottom plots are the quadratic sum of the errors of log g iso and those of log g Mg .
The ionization unbalance appears only for red giants, that is, stars with log g ≲ 3.5 dex. The distributions of the abundance differences with T eff and log g are quite similar (see top and bottom plots of Fig. 14). Therefore, it can be inferred that Fe i lines allow reliable abundance determinations only for main sequence stars up to the very beginning of the subgiant branch. From there on, for giants, Fe i lines lead to abundance underestimations between −0.10 and −0.20 dex (as shown by the red lines in the T eff and log g ranges covered by the gray symbols in Fig. 14). We remark that these offsets are only valid for Fe abundances derived from accurate T eff and log g. The ionization unbalance displays no correlation with [Fe/H] in the analysed parameter space (see middle panel in Fig. 14). Assuming LTE ionization balance for red giants will thus likely lead to biased log g determinations; for this reason this method should be avoided.
The results above confirm those of Karovicova et al. (2018). Sitnova et al. (2015) found that ionization equilibrium departures appear in dwarfs for [Fe/H] ≲ −1 dex (see Fig. 7 in the paper), which contradicts our results. In their analysis, Fe i abundances match Fe ii when non-LTE models are used. We cannot assert what the source of their relatively low LTE Fe i values is. However, we note that their temperature scale is a possible source. They derived temperatures with the color calibrations of Ramírez & Meléndez (2005), which lead to temperatures ∼100 K cooler than the IRFM scale of Casagrande et al. (2010) (see Fig. 5 in the paper), which is compatible with ours; see Sect. 5.1. We determine the total precision of our [Fe/H] determinations by accepting that the dispersion in Fig. 14 is the even contribution of the errors of [Fe i/H] and [Fe ii/H] added in quadrature. Since the dispersion is ∼0.13 dex, our [Fe/H] precision results as ±0.09 dex. This is a little larger than the typical internal precision of ±0.05 dex, as it accounts for T eff and log g errors.
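The even split of the dispersion can be written out explicitly. The symbols below are introduced only for this illustration: σ disp is the dispersion of [Fe i/H] − [Fe ii/H] in Fig. 14, and the two species are assumed to carry equal, uncorrelated errors.

```latex
\sigma_{\mathrm{disp}}^{2}
  = \sigma_{\mathrm{Fe\,I}}^{2} + \sigma_{\mathrm{Fe\,II}}^{2}
  = 2\,\sigma_{\mathrm{[Fe/H]}}^{2}
\quad\Longrightarrow\quad
\sigma_{\mathrm{[Fe/H]}}
  = \frac{\sigma_{\mathrm{disp}}}{\sqrt{2}}
  \approx \frac{0.13\ \mathrm{dex}}{\sqrt{2}}
  \approx 0.09\ \mathrm{dex}
```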

Fe ionization balance under non-LTE
Spectral synthesis with iron in non-LTE was done using Turbospectrum 2020 12 (Gerber et al. 2023) for three MARCS model atmospheres close to the parameters of the stars HD 122563, BD+11 2998, and BPS CS 29502−042, which have atmospheric parameters representative of the sample. We first extracted the iron departure coefficients based on the iron model atom developed in Bergemann et al. (2012) and Semenova et al. (2020) corresponding to the three atmospheric models. Then, using the curated line list from the Gaia-ESO Survey (Heiter et al. 2021), we computed synthetic spectra in the range 4900-6500 Å with a step of 0.005 Å, both in LTE and non-LTE, varying the iron abundance in steps of 0.1 dex.
With these spectra we compare non-LTE with LTE abundances by performing a differential line-by-line analysis. That is, we used the same input parameters and the same line list for non-LTE and for LTE models, and we ran the same fitting algorithm. Our analysis only considers lines with log(EW/λ) ≤ −5. As expected, for Fe ii we obtained the same outcomes with LTE and non-LTE models. Individual abundances of Fe i lines obtained in LTE and non-LTE are compared in Fig. 15. We note that taking into account non-LTE effects shifts the Fe i line abundances up by ∼0.12 dex and brings them into agreement with the Fe ii line abundances within 1σ. Accordingly, non-LTE Fe i and LTE Fe ii abundances can be averaged, or one can be used in the absence of the other, for stars in the parameter range analysed here.

Mg, C, N, and O abundances
Element abundances were derived assuming the following atmospheric parameters: T Hα eff , [Fe ii/H], and log g in Table 2.
Fig. 16. Mg abundance differences between the lines at 5528 and 5711 Å. The top panel displays abundance differences as a function of metallicity. The mid panel displays differences as a function of the Mg abundance of the 5711 Å line. The bottom panel displays differences as a function of the difference between log(EW 5528Å /5528) and log(EW 5711Å /5711). Symbols are colour-coded according to the reduced equivalent width of the line at 5528 Å in logarithmic scale. Symbol sizes are inversely proportional to log g. Exponentials are fitted to the distributions in all panels. Stars with the largest discrepancies are tagged by their identifiers. 1σ errors (±0.07) around the average (+0.07) are represented by the error bars.

Magnesium
Magnesium was derived primarily from the line at 5711 Å. This line has been shown to be little affected by 3D non-LTE effects, both theoretically (Mashonkina 2013; Bergemann et al. 2017a) and observationally (Giribaldi & Smiljanic 2023b). When this line was not available, either because it was too weak or because it was not covered by the spectrum, we used the line at 5528 Å instead. In Fig. 16 we show abundance discrepancies between these lines as a function of [Fe/H], A(Mg) (defined as A(Mg) = log($N_{\mathrm{Mg}}$/$N_{\mathrm{H}}$) + 12), and the difference of the reduced equivalent widths of the Mg lines, ∆(REW) = log(EW$_{5528}$/5528) − log(EW$_{5711}$/5711); equivalent widths were computed by Gaussian fitting. The bottom panel colour-codes the symbols according to a logarithmic scale of the reduced equivalent width of the line at 5528 Å. It shows that, in general, discrepancies remain moderate, around 0.1 dex. This happens for most stars with [Fe/H] ≲ −1 dex, as the top panel of the figure shows. At low ∆(REW) the abundances from the line at 5528 Å tend to be lower than those from the line at 5711 Å (darkest gray symbols). This is the case for the stars with the highest metallicities and Mg abundances, HD 105740 and BD+11 2998, as the top and mid panels show. Similar outcomes were also observed in dwarfs (Giribaldi & Smiljanic 2023b). We remark that in the present case abundance discrepancies remain small even for high ∆(REW) values (≳ 0.6) because the Mg lines are unsaturated. As the red-toned symbols in the bottom panel of the figure show, the REW of the strongest line, 5528 Å, remains lower than −4.8, and therefore within the linear section of the curve of growth. We use the exponential fits in the mid panel to correct the offsets carried by the Mg determinations from the 5528 Å line, so that the Mg abundances of all Titans, dwarfs and giants, lie on the same scale. Finally, we apply the non-LTE correction of Mashonkina (2013) for the 5711 Å line to all Mg abundances. This is +0.07 dex, which is an average of the values computed with scaled Drawinian rates ("D0.1") for [Fe/H] = −1 and [Fe/H] = −2 dex, given in Table 5 of that paper. Corrections were also provided by Merle et al. (2011), but not for all the atmospheric parameters. We remark that the abundance discrepancies shown in Fig. 16 are only valid when line synthesis is applied. Abundance determination via EW measurement likely produces different outcomes.
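The quantity ∆(REW) and the saturation check discussed above can be written compactly. The sketch below is illustrative only: the function names and the example equivalent widths are hypothetical, and equivalent widths are assumed to be given in mÅ; the −4.8 limit is the value quoted in the text.

```python
import math


def reduced_ew(ew_mA, wavelength_A):
    """Return log10(EW/lambda), with EW in mA and lambda in Angstrom."""
    return math.log10(ew_mA * 1e-3 / wavelength_A)


def delta_rew(ew_5528_mA, ew_5711_mA):
    """Delta(REW) = log(EW_5528/5528) - log(EW_5711/5711)."""
    return reduced_ew(ew_5528_mA, 5528.0) - reduced_ew(ew_5711_mA, 5711.0)


def in_linear_regime(ew_5528_mA, rew_limit=-4.8):
    """True if the 5528 A line stays below the REW limit quoted in the text."""
    return reduced_ew(ew_5528_mA, 5528.0) < rew_limit


# Hypothetical equivalent widths in mA (not measurements from this work).
d_rew = delta_rew(60.0, 15.0)
weak_enough = in_linear_regime(60.0)
```

With these example values ∆(REW) is about 0.6, the regime in which the text notes that discrepancies stay small because the 5528 Å line is still unsaturated.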
The uncertainties of the abundances are computed as described in Giribaldi & Smiljanic (2023b). Namely, grids of A(Mg) offsets induced by typical offsets of $T_{\mathrm{eff}}$, log $g$, [Fe/H], and $v_{\mathrm{mic}}$ were produced. For that, we first synthesised spectra around the lines at 5528 and 5711 Å. The spectra were produced with four values of $T_{\mathrm{eff}}$ between 4500 and 6000 K, three values of log $g$ between 1 and 3 dex, four values of [Fe/H] between −3 and −0.5 dex, seven values of A(Mg) between 4.8 and 7.2 dex, and two values of $v_{\mathrm{mic}}$ between 1 and 2 km s$^{-1}$. Subsequently, we computed abundances varying $T_{\mathrm{eff}}$, log $g$, [Fe/H], and $v_{\mathrm{mic}}$, one at a time, by their typical errors. These are ±50 K in $T_{\mathrm{eff}}$, ±0.15 dex in log $g$, ±0.13 dex in [Fe/H], and ±0.3 km s$^{-1}$ in $v_{\mathrm{mic}}$. Therefore, four grids of offsets with 672 spectra for each Mg line are available. We interpolated these grids at the atmospheric parameters of each star in Table 2 to compute their A(Mg) errors. Since the grids provide offsets induced by ±50 K, the A(Mg) errors corresponding to the $T_{\mathrm{eff}}$ errors in Table 2 are obtained by scaling the grid values proportionally. For example, for a $T_{\mathrm{eff}}$ error of ±100 K, the error provided by the grid must be multiplied by two. The typical A(Mg) error induced by the $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ errors in Table 2 is ±0.05 dex, whereas those induced by log $g$, [Fe/H], and $v_{\mathrm{mic}}$ are of the order of $10^{-3}$ dex and were therefore neglected; Fig. A.7 shows the distribution of the offsets as a function of the atmospheric parameters. This error, ±0.05 dex, is hence considered the typical A(Mg) error associated with the errors of all atmospheric parameters combined. This quantity is slightly smaller than the standard deviation of the abundance difference in Fig. 16, which is ±0.07 dex. Assuming that this standard deviation results from the addition in quadrature of the A(Mg) errors induced by the atmospheric parameters and by the spectral noise and normalization, we deduce that the latter is also ±0.05 dex. Therefore, we add the latter in quadrature to the A(Mg) error induced by $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ to compute the total A(Mg) error given in Table 2.
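The arithmetic of this error budget can be checked with a short script. It is a sketch under stated assumptions: the function names are hypothetical, the linear scaling with the $T_{\mathrm{eff}}$ error follows the "multiplied by two" example in the text, and the numbers are the typical values quoted above rather than per-star values from Table 2.

```python
import math

GRID_TEFF_STEP_K = 50.0  # the offset grids are computed for +/-50 K


def n_grid_spectra(n_teff=4, n_logg=3, n_feh=4, n_amg=7, n_vmic=2):
    """Number of synthetic spectra per Mg line in one grid."""
    return n_teff * n_logg * n_feh * n_amg * n_vmic


def scale_grid_error(grid_error_dex, teff_error_K):
    """Scale a grid offset (valid for +/-50 K) to the actual Teff error."""
    return grid_error_dex * teff_error_K / GRID_TEFF_STEP_K


def quadrature_sum(*errors):
    """Combine independent errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


def quadrature_difference(total, part):
    """Remaining error once `part` is removed in quadrature from `total`."""
    return math.sqrt(total ** 2 - part ** 2)


n_spectra = n_grid_spectra()                      # 4 * 3 * 4 * 7 * 2
doubled = scale_grid_error(0.05, 100.0)           # +/-100 K doubles the grid offset
noise_term = quadrature_difference(0.07, 0.05)    # noise and normalization term
total_error = quadrature_sum(0.05, noise_term)    # total A(Mg) error
```

Rounded to two decimals, the noise and normalization term is 0.05 dex and the total is 0.07 dex, matching the values stated in the text.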

Carbon, nitrogen, and oxygen
C, N, and O abundances were derived following the strategy in Karinkuzhi et al. (2021). LTE spectral synthesis was performed using Turbospectrum with spherical MARCS model atmospheres (Gustafsson et al. 2008) and the line list of Heiter et al. (2021). Oxygen was determined by fitting the triplet at 7771-7775 Å. C and N abundances were not determined by individual line fitting, as for Fe, Mg, and O, but by fitting molecular band regions. For this purpose we adapted a minimum-$\chi^2$ routine in which grids of synthetic spectra with varying abundance of a given element were interpolated. We derived the carbon abundance from the C$_2$ band in the 5626-5634.5 Å region (avoiding the band-head at 5635 Å), as it appears less saturated than the 5155-5164 Å region, although similar abundances were obtained. The $^{12}$C/$^{13}$C ratio was derived by fitting the 4737.5-4744 Å region and the $^{13}$C line at 8016.429 Å, which appears reasonably free from blends. The nitrogen abundance was derived by fitting either the 6305-6330 Å or the 6345-6378 Å region once C was determined. Figure 17 shows an example of the spectral fitting for the determination of C, N, O, and the $^{12}$C/$^{13}$C ratio for HD 76396. Table 4 lists the CNO abundances derived for CEMP stars and their errors induced, separately, by the $T_{\mathrm{eff}}$, log $g$, and [Fe/H] errors in Table 2.
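The idea of a minimum-$\chi^2$ search over a grid of synthetic spectra can be illustrated with a toy example. This is not the routine adapted in this work: the band profile, the noise level, and all names are hypothetical, and, as a simplification, the sketch interpolates the $\chi^2$ curve with a parabola around the grid minimum instead of interpolating the spectra themselves.

```python
import numpy as np


def chi_square(observed, model, sigma):
    """Chi-square between an observed and a model flux array."""
    return float(np.sum(((observed - model) / sigma) ** 2))


def best_abundance(abundances, chi2_values):
    """Refine the grid minimum with a parabola through its three nearest points."""
    a = np.asarray(abundances, dtype=float)
    c = np.asarray(chi2_values, dtype=float)
    i = int(np.argmin(c))
    if i == 0 or i == len(a) - 1:
        return float(a[i])  # minimum on the grid edge: no refinement possible
    p2, p1, _ = np.polyfit(a[i - 1:i + 2], c[i - 1:i + 2], 2)
    return float(-p1 / (2.0 * p2))


def toy_spectrum(wavelength, abundance):
    """Hypothetical absorption feature whose depth grows with the abundance."""
    profile = np.exp(-0.5 * ((wavelength - 5630.0) / 0.5) ** 2)
    return 1.0 - 0.05 * abundance * profile


wavelength = np.linspace(5626.0, 5634.5, 200)
observed = toy_spectrum(wavelength, 7.93)  # noise-free "observation"
grid = [7.0, 7.5, 8.0, 8.5, 9.0]           # abundance grid of synthetic spectra
chi2_values = [chi_square(observed, toy_spectrum(wavelength, a), 0.01) for a in grid]
fitted = best_abundance(grid, chi2_values)
```

In this toy case the feature depth is linear in the abundance, so the $\chi^2$ curve is exactly parabolic and the input value is recovered; real molecular bands are not linear, which is why a finer grid or interpolation of the spectra is preferable in practice.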
As shown in Fig. 18, all our CEMP stars are indeed enriched in carbon, most of them with [C/Fe] > 1. They are consistent with the higher-metallicity, upper carbon plateau (at A(C) ∼ 8.25) identified in Spite et al. (2013), as expected since this region seems to contain mainly CEMP-s (and -sr) objects (Hansen et al. 2015a; Bonifacio et al. 2018). In contrast, at lower metallicities ([Fe/H] < −3.4) the carbon abundance drops below A(C) = 7.6 (Bonifacio et al. 2018), and this region seems to contain mainly CEMP-no objects.
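For orientation, the relation between the absolute carbon abundance A(C) and [C/Fe] can be sketched as below. The solar carbon abundance is an assumption of this sketch: the value A(C)$_\odot$ = 8.43 is used purely for illustration and is not necessarily the solar scale adopted in this work; the function name is hypothetical as well.

```python
def c_over_fe(a_c, fe_h, a_c_sun):
    """[C/Fe] = A(C) - A(C)_sun - [Fe/H]."""
    return a_c - a_c_sun - fe_h


# Assumed solar carbon abundance (illustrative, not taken from this paper).
A_C_SUN_ASSUMED = 8.43

# A star on the upper carbon plateau, A(C) ~ 8.25, at [Fe/H] = -2.
plateau_c_fe = c_over_fe(8.25, -2.0, A_C_SUN_ASSUMED)
```

Under this assumed solar value, a star on the upper plateau at [Fe/H] = −2 has [C/Fe] of about 1.8, in line with the statement that most of the CEMP stars have [C/Fe] > 1.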

Conclusions
We provide accurate and precise atmospheric parameters ($T_{\mathrm{eff}}$, log $g$, and [Fe/H]) and Mg, C, N, and O abundances for 47 metal-poor red giant stars (Table 2). We refer to this sample of benchmark metal-poor stars, along with the metal-poor dwarfs of Giribaldi et al. (2021), as the Titans reference stars. In this sample, 34 stars have the most precise parameters. Namely, they have total uncertainties of 40-80 K in $T_{\mathrm{eff}}$, 0.15 dex in log $g$, 0.09 dex in [Fe/H], and 0.07 dex in A(Mg). We tested the accuracy of the derived atmospheric parameters with most of the (few) standard metal-poor stars available to date, for which we could either obtain new HERMES spectra or recover spectra from the ESO archives. We summarize below the arguments in support of the accuracy of the $T_{\mathrm{eff}}$, log $g$, and [Fe/H] determinations.
We derived effective temperatures ($T_{\mathrm{eff}}^{\mathrm{H}\alpha}$) by fitting observational Hα line profiles with 3D non-LTE synthetic models (Amarsi et al. 2018). The evaluation of the accuracy of $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ is graphically summarized in Fig. 9, where the reference temperatures are either from interferometry (red symbols) or IRFM (dark symbols). The three scales are consistent and can be considered equivalently accurate. This is consistent with the results obtained for dwarfs in Giribaldi et al. (2021). We adopt the dispersion of the comparison in the figure, ±46 K, as the uncertainty of the 3D non-LTE Hα model for giants. The outlier stars HD 2665, HD 221170, and HD 175305 have relatively low interferometric $T_{\mathrm{eff}}$ values, which are likely effects of small angular diameter offsets; see Sect. 5.1 for details.
Based on the outcome above, we evaluate the accuracy of the colour-dependent $T_{\mathrm{eff}}$ relations of Casagrande et al. (2021) currently available for Gaia photometry. The evaluation is summarized in Fig. 10, where we observe excellent agreement for the BP − RP and G − BP colours, disregarding CEMP stars. In Table 2 we list a second set of temperatures, derived by other methods, that were found to be consistent with $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$. They can be averaged with $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ to obtain more precise $T_{\mathrm{eff}}$ values for the Titans. We selected, when available, temperatures derived by the most direct to the less di- [...]

[...] 1, except for HD 26, the most metal-rich star in the sample and a spectroscopic binary as well. Heavy elements of CEMP stars will be presented in Giribaldi et al. (in prep.). A few Titans seem to have started their path to the horizontal branch (HB): BD+17 3248, BD+09 2860, CD−41 15048, CD−62 1346, BD+11 2998, and HD 196944, the latter being a CEMP star; they are tagged in the HR diagram of Fig. 2. BD+17 3248 seems to be a genuine HB star; it has already been labelled as such, although with significantly higher $T_{\mathrm{eff}}$ and lower log $g$ values (For & Sneden 2010). The other five stars show signs of binarity: CD−41 15048, BD+09 2860, and BD+11 2998 show proper motion accelerations from Hipparcos and Gaia astrometric data (Kervella et al. 2019; Brandt 2021); CD−62 1346 shows significant variation of its radial velocities (Escorza et al. 2019); and HD 196944 is certainly a spectroscopic binary (Karinkuzhi et al. 2021). Our observational $T_{\mathrm{eff}}$ and luminosities show that CEMP stars are located to the right of the normal red giant branch stars, which is supported by the STAREVOL evolutionary tracks specifically computed for CEMP stars (Siess & Arnould 2008).
We recommend the use of [Fe I/H] in Table 2 only for reproducing observational Fe I lines via synthesis under LTE, whereas [Fe II/H] are accurate quantities required for evolutionary analysis of stellar populations. Similarly, we recommend using A(Mg) in Table 2 corrected by −0.07 dex to reproduce observational lines under LTE, as the quantities listed are corrected for non-LTE effects. Finally, we recommend avoiding the use of the excitation and ionization equilibrium of Fe and of spectral fitting under LTE to simultaneously derive $T_{\mathrm{eff}}$, log $g$, and [Fe/H] in metal-poor giants. They will likely provide biased determinations, as demonstrated in this work. Temperature outcomes of these two spectroscopic methods are consistent with those from the colour-$T_{\mathrm{eff}}$ relations of Alonso et al. (1996); however, all are biased towards cooler values; see also Casagrande et al. (2010), where the same offset as here was obtained. Users may confidently upgrade their methods for fast $T_{\mathrm{eff}}$ determination using the Gaia colour-$T_{\mathrm{eff}}$ relations with BP − RP and G − BP (Casagrande et al. 2021) for giants, and preferably with BP − RP for dwarfs. Alternatively, the differential spectroscopic approach (e.g. Meléndez et al. 2012; Reggiani et al. 2016) can be used with one or more Titans as standards.

Fig. 1. Mollweide projections of the 95 Titans benchmark stars: 48 dwarfs (Titans I, stars) and 47 giants (Titans II, present work, circles). Left: in equatorial coordinates, colour-coded by the B − V colour index; the symbol size depends on the apparent G magnitude: the larger, the brighter. Right: in galactic coordinates, colour-coded by the metallicity determined in this work (see Table 2); the symbol size scales with the Gaia parallax.

Fig. 2. Titans in the Kiel diagram (left panel) and HR diagram (mid and right panels). Surface gravities of Titans II giants (log $g$ < 3.5 dex) are listed in Table 2, [...]

Fig. 4. Observed spectra of the CEMP star HD 26 compared with its modelled spectra. The elements of the plots are the same as those in Fig. 3.

Fig. 5. Temperature offset obtained by assuming a log $g$ bias of +0.1 dex. Differences are plotted as a function of $T_{\mathrm{eff}}$ (top panel) and log $g$ (bottom panel). The cases for $T_{\mathrm{eff}}$ = 5000 K and log $g$ = 2.5 dex are highlighted by black lines.

Fig. 6. Temperature offset obtained by assuming a [Fe/H] bias of +0.07 dex. Differences are plotted as a function of $T_{\mathrm{eff}}$ (top panel) and [Fe/H] (bottom panel). The cases for $T_{\mathrm{eff}}$ = 5000 K and [Fe/H] = −2.0 dex are highlighted by black lines.

Fig. 7. Top panel: For the star HD 122563, line-to-line Fe abundances computed with $v_{\mathrm{mic}}$ = 2.5 km s$^{-1}$ as a function of the reduced equivalent width. Gray and red dots represent Fe I and Fe II lines, respectively. The solid dark red line is the trend of all Fe I and Fe II measurements. The black dashed and red dotted lines are the trends of the Fe I and Fe II measurements, respectively. The shade represents the dispersion. Bottom panel: For HD 122563, metallicity as a function of $v_{\mathrm{mic}}$, colour-coded by the slope of the trends in the $v_{\mathrm{mic}}$-[Fe/H] plane.

[...] in their comparison of the outcomes from the Michigan Infrared Combiner (MIRC, Monnier et al. 2004), the Classic combiner, and the Precision Astronomical Visible Observations (PAVO, Ireland et al. 2008) combiner in the Center for High Angular Resolution Astronomy (CHARA) array (ten Brummelaar et al. 2005). Casagrande et al. (2014) also comment on the outcomes for the solar twin 18 Sco (Porto de Mello & da Silva 1997) from PAVO by Bazot et al. (2011) and from Classic by [...]

Fig. 9. Temperature difference as a function of the atmospheric parameters in Table 2 and the angular diameter in Table 1. The plots include the Gaia benchmarks (see Fig. 7 of Paper I). The stars for which $T_{\mathrm{eff}}^{\mathrm{H}\alpha}$ was derived in this work can be identified by their name labels in the top and bottom panels. The stars are colour-coded according to the method used to infer their standard $T_{\mathrm{eff}}$ in the literature: direct application of interferometry (red), indirect interferometry based on colour calibrations (blue), and IRFM (gray). The symbol sizes are inversely proportional to the log $g$ values. The two temperature measurements of HD 175305 (see Table 1) are connected by a vertical dashed line. The red (dashed grey) line at −11 K (+28 K) indicates the median of the differences between Hα temperatures and those derived from interferometry (from IRFM), computed omitting HD 221170, HD 175305, and HD 2665. Their dispersion of 42 K (46 K) is indicated by the error bar with the same colour.

Fig. 10. Comparison between effective temperatures from the colour-$T_{\mathrm{eff}}$ relations of Casagrande et al. (2021) and those derived from Hα in this work. Each column corresponds to one Gaia colour, and each row shows the distribution of the differences with respect to $T_{\mathrm{eff}}$, [Fe/H], and log $g$, respectively. Titans I dwarfs are plotted in blue, and CEMP stars are highlighted with dark contours. The symbol sizes are inversely proportional to log $g$. Red lines connect medians computed in equally spaced bins, and vertical bars correspond to 1σ dispersions; their values are indicated in red characters. Medians and corresponding 1σ dispersions for giants only (gray symbols), excluding CEMP stars, are given in red characters. Giants with temperature differences larger than the 2σ dispersions are labelled by their catalogue numbers.

Fig. 13. Surface gravity differences as a function of the atmospheric parameters.

Fig. 14. Deviation from ionization balance as a function of the atmospheric parameters. Dwarf stars from Paper I are plotted in blue, and CEMP stars are highlighted with dark contours. The symbol sizes are inversely proportional to log $g$. Red lines in the top and bottom panels connect medians computed in equally spaced bins of $T_{\mathrm{eff}}$ and log $g$, respectively; medians and bin sizes are indicated in the plots. The vertical bars correspond to 1σ standard deviations; the corresponding values are indicated in the plots as well.

Fig. 15. Inferred iron abundance against the ratio between equivalent width and wavelength for selected Fe I lines. Horizontal lines represent the averages of the symbols with the same colours. Dashed lines represent [Fe II/H] abundances. Shades represent 1σ dispersions around the average values for non-LTE Fe I (red colour tone) and Fe II (yellow colour tone).

Fig. A.1. Examples of the initial selection of red giants with UVES spectra in the ESO archive. The left panel displays stars with spectra of 200 ≤ S/N ≤ 300, whereas the right panel displays stars with S/N > 300. The limits set to separate main sequence stars (gray symbols) from red giant candidates (red symbols) are arbitrary.

Fig. A.4. Similar to Fig. A.2 for stars with asteroseismic measurements.

Table 2. Determined atmospheric parameters of the giant Titans.

Fig. 8. Magnesium profile fits of the CEMP star HD 76396. The observational profile is represented by the black line. The thick red line represents the fitted profile and shaded areas indicate the fitting windows. Right panels show histograms of the log $g$ values associated with all pixels within the shaded windows. The most probable log $g$ and its error are obtained by bootstrapping; see main text.

Table 3. [...] measurements. Values in the third column were extracted from Hon et al. (2021). In the fifth column, log $g_{\mathrm{Mg}}$ is the weighted average of the values obtained from the lines at 5172 and 5183 Å; its errors are estimated in Sect. 5.4.

[...] by means of the so-called InfraRed Flux Method (IRFM, Blackwell & Shallis 1977; [...]

Table 2 .
only for reproducing observational Fe I lines via synthesis under LTE, whereas [Fe II/H] are accurate quantities required for evolutionary analysis of stellar populations. Similarly, we recommend to use A(Mg) in Table 2 corrected by −0.07 to reproduce observational lines under LTE, as the quantities listed are corrected from non-LTE effects.

Table A.1. Magnesium abundances and surface gravities. Notes. The second and third columns list surface gravities derived from the Mg triplet lines at 5172 and 5183 Å, respectively. The fourth and fifth columns list Mg abundances derived from the lines at 5528 and 5711 Å, respectively. Brackets indicate uncertain values because of line blending. The last column lists log $g_{\mathrm{iso}}$.