ARES VI: Viability of one-dimensional retrieval models for transmission spectroscopy characterization of exo-atmospheres in the era of JWST and Ariel

Observed exoplanet transit spectra are usually retrieved using 1D models to determine atmospheric composition, while planetary atmospheres are 3D. With the JWST and future space telescopes such as Ariel, we will be able to obtain increasingly accurate transit spectra. The 3D effects on the spectra will be visible, and we can expect biases in the 1D extractions. In order to elucidate these biases, we have built theoretical observations of transit spectra, from 3D atmospheric modeling through transit modeling to instrument modeling. 3D effects are observed to be strongly nonlinear from the coldest to the hottest planets. These effects also depend on the planet's metallicity and gravity. Considering equilibrium chemistry, 3D effects are observed through very strong variations in certain molecular features or very small variations over the whole spectrum. We conclude that we cannot rely on the uncertainty of retrievals at all pressures, and that we must be cautious about the results of retrievals at the top of the atmosphere. However, the results are still fairly close to the truth at mid-altitudes (those probed). We also need to be careful with the chemical models used for planetary atmospheres. If the chemistry of one molecule is not correctly described, this will bias all the others, and the retrieved temperature as well. Finally, although fitting a wider wavelength range and higher resolution has been shown to increase retrieval accuracy, we show that this could depend on the wavelength range chosen, due to the accuracy in modeling the different features. In any case, 1D retrievals are still correct for the detection of molecules, even in the event of an erroneous abundance retrieval.


Introduction
Over the past three decades, there has been an exponential rise in the discovery of exoplanets, which has dramatically expanded our understanding of them. Their observed diversity (e.g., Gaudi et al. 2021) has fueled the excitement and imagination of the scientific community and the general public and challenged long-held assumptions about planet formation and evolution. The focus in the exoplanet field has shifted from studying the bulk and orbital parameters to understanding the true nature of exoplanets through their compositions, atmospheres, and climates (Gaudi et al. 2021; Guillot et al. 2022). The atmospheres are shaped by the stellar environment and offer a glimpse into the planetary interior, which holds evidence of the formation process of the planet. Transiting exoplanets are very well suited for atmospheric analysis, as the stellar light filtering through their atmosphere provides a wealth of information on its composition and thermodynamic state (Sing 2018; Tsiaras et al. 2018). Transmission spectroscopy has therefore emerged as the most promising technique for atmospheric characterization (Seager & Sasselov 2000; Tinetti et al. 2007; Kreidberg et al. 2014; Line et al. 2016; Welbanks et al. 2019; Madhusudhan 2019; Edwards et al. 2020; Skaf et al. 2020; Pluriel et al. 2020a; Guilluy et al. 2020; Mugnai et al. 2021). Until very recently, the main challenges in atmospheric studies using this technique were limited sensitivity and spectral range that prevented proper sampling of molecular features. In this regard, the launch of the James Webb Space Telescope (JWST) marks a new milestone in the exploration of exoplanets and provides us with a unique opportunity to uncover their true nature and formation-evolution histories. The upcoming space mission Atmospheric Remote-Sensing Infrared Exoplanet Large-survey (Ariel) of the European Space Agency (ESA) will provide yet another important contribution, by systematically characterizing the atmospheres of entire populations of exoplanets (e.g., Charnay & Drossart 2023).
The JWST (McElwain et al. 2023a) is a large, highly sensitive infrared-optimized space telescope developed in a collaboration between NASA, the European Space Agency (ESA), and the Canadian Space Agency (CSA). The JWST science goals were organized into four themes, including the Planetary Systems and the Origins of Life theme, which aims to determine the physical and chemical properties of planetary systems and investigate the potential for the origins of life in those systems (Gardner et al. 2006; Beichman & Greene 2018). The overall design of the observatory includes three main components: the telescope and scientific instruments, the 5-layer sunshield, and the spacecraft bus. The telescope has a unique 6.5 m-class primary mirror comprising 18 hexagonal segments, phased together to act as a single mirror. There are four science instruments, each with several observing modes: a Near-Infrared Camera (NIRCam, 0.6-5.0 µm) (e.g., Beichman et al. 2012), a Near-Infrared Spectrograph (NIRSpec, 0.6-5.3 µm) (Jakobsen et al. 2022), a Near-Infrared Imager and Slitless Spectrograph (NIRISS, 0.6-5.0 µm) (Doyon et al. 2012), and a Mid-Infrared Instrument (MIRI, 5-28.5 µm) (Rieke et al. 2015). The telescope and science instruments are passively cooled to below 50 K by the sunshield and thermally isolated from the spacecraft bus and solar array. The combined spectral coverage of all science instruments is from 0.6 to 28.5 µm, opening uncharted territory in atmospheric characterization, as evidenced by recent inferences of CO2 and SO2 in the atmosphere of the hot Jupiter WASP-39b (Bean et al. 2018; Rustamkulov et al. 2023; Feinstein et al. 2023; Ahrer et al. 2023; Tsai et al. 2023; Alderson et al. 2023a).
The Atmospheric Remote-Sensing Infrared Exoplanet Large-survey (Ariel) (Tinetti et al. 2018, 2021) is the M4 ESA space mission of the Cosmic Vision program and will operate from the L2 point starting in 2029. Ariel is a dedicated mission for the spectroscopic observation of transiting exoplanets and will conduct the first unbiased survey of a large and diverse sample of approximately 1000 planets in the optical and near-infrared (e.g., Mugnai et al. 2021). The Ariel payload will mount an off-axis Cassegrain telescope with a 1-m class primary mirror feeding a collimated beam, split by dichroics, into two separate instruments with a coincident field of view: the Fine Guidance System (FGS) and the Ariel InfraRed Spectrometer (AIRS). FGS includes three photometric channels (VISPhot, 0.5-0.6 µm; FGS1, 0.6-0.80 µm; FGS2, 0.80-1.1 µm) and a low-resolution Near-InfraRed Spectrometer (NIRSpec, 1.1-1.95 µm and R ≥ 15); AIRS has two low-to-medium resolution IR channels (CH0, 1.95-3.9 µm and R ≥ 100; CH1, 3.9-7.8 µm and R ≥ 30). The payload module comprising the telescope and the science instruments will be passively cooled to ∼55 K with the thermal shield assembly. With this instrumentation, Ariel will provide simultaneous observations of the whole 0.5 to 7.8 µm spectral band, which encompasses the emission peak of warm and hot exoplanets and is sufficient to detect all molecular species (Encrenaz et al. 2015). Key science questions to be addressed by the Ariel mission include: "What are the physical processes shaping planetary atmospheres?", "What are exoplanets made of?" and "How do planets and planetary systems form and evolve?" (Turrini et al. 2018; Changeat et al. 2020a). JWST and Ariel are complementary to each other. JWST, with its exquisite sensitivity, is already providing transformative science for the field of exoplanets and, thanks to its precision launch, will continue to do so for the next 20+ years (McElwain et al. 2023b), with about 25% of its time allocated to exoplanet observations. Ariel, on the other hand, will observe a substantial fraction of the known planetary population already during its four-year nominal mission, providing the statistics to interpret the observed gamut of planetary bodies and place the JWST observations into a wider context.
The exquisite sensitivity and wavelength range offered by JWST and, to a lesser extent, Ariel, enable the study of subtler effects than previously possible, opening new pathways to understanding the true nature of exoplanets. A prime example is the characterization of the 3D nature of exoplanets. In this regard, the geometry of a transit allows us to probe only a limited volume around the terminator line, the region of the atmosphere at the border between the hot day-side and the cooler night-side (Brown 2001; Kreidberg 2018). Therefore, transit spectroscopy measurements average the atmospheric features from the morning and evening sides of the planet (Fortney et al. 2010; Caldas et al. 2019; Wardenier et al. 2022). Often, a uniform terminator is assumed in interpreting transit spectroscopy data, disregarding 3D atmospheric effects. This assumption can be sufficient to interpret the data in the case of cold planets that have more homogeneous atmospheres (MacDonald et al. 2020) but not for hot Jupiters and ultra-hot Jupiters. Hot and ultra-hot Jupiters present strong day-night temperature contrasts because their radiative timescale is shorter than their dynamical timescale (Sudarsky et al. 2000; Guillot 2010; Bell & Cowan 2018; Arcangeli et al. 2018), and they also present chemical heterogeneities (Changeat et al. 2019; Baeyens et al. 2021). These differences diminish as we move towards warm Jupiters (equilibrium temperature below 1400 K), where zonal circulation tends to homogenize atmospheres (Venot et al. 2020; Baeyens et al. 2022). Global Circulation Models (GCMs) are numerical models of the full 3D structures and dynamics of atmospheres that can help us explore and gain a better understanding of atmospheric processes and characteristics that cannot be captured by a 1D plane-parallel approach (Showman et al. 2008; Leconte et al. 2013; Venot et al. 2014; Parmentier et al. 2018).
This work addresses questions on the interpretability of transmission spectra given the 3D nature of exoplanetary atmospheres. The first one is: "How does the 3D atmospheric structure affect the transmission spectra of exoplanets, from a cold planet to an ultra-hot Jupiter?". Recent works (Pluriel et al. 2020b; Lacy & Burrows 2020; Wardenier et al. 2022; Pluriel et al. 2022) concluded that, for ultra-hot Jupiters, the 3D structure plays a major role in shaping the transmission spectra. If the temperature of the atmosphere is not high enough to dissociate a molecule on both the day-side and the night-side, the amplitude of its spectral features will be larger than predicted by a 1D plane-parallel approach. In addition, Falco et al. (2022) showed how the changes in planet orientation during the transit allow us to probe the horizontal variations in the atmosphere. Pluriel et al. (2022) investigated 3D effects in the transmission spectra of hot Jupiters, dividing them into three main groups: vertical effects, horizontal effects along the limb, and horizontal effects through the limb.
The second question we ask is: "Can 1D retrievals find consistent parameters (T-P profile, abundances, C/O ratio, metallicity, and clouds)?". This question is closely related to the first one: if the 3D atmospheric structure strongly affects transmission spectra, can 1D retrieval models find correct atmospheric parameters? In this regard, MacDonald et al. (2020) investigated the following conundrum: "Why are most inferred temperatures from transmission spectra far colder than expected from the equilibrium temperature?". They concluded that a 1D model can fit the transmission spectra of planets with asymmetric terminators, but retrieved atmospheric parameters may not represent true terminator averages. Also, the retrieved temperatures of planetary terminators may be biased by hundreds of degrees below their real value, and this bias is most extreme in the case of ultra-hot Jupiters (but see Welbanks & Madhusudhan 2022). This study also mentions the biases in chemical abundances derived from 1D retrievals. Pluriel et al. (2020b) concluded the same: if the temperature and the chemical composition vary across the limb, which is the case for 3D structures, 1D retrievals cannot find the correct molecular abundances. This also affects the inferred C/O ratio, which is an indirect estimate based on the abundances of all C- and O-bearing molecules. Recently, Zingales et al. (2022) and Welbanks & Madhusudhan (2022) demonstrated that the choice of the retrieval model is critical for correctly retrieving the thermal structure of the atmosphere. Lastly, Pluriel et al. (2022) provided a "cheat sheet" of the minimum model assumptions needed to avoid biases in interpreting atmospheric properties. When optical absorbers are present, 1D models are adequate to describe transmission spectra for atmospheric equilibrium temperatures lower than 1400 K; in their absence, the 1D assumption can extend up to 2000 K. Above these temperatures, retrievals with 1D models return biased estimates of the parameters in the forward model.
The problem of terminator inhomogeneities addressed here can be solved by more sophisticated retrievals using 2D atmospheric models (MacDonald et al. 2020). As these inhomogeneities would be observable by JWST (Espinoza & Jones 2021), addressing them is a current need. Zingales et al. (2022) and Welbanks & Madhusudhan (2022) have studied hot and ultra-hot Jupiters to show that models beyond the 1D approach, such as 2D models, can retrieve consistent temperature profiles. Similarly, MacDonald & Lewis (2022) and Nixon & Madhusudhan (2022) have implemented a 3D parametric pressure-temperature profile to model day-night temperature variations with improved retrieval accuracy. This study is part of this effort in exoplanet research, and specifically addresses the errors introduced by 1D retrieval models when applied to contrasting 3D atmospheres.
To address the above-mentioned issues, we need two distinct branches of action: one that can produce simulated transmission spectra for different planets, accounting for 3D structures, and one that can perform atmospheric retrievals and compare the inferred parameters with the forward models. Our goal is to assess the extent to which our retrievals can reconstruct the true atmospheric composition. This end-to-end system allows us to investigate the interpretation of our atmospheric retrievals consistently. The first branch starts from precomputed GCMs for three planets (see Section 2.1):
- GJ 1214 b, a Neptune-like planet orbiting an M-type star;
- HD 189733 b, a hot Jupiter around a K-type star;
- WASP-121 b, an ultra-hot Jupiter orbiting an F-type star.
Then, it uses the Pytmosph3R code to produce 3D transmission spectra (see Section 2.2). In this step, we use two different configurations for each planet: an equilibrium chemistry model and a model with chemical profiles constant with altitude. The final products of the first branch are spectra with attached errorbars, to reproduce spectra "as observed" by JWST and Ariel. The expected errorbars are estimated using PandExo and ArielRad, two simulators of the noise performance of JWST and Ariel, respectively (see Section 2.3). The second branch starts from these simulated observations and performs a Bayesian retrieval to estimate the best-fitting parameters of the model. As a retrieval tool, we use the retrieval framework TauREx 3 (Tau Retrieval for Exoplanets), briefly described in Section 3.1. Section 3.3 details the retrieval procedure, the chemical configurations, and the other atmospheric parameters. The retrieval procedure is the same for all spectra: the same set of atmospheric models is applied, to remove our a priori knowledge and compare the obtained results correctly. We discuss the results from this comparison and their implications for interpreting transmission spectra with the 1D assumption in Section 4. Our conclusions are summarized in Section 5.

Transmission spectra simulations
The following three planets (GJ 1214 b, HD 189733 b, WASP-121 b) have been chosen to span the range from warm Neptune to ultra-hot Jupiter (see Table 1), in order to study how the transmission spectra and retrieval biases depend on the temperature of the planets. Following the study of Al-Refaie et al. (2022a), we go a step further by focusing on biases arising from 1D vertical thermal variation as well as from the full 3D thermal structure. We simulate JWST (NIRSpec + MIRI) and Ariel observations. We do not consider clouds, in order to focus on biases in the general planetary properties.

Global Climate Models
GJ 1214 b has been simulated using the generic Planetary Climate Model. This model has been specifically developed for exoplanets and paleoclimate studies (Charnay et al. 2015; Leconte et al. 2013). The dynamical core solves the primitive hydrostatic equations of meteorology on an Arakawa C grid, using a finite difference scheme. Radiative transfer is solved using the correlated-k model. Radiative effects of H2, He, H2O, CH4, NH3, CO, and CO2 are taken into account, assuming a 100× solar metallicity. The horizontal resolution is 64 × 48 and we use 45 vertical layers between 80 bar and 3 Pa, equally spaced in log pressure. The star is taken as a blackbody at 3026 K, and we assume an internal temperature of 60 K. The dynamical time-step is 60 s and the physical/radiative time-step is 300 s. The model was integrated for 1600 days. For a more complete description of the model and the simulation, we refer the reader to Charnay et al. (2015), as the model used here is from this study.
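As an illustration of the vertical grid quoted above, the following minimal sketch builds 45 pressure values between 80 bar and 3 Pa that are equally spaced in log pressure. It is not taken from the generic Planetary Climate Model; the function name and the choice to treat the 45 values as grid points (rather than layer boundaries) are assumptions made for the example.

```python
import numpy as np

# Values quoted in the text for the GJ 1214 b simulation.
P_BOTTOM_PA = 80.0 * 1.0e5   # 80 bar expressed in pascal
P_TOP_PA = 3.0               # 3 Pa
N_LAYERS = 45

def log_pressure_grid(p_bottom, p_top, n):
    """Return n pressures from p_bottom to p_top, equally spaced in log10(P).

    Hypothetical helper: the actual GCM may place its levels differently.
    """
    return np.logspace(np.log10(p_bottom), np.log10(p_top), n)

pressures = log_pressure_grid(P_BOTTOM_PA, P_TOP_PA, N_LAYERS)

# Equal spacing in log pressure means a constant ratio between neighbours.
ratios = pressures[1:] / pressures[:-1]
```

With this construction each level is lower in pressure than the one below it by the same multiplicative factor, which is what "equally spaced in log pressure" means in practice.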
For HD 189733 b, we make use of the Met Office Unified Model (Drummond et al. 2018). The model solves the deep atmosphere, non-hydrostatic Navier-Stokes equations on an Arakawa C grid. Radiative transfer is handled through the SOCRATES code (see footnote 1) adapted for hot Jupiters (Amundsen et al. 2016). A chemical relaxation scheme is used and the radiative transfer is computed in 32 wavelength bins. The chemistry includes CH4, CO and H2O. The simulation is integrated for 1000 Earth days. For a more complete description of the model, see Drummond et al. (2018), as the model used here is the same as in this study.
WASP-121 b has been simulated using the SPARC/MIT global circulation model (Showman et al. 2009). The model solves the same primitive equations of meteorology as the generic Planetary Climate Model on a cubic-sphere grid. It has been widely used for various hot Jupiters (Showman et al. 2009; Kataria et al. 2015; Parmentier et al. 2016, 2018, 2021) and has also been applied to ultra-hot Jupiters (Kreidberg 2018; Arcangeli et al. 2019). For this study, we use the model published in Parmentier et al. (2018). The horizontal resolution is C32, equivalent to 128 cells in longitude and 64 in latitude, with 53 vertical levels and pressure ranging from 200 bar to 2 µbar. Radiative transfer is handled using the two-stream approximation with 11 wavelength bins, as done in Kataria et al. (2013). The model assumes chemical equilibrium, taking into account the thermal dissociation of water and hydrogen. The chemical species taken into account are H2O, H−, CO, TiO, Na, VO, K and H2. However, H2 recombination is neglected despite its non-negligible impact on the thermal and dynamical structure (Tan & Komacek 2019). For a more complete description of the model and the simulation, we refer the reader to Parmentier et al. (2018).
In the three models considered here, disequilibrium chemistry is not taken into account and the models are cloud-free and haze-free.
For each planet, we build a pseudo-1D version of the 3D model. The temperature profiles of the whole grid are replaced by the same 1D profile. This profile is the temperature profile at the equator of the western terminator (the coldest one) for each model. Thus, the pseudo-1D models are representative of each temperature condition from warm Neptune to ultra-hot Jupiter. The purpose of these pseudo-1D models is to check the correct behavior of the retrieval code: a 1D retrieval code should correctly retrieve the pseudo-1D models. Thus, we can confidently untangle the 3D effects.
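The pseudo-1D construction described above can be sketched as follows. This is a minimal illustration and not the code used in this work: the array layout `(n_lat, n_lon, n_lev)`, the function name `make_pseudo_1d`, the toy grid, and the longitude assigned to the western terminator (here -90°) are all assumptions that depend on the conventions of each GCM.

```python
import numpy as np

def make_pseudo_1d(temperature, lat_deg, lon_deg, terminator_lon_deg):
    """Replace every column of a 3D temperature field by one reference profile.

    temperature: array of shape (n_lat, n_lon, n_lev) in K (assumed layout).
    The reference column is the grid column closest to the equator at the
    requested terminator longitude.
    """
    i_lat = int(np.argmin(np.abs(lat_deg)))
    # Angular distance in longitude, accounting for the 360-degree wrap-around.
    dlon = np.abs((lon_deg - terminator_lon_deg + 180.0) % 360.0 - 180.0)
    i_lon = int(np.argmin(dlon))
    profile = temperature[i_lat, i_lon, :]
    # Broadcast the single profile over all latitudes and longitudes.
    return np.broadcast_to(profile, temperature.shape).copy()

# Toy grid with random temperatures (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
lat = np.linspace(-90.0, 90.0, 5)
lon = np.arange(0.0, 360.0, 45.0)
temp_3d = rng.uniform(500.0, 2500.0, size=(lat.size, lon.size, 10))
temp_pseudo_1d = make_pseudo_1d(temp_3d, lat, lon, terminator_lon_deg=-90.0)
```

Because the resulting field is horizontally uniform, any difference between a retrieval on the pseudo-1D spectrum and a retrieval on the full 3D spectrum can be attributed to horizontal structure rather than to the retrieval code itself.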

Pytmosph3R
Based on the planetary atmospheres described in Section 2.1, we used the latest version of Pytmosph3R (Falco et al. 2022) to generate the transmission spectra. It takes into account the 3D structure of the atmosphere and directly uses the line-by-line cross sections calculated by ExoMol (Yurchenko et al. 2011; Tennyson & Yurchenko 2012; Barton et al. 2013; Yurchenko et al. 2014; Barton et al. 2014); more specifically, cross sections are taken from Chubb et al. (2021). Species abundances were established in two different ways: (i) a constant chemistry model, i.e., where abundances are independent of temperature and pressure, thus constant everywhere; (ii) an equilibrium chemical model. The abundances of the constant chemistry model have been chosen close to the equilibrium ratio for a given temperature representative of each planet, and applied over all longitudes, latitudes and altitudes. We have taken into account only the main species: H2O, CO, CH4, CO2, HCN, C2H2, NH3, C2H4, and in addition TiO, VO, K, Na, SiO, FeH for HD 189733 b and WASP-121 b. The values are given in Tables B.1, B.2, and B.3 (input column, in red). This theoretical construction, not representative of the real atmosphere, removes one degree of freedom (chemistry) to check how 1D retrieval models handle 3D atmospheric thermal structures and thus what the biases can be.

1 https://code.metoffice.gov.uk/trac/socrates
Equilibrium chemistry leads to variable chemical profiles. In the modeled atmospheres, equilibrium chemistry can be expected in the hottest and densest regions. However, we know that non-equilibrium chemistry must be accounted for, especially in the upper atmosphere (Cooper & Showman 2006; Moses et al. 2011, 2012; Venot et al. 2012, 2020; Molaverdikhani et al. 2019; Tsai et al. 2021, 2022). This has been very recently implemented in the retrieval code (FRECKLL code developed by Al-Refaie et al. 2022b), but it is computationally expensive. We used ACE chemistry (see Section 3.2 for details) to model the input equilibrium chemistry of GJ 1214 b, and the chemical abundances used in Parmentier et al. (2016) to model the input equilibrium chemistry of HD 189733 b and WASP-121 b. This theoretical construct, relative to the constant case, focuses on the biases that can arise from the chemical retrieval models.

PandExo - JWST
To simulate JWST observables (spectra in Figure 1), we utilize the PandExo 1.5 package (Batalha et al. 2017). This program is a noise simulator designed for JWST transit observations of exoplanets. We make use of the model to simulate one transit of each planet using NIRSpec-PRISM and one transit using MIRI-LRS. For each planet, we consider a saturation limit of 80% of the full well and a ratio of out-of-transit to in-transit time of 1. For each instrument and planet, we use the optimize option for the number of groups per integration, which automatically defines the best settings to carry out the observations. The planet- and star-specific parameters used for producing the observables are summarized in Table 1. PandExo neither includes the varying stellar noise nor takes into account the transit ingress and egress. However, we estimate that the produced observables are a good enough approximation to what one will observe with the JWST facility and should not change the conclusions of this work. It is worth noting that the three systems studied here have a J-band stellar magnitude below 11.4, which is the estimated saturation limit of NIRSpec-PRISM. Thus, these systems are not observable with this instrument configuration. Yet, we still performed the computations and studied these systems, as the goal of this analysis is not to prepare JWST observations per se, but to highlight possible biases introduced by retrievals on JWST-like data sets. We did not add random noise, as explained in the following section.

ArielRad - Ariel
The Ariel Radiometric model ArielRad (Mugnai et al. 2020) is the radiometric simulator of the Ariel payload, developed and maintained by the Ariel Consortium. We will briefly describe ArielRad here; for a technical description, including the detailed noise model, the reader is encouraged to read the original article.
Given the description of the payload and a list of candidate exoplanets, ArielRad outputs the expected experimental uncertainty on their measured atmospheric transmission or emission spectra. The simulation propagates the stellar light through the payload, accounting for the transmission or dispersion of each interposed optical component until reaching the focal planes. Then, ArielRad evaluates the noise contributions (with margins) from stationary processes, i.e., stellar photon noise, detector noise, dark current, zodiacal background, and instrument emission. Jitter noise is computed externally by ExoSim (Sarkar et al. 2021), the end-to-end time-domain simulator of Ariel observations, and included in the final noise budget.
Then, ArielRad returns the uncertainty estimates on a single transit or eclipse observation. Ariel defines an observation to last 2.5 times the time between the first and the last contact between the planetary and the stellar disks to obtain a sufficiently long baseline integration for the light curve fit and the transit depth estimation (Mugnai et al. 2020). Because the astronomical measurement is the contrast ratio with the signal from the stellar host, ArielRad uses the contrast ratio to compute the observed spectrum's expected signal-to-noise ratio (S/N).
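The observation length defined above can be written out explicitly. The sketch below is only arithmetic on the factor of 2.5 quoted in the text; the function name and the example transit duration of 2 hours are hypothetical. It also shows that this definition implies an in-transit to out-of-transit time ratio of 1/1.5.

```python
def ariel_observation_split(t14_hours, baseline_factor=2.5):
    """Split an observation into in-transit and out-of-transit time.

    t14_hours: time between first and last contact (hypothetical input).
    baseline_factor: total observation length in units of t14 (2.5 in the text).
    Returns (total time, out-of-transit time, in/out time ratio).
    """
    total = baseline_factor * t14_hours
    out_of_transit = total - t14_hours
    return total, out_of_transit, t14_hours / out_of_transit

# Example with an assumed 2-hour transit.
total_h, out_h, in_out_ratio = ariel_observation_split(2.0)
```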
The Ariel mission adopts a four-tier observation strategy in which, after each observation, the resulting spectrum from each spectrometer is binned in data analysis according to specific requirements to optimize the S/N and the mission scientific return (Edwards et al. 2019). ArielRad, knowing the binning and spectral resolution implemented in the different tiers, computes the S/N in the spectral bin according to the tier of interest. Then, it estimates the number of observations required for each planet to reach the tier's required S/N.
From the number of observations, ArielRad obtains the final noise estimate for a planet by rescaling the noise on a single observation. These uncertainties can be attached to simulated forward models of transmission or emission spectra, binned down to the tier spectral resolution to obtain the simulated observed spectra.
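To make "binned down to the tier spectral resolution" concrete, the sketch below builds a wavelength grid of constant resolving power R, defined here as bin centre divided by bin width. This is an assumed, generic binning scheme and not the ArielRad implementation; the function name is hypothetical, and the example range and resolution (1.95 to 3.9 µm at R = 100) are the AIRS-Ch0 Tier 3 values quoted in this paper.

```python
import numpy as np

def constant_resolution_edges(wl_min, wl_max, resolution):
    """Bin edges such that bin_centre / bin_width equals `resolution`.

    For a bin [e0, e1] with centre (e0 + e1) / 2 and width e1 - e0, a fixed
    resolution R implies e1 / e0 = (2R + 1) / (2R - 1), so the edges form a
    geometric sequence. The last edge is the largest one not above wl_max.
    """
    factor = (2.0 * resolution + 1.0) / (2.0 * resolution - 1.0)
    n_bins = int(np.floor(np.log(wl_max / wl_min) / np.log(factor)))
    return wl_min * factor ** np.arange(n_bins + 1)

edges = constant_resolution_edges(1.95, 3.9, 100.0)   # wavelengths in micron
centres = 0.5 * (edges[1:] + edges[:-1])
widths = edges[1:] - edges[:-1]
```

A high-resolution forward model would then be averaged within each pair of consecutive edges before the noise estimates are attached.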
Computing the errorbars
We utilize the general procedure described above to calculate the Ariel observation uncertainties for the three planetary targets of interest. We use an updated version of the code, which is a wrapper of ExoRad, the instrument-independent version of the radiometric simulator publicly available on GitHub (see footnote 2), and ArielRad-Payloads, the repository of configuration files for the payload maintained by the Ariel Consortium. For reproducibility, we report the code versions in Table 2.
For each planet, we assume a mean molecular weight of 2.3 a.m.u. to simulate H2-He dominated atmospheres. We use this parameter to calculate the atmospheric scale height and, consequently, the contrast ratio of the transit. We acknowledge that assuming an H2-He dominated atmosphere conflicts with the GJ 1214 b model detailed above, which uses a much higher metallicity. Assuming a lower mean molecular weight leads us to an over-optimistic estimate of the number of transits for the modeled spectrum. Therefore, the S/N (and retrievals) will reflect a worst-case scenario.

2 https://github.com/ExObsSim/ExoRad2-public

Table 2. Versions of the codes used to generate the Ariel spectra.

Code               Version
ArielRad           2.4.25
ExoRad             2.1.111
ArielRad-Payloads  0.0.16
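The role of the mean molecular weight in the preceding paragraph can be illustrated with the standard isothermal scale height, H = k_B T / (µ m_u g), and a common first-order estimate of the atmospheric signal, 2 n H R_p / R_s². This is a sketch under stated assumptions and not the ArielRad formula: the function names, the number of scale heights `n_scale_heights`, and the example temperature and gravity are all hypothetical.

```python
K_B = 1.380649e-23        # Boltzmann constant, J/K
M_U = 1.66053906660e-27   # atomic mass unit, kg

def scale_height(temperature_k, gravity_ms2, mmw_amu=2.3):
    """Isothermal atmospheric scale height in metres."""
    return K_B * temperature_k / (mmw_amu * M_U * gravity_ms2)

def atmospheric_signal(r_planet_m, r_star_m, h_m, n_scale_heights=5.0):
    """Approximate transit-depth modulation of an annulus n scale heights thick.

    n_scale_heights is an assumed, illustrative value.
    """
    return 2.0 * n_scale_heights * r_planet_m * h_m / r_star_m**2

# Hypothetical planet: T = 1000 K, g = 10 m/s^2.
h_light = scale_height(1000.0, 10.0, mmw_amu=2.3)   # H2-He dominated
h_heavy = scale_height(1000.0, 10.0, mmw_amu=4.6)   # twice the molecular weight
```

Since the scale height is inversely proportional to the mean molecular weight, a lower assumed value gives a larger expected signal and therefore fewer required transits, which is why the estimate is described as over-optimistic for a high-metallicity atmosphere.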
We utilize the Ariel strategy for collecting data during an observation; therefore, the ratio of observing time in and out of transit is 1/1.5. Then, for each planet, we estimate the noise per spectral bin and from there, the S/N for a transit observation in Tier 3. In this Tier, the raw spectral data from each spectrometer are binned at R = 20, 100, 30 in FGS-NIRSpec (1.1-1.95 µm), AIRS-Ch0 (1.95-3.9 µm), and AIRS-Ch1 (3.9-7.8 µm), respectively. We find that 1, 3, and 4 observations are needed to achieve the Ariel Tier 3 required S/N for HD 189733 b, GJ 1214 b, and WASP-121 b, respectively. Then, we rescale the noise by the square root of the corresponding number of observations, assuming each observation has a Gaussian noise distribution. Finally, we attach the rescaled noise estimates to the respective transmission spectra, binned down to the Tier 3 wavelength grid. It should be noted that we do not scatter the spectra according to random noise corresponding to the estimated error bars. This was done throughout the paper to avoid introducing a susceptibility to individual random noise realizations, which would defeat the purpose of characterizing retrieval biases and finding intrinsic correlations between the atmospheric parameters. While using unscattered spectra may result in unrealistically precise constraints, if the spectra contain sufficiently redundant information, discrepancies vs. using scattered spectra should not be too large (Feng et al. 2018; Changeat et al. 2020a).
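The noise rescaling described above amounts to dividing the single-observation noise by the square root of the number of observations. A minimal sketch follows; the numbers of observations are those quoted in the text, while the function name and the 100 ppm single-observation noise values are hypothetical placeholders, not ArielRad outputs.

```python
import numpy as np

# Number of Tier 3 observations per planet, as quoted in the text.
N_OBS_TIER3 = {"HD 189733 b": 1, "GJ 1214 b": 3, "WASP-121 b": 4}

def rescale_noise(single_obs_noise, n_obs):
    """Noise after stacking n_obs observations with independent Gaussian noise."""
    return np.asarray(single_obs_noise, dtype=float) / np.sqrt(n_obs)

# Hypothetical single-observation noise per spectral bin, in ppm.
noise_single = np.array([100.0, 100.0, 100.0])
noise_final = {
    planet: rescale_noise(noise_single, n_obs)
    for planet, n_obs in N_OBS_TIER3.items()
}
```

The spectra themselves are left unscattered, as stated above; only the error bars change with the number of observations.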
Alternatively, given the noise for a single transit observation and the atmospheric spectrum, we could calculate a more realistic S/N that does not rely on the assumed atmospheric scale height (Mugnai et al. 2021). In that case, the S/N would depend on the assumed atmospheric spectrum, changing the number of observations on a single target. However, (i) an observability study is outside the scope of this paper, and (ii) in the following we show that even for GJ 1214 b (lower number of transits than realistic given the mean molecular weight assumed) we can investigate the main effects of interest.
All the spectra calculated with the methodology described in this section are shown in Figure 1. This includes 6 input configurations for each planet (listed in Table 3).

TauREx
As a retrieval tool, we used TauREx 3 (Tau Retrieval for Exoplanets; see footnote 3), a fully Bayesian inverse atmospheric retrieval framework (Al-Refaie et al. 2021). TauREx 3 consists of two main frameworks: the Forward Model framework and the Retrieval framework. The goal of the Retrieval framework is to fit a Forward Model to an observation. A Forward Model framework is necessary to provide information about the planet, the host star, temperature-pressure profiles as well as chemistry and contributions (e.g., collision-induced absorption (CIA), limited to H2-He and H2-H2 in the current study, Rayleigh scattering, gray clouds). TauREx 3 adopts the layer-by-layer approach for the temperature profile, which can be parameterized in different ways, such as isothermal, a radiative two-stream approximation, a custom profile loaded from a file, or a multi-point temperature profile. The vertical pressure profile is equally spaced in log-pressure, between a P_max and a P_min value specified by the user along with a number of layers N_l (N_l = 200 in the current study). The cloud model provided by TauREx 3 is discretized along P(z), allowing the user to define an opacity value in square meters for layers between the pressure at the top and the pressure at the bottom of the cloud deck.

3 https://github.com/ucl-exoplanets/TauREx3_public

Article number, page 6 of 38 Yassin Jaziri et al.: ARES VI: from 1D to 3D models
TauREx 3 supports equilibrium chemistry using the ACE chemical code (Agúndez et al. 2012, 2020), FastChem (Stock et al. 2018), GGchem (Woitke et al. 2018), and the Free chemistry model (Al-Refaie et al. 2022a). In our study, we explore different combinations of chemistry and contribution parameters. We performed retrievals with and without clouds, using ACE, FastChem, GGchem, or Free chemistry. To perform retrievals, TauREx 3 can use several sampling techniques: PyMultiNest and MultiNest, PolyChord, or dyPolyChord. For our study, we used the nested sampling retrieval algorithm MultiNest with its Python interface PyMultiNest (Feroz et al. 2009). The MultiNest algorithm samples the parameter space by subdividing it into a set of ellipsoids according to the likelihood. These ellipsoids can overlap, but in cases where there are several local maxima in the parameter space, the result will be a set of multiple solutions (as can be seen in Figure 3). For more details, see Feroz & Hobson (2008) and Feroz et al. (2019). TauREx 3 is a full Bayesian retrieval framework, which returns the best-fit transmission model spectrum along with all parameter posterior distributions and the Bayesian evidence. For model comparisons, we use the Bayesian evidence as defined by Trotta (2008) and Waldmann et al. (2015) to compute the logarithmic Bayes factor,
$$\Delta \log E = \log \frac{E_{\mathrm{model\,A}}}{E_{\mathrm{model\,B}}} = \log E_{\mathrm{model\,A}} - \log E_{\mathrm{model\,B}}, \qquad (1)$$
where E_model A and E_model B are the evidences of two competing models. According to Benneke & Seager (2013), by translating these Bayes factors into a statistical significance (Kass & Raftery 1995), ∆logE ≤ 2 can be considered a "weak" case of model distinguishability, while 2 < ∆logE < 5 corresponds to a "moderate" case and ∆logE ≥ 5 to a "strong" case. Since the Bayesian evidences of the different retrieval models depend on the number of free parameters, and since the Free chemistry model has significantly more free parameters than the equilibrium chemistry models, a Bayes factor that favors the Free chemistry model means a more significant improvement in the goodness of fit to the observations. Finally, we consider a molecule to be detected if its retrieved signature has a significance greater than 3σ.
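The thresholds quoted above can be sketched as a small helper. This is only an illustration of the classification rule stated in the text; the function names are hypothetical, and taking the absolute value (so that the sign only indicates which model is favored) is an assumption of this sketch.

```python
def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Logarithmic Bayes factor between two models, from their log-evidences."""
    return log_evidence_a - log_evidence_b


def distinguishability(delta_log_e):
    """Classify a log Bayes factor as 'weak', 'moderate', or 'strong'.

    Thresholds follow the text: <= 2 weak, between 2 and 5 moderate, >= 5 strong.
    Assumption of this sketch: only the magnitude matters for the label.
    """
    magnitude = abs(delta_log_e)
    if magnitude <= 2:
        return "weak"
    if magnitude < 5:
        return "moderate"
    return "strong"
```

For example, `distinguishability(log_bayes_factor(-100.0, -133.0))` falls in the "strong" class, since the two log-evidences differ by 33.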

Chemical model
We consider two types of chemistry in this study, which we detail here: Free chemistry and equilibrium chemistry.
Free chemistry assigns each chemical species a constant abundance throughout the atmospheric layers. The Free chemistry models consider the following species: H₂O, CO, CH₄, CO₂, HCN, NH₃, FeH, SiO, Na, K, TiO, and VO. This configuration gives the model a certain degree of freedom, as it imposes no physical or chemical constraints on what will be retrieved. In this way, it is possible to retrieve abundances corresponding to non-equilibrium chemistry, or various other species distributions, since the species are not correlated with each other. However, it could also retrieve unrealistic chemical abundances. It should be remembered that this is a 1D model, and it is therefore limited when faced with strong vertical variation (which could be compensated for by the two-layer method of Changeat et al. 2019).
Chemical equilibrium is based on temperature and pressure conditions. This assumption is a classic hypothesis when considering exoplanetary atmospheres (Seager & Sasselov 2000; Burrows et al. 2007, 2008; Fortney et al. 2008; Madhusudhan et al. 2011; Kataria et al. 2014; Al-Refaie et al. 2021). A system is in a state of thermodynamic equilibrium when it is in thermal, mechanical, and chemical equilibrium at the same time. This equilibrium is characterized by the minimum of a thermodynamic potential, such as the Gibbs free energy. It holds in exoplanet atmospheres when the dynamical timescales can be considered longer than the chemical reaction timescales and when irradiation by a dissociating or ionizing source (photochemistry or cosmic-ray-induced processes) is assumed to be negligible. Thus, chemical abundances vary with altitude according to the retrieved TP profile, as we assume they are at chemical equilibrium. As the chemical profiles are not forced to be vertically constant, this approach should be more accurate for real atmospheres than the Free chemistry approach. For very hot planets this approximation is close to reality; for cooler planets, on the other hand, vertical mixing and photodissociation affect the chemical composition and the atmospheres are no longer in thermodynamic equilibrium. This disequilibrium chemical composition must then be taken into account with a more complex kinetic model (Cooper & Showman 2006; Moses et al. 2011, 2012; Venot et al. 2012, 2020; Molaverdikhani et al. 2019; Tsai et al. 2021, 2022; Al-Refaie et al. 2022b). If the observed planet's atmosphere exhibits these non-equilibrium mechanisms, or longitudinal/latitudinal heterogeneities, the retrieved parameters will be erroneous. In such cases, it may not be possible to find a consistent fit, or the retrieval may find an adequate fit that corresponds to erroneous parameters.
Although several different algorithms have been developed to calculate the chemical composition at equilibrium, we focused on three of them for this study: ACE (Agúndez et al. 2012, 2020), GGchem (Woitke et al. 2018), and FastChem (Stock et al. 2018). The basic principle of these three models is the same: they start from an initial composition, made up of initial abundances for each species taken into account, and then iterate until convergence. However, each model has a slightly different procedure, whether in the species taken as input, the iteration method, or the network of chemical reactions. We detail these differences in this section.
ACE minimizes the total Gibbs free energy by applying the algorithm first introduced by White et al. (1958). ACE is based on an algorithm implemented in the NASA/CEA program and presented in detail in Gordon & McBride (1994). For a closed system of N chemical compounds at a given temperature and pressure, in the absence of disturbance (transport, UV radiation, etc.), the equilibrium chemical composition can be calculated theoretically from the standard-state chemical potentials, expressed as a function of the standard-state enthalpy and entropy of the species. These thermodynamic quantities can be calculated using NASA polynomial coefficients (see, e.g., McBride et al. 2002) from databases such as NASA/CEA (McBride et al. 2002) or the Third Millennium Thermochemical Database (Goos et al. 2016). The chemical species used include 105 neutral species composed of C, H, O, and N, more specifically species with up to 2 carbon atoms and the main nitrogen species (NH₃, HCN, N₂, NOₓ). The code has been validated for temperatures as low as 300 K. The reader is encouraged to consult Venot et al. (2012) for more details on the ACE thermodynamic coefficients and the calculation of thermochemical equilibrium.
Both FastChem and GGchem use a second type of method for determining the chemical composition at equilibrium. These two programs use the law of mass action and equilibrium constants, with some subtleties (for FastChem the equilibrium constants are based on the Gibbs free energy, while for GGchem they are based on partition functions). This amounts to solving a system of N algebraic equations with N unknowns, which correspond to the conservation equations for N elements. The partial pressure of each molecule is expressed from the partial pressures of its constituent atoms through the atomization equilibrium constant. This system of equations can then be solved by any root-finding algorithm, such as the Newton-Raphson method (Russell 1934; Brinkley 1947; Tsuji 1973).
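To make the mass-action approach concrete, here is a deliberately reduced toy example with a single element and a single molecule (H and H₂). It is a sketch under stated assumptions, not the FastChem or GGchem algorithm: those codes solve N coupled conservation equations, whereas this example solves only one. The function name, the equilibrium constant value, and the total hydrogen pressure are hypothetical. The molecular partial pressure is written as p_H2 = K p_H², and hydrogen conservation reads p_H + 2 p_H2 = p_total.

```python
def solve_atomic_pressure(k_eq, p_total, tol=1e-12, max_iter=100):
    """Solve p_H + 2*k_eq*p_H**2 = p_total for p_H with Newton-Raphson.

    k_eq    : atomization equilibrium constant, p_H2 = k_eq * p_H**2 (toy value)
    p_total : fictitious pressure of all hydrogen nuclei (conserved quantity)
    """
    p_h = p_total  # initial guess: all hydrogen in atomic form
    for _ in range(max_iter):
        residual = p_h + 2.0 * k_eq * p_h * p_h - p_total
        derivative = 1.0 + 4.0 * k_eq * p_h
        step = residual / derivative
        p_h -= step
        if abs(step) <= tol * max(p_h, 1e-300):
            return p_h
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical inputs in arbitrary pressure units.
K_EQ = 10.0
P_TOTAL = 1.0
p_h = solve_atomic_pressure(K_EQ, P_TOTAL)
p_h2 = K_EQ * p_h**2  # molecular partial pressure from the law of mass action
```

For this one-equation case the result can be checked against the closed-form root of the quadratic, p_H = (-1 + sqrt(1 + 8 K p_total)) / (4 K).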
The thermodynamic data used in FastChem are mainly from the NIST-JANAF database detailed in Chase (1998). The list of species has been modified to take into account molecules that may be of interest in astrophysics, with data from Tsuji (1973), Barin & Platzki (1995), Burcat et al. (2005), and Goos et al. (2016). The species used total 396 neutral and 114 charged species, and the code has been validated for temperatures down to 100 K and pressures up to 1000 bar. The reader is directed to Stock et al. (2018) for more details on the list of species used. We note that the FastChem code has recently been updated to FastChem 2 (Stock et al. 2022), which is more efficient but not yet available in combination with TauREx.
Unlike the other two codes we are using, GGchem takes condensation into account. The formation of liquids and solids in the atmosphere affects the composition at thermodynamic equilibrium. Condensed species can consume certain elements, leaving a significant difference in composition before and after condensation (Woitke & Helling 2004; Juncher et al. 2017). Condensation has an effect especially at temperatures below 2000 K, so this mechanism will mainly affect GJ 1214 b and HD 189733 b in our work. The code includes data for 552 molecules and 257 condensates, including 38 liquids, and GGchem has been proven robust down to 100 K. All elements from hydrogen to zirconium are included, as well as the option to add tungsten and charges. We refer the reader to Woitke et al. (2018) for more details on the list of species. Note that the active molecules that can appear as features in the spectra are H₂O, CO, CH₄, CO₂, HCN, NH₃, FeH, SiO, Na, K, TiO, and VO. This list depends on the opacity files loaded in the TauREx program and is independent of the type of chemical model chosen.

Retrieval procedure
All spectral configurations are retrieved with the same set of retrieval models. This set of retrieval models covers all the simulated configurations. We expect each simulated spectrum to be best retrieved by the corresponding retrieval model.
Each retrieval model assumes a four-point temperature-pressure (TP) profile (T_top, T_surface, T_1, and T_2), with the corresponding pressure levels P_1 and P_2 free to converge between P_surface and P_top. It has already been shown (Rocchetto et al. 2016; Pluriel et al. 2022) that retrieving an isothermal temperature profile generates biases for hot planets. We also retrieve the radius of the planet. We duplicate our retrievals by adding a gray cloud level as a parameter. Finally, we took into account two chemical configurations in the retrievals, which we call Free and equilibrium chemistry (see Section 3.2 for more details).
For the Free models, we retrieve one abundance value for each considered species: H₂O, CO, CH₄, CO₂, HCN, NH₃, FeH, SiO, Na, K, TiO, and VO. Note that for GJ 1214 b we did not retrieve Na, K, TiO, and VO because these species cannot be in gaseous form under these temperature and pressure conditions. Equilibrium chemistry is calculated using three different models included in TauREx: ACE, FastChem, and GGchem (details in Section 3.2). Metallicity (Z) and the C/O ratio, from which the abundance profiles are derived, are retrieved. ACE, FastChem, and GGchem have already been shown in Al-Refaie et al. (2022a) to be equivalent when using the same molecules and without condensation for GGchem. Here, we therefore consider all the molecules of each model, as well as condensation for GGchem.
Therefore, we end up with 4 retrieval models (Free, ACE, FastChem, GGchem) for each of the 3 planets (GJ 1214 b, HD 189733 b, WASP-121 b), considering each of the 6 input configurations (listed in Table 3). This makes 4 × 3 × 6 = 72 retrievals (×2 with clouds). For better readability of the large number of results, we do not show GGchem results for GJ 1214 b, where the temperature is not high enough to be affected by the additional condensation considered by GGchem, and we do not show ACE results for WASP-121 b, where the missing species in ACE mean that the model is not representative of this type of planet (see Section 3.2 for more details on these chemical models). Retrievals are compared to each other using their relative Bayes factor as described in Section 3.1, Equation 1, following the same idea developed in Tsiaras et al. (2018) with a baseline. However, we define here the Bayes factor (∆logE) as the difference with respect to the logarithmic evidence of the worst model (the one with the lowest evidence). This allows us to compare the different models with each other, as is done in Panek et al. (2023).
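The bookkeeping described above can be sketched as follows: the retrieval grid is enumerated, and each model's ∆logE is taken relative to the model with the lowest log-evidence. The function name and the log-evidence values are hypothetical and only serve to illustrate the baseline convention; they are not results from the study.

```python
from itertools import product

MODELS = ["Free", "ACE", "FastChem", "GGchem"]
PLANETS = ["GJ 1214 b", "HD 189733 b", "WASP-121 b"]
N_CONFIGURATIONS = 6  # input configurations per planet

# One retrieval per (model, planet, input configuration) combination.
retrievals = list(product(MODELS, PLANETS, range(N_CONFIGURATIONS)))


def relative_log_evidence(log_evidences):
    """Return each model's log-evidence minus that of the worst (lowest) model."""
    baseline = min(log_evidences.values())
    return {name: value - baseline for name, value in log_evidences.items()}


# Hypothetical log-evidences for one planet and one input configuration.
example = {"Free": 1520.0, "ACE": 1487.0, "FastChem": 1485.5, "GGchem": 1486.0}
delta_log_e = relative_log_evidence(example)
```

With this convention the worst model always has ∆logE = 0, and every other value is non-negative, which makes the models directly comparable within one configuration.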
Table 4 shows the free parameters and priors for the retrievals. We used uniform sampling in log space for the chemical abundances in the Free chemistry model, the metallicity, the pressures P_1 and P_2, and the pressure of the clouds, and uniform sampling in linear space for the temperatures, the radius, and the C/O ratio.
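Nested samplers such as MultiNest draw points in a unit hypercube and map them to physical parameters through the prior. The two kinds of priors mentioned above can be sketched as follows. The function names and the bounds are hypothetical (the actual priors are those of Table 4), and this is not the TauREx 3 or PyMultiNest interface.

```python
import math


def uniform_prior(u, lower, upper):
    """Map u in [0, 1] to a value sampled uniformly in linear space."""
    return lower + u * (upper - lower)


def log_uniform_prior(u, lower, upper):
    """Map u in [0, 1] to a value sampled uniformly in log10 space."""
    log_lower, log_upper = math.log10(lower), math.log10(upper)
    return 10.0 ** (log_lower + u * (log_upper - log_lower))


# Hypothetical bounds, for illustration only.
temperature = uniform_prior(0.5, 100.0, 4000.0)      # K, linear prior
abundance = log_uniform_prior(0.5, 1.0e-12, 1.0e-2)  # mixing ratio, log prior
```

The log-uniform prior gives equal weight to each decade, which is why it is a natural choice for quantities such as abundances and pressures that span many orders of magnitude.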

From 1D to 3D
In this section, we present how transmission spectra are affected when considering 1D or 3D GCM models, with or without equilibrium chemistry. We have carried out this study from warm to ultra-hot planets, but with some specific differences in metallicity and in planet gravity at the theoretical surface. This highlights the fact that it is not just the effective temperature trend that introduces biases, but that other parameters are also involved.

Constant chemistry
We first focus on the spectra simulated assuming constant chemistry, which correspond to the left panels of Figure 2. In this case, we consider that the abundances are constant everywhere in the atmosphere, regardless of the temperature and pressure conditions. Thanks to this assumption, the effects on the spectra are due only to the thermal structure of the atmosphere. For our coldest case, GJ 1214 b, we see strong differences between the 1D and 3D transmission spectra. Indeed, the 3D GCM model of GJ 1214 b (Charnay et al. 2015) shows large day-night asymmetries, with an extended hot day side combined with a hot spot shifted 30° eastward. The metallicity of 100 times solar is likely to result in a higher day-night temperature contrast than a solar metallicity, as the chemical composition of the atmosphere can have a significant impact on the efficiency of day-night energy redistribution (Kataria et al. 2014). Due to this shift, the light coming from the star probes the hot day side, which has a larger scale height than the limb (which is used to compute the 1D temperature profile of the pseudo-1D model). This implies differences of hundreds of ppm in the transit depth, in particular in the bands of the major absorbers, such as carbon dioxide and methane. However, in the water-dominated bands in the far infrared and in the visible, we see very small differences between the 1D and 3D transmission spectra, on the order of 50 ppm. The altitude where the atmosphere is opaque in these bands is indeed deep in the troposphere (around 100 mbar). Thus, even when we take into account the inflated day side in 3D, the effective radius observed is very similar to that in 1D, because the region where the atmosphere is opaque remains at the same place at the limb. Therefore, the impact of the 3D structure of the atmosphere depends on the wavelength and the composition.
Similar results are found for the hottest case studied, WASP-121 b. 3D GCM models show that for highly irradiated, tidally locked atmospheres, the radiative timescale becomes substantially shorter than the dynamical timescale, implying that almost no heat is transported from the day side to the night side. This results in an extremely large day-side scale height compared to the night side because of the large day-night temperature contrast. Indeed, Keating et al. (2019) showed that regardless of the day-side temperature, the night-side temperature of short-period gas giants is relatively uniform, around ∼1100 K. This very inflated day side is thus mainly probed during the transit, which explains why the 3D transmission spectra are larger than the 1D ones by thousands of ppm. Unlike the previous case, GJ 1214 b, all absorption bands are shifted. The atmosphere is so inflated for this ultra-hot Jupiter that the altitude at which the atmosphere is opaque is much higher, due to the much greater scale height.
Interestingly, the results are very different in our intermediate case, HD 189733 b. Here, despite a strong day-night temperature contrast, we observe less than a 50 ppm difference between the 1D and 3D transmission spectra. This means that the west limb represents the observable well and that such atmospheres are more homogeneous than colder or hotter atmospheres. This is due to the high surface gravity of the planet, which mitigates the scale height differences between the different sides of the planet. Thus, using only the limb is a fair representation of the observable. These 3D effects are independent of the differences between the east and west limbs, even though both are related to temperature variations. Figure A.1 shows the transmission spectra of the two limbs independently, compared to the whole spectrum. This shows that we have similar features but with a different scale height, since the western limb is cooler than the eastern limb (see Figure C.2). The differences are of the order of 100 to 600 ppm. This is also observed by Espinoza & Jones (2021), who show that these differences can be observed by JWST.
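The role of temperature and gravity discussed in this subsection can be illustrated with the standard pressure scale height of an isothermal ideal-gas atmosphere, H = k_B T / (μ m_u g). The sketch below is only a worked illustration: the temperatures, mean molecular weight, and gravities are hypothetical round numbers, not the parameters of the three planets studied here.

```python
K_B = 1.380649e-23        # Boltzmann constant, J/K
M_U = 1.66053906660e-27   # atomic mass unit, kg


def scale_height(temperature, mean_molecular_weight, gravity):
    """Pressure scale height in meters: H = k_B T / (mu * m_u * g)."""
    return K_B * temperature / (mean_molecular_weight * M_U * gravity)


# Hypothetical values: a 500 K contrast between a hot day side and a cooler limb,
# in an H2-dominated atmosphere (mu = 2.3), for two surface gravities in m/s^2.
T_DAY, T_LIMB, MU = 1500.0, 1000.0, 2.3
delta_low_g = scale_height(T_DAY, MU, 10.0) - scale_height(T_LIMB, MU, 10.0)
delta_high_g = scale_height(T_DAY, MU, 20.0) - scale_height(T_LIMB, MU, 20.0)
```

For the same temperature contrast, doubling the gravity halves the difference in scale height between the two regions, which is the sense in which a high surface gravity mitigates the 3D effect on the spectrum.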

Equilibrium chemistry
We now look at the equilibrium chemistry simulations, which are shown in the right panels of Figure 2. The simulated spectra are now affected by both chemical and thermal heterogeneities, which is more realistic.
For GJ 1214 b, the global picture has completely changed compared to constant chemistry. The largest difference (1000 to 2000 ppm) concerns the CO₂ bands at 2.5 and 4.5 µm. The temperature profile of the 1D model is not hot enough to obtain abundant CO₂ at equilibrium, whereas the day side of the planet in the 3D model reaches a temperature where CO₂ is abundant enough to show strong signatures. Furthermore, as explained above, we probe a non-negligible part of the planet's day side, hence the presence of broad CO₂ bands. We also see differences in the water bands (between 1 and 2 µm) that were not present in the constant chemistry model. The reason for these differences is the longitudinal variation in water abundance, which is lower on the day side of the planet. Light from the star then probes deeper regions, corresponding to a lower transit depth. Between 5 and 9 µm, as well as around 3.5 µm, in the region of the methane bands, the 1D and 3D spectra show fewer discrepancies than in the rest of the spectrum. Indeed, looking at the methane abundances in Figure D.7, we see that methane is drastically reduced on the day side above 100 mbar, which is deeper than where we probe. As a result, in these bands we are not affected by the hot day side and we are mainly probing the limb, which is equivalent to the 1D spectrum.
For WASP-121 b, the 1D and 3D spectra show few differences over the whole wavelength range using equilibrium chemistry, compared to constant chemistry. As shown in Figure D.9, the abundances of almost every species are drastically diminished on the day side, mainly due to thermal dissociation (Parmentier et al. 2018), with the exception of carbon monoxide, which is only divided by two due to its dilution in an H-dominated day side instead of an H₂-dominated atmosphere. This implies that, in the water bands, the spectrum is not affected by the day side of the atmosphere because water is almost absent there. This is why a 1D model manages to fit the spectrum, as shown in Pluriel et al. (2020a). However, we can see in the residuals of Figure 2 that in some bands the fit is clearly worse than in the others. This is particularly true for the CO bands around 2.5 and 4.5 µm, as well as for the TiO and VO bands in the visible. As we explained, CO is present on the day side of the atmosphere, where extreme temperatures induce a scale height far greater than the scale height of the limbs. Consequently, the 1D model does not represent this behavior well, resulting in a large difference (around 300-400 ppm) in these regions of the spectrum.
The differences between 1D and 3D for HD 189733 b with equilibrium chemistry are very small, as with constant chemistry. Indeed, we see in Figure D.8 that the main species, such as H₂O, CH₄, CO₂, and NH₃, do not display strong longitudinal variations. In addition, we have shown above that, due to similar scale heights, there is no significant difference between the 1D and 3D spectra with constant chemistry. As a consequence, a 1D model at the limb is equivalent to the 3D model, meaning that for such warm atmospheres we are probing near the limb.
These 3D effects are independent of the differences between the east and west limbs, as explained for the constant chemistry.

Cloud effect
Even if we have good reasons to think that cloud decks are present in many exoplanets (Parmentier et al. 2016; Tsiaras et al. 2018), each atmospheric model used in this study, 1D or 3D, for the three planets, is cloudless. It would be interesting to add clouds to the simulations, in particular because they affect the short wavelengths observed by JWST and Ariel. However, the aim of this study is to see the impact of chemical and thermal 3D effects on the transmission spectra and how to deal with these 3D effects in the context of atmospheric characterization using 1D retrieval models. For this reason, we chose not to over-complicate our models. Clouds would require a dedicated paper. Clouds are nevertheless part of the retrieval parameters in the TauREx model. This can break possible degeneracies (Pluriel et al. 2022; Changeat et al. 2020b), and we verified that the model works correctly by not retrieving a cloud layer when we knew that none was implemented. For each retrieval performed, the Bayes factor always favored a cloudless model, and in the retrievals assuming clouds, the cloud deck was always pushed close to the surface pressure, and thus had no impact on the spectra. We thus decided not to present retrievals including clouds in this paper, as they bring no additional information compared to cloudless retrievals.

What if atmospheres are 1D?
We used the theoretical 1D atmosphere (see Section 2.1 for its construction) to check the correct behavior of the 1D retrieval code. We recall that the three planets were chosen to study the retrieval biases as a function of the effective temperature of the planets, from warm Neptune to ultra-hot Jupiter. In addition, the chemical construction with either a constant profile or equilibrium chemistry was considered in order to separate the retrieval biases due to temperature (constant chemistry) from those due to chemistry (equilibrium chemistry). To do so, Free retrievals (constant chemistry) and equilibrium retrievals are both performed for each configuration (more details in Section 3.3).
Figure 3 and Tables B.1, B.2, and B.3 show that for all configurations the best retrieval is consistent with the input model configuration. This means that Free retrievals better fit the constant input chemistry and equilibrium retrievals better fit the equilibrium input chemistry. However, the best retrieval is not always significantly better than the others, and the equilibrium retrieval models are not all suited to the different configurations. Figures C.1, D.1, D.2, and D.3 show that the temperature and species profiles are mostly not well retrieved below the probed altitudes (deeper than ∼10² Pa). This part of the atmosphere does not contribute to the features of the spectra, which explains why it is not well retrieved. However, the retrieved values can appear well constrained while the input values are often outside the uncertainty. Thus, we cannot trust the retrieved profiles of the lower atmosphere.

Temperate-Warm planet: GJ 1214 b
The best retrieval is consistent with the input model configuration and significantly better than the other models with ∆logE ≥ 19 for constant input chemistry and ∆logE ≥ 7 for equilibrium input chemistry (Figure 3 and Table B.1).
- Constant input chemistry: For pressures lower than ∼10² Pa (high altitudes), the Free retrieval shows a significantly better retrieved temperature profile, within 60 K of the input profile (Figure C.1). It shows that even at these "low" temperatures we already have difficulty retrieving the input model perfectly.
To summarize for GJ 1214 b: the retrievals work well on the temperature, but not without some biases on the chemistry. Constant chemistry and 1D temperature profiles are retrieved by the Free retrieval within -0.15 dex. However, even at such "low" temperatures, equilibrium input chemistry is best retrieved by equilibrium models because of species with strong vertical variations (here NH₃). In addition, not only does TauREx have difficulty accurately retrieving the C/O ratio and the metallicity (96% and 19% deviation, respectively, for the best model, ACE), but the values closest to the input are also not retrieved by the most significant model, ACE, despite the fact that the best retrieval, ACE, is consistent with the input ACE equilibrium chemistry.
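Since this summary quotes deviations both in dex (for abundances) and in percent (for the C/O ratio and the metallicity), a small conversion helper can make the two scales comparable. This is a sketch with hypothetical function names; it assumes that a percent deviation means (retrieved - input) / input, which is an assumption of this example rather than a definition taken from the text.

```python
import math


def dex_to_factor(dex):
    """Multiplicative factor corresponding to an offset in dex (log10 units)."""
    return 10.0 ** dex


def factor_to_dex(factor):
    """Offset in dex corresponding to a multiplicative factor."""
    return math.log10(factor)


def relative_deviation_to_dex(relative_deviation):
    """Convert a relative deviation, (retrieved - input) / input, to dex.

    Assumption of this sketch: a deviation of 0.19 stands for +19%.
    """
    return math.log10(1.0 + relative_deviation)


# An offset of -0.15 dex corresponds to a retrieved value of about 0.71 times the input.
factor = dex_to_factor(-0.15)
```

Read this way, a +0.30 dex overestimate is roughly a factor of two in abundance, while a 19% deviation is below 0.1 dex.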

Warm-hot planet: HD 189733 b
The best retrieval is consistent with the input model configuration and significantly better than the other models with ∆logE ≥ 2536 for constant input chemistry and ∆logE ≥ 200 for equilibrium input chemistry (Figure 3 and Table B.2).
- Constant input chemistry: For lower pressure than ∼10² Pa (high altitudes), Free retrieval shows a significantly better fit
- Equilibrium input chemistry: For pressures lower than ∼10² Pa, the Free and FastChem retrievals overestimate the temperature profile by more than 250 K, while ACE and GGchem stay mostly within 250 K of the input values. However, these two chemical models underestimate the temperature in the upper atmosphere by more than 500 K (Figure C.1). In this peculiar configuration, with warm-hot temperatures, species such as TiO, VO, and K are close to their condensation temperature. Compared to the isothermal configuration of Al-Refaie et al. (2022a), condensation occurs at mid and high altitudes, where the temperature is lower than in the deep atmosphere. Thus, a strong bias occurs with FastChem, which considers these species but not their condensates. Therefore, FastChem cannot properly retrieve this configuration: it gives TiO, VO, and K strong features in the visible that should be absent because of their condensation, and it shows wrong species profiles. GGchem, which considers the condensation of these species, should in theory solve this issue, but surprisingly it does not. It gives a better fit to the temperature profile and species abundances than FastChem. Still, it does not manage to give a consistent abundance of K, which is overestimated by a factor of 2 and shows a strong feature in the visible that biases the whole spectrum, since all chemical abundances are correlated. This is explained by the discrepancy between the K abundance calculated by GGchem and the input abundance calculated by Parmentier et al. (2016). We are around 1000 K, at the K condensation limit. This implies a strong variation in K abundance.
To summarize for HD 189733 b: the retrievals work well on the temperature for constant input chemistry (within 150 K), but for equilibrium input chemistry they begin to have difficulty retrieving the top of the atmosphere (underestimated by 500 K), as well as the bottom of the atmosphere. Only the pressures corresponding to the highest atmospheric contribution (between ∼10² and ∼10⁰ Pa) are well retrieved (within 250 K). In addition, we encounter even more chemical abundance bias than for GJ 1214 b. Constant chemistry profiles are best retrieved by the Free retrieval model (which is consistent), with an overestimation of +0.23 to +0.36 dex. Due to the strong vertical variation, the equilibrium configuration is better retrieved by an equilibrium model (ACE), which is not the expected one (GGchem), but which fits the main absorber H₂O very well, to within 0.01 dex. However, the study performed on this planet shows all the limitations of the 3 chemical equilibrium models for retrievals. At this temperature, the role of condensates (such as TiO, VO, and K here) is essential because of their features in the visible. The discrepancies between the K chemical modeling of GGchem and that of Parmentier et al. (2016) bias the GGchem retrieval. This is a good example of what can be encountered when fitting observations with a model that does not perfectly reproduce the true chemistry. The C/O ratio and the metallicity are well retrieved by ACE (best model), but not as well as by the Free retrieval. Our results show that the values closest to the input can be retrieved by a model that is not statistically favored.

Ultra-hot planet: WASP-121 b
The best retrieval is consistent with the input model configuration and significantly better than the other models, with ∆logE ≥ 434 for constant input chemistry and ∆logE ≥ 71 for equilibrium input chemistry (see Figure 3 and Table B.3). We observe a higher overall uncertainty in the retrieved temperature and chemical species profiles compared to the other two planets. The very high temperature, coupled with a sharp day-to-night gradient, brings a complexity that is more difficult to retrieve with a simpler model.
- Constant input chemistry: In the upper atmosphere, the retrieved thermal profiles are either overestimated or underestimated by ∼1000 K (Figure C.1). This is far from the input temperature profile (∼50% deviation). However, close to the pressures corresponding to the highest atmospheric contribution, between ∼10² and ∼10⁰ Pa, the retrieved profiles remain within 500 K (less than 25% deviation), while the best model (Free) is not significantly better. The abundances of the main absorbers, H₂O and CH₄, and of TiO and FeH in the visible, are overestimated by +0.22 to +0.29 dex for the Free retrieval (best model). Despite these high values, this is still a better fit than equilibrium chemistry. The strong vertical variation in equilibrium chemistry under these temperature and pressure conditions cannot match the constant input chemistry (see Figure D.3).
- Equilibrium input chemistry: We observe here the same behavior as for constant input chemistry for the retrieved temperature profiles (see Figure C.1). There is less than a 10% deviation between ∼10² and ∼10⁰ Pa for the best retrieval, GGchem, but the retrieved profiles are mostly erroneous outside this range (reaching more than 50% deviation). Considering the strong temperature gradient between ∼10² and ∼10⁰ Pa, we observe the same behavior as for HD 189733 b regarding the condensation of species such as TiO, VO, and K. This makes GGchem a better model than FastChem in this case, as shown in Figures 3 and D.6. Although GGchem gives the best fit, none of the retrieval models fits all the main absorbers (H₂O, TiO, CO₂, CO, and VO) better than the others. The Free retrieval is better for H₂O and CO₂, while GGchem is better for the other absorbers (despite a strong deviation for both models, reaching +0.50 dex). In addition, the best model, GGchem, retrieves an erroneous C/O ratio, with a deviation of 80%, and an erroneous metallicity, with a deviation of 530%. In contrast, the Free model ends up giving values close to the input, within 13% (see Table B.3 and Figure E.3). Thus, all the models retrieved different parts of the input, but none of them obtained the entire structure.
To summarize for WASP-121 b: the retrievals perform poorly outside the pressures corresponding to the highest atmospheric contribution (between ∼10² and ∼10⁰ Pa). Due to strong large-scale vertical variations in temperature and species, 1D temperature profiles for constant and equilibrium chemistry are retrieved at best to within only 25% between ∼10² and ∼10⁰ Pa. Constant input chemistry is best retrieved by the Free retrieval, overestimated by +0.22 to +0.29 dex, and equilibrium input chemistry is best retrieved by the GGchem retrieval, but with a deviation reaching ∼+0.50 dex. The retrieval models are consistent with the input configurations, but not without bias on absolute values. GGchem retrieves the wrong C/O ratio and metallicity, while the Free model is within 13%. The strong large-scale vertical gradient brings a complexity that is difficult to retrieve correctly with a simpler model, which also translates into greater uncertainty in the retrievals.

Summary
The theoretical 1D analysis validates the consistency of the 1D retrievals, but not without a few biases. It shows that below the probed altitudes we can trust neither the retrieved parameters nor the uncertainties given by the models, which are largely underestimated. This is also the case at the top of the atmosphere, as we move towards warmer planets.
Furthermore, equilibrium chemical models are not equivalent and give significantly different results:
- ACE cannot retrieve ultra-hot planets, whereas it might be a better approximation for cooler planets.
- FastChem is biased for warm, and even hot, planets, where species are close to or below their condensation temperature. It is never the best retrieval model.
- GGchem, which should be a complete model, is still in competition with a simplified model for cold and warm planets. It is apparently the best option for hot planets, but the Free retrieval can give better retrieved values.
We need to be careful with equilibrium models. Our study shows that if one part of the chemistry modeling is wrong, all the chemical abundances will be biased, since everything is correlated. This agrees with the conclusion of Al-Refaie et al. (2022a), who used an isothermal configuration.

Constant chemistry
Given the 3D thermal structure, we used theoretical atmospheric models with constant chemistry (see Section 2.2) to disentangle the 3D effects of temperature without being biased by chemistry. Section 4.3 confirms the overall correct behavior of the retrieval code and highlights the biases we may encounter, so we can be confident in using this approach to focus on thermal 3D effects. To this end, Free retrievals (constant-with-depth chemistry) and equilibrium retrievals are performed for each configuration (more details in Section 3.3). The aim is also to compare the biases of the JWST and Ariel instruments.
Figure 3 shows that the Free retrieval finds the best solution compared to equilibrium chemistry (except for GJ 1214 b in the Ariel configuration). This is consistent with the input constant chemistry. For GJ 1214 b in the Ariel configuration, the ACE retrieval has a better Bayes factor, but it lies within 1.5 of the others. Therefore, all models are statistically equivalent and none is preferred. We observe a stronger deviation of the Bayes factor for the JWST configuration than for the Ariel one. This could simply be due to the higher resolution over a larger wavelength range for the JWST configuration, or it may be linked to particular wavelength bands, such as the lack of data points in the visible for the Ariel configuration, where strong features for hot planets (such as TiO, VO, and K) are located.
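As an illustration of how such an evidence comparison can be read, the following minimal Python sketch ranks retrieval models by log-evidence and flags those lying within a chosen threshold of the best one. The function name, the threshold argument (set here to the 1.5 quoted in the text), and the log-evidence values are hypothetical and are not taken from Table B.1. Note also that Figure 3 uses the lowest evidence as its reference, whereas this sketch measures the difference to the best model.

```python
def rank_by_evidence(log_evidence, threshold=1.5):
    """Rank models by log-evidence (hypothetical helper, not the paper's code).

    Returns a list of (name, delta_log_e, equivalent_to_best) tuples sorted
    from best to worst, where delta_log_e is the difference to the best model.
    """
    best = max(log_evidence.values())
    ordered = sorted(log_evidence.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, best - log_e, (best - log_e) <= threshold)
            for name, log_e in ordered]


# Made-up log-evidences for four retrieval models (illustration only).
example = {"Free": 1000.2, "ACE": 1001.0, "FastChem": 999.8, "GGchem": 1000.5}
ranking = rank_by_evidence(example)

for name, delta, equivalent in ranking:
    status = "equivalent to best" if equivalent else "disfavoured"
    print(f"{name:9s} delta logE = {delta:5.2f}  ({status})")
```

With these made-up numbers every model lies within the threshold, which corresponds to the situation described above where no model is preferred.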
Temperate-Warm planet: GJ 1214 b
All retrieval models for the Ariel spectrum are within a Bayes factor deviation of 1.5, which makes all models equivalent (see Table B.1). However, the JWST spectrum is better retrieved by the Free retrieval, with ∆logE = 33 compared to the second-best model. The higher resolution of the JWST spectrum, compared to the Ariel spectrum, gives more constraints on the H₂O and CH₄ features at short wavelengths (between 1 and 2 µm), but probably also at longer wavelengths (above 4 µm). Yet, this does not translate into closer-to-truth retrieved profiles, although the retrieved uncertainties are smaller. Figures C.2 and C.3 show consistent input and retrieved temperature profiles for GJ 1214 b. At low altitudes (deeper than ∼10² Pa) the uncertainty is high because these altitudes are not probed. Higher in the atmosphere (probed altitudes), the day-night temperature variation is within 300 K and all models retrieve the limb profiles. H₂O and CH₄ are responsible for the main features of the spectra. Both values retrieved by the Free retrieval are around 0.10 dex lower than the input. For the JWST spectrum, the uncertainty is lower but the input value is not within the uncertainty, while for the Ariel spectrum the CH₄ input value is within the uncertainty. Thus, even with more data points, the retrieved chemical abundances are not closer to the input and the uncertainties cannot be trusted.
Warm-hot planet: HD 189733 b
The best retrieval (Free) is consistent with the input model configuration and significantly better than the other models, with ∆logE ≥ 3048 for the JWST spectrum and ∆logE ≥ 34 for the Ariel spectrum (see Figure 3 and Table B.2). Discrepancies between retrieval models are the same as those explained in Section 4.3. Figures C.2 and C.3 show that the biases on the temperature profiles are the same as in Section 4.3. Temperatures retrieved below ∼10² Pa correspond mainly to those of the limb. For the Ariel spectrum, the temperature is slightly warmer, but the solution for the JWST spectrum remains within the Ariel uncertainties, which are significantly higher. Equilibrium chemistry cannot reproduce the constant input chemistry, while the Free retrieval gives consistent results but not without significant deviation (between +0.18 and +0.35 dex). Even with a variation of less than 500 K between the day and night sides, temperature variation alone can bias the retrieval of species abundances. While VO is largely retrieved in the JWST spectrum, it is not in the Ariel spectrum. Retrieved uncertainties on the Ariel spectrum are larger than on the JWST spectrum. However, with the exception of VO, the same molecules are present with both observatories. The lack of abundant VO with Ariel is due to the coarser spectral resolution in visible light, which is still sufficient to detect TiO.

Ultra hot planet: WASP-121 b
The best retrieval (Free) is consistent with the input model configuration and significantly better than the other models, with ∆logE ≥ 1310 for the JWST spectrum and ∆logE ≥ 60 for the Ariel spectrum (see Figure 3 and Table B.3). Discrepancies between retrieval models are the same as those explained in Section 4.3. Figures C.2 and C.3 show that the Free retrieval (best model) finds a temperature at the top of the atmosphere higher than the input model, and a temperature at the bottom of the atmosphere lower than the input model. The temperature transition occurs where species absorption contributes the most, at ∼10² Pa. The temperature gradient is steep, crossing all possible temperatures from the day side to the night side. The chemical abundances of the main absorbers H₂O, TiO, and CO are retrieved between -0.25 and -0.67 dex for the JWST spectrum and between -0.28 and -0.77 dex for the Ariel spectrum, with the exception of TiO, which deviates by +0.03 dex from the input value. However, the TiO feature does not match the input spectrum for either the JWST or the Ariel configuration. The retrieved spectra are outside the uncertainties by several sigma. Surprisingly, the much higher JWST resolution in the visible compared with Ariel does not provide better constraints on TiO. While we already encounter difficulties in retrieving input values with a 1D atmosphere, the huge temperature gradient between day and night brings even more biases. The end result is a temperature mixing the day and night sides that does not allow the retrieval models to recover the input spectra and species profiles. The best model does not even retrieve a thermal inversion.
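To give a sense of scale for the dex offsets quoted above, the short Python sketch below converts a logarithmic offset into a multiplicative factor on the abundance. It assumes only the standard definition of dex as a base-10 logarithmic offset; the function names are illustrative and are not part of any retrieval code used in this work.

```python
def dex_to_factor(offset_dex):
    """Multiplicative factor on abundance for a base-10 logarithmic offset."""
    return 10.0 ** offset_dex


def dex_to_percent(offset_dex):
    """Relative change in percent corresponding to an offset in dex."""
    return (dex_to_factor(offset_dex) - 1.0) * 100.0


# Offsets quoted in the text for WASP-121 b (JWST spectrum, Free retrieval).
for offset in (-0.25, -0.67, 0.03):
    print(f"{offset:+.2f} dex -> factor {dex_to_factor(offset):.3f} "
          f"({dex_to_percent(offset):+.1f} %)")
```

An offset of -0.67 dex therefore corresponds to a retrieved abundance of roughly one fifth of the input value, whereas +0.03 dex is a change of only a few percent.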

Summary
The conclusion in Section 4.3 remains the same in 3D. Secondly, using a 4-point temperature profile for retrievals gives good results for the cooler planets but not for the hotter ones, which need at least 2D retrievals, as has already been pointed out in more detail by Pluriel et al. (2022). The higher resolution of the JWST spectrum, particularly in the visible, reduces uncertainties but does not provide a better fit. In addition, the input values fall outside the retrieved uncertainties, making the latter unreliable. This is probably due not only to the high resolution, but also to the good input signal-to-noise ratio, which can be improved by binning down the spectra. Thus, lower resolution would still result in extremely small error bars and over-confident retrieval results. The lower atmosphere is still poorly retrieved, especially as we move towards hotter planets. Nevertheless, it is still possible to find the presence of the input species. Finally, the low resolution of the Ariel spectrum in the visible wavelength range misses the presence of the visible absorber VO but never TiO, which is retrieved similarly to the JWST spectrum.
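The effect of binning on the error bars can be sketched as follows, assuming independent Gaussian noise on each spectral point and an inverse-variance weighted mean per bin. This is a generic illustration and not the PandExo or retrieval procedure used in this work; the function name and the numerical values are hypothetical.

```python
import math


def bin_spectrum(depths, errors, bin_size):
    """Bin a spectrum with inverse-variance weights (illustrative helper).

    Assumes independent Gaussian errors. Trailing points that do not fill
    a complete bin are dropped.
    """
    binned_depths, binned_errors = [], []
    for start in range(0, len(depths) - bin_size + 1, bin_size):
        chunk_d = depths[start:start + bin_size]
        chunk_e = errors[start:start + bin_size]
        weights = [1.0 / e ** 2 for e in chunk_e]
        total = sum(weights)
        binned_depths.append(sum(w * d for w, d in zip(weights, chunk_d)) / total)
        binned_errors.append(1.0 / math.sqrt(total))
    return binned_depths, binned_errors


# Eight made-up points with identical transit depth and 40 ppm errors.
depths = [0.0150] * 8
errors = [4e-5] * 8
binned_depths, binned_errors = bin_spectrum(depths, errors, bin_size=4)
print(binned_errors)  # each bin error is reduced by sqrt(4) = 2
```

Under these assumptions the error per bin scales as the inverse square root of the number of points, so reducing the resolution trades spectral detail for smaller error bars rather than making the retrieval less confident.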

Equilibrium chemistry
Here the 3D thermal structure, as well as equilibrium chemistry, are considered as input. Considering the conclusions of Sections 4.3 and 4.4.1, this will highlight biases due to the variability of chemical abundances in the atmospheres of warm to ultra-hot planets. Section 4.4.1 has already shown the temperature biases arising from the 3D structure. As before, Free retrievals (constant-with-depth chemistry) and equilibrium retrievals are both performed for each configuration (more details in Section 3.3).
Figure 3 shows that the equilibrium retrievals always find the best solution compared to Free chemistry (except for the WASP-121 b JWST configuration). This is consistent with the input equilibrium chemistry. The retrieval models of the Ariel spectrum deviate less from each other than those of the JWST spectrum. This is due to the lower spectral resolution across all wavelength bands, but particularly in the visible bands. The chemical species contributing to the spectral signatures differ between the visible and the infrared: TiO, VO, and K against H₂O, CH₄, CO₂, CO, and NH₃. If we try to retrieve both parts at the same time, both will be biased, as the signatures may come from different parts of the atmosphere, at different temperatures. While a larger spectral range could help interpret the observations more accurately (Benneke & Seager 2013; Welbanks & Madhusudhan 2019), we show that, depending on the chosen range and the model used, this could bias the retrieval compared to the input data.

Temperate-Warm planet: GJ 1214 b
The ACE model provides the best fit to both the JWST and Ariel spectra, with ∆logE ≥ 10 and ∆logE ≥ 5 respectively (see Table B.1), which is consistent with the ACE input chemistry modeling. Figure 3 also shows that the second-best retrieval of the JWST spectrum is the Free model, while the second-best retrieval of the Ariel spectrum is the FastChem model. FastChem's retrievals poorly fit the CO and visible bands, because of the chemical modeling differences with the input ACE chemistry modeling. Retrievals on the Ariel spectrum circumvent this issue thanks to its low spectral resolution in visible light. The main difference from the previous input configuration is that the equilibrium chemistry results

B.2), again due to the lack of constraint in the visible wavelength bands where discrepancies between chemical models appear. ACE is as significant as GGchem, but Table B.2 and Figure E.5 show that GGchem better retrieves the C/O ratio and metallicity. Figure C.3 shows that the temperature at the top of the atmosphere is unconstrained, with a huge uncertainty. This part of the atmosphere, therefore, makes no significant contribution to the features of the spectra. In contrast, Figure C.2 shows that increasing the resolution adds an erroneous constraint on the temperature at the top of the atmosphere. Only the temperature around the pressures corresponding to the highest atmospheric contribution (∼10² Pa) is consistent between the equilibrium models and the input temperature profiles. The temperature retrieved at these pressures is that of the limb. Figure D.8 shows a good agreement between the retrieved species profiles.

Ultra hot planet: WASP-121 b
The GGchem model retrieves the Ariel spectrum better than the other models, with ∆logE ≥ 7, as already explained in Section 4.3. However, the JWST spectrum is better retrieved by the Free model, with ∆logE ≥ 52 (see Figure 3 and Table B.3). Table B.3 and Figure E.6 show that, for the Ariel configuration, GGchem retrieves the C/O ratio very well (within 5%) but not the metallicity (75% deviation), while for the JWST configuration the deviation is higher than 44% for all retrieval models. The models are not suited to the high spectral resolution of JWST, which imposes severe constraints on the thermal contrast and hence on the chemical distribution. This shows that such a contrasted atmosphere cannot be retrieved by a 1D model with correlated chemistry. However, the higher degree of freedom of the Free retrieval allows a better match. The temperature profiles retrieved by the Free model and the GGchem model are similar (within 500 K below 10⁴ Pa, see

Summary
In addition to the previous biases from Sections 4.3 and 4.4.1, the biases coming from the chemistry show that, even on a warm planet, it would make sense to fit the different molecular features separately to disentangle the temperature variation that brings chemical variability. Otherwise, using a 1D retrieval model will bias all the different spectral contributions. Furthermore, only the pressure where the contribution is highest should be considered as a significantly good retrieval of the observation. The rest should be treated with caution. All models remain good at detecting the input molecules.

Conclusions
We present in Table 5 an overview of the main results obtained in this study.

Table 5 note (*): For species detection, C/O ratio, metallicity (Z), and chemical profiles, the table focuses only on the main absorbers. The temperature and chemical profiles are split depending on the region of the atmosphere: around the highest atmospheric contribution (between ∼10² Pa and ∼10⁰ Pa) the atmosphere is globally well retrieved, contrary to the bottom of the atmosphere (between ∼10⁶ Pa and ∼10² Pa). See Sections 4.3 and 4.4 for more details on the specific biases.

A limitation of the approach described in this paper has been the use of simplified equilibrium chemistry models: recent re-analysis work on transit retrievals from HST has shown the necessity of moving towards non-equilibrium chemical models, at least for temperate planets (Panek et al. 2023). This additional complexity is still beyond current 3D modeling for retrievals, but will certainly be an important aspect to develop in the future. Accounting for the three-dimensional effects presented above will improve future retrievals of JWST observations, such as transit spectroscopy of WASP-39 b at JWST/NIRSpec resolution, which presently uses 1D models (Alderson et al. 2023b). Phase curve observations of WASP-43 b with HST, Spitzer, and JWST (Stevenson et al. 2014, 2017; Murphy et al. 2023; Bell et al. 2023) give access to preliminary constraints on the three-dimensional composition, cloud coverage, and temperature structure of the planet's atmosphere thanks to the JWST/MIRI/LRS sensitivity. Parameter retrieval from observations is today limited by the complexity of the models: combining GCM, radiative transfer codes, thermochemistry codes, and non-equilibrium chemistry is a formidable task that involves a multidisciplinary effort from various communities of molecular spectroscopists, chemists, meteorologists, and astronomers. Even with the limitations described above, this paper provides warnings about these approaches that should allow future investigators to address these questions properly.

Fig. 1. Simulation of observations for HD 189733 b, WASP-121 b, and GJ 1214 b (grey) with the corresponding best retrieval configuration in solid yellow, red dash-dot, and dashed blue lines, respectively. Left panels: constant input chemistry. Right: equilibrium input chemistry. Top and middle: JWST simulated observations for the 1D and 3D atmospheric assumptions, respectively. Bottom: Ariel simulated observation for a 3D atmospheric assumption.

Fig. 2. Transmission spectra simulated with Pytmosph3R (Falco et al. 2022) for HD 189733 b (top), WASP-121 b (middle), and GJ 1214 b (bottom). Each panel compares two transmission spectra based on a 1D and a 3D atmosphere, in dashed and solid lines, respectively. Left panels: constant input chemistry. Right: equilibrium input chemistry. The differences between the 3D and the 1D spectra are plotted below each panel in grey.

Fig. 3. Bayes factors for each retrieval and all their solutions (Free, ACE, FastChem, and GGchem; see Section 3.1). By definition, we set to 0 the solution with the lowest Bayesian evidence, as it is the reference of the comparison. Left panels: constant input chemistry. Right: equilibrium input chemistry. Top and middle: JWST simulated observations for the 1D and 3D atmospheric assumptions, respectively. Bottom: Ariel simulated observation for a 3D atmospheric assumption. The star represents the highest Bayes factor. The expected best retrieval is highlighted in bold.
Figure C.1 shows a wrong temperature profile, Table B.2 shows wrong retrieved parameters and Figure D.5

in a strong dichotomy of CO₂ between the day and night side (see Figure D.7). As a result, the retrieved temperature profiles correspond to the day side (see Figures C.2 and C.3). Table B.1 and Figure E.4 show that the temperature bias on the day side still keeps a good agreement on the retrieved C/O ratio and metallicity, within 20%.

Warm-hot planet: HD 189733 b
The ACE model retrieves the JWST spectrum much better than the other models, with ∆logE ≥ 208, as already explained in Section 4.3. However, the Ariel spectrum is equivalently retrieved by ACE and GGchem (Figure 3 and Table

Figure C.2 and C.3). The conclusions on temperature biases are the same as for HD 189733 b. But Figure D.9 shows that species abundances are more difficult to retrieve.
Fig. A.1. Transmission spectra simulated with Pytmosph3R (Falco et al. 2022) for HD 189733 b (top), WASP-121 b (middle), and GJ 1214 b (bottom). Each panel compares three transmission spectra based on a 3D atmosphere for the East limb, the West limb, and the whole planet, in dashed grey, dotted grey, and solid blue lines, respectively. Left panels: constant input chemistry. Right: equilibrium input chemistry. The differences between the East limb and the West limb spectra are plotted below each panel in grey.

Table 1. Planet-stellar parameters used to simulate the transmission spectra and as input into PandExo.

Table 4. Free parameters and priors for the retrievals.
The main absorber CH₄ is retrieved by the Free retrieval at -0.11 dex, while H₂O, CO₂, and NH₃, which give secondary features, are retrieved within -0.15 dex (Figure D.1). This allows the best model to be in significant agreement with the input model even if not all input profiles are included in the uncertainty.
- Equilibrium input chemistry: For pressures lower than ∼10² Pa, all retrievals give the same temperature profile, within 90 K of the input model (Figure C.1). All retrievals also give the same main absorber profiles (CH₄ and H₂O) below ∼10² Pa, where the input equilibrium chemistry is constant at these temperature and pressure conditions (Figure D.4). However, ACE is better at retrieving the NH₃ profile, which varies by 4 orders of magnitude and contributes at 3 µm, which makes ACE the best model, in agreement with the input ACE chemical composition. We can highlight that, to better fit the spectrum, the Free and ACE retrievals retrieve CH₄ at least +0.31 dex above and H₂O down to -0.17 dex below the input values, with the input values outside the uncertainty. FastChem displays some discrepancies with ACE, as pointed out in Al-Refaie et al.
Al-Refaie et al. (2022a) Parmentier et al. ( , 2018) use different approximations and assumptions for condensation, which leads to uncertainty that is larger at the condensation limit. Al-Refaie et al. (2022a) found a good agreement between ACE, FastChem, and GGchem because the same molecules are considered in the three models. However, we show here chemical discrepancies between the two models, illustrating that using chemical models that are imperfect with regard to what is actually observed leads to biased interpretations. ACE, which considers only C, H, O, and N atoms, ends up giving the best-fit model by getting rid of the TiO, VO, and K condensation issues. The retrieved thermal profile is similar to the one retrieved by GGchem, but the main absorber H₂O is now perfectly constrained, to within 0.01 dex (see Figure D.5). All these biases can be seen in Table B.2 and Figure E.2 on the C/O ratio and the metallicity. The FastChem model is far from the input values, while ACE has a 24% deviation. Furthermore, even if the Free model is not the most significant one, it is closest to the input values.

Table B.3. Retrieval results of WASP-121 b. The best retrieval of each configuration is highlighted in bold.