ACCESS: Confirmation of a Clear Atmosphere for WASP-96b and a Comparison of Light Curve Detrending Techniques

One of the strongest ${\rm Na~I}$ features was observed in WASP-96b. To confirm this novel detection, we provide a new 475-825nm transmission spectrum obtained with Magellan/IMACS, which indeed confirms the presence of a broad sodium absorption feature. We find the same result when reanalyzing the 400-825nm VLT/FORS2 data. We also utilize synthetic data to test the effectiveness of two common detrending techniques: (1) a Gaussian processes (GP) routine, and (2) common-mode correction followed by polynomial correction (CMC+Poly). We find that both methods poorly reproduce the absolute transit depths but maintain their true spectral shape. This emphasizes the importance of fitting for offsets when combining spectra from different sources or epochs. Additionally, we find that for our datasets both methods give consistent results, but CMC+Poly is more accurate and precise. We combine the Magellan/IMACS and VLT/FORS2 spectra with literature 800-1644nm HST/WFC3 spectra, yielding a global spectrum from 400-1644nm. We used the PLATON and Exoretrievals retrieval codes to interpret this spectrum, and find that both yield relatively deeper pressures where the atmosphere is optically thick at log-pressures between $1.3^{+1.0}_{-1.1}$ and 0.29$^{+1.86}_{-2.02}$ bars, respectively. Exoretrievals finds a solar to super-solar ${\rm Na~I}$ and ${\rm H_2O}$ log-mixing ratios of $-5.4^{+2.0}_{-1.9}$ and $-4.5^{+2.0}_{-2.0}$, respectively, while PLATON finds an overall metallicity of $log_{10}(Z/Z_{\odot}) = -0.49^{+1.0}_{-0.37}$dex. Therefore, our findings are in agreement with literature and support the inference that the terminator of WASP-96b has few aerosols obscuring prominent features in the optical to near-infrared (near-IR) spectrum.


INTRODUCTION
In-depth studies of exoplanetary atmospheres (exoatmospheres) is a key pathway to obtaining more detailed insights about the formation and evolution of planetary systems. Many of these planets are in extreme environments not found in the solar system and understanding how their atmospheres are sculpted by such unique environments gives us detailed insights on the complex chemistry and physics at play. Examples arXiv:2207.03479v2 [astro-ph.EP] 14 Jul 2022 include the cause of observed temperature inversions in hot Jupiters (e.g. Baxter et al. 2020;Gandhi & Madhusudhan 2019), the cause and composition of high altitude aerosols (clouds and/or hazes) (e.g. Moses et al. 2011;Wakeford et al. 2019b;Gao et al. 2020), and observed super sonic wind speeds (e.g. Fromang et al. 2016). Observing exo-atmospheres also can improve our understanding of the formation and evolutionary processes that exoplanets undergo, e.g., host disk dissipation rate (e.g., Powell 2021) and hot Jupiter migration timescales (e.g., Moses et al. 2013;Powell 2021).
To improve our understanding of exoplanets, observations have had to push the performance of instruments, which were not designed with exo-atmosphere studies in mind, to their limits. This has also been the case for data analysis techniques aimed at removing both instrumental and astrophysical systematics both from spacebased observatories (e.g. Pont et al. 2013;Sing et al. 2019) and ground-based ones (e.g. Gibson et al. 2012;Yan et al. 2020).
As expected in any developing field, there have been a number of cases with disagreeing results (e.g., Southworth et al. 2017 andDiamond-Lowe et al. 2018;Sedaghati et al. 2017 andEspinoza et al. 2019b;and Sing et al. 2015and Gibson et al. 2017, Gibson et al. 2019, and McGruder et al. 2020. These discrepancies are attributed to imperfect understanding of systematics, whether it be instrumental, observational, or astrophysical in nature. This highlights the importance of confirming features of interest via independent analyses in order to isolate spurious signals and instill more confidence in agreeing detections. This is particularly important in attempts to find correlations between atmospheric features and other system parameters, such as the cause of high-altitude aerosol formation in gas giants. There have been several studies attempting to find such a correlation (i.e. Sing et al. 2016;Heng 2016;Stevenson 2016;Fu et al. 2017;Tsiaras et al. 2018;Dymont et al. 2021), with no clear answer yet (e.g. Alam et al. 2020).
WASP-96b (M=0.48 M J , R = 1.2 R J , P = 3.425 days, G8 host star, Hellier et al. 2014), is one of few exoplanets observed to-date to have little evidence supporting the presence of high-altitude aerosols in both optical (Nikolov et al. 2018) and near-infrared (IR, Yip et al. 2021) transmission spectra. An exoplanet with a transmission spectrum that can be modeled excluding high-altitude aerosols is also called a "clear" atmosphere, which does not mean the planet has no aerosols but little to no aerosols obscure the summed terminator's spectrum. Other planetary atmosphere that can be explained without including high-altitude aerosols are WASP-17b , WASP-39b (Fischer et al. 2016;Nikolov et al. 2016;Wakeford et al. 2018;Kirk et al. 2019), WASP-62b (Alam et al. 2021), and WASP-94Ab (Ahrer et al. 2022). Finding planets like these is essential for understanding the evolutionary, chemical, and physical processes underway in this class of planets because aerosols mute the features needed to probe exo-atmospheres. This is particularly challenging given that it is estimated only ∼7% of hot Jupiters have clear atmospheres (Wakeford et al. 2019b). As such, it is vital to find and thoroughly study clear planets like WASP-96b. The optical transmission spectrum of WASP-96b was first observed by Nikolov et al. (2018) with VLT/FORS2 and recently Yip et al. (2021) combined the VLT/FORS2 spectrum with a near-IR spectrum derived using HST/WFC3 observations from GO program 15469 (PI: Nikolay Nikolov) to obtain a better picture of the planet's clear nature. These studies report strong Na I and H 2 O features with abundances of −3.88 +1.05 −0.82 and −3.65 +0.90 −0.94 , respectively. In this paper, we present a new optical transmission spectrum of WASP-96b derived from new observations obtained as part of ACCESS 1 .
These observations are described in Section 2. We then combined the ACCESS observations with an independent re-analysis of the original VLT/FORS observations and the available near-IR observations to derive a new 400-1644 nm optical to near-infrared transmission spectrum for this planet and re-inspect its atmospheric properties via retrieval models.
To understand individual and relative performances of commonly used detrending methods, we also conduct a detailed comparison of two often-used techniques in ground-based transmission spectroscopy: (1) a Gaussian processes routine (Gibson et al. 2012;Weaver et al. 2020;Yan et al. 2020;McGruder et al. 2020;Weaver et al. 2021), and (2) common-mode correction followed by polynomial correction Nikolov et al. 2016;Gibson et al. 2017;Nikolov et al. 2018;Carter et al. 2020). We discuss these detrending techniques and their performance comparison in Section 3.
The remainder of this paper is structured as follows. Section 4 describes our methods for constructing the transmission spectra, and Section 5 outlines how we consider stellar activity affecting the transmission spectra. The spectra are then interpreted using retrievals in Sec-1 The Atmospheric Characterization Collaboration for Exoplanet Spectroscopic Studies (ACCESS) survey on the Magellan Telescopes (Jordán et al. 2013;Rackham et al. 2017;Bixel et al. 2019;Espinoza et al. 2019b;Weaver et al. 2020;McGruder et al. 2020;Weaver et al. 2021;Kirk et al. 2021) tion 6. In Section 7, we present and discuss the retrieval analysis results, and contextualize WASP-96b within the exoplanet population in Section 8. We conclude in Section 9.
2. OBSERVATIONS 2.1. VLT/FORS2 Transits We used two transit observations of WASP-96b on the nights of UT170729 and UT170822 (UTYYMMDD) with the FOcal Reducer and Spectrograph (FORS2) 2 mounted on the 8.2 m Very Large Telescope (VLT) in the European Southern Observatory on Cerro Paranal, Chile. These observations were originally collected, reduced, and published by Nikolov et al. (2018). For our re-analysis of this data, described in Section 3, we used the same extracted spectra produced in Nikolov et al. (2018), where additional detail of the data collection and extraction process can be found. Both observations were taken with the multi-object spectroscopy (MOS) mode and the same two-slit mask. Each slit in the mask was 22 by 120 and centered on the target and comparison star. Given the wide slits, the spectral resolution of the observations were determined by the seeing, with an average resolving power of R ∼ 600. The first transit was observed using the bluer dispersive element, GRIS600B (600B), which had approximate spectral coverage of 360-620 nm. Contrary to Nikolov et al. (2018), we excluded the spectrum from 360-400 nm to avoid systematics caused by the low counts, where that region had over 85% fewer counts than the rest of the spectrum. The second night used the redder GRIS600RI (600RI) grism and the GG435 filter, producing an approximate spectral coverage of 520-835 nm.

Magellan/IMACS Transits
We observed two transits of WASP-96b as part of the ACCESS Survey 3 on the nights of UT170804 and UT171108 with the Inamori-Magellan Areal Camera & Spectrograph (IMACS; Dressler et al. 2011) on the 6.5m Baade Magellan Telescope at Las Campanas Observatory in Chile. Both transits were observed using the 8K × 8K CCD mosaic camera at the f/2 focus, which provides a 27.4 field of view (FoV). We used a 300 line/mm grating at blaze angle of 17.5 • , yielding a spectral coverage of 0.44-0.97 µm (without a filter). To reduce readout time and improve the duty cycle of the observations, we used 2×2 binning. Both observa-tions used the FAST readout mode, which had a 29 s readout time and a 3 s overhead. The first observation had no filter, but we applied the GG495 filter (coverage 0.49-0.97 µm) for the second night to prevent secondorder light contamination. The effect of second-order contamination on the first observation is discussed in Appendix A.
For each observation we used the MOS mode and designed a custom science mask with 10 by 90 slits for the target and a number of comparison stars in the field. Identical masks, but with slit widths of 0.5 instead of 10 , were also designed for wavelength calibrations. The average seeing-limited resolving power of the observations were R ∼ 900 The comparison stars were selected using the same prescription described in Rackham et al. (2017). We consider a nearby star as a suitable comparison if it has a color difference with WASP-96 of D < 1, where The uppercase letters in the equation correspond to the Johnson-Cousin apparent magnitudes of the stars, and the subscripts t and c indicate the target and comparison, respectively. The comparisons used in both the AC-CESS and VLT/FORS2 observations are summarized in Table 1.
We limited observations of each transit to airmass below 2 during the full transit window to minimize differential atmospheric refraction effects, so we ended up with two full transits with additional out-of-transit baselines of 1.29 and 1.35 hours for the UT170804 and UT171108 transits, respectively. To maintain count levels of 25,000-35,000 ADU 4 , given the average seeing conditions at each night, we held exposure times constant to 60 seconds for transit UT170804 (average seeing of 1.55 ) and 45 seconds for transit UT171108 (average seeing of 0.71 ). A summary of each transit's observing conditions are given in Table 2.
Finally, we collected bias frames, quartz lamp flats, HeNeAr calibrations (using the 0.5 by 90 masks), each with the same binning as the science observations.

ACCESS Reduction Pipeline
We reduced the data using the ACCESS custom pipeline introduced in Jordán et al. (2013) and described in detail in Espinoza (2017). We first subtracted the bias level from each frame using the median of the overscan region. The pipeline also has the option of flat-fielding Table 1. Target and comparison star magnitudes and coordinates from the UCAC4 catalog (Zacharias et al. 2013). We used comparisons 14 and 15 for the analysis of both IMACS transits and comparison 1 for both FORS2 transits. Comparison 3 was in the IMACS slit but was over saturated and not used. each frame using a master flat, but as with previous AC-CESS datasets (e.g. Rackham et al. 2017;Bixel et al. 2019;McGruder et al. 2020;Weaver et al. 2021;Kirk et al. 2021), we found that flat-fielding worsens the photometric precision of the transit light curves, so we opted to not flat-field the data. The ineffectiveness of flat-fielding is likely due to the lack of sufficient flats needed for the high precision analysis. We collected 10 to 25 flats per night with 1 to 5 second exposures. We then performed a bad-pixel correction using a badpixel map obtained from the flats. Any pixel with 10 times higher or lower photon noise relative to neighboring pixels was added to the map. The correction was done to the science images by replacing the value of each flagged pixel with a value interpolated from the neighboring pixels in the dispersion direction of the detector.
We traced each spectrum in the images by first using a second-order polynomial to identify each resolution element's centroid (relative to both spatial and dispersion directions). Then a fourth-order polynomial was fit on the centroids in order to ensure the smoothness of the trace.
The sky background was subtracted using the median of the spectral counts outside of the central apertures for a given wavelength element. The appropriate aperture sizes were empirically determined to be 3.6 (9 pixels) in radius for UT170804 and 3.2 (8 pixels) for UT171108. The extracted spectrum was produced by summing the spectral profile within the central aperture in the spatial direction and subtracting the sky background.
A Lorentzian profile was used to fit each spectral line in the HeNeAr calibration spectra. The wavelengths of each line as a function of pixel position was then fit using sixth-order polynomials. The pipeline iteratively excluded deviant datapoints until the root mean square error value of the fit was less than 2 km s −1 (∼ 0.05 Å). This wavelength map was applied to each extracted spectrum.
We identified and removed cosmic rays by first creating a global median spectrum using the median of all the normalized spectra for each exposure. This normalized, median spectrum was used to compare against each individual normalized spectrum, and any counts in a spectrum more than 10σ greater than the global median spectrum (on a corresponding wavelength) were replaced by interpolating the counts at the appropriate wavelength from the preceding and following spectra in the time series.
It should also be noted that the pipeline is capable of doing optimal extraction (Marsh 1989). This could potentially be used in place of bad-pixel and cosmic-ray removal, as it effectively highlights and removes deviant counts (Allen et. al., in prep.). However, we tested this for both WASP-96 observations, and found very little difference between resulting spectra. Figure 1 shows the median extracted spectrum for both nights. The final wavelength range used was 475-825 nm for UT170804 and 525-825 nm for UT171108 (see Fig. 1). Given the diminishing throughput on the edge of the spectra, the bluest wavelengths had too few counts to be useful. We also excluded data from 759.4-767.2 nm, a region of strong tellurics. Though the sky subtraction and dividing by the comparison(s) should in principle negate the telluric features, that region still had a drastic decrease in counts (and likely residual telluric-induced systematics). This made it difficult to properly fit a light-curve at that wavelength range. Unfortunately, this coincides with the K I doublet (centered at 768.15 nm). We also excluded wavelengths longer than 825 nm due to the diminishing throughput and increase in tellurics (see Fig. 1). We could have attempted to obtain useful information from that range, but given that the available HST/WFC3/IR/G102 data (Yip et al. 2021) is not afflicted by any of these effects, we instead omit that wavelength range from the AC-CESS analysis.
When using the f/2 mode, each stellar spectrum is dispersed on two CCD chips, causing a gap in the spectra (see Figure 1). Fortunately, all utilized spectra of the target and COMP14 fell on one chip. For COMP15, the gap was approximately between 5580-5675Å; however, that region still had COMP14 for systematic corrections.

Photometric Monitoring
We compiled and analyzed available time-series photometry of the host star WASP-96 from the Transiting Exoplanet Survey Satellite (TESS, Ricker et al. 2014) and the All-Sky Automated Survey for Supernovae (ASAS-SN, Kochanek et al. 2017) to constrain the presence of star spots that could contaminate the transmission spectrum of WASP-96b.
TESS observed WASP-96 in 2-minute-cadence mode over 27 days during Sector 2 (between 23 August and 20 September 2018) and over 24 days in Sector 29 (between 26 August and 20 September 2020). To model its photometric modulation, we used the Presearch Data Conditioning Simple Aperture Photometry (Jenkins et al. 2016, PDCSAP) light curve obtained from the Barbara A. Mikulski Archive for Space Telescopes (MAST) portal. We masked the transit data using a period of 3.42526 days, initial midtransit time (t0) of 2458354.319946 BJD, and a transit window of 3.04 hours (25% buffer in duration) 5 . The resulting out-of-transit light curves have a median abso-lute deviation (MAD) of 2.76 parts-per-thousand (ppt) and a peak-to-peak difference of 25.5 ppt for Sector 2 (17676 data points), and a MAD of 2.9 ppt and peakto-peak difference of 37.3 ppt for Sector 29 (14252 data points). For our analysis of the TESS data, we binned every 100 observations together.
ASAS-SN observed WASP-96 in two filters. V filter observations were collected between April 2014 and September 2018. g filter observations were collected between September 2017 and February 2020. To remove deviating observations, we sigma-clipped points that deviated by more than 3σ from the mean. This led to excluding 11 of 921 and 27 of 1626 observations for the V and g filters, respectively. Because of the lower cadence of the ASAS-SN observations (∼3 per day) compared to the TESS observations, their larger variance, and since we do not expect significant stellar modulation in a day, we weighted averaged observations taken on the same night. We then removed observations with uncertainties larger than 3 times the mean uncertainty, which led to six binned observations removed in V band, and seven in g band. The final light curves included 351 & 520 datapoints with MADs of 10.4 & 9.5 ppt and peak-to-peak differences of 83.5 & 96.5 ppt, respectively.

LIGHT-CURVE ANALYSIS AND COMPARISON OF DETRENDING TECHNIQUES
All the wavelengths used in each extracted spectrum obtained in Section 2.2.1 were summed for every exposure to construct a photometric white-light curve. Furthermore, binned-light curves were constructed by partitioning the spectra into distinct spectro-photometric bins that were used to determine the change in transit depth over wavelength (i.e., the transmission spectrum). The binning scheme was designed by considering spectro-photometric precision, the overlap of spectral bands from different observations, high telluric absorption regions, and the desire to properly probe for atmospheric features, such as a scattering slope, sodium, and potassium lines. The bins used to construct the transmission spectrum are shaded grey in Figure 1 and their wavelength coverages are listed in the first column of Table 8. An overview of the white-light and binned light curve detrending technique implemented for our final light curves is provided in sections 3.1 & 3.2. However a more detailed description of our detrending routines are given in Appendix B. This section ends (3.3) with an explanation of how we determined the best detrending methods for our data.

White Light Curve Fitting
Our first step in detrending the VLT/FORS2 and Magellan/IMACS white-light curves was a sigma clip- Figure 1. Median extracted spectra of WASP-96 and comparisons, excluding the oversaturated COMP 3. The top row shows IMACS data (UT170804 and UT171108) and the bottom shows FORS2 data from Nikolov et al. 2018 (UT170729 and UT170822). Each spectroscopic bin used is shaded in light gray and demarked by dotted lines. Telluric regions with less than 0.1 transmission are shaded in light red. The only gap in the binning scheme is the strong telluric region from 7594.0-7672.0Å.
ping of the raw light curves (target divided by mean of comparisons). This was done by individually evaluating each data point in the light curve with a moving average of 11 points centered around the point of interest. If the value of the specific point deviated by more than 3σ from the mean it was removed. We opted for this method because it does not depend on an initial transit model fit, which is often used as a sigma-clipping criteria. It resulted in zero, two, three, and two points being removed from each transit in chronological order.
We then used the PCA+GP routine (see Appendix B.1), to fit the sigma-clipped white-light curves. The transit parameters used in the fits were quadratic limb darkening (LD) coefficients, q 1 , q 2 , planet orbital period, P , semi-major axis relative to the stellar radius, a/R s , the planet-to-star radius ratio, R p /R s , impact parameter, b, mid-transit time, t 0 , eccentricity, e, and the argument of periastron, ω. q 1 and q 2 are parameterizations of the more commonly used u 1 and u 2 quadratic LD parameters, where q 1 = (u 1 + u 2 ) 2 and q 2 = u 1 /2(u 1 + u 2 ) (Kipping 2013). For our analysis we had to express b in terms of inclination, i, as We adopted a circular orbit for WASP-96b, based on Hellier et al. (2014), and used a quadratic LD law as in Nikolov et al. (2018). We also held P (3.4252602 days), a/R s (8.84), and i (1.486 radians) fixed to the values used by Yip et al. (2021) in order to later combine the optical and HST spectra. The quadratic LD parameters were set to be uniform from 0 to 1, and R p /R s had a normal prior with mean value of 0.115 and a standard deviation of 0.02. Here priors were based off of the R p /R s values obtained by Nikolov et al. (2018). For t 0 we used normal priors with uncertainties of 0.005 days (7.2 minutes) and mean values of t 0 − 2, 450, 000 =7963.33672 [MJD], 7970.69261[BJD], 7987.31195 [MJD], and 8066.59828[BJD] days for the transits in chronological order 6 . The best-fit white light curves and their corresponding orbital parameters are shown in Figure 2 and Table 3. We note that the difference in depth for the last transit (UT171108) is because the orbital parameters were held fixed. When allowed to be free, more consistent depths are obtained, and the other system parameter still agree within their uncertainty ranges. However, the absolute depth is not as relevant, as will be discussed in Section 3.3.

Binned light curve fitting
The next step on the path to produce the transmission spectrum is fitting the binned light curves to obtain values of R p /R s as a function of wavelength. With the uncorrected binned-light curves in hand, we first used the same 3σ clipping process discussed in Section 3.1 for each bin. Then the bins were detrended with a combination of common mode correction and polynomial fitting (CMC+Poly), where the CMC term (shown in Figure  2) was obtained from a PCA+GP fit of the white-light curve (detail in Appendix B.2). We also fixed all parameters, aside from q 1 , q 2 , and R p /R s , to values obtained by the PCA+GP white-light fit obtained in Section 3.1 7 . The bins used and their corresponding radii, along with plots of the raw (target/comparisons) and detrended light curves can all be found in Appendix D. We determined the CMC+Poly routine was best to detrend the binned light curves for this dataset, by testing its effectiveness against synthetic data, which is discussed in the following section (Sec. 3.3).

Comparing the performance of GP and
CMC+Poly with Synthetic data A number of techniques have been used to detrend the light curves of exoplanet transits (i.e. PCA e.g. Jordán et al. 2013, GP e.g. Gibson et al. 2012, CMC e.g. Gibson et al. 2013), but there are few instances in the literature that compare the effectiveness of one method over another Panwar et al. 2022, but they still do not compare against synthetic data). Furthermore, using two separate analysis techniques could produce widely different results (e.g. Sing et al. 2015 andGibson et al. 2017). Thus, it is important to ensure the method utilized for a particular dataset yields as accurate results as possible and provides consistent uncertainty measurements. To instill confidence in our analysis procedures of the binned light curves, we synthesize transmission spectra to compare the precisions and accuracies of obtained transit depths to synthesized values. We test the synthetic data against two common transit reduction techniques: 1) a Gaussian Processes (GP) routine, and 2) Common-Mode-Correction followed by polynomial correction (CMC+Poly).
To test both methods, we created 500 synthetic light curves similar to the VLT/FORS2 observations. A detailed description of how the synthetic data were created can be found in Appendix E, but in general we simulated 50 flat (constant R p /R s ) transmission spectra out of 10 bins each. All of the 10 bins had approximately the same shot noise levels (∼ 400 ppm), which was assigned to produce a white-light curve noise level of about 125 ppm, consistent to the VLT/FORS2 white-light photon noise limits (∼63 and 162 for transit UT170729 and UT170822, respectively). All bins in a given spectrum had the same overarching systematic generated by a random draw in a GP distribution, where the GP was constructed to be correlated to a few of the UT170729 observation's auxiliary parameters. Then each individual bin had additional systematics generated from up to a second-order polynomial fit using other auxiliary parameters with random polynomial coefficients. Each bin also had their own quadratic limb darkening coefficients. Because the VLT/FORS2 observations only had one comparison star, we also only created one comparison star in our synthetic data. As outlined in Appendix B.1, when there is only one comparison star, PCA cannot be done and the PCA+GP routine becomes a GP routine with the comparison star used as a linear regressor. This is why in the synthetic analysis we refer to this method as a GP routine, but when using the same routine with the Magellan/IMACS data, which has two comparison stars, we refer to it as PCA+GP. Images of each step in the synthetic data production process are shown in Appendix E.
After the synthetic spectra were produced, we fit all 50 white light curves using the GP method in the same way described in Appendix B.1. We used the results of each white light curve as priors for the analysis of the binned light curves (see Appendix E). With those whitelight parameters we produced the transmission spectra following first the GP process to detrend the bins, and then with the CMC+Poly process, which are both discussed in Appendix B. Both methods used to detrend the binned-light curves utilized the parameters determined from the GP detrended white-light data, and the CMC+Poly method used that white-light curve model to produce the CMC term. This allowed us to compare the effectiveness and accuracy of both methods given that each true depth is known. Additionally, because all 50 spectra were flat with the same inputted depth, we could collectively compare the results from every reduced bin.
To first understand the accuracies of both methods, we plot a histogram of the difference of each bin's obtained depth relative to the true depth (R p /R s = 0.1157), Table 3. Fitted white light curve values. The period (P=3.4252602), semi-major axis (relative to stellar radius, a/Rs=8.84), and inclination (i=85.14) were all held fixed to parameters used by Yip et al. (2021). The mid-transit times are in terms of MJD for the FORS2 observations (UT170729 and UT170822) and BJD for the IMACS observations (UT170804 and UT171108). Note: the difference in depth for the last transit is because the orbital parameters were held fixed, when allowed to be free more consistent depths are obtained.   Table 3, and the residuals (red points) produce by subtracting the detrended data from the model. The value of σ given on each residual panel is the standard deviation of the residuals in ppm. Bottom: The common mode correction term, which was produced by dividing the top LC by the best fit transit model (the black LC in the middle panel, see Appendix B.2 for detailed discussions). This term was used to correct for common systematics in the binned light curves.
shown in the first column of Figure 3. From this we see that though the true depth is on average obtained using the GP method (first row), neither method consistently reproduced the true depths, where only 46.32±2.82% 8 8 uncertainties obtained through bootstrapping.
of the bins detrended using the GP method obtained a depth 1σ from its average uncertainties levels. The mean uncertainty of obtained R p /R s with the GP method was 0.00774. This is significantly worse using the CMC+Poly method, which only has 12.54±2.00% of the bins within its 1σ level, where the mean uncertainty with the CMC+Poly method was 0.00127. This suggests that if one were to fit any given transmission spectrum with the GP routine there is only a 46.32±2.82% likelihood of those obtained depths being consistent with the physical depths. This is only worse using the CMC+Poly method. One likely strong contribution for this is that the bins are already biased by the white-light fit, given that the bins reduction is dependent on the white-light's. This is especially the case for the CMC method, because the common mode term, which drives the binned light curves correction model, can only be determined from the white-light fit. To support this we compared the obtained binned depths relative to their corresponding white-light fits' depths and find much more consistency of the depths with 41.79±2.82% and 57.28±2.88% within the 1σ average uncertainty for the CMC+Poly and GP methods, respectively. Still, neither method can consistently re-obtain the true white-light depths. However, when comparing each bin to a corresponding mean depth determined by averaging all 10 bins for a given transmission spectrum (column two of Figure 3), we find 78.21±2.42% and 70.72±2.67% are within 1σ for the CMC+Poly and GP methods, respectively. This implies that though the absolute depth is not consistently obtained, a relative depth is for both detrending methods. If that is the case, it provides justification for the required offsets often needed when combining transmission spectra from different nights and/or different instruments (e.g. Mc-Gruder et al. 2020;Weaver et al. 2021;Yip et al. 2021). These results would also explain why the inconsistency in white-light depth found for transit UT171108 (Table  3 and Fig. 2) does not imply that the transmission spectrum of that night is incorrect. The likely scenario for that dataset is the GP routine misestimated the whitelight depth (likely because of the fixed parameters), but this is common for the majority of the synthetic data as well. However, the relative binned depths can still be preserved using this white-light depth for the CMC correction, because the difference in depths from one bin to another is still maintained.
To further highlight that the structure of the spectrum is preserved we plot the standard deviation of all bins in a given spectrum. Since, each synthesized spectrum is flat, each bin (of a given spectrum) should not significantly vary from one another. This is exactly what we find, where every spectrum has a bin standard deviation significantly lower than that spectrum's average R p /R s uncertainty width (see column three of Figure 3). Furthermore, the third column shows that the standard deviation is higher for the GP method, implying that the CMC+Poly method is more consistent. Additionally, because the CMC+Poly method inherently pro-duces lower uncertainties (average R p /R s uncertainty of 0.00127 for CMC+Poly compared to 0.00774 for GP), for this set of data, the CMC+Poly method is consistently more accurate and precise than the GP method. As such we elect to use the CMC+Poly method to detrend all binned light curves.
It is important to understand that these finding are only for this specific dataset, which is constructed to mimic the particular VLT/FORS2 observations of WASP-96b; i.e. the synthetic data has a relatively low shot noise level (∼400 for bins), the bulk of the systematics are dominated by the white-light systematics (see Appendix E), and there was only one comparison star used. Therefore, this should not be extrapolated to every dataset. For example, we reduced the VLT/FORS2 data with just a CMC correction and found a similar fit compared to using CMC+Poly, outlining how little chromatic systematics persists in the given VLT/FORS2 data. If chromatic systematics are more dominant in a dataset or there are multiple comparison stars, then it could be possible that the data would be better reduced with a method like PCA+GP, which is less heavily dependent on the initial white-light fit. For this reason, we still ran the ACCESS data through both the CMC+Poly and PCA+GP routines as described below to confirm CMC+Poly still performed better for those data. In summary, one should explore the best detrending method for specific data before assigning one.

OPTICAL TRANSMISSION SPECTRUM FROM THE VLT/FOR2 AND ACCESS DATA
We produced the transmission spectrum by plotting the R p /R s found from each detrended binned-light curve against that wavelength interval (bin). Three optical transmission spectra were created. One from combining the two FORS2 transits where we weighted averaged all overlapping bins, another from combining the two IMACS transits, and the third from combining all four transits (global optical spectrum). These are plotted in Figure 4. As justified in Section 3.3, we fit an offset when combining each of the spectra. For a given combined spectrum a white-light depth was determined for each individual spectrum using a weighted mean of only overlapping bins. The weighted mean of these whitelight depths was then used as the central depth for which each individual spectra was offset to. The average precision of each bin for the combined IMACS, FORS2, and global optical spectra are 0.00129, 0.00094, and 0.00076 R p /R s , respectively.
The IMACS data has more scatter and lower precision than the FORS2 data (see Fig. 4), even though both use two transits. One explanations to this is that the size Figure 3. Histograms used to outline the biases and precisions of both the GP (top row) and the CMC+GP (bottom row) routines. Left column: We fit each of the 500 total synthetic binned light curves using the GP and CMC+Poly routines; then take the difference of each fitted binned depth from the true depth (Rp/Rs=0.1157) used to produce all synthetic data. That difference was divided by the uncertainty of the fitted depth, in order to obtain the bias from the input depth, relative to the uncertainties. The distribution of the relative differences are plotted. Middle column: This is similar to the left column, but instead is the difference of each fitted binned depth from the mean depth of all 10 bins in its corresponding transmission spectra. This provides a relative bias on obtained transit depths. In this case 'relative' means relative to the transmission spectrum's mean depth. For both the left and middle columns, the black dashed line represents the average 16 and 84 percentiles of the histograms, the middle solid black line is the 50 percentile, and the red dotted line is 0. This corresponds to the true value of the depth, thus when the dashed black line and dotted red line do not overlap the routine has an inherent bias on that obtained depth. Right column: This is the standard deviation of each of the 50 transmission spectra. It shows the measured scatter of the transmission spectra, where the true scatter is 0, because the spectra were made to be flat. Here the black dashed line is the mean width of all bins uncertainties. Note that though both routines produce scatter well under their respective mean uncertainty widths, because the mean uncertainty width is over 6 times larger for the GP routine, the CMC+Poly routine is much more precise.
of Magellan Baade (6.5 m) is smaller than VLT (8.2 m), meaning less collecting area and more shot noise. Additionally, the difference in comparisons used could cause the IMACS data to have more systematics. The comparison Nikolov et al. (2018) used (D=0.1) was more similar to the target than either of ours (D=0.24 and 0.14; see Table 1). The importance of the comparisons is emphasized by Ahrer et al. (2022), where they were able to detect a strong sodium signal and a super-Rayleigh scattering slope in the atmosphere of WASP 94Ab with just one comparison and one transit, partially attributed to the comparison being nearly identical in spectral type and location in the sky. Furthermore, for both IMACS transits the baselines, which is needed to properly correct for systematics, were relatively short (seen in Fig 2), where ideally the baseline should be the same length as the transit duration.
Likely the largest cause of deviation between the two datasets is that the IMACS data has more chromatic systematics. This can be seen when looking at the binned light curves in Appendix D. Why the chromatic systematics are stronger for the IMACS observations is unclear. It might be due to chromatic differences in comparisons, or some other issue. Regardless, the CMC method relies on the assumption that the bulk of the systematics found in the white-light curve can be applied to each of the binned light curves. If each bin's systematics are more unique the initial CMC is not as effective, and might even introduce more spurious signals that the polynomial fit has to correct for. Concern over this led us to analyze the IMACS data with both CMC+Poly and PCA+GP routines. In doing so we found that the two transmission spectra were consistent to one another, with each bin deviating from one another by 0.562σ on average. However, the uncertainties of the PCA+GP method were nearly 5 times larger than the CMC+Poly's. Given both methods produced similar features, the accuracy of the CMC+Poly method is supported by Section 3.3, and the desire to use the same routine for all datasets we choose to continue using the CMC+Poly routine for the IMACS data.

Rotational Period
We infer the stellar rotational period by combining the radius of the star with the projected stellar rotational velocity, v sin i, as well as fitting a periodic signal to the photometric monitoring data. The v sin i for WASP-96 was determined by Hellier et al. (2014) using the Euler/CORALIE spectrograph. It was not well constrained but was found to be 1.5±1.3 km s -1 . This provides a 3-σ upper limit on v sin i of 5.4 km s -1 . Combining this with the stellar radius of 1.05±0.05 R (Hellier et al. 2014) we estimate a 3-σ lower limit on rotation period of about 9.8 days. Thus, we scanned the photometric data for periodic peaks within a range of 9.8 to 300 days using Lomb-Scargle periodogram analysis (Lomb 1976). With the binned combined TESS data the highest periodic peaks were 35.9, 37.7, and 31.2, and with the combined ASAS-SN data they were 28.3, 28.8, and 11.2. However, for all peaks the False Alarm Probability (FAP) were greater than 10 -1 , thus no significant peaks were found with the periodogram analysis. This is likely because the photometric modulation is in general very small (see Sec. 2.4 or Sec. 5.3).
We then jointly fit the ASAS-SN and binned TESS data using the Juliet package (Espinoza et al. 2019a) with a semi-periodic kernel. In this joint fit, only the period and timescale terms (see equation 9 of Espinoza et al. (2019a)) are set common for all photometric campaigns. All other parameters were specific to the combined TESS (sector 2 and 29), the V band ASAS-SN, or the g band ASAS-SN data. When doing this we assume the periodicicity is due to stellar inhomogeneities 9 coming in and out of view as the star rotates, which is often done (e.g., Hirano et al. 2012;Sing et al. 2015;Newton et al. 2018). We use wide uniform priors on the period from 9.8 to 50 days, which is consistent with what the periodograms and v sin i weakly suggest. The resulting period was found to be 31.3 +0.3 −3.4 days. Given the observed correlation of rotational period to activity levels (e.g. Pallavicini et al. 1981;Reiners et al. 2014), this also implies that WASP-96 is a relatively quiet star. Figure 5 shows the photometric monitoring campaigns along with the Juliet best fits.

Ca ii H & K lines
The Ca ii H & K lines were measured using two R = 48000 spectra collected with the MPG 2.2-m/FEROS spectrograph on 19 December 2016. Each spectrum has an average SNR of 27. The reduced data was acquired from ESO's online archive. Figure 6 shows the Ca ii H & K lines and we see no emission in the core of either of the lines, implying that WASP-96 is a relatively inactive star.

Photometric Modulation
The analysis in the above two subsections suggest that WASP-96 is a quiet G-type star. However, we use the amplitude of the photometric modulation of the star to quantify its level of activity in terms of spot covering fraction and spot temperature, which are the parameters needed for the retrievals (see Sec. 6). This is done by using equation 2 and Table 2 of Rackham et al. (2019) and twice the TESS MAD of 0.0058 (see Sec. 2.4), assuming that the MAD is approximately the amplitude of sinusoidal variation. We use the TESS data because the vari- . In all three plots the black points correspond to the final spectrum produced by taking the weighted average of all overlapping bins. When combining each spectrum an offset is applied, so the means of each individual spectra are consistent, before the bins are averaged together. The center of the Na I and K I absorption are plotted as dotted gray and yellow lines, respectively. ability in the ASAS-SN data seems to be driven by more than just the star (likely low precision), which is apparent when comparing the MAD of the TESS and ASAS-SN monitoring campaigns of the same target. Following that procedure, we estimate a spot covering fraction of WASP-96 1.35±0.97% assuming only spots are present and 1.40±1.09% assuming spots and faculae are both present.
To capture these limitations in the retrieval analysis, we constrained the heterogeneity covering fraction, f , to have normal priors with a mean of 0.014 and standard deviation of 0.009. For a G8 star like WASP-96, we should expect the heterogeneity temperature contrast, ∆ T, to be roughly 1600 K (see Berdyugina 2005; Rackham et al. 2019). We used wider uniform priors from -3000 to 3000 K on ∆ T, see Appendix F.

RETRIEVAL ANALYSIS
We combined each of the three optical spectra discussed in Section 4 (IMACS only, FORS2 only, and global spectra, see Fig. 4) with the HST/WFC3 data. We ran each of these three optical to near-IR spectra against two retrievals: PLATON (v3, Zhang et al. 2019) and Exoretrievals (introduced in Espinoza et al. 2019b). We used both PLATON and Exoretrievals, because their differing approaches of modeling exo-atmospheres provides different insights about the observed transmission spectra. The key differences between PLATON and Exoretrievals are: (1) PLATON includes collision induced absorption, where Exoretrievals does not, (2) Exoretrievals models the transmission spectrum using a semi-analytical formalism with an isothermal, isobaric, atmosphere and non-equilibrium chemistry, but PLATON uses an isothermal atmosphere and imposes equilibrium chemistry 10 ,  (Yurchenko et al. 2013;Tennyson et al. 2016). Additionally, testing a transmission spectrum against multiple retrievals provides a robustness against assumptions that are unique to each retrieval (e.g. McGruder et al. 2020;Kirk et al. 2021).
Both retrievals used nested sampling to explore their parameter space (dynesty, Speagle 2020, for PLATON and PyMultiNest, Buchner et al. 2014, for Exoretrievals), as such we used the differences in log Bayesian evidences, ∆ ln Z, to test which specific model was favored over another. Following the same prescription as McGruder et al. 2020(from Trotta 2008Benneke & Seager 2013), we interpreted the ∆ ln Z values in a frequentist significance as: |∆ ln Z| of 0 to 2.5 is provides useful insights that could not be obtained without this assumption.
inconclusive with < 2.7σ support for the higher evidence model, |∆ ln Z| of 2.5 to 5 corresponds to a moderately significant detection of 2.7σ to 3.6σ, and |∆ ln Z| ≥ 5 corresponds to strong support for one model over the other.
With Exoretrievals, we tested the spectra against having either water, potassium, sodium, water and sodium, and all three species in the transmission spectra. Along with these molecular and atomic species, we tested if the spectra warranted high altitude scattering agents, stellar activity, and a combination of scatters and activity. Lastly, a model with no features (flat spectrum) and only activity features was tested. In total, we tested 22 different combinations of models. For all models, aside from the flat spectrum, a reference radius (parameterized with f ) and reference pressure (P 0 ) were fit. These are the pressure and corresponding radius where the atmosphere is optically thick in all wavelengths.
With PLATON we tested a model with scattering agents, stellar activity, models with both scattering agents and stellar activity, and a model without either (clear). All models fit for a reference radius (R 0 ) corresponding to the radius of the planet at an arbitrary reference pressure, which was set to 1 bar. The models also fit for a pressure (P cloud ) where the atmosphere is optically thick.
The priors did not change for each spectrum we ran against the retrievals and we set the priors between PLATON and Exoretrievals as consistent to one another as possible. For all three spectra and both retrieval models we fit an offset between the optical and near-IR data 11 , which is justified given that the detrending method used for the optical data does not preserve absolute depth (see Sec. 3.3).

IMACS & FORS2 Transmission Spectra
Tables of ∆ ln Z for both datasets against both retrievals are shown in Tables 4 & 5. In the tables the ∆ ln Z values for each model was determined by comparing the clear model, for PLATON, or the clear model and flat spectrum, for Exoretrievals, to the other models. One can also compare any given model from another by examining the difference of ∆ ln Z's between one another.
Using Exoretrievals, we find from both the IMACS and FORS2 spectra, independently, that the model with the highest ∆ ln Z is one with water and sodium (no potassium). We find that models with additional stellar 11 PLATON v3 could not fit an offset between different dataset, as such, we modified the code to do so. activity or scatters do not make a significant difference in the ∆ ln Z, aside for the IMACS dataset that has a significant decrease in ∆ ln Z when including scatters (∆ ln Z decreases by 4, compared to a model with only water and sodium). The feature blue-ward of 500 nm is likely what the retrievals with scatters and activity are attempting to fit. However, because the feature is not extreme, relative to the uncertainties (especially for the IMACS dataset), there is not sufficient support for the extra complexity of these models. As such we can only claim a tentative detection of a slight blue-ward slope either attributed from the star or a possible Rayleigh scattering slope. For the IMACS data the best fit model that included stellar activity found spot parameters of ∆T =-150 +1300 −1700 K and f =0.0114 +0.0085 −0.0072 . For FORS2 those activity parameters were ∆T =-1060 +1100 −1200 K and f =0.0124 +0.0083 −0.0080 . The best model which included a haze scattering slope for the IMACS data obtained a haze power law of γ=-9.3 +7.8 −3.4 , and γ=-9.7 +5.1 −3.0 was obtained for the FORS2 best scatters model. With Exoretrievals γ of -4 corresponds to a Rayleigh slope (see Appendix D of Espinoza et al. (2019b)), implying that even though the retrieved slopes are unconstrained, they are consistent with Rayleigh scattering. The detections of H 2 O and Na were highly significant (∆ ln Z > 11) for both datasets. The highestevidence retrieval model parameters for this data subset and the others can be seen in Table 6 Interestingly, with PLATON there is less consistency amongst the two datasets. All PLATON models with the FORS2 dataset were indistinguishable from one another, which is in agreement with what Exoretrievals found for the FORS2 dataset. That is the slope blue-ward of 500 nm could be explained by either activity or a scattering slope, but neither is required for the data. For the FORS2 dataset the best fit model including activity found ∆T =-1040 +1200 −1000 and f =0.0143 +0.0083 −0.0089 ; the best fit model including scatterers found a scattering slope, α, of 5.7 +5.3 −6.0 (α=4 is Rayleigh); and the highest evidence model (clear model) obtained a metallicity, log 10 (Z/Z ), of 0.26 +0.75 −0.78 dex, C/O of 1.11 +0.55 −0.39 , log 10 (P 0 ) of 0.3 +1.7 −1.3 bars, and T p of 987 +92 −52 . Contrarily to PLATON, the IMACS data obtained a significantly higher evidence (∆ ln Z > 4) for the models that included scatterers. The retrieved values with IMACS differed from the fit with FORS2 likely because the Na feature was not as prominent in the IMACS data. Thus, the retrieval increases the metallicity (log 10 (Z/Z )=0.51 +0.53 −0.75 dex) and decreases the temperature (T p of 752 +62 −94 K) to mute the Na feature. In turn, the best retrieved spectrum is overall different from the FORS2 spectrum. Still the retrieved scattering slope of both datasets was consistent with a Rayleigh scattering slope, though not well constrained.

Combined Transmission Spectrum
The ∆ ln Z of all models run using the global data against both retrievals is shown in Table 7. When running the retrievals against the combined data, we find a similar trend as that for the individual datasets (aside for IMACS with PLATON). That is, a major detection of water and sodium, and tentative signs of a blue-ward slope attributed to stellar activity or a scattering slope.
For Exoretrievals the evidence, ∆ ln Z = 19.45, is even stronger than the individual datasets, showing that combining the data does improve the overall detection of the species. The corner plot of the highestevidence model retrieved by PLATON (one with scatters) and Exoretrievals (one without activity and scatters but including H 2 O and Na) is shown in Figures 7 and 8. Figures 9 and 10 show the global data with over-plotted models that either include stellar activity, a scattering slope, or none (flat and clear) for both retrievals. We elect to use the global transmission spectra which included all optical data to interpret the retrievals and atmosphere of WASP-96b, because the maximum relative retrieved evidence and transmission spectrum precision are higher when including both optical data. −0.4 ). The Na mixing ratios obtained here are consistent with solar abundance (log 10 (N a) = −5.78 ± 0.03; Asplund et al. 2021) and WASP-96's stellar abundance (Nikolov et al. 2018). The water abundance is also consistent with water abundances of Jupiter constrained by Juno (log 10 (H 2 O) = −2.6 +0.27 −0.44 ; Li et al. 2020). This implies that the formation process for WASP-96b might have been similar to our own Jovians, but that conclusion is limited by the uncertainties in the measured mixing ratios.
Theoretical predictions for clear atmosphere planets predict absorption signatures from the optical alkali features of Na and K (Seager & Sasselov 2000). However, our observations of WASP-96b show no observable evidence of K absorption, even though strong Na I and H 2 O features imply WASP-96b has a clear transmission spectrum. There are multiple possibilities as to why potassium was not significantly detected. The most obvious hindrance in detecting K was the gap in the transmission spectrum (759.4-767.2 nm) that was nearly aligned with the center of the most prominent K doublet feature (768.15 nm). Essentially, this only allowed K to be constrained by its wings, which might not be enough even if K is present. The possibility of K being present in the atmosphere is hinted in Figure 9, where PLATON's retrievals by default include K because of imposed equilibrium chemistry and relative abundances determined by metallicity. In those models we see that the transmission spectrum is somewhat consistent with the K doublets' features, where the blue-ward bin in the K wing is within the model's 1-σ interval and the red-ward bin is not. The fact that the evidences of the Exoretrievals models including K are lower than those excluding it is likely only because the data could be explained with or without K, given the gap in the central doublet band. Another possibility is that the potassium is locked away in KCl, which has been suggested to have a highly efficient formation rate in this temperature regime (Gao et al. 2020). Further space-based observations, or ground based high resolution observations will be needed to determine if abundant K absorption is truly present in the atmosphere of WASP-96b.

Activity and Optical Slope
For the models that included activity, retrieved ∆T temperatures from both retrievals and all datasets were consistent (< 1-σ) with no active regions (∆T = 0 K), implying a non-detection of stellar activity. Therefore, stellar photometry (see sec. 5.1 & 5.3), stellar spec- Figure 7. Corner plot of the PLATON best fit retrieval model with, which is run against the combined Magellan/IMACS, VLT/FORS2, and HST/WFC3 data. Its corresponding transmission spectrum is shown in Figure 9 (red model). Vertical dashed lines mark the 16% and 84% quantiles.
troscopy (see sec. 5.2), and the transmission spectrum retrieval analysis all support the idea that WASP-96 has very little activity.
H 2 Rayleigh scattering is expected blue-wards of ∼450-550 nm in a hydrogen dominated atmosphere without high altitude clouds (i.e. Seager & Sasselov 2000;Lecavelier Des Etangs et al. 2008). The fact that we were unable to significantly identify one is likely due to limitations of the data, given that we only have two and a half broader bins in this wavelength range. The slope could be better constrained with HST/STIS 12 or HST/WFC3/UVIS 13 observations. This would likely be sufficient to distinguish from stellar activity, because the star is relatively quiet. Therefore, when imposing these constraints, as was done in Section 5.3, the only viable activity features produced are very shallow slopes.
13 WFC3/UVIS G280 covers 0.2-0.8 µm and has better throughput in the blue than STIS  This is outlined in Figures 9 and 10 where the models with activity fit are shown in gray for both figures. Furthermore, the activity of quiet stars, like this one, is dominated by faculae (Reinhold et al. 2019;Rackham et al. 2022), which produces the opposite signal of what is expected of a Rayleigh slope, when the activity regions are unocculted. Meaning that large cold unocculted spots, needed to mimic a Rayleigh slope, would be in direct contradiction to all other observations of Figure 9. The final transmission spectra of WASP-96b, which include the combined FORS2 and IMACS data (purple) and the G102 (green) and G141 (pink) data both from Yip et al. (2021). Best fit PLATON retrieval models that include stellar activity (grey), scatters (gold), and a clear atmosphere (red) are also plotted, with there 1-σ confidence interval shaded in the same colors. Because for each of the three models, the optical offsets were slightly different, what is shown here is the mean depth where the true offsets were 0.00476 +.00040 −.00039 , 0.00414 +.00036 −.00035 , and 0.00467 +.00039 −.00034 for the activity, scatters, and clear models, respectively. As can be seen in Table 7 the model with the highest evidence is one with scatters; however, no model has an evidence strong enough to favor it over another.

WASP-96. Thus, it is clear that if a blue-ward scatter-
ing slope is found with additional observations, the most viable explanation would be attributing the feature to the planet.
Though the data is not sufficient enough to significantly distinguish between a model with and without a scattering slope, all retrieved scattering slopes were found to be consistent (< 2-σ) with a Rayleigh scattering slope. If this can be confirmed the Rayleigh like slope would best be attributed to H 2 scattering.

Aerosols
The higher reference pressures obtained by PLATON and Exoretrievals (a few to tens of bars) both agree with one another and supports the idea that if WASP-96b hosts thick aerosol layers, they are confined to beneath the top of the atmosphere. The strong water features and broad Na I absorption wings observed also support this conclusion. Additionally, when fitting for aerosols with Exoretrievals the cloud cross section, σ cloud (see Espinoza et al. (2019b) Appendix D) was initially set to be very wide, with log-uniform priors from -80 to 80 for σ cloud . In doing so, we found that across all 3 datasets (i.e. optical data from IMACS, FORS2, and combined) and all iterations of models that include scattering (i.e. no mater which molecules were included or if including activity) the means of σ cloud were nearly the same at ∼ -55. This value is so low it effectively means that the retrievals find no evidence of clouds affecting the spectra, which is consistent with the strong water and sodium features, and the larger reference pressures.
Aerosols are likely present in the atmosphere of all planets, but when aerosols are not warranted by the retrievals that implies that the aerosols are not thick enough at the high altitudes in which transmission spectroscopy probes to significantly mute the spectral features in the observed wavelengths, and/or they are not significantly present in the terminator. Also, though the retrievals, suggests that data can be explained without thick aerosols, color dependent scatters may still affect the transmission spectra. In particular, the possible Rayleigh scattering slope in the optical cannot be confidently refuted in the combined optical data.

Temperature and C/O ratio
The terminator temperatures found with PLATON (T p =877±40 K) and Exoretrievals (T p =830 +160  Goyal et al. 2018) to perform retrieval analysis on their data. Our retrieved temperatures are also far from the 0 albedo equilibrium temperature of 1285±40 K (Hellier et al. 2014). A lower retrieved terminator temperature is often found in the literature and is expected when applying a 1D model Figure 10. Same as Figure 9, but with Exoretrievals. In this figure the optical (combined FORS2 and IMACS), G102, and G141 observations are cyan, white, and red, respectively. Again, the evidence of each plotted model (activity -gray, scattersgreen, and clear -purple) are indistinguishable from each other, but the clear atmosphere has the highest evidence. The offset with each fit were 0.00554±.00031, 0.00558±.00034, and 0.00547 +.00031 −.00032 for the activity, scatters, and clear models, respectively.
to probe a 3D atmosphere. 1D models are useful for fitting features found in transmission spectra substantially more quickly than 3D models. Unfortunately, they can artificially shift the retrieved temperature up to hundreds of Kelvin cooler than the true average temperature (MacDonald et al. 2020;Pluriel et al. 2020).
Other physical assumptions prescribed in the retrieval frameworks could also contribute to a discrepancy in retrieved temperatures (Welbanks & Madhusudhan 2021). The only feature in the observed spectrum that could directly constrain C or O was H 2 O, thus, the quoted C/O ratios should be taken lightly, given that it is only using the water features and chemical equilibrium constraints to obtain the ratio. Including the two Spitzer/IRAC points centered at 3.6 and 4.5 µm (program ID 14255), as was done by Yip et al. (2021), would include data near carbon bearing features (i.e. CO, CO 2 , and CH4). However, we elected to exclude these points because the two points have larger uncertainties (in wavelength and depth), which would not significantly constrain any of these carbon species, as can be seen in Yip et al. (2021). Given the lack of a constrained C/O ratio, it is hard to ascertain the formation region of WASP-96b, but JWST observations would be able to refine the C/O ratio with higher precision and higher spectral resolution observations in the near-to mid-IR wavelengths. The observed lack of aerosols obscuring features in the optical to near-IR spectrum of WASP-96b makes it an ideal target for such observations, which has the potential to provide key insight of where hot Jupiters formed and when/if they migrated.

Mass-Metallicity Trend
Our metallicity with Platon is consistent with the constraints from Nikolov et al. 2018 (log 10 (Z/Z ) = 0.4 +0.7 −0.5 ). Furthermore, it is within 3 σ agrement to the solar system mass-metallicity trend (explored for exoplanets by e.g., Wakeford et al. 2018;Alam et al. 2020;Wakeford & Dalba 2020), shown in Figure 11. In this figure we plot the metallicites and masses of the solar system giants and a linear fit of that data in log-log space. We also show planets with metallicities derived from molecular and atomic abundance of one to a few different species.
Looking at all exoplanets in the figure, the trend found for the solar system does not persist among the exoplanet sample. However, given that high-altitude aerosols make it more difficult to accurately determine molecular abundances, we also highlighted the five planets which are thought to have little high-altitude aerosols obscuring their spectra. When only including these planets, four of the five planets (WASP-17b, WASP-62b, WASP-94b, and WASP-96b) are less than 3 σ from the predicted values. Though most of the clear atmospheres prove to be consistent with the solar system mass-metallicity trend, it is still hard to claim that we can interpret this as most exoplanets undergo the same formation mechanisms as our solar system for multiple reasons. For one, all three of the consistent metallicities are in similar mass regimes, therefore, giving perspective only on a small fraction of exoplanets. Addi-   tionally, most of these metallicities were obtained assuming the abundances of one to a few species can be directly translated to the bulk metallicity of the atmosphere. How far this assumption is skewed from reality is unknown, since interactions among transport, chemistry, and phase changes may mean the elements are not well-mixed in the atmosphere (Zhang 2020). Furthermore, most of the planets in Figure 11 are hot jupiters/neptunes, which are outliers that are not representative to exoplanets in general. For these reasons, and others, there is much headway required in order to obtain a more complete grasp of exoplanet formation mechanisms and trends. Observing more planets like WASP-96b, with relatively clear atmospheres, in many wavelengths and improving our analysis processes around these observations is the best path forward. Interestingly, though the work from this paper and others (e.g. Alam et al. 2020;Wakeford & Dalba 2020) find no clear trend amongst atmospheric composition and mass, there has been a found mass-metallicity trend in terms of planet bulk composition (Thorngren et al. 2016, , inferred from structure models). Still, much work is needed in order to determine if there truly is or is not an atmospheric metallicity-mass trend before the root of this potential discrepancy can be explored further.

Correlations to aerosol formation rates
Understanding why WASP-96b is one of the few planets that has an observed transmission spectrum unobscured by aerosols is vital for advances in exoplanetology. This would allow astronomers to a priori determine what key characteristics make an exoplanet transmission spectrum clear and allow for more targeted exoatmosphere surveys. Figure 12 puts WASP-96b in a broader context, it was initially used by Stevenson (2016), who proposed a temperature-gravity trend in the formation of highaltitude clouds. They used the H 2 O-J index as a proxy for the cloud levels, which inherently assumes a similar absolute water abundance of all hot Jupiters. HAT-P-11b is a prime example of the limitations of using this index as a proxy for cloud formation, where its water feature is extremely strong (2.499±0.505), but its optical spectra revealed it to have a cloudy atmosphere (Chachan et al. 2019). Nonetheless, the H 2 O-J index provides an easy index to parameterize cloud levels. Multiple other planets have been added to the original figure (e.g., Alam et al. 2020;Weaver et al. 2021, etc.), and make it clear that the trend does not strictly exist. As seen in our Figure 12, WASP-96b (marked as a star) highlights the lack of an obvious trend even further, where it is near the fitted temperature-log(g) boundary line but is in fact one of the most clear atmospheres observed to-date. Thus, there is still extended work needed to isolate what causes high-altitude aerosol formation.

SUMMARY & CONCLUSION
We observed two transmission spectra of WASP-96b with Magellan/IMACS as part of the ACCESS survey. In the process of analyzing the data, we tested the precisions and accuracies of two commonly used spectroscopic light curve detrending techniques: A) Common mode correction followed by a polynomial fitting (CMC+Poly) and B) a Gaussian processes (GP) routine. Both routines were tested against simulated data, where we find that for data without substantial chromatic systemics the CMC+Poly procedure produces more accurate depths with higher precision. Additionally, we find that neither method (worse for CMC+Poly) was able to consistently reproduce absolute depths. This provided justification of fitting for offsets amongst transmission spectra from different nights and instruments.
The transmission spectrum from IMACS was then added with reanalyzed transmission spectra from FORS2, both of which were reduced using the CMC+Poly routine. The optical data was combined with a previously published HST/WFC3 (G102 and G141) transmission spectrum (Yip et al. 2021), collectively producing a nearly continuous coverage transmission spectrum from 400-1237 nm. This spectrum was run against two retrievals: PLATON and Exoretrievals. Both retrievals found that the terminator of WASP-96b was shrouded by little or no aerosols, as such it is still one of the most clear exo-atmospheres observed. More specifically PLATON found a metallicity, C/O ratio, reference pressure, and terminator temperature of log 10 (Z/Z ) = −0.49 1.0 −0.37 dex, C/O = 0.97 +0.65 −0.50 , log 10 (P 0 ) = 1.3 +1.0 −1.1 bars, and T p = 877±40 K, respectively. Exoretrievals found a terminator temperature, reference pressure, and water and sodium mixing ratios of T p = 830 +160 −140 K, log 10 (P 0 ) = 0.29 +1.86 −2.02 bars, log 10 (H 2 O) = -4.5±2.0, and log 10 (N a) = -5.4 +2.0 −1.9 , respectively. With the constraints on stellar activity imposed from photometric monitoring, neither retrieval had strong support for stellar activity explaining the data. Though there is a hint of a slight slope blue-ward of 550 nm, there was not substantial evidence supporting the need for a scattering slope. However, bluer, space-borne observations of the planet could discern if the hint of a blue-ward slope is indeed a true feature.
Finally, in an attempt to put WASP-96b in a broader context, we found that it along with three other clear planets (WASP-62b, WASP-94b, and WASP-17b) are a https://stellarplanet.org/science/mass-metallicity/ b https://stellarplanet.org/science/mass-metallicity/ consistent with the mass-metallicity trend observed in the solar system. However this sample is not complete enough to make overarching claims about the extrasolar giants' formation mechanisms compared to giant planet formation pathways in the solar system. We also find no clear correlation to what causes the aerosol formation rate in exo-atmospheres, and recommend more in-depth analysis of a higher sample of planets in order to find such a correlation.
The results reported herein benefited from support, collaborations and information exchange within NASA's Nexus for Exoplanet System Science (NExSS), a research coordination network sponsored by NASA's Science Mission Directorate. This material is partly based upon work supported by the National Aeronautics and Space Administration under Agreement No. 80NSSC21K0593 for the program "Alien Earths". This paper includes data gathered with the 6.5 meter Magellan Telescopes located at Las Campanas Observatory, Chile. We thank the staff at the Magellan Telescopes and Las Campanas Observatory for their ongoing input and support to make the ACCESS observations presented in this work possible. This work uses ob-  (2016) relative to the planetary surface gravities and equilibrium temperatures. Other planets with surface gravities and effective temperatures obtained from TEPCat are plotted as green circles for context. We also plot the dividing line in Tequ-log10(g) phase space that Stevenson (2016) proposed delineates between cloudy and clear planets. Where the authors say H2O-J indices greater than 2 is likely completely clear, and indices less than 1 are greatly obscured by clouds. The only planets with indices greater than 2 are HAT-P-11b (2.5±0.5), WASP-96 (2.4±0.7), and WASP-17b (2.3±0.9). WASP-96 (star) is shown to be one of the clearest observed planets, both through this index and in the optical, yet it sits very close to the dividing line. The data in this table was produced in Alam et al. As mentioned in Section 2.2, we did not use a blocking filter during the UT170804 transit, but we added the GG495 blocking filter in the UT171108 observations after we learned from other ACCESS observations that second order light introduces contamination in some cases.
To check for possible contamination in the UT170804 observation, we modeled its effect by convolving the unfiltered stellar spectra, which inherently is a convolution of the instrument's throughput and WASP-96's stellar spectra, with the G495 filter's throughput. This gave us the spectral profile of the UT170804 observation, if the G495 filter had been added, but with any second order light still present. Then we modeled the normalized continuum of this light curve against the normalized continuum of the UT171108 observations light curve. Because we assumed that the only difference between both continuum structures should be from the 2nd order light, we used the ratio of the two continua as a correction term for the second order light. This process is illustrated in Figure 13. We applied this correction method to each comparison star and WASP-96 spectrum observed on UT170804 (the exact correction term is unique per spectrum) to determine the effect of the second order light. We then used both the corrected and uncorrected spectra to produce a white-light curve and binned light curves, which are constructed by integrating either all wavelengths of light (white-light) or specific band passes (binned) in a given spectrum, and plotting each integrated spectral counts relative to time. Next, we compared the final white-light and binned transits with and without the correction, and found an insignificant difference between the two. This is likely, because the relative effect of the second order light is negated when dividing by the comparison stars (or using a PCA correction with the comparison stars). Given that the correction had minimal effects, in our final analysis we used the uncorrected spectra. Figure 13. The process used to correct for second order light. The left panel shows the normalized IMACS observations taken on UT170804, which had no filter (red); UT170804 convolved with the GG495 transmission profile (green); and UT171108, which had the GG495 filter (blue). The middle panel shows the normalized second order light correction, determined by dividing the filter convolved first night (green) by the second night (blue). The right panel shows normalized UT170804 (red), UT170804 convolved with the GG495 profile and corrected for second order light (green), and UT171108 (blue). Dotted lines of each spectra's modeled continua is overlayed in the same respective colors as their spectra. These steps are also applied to each comparison so every spectrum is corrected from second order light. The PCA+GP routine we used has been implemented often in recent years (Weaver et al. 2020;Yan et al. 2020;McGruder et al. 2020;Weaver et al. 2021), but we outline it here. We first model out commmon systematics found in all the comparison stars by performing singular value decomposition on a matrix composed of comparison light curves (L k (t)) in the form: where s i (t) is a set of signals representing the systematics affecting a given light curve and A k,i is the weight for each of those signals. This allows us to identify the principal components (i.e. PCA) of the light curve, which is driven by systematics. This is used rather than just dividing the target by the sum or mean light curve of the comparison stars, because when doing that you combine systematics that are unique to a specific star, instead of isolating which systematics persist amongst all stars. Therefore, this method is less likely to incorrectly divide out systematics in the comparison star that were not present in the target star.
To model out systemtics unique to the target's light curve we used a Gaussian process (GP) regression with a joint probability distribution of the form N [0, Σ], where the covariance function (Σ) is defined as K SE (x i , x j ) + σ 2 w δ i,j . Here, σ 2 w and δ i,j are a jitter term and the Kroenecker delta function, respectively. K SE (x i , x j ) is a multidimensional squared-exponential kernel of the form: where σ 2 GP is the amplitude of the GP and α d are the inverse (squared) length-scales of each components of the GP. The priors on the jitter term and σ 2 GP were both log-uniform, where the jitter term ranged from 0.01 to 100 ppm and σ 2 GP from 0.01 to 100 mmag. The prior on each α d was an exponential function, similar to what was done by Gibson et al. (2017) andEvans et al. (2018).
In the above equations index i denotes each time-stamp and d corresponds to a set of different time-dependent external (auxiliary) parameters used. For the Magellan/IMACS transits these auxiliary parameters were variation of air mass, full-width at half-maximum (FWHM) of the spectra, mean sky flux, position of the central pixel trace (perpendicular to the dispersion axis), and drift of the wavelength solution. For the VLT/FORS2 transits they were dispersion and cross-dispersion drift, variation of the FWHM of the spectra, air mass, mean sky flux, and change in rotator angle.
The PCA and GP components were combined using the following equation: where M k (t) is the (mean-subtracted) magnitude of the target star in the k th model, c k is a magnitude offset, N k is the number of PCA signals s i (t), A i,k is the weight for each signal in each models, T (t|φ) is the transit model with parameters φ 14 , and our GP component 15 .
We In the limiting case where there is only one comparison star, the PCA portion of the routine cannot be performed. In that case equation B3 becomes where M(t), T (t|φ), c 0 , and are the same as M k (t), T (t|φ), c k , and from equation B3, but without summing over different N k PCA signals. A is now a weight for the mean-subtracted comparison star magnitude, m c . This equation is what was used by Yan et al. (2020), because they only had one comparison star.
When this detrending routine can be used to its fullest (i.e. with multiple comparison stars) it is referred as "PCA+GP." In the case when there is only one comparison star, like in the FORS2 data, it is referred to as a "GP" routine. As discussed in Section 3.3 and Appendix E, the synthetic data only had one comparison star, so we in fact are comparing the GP and CMC+Poly routines in that analysis.
This general procedure can be used to detrend both the white-light curves and the binned-light curves. When producing the binned-light curves solely with PCA+GP we first used this method on the white-light data, then fixed all parameters based on the PCA+GP white-light fit (aside from q 1 , q 2 , and R p /R s ), then ran the PCA+GP routine against the binned-light curves. When producing the binned-light curves with the CMC+Poly routine we again used the PCA+GP method on the white-light data, used that white-light fit to obtain the CMC term, fixed all binned-light curve parameters based on the PCA+GP white-light fit (aside from q 1 , q 2 , and R p /R s ), and ran the CMC+Poly routine (see sec. B.2).

B.2. Common Mode Correction (CMC)
The general process in CMC is to first produce a partially detrended white-light curve, where the larger systematics common to the target and the comparison(s) are removed. Two common ways to do this is either by dividing the light curve of the target by the light curve of the star(s) or with PCA. Next, a best fit transit model for the light curve is found using a routine that fits for both a transit and additional systematics (e.g. GPs). This transit model is divided by the uncorrected white-light curve to produce a constant systematic correction term (common mode). The correction term is then used for each individual bin, because the systematics are expected to be relatively consistent throughout each spectral band pass. After the common systematics are removed, each bin has an additional detrending routine (e.g. GPs or polynomial correction) to correct for any residual chromatic systematics. Doing this technique has proven to produce relatively high precision (though accuracy was not tested) for retrieved transit depths of each band pass Nikolov et al. 2016;Gibson et al. 2017;Nikolov et al. 2018;Carter et al. 2020).
Our best fit white-light curve transit model (used for the common mode term) is obtained with the same steps of Appendix B.1. After the common mode correction is applied we fit the remaining systematics using 1st and 2nd order polynomial fits of all external parameters. We individually explore the posterior space of all possible combinations of systematic corrections. This means for each bin, including a fit without external parameters (just a single constant coefficient), 729 and 243 models were fit for the IMACS and FORS2 data, respectively. The posterior space was explored in three steps: first, we fit for just the polynomial systematic coefficients on the out-of-transit data using scipy.optimize.minimize (Jones et al. 2001) with the 'Powell' method, coefficient bounds of -5 to 5, and initial points of 0. Then we use the found coefficients as the initial start points of another scipy.optimize.minimize run which includes the in transit data and a transit model fit. Lastly, we use PyMultiNest as our final posterior exploration where the transit parameters found with scipy were used as priors on the mean value, while maintaining the prior bounds (see Sec. 3.1), and the found polynomial coefficients were used as the mean values for a normal prior distributions with a standard deviations of 0.05. Again, we used BMA to best combine posteriors from all explored posterior spaces. For the fits we held t 0 fixed to what the white-light curve fit found, thus, the only transit parameters fit were the LD parameters and R p /R s . There are differences in how this technique is implemented compared to Nikolov et al. (2018)'s method; however, it produces a nearly identical transmission spectrum, as shown in Figure 14.  Nikolov et al. (2018) (black), and a re-reduction of the same data (VLT/FORS2) with the same binning scheme using our CMC+Poly analysis discussed in section B.2. The two analyses produce similar mean Rp/Rs precisions (0.00108 for the original and 0.0079 our reanalysis) and have an average difference of depth of only 0.56-σ, suggesting strong agreement of the two data. An offset is applied so the means of both datasets are the same.

D. LIGHT CURVES
For each bin we estimated the theoretical photon noise precision σ w . This was calculated by taking the mean counts (per star) over each exposure, summing all the counts (N) in a given bin, and using √ N to estimate the shot noise for each star. The error of each star was propagated where the final (uncorrected) transit light curve was produced by dividing WASP-96's flux by the sum of the comparison stars' flux.
We also estimate the noise of the corrected binned light curves. This was done with two approaches. First we calculate the root mean squared (r.m.s.) of the residuals. To also include an estimate that is not dependent on the best fit transit model, we take the standard deviation of the out-of-transit data. Both methods produced similar precisions and we use the higher of the two for our estimates of the the corrected light curve precision.
We quantify the red noise by taking the ratio of the detrended light curve's noise level and the theoretical photon noise. We call this ratio β, as it is similar to the β term used by Winn et al. (2008) (though derived differently). The average β is 1.122, 1.671, 1.089, and 1.528 for each night in chronological order. Their corresponding R p /R s uncertainties are 0.001262, 0.002284, 0.000935, and 0.001485, correlated to the total light curve precision (red and white noise) but overall determined by how well the posterior space can be constrained, given the data. Table 8. Combined Magellan/IMACS (UT170804 and UT171108) optical transmission spectrum (Rp/Rs), combined VLT/FORS2 (UT170729 and UT170822) optical transmission spectrum, and global combined optical spectrum (all four transits). The data were produced implementing the reduction and detrending processes discussed in Section 4. These depths are offset so the mean depth is 0.1158.   Figure 2. This would also be the same detrending routine used to detrend the binned data, if the GP (PCA+GP for the IMACS data) routine was used rather than the CMC routine. The alphas are the GP inverse squared length-scales on the external parameter regressors. Alphas 0 to 5 correspond to the target dispersion drift, cross-dispersion drift, variation of the FWHM, air mass, mean sky flux, rotator angle.    Figure 19, but for VLT/FORS2 transit UT170822. Though there are a few cases (8th, 9th, and 12th bins) where β is slightly less than one. We do not interpret this as the light curve was over corrected, but given that they are nearly one, we see this as we achieved photon noise precision for those bins. Table 9. Near-IR transmission spectrum obtained from Yip et al. (2021). The G102 grism is from 8000-11250 Å and the G141 grism is from 11372-12180 Å.
In our initial analysis, we reduced the VLT/FORS2 transmission spectrum using the PCA+GP routine (just 'GP', given that there is only one comparison), discussed in appendix B.1, to reduce the white light curves and the binned light curves. When comparing the VLT/FORS2 transmission spectrum produced with this method to the original spectrum (Nikolov et al. 2018), shown in Figure 22. We found that the uncertainties from the GP method were over 3 times larger than when Nikolov et al. (2018) used their CMC+Poly method on the same data. This was independent of if the squared exponential kernel (see Appendix B.1) or the george Matern32Kernel was used. Naturally, this is concerning, because though both methods produce similar overall structures, Figure 22 implies that either CMC+Poly underestimates the uncertainties or GP overestimates it. As such, we test the accuracy and precision of both methods using synthetic data.
The first step to produce our synthetic data was to generate a transit light curve with parameters similar to what produces the transit light curve of WASP-96b. For generating all light curves we used batman (Kreidberg 2015) with the same time stamps as FORS2 transit UT170729 (i.e. 90 points covering ∼5 hours). The exact light curve parameters used are shown in Table 10.
Next, we generated a realization of a Gaussian Process, GP(t, instrumental_systematics), for the common systematics that affect all spectrophotometric bands. Here the 'instrumental_systematics' were the FORS2 transit UT170729 auxiliary parameters of airmass, fullwidth at half-maximum (FWHM), and rotational angle. Using george (Foreman-Mackey et al. 2014), we combined three squared exponential kernels each with an inverse natural-log length scale of -2 and a constant term of -5, -12, and -12 for the GP inputs of airmass, FWHM, and rotational angle, respectively. The values of the inverse length scale and constant terms were empirically deduced in order to produce systematics with amplitudes and structure similar to what was seen in the FORS2 transit UT170729 light curves. An example of the white light curve and its GP systematics is shown in Figure 23.
Our third step, was to produce the binned light curves. This was done by generating 10 light curves each containing the same GP systematic realization as above, but with quadratic limb darkening (LD) coefficients determined using PyLDTk (Parviainen & Aigrain 2015) with a stellar effective temperature of 5500 K, stellar log 10 (g) of 4.42, metallicity (log 10 (Z)/log 10 (Z )) of 0.25 dex, and assuming each bin was 150Å wide starting from 3800 to 5300Å. Each binned light curve had a unique realization of shot noise with a 400 ppm standard deviation. In addition, each binned light curve had an unique systematic added by generating a first order polynomial on dispersion drift, a second order polynomial on cross-dispersion drift, and a first order polynomial on rotator angle. The coefficients of each polynomial on each bin were randomly drawn from a uniform distribution from -0.1 to 0.1. Each polynomial function was standardized (a common practice done by e.g. Evans et al. 2017;Espinoza et al. 2019b;Kirk et al. 2021, etc.) and their product was used as the chromatic systematics. An example of binned light curves and their polynomial systematics can be seen in Figure 24.
The final step in producing the synthetic data was to decompose the light curves into two components, the target light curves and the comparison star light curves. When making the comparison star light curve we take a constant light curve with normalized value of 1 and add a separate draw from the above GP systematics, i.e. using the same auxiliary parameters and kernels, but a different random draw from the GP distribution. Because it is assumed that the systematics affecting the comparison star's light curve should also affect the targets, we multiple this GP draw by the original target light curve. Thus, when the target is divided by the comparison, the effect of the comparison's systematics are completely divided out. This might not happen in real datasets, because the comparision could add additional systematics, but for our synthetic data, we assume any such systematics is included in the initial construction of the light curve (i.e. the first GP draw). The first column of Figure 24 shows examples of synthesized binned target light curves and comparison light curves. The white light curve can then be constructed by combining all 10 bins, and assuming each bin had the same number of counts. When producing all synthetic data we followed these steps 50 times, where each white light curve was produced with a different draw from the GP distribution, unique realizations of shot noise, and unique coefficients for the polynomial systematics on each bin. Figure 22. The transmission spectra of VLT/FORS2 data. The black diamonds with uncertainties is the analysis done by Nikolov et al. (2018), and the cyan dots with uncertainties is our initial analysis of the VLT/FORS2 data with the same wavelength bins and using the GP detrending method. An offset is applied so their mean depths are equal to one another. Each spectrum is a weighted average of observations taken with the B filter (350-617.3 nm) on the night of UT170729 and the R filter (529.3-801.3 nm) on the night of UT170822. The average deviation of each point relative to their uncertainties is 0.28σ, suggesting that the spectral structures are the same. Table 10. The initial transit parameters to construct the synthetic data (column 2), the white-light priors used when fitting the data (column 3), and the binned priors (column 4). The period (P) of 3.42526 days and mid transit time (t0) of 2457963.336499 days were held fixed for all fits. Here 'b' is the impact parameter, which can be transformed to inclination (i) via equation 1 using a/Rs, eccentricity (fixed at 0), and ω (fixed at 90 • ). The limb-darkening parameters were determined using PyLDTk and were different for each bin, what is shown in the table below is the average.

parameter value white-light binned
Rp/Rs 0.1157 uniform: 0.1-0.14 normal: m=(WL fit), σn= 0.05 < q1 > 0.8208 uniform: 0-1 uniform: 0-1 < q2 > -0.0201 uniform: 0-1 uniform: 0-1 b 0.7456 uniform: 0.5-1 Fix to WL fit a/Rs 9.0 uniform: 8-10 Fix to WL fit Table 11. The priors for Exoretrievals and PLATON. These priors were set to allow for a wide parameter space to be surveyed, but contained within physical regimes. Not all parameters were included in each model fit (see Tab. 7). We used 5000 live points for all runs. For further description of the parameters of Exoretrievals, please refer to the Appendix D of Espinoza et al. (2019b). The log cloud absorbing (σ cloud ) parameter was fixed because across all datasets and all models which included aerosols, this retrieved parameter was near -55. As such, we decided to reduce the dimensionality of the explored posterior space by fixing it to this value, effectively turning off clouds. log-uniform -1 to 10 scattering factor log-uniform -10 to 10 haze power law (γ) * uniform -14 to 4 scattering slope (α) ‡ uniform -4 to 14 log cloud absorbing Fixed -55 metallicity (Z/Z ) log-uniform -1 to 3 cross-section (σ cloud ) trace molecules' log-uniform -30 to 0 C/O uniform 0.05 to 2 mixing ratios reference radius factor(f ) uniform 0.8 to 1.2 1 bar, reference radius (R0) uniform 1 to 1.4Rj * This is the exponent of the scattering slope power law, where −4 is a Rayleigh scattering slope. ‡ This is the wavelength dependence of scattering, with 4 being Rayleigh.
This is a factor multiplied by the inputted planetary radius to produce the reference radius, i.e. R0 = f Rp Figure 23. An example synthetic white light curve (black dots) constructed by adding all 10 binned light curves in Figure 24. Given that all of the individual chromatic systematics are smaller relative to the white-light systematics and the transit, we do not see their effect in this light curve. This is exactly what we would see in a true light curve (e.g. see Fig. 2. The red line is the systematics produced by a random draw from a GP distribution created using airmass, full-width at half-maximum (FWHM), and rotational angle as inputs. σw is the residuals of the out-of-transit data from the GP systematic model, which would be 0 if there was no white noise added. Figure 24. The specific binned light curve used to construct the WLC in Figure 23. The left column shows the binned target light curves (blue) and the binned comparison light curves (purple). The target light curve is composed of the transit, systematics common to all bins and the comparions, systematics common to all bins and just the target, and systematics unique to each bin. The LD coefficients used for each bin are printed below the light curves, and were determined with PyLDTk. The right column shows the 'raw' (target/comparison) light curve in black dots. The larger amplitude GP systematics (constant for each bin) is shown as a yellow dashed line, the smaller amplitude polynomial systematics (unique per bin) is shown in red and the combined GP and polynomial systematics is shown as green dashes. The residuals of the out-of-transit data from the green dashed line is printed in the bottom left corner (σw).