Storms or Systematics? The changing secondary eclipse depth of WASP-12b

WASP-12b is one of the most well-studied transiting exoplanets, as its highly-inflated radius and its 1.1 day orbit around a G0-type star make it an excellent target for atmospheric categorisation through observation during its secondary eclipse. We present two new secondary eclipse observations of WASP-12b, acquired a year apart with the Wide Field Camera on the Isaac Newton Telescope (INT) and the IO:O instrument on the Liverpool Telescope (LT). These observations were conducted in the $i^\prime$-band, a window expected to be dominated by TiO features if present in appreciable quantities in the upper atmosphere. We measured eclipse depths that disagree with each other by $\sim$3$\sigma$ (0.97 $\pm$ 0.14 mmag on the INT and 0.44 $\pm$ 0.21 mmag on the LT), a result that is mirrored in previous $z^\prime$-band secondary eclipse measurements for WASP-12b. We explore explanations for these disagreements, including systematic errors and variable thermal emission in the dayside atmosphere of WASP-12b caused by temperature changes of a few hundred Kelvin: a possibility we cannot rule out from our analysis. Full-phase curves observed with TESS and CHEOPS have the potential to detect similar atmospheric variability for WASP-12b and other optimal targets, and a strategic, multi-telescope approach to future ground-based secondary eclipse observations is required to discriminate between explanations involving storms and systematics.


I N T RO D U C T I O N
Secondary eclipse observations are an important tool to characterize the atmospheres of transiting hot Jupiters. The vast majority of secondary eclipse studies have been conducted in the near-infrared (NIR), at wavelengths of 1.1 μm and longer (e.g. Charbonneau et al. 2005;de Mooij & Snellen 2009;Swain et al. 2013). At optical wavelengths and shorter, the flux from hot Jupiters due to thermal emission drops off sharply in contrast to their host stars. With current instrumentation, individual secondary eclipse signals are not detectable in the optical for all but the most heavily irradiated E-mail: mhooton01@qub.ac.uk exoplanets, though the precision afforded by the NIR instruments mounted on the James Webb Space Telescope is expected to be able to detect considerably smaller signals at these wavelengths than is currently possible (e.g. Bean et al. 2018).
For the handful of exoplanets with sufficiently large signals, secondary eclipse observations at red-optical wavelengths are a rich source of information about their compositions and temperature structures. The i -and z bands (with coverage of ∼0.7-1.1 μm) contain prominent titanium oxide (TiO) and vanadium oxide (VO) features, which should dominate the spectra of hot Jupiters if present in appreciable quantities in their upper atmospheres. TiO and VO have long been predicted to be significant opacity sources that could give rise to temperature inversions in the most highly irradiated exoplanets (e.g. Hubeny, Burrows & Sudarsky 2003;Fortney et al. 2008). Despite numerous searches (e.g. Sing et al. 2013;Haynes et al. 2015), high significance TiO detections have only been reported more recently. Nugroho et al. (2017) and Sedaghati et al. (2017) reported detections of TiO at high resolution in the WASP-33b dayside and at low resolution at the WASP-19b terminator, respectively. In the case of WASP-19b, this contradicts non-detections by Huitson et al. (2013) and Espinoza et al. (2019) in the same wavelength range. Tentative reports of TiO and VO in the transmission spectrum of WASP-121b at NIR wavelengths (Evans et al. 2016) were complicated by a recent study that found evidence for VO, but not TiO, in its optical transmission spectrum (Evans et al. 2018).
As observations of secondary eclipses in the i -and z bands probe the spectral energy distributions of exoplanets at wavelengths that are shortward of their peaks, the thermal emission is very sensitive to the precise temperature of the emitting layer. Madhusudhan (2012) predicted that for many hot Jupiters, thermal emission measurements at these wavelengths are good discriminators between carbon-rich and carbon-poor exoplanets. While secondary eclipse observations conducted at z -band wavelengths have been published for nine exoplanets 1 (Sing & Lopez-Morales 2009;Lopez-Morales et al. 2010;Smith et al. 2011;Burton et al. 2012;Gillon et al. 2012;Föhring et al. 2013;Lendl et al. 2013;Zhou et al. 2013;Chen et al. 2014;Delrez et al. 2016;Gaudi et al. 2017;Delrez et al. 2018), only two studies have reported secondary eclipse observations at i -band wavelengths (Mancini et al. 2013;Chen et al. 2014), with the most confident detection reported at 3.7σ .
WASP-12b (Hebb et al. 2009) is the subject of many disagreements regarding the classification of its atmosphere. Its 1.09 d orbit around a G0-type star makes WASP-12b one of the most heavily irradiated known exoplanets, resulting in a dayside temperature of ∼3000 K. This, as well as its highly inflated size, results in some of the largest eclipse depths for any known exoplanet, with secondary eclipse observations of WASP-12b having been carried out across the NIR. Its large scale height also makes it an excellent target to study its terminator using transmission spectroscopy. Madhusudhan et al. (2010) used CFHT and Spitzer secondary eclipse observations (Campo et al. 2011;Croll et al. 2011) to infer that WASP-12b has a high C/O ratio. Later observations cast doubt on this assessment, as Stevenson et al. (2014b) and Kreidberg et al. (2015) found that the emission and transmission spectra of WASP-12b could be well fit by models without a high C/O ratio. Bell et al. (2017) measured a geometric albedo (A g ) < 0.064 for WASP-12b at optical and nearultraviolet (NUV) wavelengths. This suggests that WASP-12b is amongst the darkest known exoplanets, which is contrary to transmission spectroscopy studies that predict significant scattering from high-altitude aerosols (e.g. Sing et al. 2013;Barstow et al. 2017).
In this paper, we present two i -band secondary eclipse observations of WASP-12b that were conducted a year apart. The measured depths significantly disagree, mirroring the results of similar observations conducted at z -band wavelengths (Lopez-Morales et al. 2010;Föhring et al. 2013). We also present an analysis of all reported WASP-12b secondary eclipse depths, in addition to a discussion of whether this could arise due to systematic errors or detectable variability in the WASP-12b dayside. In Section 2, we describe our observations and our data reduction; in Section 3, we describe how we decorrelated the light curves before presenting our results, which show a significant disagreement in the eclipse depths. In Section 4, we discuss the possible reasons for the disagreement and in Section 5, we present our conclusions and outline possible future observational strategies to discriminate between these scenarios.

Isaac Newton Telescope (INT) observations
We observed one secondary eclipse of WASP-12b on 2017 January 15 using the Wide Field Camera (WFC; Ives 1998) with the Sloan Gunn i filter 2 on the Isaac Newton Telescope (INT) on La Palma, using a strategy broadly similar to the one adopted in Hooton et al. (2018). The WFC consists of a mosaic of four CCDs, each with a plate scale of 0. 33 per pixel. We only used the central CCD (CCD4), resulting in a ∼22. 7 by 11. 4 field of view. During the observation, the Moon had an average angular separation of 64 • from WASP-12 and an average illumination of 85 per cent. The seeing was poor and varied between 1-3 arcsec. The observation began at 21:07 UT and lasted ∼7.9 h. During this time, we obtained 264 frames with an exposure time of 60 s and readout time of ∼29 s. WASP-12b was fully or partially occulted by its host star for 103 of these frames. At the very end of the observation, 22 frames were discarded as the low altitude of WASP-12 resulted in vignetting from the lower shutter of the INT.
The INT was defocused during our observation. This strategy minimizes flat-fielding errors, reduces overheads, and makes the point spread functions (PSFs) more robust against the variable seeing that was observed. The resulting donut-shaped PSFs had diameters of 27 pixels (9 arcsec). As the auto-guider for the INT requires the telescope to be in focus, we used a custom code to guide the telescope. This accounts for any drift in the telescope using the science frames. The total drift was less than 9 pixels (3 arcsec) throughout the night, with a root mean square (RMS) of 1.9 pixels. We ensured that WASP-12 and the other comparison stars were positioned on well-behaved parts of the detector.
The frames were reduced using a custom IDL pipeline written by the lead author. Each of the images was bias-subtracted row by row using the mean of the overscan regions at each edge of the CCD, followed by a full-frame subtraction to account for further artefacts present in the bias frames. The row-by-row subtraction addresses an issue where the bias level in the WFC CCDs periodically jumps to different values -a phenomenon that was observed in 43 of the frames. Each of the frames was flat-fielded using twilight flats.
A fringing pattern with an amplitude of ∼5 per cent of the background flux was present in the science frames throughout the observation. Although the position of the fringes on the WFC detector is stable with time, small movements in the target and comparison stars across the detector can introduce systematic errors into the light curves if they cross any fringes. We created a map of the fringing pattern by acquiring dithered frames on a blank field with an exposure time of 60 s, and combined the frames using a clipped median of each pixel.
As the intensity of the fringing pattern in the data varied with time, we needed to scale the fringing map accordingly before subtracting it from each frame. We explored a couple of methods to do this in a robust and consistent way. First, we linearly fitted the whole fringing map to the whole of each science frame with the stars masked. This produced well-corrected science frames for the majority of the time series, but failed to robustly remove the fringes for frames acquired between phases 0.37 and 0.42, where contamination from the Moon caused a gradient across the frames. Instead, we selected a small region of the science frames with a good range in fringe amplitudes, and used this region to fit for the proper offset and scaling for the fringe frame using a linear fit between the fringe map and the science frame, taking care to mask any stars in the region. This approach was significantly less sensitive to the gradient across the Mooncontaminated science frames and robustly removed the fringes for the entire time series.
Finally, we fit and subtracted a second-order two-dimensional polynomial from each of the science frames with the stars masked. This step was performed to correct the small trends in background flux that were a function of their position on the CCD (e.g. de Mooij et al. 2011), including contamination from the Moon that affected ∼40 out-of-eclipse frames. A small gradient was still present in the Moon-contaminated frames that was not removed by higher order polynomial fits. For six frames that centred on WASP-12 crossing meridian, a more complex trend was visible across the CCD, which our polynomial fit was unable to successfully model. For this reason, we excluded these frames from our analysis.
Five bright comparison stars within 5 arcmin of WASP-12b (with V mag = 11. 52, 11.62, 11.74, 12.22, and 13.08) were selected to normalize the target light curve, as stars further away typically introduced large systematic errors into the time series due to the positiondependent effects of the Moon contamination on the detector.
Our method of defocusing the telescope resulted in two nearby background stars (2MASS06303343+2940255 and 2MASS06303255+2940301; referred to hereafter as 2M063033 and 2M063032, respectively) blending with the WASP-12 PSF. The target and comparison stars were centred by picking a bright star in a clear area of the detector, computing the mean x and y positions of the pixels with flux 5σ above the background level in every frame and adding offsets to calculate the positions of the other stars.
We performed aperture photometry using the APER procedure from the IDL Astronomy User's Library 3 on each of the five comparison stars with a radius of 44 pixels. This was sufficiently large to capture 90 per cent of the flux from both background stars so that their contributions to the flux could be quantified and accounted for (see Table A1), and larger aperture sizes resulted in an increased RMS due to the addition of more background flux. The residual sky background for each star was computed in a sky annulus outside the target aperture, which had inner radius of 50 pixels and an outer radius 69 pixels. A range of inner and outer radii were tested, but the changes had negligible effects on the measured background flux.

Liverpool Telescope (LT) observations
We observed one secondary eclipse of WASP-12b on 2018 January 20 using the IO:O instrument with the SDSS-i filter on the Liverpool Telescope (LT; Steele et al. 2004) on La Palma. IO:O in the standard 2 × 2 binning mode is a 2048 × 4056 pixel e2v detector with a plate scale of 0. 30 per pixel, giving us a field of view of ∼10. 24 by 10. 28. During the observation, the Moon had an average angular 3 idlastro.gsfc.nasa.gov separation of 111 • from WASP-12 and an average illumination of 13 per cent. The seeing was also poor and varied between 1 and 3.5 arcsec. The observation began at 23:53 UT and lasted ∼4.7 h. During this time, we obtained 393 frames with an exposure time of 40 s and readout time of 18.5 s. WASP-12b was fully or partially occulted by its host star for 185 of these frames.
The LT was also defocused during the observation. The resulting donut-shaped PSFs had a diameter of 10 pixels (3 arcsec). The autoguider for the LT maintains its own focus independently of IO:O, and was hence suitable for use for these observations despite our defocused PSFs. A drift of 9 pixels occurred in the 25 frames taken immediately after meridian; the drift throughout the remainder of the observations was <4 pixels (1.2 arcsec), with an RMS of 0.8 pixels.
As standard, the science frames are bias-subtracted, overscantrimmed, and flat-fielded using the IO:O pipeline at the observatory. The images were bias-subtracted by subtracting a single value from the entire image based on the overscan regions on either side of the image. These overscan regions were then trimmed, before the frames were each flat-fielded using a master flat constructed from twilight flats. Unlike for the WFC, negligible fringing was observed in the IO:O frames and no correction was performed. When stacking the science frames, a small trend was still visible across the detector after the IO:O reduction, despite the flat-fielding. For this reason, a second-order polynomial was fit to and subtracted from the entirety of each frame individually with the stars masked.
We performed aperture photometry on the same five comparison stars used in the INT reduction using a broadly similar method. As the LT resolves the world coordinate system, this was used to compute the positions of the stars in each frame. We selected target apertures with radii of 54 pixels, and inner and outer radii of 60 and 79 pixels, respectively.

ANALYSIS
The raw target light curves for WASP-12 are shown in the top two panels of Fig. 1. A dip in the raw flux in the INT light curve between phases 0.40 and 0.51, as well as the simultaneous spike in the flux in the sky annulus (see the third panel of Fig. 1), corresponds to a time when thin cirrus clouds were likely passing overhead. The flux of a few star-free sections on the detector was measured, and all show same spike in sky brightness that is visible in the sky annuli of the target and reference stars. As we observed correlations between the flux and the sky background for both the INT and LT data, sky background was included as a component in the baseline model for both observations.
The normalized light curves for each observation were created by dividing the raw target light curves by the total of the five comparison stars, and dividing through to get a median out-ofeclipse baseline of 1. The comparison stars selected had similar magnitudes and colours to WASP-12, to ensure the SNR was maximized and that no significant colour terms were introduced into the normalized light curve. This step removed the visible correlation with airmass (see the bottom panel of Fig. 1) for the INT light curve, but did not fully remove it for the LT light curve. Despite the drift at the start of the LT observation, we did not observe a correlation with the x and y positions of the stars on the detector in either data set. Crossfield et al. (2012) and Bergfors et al. (2013) detected an elongated companion candidate 1. 06 from WASP-12, which further observations by Sing et al. (2013) and Bechter et al. (2014) confirmed to be a close M-type binary that is gravitationally bound to WASP-12. These were unresolved in the PSF of WASP-12 for both of our observations. Prior to fitting for the eclipse depths for each data set, we corrected the light curves for the dilution caused by the two stellar companions of WASP-12, as well as the two background stars in our target apertures. A detailed description of how this correction was performed is given in Appendix A. The corrected light curves are shown in the top panels of Figs 2 and 3.
To extract the eclipse depths, we fit each light curve with an eclipse model using a Markov chain Monte Carlo (MCMC) method with the Metropolis-Hastings algorithm. We used a Mandel and Agol transit model (Mandel & Agol 2002) to model the eclipse, with both limb-darkening coefficients set to zero. Table 1 displays the stellar, planetary, and orbital parameters of the WASP-12A system from Hebb et al. (2009) 4 that were fixed in the MCMC (including the mid-transit time). To quantify the noise associated with the residual flux, we used a wavelet likelihood function (Carter & Winn 2009). This parameterizes the noise in the light curve as a sum of whitenoise (σ w ), and red-noise (σ r ) with power spectral density 1/f γ .
The model used to fit the normalized light curves included components associated with the eclipse depth, a linear function of 4 While updated, more precise values for these parameters are given in Collins, Kielkopf & Stassun (2017), we adopted the values from Hebb et al. (2009)   sky background, σ w and σ r (spectral index γ was fixed to 1), which were allowed to vary with each step. For the LT model, a linear function of airmass was also included. We ran an MCMC chain of 10 6 steps to get the best-fitting values for each parameter, and used the Gelman-Rubin criterion (Gelman & Rubin 1992) to verify convergence. After the first run, all points over 3σ away from the best-fitting model were discarded (these points are shown as grey asterisks in Figs 2 and 3) and another MCMC chain of 10 6 steps was run to determine the final best-fitting values for each parameter.
The best-fitting models for each data set, with depths of 0.97 ± 0.14 mmag for the INT light curve and 0.44 ± 0.21 mmag for the LT light curve (which disagree by ∼3σ ), are shown in red in Figs 2 and 3 with the fit parameters shown in Table 1.

Collective analysis of reported WASP-12b eclipse depths
To interpret our results, we analyse them jointly with all other published secondary eclipse detections of WASP-12b. This is compli-   cated by the fact that WASP-12 (WASP-12A, more strictly) has two M-type companions, and that the previous eclipse measurements were published at different stages in the detection and categorization of these companions (Crossfield et al. 2012;Bergfors et al. 2013;Sing et al. 2013;Bechter et al. 2014). Therefore, we perform our own homogeneous correction on all of the uncorrected eclipse depths reported in each paper. This is described in Appendix A, with the uncorrected and corrected eclipse depths and dilution factors listed in Table A2. To keep the methods of reduction and analysis as consistent as possible, we use the depths reported in Stevenson et al. (2014b) for all HST/WFC3 and Spitzer/IRAC data, but differentiate between them in our discussion by referring to the original studies by Campo et al. (2011), Cowan et al. (2012, and Swain et al. (2013).
To study how well measurements agree in each photometric band, we consider the measured eclipse depths associated with individual observations separately, rather than combining them all into one single depth. These are shown in Figs 4 and 5, along with three blackbody spectra with brightness temperatures (T B ) of 2600 K, 3000 K, and 3400 K. We use these three temperatures as benchmarks with which to compare the implied dayside temperatures of each eclipse measurement. For WASP-12, we used a Kurucz stellar model (Castelli & Kurucz 2004) Table 1). While the temperature-pressure profile of the WASP-12b dayside is not necessarily isothermal at the altitudes probed by the full range of wavelengths that we are considering, it is convenient for us to consider each eclipse in terms of its brightness temperature (which are also listed in Table A2 for each observation) to compare measurements at different wavelengths.
The three z -band eclipse measurements exhibit disagreement on a similar scale to our i -band results. The two depths from Lopez- in line with hotter dayside temperatures, we also note that K-and K S -band observations of other exoplanets have routinely yielded deeper than expected secondary eclipses (e.g. Gibson et al. 2010;de Mooij et al. 2011).
Although the three secondary eclipses detected using the 3.6 μm channel of Spitzer/IRAC all have brightness temperatures within 1.5σ of 3000 K, the high precision achieved in each of the measurements leads to a 2.4σ disagreement between the depths from the observations by Campo et al. (2011) and Cowan et al. (2012). These observations occurred over the course of 2 yr and were separated by at least six months. The four secondary eclipses observed with the 4.5 μm IRAC channel tell a similar story, with a 2.9σ disagreement between the depths from the observations by Cowan et al. (2012) and Stevenson et al. (2014b). These exhibit some of the lowest brightness temperatures of any band, with the observation by Stevenson et al. (2014b) in agreement with 2600 K and the two eclipse depths from the Cowan et al. (2012) phase curve observation within 1.5σ of 3000 K. However, various models have used CO absorption to explain the shallow depths reported in this channel (e.g. Madhusudhan et al. 2010;Stevenson et al. 2014b). The depths extracted from the observations in the 5.8 and 8.0 μm IRAC channels by Campo et al. (2011) result in some of the highest brightness temperatures for any WASP-12b eclipse observation, and are both consistent with 3400 K. It should be noted that the eclipse depths originally reported in Campo et al. (2011) for the 3.6 and 5.8 μm channels come from one eclipse observation, and the eclipse depths for the 4.5 and 8.0 μm channels come from another eclipse observation. The simultaneous observations in the shorter wavelength channels both yield brightness temperatures that are in disagreement with the longer wavelength observations by ∼2σ .
We can conclude that disagreements in secondary eclipse depths for WASP-12b occur at >2σ in every photometric band where multiple observations have been conducted (apart from in the K S band, where two of the three depths had large errors associated with them). However, the disagreements in the 3.6 and 4.5 μm channels could be explained by a systematic under-reporting of uncertainties, as the associated depths were reported at high significance and cover a much narrower range of brightness temperatures than results in the i -and z bands. No published studies to date have been able to formulate models that fit reported depths in all bands across the IR. Along with the disagreements at shorter wavelengths, this presents challenges in interpreting the physical significance of these data. From our analysis, while it is certainly likely that sizeable systematic errors bias some of the results discussed above, we conclude that we cannot rule out the possibility that the observed disagreements are caused by the thermal emission varying by an average of a few hundred Kelvin in the dayside of WASP-12b.

Potential sources of systematic error
It is highly plausible that the observed disagreements in measured eclipse depths could arise due to systematic errors being introduced into the time series at various stages of data acquisition, reduction, and analysis. In this section, we will qualitatively discuss some of the most likely sources of these.
In terms of the data acquisition, secondary eclipses of WASP-12b have been observed using a wide selection of instruments on different telescopes, each of which have their own associated characteristic properties. There are variations in sensitivity between pixels on a detector that flat-fielding does not completely remove. Imperfect telescope guiding leads to the target drifting across the detector, which results in differences in the sensitivities of the pixels that the PSFs are occupying. This could introduce sizeable time-correlated systematics into the resulting light curve. Systematics associated with weather and atmospheric conditions on Earth can be introduced due to secondary eclipse observations occurring at different times and at different locations. This could introduce large systematics into the resulting light curves, especially if the weather is variable on a short time-scale within a given observation.
The i -and z bands contain numerous telluric emission and absorption lines, both of which can be the source of systematic errors. Telluric emission lines due to OH-radicals cause the sizeable fringing patterns on the detector described in Section 2, which, if not robustly corrected for, can introduce their own systematic errors. This is especially true if a target or comparison star crosses a fringe during time-series observations. There is also significant telluric H2O absorption at these wavelengths, although most observations are relatively robust against non-uniform water content in the atmosphere due to the small angular separations between the target and comparison stars. Contamination from satellites, the Moon, and cosmic rays can cause spikes in the observed flux for small sections of the time series if the frames are not corrected or removed from the analysis.
Once the data have been acquired, differences in methods of reduction -including bias subtraction, flat-fielding, and other more specialized steps -can introduce systematic errors. There are many methods used to fit eclipse light curves, with the size of the eclipse and associated errors often highly dependent on which baseline model is used. This issue can arise when two or more different baseline models that result in significantly different eclipse depths are judged to model the residual flux equally well. Observations of partial eclipses, or observations where relatively little out-of-eclipse baseline is observed, are particularly susceptible to this problem. Despite the high significance at which many secondary eclipse depths are detected, many of the systematic errors discussed will not manifest themselves in the reported errors. The fact that the biggest disagreement in eclipse depths for WASP-12b occur in the shortest wavelengths could be indicative of the challenges associated with trying to accurately measure the small signal sizes associated with the thermal emission at these wavelengths.
While a detailed discussion of individual disagreements is beyond the scope of this paper, a common source of such disagreements is differences in methods of reduction and analysis, as reanalysis of data sets by different groups often rules out previously stated conclusions (e.g. Gibson Espinoza et al. 2019). It is considerably harder to quantify and correct for any systematics associated with instruments and weather conditions.

Variability in dayside thermal emission
Another potential explanation of the discrepancies in reported secondary eclipse depths is that they arise due to atmospheric variability in the thermal emission properties of the WASP-12b dayside itself. The most compelling observational evidence for variability in the atmosphere of a hot Jupiter comes from HAT-P-7b. Armstrong et al. (2016) fit for the planetary brightness amplitude A p , secondary eclipse depth F ecl , and peak brightness offset in the phase curves of four years of Kepler photometry of HAT-P-7b. While no significant variation was detected for A p and F ecl , varied between −0.086 and 0.143 in phase (occurring both before and after the secondary eclipse) in a highly time-correlated manner, on a timescale of months. This can be interpreted as the brightest spot on the HAT-P-7b dayside shifting about the substellar point. Armstrong et al. (2016) put forward the explanation of changing wind speeds, which transport clouds that form on the nightside of HAT-P-7b different distances on to the dayside before they evaporate, causing temporal variability in the thermal and reflective properties of the dayside. No significant variability in was observed in the Kepler light curves of Kepler-13Ab; the only other planet within the Kepler field suitable for a similar study to be conducted (Herman et al. 2018). Föhring et al. (2013) suggested that local variations in surface brightness could explain the differences in reported eclipse depths for WASP-12b. In this case, fluctuations of ∼450-750 K relative to a background of 3000 K for 10-20 per cent of the surface would be sufficient to account for the disagreement. Stevenson et al. (2014b) cast doubt on this idea on the basis of the short (∼2 h) out-ofeclipse baseline from which the deep eclipse reported by Föhring et al. (2013) was measured. While the small depth extracted from our LT observation was measured from an even shorter out-ofeclipse baseline (1.7 h), we note that the T B associated with our INT observation (which had ∼3.9 h of out-of-eclipse baseline) is in strong agreement with that of Föhring et al. (2013). This suggests that these disagreements are not solely due to poorly constrained baseline models biasing the reported eclipse depths.
WASP-19b is the only other exoplanet for which multiple zband eclipse depths have been reported. While the depths reported by Burton et al. (2012) and Zhou et al. (2013) are in good agreement, Lendl et al. (2013) reported a shallower depth in ∼2σ disagreement, on the basis of 10 observations over the course of 2 yr. The i -and z bands probe planets bluewards of the peak of their thermal emission, so secondary eclipse measurements in these bands are very sensitive to temperature variations. Further strategic observations of ultrahot Jupiters are required to discriminate between whether these disagreements in red optical secondary eclipse depths are due to systematics or variations in the dayside thermal emission.

C O N C L U S I O N S
We have presented the reduction and analyses of two i -band secondary eclipse observations of WASP-12b. Much like similar observations at z -band wavelengths, the measured depths disagree by ∼3σ . This presents challenges in interpreting the physical cause of these results, with the depths analysed in isolation indicating markedly different things about dayside temperature and presence of TiO. This disagreement could arise either due to systematic errors that manifest in the light curves from which these depths were extracted, or genuine variability in the dayside atmosphere of WASP-12b itself.
Our results demonstrate that 2m-class telescopes are sufficiently large to yield results with the requisite precision to detect significant disagreement between individual eclipse depths at red-optical wavelengths for WASP-12b. Homogeneous reanalysis of published data sets and repeat measurements with different instrumentation (e.g. Stevenson et al. 2014b;Gibson et al. 2017;Alexoudi et al. 2018) are important methods to test whether such disparities arise due to different methods in reduction and analysis. However, the main stumbling block when trying to discriminate between these scenarios is the low statistics on which any conclusions must be based. Future strategies that involve exploiting the reduced timepressure on many 2m-class telescopes could allow many secondary eclipses of ultra-short period exoplanets to be observed in quick succession. The ability to observe a sizeable proportion of these observations (or observations of a similar nature) would facilitate the identification of outlying eclipse depths. The strategy of conducting many observations in quick succession would also allow a search for any time-correlation in the eclipse depths to be identified, on a time-scale similar to the peak flux offset variability observed by Armstrong et al. (2016) for HAT-P-7b. Such a study could be supplemented by observations of the same secondary eclipse by two or more telescopes. This would put to test the very technique of ground-based photometry to observe the secondary eclipses of exoplanets, by investigating whether the same methods of reduction and analysis produce results that agree.
Although there were limited suitable targets within the Kepler field to search for variability in transiting exoplanets, the launch of the Transiting Exoplanet Survey Satellite (TESS; Ricker et al. 2015) presents the opportunity to continue this search for a wide selection of targets across the night sky. Hot Jupiters within its continuous viewing zones (most notably WASP-100) will be excellent targets to perform similar studies to that of Armstrong et al. (2016). Outside these zones, the 27+ days of TESS photometry may be sufficient to identify evidence of atmospheric variability for hot Jupiters with large secondary eclipse depths orbiting bright stars (such as KELT-9b and WASP-33b). Recently, Shporer et al. (2018) demonstrated the capability of this technique to detect the secondary eclipse and phase modulation of WASP-18b based on 54 d of TESS photometry. Finally, the upcoming launch of the CHaracterising ExOPlanets Satellite (CHEOPS; Fortier et al. 2014) will provide further opportunity to observe targeted full-phase curves of any object in the night sky.

AC K N OW L E D G E M E N T S
Special thanks go to Kevin Stevenson for his assistance in the preparation of this manuscript. We thank the anonymous referee for their meticulous feedback, which materially improved our study in many regards. We are also grateful to the staff of the INT and Robert Smith of the LT for their assistance with the described observations. The WFC photometry was obtained as part of programme I/2016B/P6. The INT is operated on the island of La Palma by the Isaac Newton Group of Telescopes in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. IO:O photometry was obtained as part of programme PL18A05. The LT is operated on the island of La Palma by Liverpool John Moores University in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias with financial support from the UK Science and Technology Facilities Council. This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation.
MJH & SRM acknowledge funding from the Northern Ireland Department for the Economy. CAW acknowledges support from Science and Technology Facilities Council grant ST/P000312/1. NPG acknowledges support from the Royal Society in the form of a University Research Fellowship.

A P P E N D I X A : C O R R E C T I N G F O R C O N TA M I NAT I N G S TA R S
Where applicable, we corrected for contaminating flux from the two stellar companions of WASP-12, as well as two nearby background stars. As well as correcting for this contamination for our data (using the values shown in Table A1), we performed a homogeneous correction for all reported eclipse depths of WASP-12b, which facilitates the direct comparison of different eclipse depths. The dilution factors, uncorrected and corrected eclipse depths, and resulting brightness temperatures are shown for each reported eclipse observation in Table A2. The calculated dilution factors and corrected eclipse depths are broadly within 1σ of the equivalent values from studies that performed similar corrections for WASP-12b eclipse data (Crossfield et al. 2012;Stevenson et al. 2014a,b).
For the two stellar companions of WASP-12, we used the spectral classification of M0V from Crossfield et al. (2012) and Sing et al. (2013). Two other nearby stars (2M063033 and 2M063032) were inside the target apertures for the reduction of two of our data sets, as well as the z -band observation from Föhring et al. (2013). We estimated the spectral types of the these two stars using their 2MASS J, H, and K s magnitudes from Skrutskie et al. (2006), as well as the J − H and H − K S values from Pecaut & Mamajek (2013). We modelled the spectra of WASP-12 and each of the contaminating stars using Kurucz stellar models (Castelli & Kurucz 2004). We then integrated the spectra over the 2MASS J filter and used the 2MASS J magnitudes for each star to calculate dilution factors for each photometric band and spectral wavelength bin for which eclipse depths have been reported for WASP-12b. When analysing our own data, we removed the flux associated with the contaminating stars for each data set by normalizing the light curves so that the median of the out-of-eclipse baseline was equal to 1, subtracting the dilution factor for the associated photometric band (shown in Table A1), and repeating the normalization. For all of the other previous reported eclipse depths, we made the dilution correction using the equation δ corr (λ) = [1 + g(β, λ)α comp (λ)]δ meas (λ), where δ corr (λ) and δ meas (λ), respectively, are the corrected and uncorrected eclipse depths, g(β, λ) is the fraction of the flux of the contaminating star inside the target aperture of size β, and α comp (λ) is the dilution factor (Stevenson et al. 2014b).
The two stellar companions are unresolved from WASP-12 for all data sets, except the HST/STIS spectrophotometry from Bell et al. (2017) which required no correction. We set g(β, λ) = 1 for bandpasses and spectral wavelength bins apart from the Spitzer/IRAC data, where we used values from Stevenson et al. (2014a) and Stevenson et al. (2014b). Table A1. The flux contribution (expressed as a per cent of the flux of WASP-12) from contaminating stars inside the target aperture (see Section 3 for details). * The spectral determination of WASP-12B and WASP-12C is from Crossfield et al. (2012) and Sing et al. (2013 Table A2. The measured depths, along with our calculated dilution factors, corrected depths, and brightness temperatures associated with all of the previously reported secondary eclipse observations of WASP-12b.