A publishing partnership

The following article is Open access

Overfitting Affects the Reliability of Radial Velocity Mass Estimates of the V1298 Tau Planets

, , , , , , , , , , , , , , , , , , , , , , , , , and

Published 2023 July 14 © 2023. The Author(s). Published by the American Astronomical Society.
, , Citation Sarah Blunt et al 2023 AJ 166 62 DOI 10.3847/1538-3881/acde78

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1538-3881/166/2/62

Abstract

Mass, radius, and age measurements of young (≲100 Myr) planets have the power to shape our understanding of planet formation. However, young stars tend to be extremely variable in both photometry and radial velocity (RV) measurements, which makes constraining these properties challenging. The V1298 Tau system of four ∼0.5 RJ planets transiting a pre-main-sequence star presents an important, if stress-inducing, opportunity to observe and measure directly the properties of infant planets. Suárez Mascareño et al. published radial-velocity-derived masses for two of the V1298 Tau planets using a state-of-the-art Gaussian process regression framework. The planetary densities computed from these masses were surprisingly high, implying extremely rapid contraction after formation in tension with most existing planet-formation theories. In an effort to constrain further the masses of the V1298 Tau planets, we obtained 36 RVs using Keck/HIRES, and analyzed them in concert with published RVs and photometry. Through performing a suite of cross-validation tests, we found evidence that the preferred model of Suárez Mascareño et al. suffers from overfitting, defined as the inability to predict unseen data, rendering the masses unreliable. We detail several potential causes of this overfitting, many of which may be important for other RV analyses of other active stars, and recommend that additional time and resources be allocated to understanding and mitigating activity in active young stars such as V1298 Tau.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

1.1. Young Planets as Probes of Formation

Planet formation is an uncertain process. Giant planets are thought to form with large radii, inflated due to trapped heat, then cool and contract over the first few hundred million years of their lives (Marley et al. 2007). However, the accretion efficiency of the formation process, which sets the planets' initial entropies and radii, spans orders of magnitude of uncertainty. The processes sculpting the postformation masses and radii of smaller terrestrial exoplanets are also uncertain. Young, terrestrial planets also have uncertain initial entropies, and for highly irradiated planets, the unknown rate of photoevaporation (itself due to uncertainties in a planet's migration history, among other physical unknowns) during and after formation compounds this ambiguity (Lopez et al. 2012; Owen & Wu 2013; Chen & Rogers 2016; Owen 2020).

Measuring the masses, radii, and ages of newly formed planets presents a path forward (Owen 2020). Young moving groups provide rigorous age constraints, and relatively model-independent methods of measuring planetary radii exist for both young directly imaged and transiting planets (for transiting planets in particular, only stellar-radius model dependencies impact the inferred planetary radius). However, in both situations, few model-independent mass measurements exist. For transiting planets, there are two complementary methods for measuring planetary masses: transit timing variations (TTVs), and stellar radial velocity (RV) time series.

Measuring the RV masses of young planets is a difficult task, so some advocate to rely on TTVs alone to measure the masses of young planets. However, not all planets transit, and only planets in multiplanet systems at or near mean motion resonances exhibit TTVs (Fabrycky et al. 2014). Even in systems that do, individual TTV mass posteriors are often covariant, since TTVs to first order constrain the planetary mass ratio (Lithwick et al. 2012; Petigura et al. 2020). In an ideal scenario, both RVs and TTVs would be used to constrain jointly the planetary masses in a given system, reducing the posterior uncertainty and TTV degeneracies.

1.2. Stellar Activity and Overfitting

As the instrumental errors of extremely precise RV instruments approach 10 cm s−1, and as the RV community begins to target more active stars, accurately modeling astrophysical noise is becoming more and more critical. Young stars present a particular challenge. These are highly magnetically active (Johns-Krull 2007), with starspots that occupy significant fractions of the stellar surface and induce RV variations on the order of ∼1 km s−1 (Saar & Donahue 1997). These RV variations are hundreds of times larger than the activity signals of older quiet stars typically targeted by RV surveys and complicate the detection of planet-induced Doppler shifts from even close-in Jupiter-mass planets (e.g., Huerta et al. 2008; Prato et al. 2008).

Other assumptions and/or information can be leveraged to model the activity signal, even if the signal is not easily understandable from the RVs themselves. A widely used practice involves independently constraining the rotation period from a photometric time series, then using an informed prior on the rotation period to model the RVs (e.g., Grunblatt et al. 2015). Other related examples include specifying a quasi-periodic kernel for a Gaussian process regression (GPR) model (i.e., assuming that the stellar activity has a quasi-periodic form), or modeling the RVs jointly with other data sets. The latter approach achieves better model constraints either by explicitly modeling the relationship between the data sets (e.g., Rajpaul et al. 2015) or by sharing hyperparameters between data sets (e.g., Grunblatt et al. 2015; López-Morales et al. 2016).

As is true for every model-fitting process, misspecifying the stellar activity model (i.e., fitting a model that is not representative of the process that generated the data) or allowing too many effective degrees of freedom can lead to overfitting.

Overfitting is a concept ubiquitous in machine learning, and in particular is often used to determine when a model has been optimally trained. One algorithm for determining whether a model is overfitting is as follows: 23 divide the data into a "training" set and an "evaluation" set (a common split is 80%/20%), and begin optimizing the model using just the training set. At each optimization step, calculate the goodness-of-fit metric for the model on the evaluation set, which is otherwise omitted from the training process altogether. This method of evaluating a model's ability to predict new, or "out-of-sample," data successfully is known as cross-validation (CV).

The classic observed behavior is that the goodness-of-fit metrics for both the training and evaluation set improve as the model fits the training data better and better. At a certain point, the model begins to overfit to the training data, and the goodness-of-fit metric for the evaluation data worsens. This is because the model parameters have begun to reproduce the noise in the training set, at the expense of reproducing the signal common to both data sets. A model that is overfitting, then, can be defined as one that predicts the observations in a training set better than those in an evaluation set. An overfitting model fits aspects of the data that are not predictable or common to the entire data set, e.g., noise.

The optimally trained model is selected not by its performance relative to the training data, but by its performance relative to the evaluation data, which was omitted from the training process altogether. Making an analogy to Bayesian model comparison, we could imagine a similar process where the goodness of fit is evaluated for an "evaluation set" left out of the training process (i.e., posterior computation using a Markov Chain Monte Carlo (MCMC) algorithm, nested sampling, etc.) for a series of models. One benefit of this method over, e.g., formal Bayesian model comparison is that it also provides an easily interpretable absolute metric for how well the model fits the data: if the evaluation set goodness of fit is significantly worse than that of the training set, we know the model is misspecified, even if it has (comparatively) the lowest Bayesian evidence.

In this study, we apply the CV technique as defined above to evaluate the predictiveness of one particular model fit to one particular star. This is intended as a case study, aiming to inspire further investigation into the extent of and causes of overfitting in RV modeling of young, active stars.

1.3. V1298 Tau

V1298 Tauri (hereafter V1298 Tau) is a young system of four ≳0.5 RJ planets transiting a K-type pre-main-sequence (PMS) star (David et al. 2019a, 2019b). Very few transiting planets have been discovered around PMS stars (other notable systems being AU Mic, Plavchan et al. 2020; Cale et al. 2021; Klein et al. 2022; Zicher et al. 2022; K2-33, David et al. 2016; DS Tuc, Benatti et al. 2019; Newton et al. 2019; HIP 67522, Rizzuto et al. 2020; Kepler 1627A, Bouma et al. 2022; and TOI 1227, Mann et al. 2022). David et al. (2019b) reported the discovery of V1298 Tau b, a 0.9 RJ object with an orbital period of 24 days. David et al. (2019a) discovered three additional planets in the system: V1298 Tau c at 8 days, V1298 Tau d at 12 days, and a single-transiting object, V1298 Tau e.

It is unclear from their radii (0.5–1 RJ ) alone whether these planets are gas giants that contracted rapidly after forming, or are young terrestrial or mini-Neptune planets, which will lose a large fraction of their atmospheres to photoevaporation (Owen & Wu 2013) and/or core-powered mass loss (Ginzburg et al. 2018) as they age.

Suárez Mascareño et al. (2021), hereafter SM21, presented over 100 RVs of the system with four different instruments, and used a suite of GP models in an effort to isolate the RV signals of the outer two transiting planets (b and e), finding masses of 0.64 and 1.16 MJ , respectively. Combined with radii of 0.91 and 0.78 RJ , this implied high densities of 1.2 and 3.6 g cm−3, respectively. Such high densities would require rapid contraction within the 23 Myr lifetime of the system and place the outer V1298 Tau planets at the upper density boundary of even mature field exoplanets, where few theories predict planets to exist. Since V1298 Tau e only transited once while observed with K2, SM21 did not have a period constraint from transits, and derived a period of 40.2 ± 0.9 days from their RV measurements. After the publication of SM21, Feinstein et al. (2022) published updated ephemerides of all four planets using TESS photometry, including a second transit of planet e. They placed a strict lower limit of 42 days on V1298 Tau e's period, in tension with the value from SM21. In addition, Tejada Arevalo et al. (2022) performed a dynamical stability analysis using the mass measurements reported in SM21, finding that 97% of the system configurations consistent with the SM21 posteriors are gravitationally unstable over the lifetime of the system.

SM21 performed a rigorous and state-of-the-art analysis, comparing several complex models with Bayesian methods. However, several independent lines of evidence appear to call the masses they report into question: the tension with formation theory discussed in SM21, the updated planet e orbital period of Feinstein et al. (2022), and the improbability of long-term stability derived by Tejada Arevalo et al. (2022).

1.4. This Paper

The purpose of this paper is two-fold: (1) to demonstrate the use of CV to show that the preferred model of SM21 is overfitting the RVs, and (2) to point out several potential causes of the overfitting, which are not unique to SM21 but common throughout the literature. We do not attempt to update the mass estimates for the V1298 Tau planets in this paper. We also do not attempt to prove that the mass estimates published in SM21 are incorrect, but instead seek to call into question their reliability. Our argument is that future joint models of the stellar activity and planetary signals of V1298 Tau will need to prove their predictiveness in order to be trustworthy.

The structure of this paper is as follows: in Section 2, we review the literature data scrutinized in this paper and describe one additional contemporaneous RV data set taken with Keck/HIRES. In Section 3, we demonstrate that the preferred model of SM21 is overfit. Section 4 discusses several potential causes of this overfitting, and advises modelers on how to detect and/or avoid these subtle pitfalls. In particular, Section 4.4 argues that differential rotation is an important effect for V1298 Tau, and must be modeled carefully. We conclude in Section 5. We also provide an Appendix that provides a geometric interpretation of how GPR penalizes complexity. All of the code used to create the plots in this work are publicly available on GitHub, 24 and a copy of these materials is deposited on Zenodo. 25

2. Data

Throughout this paper, we reference several data sets: three photometric time series measured by different instruments and three RV time series derived from spectra measured by different instruments. Each data set is detailed in the subsections below. All of the photometry is shown in Figure 9, and all of the RVs are shown in Figure 1.

Figure 1.

Figure 1. A tour of the RVs scrutinized in this study. The CARMENES and HARPS-N RVs are published in SM21, and the HIRES RVs are new in this study. Takeaway: the RV variability of V1298 Tau is hundreds of meters per second, which is similar across all three instruments. The variability is significantly greater than the instrumental errors (which are included, but too small to see for the majority of points on this plot).

Standard image High-resolution image

2.1. K2 Photometry

We downloaded EVEREST-processed (Luger et al. 2016, 2018) K2 light curves for V1298 Tau using the lightkurve package (Lightkurve Collaboration et al. 2018). We used built-in lightkurve functions to remove NaNs, remove outliers, and normalize the data.

2.2. LCO Photometry

We obtained ground-based LCO photometry verbatim from SM21.

2.3. TESS Photometry

We obtained TESS light curves from Feinstein et al. (2022), who combined time series photometry of V1298 Tau from TESS Sectors 43 and 44. Feinstein et al. (2022) used the 2 minute light curve created by the Science Processing Operations Center pipeline (SPOC; Jenkins et al. 2016), and binned those observations to 10 minutes. We normalized the data for each TESS orbit separately, following Feinstein et al. (2022).

2.4. SM21 RVs

We obtained CARMENES and HARPS-N RVs directly from SM21. We note that SM21 excluded infrared-arm CARMENES RVs in their analysis, and we do the same here.

HARPS-N RVs are wavelength calibrated using a ThAr lamp, and the HARPS-N spectrograph covers 360–690 nm.

The visible arm of the CARMENES instrument covers the spectral range 520–960 nm, and spectra from this instrument are wavelength calibrated using a Fabry–Perot etalon, anchored using hollow cathode lamps.

2.5. Keck/HIRES RVs

Between 2018 November 16 and 2020 February 6, we obtained 36 RVs using the HIRES spectrograph on the Keck I telescope (Vogt et al. 1994). Wavelength calibration was performed by passing starlight through a warm iodine cell, and data reduction was performed using the California Planet Search pipeline described in Howard et al. (2010), which is adapted from Butler et al. (1996). All HIRES RVs used in this study are given in Table 1. Some of these RVs were previously published in Johnson et al. (2022), and the processing is identical in that paper and this. The same stellar template, constructed from two stellar spectra taken on 2019 October 24 UT without the iodine in the light path, was used to derive RVs in both studies. In-transit RVs from that study have been excluded here. Spectra were typically taken using the C2 decker (14'' × 0farcs861), which enables sky subtraction, and is the CPS HIRES observer "decker of choice" for stars fainter than V ∼ 10. However, a CPS HIRES observer "rule of thumb" is to use the shorter B5 decker (3farcs5 × 0farcs861) in poor seeing conditions, as the Doppler pipeline sky subtraction algorithm is unreliable when the stellar point-spread function fills the slit. Sky subtraction is not performed under such conditions. Accordingly, seven RVs published here were calculated from spectra using the B5 decker. In both modes, HIRES has a resolving power of ∼60,000, and the iodine cell spectral grasp translates to contributions to the RVs from wavelengths between 500 and 620 nm (Butler et al. 1996).

Table 1. New HIRES RV Data

BJD_TDB – 2,457,000RV (m s−1)RV error (m s−1)
1438.9463387.477.88
1443.8205−193.328.53
1443.9573−87.967.94
1443.9682−77.177.07
1443.9792−92.678.61
1444.1566−19.408.40
1476.8089190.2811.71
1479.006030.8310.27
1490.7912246.5410.05
1491.75490.318.80
1508.9196−56.339.53
1509.7900163.2412.03
1528.7841−58.179.75
1532.7701−78.1710.47
1559.7316261.449.53
1568.7290−23.729.68
1569.7376−94.9911.02
1723.1354−0.318.62
1724.0917−26.279.65
1733.0739−44.289.66
1743.996537.4210.46
1765.9266−138.4911.11
1774.9154287.779.43
1777.016394.729.40
1787.9817−239.8210.01
1794.869134.798.83
1795.906295.708.25
1796.9165−90.098.83
1797.9532292.268.13
1845.8966375.289.47
1852.7705−259.559.59
1855.8832−68.0816.10
1870.8459318.548.90
1879.8314282.039.12
1880.8314222.228.49
1885.8611306.9519.62

Note. Epochs are reported in units of BJD_TDB, with a 2,457,000 offset applied.

A machine-readable version of the table is available.

Download table as:  DataTypeset image

3. CV Tests

Our intention in collecting additional RVs of V1298 Tau with HIRES was to analyze jointly these data together with literature data and update the masses published in SM21. However, early on in the analysis, we noticed clues that made us question our assumptions. In particular, the new data we had collected did not seem consistent with the models of SM21. In addition, many tested models converged on results that were physically unreasonable or clearly inconsistent with subsets of the data. We ultimately decided to test the predictive capability of the SM21 model that we were using as our starting point, as a check on our own assumptions. This section details the outcome of those experiments.

The main finding of this paper is that the median parameter estimate of the preferred model of SM21 (their 4pPQP2) is overfit. For convenience, we will refer to this model throughout the rest of this paper as the "SM21 preferred model." Showing that a point estimate is overfit does not necessarily indicate that every model spanned by the posterior is overfit. However, since the preferred model presented by SM21 (their Figure 11) appears approximately Gaussian around the MAP estimates of the parameters relevant for us, (except for the kernel parameter C and the white noise jitter for CARMENES, which both peak at effectively 0), we assume that the MAP and median for this fit are close enough to make no difference, and that inferences made about the median fit hold true for other high-probability areas of parameter space.

Our goal was to test the predictiveness of the preferred SM21 model 26 using CV. In an ideal situation, we would do this by evaluating the model's performance on new HARPS-N data, unseen by the trained model. Lacking this, we constructed two ad hoc "validation sets:" a time series of Keck/HIRES data contemporaneous with the SM21 HARPS-N data, and the CARMENES data presented in SM21 (that the model was also trained on, but which were treated as independent from the HARPS-N data; see Section 4). By chance, this results in a nearly perfect 80%/20% split for both validation sets (80.3%/19.7% for HARPS-N/CARMENES, and 78.9%/21.1% for HARPS-N/HIRES). In Figures 2 and 3, we show two visualizations of the results of performing CV on these two validation sets. Figure 2 shows the GP prediction of the SM21 preferred model, together with the HARPS-N data on which it was trained and conditioned. The contemporaneous HIRES data and CARMENES data and their residuals are overplotted. Figure 3 shows the residuals of this fit, given in terms of standard deviations from the mean GP prediction. In both figures, the residuals of the HIRES and CARMENES data have a much wider spread about 0 than the HARPS-N points. Because our intention was to evaluate the existing model, we did not retrain the GP hyperparameters in order to compute the prediction shown in Figure 2. Rather, we used the median parameters of the SM21 4pPQP2 model, conditioned on the HARPS-N data published in that study, to predict RV values at each of the CARMENES and HIRES epochs.

Figure 2.

Figure 2.  SM21 preferred model prediction and contemporaneous observed data. The HIRES data have been scaled and offset by linear parameters that minimize the residual spread with respect to the GP model, and the median 4pPQP2 CARMENES data RV zero-point value was been applied in order to compare both data sets more easily with the model expectations. Top: mean model prediction (gray solid line), together with contemporaneous HARPS-N (black), CARMENES (red), and HIRES (purple) RVs overplotted. Bottom: model residuals, together with 1 and 2σ GP uncertainty bands (shaded dark and light gray regions, respectively). Takeaway: the preferred SM21 model is overfitting the HARPS-N data, which can be seen in the increased spread about the residual = 0 line for both HIRES and CARMENES data during epochs with contemporaneous HARPS-N data.

Standard image High-resolution image
Figure 3.

Figure 3. Another visualization of Figure 2. Histograms of the RV residuals, given in units of standard deviation from the mean prediction. Takeaway: the broader and more uniform distributions of HIRES and CARMENES residuals relative to the HARPS residuals is another hallmark of overfitting.

Standard image High-resolution image

Our interpretation of the difference in residual distributions shown in these two figures is that the preferred SM21 model fits data included in its training set (i.e., the HARPS-N data) significantly better than contemporaneous data not included. In other words, the model is not predictive. This is a hallmark of overfitting, and indicates that the preferred SM21 model is not representative of the process generating the data.

An important counterinterpretation is that the V1298 Tau RVs measured by HARPS-N, HIRES, and CARMENES show different activity signals, and not that the preferred SM21 model is overfitting. In particular, starspots cooler than the stellar photosphere cause the RVs collected in redder bands, where the contrast between spot and photosphere is lower, to show lower variability amplitudes (e.g., Carpenter et al. 2001; Prato et al. 2008; Mahmud et al. 2011). In addition, we expect different instruments to have different RV zero-point offsets. Importantly, these two effects cannot explain the increased out-of-sample residual spread observed in Figures 2 and 3; 27 the preferred SM21 model fitted the CARMENES zero-point offset, white noise jitter value, and activity amplitude, and those values have been applied to the CARMENES data. To account for the potential differences between the HIRES and HARPS-N RVs, we applied an RV zero-point offset and scale factor (0.76) that minimizes the residual spread (i.e., we applied a best-fit linear model to the HIRES data in order to minimize χ2 with respect to the GP model prediction). See Section 4.1 for further discussion of this point.

Another potential explanation for the phenomenon observed in Figures 2 and 3 is that the activity signals observed by HARPS-N, CARMENES, and HIRES are fundamentally different; i.e., the signal observed by one instrument is not a linear combination of the signal observed by another. This might occur because, for example, all three instruments have ∼1 km/s instrumental systematics relative to one another, or because the shape of the activity signal changes significantly with wavelength. To rule out this explanation and provide more evidence that the effect we are seeing is actually overfitting, and not instrument-specific differences, we repeated the experiment above using only HARPS-N data. We randomly selected 80% of the HARPS-N data published in SM21, conditioned the preferred SM21 model on that subset, and computed the residuals for the random "held-out" 20%. The results are shown in Figures 4 and 5. Even though these held-out 20% were included in the training process (i.e., they informed the values of the hyperparameters), we observed substantially larger residuals than for the conditioned-on subset. This experiment provides additional evidence for overfitting, and not instrumental- or wavelength-dependent systematics.

Figure 4.

Figure 4. Same as Figure 2, except that the model prediction is computed by conditioning on a randomly selected 80% subset of the HARPS-N data, as described in the text; the residuals are computed for the 20% subset that was held out. Takeaway: the effect seen in Figure 2 cannot be explained by instrument- or wavelength-dependent systematics, because the same larger residuals are seen within the data taken by only HARPS-N.

Standard image High-resolution image
Figure 5.

Figure 5. Another visualization of Figure 4. Same as Figure 3, except computed using the same method as for Figure 4. Takeaway: the larger and more uniform spread of residuals for the HARPS-N data on which the model was conditioned provides more evidence that the preferred SM21 model is overfitting.

Standard image High-resolution image

It is worth noting that we distinguish between residual distributions (Figures 4 and 3) "by eye" in this paper, but this technique will not generalize for more similar residual distributions. Residual diagnostic tests (see Caceres et al. 2019 for an example) will be helpful in generalizing this methodology.

4. Potential Causes of Overfitting

This section points out several potential causes of the overfitting described in the previous section, and advises on how to detect and/or ameliorate these effects. We do not attempt to quantify the effect of each of these on the overfitting discussed in the previous section, but intend this as a qualitative discussion. Many of these effects are potentially relevant for stars other than V1298 Tau.

Importantly, this is not a list of "mistakes," but a list of assumptions we questioned throughout the process of trying to explain why the preferred SM21 fit was overfitting. We encourage future close investigation of each of these phenomena, both for V1298 Tau and other objects. This list is not exhaustive.

4.1. Correlated Data Sets versus Data Sets that Share Hyperparameters

The mathematical formalism in this section is essentially identical to that of Cale et al. (2021; see their Section 3.2), but was developed independently. We encourage readers to compare our explanations, and we ask readers to also cite Cale et al. (2021) whenever referencing Section 4.1 of this paper.

There is a difference between correlated measurements that are allowed to have different GP amplitudes and data sets that share GP hyperparameters but are themselves uncorrelated. We are motivated to stress this distinction by the need in RV time series fitting to write down the joint likelihood of a model applied to data sets taken from several different instruments. As a concrete example, let us consider three fictional RV data points, the first two from HIRES and the next one from CARMENES, to which we would like to fit a GP model. Because of the different bandpasses of HIRES and CARMENES, we might expect the same stellar activity signal to have a different amplitude when observed by these two instruments. However, we might expect the time characteristics of the signals to be identical. In other words, we expect the CARMENES activity signal to be a scalar multiple of the HIRES activity signal. 28 As discussed in Section 3, this assumption is borne out, at least to first order, in observations of other active stars at different wavelengths (see, e.g., Mahmud et al. 2011, who investigated the RV activity of the T-Tauri object Hubble I 4 with contemporaneous infrared and optical spectra taken with different instruments), but this point warrants further scrutiny. Comparing the variability of active stars with different instruments, as well as the variability of the Sun with different solar instruments, is an important endeavor.

Another important caveat is the use of different techniques for computing RVs from stellar spectra (e.g., the iodine/forward-modeling technique of HIRES versus the simultaneous reference/CCF technique of CARMENES and HARPS-N). Switching from one of these techniques to another is not expected to affect an astronomer's ability to recover common Keplerian signals, but spot activity is not a simple Doppler shift. More work is needed to understand and model spot activity at the spectral level. We proceed by assuming that modeling the same spectrum using an iodine/forward-model and with a simultaneous ThAr lamp reference (as an example) will only change the effective wavelength range of the spectrum that is used to compute the RV, and therefore affect only the amplitude of spot-induced variations.

Assuming linearly related GPs for different instruments, we can write down the joint covariance matrix for our three fictional data points, allowing unique amplitude terms aC and aH for each data set, and assuming an arbitrary kernel function ki,j describing the covariance between RVs at times ti and tj :

Equation (1)

Optimizing the hyperparameters of a fit that uses this covariance matrix to define the GP likelihood will give the desired result.

SM21, following many other fits in the literature, constructed an independent covariance matrix for each RV instrument in their data set and summed the log(likelihoods) given by these together. This allows each RV data set to be independent; i.e., a data point taken by HIRES is not correlated with a data point taken at exactly the same time by CARMENES. Figures 11 and 12 illustrate the difference between these two likelihood definitions for data for a different object (chosen because it is easier to see the effect using this data set).

This assumption of independent data for each instrument effectively adds additional free parameters to a model, and makes it more susceptible to overfitting. This is also why, in Figures 2 and 3, we could demonstrate that the preferred SM21 model was overfitting by comparing the model prediction conditioned on HARPS-N data to the CARMENES data; the CARMENES data influenced the final values of the hyperparameters, since they were shared between the two GPs, but otherwise the data sets were treated as independent.

 To illustrate the effects discussed in this paper, we used a modified version of radvel (Fulton et al. 2016), built on tinygp (Foreman-Mackey et al. 2022), that treats the models for different instruments as correlated, but allows each instrument its own GP amplitude, white noise jitter term, and RV zero-point offset term. 29 The difference between the previous version of radvel and this version is also illustrated in Figures 11 and 12 in the Appendix. 30

Future work should continue to test this assumption by obtaining simultaneous (or near-simultaneous) RVs for a variety of stellar types with different instruments, across a wide range of bandpasses.

4.2. Prot and Prot/2

Another practice that may have made the SM21 preferred fit susceptible to overfitting involves constructing a GP kernel with one term at the rotation period and another term at its first harmonic. In other words, the SM21 preferred model kernel has the following form:

Equation (2)

To understand the motivation for this, we first need to scrutinize the RV signal in Fourier space. Figure 6 shows a Lomb–Scargle periodogram of all RV data presented in SM21, zooming in on two important parts of the period space. There are four extremely significant peaks in the RVs, which can all be explained with a single periodic signal at 2.91 days, the rotation period identified by SM21. Along with a strong peak at 2.91 days (hereafter Prot), there is a signal at Prot/2, which is often observed in RVs of stars showing starspot-induced variability (Nava et al. 2020). The other two strongly significant peaks can be explained as 1 day aliases of Prot and Prot/2. In other words, the dominant RV signal is periodic, but requires a two-component sinusoidal fit (i.e., it needs more terms in its Fourier expansion) in order for the fit to reproduce the shape of the curve. This is visualized in Figure 7, which shows the RVs phase folded to Prot. In summary, the RV curve comprises a single periodic pattern, but that pattern is not a simple sinusoid.

Figure 6.

Figure 6. Lomb–Scargle periodogram of all RV data presented in SM21, and two-component sinusoidal fit passed through the same window function. Top: periodogram of all RVs (solid purple line) and a two-component sinusoidal fit to the data (filled gray). Middle/bottom: same, but zoomed in. The rotation period, its harmonic, and its 1 day aliases are labeled. Takeaway: the dominant Lomb–Scargle periodogram structure can be explained as harmonics and aliases of a single period at 2.91 days.

Standard image High-resolution image
Figure 7.

Figure 7. HARPS-N RVs and contemporaneous LCO photometry from SM21, phase folded to the rotation period and colored by observation time. Top: LCO photometry. Bottom: HARPS-N RVs, with fitted jitter values from the preferred SM21 fit added to the error bars. One- and two-component sinusoidal fits are also shown. Takeaway: the presence of a strong periodogram peak at Prot/2 results from the higher-order shape of the RV rotation pattern. This pattern is not present in the LCO photometry, which is approximately sinusoidal over the rotation period.

Standard image High-resolution image

The preferred SM21 model kernel sums two approximately quasi-periodic terms, one at Prot and one at Prot/2, because the approximate quasi-periodic kernel used in SM21 (Equation (1), derived in Foreman-Mackey et al. 2017) is less flexible than the standard quasi-periodic kernel (SM21, Equation (3)). In other words, the approximate kernel is less capable of fitting nonsinusoidal shapes. However, each term was modeled with its own independent exponential decay timescale. This adds an additional free parameter to the fit, which exacerbates the potential for overfitting.

The most straightforward way to address this is to construct a model with fewer unnecessary free parameters, for example by equating the parameters L1 and L2 in SM21's Equation (1). A more complicated suggestion, which would be an excellent avenue for further study, is to leverage the correlation between the photometry and RVs, following, for example, Rajpaul et al. (2015). This requires assuming (or fitting for) a relationship between a photometric data point and an RV data point at the same time. Our preliminary investigations along these lines indicate that the FF' formalism, which models an RV signal as a function of a simultaneous photometric (F) data set and the time derivative of the photometric data set (F'; Aigrain et al. 2012), does not allow for a good phenomenological match between the LCO photometry and the contemporaneous RVs, but the derivative of the LCO photometry appears to fit better (i.e., it appears possible to model the RV curve as a linear combination of the F' component only). 31 Future work could write down a joint GP formalism that models RVs as the time derivative of the photometry (such a formalism would be very similar to that of Rajpaul et al. 2015).

Regardless, in order to be confident in the relationship between the photometry and the RVs, as well as to pick out the components of the RVs that do not occur at Prot, we suggest a very-high-cadence (several observations per night) RV follow-up campaign with contemporaneous photometry 32 in order to develop a high-fidelity model of the stellar variability. 33 It is important to note that this campaign need not be performed by an RV instrument with 30 cm s−1 precision; Johnson et al. (2022) demonstrated 6–7 m s−1 rms precision with HIRES over several hours, even though the stars moves by hundreds of meters per second over even a single night. This level of instrumental RV error should be sufficient to understand the stellar activity, so long as the cadence is as high as possible.

4.3. Keplerian Parameters Enable Overfitting in the Presence of Unmodeled Noise

A Keplerian signal has five free parameters (semiamplitude, eccentricity, argument of periastron, time of periastron, and period). A model with two Keplerian signals therefore has 10 additional free parameters than a model without. To first order, more free parameters means more model flexibility. This problem can be addressed using model comparison, which penalizes complexity. However, if there is unmodeled noise in the data, including additional Keplerian signals in the model can lead to overfitting; for example, high-eccentricity Keplerian models have similar properties to delta functions, which have relatively "flat" RV curves, except for a spike in RV near periastron. With insufficient sampling, outlier data points can be overfit with eccentric Keplerian signals.

A common worry in the RV modeling community is that using GPR to model stellar activity will "soak up" Keplerian signals, leading to underestimates of Keplerian RV semiamplitudes (discussed in Aigrain & Foreman-Mackey 2022), even when modeled jointly. However, we find evidence for the opposite effect in the SM21 preferred fit: that the Keplerian signals function as extra parameters that make the model susceptible to overfitting, and the GP is forced to compensate. Examining Figure 8, which shows the contributions to the mean model prediction from the Keplerians and the activity-only portion of the mean GP model, 34 we find that the activity model interferes with the Keplerian model where RV data exist. This is seen most readily when smoothing the activity model over several rotation periods (effectively low-pass filtering the activity model).

Figure 8.

Figure 8. Smoothed activity-only component of the preferred model of SM21, together with the Keplerian model components. Top: 100 random draws from the posterior describing the planet b Keplerian are also shown, to illustrate that this effect holds true across the posterior, and not simply for one point estimate. The light gray solid line shows the full activity-only model component, and the darker gray shows this model averaged over a (randomly chosen) 11.2 day timescale. (Note that the same pattern holds when choosing a slightly different smoothing timescale; i.e., this is not a result of aliasing.) Shaded gray regions indicate where there are observations. Bottom: same as the top, but with a zoomed-in y-axis. Takeaways: the activity-only component changes suddenly in windows of time where there are observations. When the activity-only component is averaged over shorter-timescale variations, the GP contributes to the fit on timescales similar to the Keplerians, even interfering destructively at some times. This casts doubt on the reality of the Keplerian signals reported in SM21, indicating that they may be favored because of overfitting.

Standard image High-resolution image

We can explain this behavior by imagining that there is some unmodeled noise source in the data that is inconsistent with Keplerian motion or quasi-periodic variability (see the next section). If some unphysical combination of parameters fits the data better at an epoch with many data points that is affected by this noise source, this may outweigh the negative Bayesian evidence contributions from (1) the added complexity and (2) the worse fit at epochs with fewer data points. We would then expect the Keplerian model to oversubtract at epochs with fewer data points (e.g., around JD = 1725 in Figure 8).

This effect suggests that the Keplerian signals in the SM21 preferred fit are not a viable description of the RV variability at timescales greater than the rotation period. More effort certainly needs to be spent understanding this phenomenon, but in the meantime we suggest performing CV tests in order to detect overfitting of this nature.

4.4. Differential Rotation

The previous subsections all argue that the preferred SM21 fit had too many free parameters (or effective free parameters) that allowed the model to overfit. In other words, we have argued that a simpler model (one for which the GP predictions for each instrument are scalar multiples of each other, a single period is present in the kernel, and no Keplerian signals are present in the model) would be more predictive, albeit perhaps with larger uncertainties. In this section, we suggest that this much simpler proposed model is still insufficient, because the host star has multiple, differentially rotating, active regions.

Differential rotation may not be the unmodeled noise source that we propose is affecting the SM21 preferred fit. The conclusions of this paper do not change if this is true. We discuss it here because it is potentially widely relevant, especially for young stars. We call for more work on modeling and understanding differential rotation in RVs.

4.4.1. Evidence for a Strong Differential Rotation Signal from Photometry

In the K2 and TESS photometry of V1298 Tau (Figure 9), two periodic signals of different amplitudes are visible by eye. These peaks are coherent in phase toward the end of both baselines, producing a larger overall photometric variability amplitude. Although each baseline covers only a portion of the beat periods implied by these different periods coming into and out of phase, the beating "envelope" is still easily distinguished. To guide the eye, we overplotted the shape of the beating envelope formed by the three dominant periods in the Lomb–Scargle periodogram of the K2 data.

Figure 9.

Figure 9. A tour of the relevant photometry of the star V1298 Tau. Panel (a): detailed view of the K2 photometry (purple points), with a beating envelope overplotted in solid pink. The beating envelope is drawn to illustrate the effect of spot beating on the overall variability amplitude, not to fit the data precisely. The envelope drawn is constructed from the beating of three sinusoids at 2.70, 2.85, and 3.00 days. Signatures of beating can be seen by eye: two peaks of different amplitudes phase up toward the end of the K2 baseline, producing a single-peaked variability pattern and a larger overall variability amplitude. Panel (b): detailed view of the TESS photometry (purple points). Beating characteristics are also visible, although the baseline is shorter than that of K2. Panels (c), (d), and (e): relative views of K2, LCO, and TESS photometry, emphasizing the relative temporal baselines and variability amplitudes. A typical error bar for each data set is also shown in the bottom left corner of each panel. The differences in wavelength coverage and flux dilution between the K2, LCO, and TESS photometry largely account for the overall differences in amplitude of the signals. Both the K2 and TESS data cover less than one complete beat period of the two largest-amplitude periodic signals, but the LCO photometry (which is contemporaneous with the RVs of SM21) covers a longer temporal baseline. Panel (e): all photometry, plotted on the same panel to emphasize the relative time elapsed between each data set. Takeaway: differential rotation effects are visible by eye in both the K2 and TESS data sets.

Standard image High-resolution image

Multiple closely related periodicities are also apparent in the periodograms of the K2 and TESS data (and the LCO data, albeit at lower significance, potentially due to the lower cadence of that data set; Figure 10). In particular, over both the K2 and TESS baselines, dominant periodicities at 2.85 and 2.92 days, respectively, and two less prominent periodicities (one at a larger period, and one at a smaller period) are present. The multiple periodicities in the light curve, visible both in the shape of the beating envelope and in Fourier space, have often been interpreted as smoking gun evidence of differential rotation (see, e.g., Lanza et al. 1994; Frasca et al. 2011). It is important to note, however, that short spot lifetimes may also produce the observed photometric pattern, and have been shown in simulations to be easily confused with differential rotation (see, e.g., Basri & Shah 2020). Longer photometric temporal baselines than are available in the photometric data presented in this paper are needed to distinguish between the two. The conclusion of this section (that there is a noise source visible in photometry that is unmodeled in the SM21 preferred model) would remain unchanged in this case, but this interpretation has important implications for future modeling efforts. That the signals arise from a close binary is ruled out by the multiple nearby periods in the light curve (rotation of two tidally extended binary stars can produce a similar pattern, but with a single period), while astroseismic pulsations are ruled out by the amplitude and period of the variability; V1298 Tau is a 1.2 M PMS star with $\mathrm{log}g$ = 4.48 (SM21), which we would expect to be oscillating on scales of minutes and ≲1 ppt, not days and 20 ppt (Chaplin & Miglio 2013; see their Figure 3).

Figure 10.

Figure 10. Lomb–Scargle periodograms of the photometric data shown in Figure 9. Top: zoom in on the presumed rotation period, showing several nearby peaks in all three data sets. Bottom: same as the top over a wider period range. Takeaway: multiple closely related periodicities are visible in Fourier space for all three photometric data sets, providing more evidence for differentially rotating active regions.

Standard image High-resolution image

4.4.2. Effect on RVs

Assuming that V1298 Tau is differentially rotating, it is possible that the combination of a multiply periodic structure with insufficient cadence is leading the GP to prefer a more complex model. In other words, the data are not consistent with a quasi-periodic structure, so a simple quasi-periodic model will not be preferred over a more complex model (e.g., one with Keplerians at longer periods), even if neither is predictive. Even a secondary active region with 5% the RV amplitude of the primary structure (reasonable given the photometric amplitude ratios) would incur an RV variability of 20 m s−1, significantly greater than the instrumental floors of HARPS-N, CARMENES, and HIRES.

An important clarification is that this conclusion is consistent with the discussion in Section 4.2. Although there is a clear periodic 2.91 day signal visible in Figure 7, there is also ∼200 m s−1 of scatter around this signal. It is possible that this scatter may contain coherent signals at other periods that are unresolvable with the current RV cadence.

Complicating this already complicated story is the fact that the dominant periodicity appears to change over time (Figure 10). This provides further motivation for our major recommendation, first given in Section 4.2: V1298 Tau appears to be a multiply periodic star with evolving periodicity. A high-cadence (several data points per night) RV campaign is necessary to construct a high-fidelity activity model. The high cadence is necessary to resolve the close periodicities due to apparent differential rotation. Care should be taken to ensure that the periods do not evolve significantly over the observational baseline, or that this effect is sufficiently modeled.

5. Summary and Discussion

In this study, we have presented evidence that the preferred model of SM21 is overfitting using two ad hoc "validation" data sets: one set of contemporaneous HIRES and CARMENES data, and one set of artificially held-out HARPS-N data. The effects that we have proposed may be responsible for the nonpredictiveness of the preferred SM21 model are:

  • 1.  
    The RV data sets from different instruments are treated as uncorrelated, allowing the model more freedom.
  • 2.  
    The SM21 preferred model includes two summed quasi-periodic terms at Prot and Prot/2 in their kernel, each with its own free exponential decay parameter. This additional free parameter grants the model unnecessary flexibility.
  • 3.  
    The SM21 model also includes parameters describing eccentric Keplerian signals, which grant even more degrees of freedom.
  • 4.  
    We find evidence from multiple independent photometric data sets that this star has a strong differential rotation signal, indicating that a singly (quasi)-periodic activity model is insufficient. This explains why more complex models were favored over simpler models in SM21, even though the preferred model fell victim to overfitting.

The first point, in particular, warrants further scrutiny for stars across a range of ages and spectral types. We argued in Section 4 that RV data sets taken by instruments with different bandpasses and calculated using different RV extraction techniques should be linear combinations of each other, recapitulating the observation made in Cale et al. (2021), but this assumption may not be true. Contemporaneous RV data sets made by different instruments will help test this assumption.

These authors have devoted significant person power and computer power to producing a fit to the data presented here that takes into account all of these effects. However, we have found that jointly fitting all the data using only a single rotation period forces all of the instrumental GP amplitudes to 0. We interpret this as evidence that a singly (quasi)-periodic GP model is incapable of fitting the data (i.e., a more complex model is needed), and differential rotation provides a ready (but not sole) explanation. However, the differential rotation effects are very complicated to disentangle with the current data set. 35 Again, we suggest a high-cadence RV campaign to resolve the multiple, nearby periodicities in the RVs and construct a high-fidelity model.

One important detail to note is that the GP kernel which best fits a highly active, rapid-rotator like V1298 Tau may be wholly inappropriate to fit the activity signal of an older, quieter, Sun-like star. In young, rapid rotators, the activity signal is relatively long lived, often stable across several observation epochs (e.g., Yu et al. 2019a; Carvalho et al. 2021).

On the other hand, Sun-like stars have much shorter-lived spots, sometimes evolving over the course of one or two week observing campaigns (Giles et al. 2017; Namekata et al. 2019). A GP kernel describing the activity of Sun-like stars should be more flexible, allowing for more rapidly changing and decaying signals. While a single kernel may be capable of spanning these regimes of period evolution, the attempt to construct one should be made with caution. For the time being, the best approach may be to treat the two regimes of activity with unique kernels.

This analysis is imperfect and incomplete. Many of the effects we have discussed are subtle, and we encourage others to study them further. This analysis has also evolved (quite a lot) over the preparation of this study.

There are many exciting follow-up avenues for the V1298 Tau system. First, an independent determination of the planet masses with TTVs would be enormously helpful in providing a "check" for RV modelers. Second, we believe it is worthwhile to explore modeling frameworks for V1298 Tau that explicitly model the relationship between contemporaneous photometry, activity indices, and multiple RV data sets. These frameworks (such as those of Rajpaul et al. 2015 and Cale et al. 2021) move beyond sharing hyperparameters between contemporaneous photometric and RV data sets and allow a function of one data set to be directly correlated with the other, decreasing the overfitting potential. In the longer term, comparing or jointly modeling these data with Doppler tomographic information and spectrum-level measurements, as in Yu et al. (2019a), Finociety et al. (2021), and Klein et al. (2022), will provide even stronger constraints.

In addition to working toward an optimal physical model of all available data, it is worth investigating alternative statistical modeling pathways to GPR, especially low computational cost techniques like autoregressive moving average (ARMA) models (Durbin & Koopman 2001; Feigelson et al. 2018). ARMA models treat the ith data point as a linear combination of past data points and model residuals, and "training" involves optimizing the linear coefficients. Directly comparing models constructed with ARMA and GPR would be a worthwhile exercise in general for data sets containing stellar activity, and in particular for young, active stars.

We believe that understanding the RV variability of young stars is an endeavor that will pay dividends in the near future. The relative long-term stability of activity on young stars allows for detailed study of a given spot geometry and its impact on both photometric and spectroscopic observations across multiple bands. As we work to understand how to best fit activity with GPs, young stars, particularly WTTSs, provide good laboratories on which to test our techniques.

Just as we validate the performance of a new instrument on stars with large, well-studied Keplerian signals, we must, as a field, validate the performance of our activity-modeling techniques on stars with large, well-studied activity signals before we can trust activity models applied to Sun-like stars at 30 cm s−1 precision. 36 This starts by allocating resources to the construction of high-cadence RV data sets of young stars, and continues by studying (a) the relationship between RVs and auxiliary data, such as photometry and activity indices, (b) the best phenomenological models (kernels, etc.) for the data, (c) the best methods for validating a given model's accuracy, and (d) the cadence needed to resolve periodic signals (and combinations of signals). We believe that these studies, on young stars, will pave the way for stellar activity models with 30 cm s−1 predictive capability, on which the characterization of Earth 2.0 depends.

Acknowledgments

S.B. wishes to thank first and foremost Alejandro Suárez Mascareño for constructive and helpful thoughts throughout the preparation of this study. S.B. also wishes to thank the small army of people who shaped this analysis through conversation: Jason Wang and his research group, Heather Knutson's research group, the folks at the Flatiron CCA, the University of Michigan Exoplanet Journal Club, the UC Riverside Astrobiology Seminar group, Johanna Teske and the astronomers of the Carnegie Earth and Planets Laboratory, Jéa Adams, Kim Paragas, Shreyas Vissapragada, Ward Howard, and Roberto Tejada Arevalo. All of the authors thank both the anonymous referee and the anonymous statistics editor for helpful comments that made us further question our assumptions and improved this work.

J.M.A.M. is supported by the National Science Foundation Graduate Research Fellowship Program under grant No. DGE-1842400. J.M.A.M. acknowledges the LSSTC Data Science Fellowship Program, which is funded by LSSTC, NSF Cybertraining grant No. 1829740, the Brinson Foundation, and the Moore Foundation; his participation in the program has benefited this work. T.H. is supported by JSPS KAKENHI grant Nos. JP19K14783 and JP21H00035.

This research was enabled by the following software: numpy (Harris et al. 2020), Lightkurve, a Python package for Kepler and TESS data analysis (Lightkurve Collaboration et al. 2018), pandas (McKinney 2010), matplotlib (Hunter 2007), scipy (Virtanen et al. 2020), astropy (Astropy Collaboration et al. 2013, 2018, 2022), jax (Bradbury et al. 2018), george (Ambikasaran et al. 2015), celerite (Foreman-Mackey et al. 2017), tinygp (Foreman-Mackey et al. 2022) 37 , and radvel (Fulton et al. 2018).

S.B. wishes to acknowledge her status as a settler on the ancestral lands of the Gabrielino/Tongva people, and to recognize that the astronomical observations described in this paper were only possible because of the dispossession of Maunakea from Kanāka Maoli. We seek to work toward a scientific practice guided by pono and a future in which we all honor the land.

Appendix: GPs and Occam's Razor

Many introductions to GPR (e.g., Aigrain & Foreman-Mackey 2022) mention that the GP likelihood has an "Occam's razor" term built in that penalizes complexity. This section briefly reviews GPR, then outlines a geometrical interpretation of the complexity penalty in order to further the reader's understanding.

Figure 11.

Figure 11. Demonstration of the impact of constructing separate covariance matrices and adding the log(likelihoods). Compare with Figure 12. The data and best-fit parameters are for K2-131, published in Dai et al. (2017), for demonstration purposes only. Top: GP mean prediction (black solid line) and 1σ uncertainties (purple filled), together with the HARPS-N data points on which the GP is conditioned (purple points). Middle: same as the top, but for PFS data. Bottom: residuals with respect to the GP mean prediction. Takeaway: when separate covariance matrices for each RV instrument are used, contemporaneous data are uncorrelated in the model, allowing additional degrees of freedom.

Standard image High-resolution image
Figure 12.

Figure 12. Same as Figure 11 (in particular, using the exact same data and GP hyperparameters), but here a single covariance matrix is constructed, following the suggestion in Section 4.1. Takeaway: constructing a single covariance matrix requires that GP predictions for separate instruments are scalar multiples of one another, which is more consistent with physical expectations and results in a more constrained model than one with a separate covariance matrix for each instrument.

Standard image High-resolution image

A GPR model parameterizes the covariance between data points using a kernel function. A statistician may pick an arbitrary function (subject to certain mathematical requirements; see Rasmussen & Williams 2006 for the gory details) to be the kernel, which can then be used to calculate the covariance between any two data points. As an example, let us consider the periodic kernel:

Equation (A1)

where η1 is the amplitude, Prot is the variability period (often the star's rotation period), and η3 is the harmonic complexity, or degree of "wigglyness" of the repeating signal. Given this model for the covariance of our data, and some data, we can make a prediction, which is the conditional probability distribution over the expected values at new measurement times. This is referred to as conditioning a GP on a set of data.

Importantly, GPR does not inherently involve training (i.e., parameter tuning, generally via an optimization and/or MCMC step). GPR is just the process of using a parameterization of your covariance matrix to predict the values and uncertainties of new data points given existing data points.

The "training" part comes in when you are optimizing the hyperparameters of your kernel (optionally jointly with parameters of a mean function, which could be a function of Keplerian orbital parameters). Now, it becomes important to compute a statistic describing how well your GP model fits your data, so that you can optimize the (hyper)parameters to obtain your result. This is where the GP likelihood comes in:

Equation (A2)

where C is the covariance matrix computed for the times at which you have data, N is the number of measurements, and r is the vector of residuals (data – mean model). The first term is a constant, and does not change as a function of the kernel hyperparameters, and the second term is analogous to χ2 (in fact, it reduces to χ2 in the limit of no off-diagonal covariance). The second term describes how well your mean model and correlated noise description match your data. The third term is the "Occam's razor" term that penalizes complexity.

To understand how the third term penalizes complexity, recall that the determinant of a matrix can be understood as the hypervolume between vectors defined by the columns of the matrix. To make this concrete, consider the 3 × 3 identity matrix:

Equation (A3)

The vectors defined by the columns of this matrix are (1, 0, 0), (0, 1, 0), and (0, 0, 1). The volume of the 3D shape defined by these vectors (the unit cube) is 1, the same as the matrix determinant!

The ith column vector of a covariance matrix can be interpreted as the vector of covariances between a data point taken at ti and every other data point in the data set. The determinant of this matrix, then, is the hypervolume defined by these covariance vectors. A perfectly covariant matrix, in which all data points are perfectly correlated, will consist of all 1s, 38 and the covariance vectors will all "point" in the same direction. This results in a third-term contribution of:

Equation (A4)

A matrix of perfectly independent data points, on the other hand, is (a scalar multiple of) the identity matrix. The covariance vectors all "point" in orthogonal directions. This matrix results in a third-term contribution of:

Equation (A5)

This exercise demonstrates that the determinant of the covariance matrix quantifies how "clustered" the covariance vectors corresponding to each data point are in hyperspace. More clustered covariance vectors get a big likelihood boost, while less clustered/more independent covariance vectors get a smaller boost. Figure 5.3 in Rasmussen & Williams (2006) decomposes the likelihood contributions of the second and third terms in Equation (A2), illustrating how they combine to produce a local likelihood maximum in parameter space for a toy model.

Footnotes

  • 23  

    See also Cale et al. (2021), who define overfitting in terms of reduced χ2 in the context of RV activity modeling.

  • 24  
  • 25  
  • 26  

    Conditioned on the HARPS-N data of SM21; see Section 4.1.

  • 27  

    Assuming that the stellar activity signals observed by different instruments can be described as linear combinations; see Section 4.1.

  • 28  

    With different RV offsets as well, so technically a linear combination, not just a scalar multiple.

  • 29  

    This is slightly different from the GP prescription in juliet (Espinoza et al. 2018), which does not allow different amplitudes for individual RV instruments.

  • 30  

    This modified version of the code is available at https://github.com/California-Planet-Search/radvel/tree/tinygp.

  • 31  

    This was also noted in SM21.

  • 32  

    As of 2023 January 1, V1298 Tau will unfortunately not be reobserved with TESS through year 6. We used tess-point (Burke et al. 2020) to make this determination.

  • 33  

    It is worth pointing out that similar strategies have been successful before, e.g., to measure the mass of Kepler-78 b (Howard et al. 2010; Pepe et al. 2013)

  • 34  

    The activity-only portion is isolated following SM21, subtracting the Keplerian mean model from the total mean GP prediction.

  • 35  

    Although we highly encourage others to try!

  • 36  

    In fact, the activity-to-Keplerian ratio of 1000 m s−1:50 m s−1 for warm giant planets around a young star like V1298 Tau is reminiscent of the 1 m s−1:10 cm s−1 ratio for an Earth around a Sun-like star.

  • 37  
  • 38  

    Or a scalar multiple of the matrix of all 1s.

Please wait… references are loading.
10.3847/1538-3881/acde78