The following article is Open access

Magnitude-squared Coherence: A Powerful Tool for Disentangling Doppler Planet Discoveries from Stellar Activity

, , , and

Published 2022 March 15 © 2022. The Author(s). Published by the American Astronomical Society.
, , Citation Sarah E. Dodson-Robinson et al 2022 AJ 163 169 DOI 10.3847/1538-3881/ac52ed

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1538-3881/163/4/169

Abstract

If Doppler searches for Earth-mass, habitable planets are to succeed, observers must be able to identify and model out stellar activity signals. Here we demonstrate how to diagnose activity signals by calculating the magnitude-squared coherence ${\hat{C}}_{{xy}}^{2}(f)$ between an activity-indicator time series xt and the radial-velocity (RV) time series yt. Since planets only cause modulation in RV, not in activity indicators, a high value of ${\hat{C}}_{{xy}}^{2}(f)$ indicates that the signal at frequency f has a stellar origin. We use Welch's method to measure coherence between activity indicators and RVs in archival observations of GJ 581, α Cen B, and GJ 3998. High RV-Hα coherence at the frequency of GJ 3998 b and high RV-S index coherence at the frequency of GJ 3998c, indicate that the planets may actually be stellar signals. We also replicate previous results showing that GJ 581 d and g are rotation harmonics and demonstrate that α Cen B has activity signals that are not associated with rotation. Welch's power spectrum estimates have cleaner spectral windows than Lomb–Scargle periodograms, improving our ability to estimate rotation periods. We find that the rotation period of GJ 581 is 132 days, with no evidence of differential rotation. Welch's method may yield unacceptably large bias for data sets with N < 75 observations and works best on data sets with N > 100. Tapering the time-domain data can reduce the bias of the Welch's power spectrum estimator, but observers should not apply tapers to data sets with extremely uneven observing cadence. A software package for calculating magnitude-squared coherence and Welch's power spectrum estimates is available on github.

Export citation and abstract BibTeX RIS

1. Introduction

Planets are often diagnosed in radial-velocity (RV) periodograms as large peaks which register above some high significance level. While some periodogram peaks genuinely describe planets, others are spurious detections resulting from stellar rotation or activity (e.g., Saar & Donahue 1997; Hatzes 2002; Desort et al. 2007; Boisse et al. 2011; Robertson & Mahadevan 2014; Robertson et al. 2014; Newton et al. 2016; Suárez Mascareño et al. 2017; Rajpaul et al. 2021). Now that extreme-precision spectrographs are on the hunt for terrestrial planets (Jurgenson et al. 2016; González Hernández et al. 2018; Gupta et al. 2021), the need for high-performance activity diagnostics is urgent: while an Earth-like planet orbiting a sunlike star yields an RV oscillation with semiamplitude K < 10 cm s−1, that same star's expected rotational RV modulation has amplitude >1 m s−1 (Vanderburg et al. 2016). Doppler surveys will not yield any Earth analogs unless stellar signals can be accurately identified and modeled.

A sure sign that a peak in the RV periodogram is caused by stellar activity is when the periodogram of a simultaneously measured activity indicator such as Hα, Mt. Wilson S-index, or bisector span shows the same peak (or its harmonic; e.g., Queloz et al. 2001; Bonfils et al. 2007; Kane et al. 2016; Sarkis et al. 2018; Toledo-Padrón et al. 2019). However, for quiet target stars with low-level variability, the signal in the activity-indicator power spectrum might not hit a statistically significant false-alarm probability. In such cases, the connection between stellar activity and RV might be overlooked (e.g., Robertson et al. 2015; Bortle et al. 2021; Lubin et al. 2021). Furthermore, aliasing, small sample size, and red noise can cause bootstrap false alarm level calculations to fail (Baluev 2008; Chernick 2008; Littlefair et al. 2016), making it difficult to correctly assess the significance of activity signals. Another common diagnostic of stellar activity—a linear regression of RV onto an activity indicator (e.g., Queloz et al. 2001; Huélamo et al. 2008; Queloz et al. 2009; Tal-Or et al. 2018)—fails when the RV and activity signals are not in phase, as seen in the RV, S-index, and Hα-index measurements of 55 Cnc (Butler et al. 2017; Bourrier et al. 2018). To validate planet discoveries, we require analysis techniques that reveal all oscillatory components common to simultaneous time series, regardless of relative phase.

In this paper, we demonstrate the use of the magnitude-squared coherence in RV planet searches. This bivariate statistic diagnoses oscillations that manifest in more than one observable. Magnitude-squared coherence can be interpreted as a frequency-dependent correlation coefficient that describes the proportion of variance in one time series that can be explained by means of a lagged linear regression onto another time series. For example, if RV and and $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ (Noyes et al. 1984) both oscillate at the star rotation frequency—as happened when α Cen B had a large spot group in 2010 (Dumusque et al. 2012; Thompson et al. 2017)—the coherence would be high at that frequency, and it would be possible to predict RV using a lagged regression onto $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ if the observations were regularly spaced in time (Shumway & Stoffer 2001). Bivariate statistics are used in a myriad of physical science applications, such as solar physics (Walker & Stephenson 2014), climatology (Thomson 1995), oceanography (Chave et al. 1992; Miller & Kelley 2021), atmospheric science (Krug et al. 2019), seismology (Scafetta & Mazzarella 2015), and others (Carter 1987).

This paper is organized as follows. In Section 2, we present the mathematical fundamentals of the magnitude-squared coherence ${C}_{{xy}}^{2}(f)$ and its two companion bivariate statistics, the cross spectrum and phase spectrum. In Section 3, we describe computational methods used to estimate ${C}_{{xy}}^{2}(f)$. In Section 4, we demonstrate our methods' stellar activity diagnostic power using published RV observations of α Cen B, GJ 581, and GJ 3998. We present our conclusions and plans for future work in Section 5. Appendix introduces our new publicly available software package, NWelch (Dodson-Robinson 2022), which was used for all calculations involving real RV data that are presented here.

2. Mathematical Fundamentals of Magnitude-squared Coherence

Suppose a planet-search team is lucky enough to have an extreme-precision spectrograph in space. There are no equipment problems that require telescope downtime, and all data downlinks can be completed in under 24 hr. The team has the luxury of observing a sunlike target at the same time every Earth day, with no interruptions, for many years. The resulting observations are evenly spaced, with constant Δt = ti+1ti =1 day (where t is a time stamp and i is an integer index; symbol definitions are collected in Table 1). From each astronomical spectrum, the data pipeline returns RV plus dozens of activity indicators (e.g., Wise et al. 2018). The team has assembled the perfect multivariate planet-search time series, which should allow members to characterize all stellar signals.

Table 1. Definitions of Symbols

SymbolDefinition
ti time stamp
Δti time between observations
xt , yt time series
N number of data points in time series
μx expected value of series xt
$\bar{x}$ sample mean of series xt
E{ · }expected value
γxy (τ)cross correlation between series xt and yt
${\hat{\gamma }}_{{xy}}(\tau )$ sample cross correlation between series xt and yt
τ time lag (independent variable in cross correlation)
${ \mathcal F }\{\}$ (Nonuniform) Fourier transform
Sxy (f)cross spectrum of time series xt and yt
${\hat{S}}_{{xy}}(f)$ estimated cross spectrum of time series xt and yt
${\hat{c}}_{{xy}}(f)$ estimated cospectrum between time series xt and yt
${\hat{q}}_{{xy}}(f)$ estimated quadrature spectrum between time series xt and yt
${\hat{C}}_{{xy}}^{2}(f)$ estimated magnitude-squared coherence between xt and yt
${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$ estimated power spectra of time series xt and yt
${\hat{\phi }}_{{xy}}$ estimated relative phase spectrum of xt and yt
* convolution
$\hat{a}$ estimated value of a
Var{ · }variance
z(f)Fisher's variance-stabilizing transformation of ${\hat{C}}_{{xy}}^{2}(f)$
wt taper coefficients
W(f)spectral window
K number of segments used in Welch's algorithm
$\tilde{K}$ effective number of independent estimates ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, ${\hat{S}}_{{xy}}(f)$
${x}_{j}^{(k)}$, ${y}_{j}^{(k)}$, ${w}_{j}^{(k)}$ observations and taper coefficients belonging to segment k
${\hat{S}}_{{xx}}^{(k)}(f)$, ${\hat{S}}_{{yy}}^{(k)}(f)$, ${\hat{C}}_{{xy}}^{(k)}(f)$ Sxx (f), Syy (f), Sxy (f) estimated using ${x}_{t}^{(k)}$, ${y}_{t}^{(k)}$ taken from segment k
${\hat{S}}_{{xx}}^{w}(f)$, ${\hat{S}}_{{yy}}^{w}(f)$, ${\hat{S}}_{{xy}}^{w}(f)$ tapered estimates of Sxx (f), Syy (f) Sxy (f)
N(k) number of data points in segment k
${{ \mathcal R }}^{(k)}$ Rayleigh resolution of ${\hat{S}}_{{xy}}^{(k)}(f)$
$\mathrm{bias}\left[{\hat{C}}_{{xy}}^{2}(f)\right]$ approximate bias of magnitude-squared coherence estimator
${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ debiased magnitude-squared coherence estimate
FALfalse-alarm level
α false-alarm probability
${ \mathcal B }$ half-width of the main lobe of the spectral window
$\max [\cdot ]$ maximum value
gr threshold periodogram power for Fisher's test
g threshold periodogram power for Siegel's test
Nf number of entries in the frequency grid
λ proportionality constant relating g and gr
Tλ Siegel's test statistic

Download table as:  ASCIITypeset image

Knowing that any signal manifesting in both an activity indicator and RV time series cannot be driven by an Earth-like planet (after all, exo-Earths are too miniscule and too far from their host stars for tides to trigger stellar activity as in HD 179949; Shkolnik et al. 2003), the team decides to use the cross correlation as an activity diagnostic. The cross correlation between two jointly stationary time series xt and yt is defined as

Equation (1)

where E{ · } denotes expected value, μx = E{xt }, μy = E{yt }, and τ is an integer time lag. When observed at N contiguous time points t = 0, ..., N − 1, γxy (τ) can be estimated using the convolution formula from Shumway & Stoffer (2001), Equations (1.29), (1.40)

Equation (2)

where $\bar{x}={N}^{-1}{\sum }_{t=0}^{N-1}{x}_{t}$ is the sample mean of the series xt , $\bar{y}$ is the sample mean of the series yt , and $\hat{a}$ denotes an estimate of a. Oscillations with period P that manifest in both xt and yt will yield local maxima in ${\hat{\gamma }}_{{xy}}(\tau )$ at integer multiples of P and local minima where the lag is such that the oscillation is perfectly out of phase, generating a wave pattern that is easy to see in a plot. The top panel of Figure 1 shows two synthetic regularly spaced time series with N = 512 representing possible spacecraft measurements. Let xt denote a noisy activity indicator time series that records the star rotation with period Prot = 25 days, while yt is the noisy RV time series that shows both the rotation and a planet with period Ppl = 80 days:

Equation (3)

Equation (4)

where frot = 1/Prot is the rotation frequency, fpl = 1/Ppl is the planet frequency, and ζt and ηt denote uncorrelated Gaussian white noise processes with zero mean and unit variance. The middle panel of Figure 1, which shows yt as a function of xt , demonstrates that there is no zero-lag, straight-line relationship between the two observables because of both the phase shift in the shared rotation signal and the noise in each time series. But the shared oscillation is obvious in the bottom panel of Figure 1, which reveals a wave pattern with period Prot in ${\hat{\gamma }}_{{xy}}(\tau )$.

Figure 1.

Figure 1. Top: Synthetic time series xt (Equation (3)), which represents an activity indicator, and yt (Equation (4)), which represents radial velocity. Star rotation manifests in both xt and yt , and yt also records planet-induced oscillations. Middle: A plot of yt as a function of xt fails to reveal a straight-line relationship between the two observables. This is because (1) the rotation signal in yt lags the rotation signal in xt by 90, and (2) both time series are plagued with Gaussian white noise. Bottom: The cross correlation ${\hat{\gamma }}_{{xy}}(\tau )$ shows an oscillation at the rotation period.

Standard image High-resolution image

If the real data are noisier than our synthetic xt and yt or the two time series share more than one oscillation—such as rotation and a long-term activity cycle—it is more difficult to identify the periods of shared signals in a plot of ${\hat{\gamma }}_{{xy}}(\tau )$. To pinpoint the frequencies of oscillations in common to both xt and yt , the planet-search team can estimate the (complex valued) frequency-dependent cross correlation, or cross spectrum, using

Equation (5)

Equation (6)

where f is the frequency. In-phase and 180 out-of-phase sinusoids that occur in both xt and yt yield delta functions in ${c}_{{xy}}(f)=\mathrm{Re}\{{S}_{{xy}}(f)\}$ (the cospectrum), 4 while ±90° phase-shifted oscillations in common to xt and yt show up as delta functions in ${q}_{{xy}}(f)=\mathrm{Im}\{{S}_{{xy}}(f)\}$ (the quadrature spectrum). In general, the cospectrum—an even function—is large when there is a small phase angle near zero, and the quadrature spectrum—an odd function—is large and positive when there is a phase angle near 90 and will be negative when the phase separation approaches 270°. Coherent oscillations with relative phases that are not integer multiples of 90 will yield delta functions in both the cospectrum and the quadrature spectrum. Where xt and yt share multiple oscillations, the cross spectrum will have delta functions at all oscillation frequencies. The top panel of Figure 2 shows ${\hat{S}}_{{xy}}(f)$ for our synthetic spacecraft data set. The shared but phase-shifted rotation signal creates spikes in the quadrature spectrum at f = ± 1/25 days−1.

Figure 2.

Figure 2. Top: The cross-spectrum estimate ${\hat{S}}_{{xy}}(f)$ of synthetic signals xt and yt (Equation (6)). Spikes at ± frot appear in ${\hat{q}}_{{xy}}(f)$ because the rotation signal in yt lags the signal in xt . Middle: Estimated magnitude-squared coherence between xt and yt (Equation (8)). From bottom to top, the red horizontal lines show the 1%, 0.1%, and 0.01% false alarm levels. The shared rotation signal generates a strong peak in ${\hat{C}}_{{xy}}^{2}(f)$, but the planet signal present only in yt does not show up in ${\hat{C}}_{{xy}}^{2}(f)$. Bottom: Estimated phase spectrum ${\hat{\phi }}_{{xy}}(f)$. The dotted line shows the estimated phase at all frequencies. The solid line shows the part of the phase estimate at frequencies near frot, where the coherence is statistically significant. As expected from Equations (3) and (4), ${\hat{\phi }}_{{xy}}({f}_{\mathrm{rot}})=270^\circ $.

Standard image High-resolution image

One can normalize Sxy (f) to produce the frequency-dependent cross-correlation coefficient, or magnitude-squared coherence

Equation (7)

Equation (8)

To compute power spectrum estimates ${\hat{S}}_{{xx}}(f)$ and ${\hat{S}}_{{yy}}(t)$, replace the $({x}_{t}-\bar{x})({y}_{t}-\bar{y})$ in Equation (6) with $({x}_{t}-\bar{x})({x}_{s}-\bar{x})$ or $({y}_{t}-\bar{y})({y}_{s}-\bar{y})$. In Equation (8), ${\hat{S}}_{{xx}}(f)$ and ${\hat{S}}_{{yy}}(t)$ are normalization factors that control for the fact that different observables might have vastly different amplitude variations—for example, RV semiamplitudes are of order unity or larger even for quiet sunlike stars (Vanderburg et al. 2016), while the Hα index defined by Gomes da Silva et al. (2011) records oscillations at the 1% level (e.g., Robertson et al. 2015). If xt and yt are activity indicator and RV, respectively, ${\hat{C}}_{{xy}}^{2}(f)$ should be near zero at the orbital frequency of any planet candidate. As ${\hat{C}}_{{xy}}^{2}(f)$ approaches unity, it lends more support to the hypothesis that the velocity signal at f is caused by stellar activity. The estimated phase spectrum, or frequency-dependent phase lag between the two time series, is

Equation (9)

${\hat{\phi }}_{{xy}}(f)$ is especially useful at frequencies where ${\hat{C}}_{{xy}}^{2}(f)$ exceeds some threshold of statistical significance. The middle panel of Figure 2 shows the estimated magnitude-squared coherence ${\hat{C}}_{{xy}}^{2}(f)$ between our example xt and yt , while the bottom panel shows the estimated phase spectrum ${\hat{\phi }}_{{xy}}(f)$. The shared rotation signal shows up in ${\hat{C}}_{{xy}}^{2}(f)$ as a strong peak at frot, while the planet signal at fpl that appears only in RV does not show up in ${\hat{C}}_{{xy}}^{2}(f)$ at all. This example shows the power of magnitude-squared coherence in separating activity signals from planets. 5

Of course, we do not have an extreme-precision spectrograph on a spacecraft with no-fail instrumentation and rapid downlink. Like all ground-based astronomical time series, RV and activity-indicator data sets are sampled at uneven intervals, with daytime and seasonal gaps. When the time between spectroscopic observations Δt is not constant, no direct cross-correlation estimate ${\hat{\gamma }}_{{xy}}(\tau )$ can be calculated. But with a nonuniform fast Fourier transform algorithm, it is still possible to estimate the cross-spectrum ${\hat{S}}_{{xy}}(f)$ between RV and an activity indicator (Scargle 1989). Define the nonuniform fast Fourier transform (NFFT) of the sequence xj observed on the sequence of times tj where j = 0, ..., N − 1 implicitly as

Equation (10)

where k is the index of the frequency grid and tj has been standardized to the scaled time interval [ − 1/2, 1/2). That is, to solve for the coefficients $\tilde{x}(k)$ one computes ${({{\boldsymbol{A}}}^{* })}^{T}$, the adjoint of the matrix A with entries ${{\boldsymbol{A}}}_{{jk}}={e}^{2\pi {{if}}_{k}{t}_{j}}$ (Keiner et al. 2009), and multiplies ${({{\boldsymbol{A}}}^{* })}^{T}$ by the column vector of observations xj (Springford et al. 2020; see Section A.2 for more on the NFFT algorithm). In what follows, we use the notation $\tilde{x}(f)$ to denote the NFFT on a grid of equally spaced Fourier frequencies fk . When the data are equally spaced in the time domain, the inverse transform can be easily written in closed form. Replacing the Fourier transform with NFFTs, one can obtain the following (normalized) power spectral density estimator

Equation (11)

and similarly, an estimator for the cross spectrum

Equation (12)

(where * deontes complex conjugate), which has the desired property that convolution of the sequences xt and yt results in the multiplication of their Fourier transforms, i.e., $\tilde{x}(f)\tilde{y}(f)$. For a description of the NFFT algorithm, see Section A.2.

Equation (12) reveals a trap for the unwary: if the magnitude-squared coherence is calculated based only on a single estimate of each of ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, and ${\hat{S}}_{{xy}}(f)$, the disastrous result will be ${\hat{C}}_{{xy}}^{2}(f)=1$. To see why, we substitute Equation (12) into Equation (8): the result is Sxx (f)Syy (f)/[Sxx (f)Syy (f)]. But if we have K > 1 estimates of ${\hat{S}}_{{xy}}(f)$, say ${\hat{S}}_{{xy}}^{(k)}(f)$ for k = 0,...,K − 1, we can take advantage of the fact that

Equation (13)

Thus, a meaningful coherence estimate comes from averaging together multiple estimates of the numerator and denominator of Equation (8) before computing their ratio:

Equation (14)

The question becomes, how do we obtain the ${\hat{S}}^{(k)}$? For RV data sets, we will divide each time series into shorter segments and compute one estimate of each of ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, and ${\hat{S}}_{{xy}}(f)$ from each segment (Section 3.1). The signal-processing literature describes this procedure as Welch's method.

For bivariate frequency-domain analysis, there is one more useful mathematical operation. As mentioned above, ${C}_{{xy}}^{2}(f)$ is bounded between 0 and 1, which means $\mathrm{Var}\{{\hat{C}}_{{xy}}^{2}(f)\}$ is a function of ${\hat{C}}_{{xy}}^{2}(f)$ (where Var{ · } denotes variance). In practical terms, this means the difference between (for example) ${\hat{C}}_{{xy}}^{2}(f)=0.91$ and ${\hat{C}}_{{xy}}^{2}(f)=0.92$ might be more statistically significant than the difference between ${\hat{C}}_{{xy}}^{2}(f)=0.5$ and ${\hat{C}}_{{xy}}^{2}(f)=0.6$ (depending on N and K). To stabilize $\mathrm{Var}\{{\hat{C}}_{{xy}}^{2}(f)\}$ and remove some of its dependence on ${\hat{C}}_{{xy}}^{2}(f)$, we can use Fisher's z transformation (Fisher 1929; Jenkins & Watts 1968):

Equation (15)

where $\mathrm{atanh}$ is the inverse hyperbolic arctangent. The transformed coherence z(f) is approximately Student − t distributed (Thomson & Chave 1991). Figure 3 shows z(f) from the ${\hat{C}}_{{xy}}^{2}(f)$ estimate plotted in Figure 2.

Figure 3.

Figure 3.  z(f), the Fisher variance-stabilizing $\mathrm{atanh}$ transformation of ${\hat{C}}_{{xy}}^{2}(f)$. 1%, 0.1%, and 0.01% false alarm levels are shown in red.

Standard image High-resolution image

In the next section, we describe the computational methods used to estimate ${\hat{S}}_{{xy}}(f)$, ${\hat{C}}_{{xy}}^{2}(f)$, and ${\hat{\phi }}_{{xy}}(f)$ for RV data.

3. Computational Methods

The biggest challenge in applying Equation (14) is obtaining multiple estimates of ${\hat{S}}_{{xy}}(f)$. For evenly spaced data (constant Δt), there are three possibilities: the multitaper method of Thomson (1982), where the time-domain data are multiplied by a set of orthogonal sequences (Slepian 1978) and the results from subsequent Fourier analysis are averaged together via jackknife mean (see a bivariate application of this technique in Thomson 1995), smoothing the cross spectrum across frequency (Shumway & Stoffer 2001), or Welch's method (Welch 1967). 6 Here, we demonstrate how to apply the computationally lightweight Welch's method to RV data. Our methods borrow heavily from the software package redfit-x (Ólafsdóttir et al. 2016) and its predecessor SPECTRUM (Schulz & Stattegger 1997), which implement Welch's method for analysis of unevenly spaced, bivariate paleoclimate data.

3.1. Segmenting the Data

Welch (1967) built on the work of Bartlett (1948), who proposed estimating the power spectrum of a stationary, regularly spaced time series xt by dividing the series into segments, computing a periodogram of each segment, and averaging the periodograms:

Equation (16)

where K is the number of segments and ${\tilde{x}}^{(k)}(f)$ denotes the NFFT of the data subsequence ${x}_{j}^{(k)}$ of length N(k). 7 For bivariate time series observed at the same time stamps tj , one uses the same segmentation scheme for both xt and yt (i.e., two observables measured from the same astronomical spectrum taken at time tj are assigned to the same segment k) and computes ${\hat{S}}_{{xy}}^{(k)}(f)$, ${\hat{S}}_{{xx}}^{(k)}(f)$, and ${\hat{S}}_{{yy}}^{(k)}(f)$ for each segment. Averaging together the K different estimates of the cross spectrum and power spectra yields ${\overline{S}}_{{xy}}(f)$, ${\overline{S}}_{{xx}}(f)$, and ${\overline{S}}_{{yy}}(f)$—all the ingredients needed to compute ${\hat{C}}_{{xy}}^{2}$ using Equation (14). Figure 4 shows the Dumusque et al. (2012) α Cen B $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ time series divided into nonoverlapping segments in preparation for using Equation (16).

Figure 4.

Figure 4. Nonoverlapping segmenting scheme for the α Cen B spectroscopic data set from Dumusque et al. (2012).

Standard image High-resolution image

Bartlett (1948) and Welch (1967) explored time-series segmenting, not because they wanted to compute magnitude-squared coherences, but because they were looking for a power spectrum estimator whose variance decreases as the number of observations increases. This is not true of either the standard periodogram (Schuster 1898) or the Lomb–Scargle periodogram (Lomb 1976; Scargle 1982): for both estimators, the variance is independent of the number of observations. Welch (1967) pointed out that nonoverlapping segments (aka Bartlett's method) are ideal for reducing the variance of any $\hat{S}(f)$, as the various ${\hat{S}}^{(k)}(f)$ are fully independent of one another. But there is a catch. Although both the standard periodogram and the Lomb–Scargle periodogram are asymptotically unbiased—i.e., $\hat{S}(f)\to S(f)$ as N —the bias can be severe for small or even not-so-small N ("tragedy of the periodogram"; Percival 1994). To understand bias, suppose an RV time series yt traces a planet in a circular orbit with period Ppl . Since the planet's time domain signature is a perfect sinusoid, ${\hat{S}}_{{yy}}(f)$ should have an infinitely thin delta function at fpl = 1/Ppl. But the finite duration of yt creates spectral leakage: in ${\hat{S}}_{{yy}}(f)$, the planet's signal will land mostly on the frequency fj nearest to fpl , but there will be nonzero contributions to every frequency in the grid (e.g., Harris 1978). This leakage is responsible for periodogram bias. Thus there is a tradeoff between bias and variance: we want a large number of segments K in order to improve the consistency of ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, and ${\hat{C}}_{{xy}}^{2}(f)$, but if N(k) is too low our power spectrum and coherence estimates will be consistently biased—that is, ${\hat{C}}_{{xy}}(f)$ will be far away from Cxy (f)—especially at high frequencies (e.g., Podesta 2006; Bronez 1992; Percival & Walden 2020).

3.2. Overlapping Segments

To reduce the bias of each ${\hat{S}}^{(k)}(f)$ while retaining most of the variance suppression associated with large K, Welch (1967) proposed using tapered, overlapping segments (tapering is discussed in Section 3.3). In the 50% overlap scheme, we segment xt as follows:

Each 50% overlapping segment has 2N/(K + 1) data points, as opposed to N/K for nonoverlapping segments. But because of the overlap, the resulting spectral estimates ${\hat{S}}^{(k)}(f)$ are not independent. The variance reduction associated with overlapping segments is

Equation (17)

where $\tilde{K}\lt K$ is an effective number of segments. The left panel of Figure 5 depicts the allocation of data points in a sample xt among segments ${x}_{j}^{(k)}$ in Welch's 50% overlapping segment method.

Figure 5.

Figure 5. Left: Cartoon showing the allocation of data points in xt among segments 0...K − 1 in Welch's method. Series xt has 40 data points, which are broken into seven overlapping segments ${x}_{j}^{(0)},\ldots ,{x}_{j}^{(6)}$ that each have 10 data points. Note that 10 data points per segment are far too few to compute any kind of periodogram, let alone a minimally biased one; this figure uses small N(k) simply for the sake of readability. Right: Tapers ${w}_{j}^{(k)}$ applied to each segment shown in the left panel.

Standard image High-resolution image

3.3. Tapering

The value of $\tilde{K}$ depends on the type of taper (also called a window) applied to each segment. Tapers are functions wt that are premultiplied with xt and yt to minimize spectral leakage, and are especially valuable for detecting weak signals in the neighborhood of much stronger signals. They are normalized such that ${\sum }_{t=0}^{N-1}{w}_{t}^{2}=1$ so as to conserve power. Figure 6 shows a synthetic RV data set from a star with two unequal-mass planets in circular orbits:

Equation (18)

where f1 = 1.7 days−1 and f2 = 1.2 days−1. 8 When no taper is applied (i.e., the data set retains its rectangular or "boxcar" taper associated with the fact yt = 0 before and after the observing run such that ${\hat{S}}_{{yy}}(f)=\tfrac{1}{N}| {\sum }_{t=0}^{N-1}({y}_{t}-\bar{y}){e}^{-i2\pi {ft}}{| }^{2}$), one does not detect the signal associated with planet 2 in ${\hat{S}}_{{yy}}(f)$. But when a minimum four-term Blackman–Harris taper (Equation (33) of Harris 1978) is applied to yt , so that the power spectrum estimate becomes

Equation (19)

planet 2 is detected despite being responsible for only 0.028% of Var{yt }. Here ${\hat{S}}_{{yy}}^{w}(f)$ is not computed with Welch's algorithm—it is a standard Schuster (1898) periodogram. See Section A.1 for reasons why Figure 6 and all subsequent power spectrum plots have logarithmic y-axes.

Figure 6.

Figure 6. Top left: Synthetic data set with N = 128, ti = 0...10 days, and ${y}_{t}=\cos (2\pi {{tf}}_{1})+0.017\sin (2\pi {{tf}}_{2})$, where f1 = 1.7 days−1 and f2 = 1.2 days−1. Bottom left: wt yt , the product of yt with a minimum 4–term Blackman–Harris window (gray curve). Right: Estimated power spectra of yt (blue dashed–dotted line) and wt yt (solid red line). Vertical lines mark f1 and f2. Without tapering, leakage from planet 1's strong signal masks the weaker signal from planet 2. When the taper is applied, leakage from planet 1 is confined to frequencies near f1 and the weak signal at f2 is uncovered.

Standard image High-resolution image

Viewing tapering from a frequency-domain context, when the time series is evenly spaced with Δtj = 1, we have the following (Percival & Walden 2020, p. 186):

Equation (20)

which is a convolution between the true power spectrum and the spectral window W(f), defined as

Equation (21)

That is, the bias of the power spectrum estimator comes from convolving the true power spectrum with the spectral window. Since spectral leakage is created by the "smearing" effect of the Fourier transform of the spectral window, it is best if W(f) resembles a delta function as closely as possible (Harris 1978). For RV data sets, we calculate W(f) using the adjoint NFFT by replacing xj with wj in Equation (10). Note that, when generalizing Equation (20) to unevenly spaced time series, one does not strictly obtain a convolution between S(f) and W(f), since, in general, the NFFT cannot simply be inverted (Scargle 1982, Appendix D). However, we will see in Section 4 that peaks in the RV power spectra are shaped like W(f), and will follow Scargle (1982) in referring W(f) as the "spectral window."

When applying Welch's algorithm, tapering the overlapping segments not only mitigates spectral leakage—it also increases the independence of the various ${\hat{S}}^{(k)}(f)$. To see why, we examine the right panel of Figure 5, which shows tapers ${w}_{j}^{(k)}$ applied to each segment depicted in the left panel of Figure 5. At data point t = 5, where ${w}_{j}^{(0)}$ is highest, ${w}_{j}^{(1)}$ is nearly zero. Where ${w}_{j}^{(1)}$ is at its maximum, overlapping tapers ${w}_{j}^{(0)}$ and ${w}_{j}^{(2)}$ are near zero, and so on. The tapers ensure that very little information contained in segment k gets repeated in segments k − 1 or k + 1. The effective number of segments $\tilde{K}$ is

Equation (22)

where c is a constant that depends on the type of taper applied (Welch 1967). The boxcar taper belonging to otherwise untapered segments ${x}_{j}^{(k)}$ has c = 0.5, while the minimum four-term Blackman–Harris taper has c = 0.038, yielding $\tilde{K}\approx K$. Figure 7 shows the GJ 3998 RV data of Affer et al. (2016) broken into two 50%-overlapping segments with the associated minimum four-term Blackman–Harris tapers.

Figure 7.

Figure 7. GJ 3998 RV time series (black) divided into two 50%-overlapping segments shown by blue and orange shading. The dark pink area is where the two segments overlap. The minimum four-term Blackman–Harris tapers applied to each segment are shown at the top of the plot shifted vertically by +10 for visibility, with blue stars indicating Taper 1 and red triangles indicating Taper 2. RV data come from the HADES survey, which uses the HARPS-N instrument, and were extracted with the TERRA pipeline (Affer et al. 2016).

Standard image High-resolution image

If all RV planet-search data were observed at evenly spaced time intervals, as in the spacecraft example in Section 2, the benefits of applying tapers to the Welch's segments when computing ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, and ${\hat{C}}_{{xy}}^{2}(f)$ would far outweigh the drawbacks. 9 Reexamining Figure 6, we see that ${\hat{S}}_{{yy}}^{w}(f)$—the tapered power spectrum estimate of yt from Equation (18)—has a dynamic range of over 13 orders of magnitude. This dynamic range would allow planet hunters to identify terrestrial planets, hot Jupiters, rotation, and activity cycles all in the same RV power spectrum estimate ${\hat{S}}_{{yy}}^{w}(f)$: there would be no need to fit the strongest signal, subtract it out, examine a periodogram of the residuals, and keep iterating until no more signals were found. When the temporal cadence is only mildly uneven—as is common in paleoclimatology—the Blackman–Harris window and other similar tapers retain most of their bias-suppression ability (e.g., Ólafsdóttir et al. 2016).

But the wildly uneven observing cadence of RV time series often destroys the tapers (e.g., Scargle 1989). Figure 8 shows the effect of tapering the 51 Peg b RV data set of Butler et al. (2006). The top left shows the published data yt , while the bottom left shows wt yt , where wt is the minimum four-term Blackman–Harris taper evaluated at the observation time points tj . The top right shows that the planet's signal (black dotted line) is clearly visible in ${\hat{S}}_{{yy}}(f)$ (blue dashed–dotted line), while on the bottom right, ${\hat{S}}_{{yy}}^{w}(f)$ shows nothing but noise. Almost all of the 51 Peg observations took place at the very beginning and the very end of yt , when the interpolated wt is near zero, so tapering removes nearly all the information contained in the time series. When the observing cadence is extremely uneven, we revert to the boxcar tapers when computing ${\hat{C}}_{{xy}}^{2}(f)$. A good rule of thumb is that all tapers should retain the bell-like shapes shown in the right panel of Figure 5. If the "bell" is missing huge chunks or its shape is not recognizable when plotted, tapering may do more harm than good. 10

Figure 8.

Figure 8. Top left: 51 Peg RV time series measured by Butler et al. (2006), labeled yt in our notation. Bottom left: wt yt , the 51 Peg RV time series after applying a minimum four-term Blackman–Harris taper. Top right: ${\hat{S}}_{{yy}}(f)$, the estimated power spectrum of yt , shows the strong planet signal at f = 1/4.2308 days−1 (dotted black line). Bottom right: ${\hat{S}}_{y}^{w}(f)$, the estimated power spectrum of wt yt , shows only noise.

Standard image High-resolution image

3.4. Gaps, Weighting, and Number of Data Points Per Segment

Another consideration when applying Welch's algorithm to RV data is that large gaps in the middle of a segment should be avoided when possible. Gappy segments are difficult to handle because the resolution limit of ${\overline{S}}_{{xx}}(f)$, ${\overline{S}}_{{yy}}(f)$, and ${\overline{S}}_{{xy}}(f)$ is set by the time duration of the segments:

Equation (23)

where ${t}_{{N}^{(k)}}$ is the final timestamp in segment k, ${t}_{{0}^{(k)}}$ is the first timestamp in segment k, and $2{{ \mathcal R }}^{(k)}$—called the Rayleigh resolution in analogy to Rayleigh's criterion in optics—is both the smallest oscillation frequency that can be detected and the frequency separation Δf of two barely resolved peaks (e.g., Godin 1972). For evenly spaced data, ${{ \mathcal R }}^{(k)}$ is the same for all segments, but RV data sets always yield varying ${{ \mathcal R }}^{(k)}$. A segment ${x}_{j}^{(k)}$ that contains one or more large gaps has a low apparent Rayleigh resolution, which makes one optimistic that low-frequency information missing from other segments might be available in ${x}_{j}^{(k)}$. But ${\hat{S}}_{{xx}}^{(k)}(f)$ is often misleading at small integer multiples of ${{ \mathcal R }}^{(k)}$ because the gaps make for poor phase coverage of low-frequency oscillations. Here we use the segment with the longest time duration 11 to define the Rayleigh resolution ${ \mathcal R }$ of ${\overline{S}}_{{xx}}(f)$, ${\overline{S}}_{{yy}}(f)$, ${\overline{S}}_{{xy}}(f)$, and ${\hat{C}}_{{xy}}^{2}(f)$,but we urge caution when examining the low-frequency end of each statistic if one or more segments contains a large gap.

In practice it is difficult to keep the Welch's segments gap-free, as seasons and telescope scheduling often combine to isolate a handful of data points from the rest of the time series. The α Cen B $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ time series of Dumusque et al. (2012) shown in Figure 4 would ideally be broken into four segments, but we use only three segments because the first observing season (modified Julian dates 54,550–54,650) recorded just 42 observations—not enough to yield a low-bias periodogram (Pukkila & Nyquist 1985; Springford et al. 2020). No tapers are applied to the α Cen B data because of the gap in segment 1. With nonoverlapping, boxcar-tapered segments, the α Cen B segmenting scheme differs from Bartlett's method only in that each segment has a different number of data points. For all segmenting patterns, the average cross spectrum is weighted by the number of data points in each segment:

Equation (24)

Average periodograms ${\bar{S}}_{{xx}}(f)$ and ${\bar{S}}_{{yy}}(f)$ are likewise weighted by N(k).

To mitigate the worst manifestations of small-sample bias, we recommend that all segments have N(k) ≥ 100 (Hannan & Nicholls 1977; Pukkila & Nyquist 1985), which requires N ≥ 150 for K ≥ 250% overlapping segments. But many published RV data sets have fewer than 100 observations (a practice we do not endorse). Based on the simulations of Das et al. (2021), who generated small-sample realizations of AR and ARMA processes and compared the resulting periodograms with the processes' analytically known power spectra, we consider N(k) = 50 a hard lower limit. This means the minimum number of astronomical observations required for Welch's algorithm is 75 (two 50%-overlapping segments each with N(k) = 50), though more is much better. (In fact, Thomson & Haley (2014) present a time series with N = 1000 for which the periodogram, which records a power-law turbulent cascade, is still biased by more than seven orders of magnitude.) The transformed coherence z(f) and its noise properties can only be described analytically by Gaussian statistics when $\tilde{K}\geqslant 20$ (Enochson & Goodman 1965; Jenkins & Watts 1968), which requires N > 1000. With small N, the false-positive risk may depart from theoretical expectations in unpredictable ways (see below).

3.5. Bias Correction and False-alarm Levels

While minimizing bias in ${\hat{S}}_{{xx}}^{(k)}(f)$, ${\hat{S}}_{{yy}}^{(k)}(f)$, and ${\hat{C}}_{{xy}}^{2}(f)$ is always an important consideration when deploying Welch's algorithm, an approximate bias correction to ${\hat{C}}_{{xy}}^{2}(f)$ is possible. The bias on the magnitude-squared coherence measurement is

Equation (25)

(Carter et al. 1973; Bendat & Piersol 2010). The debiased coherence estimate is therefore

Equation (26)

All analyses of RV data presented in Section 4 use debiased coherence estimates.

One can also calculate analytical false alarm levels (FALs) for ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$:

Equation (27)

(Carter 1977; Schulz & Stattegger 1997), where FAL is the ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ threshold associated with false-alarm probability α. Equation (27) gives false alarm thresholds for ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ given true coherence ${C}_{{xy}}^{2}(f)=0$, i.e., the two time series trace completely unrelated physical phenomena. If a broad-spectrum random process (such as granulation) manifests in both xt and yt , then Equation (27) gives artificially low FALs for periodic signals because the two time series have some underlying nonzero coherence at all frequencies. The NWelch software package has an option for calculating frequency-dependent bootstrap FALs for ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ in addition to using Equation (27). All FALs presented here are calculated using 10,000 bootstrap iterations.

3.6. Resolution and the Spectral Window

While the Rayleigh resolution (Equation (23)) is the theoretical frequency separation between two barely resolved peaks in ${\hat{S}}_{{xx}}(f)$, ${\hat{S}}_{{yy}}(f)$, or ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$, the true frequency resolution of each statistic is determined by the width of the main lobe in the spectral window W(f). Resolution therefore depends on the choice of taper. In Figure 6, we see that the power spectrum peak associated with planet 1 is quite narrow when the boxcar taper is retained. Although the boxcar taper has poor statistical properties when it comes to leakage and bias, it works well for separating closely spaced signals of similar power. Thus boxcar tapers are useful for asteroseismologists who are trying to resolve modes separated by the small spacing, though they are not optimal for planet hunters who might have to deal with signals of widely varying power. The planet hunter's penalty for using a taper to increase the dynamic range of ${\hat{S}}_{{yy}}(f)$ is lower resolution in all $\hat{S}(f)$. Following Harris (1978), we quantify the half-width of the main lobe in W(f) as the frequency interval ${ \mathcal B }$ over which a sinusoidal signal declines from its peak value by 6 decibels (dB), or a factor of 3.981( ≈ 4). One resolution unit is $2{ \mathcal B }$ wide.

Harris (1978) calculated ${ \mathcal B }$ as a function of ${ \mathcal R }$ for a variety of tapers applied to evenly spaced time series. For the boxcar taper, ${ \mathcal B }=1.21{ \mathcal R }$. 12 The minimum four-term Blackman–Harris taper has ${ \mathcal B }=2.72{ \mathcal R }$, while the Kaiser-Bessel window (another taper option available in NWelch; see Section Appendix) has ${ \mathcal B }=2.39{ \mathcal R }$. The fact that ${ \mathcal B }$ is a function of ${ \mathcal R }$ means harmonic analysis has a resolution-variance tradeoff in addition to the bias-variance tradeoff discussed in Section 3.1. Users of Welch's method can either prioritize high resolution by constructing a small number of long-duration segments or emphasize false-positive suppression by deploying a larger number of shorter-duration segments. (Of course, small samples and seasonal gaps can constrain the segmenting scheme, making it difficult to find an optimal balance between resolution and variance.) Since almost all RV data sets require segments of varying duration (e.g., Figure 4), we estimate the main lobe half-width of the Welch's estimator as the mean of the main lobe half-widths from the individual segments, weighted by number of points per segment:

Equation (28)

where ${{ \mathcal R }}^{(k)}$ is defined in Equation (23).

In practice, the Welch's estimator for unevenly spaced data sets has spectral resolution that depends on the exact timing of the observations, not just the segment durations. (The dependence on observation timing applies to the main-lobe width of the Lomb–Scargle spectral window as well.) It is therefore possible for the actual resolution unit to differ from $2\hat{{ \mathcal B }}$ given by Equation (28). NWelch allows the user to examine the specific W(f) associated with any Welch's segmenting and tapering scheme. The software will then empirically calculate ${ \mathcal B }$ by finding the frequency at which $W(f)=\max [W(f)/3.981]$, where $\max [\cdot ]$ is the maximum value.

We recommend examining W(f) not just to find the resolution of a given Welch's estimator, but because changing the segmenting scheme can strongly alter the shape of the spectral window. For example, Figure 9 compares W(f) from the Lomb–Scargle periodogram and the three-segment Welch's estimator applied to the Dumusque et al. (2012) α Cen B data set (Figure 4). As we know from Section 3.3, a periodic signal does not yield a delta function in ${\hat{S}}_{{xx}}(f)$—it instead creates a copy of W(f) centered at the signal frequency. Since the Lomb–Scargle spectral window is beset by "ringing," there can be no clean, isolated peaks in the Lomb–Scargle periodograms of RVs and activity indicators, as noted by (Rajpaul et al. 2016, see also Section 4.2). The Welch's spectral window, which has a well-defined main lobe, is much better suited to identifying periodic signals. Figure 9 suggests that spectral window optimization using Welch's method will be a productive avenue for future research.

Figure 9.

Figure 9. Spectral windows belonging to the Dumusque et al. (2012) α Cen B data set. The ringing in the Lomb–Scargle spectral window (light blue) means each periodic signal yields multiple peaks in ${\hat{S}}_{{xx}}(f)$. The Welch's spectral window (dark blue), which has a much cleaner main lobe, will translate each periodic signal into a single power spectrum peak. While this plot zooms in on low frequencies, all spectral windows have nonzero power throughout the frequency domain.

Standard image High-resolution image

3.7. Siegel's Test

The last computational method we will describe applies only to power spectrum estimates, not to magnitude-squared coherences, but it is useful for deciding whether a data set contains periodic signals or just noise. Planet hunters often struggle with unrealistic-looking bootstrap false alarm levels (see Section 2.2 of Cumming 2004 for information on how to calculate bootstrap FALs). For example, the top left panel of Figure 10 shows the Hα-index time series ${I}_{{{\rm{H}}}_{\alpha }}$ measured by Robertson et al. (2015) from the Kapteyn's star spectroscopic data set of Anglada-Escude et al. (2014). The time series yields a Lomb–Scargle periodogram with ≥ 10 signals that exceed the bootstrap 1% FAL (Figure 10, top right; periodogram computed with NWelch). But should we really believe that a time series with only 112 measurements can record 10 distinct oscillations? It's more likely that the FALS are misleading (indeed, the same false-alarm threshold problem can be seen in the Lomb–Scargle periodogram of the same data set in Figure 1 of Robertson et al. (2015)), with small-sample statistics, spectral leakage, and aliasing all contributing to the bootstrap failure (e.g., Chernick 2008). The spectral window shown in the bottom panel of Figure 10 has a broad main lobe and a large number of spurious spikes, suggesting that ${I}_{{{\rm{H}}}_{\alpha }}$ periodogram peaks are window function artifacts.

Figure 10.

Figure 10. Top left: Hα indices of Kapteyn's star measured by Robertson et al. (2015) from the spectra of Anglada-Escude et al. (2014). Top right: Lomb–Scargle periodogram of ${I}_{{{\rm{H}}}_{\alpha }}$ with 5% and 1% FALs (purple and green, respectively). The FALs give the unrealistic impression that the periodogram has ≥ 10 significant peaks. Bottom: Spectral window W(f) of the Kapteyn's star ${I}_{{{\rm{H}}}_{\alpha }}$ time series. The broadness of the main lobe and the large number of spurious spikes suggest that peaks in the Lomb–Scargle periodogram are window function artifacts.

Standard image High-resolution image

In situations like this, we can deploy Siegel's test for compound periodicity. Siegel (1980) developed an extension of Fisher's test, which rejects the null hypothesis of white noise when the maximum power in the normalized periodogram exceeds the critical value gr . Percival & Walden (2020) approximate gr as

Equation (29)

where Nf is the number of entries in the frequency grid and α is the false alarm risk (e.g., α = 0.05 for 5% FAP). While Anderson (1971) notes that Fisher's test is the most powerful identifier of simple periodicity (oscillation at one period only), the test will not work when there are multiple oscillations—such as planet and rotation, rotation and long-term magnetic activity, or rotation with significant power at one or more harmonics. Siegel's test uses a reduced threshold g = λ gr , where λ < 1, and sums all periodogram power in excess of g to compute the test statistic Tλ :

Equation (30)

The value of Tλ is then compared with a threshold that depends on the number of entries in the frequency grid. If Tλ exceeds the threshold, the null hypothesis of white noise is rejected and the time series is considered to be periodic. When λ = 0.6, Siegel's test is conservatively optimized for two periodicities, whereas λ = 0.4 is sensitive to three or more periodicities but less robust against noise peaks. For the Kapteyn's star periodogram in Figure 10, Siegel's test does not reject the null hypothesis of white noise, even with λ = 0.4: there is no evidence for periodicity in the time series. We will use Siegel's test in Section 4 to see whether the alternative hypothesis of periodicity is supported for certain time series before using those time series to measure rotation periods. Note that Siegel's test does not differentiate between a smooth power spectrum with a large dynamic range and a power spectrum consisting of white noise plus a single large oscillation—red noise is a failure mode for both bootstrapping and Siegel's test. We are currently incorporating FALs calculated against a red noise model into NWelch and will discuss these in a future publication.

4. Application to RV Data

Here we use archival data to demonstrate the use of magnitude-squared coherence in diagnosing stellar RV signals. We select three test data sets that have concurrent RV and activity-indicator time series with N > 100. We begin by showing that GJ 581 has significant Hα-RV coherence at the frequencies of the stellar signals that were misidentified as planets GJ 581 d and GJ 581 g, and use a Welch's power spectrum of the Hα index to show that both signals are rotation harmonics. Next we demonstrate that high-frequency stellar signals appear in the magnitude-squared coherences between RV and activity indicators in the α Cen B data set assembled by Dumusque et al. (2012). We then use coherence between Mt. Wilson S-index, Hα index, and RV to argue that GJ 3998 b and c may be misdiagnosed stellar activity signals. This section closes with a step-by-step guide to interpreting magnitude-squared coherence measurements.

4.1. GJ 581

GJ 581 became the third M dwarf known to host a planetary system after Bonfils et al. (2005) used HARPS to discover a Neptune-mass planet with P = 5.36 days. Udry et al. (2007) followed up with reports of planets c (P = 12.93 days) and d (P = 83.6 days), both in or near the habitable zone. Mayor et al. (2009) presented another compelling discovery: planet e (P = 3.15 days, $M\sin i=1.9{M}_{\oplus }$), one of only a handful of super-Earths discovered in multiplanet systems at the time. They also revised the period of planet d to 66.8 days, arguing that the one-year alias introduced by Earth's orbit caused some confusion in the Udry et al. (2007) study. Next, Vogt et al. (2010) used Keck HIRES data to add planets f (P = 433 days) and habitable-zone dweller g (P = 33.6 days) to the inventory, with further follow up by Vogt et al. (2012). The six-planet system with three planets in or near the habitable zone became the object of intense climate modeling efforts (e.g., Heng & Vogt 2011; Pierrehumbert 2011; von Bloh et al. 2011; Wordsworth et al. 2011).

But papers casting doubt on the existence of one or more of the planets began to emerge almost immediately after the report of planets f and g. Anglada-Escudé & Dawson (2010) suggested that planet g was an alias of an eccentricity harmonic of planet d, while Gregory (2011) argued that a Bayesian multiplanet Kepler periodogram could only reliably detect planets b and c. Forveille et al. (2011) questioned the statistical significance of planets f and g after obtaining 121 new HARPS observations. Baluev (2013) demonstrated that planets d, f, and g could be artifacts of red noise with a correlation timescale of 10 days. Finally, Robertson et al. (2014) demonstrated that "planet" d was actually a stellar signal that, when incorrectly modeled in the time domain, created the artifact interpreted as "planet" g. Today the Extrasolar Planets Encyclopedia 13 states that GJ 581 hosts only three planets: b, c, and e.

Since Robertson et al. (2014) identified signals d and g as rotation artifacts, we will start by measuring the star's rotation period. Figure 11 shows Welch's power spectrum estimates of the time series used in the Robertson et al. (2014) analysis: ${\hat{S}}_{{xx}}^{w}(f)$ given ${x}_{t}={I}_{{{\rm{H}}}_{\alpha }}$ (top) and ${\hat{S}}_{{yy}}^{w}(f)$ given yt = RV (bottom). Both power spectrum estimates were computed using three 50% overlapping segments with the minimum four-term Blackman–Harris taper applied to each segment. As with the α Cen B data set, the Welch's estimates have a much cleaner spectral window than the generalized Lomb–Scargle periodograms (Figure 12). Robertson et al. (2014) found a primary peak in the ${I}_{{{\rm{H}}}_{\alpha }}$ Lomb–Scargle periodogram at P1 = 125 days plus a secondary peak at P2 = 138 days and attributed the split peak to phase changes in the rotation signal. However, the signals at f1 = 1/P1 and f2 = 1/P2 are not quite separated by $2{ \mathcal R }$ in the ${I}_{{{\rm{H}}}_{\alpha }}$ generalized Lomb–Scargle periodogram, so cannot be said to be truly distinct. The Welch's power spectrum estimate shows a strong single peak that exceeds the 0.1% FAL at P = 132 days (f = 0.00758 days−1), which we take to be the true rotation period. There are also significant peaks at the first two rotation harmonics, 2frot and 3frot. Although spectral window of the Welch's estimator is wide, the rotation signal and its harmonics are each separated by more than one resolution unit. The Welch's RV power spectrum has a statistically significant peak at the orbital frequency of planet b (f = 0.186 days−1) plus a local maximum at the frequency of planet c(f = 0.0774 days−1), but no obvious signals at the rotation frequency or its harmonics. The conservative Siegel's test with λ = 0.6 finds a ≥95% chance that both time series are periodic, suggesting that the bootstrap FALs are realistic.

Figure 11.

Figure 11. Top: Welch's power spectrum estimate of the GJ 581 ${I}_{{{\rm{H}}}_{\alpha }}$ time series reported by Robertson et al. (2014), with Lomb–Scargle periodogram plotted for comparison. Bottom: Welch's power spectrum estimate and Lomb–Scargle periodogram of the HARPS GJ 581 RVs reported by Forveille et al. (2011). Dotted horizontal lines show 0.1%, 1%, and 5% bootstrap FALs for the Welch's power spectra (Lomb–Scargle FALs are not shown), while vertical black dashed–dotted lines show the rotation frequency and its first two harmonics. The resolution of the Welch's estimator is indicated by the gray shading.

Standard image High-resolution image
Figure 12.

Figure 12. Spectral windows of the generalized Lomb–Scargle periodogram (light blue) and the Welch's power spectrum estimate (dark blue) calculated from the Forveille et al. (2011) observations of GJ 581. The Welch's power spectrum estimate was created using three 50%-overlapping segments with minimum four-term Blackman–Harris tapers.

Standard image High-resolution image

Robertson et al. (2014) used several lines of reasoning to argue that "planets" d and g were really stellar signals: they created a separate fit to RV as a function of ${I}_{{{\rm{H}}}_{\alpha }}$ for each observing season, identified correlations between ${I}_{{{\rm{H}}}_{\alpha }}$ and bisector inverse slope, and calculated a new RV model after subtracting off the best-fit seasonal straight-line models RV(${I}_{{{\rm{H}}}_{\alpha }}$). But a single calculation of ${\hat{C}}_{{xy}}^{{\prime} ;2}(f)$ is enough to reveal the stellar origins of the two signals. Figure 13 shows transformed magnitude-squared coherence z(f) given ${x}_{t}={I}_{{{\rm{H}}}_{\alpha }}$, yt = RV. Gray bands of width $2{ \mathcal B }$ are centered at the orbital frequencies of planets b and c. The yellow bands, also of width $2{ \mathcal B }$, are centered at the frequencies of "planets" d and g reported by Vogt et al. (2010). Vertical black dotted lines show rotation harmonics f = nfrot (where n is an integer) near which z(f) exceeds the 5% FAL. "Planet" d sits right atop the first rotation harmonic (f = 2frot) at the center of a resolution unit that includes two statistically significant coherence peaks, one of which exceeds the 1% FAL. "Planet" g is within half a resolution unit of the third rotation harmonic (f = 4frot), which is also the location of a coherence signal that almost reaches the 0.1% FAL. Significant ${I}_{{{\rm{H}}}_{\alpha }}$-RV coherence can also be seen at other rotation harmonics: at both f = 5frot and f = 7frot, z(f) exceeds the 0.1% FAL. In contrast, the band centered on planet c is clean. The band centered on planet b includes a signal that rises above the 5% FAL. That signal is likely to be a false positive—indeed, we expect to see more than one false positive above the 5% FAL given that our coherence estimate has a frequency range far greater than 20 resolution units—but further study of the GJ 581 system that incorporates other activity indicators might be useful.

Figure 13.

Figure 13. Magnitude-squared coherence estimate from HARPS spectra, with RV measured by Forveille et al. (2011) and Hα index measured by Robertson et al. (2014). Solid horizontal lines show false-alarm levels from Equation (27), while dotted lines in the same color scheme show bootstrap false-alarm levels. The stellar signals originally identified as planets d and g are shown by yellow shaded regions of width $2{ \mathcal B }$. Gray shaded regions of width $2{ \mathcal B }$ surround the orbital frequencies of planets b and c. Vertical dotted black lines show rotation harmonics at which z(f) exceeds the 1% FAL.

Standard image High-resolution image

The strong signals at the rotation harmonics in Figures 11 and 13 suggest that GJ 581 has a complex rotation signal. Spectral power and coherence at high-order harmonics may result from the star hosting multiple large spots or spot groups (e.g., Rodono et al. 1986; Günther et al. 2020; Perger et al. 2021; Perugini et al. 2021). In the next section, we will investigate coherent stellar signals that do not appear to be associated with the dominant rotation period, but which may be artifacts of differential rotation, giant cells, or supergranulation.

4.2.  α Cen B

The α Cen B data set assembled by Dumusque et al. (2012) has proven to be useful for identifying spectroscopic signatures of stellar activity. While the star is generally quiet (Cincunegui et al. 2007), it developed one or more large spot groups that caused obvious rotational modulation in the $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ time series from 2010 March 23 to 2010 June 12 (Thompson et al. 2017). This modulation is visible in Segment 2 of Figure 4. Wise et al. (2018) used the 2010 March–June data to find activity-sensitive absorption lines with modulation in either half-depth range or core flux, while Thompson et al. (2017) used the same data to identify pseudoemission features with rotationally driven RV changes. Here we will search for stellar signals using Welch's power spectra of the activity indicators and magnitude-squared coherence between activity indicators and RV.

Figure 14 shows Welch's power spectra of full width at half maximum of the cross correlation between the spectrum and a digital mask (FWHM; Pepe et al. 2000, top left), bisector velocity span (BIS; Toner & Gray 1988, top right), $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ (bottom left), and RV (bottom right) calculated using the segmenting scheme in Figure 4. Before applying Welch's algorithm to the RV data, we removed the binary motion by subtracting the best-fit quadratic model (Endl et al. 2016). To suppress spectral leakage from the long-period activity cycle, each ${x}_{j}^{(k)}$ of FWHM, BIS, and $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ and each ${y}_{j}^{(k)}$ of RV had a linear trend removed before ${\hat{S}}_{{xx}}^{(k)}(f)$ was calculated. The combination of segmenting and detrending mimics the low-pass filtering applied by Dumusque et al. (2012) and the Gaussian process model constructed by Suárez Mascareño et al. (2017). Figure 14 also shows Lomb–Scargle periodograms in light blue. The Welch and Lomb–Scargle periodograms differ at low f partly because the Welch's estimator contains no information about signals with periods longer than the longest segment duration, and partly because of the detrending. 14

Figure 14.

Figure 14. Welch's power spectra of the Dumusque et al. (2012) α Cen B data set: FWHM (top left), BIS (top right), $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ (bottom left), and RV (bottom right). Lomb–Scargle periodograms are shown in light blue for comparison. Dotted horizontal lines show 0.1%, 1%, and 5% bootstrap false-alarm thresholds for the Welch's power spectrum (Lomb–Scargle false-alarm thresholds are not shown). The black dashed–dotted line denotes the star rotation period, while the gray shaded region shows the resolution limit of the Welch's power spectrum estimate. Vertical dashed–dotted yellow lines show shared oscillations identified via magnitude-squared coherence.

Standard image High-resolution image

Although the Welch's power spectrum of BIS is the only one that has a signal that exceeds the bootstrap white-noise 5% false-alarm threshold, Siegel's test finds that Welch's power spectrum estimates of all of the activity-indicator time series (FWHM, BIS, $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$) show periodicity at the 95% significance level (Section 3.7). We therefore average the frequencies of maximum power in the BIS, FWHM, and $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ periodograms to find the rotation period that best describes the Dumusque et al. (2012) data set: Prot = 37.7 days (frot = 0.0265 day−1), which is consistent with rotation period estimates from the literature (DeWarf et al. 2010; Brandenburg et al. 2017; Suárez Mascareño et al. 2017). 15 We will use our measured rotation period as a benchmark when searching for stellar signals. Vertical black dotted lines in Figure 14 show shared oscillations identified in the magnitude-squared coherences (see below).

Before we examine any magnitude-squared coherence estimates, we pause to consider Figure 15, which shows scatter plots yt versus xt for all $\displaystyle \left(\genfrac{}{}{0em}{}{4}{2}\right)=6$ pairs of observables. RV is not well described as a straight-line function of any activity indicator. We can optimistically hope that's because activity signals are not manifesting in RV, but it is possible that (a) the relationships between RV and activity indicators are nonlinear, phase-lagged (Figure 1), and/or noisy, or (b) $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, FWHM, and BIS do not provide a complete description of stellar activity. On the other hand, the activity indicators show obvious straight-line relationships with each other—particularly $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ and FWHM—which suggests that they all trace the same underlying physical processes. In a magnitude-squared coherence analysis with two activity indicators as xt and yt , we should expect to see statistically significant values in ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ and z(f).

Figure 15.

Figure 15. Scatter plots yt versus xt for all four α Cen B observables. While there is no obvious straight-line relationship between RV and any activity indicator, the indicators are tightly correlated among themselves, suggesting that they trace the same physical processes.

Standard image High-resolution image

In Figure 16, we see our prediction of high coherences between the activity indicators borne out. The plots show z(f), the $\mathrm{atanh}$-transformed magnitude-squared coherence, given ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, yt = FWHM (top); ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, yt = BIS (middle); and xt = FWHM, yt = BIS (bottom). Solid horizontal lines show false-alarm thresholds calculated with Equation (27), while dotted horizontal lines show bootstrap false-alarm thresholds (Section Appendix). The rotation frequency is marked with a black dashed–dotted line, with the resolution limit indicated in gray. As expected, all three plots show statistically significant z(f) across the entire rotation band. Other frequencies besides rotation at which z(f) exceeds the 1% false-alarm threshold in at least two of three panels in Figure 16 are marked with vertical yellow dashed–dotted lines; if a signal at frequency f is significant in two of three coherences between activity indicators, it means the oscillation is present in all three indicators. Coherent stellar signals are located at f = (0.0460, 0.0985, 0.138, 0.179, 0.275, 0.330, 0.350) days−1, or P = (21.7, 10.2, 7.27, 5.60, 3.64, 3.03, 2.86) days.

Figure 16.

Figure 16. Hyperbolic arctangent-ransformed activity indicator coherences from the Dumusque et al. (2012) α Cen B data. Top: z(f) given ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} },{y}_{t}$ = FWHM; middle: z(f) given ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} },{y}_{t}$ = BIS; bottom: z(f) given xt = FWHM, yt = BIS. The black dashed–dotted line shows the shared rotation signal, while the gray shaded area represents the resolution limit ${ \mathcal B }$. Yellow dashed–dotted lines show signals that exceed the 1% false-alarm threshold in at least two of the three panels.

Standard image High-resolution image

We know from Figure 9 that the shared signals are not window function artifacts, but their physical origin is unclear: most of them do not fall at simple rotation harmonics (f = nfrot, where n is an integer), though f = 0.138 days−1 is within half a resolution unit of the n = 4 harmonic. Differential rotation sometimes creates secondary peaks in periodograms of photometry and activity indicators (Reinhold et al. 2013); our best guess is that we are seeing a secondary peak, perhaps along with some beating between differential rotation signals. Supergranulation and giant cells may be in play on the shorter timescales (Nordlund et al. 2009). If we reexamine the Welch's power spectrum estimates of $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, FWHM, and BIS in Figure 14, we see that the shared signals (again marked by yellow dashed–dotted lines) are all at or near local maxima. Even though none of those local maxima appear statistically significant when compared with our bootstrap white-noise false-alarm thresholds, the underlying oscillations may be real. The downward slopes of ${\hat{S}}_{{xx}}(f)$ for xt = FWHM, BIS, and $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ suggest that false-alarm thresholds in the α Cen B power spectra should be computed from a red noise model (e.g., Ólafsdóttir et al. 2016). We are developing this functionality and will include it in a future release of NWelch. Red noise has been incorporated in models of RV data by (e.g.,) Baluev (2013); Tuomi et al. (2013), and Feng et al. (2016).

Now we turn to the magnitude-squared coherences between activity indicators and RV. Figure 17 shows z(f) given yt = RV and xt = FWHM (top), yt = RV and xt = BIS (middle), and yt = RV and ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ (bottom). The color scheme is the same as in Figure 16. Magnitude-squared coherences in the rotation band are near zero, indicating that no activity indicator is a good tracer of the way rotation manifests throughout the entire duration of the RV data set. In each panel of Figure 17, signals that exceed the 1% FAL in the z(f) estimate shown in that panel only are marked with yellow dashed–dotted lines. The activity indicators' shared signal at f = 0.275 days−1 (Figure 16) also shows up in coherences between FWHM & RV and $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ & RV. FWHM-RV coherence has a second peak at f = 0.167 days−1 (P = 5.99 days, top panel). BIS-RV coherence has peaks at f = 0.115 days−1 and f = 0.288 days−1 (P = 8.70 days and P = 3.47 days, middle panel). The four coherent RV-activity indicator oscillations are marked in the Welch's RV power spectrum in Figure 14 (lower-right panel). The FWHM-RV oscillation at f = 0.167 days−1 coincides with a local maximum of the Welch's RV power spectrum. The $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$ power spectrum also has a local maximum at f = 0.167 days−1, though not the FWHM power spectrum. The coherent RV-BIS oscillation at f = 0.115 days−1 is within one resolution unit of peaks in the FWHM and RV power spectra. All four power spectra have a local maximum at f = 0.275 days−1.

Figure 17.

Figure 17. Transformed magnitude-squared coherence for xt = FWHM, yt = RV (top); xt = BIS, yt = RV (middle); and ${x}_{t}=\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, yt = RV (bottom). The color scheme follows Figure 16. Yellow dashed–dotted lines mark shared oscillations that rise above the 1% FAL.

Standard image High-resolution image

Our analysis of the Dumusque et al. (2012) α Cen B data set shows that periodic short-timescale stellar activity occurs at other frequencies besides frot and its harmonics. We have identified seven oscillatory signals that are present in all three activity indicators, plus four coherent activity indicator-RV oscillations. Figure 15 demonstrates that the stellar RV signals could not have been diagnosed by fitting a straight-line model to RV as a function of an activity indicator. Accurately identifying the activity signals is an important step toward suppressing false positives in RV planet searches.

4.3. GJ 3998

Since 2012, the HArps-N red Dwarf Exoplanet Survey (HADES) program has been surveying 78 early-type M dwarfs for Doppler shifts induced by rocky planets (Perger et al. 2017). The first HADES discovery paper featured two planets orbiting GJ 3998 with periods Pb = 2.65 days and Pc = 13.7 days (Affer et al. 2016). Aware of the tendency of M dwarfs to be more active than solar-type couterparts of the same age, the HADES team examined time series of ${I}_{{{\rm{H}}}_{\alpha }}$ and Mt. Wilson S-index measured from the same spectra as the RVs, as well as near-simultaneous EXORAP and APACHE photometry. Affer et al. (2016) identified two signals with periods P1 = 30.7 days and P2 = 42.5 days that were present in RV, ${I}_{{{\rm{H}}}_{\alpha }}$, and S-index and posited that P1 was the true rotation period while P2 represented modulation due to differential rotation. The photometry was consistent with rotation period P1. Following Robertson et al. (2014), the HADES team verified that the planetary signals were present in generalized Lomb–Scargle periodograms of RV data from all observing seasons and demonstrated that the Spearman's rank correlation coefficients of RV as a function of S-index and ${I}_{{{\rm{H}}}_{\alpha }}$ were insignificant. They then concluded that the signals at Pb and Pc were planetary in origin.

As with GJ 581 and α Cen B, we begin our analysis by measuring the star rotation period from Welch's power spectrum estimates with the segmentation and tapering scheme shown in Figure 7. Affer et al. (2016) explain how the traditional HARPS and HARPS-N way of measuring RVs, by cross correlating with a mask consisting of a thin rectangle surrounding each spectral line center, is suboptimal for M dwarfs because their spectra feature substantial line blending. Accordingly, in Figure 18 we examine the Welch's and generalized Lomb–Scargle periodograms from the RV, S-index, and ${I}_{{{\rm{H}}}_{\alpha }}$ time series measured with the TERRA pipeline (Anglada-Escudé & Butler 2012). The S-index and ${I}_{{{\rm{H}}}_{\alpha }}$ time series contain statistically significant peaks at 0.0321 days−1 and 0.0311 days−1, respectively, which we average together to find frot = 0.0316 days−1/Prot = 31.7 days. Our rotation period is similar to P1 identified by Affer et al. (2016). Although the stellar signal with frequency f2 = 1/P2 would be more than one resolution unit away from frot, the Welch's power spectra do not show any evidence for such a signal.

Figure 18.

Figure 18. Welch's (dark blue) and generalized Lomb–Scargle (light blue) periodograms of the GJ 3998 spectroscopic data of Affer et al. (2016). The color scheme is the same as in Figure 11.

Standard image High-resolution image

We now turn to magnitude-squared coherence. Although Affer et al. (2016) emphasize that the TERRA pipeline is preferred over the CCF-based HARPS DRS pipeline for M dwarfs, they nevertheless present RVs measured with both methods. They also present a second set of activity indicator time series, with S-index measured according to Henry et al. (1996, H96) and ${I}_{{{\rm{H}}}_{\alpha }}$ measured as in Robertson et al. (2013, R13). Accordingly, we analyze coherences with xt = TERRA S-index, H96 S-index, TERRA ${I}_{{{\rm{H}}}_{\alpha }}$, and R13 ${I}_{{{\rm{H}}}_{\alpha }}$, and with yt = TERRA RV and CCF RV—eight measurements of z(f) in all. Figure 19 shows the four transformed coherence measurements with an S-index measurement as xt . Planets b and c are marked by vertical dashed–dotted black lines surrounded by a shaded region indicating the resolution unit. In three of four S-RV coherences, there is a peak within half a resolution unit of planet c that exceeds the 1% FAL. The fourth, with xt = H96 S-index and yt = CCF RV, has a signal at fc that exceeds the 5% FAL. As with GJ 581, we are also seeing high-order rotation harmonics: all z(f) estimates have high coherence at the fifth harmonic (f = 6frot), while z(f) estimates with yt = CCF RV have high coherence at the third harmonic (f = 4frot).

Figure 19.

Figure 19. Transformed magnitude-squared coherences between S-index and RV from the HADES GJ 3998 observations of Affer et al. (2016). Top left: xt = TERRA S-index, yt = TERRA RV. Top right: xt = TERRA S-index, yt = CCF RV. Bottom left: xt = H96 S-index, yt = TERRA RV. Bottom right: xt = H96 S-index, yt = CCF RV. All panels show a signal at the frequency of planet c. In addition, all z(f) estimates show high coherence at the fifth rotation harmonic (f = 6frot), while z(f) estimates with yt = CCF RV have high coherence at the third harmonic (f = 4frot). Rotation harmonics are marked by dotted black lines.

Standard image High-resolution image

If we examine the coherences with an ${I}_{{{\rm{H}}}_{\alpha }}$ measurement as xt plotted in Figure 20, we find evidence for a stellar signal at the frequency of planet b. When yt = CCF RV, the ${I}_{{{\rm{H}}}_{\alpha }}$-RV coherence at fb exceeds the 0.1% FAL. With the TERRA RVs, we find ${I}_{{{\rm{H}}}_{\alpha }}$-RV coherence over the 5% FAL at fb and nearing the 1% FAL for xt = TERRA ${I}_{{{\rm{H}}}_{\alpha }}$. However, the band surrounding planet c is clean. ${I}_{{{\rm{H}}}_{\alpha }}$ does not have any coherent oscillations with RV directly at the rotation harmonics, though there is a coherence peak almost within a half resolution unit of 4frot for yt = TERRA RV. The reason Affer et al. (2016) could not see a straight-line relationship between either ${I}_{{{\rm{H}}}_{\alpha }}$ and RV or S-index and RV is because there is more than one oscillation shared between RV and the activity indicators—in addition to the planet candidates, rotation harmonics are present, and so are higher-frequency signals that are not obviously associated with rotation, as in the α Cen B data.

Figure 20.

Figure 20. Transformed magnitude-squared coherences between ${I}_{{{\rm{H}}}_{\alpha }}$ and RV from the HADES GJ 3998 observations of Affer et al. (2016). Top left: xt = TERRA ${I}_{{{\rm{H}}}_{\alpha }}$, yt = TERRA RV. Top right: xt = TERRA ${I}_{{{\rm{H}}}_{\alpha }}$, yt = CCF RV. Bottom left: xt = R13 ${I}_{{{\rm{H}}}_{\alpha }}$, yt = TERRA RV. Bottom right: xt = R13 ${I}_{{{\rm{H}}}_{\alpha }}$, yt = CCF RV. All panels show a stellar signal at the frequency of planet b, with FAP ≪ 1% for yt = CCF RV.

Standard image High-resolution image

While we are not yet ready to state definitively that the GJ 3998 RVs show only stellar signals, the system requires further follow up in light of our results. If coherence between RV and activity indicators at fb and fc persists after additional data are taken, that would suggest that the planets are not real. It's also crucial to measure multiple activity indicators, since they may not all trace the same underlying physical phenomena. Indeed, Robertson et al. (2013) highlight the fact that Ca H&K and Hα emission come from different chromospheric depths and point out that signal-to-noise ratio is often problematic in the Ca H&K lines in M dwarf spectra.

4.4. Interpreting Magnitude-squared Coherence Measurements

What should observers look for when applying bivariate frequency-domain techniques to their own planet-search data sets? In our analyses of GJ 581, α Cen B, and GJ 3998, we roughly followed the following procedure:

  • 1.  
    Examine the spectral window of the Welch's estimator. Look for sidelobes that could yield false positives in the power spectra. Calculate the Rayleigh resolution limit, the analytical resolution unit ${ \mathcal B }$ (e.g., Harris 1978), and the empirical value of ${ \mathcal B }$ for a data set's specific observing cadence. NWelch automatically reports these numbers for each segmenting/tapering scheme. Be aware that two signals with a frequency separation of less than $2{ \mathcal B }$ are not statistically distinguishable. It's especially important to consider resolution when searching for planets near the star rotation period or differential rotation.
  • 2.  
    Use Welch's power spectra of the activity indicators to estimate the rotation period. While many planet-search targets will have previous rotation period measurements in the literature, the quasiperiodic nature of the rotation signal means one might recover a somewhat different rotation period depending on the observational epoch and activity cycle phase (Robertson et al. 2014). It's important to determine how rotation is manifesting in the particular data set under analysis. For the α Cen B data set, in which three activity indicators yielded three slightly different rotation period estimates, we averaged together the various estimates (Section 4.2). It's possible that the community will come up with a more sophisticated way to handle differing rotation period measurements.
  • 3.  
    Search for coherence between RV and activity indicators within one resolution unit of the rotation period, the periods of planet candidates, and the harmonics of all of the above. For any planet that is not large enough or close enough to the star to trigger stellar activity, ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ and z(f) should be impervious to eccentricity harmonics. Observers who find coherent RV-activity indicator signals at either their planet candidate frequency or its harmonics should use extreme caution before reporting a planet discovery. It's unclear how high up the overtone sequence we should expect to see rotation harmonics—for example, if there's a peak in z(f) near the eighth harmonic (9frot), is it truly rotation-related, or is its proximity to a harmonic just a coincidence? We hope stellar physicists will explore the complexity of rotation signals and the extent to which their harmonics should be traceable by RV data sets. We also hope stellar physicists will weigh in on possible sources of coherent RV-activity signals that are not related to rotation.
  • 4.  
    When coherent RV-activity indicator signals are found, check to see whether they line up with local maxima in the power spectra. Sometimes a power spectrum peak that's not significant when judged against a white noise model indicates a real signal. The signal processing literature features more sophisticated ways of checking the significance of power spectrum peaks, such as F-testing (e.g., Thomson 1994), false-alarm thresholds determined from red noise models (e.g., Ólafsdóttir et al. 2016), and prewhitening followed by checking against the exponential quantiles. But these techniques are mostly unexplored for RV data. When a peak in z(f) corresponds to a local maximum in the RV and/or activity-indicator power spectrum, it lends support to the hypothesis that the coherence is real and not spurious. However, given that the geophysics literature features coherent signals that do not correspond to power spectrum local maxima (see example in Pardo-Igúzquiza & Rodríguez-Tovar 2012), one should not be too quick to dismiss any signals in z(f) as false positives.

5. Conclusions and Plans for Future Work

Magnitude-squared coherence is a powerful tool for diagnosing stellar signals in bivariate RV-activity indicator time series. By combining Welch's power spectrum estimates with magnitude-squared coherence measurements, we were able to identify the signals that were originally labeled GJ 581 d, g as rotation harmonics. In the α Cen B data set, we mapped bivariate oscillations onto non-rotation-related local maxima in the Welch's power spectrum estimates of FWHM, BIS, $\mathrm{log}{R}_{\mathrm{HK}}^{{\prime} }$, and RV. Finally, in the GJ 3998 data, we found high coherence between ${I}_{{{\rm{H}}}_{\alpha }}$, S-index, and RV at the frequencies of planet candidates b and c. Since it is now standard practice for planet hunters to analyze and publish activity indicator time series along with RVs (e.g., Dalba et al. 2021; González-Álvarez et al. 2021; Maldonado et al. 2021), and since most stellar signals show up in some, but not all, activity indicators (Section 4.3), we recommend that every planet discovery be vetted by analyzing the magnitude-squared coherence between RV and as many activity indicators as can be measured. Doing so is computationally cheap, and our NWelch software package is publicly available from a repository that contains examples of all functionality (Section Appendix). More generally, since the nonuniform Fourier transform can map to any set of frequencies, the frequency-domain approach described here is the only way to properly describe correlations between lagged versions of two time series with unequal observing cadence—there is no time-domain approach to the problem that does not involve interpolation.

The Welch's estimator that underlies our coherence measurements gives cleaner spectral windows and lower variance than generalized Lomb–Scargle periodograms (Section 4.1), even when tapers are not applied to the segments (Section 4.2). Magnitude-squared coherence is valuable primarily for its ability to identify stellar signals, but sometimes it is the spectral window and not the star that is responsible for false positives (e.g., Rajpaul et al. 2016). Our results suggest that Welch's method may be a valuable addition to the set of frequency-domain methods already in use in planet searches, such as the generalized Lomb–Scargle periodogram (GLS, Zechmeister & Kürster 2009), the Bayesian GLS (Mortier et al. 2015), and the maximum-likelihood periodogram (Stoica et al. 1989). Again, it is computationally inexpensive to compute a Welch's power spectrum estimate with NWelch, and we recommend that all planet hunters closely examine the Welch's power spectra that are generated along with magnitude-squared coherence estimates.

In Section 3.3 we demonstrate how tapering can increase the dynamic range of the power spectrum estimator. The ability to see low-amplitude and high-amplitude signals in the same power spectrum estimate, without iteratively fitting and subtracting out signals one by one, would be invaluable in searches for Earth-like planets. However, in both statistics and astronomy, almost all research on tapering has been confined to data sets with even or near-even observing cadence (e.g., Chave 2019; Springford et al. 2020). Quantifying the dynamic range attainable by the Welch's power spectrum estimator for different uneven observing cadences, taper types (e.g., Blackman–Harris, Kaiser–Bessel, Hann), and segmenting schemes would be a useful direction for future research.

Our findings show that star rotation signals can be complex, with significant RV-activity indicator coherence appearing at high-order rotation harmonics in our two example M dwarfs (Sections 4.1, 4.3). Furthermore, the α Cen B data set has multivariate oscillations with unclear physical origins. At the short end of the period range (P ∼ 3 days), the multivariate oscillations with may be associated with supergranulation or giant cells. The longer-period oscillations (P = 7.27, 10.2, 21.7 days) could be related to differential rotation and harmonics thereof—though the period separation between our measured rotation period of 37.7 days and the 21.7 days signal is above the ∼30% threshold selected by Reinhold et al. (2013) for detecting differential rotation in Kepler targets. We expect investigations of magnitude-squared coherence between RV and activity indicators to yield new insights into stellar physics.

The false-positive rate of the magnitude-squared coherence estimator applied to RV data should be explored further. We have not yet encountered an RV data set without statistically significant coherence between RV and an activity indicator at some frequency. This is to be expected: the precision achieved by the RV community in the past decade makes it inevitable that stellar signals will manifest in RV data instead of being subsumed by instrument noise, and even if that were not true, coherences have spurious signals just like any other statistic. But we do not yet know the extent to which broad-spectrum processes such as turbulence boost the RV-activity indicator coherence and complicate the interpretation of analytical false-alarm thresholds in ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$ and z(f). Furthermore, RV data sets have an observing cadence that is far more uneven than the paleoclimatology data sets on which the nonuniform Welch's method is typically deployed, so astronomers might be dealing with different false-positive rates than geophysicists. We expect the RV community to achieve better understanding of false-positive rates as the magnitude-squared coherence estimator is applied to more data sets.

We are grateful to Lily Zhao, Debra Fischer, and the EXPRES team for allowing us to test our statistical methods on new EXPRES data. We thank Joan Caicedo Vivas and Henry Sanford-Crane for asking important questions about our methodology and Catherine Lembo and Rebecca Hutchinson for input on early phases of this work. Effort by SDR, VRD, and JH was funded by Bartol Research Institute. VRD received additional funding from the UNIDEL foundation. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357.

Facilities: Keck (HIRES), ESO La Silla 3.6 m (HARPS), Telescopio Nazionale Galileo (HARPS-N).

Software: NWelch (Dodson-Robinson 2022), FINUFFT (Barnett et al. 2019; Barnett 2021), astropy (Astropy Collaboration et al. 2013), redfit-x (Ólafsdóttir et al. 2016).

Appendix A: NWelch Software Package

All Welch's power spectrum and coherence estimates in this work were computed with the new NWelch software package (Dodson-Robinson 2022), written for python 3, which performs Fourier analysis of bivariate time series. NWelch borrows heavily from redfit-x, a package for cross-spectral analysis of paleoclimate time series written in fortran (Ólafsdóttir et al. 2016). NWelch is stored in a repository at https://github.com/sdrastro/NWelch; see the README for installation instructions and a complete listing of the software functionality. The software consists of two classes: a base class called TimeSeries, which implements the univariate calculations, and a derived class called Bivariate, which calculates ${\hat{C}}_{{xy}}^{{\prime} 2}(f)$, z(f), and $\hat{\phi }(f)$. Jupyter notebooks containing the calculations shown in Section 4 are located in the repository directories GJ581, aCenB, and GJ3998. The demo directory contains an example Jupyter notebook that demonstrates the full functionality of the TimeSeries class using the Barnard's star activity indicator data set of Toledo-Padrón et al. (2019).

A.1. Power Spectrum Plots: Logarithmic or Linear y-axis?

NWelchdefaults to a logarithmic y-axis for all power spectrum plots. The user can select a linear y-axis (see the demo notebooks in the githubrepository for instructions), but there are good reasons to use logarithmic scaling (e.g., Thomson 1994):

  • 1.  
    Power spectrum peaks that are not significant when judged against a white noise model can correspond to coherence peaks that are significant, so it is good to be able to see all the local maxima in the power spectrum (see the α Cen B analysis in Section 4.2);
  • 2.  
    Human vision and hearing respond to the intensity of stimuli on a logarithmic scale;
  • 3.  
    A downward slope in ${\mathrm{log}}_{10}[{\hat{S}}_{{xx}}(f)]$, which is easy to spot on a semilog-y plot, is an indicator of red noise;
  • 4.  
    Multiplicative effects appear additive on a logarithmic scale, so convolution in the time domain—which is equivalent to multiplication in the frequency domain—yields simple addition in ${\mathrm{log}}_{10}[{\hat{S}}_{{xx}}(f)]$.

Finally, the convention in the signal processing literature is to use semilog-y plots for power spectra because engineering processes often use linear filters such as moving averages or Gaussian smoothing. A moving average applied to time-domain data is obvious on a semilog-y plot of the power spectrum because the plot will show the filter sidelobes. We recommend that all astronomers working in the frequency domain examine semilog-y plots of their power spectra.

A.2. Nonuniform Fourier Transforms

The workhorse calculation that underlies all NWelch functionality is the nonuniform fast Fourier transform (NFFT). Here we provide a brief overview of the mathematics behind the NFFT. Lomb (1976) and Scargle (1982) were the first to develop periodogram (Fourier spectrum) analysis techniques that were quite powerful for finding and testing the significance of weak periodic signals in otherwise random, unevenly sampled data. Given a set of data values xj = 1, ..., N at respective observation times, the Lomb–Scargle periodogram is constructed as follows. First, compute the data's mean and variance by

Equation (A1)

Second, for each angular frequency ω = 2π f > 0 of interest, compute a time-offset τ by

Equation (A2)

Third, the Lomb–Scargle normalized periodogram (spectral power as a function of angular frequency ω = 2π f) is defined by

Equation (A3)

The constant τ makes pn (ω) completely independent of shifting all the tj by any constant. Lomb (1976) showed that this particular choice of offset has another, deeper, effect: it makes Equation (A3) identical to the equation that one would obtain if one estimated the harmonic content of a data set, at a given frequency ω by linear least-squares fitting to the model

Lomb–Scargle periodograms can be approximated by employing an "extirpolation" process (interpolation of the unevenly spaced points onto a regular grid) using the Lagrange interpolation method, and then using the ordinary FFT (Press & Rybicki 1989). However, as described in Leroy (2012; Springford et al. 2020, Section 7), one can employ the adjoint nonuniform FFT (Keiner et al. 2009) for a fast implementation which is not approximate. Briefly, one computes

Equation (A4)

Equation (A5)

then

Equation (A6)

Equation (A7)

Equation (A8)

Equation (A9)

NWelch uses FINUFFT, the Flatiron Institute nonuniform fast Fourier transform, 16 to compute the adjoint NFFT. Methods for performing the exponential sums are given by Barnett et al. (2019) and Barnett (2020). For all power spectrum estimates (Figures 11, 14 and 18), NWelch uses the power spectral density normalization:

Equation (A10)

where m is the integer index of the frequency grid and Δf is the frequency grid spacing.

A.3. Comparison with astropy.timeseries.LombScargle

The most basic NWelch task is to compute a periodogram of an unevenly spaced time series without tapering, detrending, or segmentation. Here we show that this task produces results that are consistent with astropy.timeseries.LombScargle. The top panel of Figure 21 shows two periodograms of the residuals of the Dumusque et al. (2012) α Cen B RV data after subtracting a quadratic model of binary motion. The periodogram plotted with a dark blue, solid line comes from NWelch, while the periodogram plotted with the light blue, dashed line comes from astropy.timeseries.LombScargle. The periodograms are almost identical, as are their bootstrap 5% and 1% false-alarm levels (purple and green, respectively, with dotted lines for NWelch FALs and solid lines for astropy.timeseries.LombScargle FALs). For simple Lomb–Scargle periodograms with bootstrap false-alarm thresholds, NWelch and astropy.timeseries.LombScargle can be used interchangeably. However, NWelch does not include the Baluev (2008) method for calculating false-alarm probabilities based on extreme value theory.

Figure 21.

Figure 21.  NWelch single-segment power spectrum estimate (dark blue, solid line) and astropy Lomb–Scargle periodogram (light blue, dashed line) of the α Cen B RVs measured by Dumusque et al. (2012). 5% and 1% false-alarm thresholds are shown in purple and green, respectively, with the solid horizontal lines showing astropy false-alarm thresholds and the dotted horizontal lines showing NWelch false-alarm thresholds. The two power spectrum estimates and their bootstrap false-alarm thresholds are nearly identical.

Standard image High-resolution image

Footnotes

  • 4  

    Actually the signals that appear in the cross spectrum will be shaped by the spectral window (Section 3.3), but for this "perfect data set" example they will approximate delta functions in the limit as N .

  • 5  

    For our synthetic data set, ${\hat{S}}_{{xy}}(f)$ and ${\hat{C}}_{{xy}}^{2}(f)$ were computed in python 3 with scipy.signal.csd and scipy.signal.coherence, respectively. To estimate ${\hat{C}}_{{xy}}^{2}(f)$, each time series was divided into six overlapping segments and a Blackman–Harris taper was applied to each segment. See Sections 3.1 and 3.3 for more on segmenting and tapering the time series.

  • 6  

    A forthcoming paper will extend multitaper analysis to RV data (S. E. Dodson-Robinson et al. 2022, in preparation). That paper will build on the work of Springford et al. (2020), who applied the multitaper technique to Kepler data with near-constant Δt.

  • 7  

    In Welch's original work on regularly spaced time series, ${\tilde{x}}^{(k)}(f)$ is the fast Fourier transform and N(k) = N(j) for all j, k = 0,..., K − 1.

  • 8  

    From a dynamical perspective, such a planetary system is unlikely to be stable; we use it here merely for illustrative purposes.

  • 9  

    The high dynamic range delivered by the minimum four-term Blackman–Harris taper and other similar spectral windows comes at the cost of some resolution in the frequency domain; see Section 3.6 for more on how tapering affects resolution.

  • 10  

    Not all useful tapers have bell shapes in the time domain—in particular, higher-order multitapers have zero crossings (Slepian 1978; Thomson 1982)—but in this work we use only bell-shaped tapers.

  • 11  

    The longest-duration segment does not necessarily have the highest number of data points N(k); it is simply the segment with the largest value of ${t}_{{N}^{(k)}}-{t}_{{0}^{(k)}}$ (Equation (23)).

  • 12  

    Astronomy students may find 1.21 an easy constant to remember, as it is quite similar to the constant in the telescopic Rayleigh resolution equation for a circular aperture, θ = 1.22λ/D (where θ is the telescope resolution limit, λ is the observation wavelength, and D is the telescope diameter).

  • 13  

    exoplanet.eu

  • 14  

    See Section Appendix for a "like-for-like" comparison between the single-segment, untapered NWelch power spectrum and the astropy.timeseries.LombScargle periodogram.

  • 15  

    Since rotation signals are quasiperiodic, it is common for estimates of the rotation period to vary slightly with activity cycle phase and/or number of spot groups on the star surface (e.g., Gilbert et al. 2021), hence the variety of similar but not identical measurements from the literature.

  • 16  
Please wait… references are loading.
10.3847/1538-3881/ac52ed