FIESTA II. Disentangling stellar and instrumental variability from exoplanetary Doppler shifts in Fourier domain

The radial velocity (RV) detection of exoplanets is challenged by stellar spectroscopic variability that can mimic the presence of planets and by instrumental instability that can further obscure the detection. Both stellar and instrumental changes can distort the spectral line profiles and be misinterpreted as apparent RV shifts. We present an improved FourIEr \textit{phase} SpecTrum Analysis (FIESTA a.k.a. $\mathit{\phi}$ESTA) to disentangle apparent velocity shifts due to a line deformation from a true Doppler shift. $\mathit{\phi}$ESTA projects stellar spectrum's cross correlation function (CCF) onto a truncated set of Fourier basis functions. Using the amplitude and phase information from each Fourier mode, we can trace the line variability at different CCF width scales to robustly identify and mitigate multiple sources of RV contamination. For example, in our study of the 3 years of HARPS-N solar data, $\mathit{\phi}$ESTA reveals the solar rotational effect, the long-term trend due to solar magnetic cycle, instrumental instability and apparent solar rotation rate changes. Applying a multiple linear regression model on $\mathit{\phi}$ESTA metrics, we reduce the weighted rms noise from 1.89 m/s to 0.98 m/s. In addition, we observe a $\sim$3 days lag in the $\mathit{\phi}$ESTA metrics, similar to the findings from previous studies on BIS and FWHM.


INTRODUCTION
In radial velocity exoplanet surveys, the Doppler shifts that result from the gravitational interaction of planets and their host star are measured via high-resolution spectroscopy to detect and characterise the masses and orbits of planets. However, the spectral signature of a Doppler shift can be mimicked by deformation of the host star's spectral line profiles.
As the field pushes towards improved Doppler precision, developing robust and powerful approaches for characterizing line profile variations, whether they be due to intrinsic stellar variability or instrumental effects, is becoming increasingly critical. Early planet-hunting spectrographs went to great lengths to provide a precise and stable wavelength calibration, but did not stabilise the instrument, leading to significant instrumental line spread function variations. Recently, a new generation of spectrographs (e.g., ESPRESSO (Pepe et al. 2014), NEID (Schwab et al. 2019), the EXPRES Spectrometer (Petersburg et al. 2020)) have been designed to be much more intrinsically stable, in an effort to enable extremely precise radial velocity (EPRV) measurements. These instrument specifications aim to reach 10-30 cm/s radial velocity precision, with the long-term goal of characterizing potentially Earth-like planets around Sun-like stars (Fischer et al. 2016). To reach this goal, these instruments incorporate multiple environmental control strategies, in an effort to stabilise the line-spread function and minimise instrument-induced variability. Equally, or perhaps even more importantly, this improved instrumental stability makes it feasible to use line shape variations as a diagnostic to recognise and mitigate the effects of stellar variability.
Astronomers have been exploring a variety of strategies to recognise and mitigate the effects of stellar variability, ranging from looking for correlations with traditional activity indicators such as the equivalent width of Hα (Herbst & Miller 1989), the line profile full width at half-maximum (FWHM, Queloz et al. 2009), the line profile's bisector (Queloz et al. 2001), and data-driven and machine learning methods (e.g. Pearson et al. 2018;de Beurs et al. 2020). Zhao & Tinney (2020) proposed φESTA for characterizing a signal deformations and signal shifts. In the context of radial velocity detection of exoplanets, the signal is usually the cross-correlation function (CCF) of the stellar spectrum with a template spectrum or synthetic mask. By combining information from many spectral lines, the CCF has significantly higher signal-to-noise ratio (SNR) than individual spectral lines. In this study, we validate and test φESTA using CCFs based on a weighted binary mask (Baranne et al. 1996;Pepe et al. 2002).
φESTA decomposes the CCF into the orthogonal Fourier basis functions and calculates the shift for each basis function. A pure Doppler shift results in all the Fourier basis functions being shifted by the same RV 0 . In contrast, a signal varying its shape results in different basis functions being shifted by different amounts. Zhao & Tinney (2020) used the averaged effect of shifts in the lower and higher frequency ranges (denoted as RV FT,L and RV FT,H respectively) as a summary statistic to characterise the CCF behaviour.
In this paper, we provide a more rigorous derivation of the φESTA methodology, and make updates to ease the interpretation of φESTA outputs. For example, we abandon zero-padding (adding zeros at the end of the signal to increase sampling) as was used in the previous implementation of Fourier transform, since this acted as interpolating the power spectrum and the phase spectrum but did not add useful information. We also adopt the discrete Fourier transform (DFT), so that the output sampling size in the velocity frequency domain matches the input in the velocity domain, and thus for N velocity grids in the CCF as input, there will be N outputs in the corresponding "velocity frequencies".
The paper is organised in the following manner. We provide an intuitive way to understand the maths behind the φESTA method and discuss its implementation in Section 2. We discuss the practical consideration when applying φESTA to the parametrisation of CCFs and analysing timeseries in Section 3. In Section 4, we demonstrate the applications of φESTA on the SOAP 2.0 simulated solar spectra, with varying latitudes of solar spots and plages, as well as the simulated solar observation timeseries. Then it is followed by the applications of φESTA on the 3-years HARPS-N high-resolution solar observations in Section 5. A detailed comparison of φESTA models and similar models using more traditional activity indicators (FWHM and BIS) is carried out in Section 6. Section 7 is the summary and discussions where we also compare φESTA with previous works and discuss future research opportunities.
Appendix A defines the coefficient of determination R 2 and the adjusted R 2 . Then in Appendix B we discuss the noise propagation in the Fourier transform and explore under what conditions the distributions of φESTA amplitudes and phases are nearly Gaussian. Lastly, Appendix C demonstrates that zero-padding (implemented in the earlier version of φESTA ) only changes the sampling of the Fourier transform output and so we do not adopt it anymore.
It is known as the inverse discrete Fourier transform (inverse DFT) that decomposes CCF (v n ) into a linear combination of the orthogonal basis functions e i 2π N nk , weighted by the complex coefficient CCF (ξ k ). Each coefficient CCF (ξ k ) is derived from the DFT of CCF (v n ) N nk k = 0, 1, . . . , N − 1. (2) The line shape information stored in CCF (v n ) is lossless in the CCF (ξ k ) if all N terms are retained, as both can be converted via DFT and inverse DFT. We define, for each mode k in the Fourier domain representation, the amplitude the phase and the velocity frequency where L the length of the CCF velocity grid and 1/L is the unit velocity frequency. Note that in this paper, the original domain is labeled in velocity (v) and the Fourier transformed domain is labeled in "velocity frequency" (ξ). Because the inputs CCF (v n ) are real functions and the DFT output is Hermitian-symmetric, only the first N 0 = N/2 + 1 or (N + 1)/2 (depending on whether N is even or odd) modes are needed for CCF parametrisation and the rest modes contain redundant information. Replacing nk N by ξ k v n , we can rewrite the CCF in Eq. 1 in velocity v n as opposed to index n φESTA interprets the CCF shapes, as parametrised by the amplitudes A k and the phases φ k at corresponding velocity frequencies ξ k with k = 0, 1, . . . , N 0 − 1 in the Fourier domain. In practice, we will keep only the leading terms which provide a dimensionally reduced representation of the CCF. The decision of how many terms to retain will depend on the spectral resolution, pixel sampling of the line spread function and SNR. We discuss the choice for terms needed in Section 3. We use the python function numpy.fft.rfft to compute the DFT of the real input CCF (v n ) with the Fast Fourier Transform algorithm (Cooley & Tukey 1965;Press et al. 2007).

φESTA applied to a pure line shift
A pure radial velocity shift RV due to orbiting exoplanets results in a bulk shift of the CCF as shown below.
Comparing Eq. 7 with Eq. 6, the RV shift results in a phase shift ∆φ k = −2πξ k RV , which is consistent with the differential phase spectrum discussed in the continuous domain (Zhao & Tinney 2020). While the phase shift depends on the velocity frequency ξ k , the ratio ∆φ k /ξ k is invariant across velocity frequencies and proportional to the radial velocity shift.
The amplitudes A k are also invariant for a pure line shift.
2.3. φESTA applied to a perturbed line shape Substituting CCF (ξ k ) by A k · e iφ k in Eq. 6, we have the following form The shape of a deformed CCF can be characterised by changes in both the amplitudes A k and the phases φ k , and thus Comparing Eq. 10 with Eq. 9, the line deformations can be interpreted as an apparent (but spurious) radial velocity shift for each velocity frequency k.
In addition to the phase-derived RV FT,k , amplitude changes can also be used to trace a perturbed line shape.

Measuring ∆φ k and ∆RV k
Measuring the phase shift ∆φ k requires specifying a reference CCF. In this manuscript, the weighted mean CCF serves as our phase reference. Throughout the paper (unless otherwise specified), we use the inverse variance weighting to calculate the weighted averages (e.g., mean CCF, daily binned RV, weighted RMS), where the variance is taken as the RV uncertainty squared. Once we calculate ∆φ k between an observed and the reference CCF, Eq. 11 provides a group of relative radial velocity shifts RV FT,k between the two CCFs for each velocity frequency ξ k . Similar to RV FT,L and RV FT,H proposed in Zhao & Tinney (2020) that represent the RV shifts in the lower and higher frequencies, RV FT,k (k = 0, 1, . . . , N 0 − 1) will show the same response to a bulk line shift but different responses to line shape changes. Therefore, is used to parametrise the line profile deformation, where RV apparent is a single apparent RV shift derived from the full CCF. RV apparent may be computed using an independent RV measurement algorithm or as a weighted mean of RV FT,k 's that account for varying measurement precision (e.g. Section 4.2). In this paper, we use RV Gaussian , which is the centre of a Gaussian fit to the CCF extending to 5σ in either direction from centre.

Summary
The effects of a pure shift (Eq. 8) and the effects of line shape changes (Eq. 11) are fundamentally different, even though the expressions are similar. The RV in Eq. 8 is the same for all velocity frequencies ξ k , whereas the RV FT,k in Eq. 11 changes with the Fourier mode k. We treat Eq. 8 as a special case of Eq. 11 where RV FT,k is a constant for all k (k = 0, 1, . . . , N 0 − 1). As a result, we can now use a group of RV FT,k (k = 0, 1, . . . , N 0 − 1) to parametrise the CCF behaviour (be it a bulk shift, a deformation, or both). While a single measurement of the apparent radial velocity shift per observation can not distinguish between the effects of orbiting exoplanets and stellar variability, the set of RV FT,k 's can be used to recognise the presence of line shape changes. We explore that possibility using simulated and actual observations in Section 4 and Section 5.
Changes in the Fourier amplitudes (A k ) also indicate line variability. However, CCF shape changes in the vertical direction that may not induce apparent RV variation can also impact A k . In this paper, as we intend to model the line variability induced RV, we focus on the analysis of RV FT,k and ∆RV k , which are directly related to the spurious RV (see Section 4.2 for further discussions).

USEFUL FOURIER MODES IN φESTA
Noise in the CCF propagates into the uncertainties in the φESTA amplitudes and phase (and thus radial velocity shifts calculated from Eq. 11). We need to understand the effects of noise in order to interpret the results of the φESTA analysis. In principle, using all N Fourier modes (or N 0 Fourier modes for real inputs as in our case) can reconstruct the CCF without losing information. In practice, measurement noise and/or the spectrograph resolution may limit the utility of higher frequencies. In this section, we consider each of the factors that may limit the number of useful terms in the φESTA reconstruction. We summarise five aspects to consider when it comes to making use of the amplitudes A k and phases φ k information in the presence of associated uncertainties in the Fourier domain as below.
1. How many Fourier modes are necessary to accurately and adequately reconstruct the typical CCF shape?  The vertical axis on the right stands for the SNR to which the CCF can be reconstructed within the photon noise uncertainties. Using 20 Fourier modes will fully recover the CCF (i.e., zero modelling noise; not shown). The modelling precision as a function of k is for illustration only and it may vary if a different CCF profile is used, e.g., from a different star or a different instrument LSF.
2. What is the distribution of errors in A k and φ k due to photon noise?
3. From which k does the SNR of A k and φ k become so low that the estimates of individual A k and φ k are no longer useful?
4. From which k is the variability in the A k and φ k timeseries dominated by the measurement uncertainty of A k 's and φ k 's?
5. Which terms of the φESTA decomposition correspond to velocity scales greater than the width of the spectrograph line spread function (LSF)?
They are discussed in the following subsections respectively.

Reconstructing the CCF
First, we consider how many Fourier modes are needed for φESTA to accurately parametrise a typical CCF without overfitting. The number of modes needed depends on the shape of the CCF and the modelling error, which is defined as the residual root mean square (RMS) between the observed CCF and the φESTA reconstruction of the CCF using only the first k terms. Too few terms would result in an inaccurate representation of the CCF due to a large residual, but too many terms would result in a modelling error below the photon noise, indicating overfitting.
For example, Fig. 1 shows the first 5 φESTA basis functions (or 6 including the zeroth mode, which is the mean function of the CCF) used to fit an arbitrary but typical HARPS-N solar spectrum CCF. For details of the solar spectrum, refer to Dumusque et al. (2021) and Section 5. Using up to k = 5 φESTA basis functions reaches the residual or modelling error of 6.7 × 10 −4 , corresponding to a SNR 1 = 1.4 × 10 3 for the CCF it is able to represent. The HARPS-N solar CCFs have a high median SNR ∼ 10,000 for individual CCFs and SNR ∼ 46,000 for the daily binned CCFs. In order for the modelling error to be less than the photon noise level, one would need to use all the available Fourier modes, which is k max,photon = 20. Note that k max,photon is recommended for extracting all usable information of the CCF up to the photon noise limit, but any k ≤ k max,photon provides valuable information.

Noise distribution of amplitudes and phases
According to Eq. 2, CCF (ξ k ) Re and CCF (ξ k ) Im can be obtained by a linear projection of CCF (v n ) onto a set of Fourier basis vectors. Therefore, Gaussian noise in CCF (v n ) translates into Gaussian errors for CCF (ξ k ) Re and CCF (ξ k ) Im . In fact, as derived in Appendix B, their 1σ uncertainties are where σ CCF (vn) is the uncertainty of CCF (v n ). In contrast, the amplitude, phase and the resulting RV FT,k are obtained by non-linear transformations (i.e., square and square root operations on two random variable in Eq. 3 and the ratio and arctan operations in Eq. 4). Therefore, the uncertainties in A k 's, φ k 's and RV FT,k 's are not strictly Gaussian. The amplitudes tend to be skewed with a lower boundary at zero and the phases tend to present a narrower core and broader tails (Fig. 15). These effects can be substantial when the input CCFs data are noisy. We define the noise level (NL) as Based on numerical simulations, we have verified that noise in measurement of amplitudes and phases follow nearly the normal distribution when NL ≤ 0.2 (Appendix B). Under this low NL regime, we can approximately derive the analytical expression for the uncertainties of amplitudes and phases In general, the higher the mode k becomes, the smaller the amplitudes and thus the larger the NL. We define a cut-off frequency index based on normality k max,normal and ξ k max,normal as the last frequency that satisfies our criterion. By limiting our analysis to k ≤ k max,normal , we ensure that noise in A k and φ k closely follows a normal distribution.

Individual A k and φ k SNR
For an individual observation, we compute the SNR for A k 's and φ k 's as A k /σ A k and 1/σ φ k 2 using the uncertainties from that observation. The SNR for A k 's and φ k 's are thus identical according to Eq. 15. As A k decreases rapidly while σ A k is nearly a constant (see Eq. 13 and 15) with increasing k, the SNR for A k 's and φ k 's decreases for higher k. Therefore, we limit our analysis to k ≤ k max,individual , where k max,individual is the largest k for which the median SNR of A k 's or φ k 's is at least 2.
When choosing which k to analyse in a timeseries, we use the median σ A k and median σ φ k instead of the uncertainty for each observation, so as to ensure consistency across observations and to avoid discarding information in k-th timeseries that would result in only a small fraction of low SNR observations not passing our minimum SNR threshold.

Timeseries SNR
When applying φESTA to a timeseries of CCFs, we aim to characterise the deviations of A k and φ k from their time-averages. We define a SNR timeseries,k as the ratio of the RMS deviation of each A k and φ k from their mean value over a timeseries to the median σ A k and σ φ k .
It is useful to limit our analysis to the A k 's and φ k 's that have measurement uncertainties less than the extent of their variations with time. In this paper, we require a minimum SNR of 2 to ensure high-quality timeseries are obtained, i.e., the standard deviation of an A k or φ k timeseries is at least twice as large as the median σ A k or median σ φ k in the timeseries. We define k max,timeseries as the maximum k for which this criterion is satisfied.

Instrumental resolution
Finally, we consider k max,inst , the maximum k for which we expect changes in the CCF could be dominated by the target star, rather than the instrument or detector. Due to the instrumental broadening of the spectral lines, higher Fourier modes that captures changes below the LSF width will not be useful in parametrising stellar variability. For example, the instrumental resolution of HARPS in velocity space is 2.5 km/s. For a CCF with a velocity ranging from −15 to 15 km/s, Fourier modes higher than k = L/(1/ξ k ) = 30/2.5 = 12 (Eq. 5) capture variations on a scale finer than the instrumental resolution. In this example, we limit the A k 's and φ k 's analysed to have k ≤ k max,inst = 12. As shown in Section 5, the solar rationally induced variability are barely seen in modes with k > 12.

Summary notes
For illustrative purposes, we use the three years of HARPS-N solar CCFs (details in Section 5) to give readers an idea of what k is recommended based on the criteria discussed above. k = min(k max,photon , k max,normal , k max,individual , k max,timeseries , k max,inst ) = min(20, 20, 20, 11, 12) = 11 will ensure that the A k 's and φ k 's being analysed can be associated with an unbiased 1σ uncertainty and have sufficient SNR to characterise the CCF and the timeseries variations as well as focusing on revealing information that could be due to stellar variations (as opposed to instrumental or detector variations). While φESTA is capable of including higher k's, its default choice of thresholds (and those used in this study) are designed to be conservative in ensuring that the measurement uncertainties are very nearly Gaussian.

VALIDATION OF φESTA ON SIMULATED SOLAR OBSERVATIONS
We use the SOAP 2.0 (Spot Oscillation And Planet 2.0, Dumusque et al. 2014) simulator to generate CCFs of solar spectra affected by spots or plages that are based on the HARPS observations of the Sun. Unless specified in this paper, we set the spot at 30 • latitude with the size of 0.1R ; the stellar configurations are based on the Sun and chosen as the default values of SOAP 2.0, e.g. T star = 5778 K, T spot-photosphere = 663 K.

Line shift vs line deformation
First we generate the CCFs of solar spectra affected by a single spot in one rotation. In the presence of stellar variability, each RV FT,k differs for each CCF, because each mode k traces the line deformation at a different lengthscale, with k = 1 tracing variability at the full range of the CCF, k = 2 tracing variability at 1/2 the range, etc. Therefore, RV FT,k provides a quantitative measurement of the radial velocity shift and the line deformation.
To illustrate the changes in the Fourier amplitudes and phases of the CCFs for a line deformation as opposed to a line shift, we take one snapshot at solar rotation phase 0.51, when the spot is slightly off solar disk centre and the resulting CCF is deformed with an apparent radial velocity measured at 1.1 m/s. For comparison, we shift the undeformed CCF by the same 1.1 m/s and feed them into φESTA. When the CCF is decomposed into the Fourier basis functions, the amplitudes and RV shifts remain the same for all modes for a pure line shift, while amplitudes and RV shifts vary for a line deformation (Fig. 2).

Weighted mean of RV FT,k
We calculate the weighted mean of RV FT,k for the SOAP simulated CCFs over the course of one rotation, with the weights being the corresponding amplitudes A k . We find that the weighted means are consistent with the radial velocity shifts measured by fitting a Gaussian function to the CCF (Fig. 3). Therefore, the apparent radial velocity RV Gaussian can be treated as the overall effect of the radial velocity shifts of individual Fourier basis functions (i.e. RV FT,k ). RV FT,k , on the other hand, can be treated as a multi-dimensional measurement of the RV shift at different CCF length scales.

Spots vs plages and their φESTA correlations
We further explore how φESTA behaves under the influence of a solar spot and plage in various latitudes (from 10 to 80 degrees) in one stellar rotation. Earlier studies using the previous version of φESTA can be referred to Zhao (2019).
As we can tell from Fig. 4, higher latitudes tend to have minor effects on A k (∆A k is shown using a solar CCF without spot or plage as a reference) and ∆RV k . This is because both the projected area of spots and plages in high latitudes and their rotational velocity along the line of sight are smaller. On the other hand, changes in A k and ∆RV k per latitude towards the equator are not as prominent as in high latitudes, and this is because a latitude change near Amplitudes and phases at a glance for two CCFs, one caused by a pure shift (blue dashed line) and the other caused by a spot at 30 • latitude with the size of 0.1R at rotation phase 0.51 (red solid line). ∆CCF, ∆A k , ∆φ k and ∆RV k are relative measurements to the template CCF which is neither shifted or deformed. Even though both RVs are comparable (1.1 m/s), φESTA tells them apart (one is intrinsic and the other is spurious) and provide a quantitative measurement of the phase and amplitude changes. Note that it is impossible to replicate a line profile with a shift below the sampling spacing, and so even the pure shift of a CCF simulated by interpolation will end up in a slightly different shape and result in unequal RV shifts in higher frequencies when the amplitudes are smaller.    the equator does not change the the projected area of spots and plages along the line of sight and the rotation speed as much as in higher latitudes.
Since changes in both A k and ∆RV k are parametrisations for the amount of spectral line variability, a larger deformation, which is likely to lead to larger apparent radial velocity shifts, is expected to show larger changes in A k and larger ∆RV k as well. However, this is not always the case as the correlations presented in Fig. 5. Overall, larger apparent RV tend to be only associated with larger ∆RV k for spots in lower Fourier modes (e.g. k = 1 and 2). For plages and higher Fourier modes, their linear correlations are not observed. On the flip side, the different patterns of ∆RV k -RV Gaussian plots and their stronger dependence on latitude will be valuable for figuring out the latitude information of the spots and plages. In the next step, we analysed SOAP 2.0 solar simulations based on the known spot properties considering a similar activity level to the Sun, including spot temperature, size distribution and lifespan (Gilbertson et al. 2020a). The simulated data set was sampled twice per day and has 730 spectra covering a whole year. It was originally used for timeseries analysis (Gilbertson et al. 2020b), we nevertheless use the simulated solar CCFs as individual observations and seek correlations between ∆RV k and the apparent RV due to solar spots.
We present the RV and ∆RV k timeseries of the simulated data set in Fig. 6. The results are consistent with the discussions in Section 4.3 that ∆RV k shows strong linear correlations with the apparent RV (RV SOAP ) for k = 1 and 2, as the (Gilbertson et al. 2020b) solar simulations are deformed only by spots but not plages. It indicates the φESTA lower Fourier modes may be used for linear RV correction for spot-dominated stars.

Data
As an example to show how φESTA can analyse real world line profile variability, we apply φESTA to the highcadence HARPS-N spectroscopic observations of the Sun-as-a-star during 2015 to 2018 (Dumusque et al. 2015;Phillips et al. 2016). The data are publicly accessible on https://dace.unige.ch/sun/, including 34550 observations. These public data sets were already pre-selected to minimise non-astrophysical effects on observed CCF (Collier Cameron et al. 2019). They were taken in good observing condition (i.e. not significantly affected by clouds), the differential extinction for each observation less than 0.1 m/s, and the data reduction software (DRS) reports good quality results. The same selection criteria were also used in de Beurs et al. (2020)  by an older version of the HARPS-N DRS. As a result, the RV HARPS presented in this paper reveals a long-term RV drift induced by the solar magnetic cycle and contributes to a larger scatter of apparent RVs (both before and after applying mitigation procedures) than the ones in de Beurs et al. (2020).
We work on the inverted, normalised daily weighted averaged CCF and the the daily weighted RVs, therefore reducing the number of original observations over the three years from 34550 to 567. Each exposure is designed to integrate over the solar p-mode oscillation timescale of ∼ 5 min (Strassmeier, K. G. et al. 2018;Chaplin et al. 2019). The daily weighted average CCFs further average out both the p-mode solar oscillation effects and much of solar granulation which is significant on timescales from minutes for granules (Vázquez Ramió, H. et al. 2005) to hours for mesogranules (Matloch, L. et al. 2009). A φESTA study on these intra-day oscillation and granulation effects will be carried out in the near future.
The periodogram of the daily weighted HARPS-N RVs (first row of Fig. 7 or 8) shows several prominent periodicities in the daily RV timeseries, with ∼28.5 days being identified as the solar rotation period. The other periods will be discussed in the following sections. The unbinned RV periodogram (not shown) is almost identical to the daily binned RV periodogram above 1 day.

φESTA timeseries
In the left-hand panels of Fig. 7 and 8, we show the timeseries for each ∆RV k and A k for 20 Fourier modes. Different modes characterise the CCF deformation on different scales, with k = 1 being the longest mode (refer to Fig. 1). As k increases, the scale of study decreases as 1/k and the amplitude decays approximately as a Gaussian function of k (Fig. 2). Therefore, the signal-to-noise ratio for measurements of ∆RV k and A k become dominated by photon noise for large k. The middle panels are the correlations between each row's indicator and RV HARPS . The right-hand panels of Fig. 7 and 8 show the periodogram for each ∆RV k and A k .
Based on the ∆RV k timeseries and the timescales observed in their periodograms, we've identified five mechanisms contributing to the variations in the CCF profile: 1. Solar rotationally modulated CCF variations: The ∼ 28 day solar rotation period and its harmonics can be seen in both ∆RV k and A k periodograms (e.g. k from 3 to 8). The relative significance of the peaks often differs between the periodograms of ∆RV k and A k , even for the same k. Rotationally modulated variations are more obvious in ∆RV k periodograms.
2. Solar magnetic cycle: The long-term RV drift over the three years of observations is correlated with ∆RV k and A k up to k = 4 (R 2 ∼ 0.5).
3. CCD detector warm-ups: Instrumental features can be seen most clearly as the sudden jumps of ∆RV 1 and ∆RV 2 that coincide with times of interventions to the CCD detector (Dumusque et al. 2021) . While not strictly periodic, these this effect likely leads to the periodogram peaks near ∼ 210 d and ∼ 285 d.
4. Changes in the apparent solar angular velocity: The Earth-Sun distance changes both due to Earth's eccentricity and as a result of perturbations from Jupiter (also discussed in Collier Cameron et al. 2019). The periodicity at around 400 days is predominant in the higher Fourier modes (e.g. k 9) and nearly matches the time between conjunctions with Jupiter.
5. Changes in the apparent solar angular velocity as a result of the Earth's orbit being slightly eccentric changing the distance between Earth and the Sun (Collier Cameron et al. 2019). This periodicity at half a year is apparent in many of the higher Fourier modes, but can be overwhelmed by the CCD detector warm-ups and/or harmonics of the timescale between conjunctions of Jupiter.
While both ∆RV k and the change in A k trace the CCF variability, A k seems to capture more periodicities. This is expected since line depth changes or artifacts in the estimate for the continuum around the line may affect A k 's, but not ∆RV k 's.

Principal component analysis
Some of the nearby Fourier modes present similar or correlated behaviour as seen in the timeseries and their periodograms, especially for ∆RV 1 and ∆RV 2 , which appear anti-correlated with each other. It is possible that certain features of a line deformation can take place across several Fourier modes and thus cause correlated effects. Therefore,      Figure 9. Left: PCA for ∆RV k up to kmax = 9. The first 3 principal components explain 75% of the variance in ∆RV k (k = 1, 2, . . . , 9). Middle and the right columns are the same correlations plots and periodograms plots as in Fig. 7 except for these principal components.
we can model a large fraction of the CCF variance with a smaller number of coefficients by invoking principal component analysis (PCA) to perform dimensional reduction on the ∆RV k 's (or A k 's). This has dual benefits of increasing the SNR in the retained PC scores and potentially ease interpretation of a lower dimensional timeseries.
In an ideal world with ultra-high SNR, we could make use of all the 20 Fourier modes for the CCF parametrisation. In reality, higher Fourier modes are more strongly affected by measurement uncertainties; in addition, they do not contribute as much information due to rapidly decaying amplitudes (Section 3). As mentioned in Section 3.6, it is advised to use no more than 11 Fourier modes for the HARPS-N solar data analysis. If we were to focus on the first 3 principal components built from the first k max modes, we find that increasing k max from 7 to 11 hardly changes the first 2 PC score timeseries, and increasing k max from 9 to 11 hardly changes the 3 rd PC score timeseries. Therefore, we choose k max = 9 to parametrise the CCF variability. When performing the analysis in Sections 5.5 and 5.6, we also find that building the first 3 principal components from the first 9 Fourier modes results in the smallest residual (compared with other k max ), though the difference of the residual WRMS (i.e. weighted RMS) is less than 5% among choices of k max ranging from 7 to 11.
While most PCA packages are designed to deal with equally weighted data by default, the HARPS-N observations have associated errorbars. The weighted PCA is aimed to tackle this problem by solving the weighted covariance matrix of the data (Delchambre 2014). We used the wpca python package for its implementation. The weighted PCA on ∆RV k (Fig. 9) reduces the feature dimension from 9 to 3 while ∼75% of variance in the data are preserved. The residual due to projecting data into a lower dimension accounts for the rest ∼25% of the variance. We can tell the aforementioned periodicities in Section 5.2 are preserved in the first 3 PC scores. Note that we normalise the ∆RV k for each k before applying weighted PCA to prevent the higher modes from dominating the principal components.

Separating out long-term and short-term variations
PCA extracts independent features in high dimensional data while keeping most of the variation information within the first few principal components, however, the process is not physically driven and thus we should not expected that each mechanism that contributes to CCF variations will be neatly contained in an individual principal component. Therefore, we apply a low-pass filter to the PC scores to separate rotationally induced effects (periods < 100 days) from long-term effects (periods > 100 days). Specifically, we compute L-PC i (t), the mean of Gaussian process conditioned on each of the PC score timeseries (PC i 's) separately using a Matérn 5/2 kernel (e.g. Genton 2001;Minasny & McBratney 2005) with a length-scale of r = 100 days, which captures the variations above r and smooths out those below r (Fig. 10). We then subtract L-PC i from the corresponding principal component scores to obtain S-PC i , which probes the short-term variability below ∼100 days and are found to be dominated by the solar rotationally modulation.

Multiple linear regression modelling
Once PCA has reduced the number of features required to accurately parametrise the CCF variation, we want to investigate whether these PC scores are useful for predicting the behaviour of the spurious RV measurements. The simplest approach would be a multiple linear regression model  Figure 10. Separating the short-term (S-PC) and long-term (L-PC) variability of the 3 principal components in Fig. 9. The corresponding periodograms are on the right.
where X is the regressor matrix that traces the CCF variability, β is the coefficient vector, is the error vector and y ≡ RV HARPS is the response vector.
We examine two models for which X consists of various features derived from the CCFs parametrisation, Model 1 using the first three principal component scores (PC i 's computed from ∆RV k , k = 1, 2..., 9, Fig. 9) and Model 2 taking a step further by using their short-term components S-PC i (i = 1, 2, 3) and long-term components L-PC i (i = 1, 2, 3) as regressors (Fig. 10). As models naturally fit the data better with more features or parameters, to avoid over-fitting, we perform Lasso regression (i.e., added a L1 regularisation penalty to the weighted sum of the squared residuals, Tibshirani 1996;Hastie 2015) and minimise the following where W is the weight matrix, λ is the tuning parameter and β 1 is the 1-norm of the vector β defined as n i=1 |β i |. λ = 0 implies no penalty and Eq. 17 is reduced to the normal linear regression optimisation. The larger λ becomes, the larger the penalty term and the more coefficients are driven to zero. To find out the optimal λ, we adopt a cross-validation approach where the data are randomly scrambled and split into 5 folds. We train our model as we minimise the loss function in Eq. 17 using 4 folds of the RV data and feature matrix, and then obtain a residual WRMS for the validation data. This process is repeated 100 times, each with a randomised split and λ is tested from 10 −3 to 1. Then we locate λ =λ 1 corresponding to the smallest WRMS in the testing set. Usually a more regularised and simpler model with λ >λ 1 is preferred.
With the multiple linear regression model regularised by the loss function above (Eq. 17), we reduce the uncorrected WRMS of RV HARPS from 2.00 m/s to 1.36 m/s for using the first 3 PC scores (Fig. 9) and further down to 1.14 m/s with additionally separating out the long-term and short-term variations (Fig. 10), corresponding to 32% and 43% reduction in the WRMS, or 53% and 67% reduction in the weighted variance, as calculated byR 2 . Among the total reduction in the weighted variance, we determine the importance of each feature by calculating their contributed variance (β i σ i ) 2 . The information is summarised in Table 1's rows corresponding to Models 1 and 2.
We estimate that solar rotation-induced RVs contributes just ∼ 7% of the variance (based on sum of the variance percentages due to S-PC i (i = 1, 2, 3)) or ∼ 0.4 m/s in terms of RMS without considering measurement errors or instrumental instability. In contrast, L-PC 1 (which traces the solar magnetic cycle and the CCD detector warm-ups) contributes over 80% of the variance reduction. Most of the remaining ∼10% of the RV variation is believed to be caused by changes in the apparent solar rotation rate due to Jupiter and Earth's eccentricity (L-PC 2 ).

Multiple linear regression modelling with time lags
Collier Cameron et al. (2019) explored the correlation between the spurious RV signal in HARPS-N observations and two traditional CCF shape indicators -the Bisector Inverse Slope (BIS) and the full-width half-maximum (FWHM). They found that the apparent RV variations led the BIS by 3 days and the FWHM by 1 day. We want to investigate if introducing a lag between RV HARPS and the principal components of ∆RV k can improve our predictions for the spurious RV signal. As the lag between two timeseries can be poorly constrained when the variation timescale are too long compared to the time lag, we decide to fix a zero lag for the long-term variations L-PC i and only introduce a lag parameter in integer days for the short-term variations S-PC i . In order to implement multiple linear regression with time lags, we must address two challenges. First, observations are not available for every consecutive day. Most gaps are less than a few days whilst a large one starts from 2017-11-11 (BJD−2400000 = 58068.5) due to the HARPS-N Fabry-Pérot upgrade (Dumusque et al. 2021). Second, the daily binned timestamps are not exactly evenly spaced, creating a slight misalignment between timeseries when they are shifted by integer number of days. To overcome these challenges, we linearly interpolate the indicators timeseries to the RV HARPS timestamps and their offsets in days. The beginning and the end of the two trunks of RV HARPS separated by the Fabry-Pérot upgrade gap are not used so that we can avoid the artifacts of extrapolating the indicators timeseries outside the RV HARPS timestamps when dealing with lags. We also note that over 90% of the daily binned observations are within 2 hours of the day, so using the RV HARPS directly as the response vector is sufficient to study the time lags in days.
The lagged multiple linear regression introduces additional fitting parameters -one coefficient for each lag for each of the S-PC i and L-PC i (i = 1, 2, 3). We tested different maximum lags allowed and found that the fitted coefficients remain mostly stable once the model includes at least a maximum 5-day lag. Therefore, we present the results using a maximum 5-day lag. As for the regularisation parameter λ, we note the residual WRMS reaches minimum at λ 1 = 0.028. According to "the one standard error rule" (Breiman et al. 1984;Hastie et al. 2009), a more regularised model withλ 2 = 0.080 at one standard error of WRMS(λ 1 ) is usually recommended (Fig. 11). However, considering we would like to compare all the models usingR 2 , it would be easier if the same λ is chosen. Therefore, we choose the middle ground λ = 0.05, which results in a more parsimonious Model 3 than if λ =λ 1 were chosen and meanwhile λ = 0.05 does not over regularise Models 1 and 2 3 . With this, we obtain a residual WRMS = 0.94 m/s, a 50% reduction in the WRMS or 73% reduction in the weighted variance. In Fig. 11, we observe that ∼ 7% of the variance (or 0.51 m/s in terms of RMS) comes from the solar rotationally induced RV variability (i.e., sum of the variance in S-PC i (i = 1, 2, 3) for all lags), consistent with the ∼ 7% in Model 2 with no lags. Over 80% of variability is dominated by terms that show trends of the solar magnetic cycle plus the CCD detector warm-ups. In addition, the lag information of S-PC i (i = 1, 2, 3) tracing solar rotation is spread out between −2 days and 5 days, with the single most prominent lag for 3 days by summing up the variance of S-PC i (i = 1, 2, 3) by day, accounting for ∼ 2% of the RV variance. Individually, the weighted averaged lag for S-PC i (i = 1, 2, 3) are 2.2, 1.7 and 3.7 days, while Collier Cameron et al. (2019) noted lags of 3 and 1 days between the BIS and FWHM and the spurious RV. Our finding that a weighted average of the indicators at multiple lags can provide a more effective predictor of the spurious RV signals is complementary to results from Collier Cameron et al. (2019). We hypothesise the different mechanisms that result in RV variation as the Sun rotates may show different lags, and S-PC i (i = 1, 2, 3), BIS and FWHM detect part or a combination of these mechanisms.

THE ALTERNATIVE: BIS AND FWHM
To investigate if using φESTA metrics improves the modelling of line profile variability, we compare the results of §5.5 and 5.6 with a similar analysis applied to two traditional CCF variability indicators, the FWHM and BIS. We repeat the process of HARPS-N RV modelling except for substituting the PC scores of ∆RV k by the daily binned FWHM and BIS (Fig. 12). The originally unbinned FWHM and BIS data were provided on https://dace.unige.ch/sun/. The FWHM and BIS timeseries are standardised (mean = 0 and variance = 1) so that the linear regression coefficients are not penalised inappropriately in the Lasso regression because of their different units or because of one having intrinsically smaller variance but requiring a larger coefficient to compensate.
Comparing Fig. 10 and Fig. 12, L-PC 1 derived from φESTA appears to behave similarly to L-BIS, the long-term variation of the BIS timeseries. As Fig. 13 shows, this long-term components contribute to ∼ 91% of variance in modelling RV HARPS . The other ∼ 9% (equivalently 0.55 m/s RMS) comes from the short-term variations which we attribute to be primarily solar rotationally modulated variability. In terms of performance, using FWHM and BIS  together results in a slightly worseR 2 = 0.680 and a slightly larger residual weighed RMS of 1.04 m/s. This is likely due to the fact that the combination of FWHM and BIS is not as effective at describing the CCF as the combination of S-PC i and L-PC i derived from ∆RV k 's. We summarise 3 models using FWHM and BIS for predicting the HARPS-N solar spurious RV in Table 1 (Models 4-6). In order to compare how the models are similar to each other, we compare the residual correlations between Model 3 and Model 6 specifically. Their residuals are well correlated with each other, indicating that the two models agree well on each other in predicting RV HARPS . The characteristic 1σ width of the residuals (the half-width of the central 68.2% credible interval) is 0.77 m/s (Fig. 14), indicating the overall difference of the two model predictions. We suggest using this approach for evaluating the consistency between two or more models in the planet detection or stellar variability modelling, such as model comparisons in the EXPRES Stellar-Signals Project (Zhao et al. 2022).

Overview
We presented the improved Fourier phase spectrum analysis (FIESTA or φESTA) for disentangling intrinsic RV shifts from apparent RV shifts caused by stellar variability and instrumental instability. φESTA provides summary statistics for each spectra in a timeseries based on projecting the CCF onto a truncated version of the Fourier basis functions. This approach is motivated by the shift-invariant properties of the Fourier basis: 1. A shift in the signal (a spectral line profile or a CCF in the context of radial velocity exoplanet detection) does not change the amplitudes of each Fourier basis function: and thus ∆RV k=1 = ∆RV k=2 = · · · = 0.
As a result, we can use changes in the amplitude A k and RVs measured as RV FT,k or ∆RV k for each Fourier mode k to parametrise the variations of spectral line shapes. We focus on the analysis of ∆RV k because it is directly related to the spurious RVs due to stellar and/or instrumental variability. Specifically, φESTA decomposes the spectral line profile CCF into the Fourier modes (Eq. 6), i.e. the Fourier basis functions truncated to have finite support (Fig. 1, left panel). The amplitude (A k ) and the phase-derived RV shift (RV FT,k ) for each Fourier mode k fully parametrise the CCF through discrete Fourier transform. The suite of RV FT,k is a N -dimensional measurement of the shared intrinsic planetary RV shift of the CCF plus any stellar or instrumental variability measured at mode k (Section 4.2). The changes with time of the amplitudes and RV shifts for each Fourier mode (denoted as A k and ∆RV k ) is used to trace the CCF variability.
As discussed in Section 3, higher Fourier modes generally come with larger uncertainties. Practical consideration to pick the number of modes for the analysis include (1) the precision of reconstructing the CCF, (2) the uncertainties of A k and RV FT,k for a single observation, (3) the SNR of A k timeseries and ∆RV FT,k timeseries and (4) the instrument resolution.
We demonstrated the use of φESTA in analysing the RV timeseries from Section 4 to Section 6, covering SOAP 2.0 solar simulations and the 3-years of HARPS-N solar observations. SOAP 2.0 simulations -We demonstrated how φESTA distinguishes an intrinsic line shift from an apparent line shift due to a line deformation (Section 4.1). We also explored the response of ∆RV k to simulated solar spots and plages at different latitudes, as well as their correlations with the apparent RV (Section 4.3). The simulated solar observations show high correlations between the spot-induced spurious RVs and the lower Fourier modes (Section 4.4).
HARPS-N solar data -In Section 5, we applied φESTA to 3 years of HARPS-N solar observations ( Fig. 7 and 8). As neighbouring Fourier modes may show high correlations with each other, we are motivated to use PCA for dimension reduction. With PCA, we extracted the most prominent 3 orthogonal features from the first 9 ∆RV k timeseries (Fig. 9). Next, we separated the short-term variability dominated by solar rotation from the long-term variability modelled by a Matérn 5/2 GP kernel with a length scale of 100 days (Fig. 10). Feeding these features into multiple linear regression models to fit RV HARPS , we reduced the WRMS RV from 2.0 m/s to 1.14 m/s (R 2 = 0.67) in a model that uses only the spectra at each epoch. Incorporating φESTA outputs a few days prior/after each observations allowed us to further reduce the WRMS RV from 1.89 m/s to 0.94 m/s (R 2 = 0.73). To avoid overfitting, Lasso regression and cross validation have been implemented. Our model shows the solar rotationally induced variability contributes only 7% to the total RV variation modeled by φESTA (Fig. 11). The long-term variability, which we identified as sources from the solar magnetic cycle and instrumental instability, compose of over 80% of the RV variations in RV HARPS . The remaining less than 10% RV variability come from the CCF changing resulted from the apparent solar rotation rate changes due to Jupiter and Earth's eccentricity. Therefore, it is important for future EPRV exoplanet surveys to consider modelling and correcting for instrumental instability and long-term activity variations, in addition to modelling the rotationally-linked stellar variability.
Furthermore, we compared the models using PC scores of ∆RV k as features (Section 5) and the ones using FWHM and BIS (Section 6) as summarised in Table 1. The latter performs slightly worse with a residual WRMS of 1.18 m/s (R 2 = 0.65) without lags and 1.04 m/s (R 2 = 0.68) with lags. The fact that FWHM and BIS do not perform as well may be because they do not provide as complete CCF parametrisation as the φESTA-derived ∆RV k . However, both methods predict the solar rotational spurious RMS RV ∼ 0.5 m/s. As a measurement of model consistency, the RMS of the difference between the two residuals using these two models is ∼ 0.77 m/s (Fig. 14). We encourage future studies to perform a similar consistency check for multiple methods proposed to reduce the effects of stellar variability.

Original φESTA
The original φESTA proposed by Zhao & Tinney (2020) extracted the CCF variability into the lower and higher frequency ranges RV FT,L and RV FT,H . The current version of φESTA presented here decomposes the CCF into all the available Fourier modes up to the CCF sampling limit, providing a comprehensive parametrisation of the CCF using the amplitudes A k and the phase derived RV shift RV FT,k and the CCF variability using A k and ∆RV k . This facilitates the study of multiple sources contributing to CCF variability, as demonstrated in our analysis of the HARPS-N solar observations in Section 5.

SCALPELS
Collier Cameron et al. (2021) introduced SCALPELS to study the CCF variability using the autocorrelation function (ACF) of the cross-correlation function of each spectrum with a mask. SCALPELS was able to reduce the RMS of daily averaged HARPS-N solar spectra from 1.76 m/s to 1.25 m/s, a 29% reduction in the RV scatter. In order to make a more direct comparison to this result, We conducted a preliminary analysis of the same HARPS-N solar observations as Collier Cameron et al. (2021). When applied to the 5 year timeseries, φESTA reduced the weighted daily RV RMS to 1.09 m/s using the same approach as in Section 5.6. Since the Collier Cameron et al. (2021) dataset provides only daily binned CCFs and does not include FWHM and BIS measurements and it became available after much of our analysis had been completed, this manuscript only focuses on the three year timeseries available on https://dace.unige.ch/sun/ at the time writing.
In addition to comparing the amount of stellar variability removed, it is interesting to compare the basis vectors used by the two studies. The eigenvectors for the ACF in Collier Cameron et al. (2021) Fig 3 look very similar to the Fourier basis functions used in φESTA (Fig. 1). It is exciting to see how the two completely different methods -the data driven approach from Collier Cameron et al. (2021) and the theoretical derivation from Zhao & Tinney (2020) and this paper end up with similar basis functions (i.e. eigenvectors).

Machine learning
de Beurs et al. (2020) analysed 3 years HARPS-N solar observations and predict the spurious RV due to stellar variability by applying either linear regression or a dense neural network to the CCF. Using a neural network, they reduced the RV scatter by 47%, very similar to our 48% reduction in the WRMS using the multiple linear regression with φESTA indicators and allowing for a lag (Section 5.6). A precise comparison of these two results is not practical because the de Beurs et al. (2020) HARPS-N RVs were based on an earlier version of the HARPS-N data reduction system, while ours are from the newer version (see discussions in Section 5.1). Regardless, there is room for further research in reducing the residual scatter by treating the φESTA inputs (either directly or after dimensional reduction via PCA) with neural network methods that are able to learn non-linear relationships between activity indicator proxies and RVs. 7.3. Opportunities for Future Research 7.3.1. PCA We explored PCA for dimension reduction and used the PC scores as features to characterise the CCF deformation. As PCA is an unsupervised learning technique, we could not separate out the different mechanisms that caused the CCF deformation. In order to improve interpretability, we computed a smoothed version of the PC scores (L-PC i 's) that was insensitive to changes on the solar rotation timescale. Indeed, this resulted in the long-term solar magnetic cycle and the CCD detector warm-up events becoming more prominent in L-PC 1 . Inspecting the difference between the PC scores and smoothed PC scores allowed us to separate out the short-term variability (e.g. the solar rotation). In future studies, we could explore employing regularisation in PC scores so as to drive some basis vectors to focus on variations on a timescale near the stellar rotation period and other basis vectors to focus on CCF shape changes occurring on longer timescales (e.g. instrumental variations or long-term stellar cycles).

Gaussian processes
We used activity indicator proxies (PC scores derived from ∆RV k ; FWHM and BIS) as regressors in an effort to model the spurious RV introduced by solar variability. Although they depend only on each instantaneous spectrum, we showed that incorporating temporal information by performing regression on these activity indicator proxies and the lagged components simultaneously resulted in improved prediction power, as evidenced byR 2 . This implies Gaussian process (GP) regression, particularly, multivariate GP models with a common latent process and its derivatives approximating the effect of regressing on the lagged indicator proxies would be a promising approach. (Rajpaul et al. 2015;Jones et al. 2020;GLOM, Gilbertson et al. 2020b. Although GP may not be needed for the highly sampled solar data analysed in this manuscript, planet hunting in other stellar targets with sparse observing cadence would benefit from GP analysis of both the stellar spurious RVs and the activity indicator proxies (e.g. Langellier et al. 2021;Faria et al. 2022). In fact, we have made initial attempts to combine the PC scores derived from ∆RV k and the multi-variate GP known as GLOM for modelling the stellar variability for HD 101501, HD 34411, HD 217014, and HD 10700 as part of the EXPRES Stellar-Signals Project ). This approach allows φESTA to be applied to stars other than the Sun, for which gaps in the observations are much more common (Zhao et al. 2022).
Most published applications of GP regression to radial velocity timeseries have assumed stational GP kernels. Stationary kernels are not well suited for modelling sudden changes in the RV HARPS due to detector warm-ups or power failures. We find that these contribute a significant fraction of the CCF deformation changes in the HARPS-N solar dataset. Future studies may benefit from adopting more complex GP kernels, such as the sum of a stationary GP kernel for modelling stellar variability and a non-stationary kernel with parameters informed by prior knowledge about any instrumental changes.

CCF construction
In this paper, we demonstrate the φESTA methodology using HARPS-N solar observations and simulated SOAP data. Publicly available CCFs for HARPS-N solar data and SOAP data are based on a weighted binary mask. While CCFs based on a template spectrum can improve the robustness of RVs for some stars (e.g., low SNR, broad absorption features in cool stars), we expect a narrow, weighted binary mask to preserve more line shape information. While it could be interesting to study the effects of different CCFs masks and preprocessing (e.g., CCF continuum normalisation, velocity range) on φESTA outputs, that is beyond the scope of this paper. Of course, one should use the CCFs constructed using the same mask and preprocessing to correctly account for other changes due to stellar variability and instrumental instability. 7.3.4. Improving ∆RV k ∆RV k , the difference between the RV shift at k-th Fourier mode (RV FT,k ) and the apparent RV (RV apparent ), traces the amount of CCF variability at the frequency ξ k (Eq. 12). By construction, a true Doppler shift (RV planets ) does not affect the expected values of individual ∆RV k 's, but contributes to each of the RV FT,k . For the HARPS-N solar spectra, there are no true Doppler shift signals in the heliocentric RV timeseries and so RV FT,k alone would trace the CCF variability and could have been a stronger indicator for variability. Nevertheless, for other stars, there will inevitably be uncertainty about the potential presence of planetary companions. That is why we choose to use ∆RV k instead of RV FT,k even for the analysis for the HARPS-N solar observations as a manner of consistency.
It may be possible to improve the statistical power and accuracy of ∆RV k as a series of CCF variability indicators by re-defining ∆RV k planets , the RV signal of putative planetary companion(s). Such first guess may be derived from jointly fitting the planetary RV signal and intrinsic stellar variability with ∆RV k . Then as an iterative approach, we keep improving the fit of RV (l+1) planets as we improve ∆RV k (l) from the previous round. For its implementation, one could take a Bayesian approach and simultaneously analyse the planetary signal and stellar variability. This would require placing priors on both the planet properties and the φESTA indicators (e.g. A k 's and ∆RV k 's). Multivariate Gaussian (GP) regression (Section 7.3.2) provides a powerful framework for placing priors on the φESTA indicators. It is beyond the scope of this paper and offers opportunities for future research. The coefficient of determination, also known as R 2 , is defined as 1− wRSS wTSS where wRSS is the weighted sum of squares of residuals and wTSS is the total weighted sum of squares (Heinisch 1962). It is a measurement of the percentage of variance of the dependent variable predicted by the independent variable(s), with R 2 = 1 indicating a zero residual and the modelled values totally match the observations.
To avoid overestimating when extra explanatory variables are introduced into the model, the adjusted R 2 , denoted asR 2 , is used (Raju et al. 1997).R 2 is defined as 1 − (1 − R 2 ) n−1 n−p−1 where n is the sample size and p is the number of explanatory variables.R 2 can thus be used to compare the performance of different models, regardless of different explanatory variables being used. For the simple linear regression, there is only a single explanatory variable and so p = 1.

B. NOISE IN AMPLITUDE AND PHASES
For simplicity, we treat photon noise across the velocity grid as uncorrelated and Gaussian (approximated for Poisson distribution because of high photon counts), with the mean being the observed flux CCF (v n ) and the 1σ of the Gaussian distribution being the associated photon noise error counts at each v i denote as σ CCF (vn) . Then the uncertainty of CCF (ξ k ) can also be calculated using DFT Since the DFT is a linear transform, a normally distributed CCF (v n ) results in a normally distributed CCF (ξ k ). The distribution of CCF (ξ k ) can be approximated by a 2D multivariate normal distribution in the Fourier domain. The marginalised CCF (ξ k ) RE and CCF (ξ k ) IM also follow the normal distribution, with the variance Next we consider what distributions the amplitudes A k and φ k as calculated in Eq. 3 and 4 follow. We study under what condition A k and φ k are well-approximated by the normal distribution. Without loss of generality, we choose CCF (ξ k ) = (1, 0) and test in a bootstrapping approach if the distribution of A k and φ k can be distinguished from a normal distribution when the noise-to-signal-ratio N/S = σ CCF (ξ k )RE = σ CCF (ξ k )IM = 0.1, 0.2, , . . . , 1 using the D'Agostino's K-squared (D'agostino et al. 1990) and the Shapiro-Wilk normality tests (Shapiro & Wilk 1965). We find that when N/S ≤ 0.2, the resulting A k and φ k distributions are indistinguishable from a normal distribution. Larger N/S results in deviations from the normal distribution (Fig. 15).
In conclusion, we compute the uncertainties of σ CCF (ξ k )RE and σ CCF (ξ k )IM , which can be approximated by 1 2 N −1 n=0 σ 2 CCF (vn) , and compare it with | CCF (ξ k )|. When the ratio is less than 0.2, we can be confident that the distribution for the amplitudes and phases is well described by a normal distribution.
Since we treat noise across CCF pixels uncorrelated, the resulting uncertainties in the amplitudes and phases obtained by the bootstrapping are likely to be overestimated compared to a noise model that accounts for correlations in the CCF.  Figure 15. The distributions of the amplitudes and phases in the presence of noise, where the signal (S) refers to | CCF (ξ k )| = 1 and the noise (N) refers to σ CCF (ξ k ) RE and σ CCF (ξ k ) IM . The black curve is a normal distribution fit to the histogram (regardless whether the underlying distribution is normal). N/S = 0.2 is the threshold when the distributions of A k and φ k start to deviate from normal distributions under the normality test. For comparison, examples of N/S = 0.4 and 0.8 represent the lower SNR regime, where the A k distributions tend to be skewed and the φ k distributions tend to be taller in the core and thinner in the wing than a normal distribution.
Zero-padding adds zeros to the end of the signal. Let N be the intrinsic number of inputs of the original discrete signal and N be the total number of inputs for the DFT with (N − N ) zeros padded after the original signal. Denote the new signal as {x n }. According to Eq. C3, the Fourier transform of the zero-padded signal is Replace k/N in Eq. C3 by ξ and and k/N Eq. C4 by ξ , we have x n exp(−2πiξ n).
For N > N , ξ means a finer sampling than ξ in the frequency grid. Therefore, the resulting amplitudes and phases in the Fourier transform space look smoother, but it does not extract more features of the original signal.