Leveraging space-based data from the nearest Solar-type star to better understand stellar activity signatures in radial velocity data

Stellar variability is a key obstacle in reaching the sensitivity required to recover Earth-like exoplanetary signals using the radial velocity (RV) detection method. To explore activity signatures in Sun-like stars, we present SolAster, a publicly-distributed analysis pipeline that allows for comparison of space-based measurements with ground-based disk-integrated RVs. Using high spatial resolution Dopplergrams, magnetograms, and continuum filtergrams from the Helioseismic and Magnetic Imager (HMI) aboard the Solar Dynamics Observatory (SDO), we estimate 'Sun-as-a-star' disk-integrated RVs due to rotationally modulated flux imbalances and convective blueshift suppression, as well as other observables such as unsigned magnetic flux. Comparing these measurements with ground-based RVs from the NEID instrument, which observes the Sun daily using an automated solar telescope, we find a strong relationship between magnetic activity indicators and RV variation, supporting efforts to examine unsigned magnetic flux as a proxy for stellar activity in slowly rotating stars. Detrending against measured unsigned magnetic flux allows us to improve the NEID RV measurements by ~20% (~50 cm/s in a quadrature sum), yielding an RMS scatter of ~60 cm/s over five months. We also explore correlations between individual and averaged spectral line shapes in the NEID spectra and SDO-derived magnetic activity indicators, motivating future studies of these observables. Finally, applying SolAster to archival planetary transits of Venus and Mercury, we demonstrate the ability to recover small amplitude (<50 cm/s) RV variations in the SDO data by directly measuring the Rossiter-McLaughlin (RM) signals.


INTRODUCTION
The field of exoplanet science has grown dramatically in popularity and fervor since the first confirmed exoplanet discovery around a Sun-like star (Mayor & Queloz 1995). This has led to significant advancements in both instrumentation and data analysis techniques as we push towards the detection of Earth-like planets. The radial velocity (RV) technique has been a cornerstone of exoplanet science since that first discovery, which was made using Doppler velocimetry, and is credited with over 1000 additional planet discoveries (Hatzes 2016; Fischer et al. 2016).
SolAster: https://tamarervin.github.io/SolAster/
The RV technique searches for periodic Doppler shifts in the host star's spectrum (Hatzes 2016). These periodic variations are driven by the presence of a planetary companion, whose gravitational pull induces a reflex motion in its host star and thereby shifts the star's spectral lines. As the field strives towards the detection of smaller, terrestrial-mass planets, improvements in RV measurement precision are required to push beyond the current ∼1 m s⁻¹ measurement floor. Recent advances in RV instrumentation have culminated in a new generation of Doppler measurement facilities, such as NEID (Schwab et al. 2016), ESPRESSO (Pepe et al. 2021), EXPRES (Jurgenson et al. 2016), HARPS3 (Thompson et al. 2016), and KPF (Gibson et al. 2018), which aim to reach the ∼30 cm s⁻¹ range. Detecting an Earth-like planet orbiting a Sun-like star requires further improvement, down to the 10 cm s⁻¹ level (Wright 2018; Fischer et al. 2016; Hatzes 2016).
The next challenge to improving detection sensitivity lies largely in the removal of stellar variability. Stellar variability produces noise at the m s⁻¹ level (Crass et al. 2021) that can often dominate measured RV variability (Saar & Donahue 1997). The signal from stellar activity can mask or even masquerade as planetary signals (Robertson et al. 2015; Wright 2018). Stellar activity signals arise from a combination of (super)granulation (Meunier et al. 2015; Dumusque et al. 2011), oscillations (Palle et al. 1995), meridional circulation (Meunier & Lagrange 2020), magnetic activity, and photospheric motion (Meunier et al. 2010a; Haywood et al. 2016; Crass et al. 2021). These phenomena are often periodic, aligning with the stellar rotation period and its harmonics, and can consequently be mistaken for planetary signals (Boisse et al. 2011). The lack of temporal stability across the stellar surface, coupled with inhomogeneous stellar intensity and differential rotation, makes it difficult to robustly disentangle stellar signals from planetary ones in disk-integrated spectra.
To improve our understanding of stellar activity and its effects in Sun-like stars, we turn to our closest Solar-type star: the Sun. Solar data products are available in enormous quantity. We also have established techniques that use high-cadence images of the solar surface to produce maps of solar velocity, intensity, and magnetic field strength. Together, these make the Sun an ideal candidate for studying activity-induced temporal variability (Pesnell et al. 2012; Scherrer et al. 2012). Using the Sun as a test case allows us to separate the various component velocities more cleanly. It also lets us analyze relationships between measured disk-integrated RV variations and calculated solar observables.
NASA's Helioseismic and Magnetic Imager (HMI), an instrument aboard the Solar Dynamics Observatory (SDO), was built as the successor to the Michelson Doppler Imager (MDI) to study the solar surface magnetic field. Launched in 2010, HMI continuously observes the Sun in the spectral region of the Fe I 6173Å line. It provides four high-resolution data products: line-of-sight magnetograms, vector magnetograms, continuum filtergrams, and Dopplergrams (Pesnell et al. 2012; Scherrer et al. 2012). These measurements of the magnetic field, continuum intensity, and velocity across the solar disk allow us to study the Sun's temporal variability and the effect these variations have on solar RVs (Pesnell et al. 2012; Scherrer et al. 2012; Haywood et al. 2016).
In this study, we develop an SDO/HMI data analysis pipeline to complement future extreme-precision RV (EPRV) studies of the Sun. The underlying technique was originally developed by Fligge et al. (2000). It has since been adapted by Meunier et al. (2010b), Haywood et al. (2016), and Milbourne et al. (2019) to extract disk-averaged quantities from spatially resolved solar observations.

Our publicly available Python pipeline, SolAster, uses data products from SDO/HMI to better characterize a suite of solar magnetic activity parameters. It also performs a simple decorrelation analysis on disk-integrated solar RV measurements, which are now available from a number of RV facilities. Two primary activity effects strongly impact the measured RV on timescales of days to months (Aigrain et al. 2012):
- the velocity variation due to the motion of sunspots and faculae across the rotating solar surface;
- the variation due to the suppression of the convective blueshift by active regions.

When linearly combined, these velocity components can be used to generate an independent estimate of the disk-integrated solar RV (Haywood et al. 2016). The individual velocity components serve as strong proxies for surface magnetic activity. They provide an independent window into the stellar surface that can aid in interpreting ground-based RV measurements (Haywood et al. 2016; Milbourne et al. 2019; Haywood et al. 2020). Additionally, from the SDO/HMI data we calculate an array of magnetic observables. These can be used to gauge how the size and intensity of active regions affect measured RV variations.
The paper is organized as follows. In Section 2, we describe our SDO/HMI data processing pipeline and analysis products, including the data correction process and the methodology for classifying different magnetically active regions. In Section 2.7 we discuss the calculation of the full 'Sun-as-a-star' RVs, outlining how each of the velocity components is independently calculated. We also describe the calculation of solar magnetic observables (Sections 3.3 and 3.4). We compare these results with our space-based measurements from HMI and with ground-based RV measurements from the NEID instrument (Section 4). Finally, in Section 5 we apply these calculation techniques to archival planetary transits. This highlights the precision of the reconstructed RVs delivered by the pipeline and demonstrates that magnetic variability can affect precision RV measurements at the level of tens of cm s⁻¹ over multi-hour timescales.

SolAster: AN SDO/HMI ANALYSIS PIPELINE
The plethora of data available from the Helioseismic and Magnetic Imager aboard SDO allows us to calculate space-based, 'Sun-as-a-star' radial velocity estimates that can be directly compared to ground-based measurements. Here we describe the underlying data products and techniques used to calculate various solar observables using SolAster.
Before computing the RVs from the SDO/HMI data, a number of data preparation steps are required. This study uses the following suite of SDO/HMI images (Pesnell et al. 2012; Scherrer et al. 2012; see Figure 1 for example images):
- wide-band continuum filtergrams (intensitygrams);
- line-of-sight longitudinal magnetic field measurements (magnetograms);
- maps of solar surface velocity (Dopplergrams).

These three data products provide the intensity, magnetic field strength, and velocity information needed for active regions to be detected, tracked, and accurately integrated into the full RV model. HMI data products are publicly available and can be queried from the data archive using SunPy, a community-developed Python package for solar data analysis (SunPy Community et al. 2020). In addition to archive querying capabilities, SunPy provides user-friendly methods for accessing and visualizing solar data.
Using SunPy, SolAster calculates a combination of velocities and magnetic observables from the SDO/HMI intensity, velocity, and magnetic field data. Photometric and convective velocity components are independently calculated and then linearly combined to generate 'Sun-as-a-star' RVs. Additionally, we calculate both unsigned magnetic flux and filling factor, which can be used to study the correlation between disk-integrated radial velocity and measures of magnetic activity. These magnetic observables are calculated contemporaneously with the RVs. They include unsigned flux and filling factor measurements specific to each relevant type of active region (plage, intranetwork, and sunspots). We also compute the flux due to convective regions and apply area cuts to study how large and small active regions differ in their effects on the modeled RVs.

Coordinate transformations
Before calculating the three-dimensional Heliocentric velocity, the data must be transformed from the Helioprojective Cartesian frame into the Heliographic Carrington frame. They must then be corrected for line-of-sight projections and for the relative position of the spacecraft. This transformation is based on the description in Thompson (2006). It is necessary to ensure the images are centered on the solar surface and independent of the Carrington rotation cycle, the 25.38 day solar sidereal rotation period (Carrington 1859).

To transform coordinate systems, we rotate the image grid from Cartesian pixel coordinates to Heliographic Carrington coordinates. We do this by building a rotation matrix calculated from the reference coordinates listed in each image's FITS header (Ulrich & Boyden 2006). Each pixel's location in the Heliographic Carrington frame is specified by (w_ij, n_ij, r_ij), denoting the distance westward, northward, and radially outward from disk center, respectively. This coordinate system fixes the image onto the solar surface, so the position of the spacecraft relative to the Sun can be specified using only the radial coordinate.

Additionally, we construct an array of µ = cos θ values for each pixel in each image, which describes the position of the pixel relative to disk center. Flux values for pixels with µ below 0.3 are set to zero in all images. The limb-darkening model is often unreliable far from disk center (Haywood et al. 2016), and projection effects can cause non-physical fluctuations in the measured values.
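The geometry step can be illustrated with a small NumPy sketch; it is not SolAster's implementation. It assumes an orthographic projection (observer at infinity), in which the radial coordinate of a visible surface point equals µ = cos θ. It also assumes the disk center and radius in pixels are known, as they would be from the FITS header. The function names `disk_geometry` and `mask_low_mu` are hypothetical.

```python
import numpy as np

def disk_geometry(n_pix, cx, cy, r_pix):
    """Approximate (w, n, r) and mu for each pixel of an n_pix x n_pix image.

    Assumes an orthographic projection, north up and west to the right,
    with the row index increasing northward. cx, cy, r_pix (disk centre
    and radius in pixels) are hypothetical inputs normally read from the
    FITS header. Off-disk pixels get NaN for r and mu.
    """
    y, x = np.mgrid[0:n_pix, 0:n_pix]
    w = (x - cx) / r_pix                  # westward, in solar radii
    n = (y - cy) / r_pix                  # northward, in solar radii
    rho2 = w**2 + n**2
    on_disk = rho2 < 1.0
    r = np.where(on_disk, np.sqrt(np.clip(1.0 - rho2, 0.0, None)), np.nan)
    mu = r.copy()                         # orthographic approximation: mu == r
    return w, n, r, mu

def mask_low_mu(image, mu, mu_min=0.3):
    """Zero pixels that are off the disk or have mu below mu_min."""
    good = np.isfinite(mu) & (mu >= mu_min)
    return np.where(good, image, 0.0)
```

A full treatment would use the header's observer distance and roll angle instead, for example through SunPy's coordinate frames. The orthographic shortcut is only meant to illustrate the µ < 0.3 cut.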

Spacecraft Velocity Correction
To isolate the solar velocity component due to magnetic activity alone, we first corrected the Dopplergrams for the relative motion of the spacecraft.
We corrected the Dopplergrams for the motion of the spacecraft relative to the Sun by building a pixel-wise map of the relative spacecraft velocity. The w, n, and r components of the relative spacecraft velocity are read from the FITS headers, with a quoted precision of 0.01 m s⁻¹ (Hoeksema et al. 2018). We then calculated the position of the spacecraft relative to each pixel ij. Combined with the velocity components from the FITS header, this position determines the required velocity correction. After the coordinate transformation, the spacecraft is located at position (0, 0, r_sc), where r_sc is the radial position of the spacecraft in units of the solar radius. It is obtained by dividing the observer-Sun distance by the solar radius. Both values are found in the FITS header of all SDO/HMI images, under the keywords DSUN_OBS and RSUN_REF.
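The (w, n, r) components of the relative spacecraft velocity are found in the FITS header of the Dopplergram files (OBS_VW, OBS_VN, and OBS_VR, respectively). We then project these components onto the line of sight so that each pixel ij has a velocity correction of (Haywood et al. 2016):
$$ v_{sc,ij} = -\frac{\delta w_{ij}\, v_{w} + \delta n_{ij}\, v_{n} + \delta r_{ij}\, v_{r}}{d_{ij}}, \qquad d_{ij} = \sqrt{\delta w_{ij}^{2} + \delta n_{ij}^{2} + \delta r_{ij}^{2}} $$
Here δw_ij = w_ij, δn_ij = n_ij, and δr_ij = r_ij − r_sc are the offsets between pixel ij and the spacecraft at (0, 0, r_sc), and d_ij is the distance between the spacecraft and pixel ij.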
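The sketch below illustrates the spacecraft-velocity correction by projecting the spacecraft velocity onto the line of sight to each pixel. It assumes the Haywood et al. (2016) form as we understand it, v_sc,ij = −(δw_ij v_w + δn_ij v_n + δr_ij v_r)/d_ij. The spacecraft sits at (0, 0, r_sc), with r_sc = DSUN_OBS/RSUN_REF. The function name and argument layout are hypothetical, and the sign convention should be checked against the HMI header documentation before use.

```python
import numpy as np

def spacecraft_velocity_map(w, n, r, vw, vn, vr, dsun_obs, rsun_ref):
    """Line-of-sight projection of the spacecraft velocity for each pixel.

    w, n, r      : pixel coordinates in solar radii (Heliographic frame)
    vw, vn, vr   : spacecraft velocity components in m/s, e.g. from the
                   OBS_VW, OBS_VN and OBS_VR header keywords
    dsun_obs, rsun_ref : observer-Sun distance and solar radius in metres,
                   e.g. from the DSUN_OBS and RSUN_REF header keywords
    Assumed sign convention: v = -(dw*vw + dn*vn + dr*vr) / d.
    """
    r_sc = dsun_obs / rsun_ref            # spacecraft at (0, 0, r_sc)
    dw = w - 0.0
    dn = n - 0.0
    dr = r - r_sc
    d = np.sqrt(dw**2 + dn**2 + dr**2)    # pixel-to-spacecraft distance
    return -(dw * vw + dn * vn + dr * vr) / d
```

At disk center the offset vector points straight at the spacecraft, so the correction reduces to the radial component v_r. This gives a quick sanity check.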

Solar Rotational Velocity Correction
Next, we turn our attention to the differential rotation of the solar disk, which must be accounted for when correcting the measured Doppler maps. Differential rotation is the result of turbulent motion and convective activity due to temperature gradients permeating outwards from the stellar core (Schou et al. 1998). This produces a latitude-dependent rotation profile in which the surface rotation rate is greatest at the equator (φ = 0°) and decreases toward the poles (Schröter 1985). The angular velocity due to rotation in the photospheric layer ranges from 14.1 to 14.4 deg day⁻¹ at the equator to 10.07 deg day⁻¹ at the poles (Snodgrass 1984). The sidereal rotation period for a Carrington rotation is 25.38 days. This corresponds to the rotation rate at a latitude of 26°, where sunspots are most often found. This rotation period is accounted for in our coordinate transformation to the Heliographic Carrington frame (Thompson 2006).

Snodgrass & Ulrich (1990) used full-disk magnetograms and Dopplergrams from the Mount Wilson Observatory to track magnetic features on the solar surface over time and build a model of the solar differential rotation profile. They determined three constants, α1, α2, and α3, all in units of deg day⁻¹. The differential rotation profile is parameterized as:
$$ \omega(\phi) = \alpha_1 - \alpha_2 \sin^2\phi - \alpha_3 \sin^4\phi $$
Using this parameterization with coefficients 14.713, 2.293, and 1.787 for α1, α2, and α3, respectively, we calculate our differential rotation profile. We project it onto the solar disk to build a map of solar rotational velocity at each latitude. We then project this profile into the Heliographic Carrington frame to determine the rotational velocity component at each pixel. Finally, we calculate the full rotational velocity array, v_rot,ij, following the methods of Haywood et al. (2016) and Milbourne et al. (2019).
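The rotation law can be evaluated directly. This sketch writes the profile as ω(φ) = α1 − α2 sin²φ − α3 sin⁴φ, so that the positive coefficients quoted above give a rate that peaks at the equator. It then converts the rate to a westward surface speed ω R⊙ cos φ. It returns sidereal values only: it does not subtract the Carrington frame rate or project onto the line of sight. The function names are hypothetical.

```python
import numpy as np

R_SUN_M = 6.957e8          # IAU 2015 nominal solar radius, metres
SECONDS_PER_DAY = 86400.0

def omega_deg_per_day(lat_deg, a1=14.713, a2=2.293, a3=1.787):
    """Sidereal angular rotation rate (deg/day) as a function of latitude.

    Uses omega = a1 - a2*sin^2(lat) - a3*sin^4(lat); the explicit minus
    signs are an assumption that makes the rate peak at the equator.
    """
    s2 = np.sin(np.radians(lat_deg)) ** 2
    return a1 - a2 * s2 - a3 * s2**2

def rotation_speed_m_s(lat_deg, **coeffs):
    """Westward surface speed in m/s: omega * R_sun * cos(latitude)."""
    omega_rad_s = np.radians(omega_deg_per_day(lat_deg, **coeffs)) / SECONDS_PER_DAY
    return omega_rad_s * R_SUN_M * np.cos(np.radians(lat_deg))
```

With these coefficients the equatorial speed comes out near 2 km s⁻¹, the familiar scale of solar equatorial rotation.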

Foreshortening Correction
The line-of-sight magnetograms measure the longitudinal surface magnetic field. Foreshortening, the geometric projection of the surface onto the image plane, reduces the observed spatial resolution with increasing distance from disk center. It must be accounted for when estimating the true magnetic flux (Zhao et al. 2016). The measured field is smaller than the true radial solar magnetic field by a factor of µ = cos θ, where θ is the center-to-limb angle (Zhao et al. 2016). To calculate the true field strength, we divide the observed field (B_obs) by µ to recover the full radial field:
$$ B_{r,ij} = \frac{B_{obs,ij}}{\mu_{ij}} $$
Additionally, we set all pixels with magnetic field strengths below the noise threshold (σ_B_obs,ij) to zero to account for instrument noise, as described in Yeo et al. (2013). The noise does increase with angle from disk center, but we adopt a constant minimum noise threshold of 8 G based on Yeo et al. (2013). Therefore, pixels with longitudinal magnetic field strengths (|B_obs,ij|) below 8 G are set to 0 for both B_obs,ij and B_r,ij. This ensures our magnetic measurements are not contaminated by instrument noise, which would otherwise propagate through many aspects of the analysis pipeline.
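Below is a minimal sketch of the foreshortening correction and the 8 G noise cut. It assumes B_obs and µ are NumPy arrays on the same pixel grid. Setting off-disk or invalid-µ pixels to zero is a choice made for the sketch, and `radial_field` is a hypothetical name.

```python
import numpy as np

def radial_field(b_obs, mu, noise_gauss=8.0):
    """Return (B_obs, B_r) with B_r = B_obs / mu and a noise cut applied.

    Pixels with |B_obs| below noise_gauss are zeroed in both arrays.
    Pixels whose mu is non-finite or non-positive are also returned as
    zero (an assumption of this sketch).
    """
    b_obs = np.asarray(b_obs, dtype=float)
    b_obs = np.where(np.abs(b_obs) < noise_gauss, 0.0, b_obs)
    valid = np.isfinite(mu) & (mu > 0)
    safe_mu = np.where(valid, mu, 1.0)         # avoid dividing by 0 or NaN
    b_r = np.where(valid, b_obs / safe_mu, 0.0)
    b_obs = np.where(valid, b_obs, 0.0)
    return b_obs, b_r
```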

Limb Darkening Correction
Just as foreshortening affects the HMI magnetograms, limb darkening affects the continuum images. We correct for this using a static fifth-order, circularly symmetric polynomial brightness function (L_ij), with coefficients determined empirically by Allen (1973). This polynomial produces a pixel-wise array of correction values, and the base intensity image is divided by these correction factors.
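The flattening step amounts to evaluating a polynomial in µ and dividing the continuum image by it. The sketch below takes the polynomial coefficients as an input. The coefficients in the test describe a simple linear limb-darkening law chosen only for illustration; they are not the Allen (1973) values used in the pipeline. Function names are hypothetical.

```python
import numpy as np

def limb_darkening_poly(mu, coeffs):
    """Evaluate L(mu) = sum_k coeffs[k] * mu**k (fifth order for 6 coeffs)."""
    mu = np.asarray(mu, dtype=float)
    return sum(c * mu**k for k, c in enumerate(coeffs))

def flatten_intensity(intensity, mu, coeffs):
    """Divide the continuum image by the limb-darkening polynomial.

    Pixels where L is not positive (or mu is NaN) are set to zero.
    """
    L = limb_darkening_poly(mu, coeffs)
    good = L > 0
    return np.where(good, intensity / np.where(good, L, 1.0), 0.0)
```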
The flattened intensity image can now be used to classify bright and dark regions (Figure 1), which are separated via thresholding.

Region Identification
The underlying assumption in our space-based RV calculation is that magnetic activity is the primary driver of bulk RV variability in the Sun. We identify magnetically active regions, and differentiate between regions of bright faculae and dark sunspots, to distinguish the impact of different types of magnetic activity on RVs.
Active regions are detected using the thresholding scheme described in Yeo et al. (2013). Regions above the threshold are marked as 'active', and regions below it are stored as 'quiet-Sun' pixels. Additionally, as in Haywood et al. (2016) and Milbourne et al. (2019), we remove pixels near the solar limb (µ < 0.1) and ignore pixels with µ below 0.3, since the limb-darkening model is often flawed near the limb. We apply the same magnetic threshold as Yeo et al. (2013): a pixel is considered active if its unsigned radial magnetic field strength exceeds three times the noise level:
$$ |B_{r,ij}| > B_{r,thresh,ij} = \frac{3\,\sigma_{B_{obs},ij}}{\mu_{ij}} $$
Here σ_B_obs,ij is the magnetic noise level of 8 G from Yeo et al. (2013). We set any isolated active pixels (those with no neighboring active pixels) to 0 ('quiet-Sun'). Such pixels are often misidentified as sunspots and may instead be instrumental artifacts.
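The magnetic thresholding and isolated-pixel rejection can be sketched in NumPy as follows. The sketch assumes a µ-dependent threshold of 3σ/µ on |B_r|, since the noise in B_r = B_obs/µ scales as σ/µ. It also treats "neighboring" as the 8 surrounding pixels. Both are assumptions, and `active_mask` is a hypothetical name.

```python
import numpy as np

def active_mask(b_r, mu, sigma=8.0, mu_min=0.3):
    """Flag pixels with |B_r| > 3*sigma/mu as active, then drop isolated ones."""
    valid = np.isfinite(mu) & (mu >= mu_min)
    safe_mu = np.where(valid, mu, 1.0)
    active = valid & (np.abs(b_r) > 3.0 * sigma / safe_mu)

    # Count active 8-connected neighbours by shifting a zero-padded copy.
    padded = np.pad(active.astype(int), 1)
    ny, nx = active.shape
    neighbours = np.zeros_like(active, dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbours += padded[1 + dy:1 + dy + ny, 1 + dx:1 + dx + nx]

    # Keep only active pixels with at least one active neighbour.
    return active & (neighbours > 0)
```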
Once active regions are identified (Fig 1), we apply intensity thresholding to differentiate between faculae and sunspot regions. As with the magnetic thresholding, we base our intensity thresholding on values determined by Yeo et al. (2013) and used by Haywood et al. (2016) and Milbourne et al. (2019). The threshold is set relative to the quiet-Sun intensity. Pixels with flattened intensity above the threshold are classified as faculae, and those below it as sunspots.

The quiet-Sun intensity, Î_quiet, is the mean flattened intensity of quiet-Sun pixels. It is computed by weighting the flattened intensity with a binary array based on the magnetic thresholding:
$$ \hat{I}_{quiet} = \frac{\sum_{ij} I_{flat,ij}\, W_{ij}}{\sum_{ij} W_{ij}} $$
where W_ij is set to 1 for quiet-Sun pixels (|B_r,ij| < |B_r,thresh,ij|) and 0 for active pixels.
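Given an active-pixel mask, the faculae/sunspot split and the quiet-Sun reference intensity Î_quiet can be sketched as below. The threshold is written as a fraction `frac` of Î_quiet. `frac` is a hypothetical parameter whose value should come from Yeo et al. (2013), not from the illustrative value in the test. The pixel-count filling factors returned at the end are likewise a simple definition assumed here.

```python
import numpy as np

def classify_regions(i_flat, active, on_disk, frac):
    """Split active pixels into faculae and sunspots.

    i_flat  : limb-darkening-flattened intensity
    active  : boolean mask of magnetically active pixels
    on_disk : boolean mask of usable pixels (e.g. mu >= 0.3)
    frac    : hypothetical parameter; the threshold is frac * I_quiet
    """
    w_quiet = (on_disk & ~active).astype(float)      # W_ij = 1 for quiet Sun
    i_quiet = np.sum(i_flat * w_quiet) / np.sum(w_quiet)
    threshold = frac * i_quiet
    faculae = active & on_disk & (i_flat > threshold)
    spots = active & on_disk & (i_flat <= threshold)
    # Pixel-count filling factors (a simple definition assumed here).
    n_disk = np.count_nonzero(on_disk)
    f_fac = np.count_nonzero(faculae) / n_disk
    f_spot = np.count_nonzero(spots) / n_disk
    return i_quiet, faculae, spots, f_fac, f_spot
```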

Radial Velocity Calculation
Following Haywood et al. (2016), we parameterize the full, disk-integrated solar radial velocity as a linear combination of contributions from the quiet Sun and from active regions. Active regions produce RV variations through two primary mechanisms: the photometric effect and the convective effect. The quiet-Sun RVs, meanwhile, are primarily driven by granulation.

Photometric Contribution
The photometric velocity traces the rotational Doppler imbalance caused by bright faculae and dark sunspots. Bright and dark active regions produce an inhomogeneity across the solar disk, altering the balance between the redshifted and blueshifted hemispheres. This leads to RV shifts of up to several percent, depending on spot size and stellar activity level (Saar & Donahue 1997). Sunspots are generally the dominant source of variability in the photometric RV signal, although plage regions also contribute (Lagrange et al. 2010). We capture the photometric effect of these two features in Δv_phot, calculated following the methodology outlined in Haywood et al. (2016) and Milbourne et al. (2019).

In this calculation, K̂ is a scaling factor based on the limb-darkening correction polynomial. I_ij − K̂L_ij is the intensity map corrected for limb darkening, as seen in the top left panel of Figure 1. An example of the W_ij weighting array can be seen in the bottom right panel of Figure 1, where W_ij = 0 for quiet-Sun pixels and W_ij = 1 for active pixels.
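The sketch below makes the structure of the photometric term concrete. It assumes Δv_phot = Σ v_rot,ij (I_ij − K̂L_ij) W_ij / Σ I_ij, with W_ij = 1 for active pixels. K̂ is chosen as the least-squares scale that best matches K̂L_ij to the quiet-Sun intensity. This is our reading of the Haywood et al. (2016) approach, not a verbatim transcription, and the function names are hypothetical.

```python
import numpy as np

def k_hat(intensity, L, quiet):
    """Least-squares scale K minimising sum over quiet pixels of (I - K*L)^2."""
    q = quiet.astype(float)
    return np.sum(intensity * L * q) / np.sum(L**2 * q)

def photometric_velocity(v_rot, intensity, L, active, quiet):
    """Rotational velocity weighted by the intensity residual of active pixels."""
    k = k_hat(intensity, L, quiet)
    resid = (intensity - k * L) * active.astype(float)   # W_ij = 1 if active
    return np.sum(v_rot * resid) / np.sum(intensity)
```

In the toy test, a dark spot on the receding (redshifted) side removes redshifted light. The photometric velocity therefore comes out negative.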
The velocity perturbations from faculae and sunspots are approximately anti-correlated because of their opposite intensity contrasts (bright versus dark). When calculating the photometric velocity component, we find that it is almost entirely driven by sunspots (see Figure 6), corroborating the results of Meunier et al. (2010a). This is likely due to the Sun's geometric configuration and the ratio of bright to dark regions at the time. It may therefore not hold for other stars with different filling factors and distributions of bright and dark regions.

Convective Contribution
In active magnetic regions, velocity amplitudes and surface areas are distributed differently between the upward and downward flows of solar granulation (Dravins 1990). In the photosphere, active magnetic regions inhibit the granular convective motions of the quiet Sun, and these convective motions manifest as wavelength shifts of photospheric lines (Dravins et al. 1981). The photometric velocity variation is driven by sunspots. The convective velocity variation, in contrast, is driven by larger, brighter facular regions, which therefore drive the overall RV signal.
SDO/HMI images can resolve these granules, allowing us to calculate the velocity contribution specifically due to suppression of the convective blueshift. In a convective cell, bright, hot plasma rises at the cell's center, and darker, cooler plasma sinks along its edges (the intergranular lanes). Because the bright upflows contribute more light, the result is a net convective blueshift on the order of 0.5 km s⁻¹ (Dravins et al. 1981). This effect weakens toward the limb, where the line of sight primarily samples horizontal flows (Dravins 1990).

The suppression of the convective blueshift varies with line depth (Gray 2009), so we expect different convective velocity shifts across different spectral lines. For this reason, we do not expect perfect correlation between NEID observations and the space-based convective velocity. The ground-based RV measurements use thousands of spectral features, while SDO/HMI observes velocities only in the magnetically sensitive Fe I 6173.3Å line. We use linear regression to scale the SDO/HMI-derived convective velocity to account for this difference (see Section 2.10).
The convective velocity is calculated by taking the disk-averaged Doppler velocity, v̂, and subtracting from it the disk-averaged quiet-Sun velocity, v̂_quiet:
$$ \Delta v_{conv} = \hat{v} - \hat{v}_{quiet} $$
We subtract the quiet-Sun velocity because SDO/HMI Dopplergrams are neither well calibrated nor stable over long timescales (Haywood et al. 2020).
We calculate the disk-averaged velocity following the methodology of Haywood et al. (2016) and Milbourne et al. (2019):
$$ \hat{v} = \frac{\sum_{ij} (v_{ij} - v_{sc,ij} - v_{rot,ij})\, I_{ij}}{\sum_{ij} I_{ij}} $$
where v_sc,ij is the relative spacecraft velocity and v_rot,ij is the solar rotational velocity. An example of the corrected Doppler velocity (v_ij − v_sc,ij − v_rot,ij) can be seen in the top right panel of Figure 1. The quiet-Sun velocity is calculated from the corrected Doppler velocity, weighted by the intensity of quiet-Sun pixels:
$$ \hat{v}_{quiet} = \frac{\sum_{ij} (v_{ij} - v_{sc,ij} - v_{rot,ij})\, I_{ij} W_{ij}}{\sum_{ij} I_{ij} W_{ij}} $$
where W_ij is the magnetic weighting array.
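The two weighted averages and their difference translate directly into NumPy. This sketch assumes all inputs are arrays on the same grid with low-µ pixels already zeroed, and that `quiet` is a boolean quiet-Sun mask. `convective_velocity` is a hypothetical name.

```python
import numpy as np

def convective_velocity(v, v_sc, v_rot, intensity, quiet):
    """Delta v_conv: intensity-weighted disk mean minus quiet-Sun mean."""
    v_corr = v - v_sc - v_rot                 # corrected Doppler velocity
    I = intensity
    W = quiet.astype(float)                   # W_ij = 1 for quiet-Sun pixels
    v_disk = np.sum(v_corr * I) / np.sum(I)
    v_quiet = np.sum(v_corr * I * W) / np.sum(I * W)
    return v_disk - v_quiet
```

Consider a toy case where three quiet pixels are blueshifted by 300 m s⁻¹ and one active pixel has its blueshift fully suppressed. Δv_conv then comes out at +75 m s⁻¹: the suppression appears as a net redshift.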

RV Reconstruction from Velocity Features
To estimate the full disk-integrated SDO/HMI space-based radial velocities, we follow the methodology outlined in Milbourne et al. (2019) and Haywood et al. (2020), adapted from Haywood et al. (2016). We build a model radial velocity variation, ΔRV_model, assuming a linear combination of Δv_conv and Δv_phot:
$$ \Delta RV_{model} = A\,\Delta v_{phot} + B\,\Delta v_{conv} + RV_0 $$
where A and B are independent scaling factors and RV_0 is the relative RV offset parameter. Similar to Milbourne et al. (2019), these coefficients are determined by linear least-squares optimization against the ground-based RV measurements. This assumes the two RV components are orthogonal and do not have correlated noise. The scaling factors account for the systematic differences between observations taken with SDO/HMI in a single line (λ = 6173.3Å) and ground-based spectra using thousands of lines.
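The coefficient fit is an ordinary linear least-squares problem. The sketch below uses `numpy.linalg.lstsq` with a design matrix of the two velocity components plus a constant column. It assumes the three series are already aligned in time, and `fit_rv_model` is a hypothetical name.

```python
import numpy as np

def fit_rv_model(dv_phot, dv_conv, rv_obs):
    """Least-squares fit of rv_obs ~ A*dv_phot + B*dv_conv + RV0."""
    dv_phot = np.asarray(dv_phot, dtype=float)
    design = np.column_stack([dv_phot, dv_conv, np.ones_like(dv_phot)])
    (a, b, rv0), *_ = np.linalg.lstsq(design, rv_obs, rcond=None)
    return a, b, rv0
```

Fitting all three parameters at once, rather than regressing each component separately, lets the offset absorb any constant difference between the space- and ground-based velocity zero points.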
Validation of SolAster using previously published SDO/HMI measurements
Our methodology for calculating the velocity components and full model RVs is based on the methods outlined in Haywood et al. (2016) and Milbourne et al. (2019). We therefore analyzed the same solar data used in these studies to verify our pipeline's performance.
We use our pipeline to calculate RVs for the time frame in Milbourne et al. (2019)

COMPARISON OF SDO/HMI-DERIVED OBSERVABLES WITH GROUND-BASED MEASUREMENTS
Our ultimate goal in calculating these solar observables is to gain insight into the physical mechanisms driving the RV variability seen in ground-based Doppler measurements. We use data from the recently commissioned NEID instrument (Schwab et al. 2016), which has a dedicated solar feed that delivers disk-integrated sunlight to the RV spectrometer (Lin et al. 2021, submitted). NEID records high signal-to-noise (SNR ∼ 600) spectra every ∼90 seconds throughout the day. NEID spectra are reduced using the standard NEID pipeline. This pipeline delivers both integrated RVs and cross-correlation functions (CCFs) for each frame recorded throughout the day. We selected days with low cloud coverage and good instrumental drift correction. Cloud coverage was assessed using data from the pyrheliometer atop the NEID solar telescope (Lin et al. 2021, submitted).
Using our SDO analysis pipeline, we then computed the component RVs (Δv_phot and Δv_conv) during periods when NEID spectra were being collected. The individual amplitudes of Δv_phot and Δv_conv are significantly lower for the period of NEID data collection (December 2020 to May 2021) than the values shown in Haywood et al. (2016) and Milbourne et al. (2019). This reflects the low level of magnetic activity over the period during which NEID has been observing the Sun (Table 1).
We find that the resultant 'Sun-as-a-star' SDO /HMI computed RVs are largely dominated by the convective velocity signal, likely due to the low level of surface features (spots, plages, etc.) on the solar surface during the period of analysis. This is further highlighted when comparing the modeled RVs and convective velocity component with unsigned flux and filling factor, which are all strongly correlated (Figure 3).

Comparison of SDO Model RVs with NEID Ground-Based RVs
We then compared the SDO/HMI measurements to NEID ground-based RVs collected between December 2020 and May 2021 (the nominal NEID commissioning period). Using these data, we refit the linear coefficients in equation 16 for the convective and photometric velocity components. Table 2 lists our derived scaling factors using the NEID RVs for calibration, alongside the scaling factors derived in Haywood et al. (2016) and Milbourne et al. (2019).

Both scaling factors relate a measurement taken in a single line (SDO/HMI data) to ground-based RVs measured across many lines. We expect these factors to change over time due to fluctuations in spot coverage and activity level (Milbourne et al. 2019). Scaling factor A traces the contribution of rotating active regions (primarily spots) to the bulk RVs. Our time frame of interest had consistently low spot coverage. This led to a very low amplitude for the photometric component, and thus a large uncertainty in scaling factor A (see Table 2). Scaling factor B measures the systematic difference in the convective blueshift due to varying spectral line formation depths.
As established by Meunier et al. (2010a), Haywood et al. (2016), and Milbourne et al. (2019), we expect the suppression of the convective blueshift to dominate the overall RV. This is consistent with our measurements during this time period (see Table 1). Our ground-based RVs from NEID show no measurable correlation with the photometric velocity signals calculated from the SDO/HMI images (Figure 3). This is consistent with the current phase of solar activity (near minimum) and in agreement with Lagrange et al. (2010). The lack of sunspots during the NEID commissioning period leads to low variability in the photometric velocity. This further minimizes the contribution of the photometric component to the overall model RVs (ΔRV_model), as reflected in the low value of scaling factor A in Table 2.
We are able to improve our results by restricting our analysis to days with the most reliable ground-based observations. These dates have both excellent observing weather (very low or no cloud coverage) and verified wavelength solutions from the NEID laser frequency comb. For these dates, we see a very strong correlation between our ground-based RV measurements and both the model RVs and unsigned magnetic flux. We find that unsigned magnetic flux is a strong proxy for RV variation, supporting the conclusion of Haywood et al. (2020).

Active Region Area Dependence
Table caption: … (2019) using HARPS-N. In the period studied here, the overall activity level and the amplitudes of the velocity components were lower, as was the rms amplitude of the RVs (see Table 1). We see a ∼20% difference in the A and B parameter values between the different time periods. This discrepancy is likely due to the significantly lower level of activity in our recent data from 2020-2021.

In addition to comparing ΔRV_model with the convective velocity and ground-based measurements, we examine how the suppression of the convective blueshift depends on active region area. To separate large and small convective regions, we build an area plot as a function of latitude, similar to Milbourne et al. (2019), and use 20 µHem as our area threshold. We find network regions below the cutoff across the whole solar disk. Larger plage and spot regions, by contrast, lie near the solar equator, within ±30° latitude. Milbourne et al. (2019) implemented the same area cutoff and found a similar latitude range for large active regions, 0.75 < sin Φ ≤ 1.0, where Φ is the active region co-latitude.

We then compute Δv_conv for the full time series separately from the small network regions and from the large plage/spot regions, and compare each with RV_NEID. In support of Milbourne et al. (2019), we find that large plage and spot regions drive the RV variability more than smaller network regions (Figure 6). Additionally, the power spectral density (PSD) of the small-region contribution shows no significant power on rotation timescales, the timescales relevant to this study (Figure 4). This is similar to the results of Milbourne et al. (2019). Large active regions (plage/spots), on the other hand, show rotational modulation (Figure 4).

The convective velocity component is calculated from the corrected Doppler map, with the rotational velocity removed. It nonetheless still shows periodic structure on rotation timescales (see Figures 2 & 3). Large active magnetic regions (plage, spots) are expected to drive variability on shorter timescales, such as the period in this study. The network, in contrast, is expected to have more impact on longer timescales, such as a solar cycle (Milbourne et al. 2019).
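The 20 µHem area cut can be sketched by labelling connected groups of active pixels and summing their foreshortening-corrected areas. Several choices here are assumptions rather than the pipeline's exact procedure:
- each pixel's surface area is approximated as (pixel size)²/µ in units of R⊙²;
- one micro-hemisphere is 10⁻⁶ × 2πR⊙²;
- regions are 4-connected.

The function names are also hypothetical. A plain breadth-first search keeps the example dependency-free, although `scipy.ndimage.label` would normally be used.

```python
import numpy as np
from collections import deque

def label_regions(mask):
    """4-connected component labelling via breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    ny, nx = mask.shape
    for y0 in range(ny):
        for x0 in range(nx):
            if mask[y0, x0] and labels[y0, x0] == 0:
                current += 1
                labels[y0, x0] = current
                queue = deque([(y0, x0)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and mask[yy, xx] and labels[yy, xx] == 0):
                            labels[yy, xx] = current
                            queue.append((yy, xx))
    return labels, current

def split_by_area(mask, mu, pix_rsun, cutoff_uhem=20.0):
    """Split active pixels into small (< cutoff) and large regions.

    pix_rsun is the pixel size in solar radii; the true area of a pixel is
    approximated as pix_rsun**2 / mu and converted to micro-hemispheres.
    """
    labels, n = label_regions(mask)
    small = np.zeros(mask.shape, dtype=bool)
    for k in range(1, n + 1):
        region = labels == k
        area = np.sum(1e6 * pix_rsun**2 / (2.0 * np.pi * mu[region]))
        if area < cutoff_uhem:
            small |= region
    large = mask & ~small
    return small, large
```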

Unsigned Magnetic Flux
Unsigned magnetic flux, |B_obs|, has been shown to be a valuable proxy for stellar activity in the Sun (Haywood et al. 2020). Our pipeline calculates the unsigned magnetic flux following the methodology described in Haywood et al. (2016) and Haywood et al. (2020), which yields a high signal-to-noise, independent activity metric to guide our analysis of ground-based RVs. Figure 3 shows the calculated |B_obs| time series over the full NEID commissioning period. |B_obs| is determined by performing an intensity-weighted sum over the pixels of the magnetogram:

$$|B_{\rm obs}| = \frac{\sum_{ij} |B_{{\rm obs},ij}|\, I_{ij}}{\sum_{ij} I_{ij}},$$

where B_obs,ij and I_ij are the line-of-sight magnetic field and continuum intensity of pixel ij. For our five-month span (December 2020 to May 2021) of 'best' weather dates, defined as days with no measurable cloud cover and minimal extinction at Kitt Peak, we find a Spearman correlation coefficient of 0.43 between |B_obs| and the measured NEID RVs, and of 0.29 between the measured NEID RVs and the model RVs. We see a strong correlation (∼0.90) between |B_obs| and both the space-based RVs and ∆v_conv throughout the entire time span, emphasizing the link between magnetic activity and the suppression of the convective blueshift. This is expected based on previous works such as Meunier. These observations were taken during the solar minimum of Solar Cycle 25, so there was little to no sunspot activity on the solar surface. The convective velocity is primarily driven by large magnetic structures, specifically plage regions. We see strong correlations between the plage filling factor, convective velocity, and unsigned flux. Additionally, we find that the unsigned flux due to active regions correlates strongly with the SDO/HMI model RVs.
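The intensity-weighted sum for |B_obs| and the magnetic filling factor (discussed in the Filling Factor subsection) can both be sketched from co-aligned HMI maps. This is an illustrative sketch, not SolAster's code. The inputs (a line-of-sight magnetogram `b_obs` in G, a flattened continuum intensity map, and μ per pixel) are assumed to be co-aligned arrays with NaN off-disk. The thresholds (24 G on the foreshortening-corrected field, spots darker than 0.89 of the quiet-Sun intensity, μ > 0.1) are assumed values exposed as parameters, and the function names are hypothetical.

```python
import numpy as np


def unsigned_flux(b_obs, intensity):
    """Disk-averaged unsigned line-of-sight flux, weighted by continuum intensity."""
    b_obs = np.asarray(b_obs, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    on_disk = np.isfinite(b_obs) & np.isfinite(intensity)
    weighted = np.abs(b_obs[on_disk]) * intensity[on_disk]
    return weighted.sum() / intensity[on_disk].sum()


def filling_factors(b_obs, mu, intensity_flat, quiet_intensity,
                    b_thresh=24.0, spot_frac=0.89, mu_min=0.1):
    """Total, facular and spot filling factors (fractions of on-disk pixels).

    Threshold values are illustrative assumptions, not SolAster defaults.
    """
    b_obs = np.asarray(b_obs, dtype=float)
    mu = np.asarray(mu, dtype=float)
    intensity_flat = np.asarray(intensity_flat, dtype=float)
    on_disk = np.isfinite(b_obs) & np.isfinite(mu) & (mu > mu_min)
    # radial-field estimate B_r = B_obs / mu, evaluated only on the disk
    b_r = np.divide(b_obs, mu, out=np.full_like(b_obs, np.nan), where=on_disk)
    active = on_disk & (np.abs(b_r) > b_thresh)  # pixels with W_ij = 1
    spot = active & (intensity_flat < spot_frac * quiet_intensity)
    faculae = active & ~spot
    n_pix = on_disk.sum()
    return active.sum() / n_pix, faculae.sum() / n_pix, spot.sum() / n_pix
```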
Performing a linear regression of the NEID RVs against the measured unsigned magnetic flux, the NEID RV RMS decreases from ∼80 cm s⁻¹ to ∼60 cm s⁻¹ over the five-month span (Figure 5), an improvement of ∼50 cm s⁻¹ in a quadrature-sum sense. This simple activity decorrelation suggests that unsigned magnetic flux holds much promise for reducing RV variability, and we await additional observations to verify that this correlation remains consistent as the Sun enters a more active phase of its magnetic cycle.
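The linear decorrelation and the quadrature-sum bookkeeping can be sketched as follows. This is a minimal illustration that assumes daily-binned RVs and |B_obs| values on matching dates; the function names are hypothetical. For an RMS that drops from ∼80 to ∼60 cm s⁻¹, the removed component is √(80² − 60²) ≈ 53 cm s⁻¹, which corresponds to the ∼50 cm s⁻¹ improvement quoted in the text.

```python
import numpy as np


def detrend_against_flux(rv, flux):
    """Subtract a least-squares linear fit of RV against unsigned flux."""
    rv = np.asarray(rv, dtype=float)
    flux = np.asarray(flux, dtype=float)
    slope, intercept = np.polyfit(flux, rv, 1)
    return rv - (slope * flux + intercept), slope


def rms_scatter(x):
    """RMS about the mean."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean((x - x.mean()) ** 2))


def quadrature_improvement(rms_before, rms_after):
    """Amplitude of the removed component, assuming it adds in quadrature."""
    return np.sqrt(rms_before**2 - rms_after**2)


# Example with the values quoted in the text (cm/s)
print(round(float(quadrature_improvement(80.0, 60.0)), 1))  # 52.9
```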

Filling Factor
The magnetic filling factor, defined as the fraction of the observed solar disk that is active (corrected for foreshortening), is the second metric we studied and compared to the NEID RVs:

$$f = \frac{1}{N_{\rm pix}} \sum_{ij} W_{ij},$$

where N_pix is the number of pixels on the solar disk and W_ij is the magnetic weighting array, with W_ij = 1 for active pixels and 0 for pixels identified as inactive. The Spearman correlation coefficient between |B_obs| and f is 0.93, and we also find strong correlations between f and ∆v_conv (0.97) and between f and ∆RV_model (0.96) for the full five-month span. We compare the unsigned flux due to active regions with the three calculated filling factors (see Figure 6). The dominant feature driving the active-region flux is the condensed facular regions known as plage. Plage are bright regions on the solar surface, typically found near sunspots, that make up the majority of polarity in solar active regions (Buehler et al. 2019). MHD simulations show that the number of thick flux tubes in plage regions is small, that flux tubes in the facular network expand more quickly than those in plage regions, and that the continuum intensity of bright regions correlates strongly with magnetic field strength (Röhrbein et al. 2011; Danilovic et al. 2013).

Figure 5. Ground-based RV measurements before and after correction for correlation with unsigned magnetic flux. For these 'good' weather days, with low measurable cloud cover at WIYN, the correlation between RV_NEID and |B_obs| is 0.43, while the correlation between ∆RV_model and RV_NEID is 0.29; the average daily binned error for RV_NEID is ∼5 cm s⁻¹. We linearly fit the correlation between |B_obs| and RV_NEID and subtract this fit from RV_NEID to reduce the scatter due to the unsigned magnetic flux. The scatter reduces from ∼80 cm s⁻¹ to ∼65 cm s⁻¹ after this correction.

In addition to comparing our space-based observables to integrated ground-based RVs, we also compared them to RVs calculated using different spectral line masks.
These filtered masks were used to try to isolate the most and least activity-sensitive features in the NEID solar spectrum. To test this, we built several physically motivated line masks to compare the effects of magnetic activity on different groups of spectral lines using our SDO-derived solar activity proxies.
We derived RVs using these tailored masks by calculating cross-correlation functions (CCFs) in the same manner as the standard NEID pipeline. The masks were selected by filtering for line species and line depth; in all cases our starting line list was the standard ESPRESSO G2 line mask. Once computed, the CCFs were fit with Gaussian functions to determine the averaged spectrum velocity for that particular mask.
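A simplified version of the mask-CCF-plus-Gaussian-fit approach can be sketched in Python. This is not the NEID pipeline's implementation. The following are all simplifying assumptions:

* the box half-width of 0.41 km s⁻¹;
* approximating each box window by averaging interpolated samples;
* the non-relativistic Doppler shift;
* fitting a Gaussian to the CCF core with a log-parabola (Caruana-style) fit instead of a full non-linear least-squares fit.

The function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def mask_ccf(wave, flux, line_centers, line_weights, velocities,
             half_width_kms=0.41, n_sub=11):
    """Weighted box-mask CCF of a 1-D spectrum (simplified sketch).

    Each mask line is a box of +/- half_width_kms around its Doppler-shifted
    centre; the mean flux in the box is approximated by averaging n_sub
    linearly interpolated samples across it.
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    line_centers = np.asarray(line_centers, dtype=float)
    line_weights = np.asarray(line_weights, dtype=float)
    offsets = np.linspace(-1.0, 1.0, n_sub) * half_width_kms / C_KMS
    ccf = np.empty(len(velocities))
    for k, v in enumerate(velocities):
        centers = line_centers * (1.0 + v / C_KMS)             # non-relativistic shift
        samples = centers[:, None] * (1.0 + offsets[None, :])  # points across each box
        box_mean = np.interp(samples, wave, flux).mean(axis=1)
        ccf[k] = np.sum(line_weights * box_mean)
    return ccf


def gaussian_fit_rv(velocities, ccf, n_edge=5, min_frac=0.5):
    """RV and width from a log-parabola Gaussian fit to the CCF core."""
    velocities = np.asarray(velocities, dtype=float)
    ccf = np.asarray(ccf, dtype=float)
    continuum = np.median(np.concatenate([ccf[:n_edge], ccf[-n_edge:]]))
    depth = continuum - ccf
    core = depth > min_frac * depth.max()  # fit only the upper part of the dip
    c2, c1, _ = np.polyfit(velocities[core], np.log(depth[core]), 2)
    rv = -c1 / (2.0 * c2)
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    return rv, sigma
```

In practice, one would typically fit the whole CCF with a non-linear least-squares Gaussian (for example, with `scipy.optimize.curve_fit`). The log-parabola fit is used here only to keep the sketch dependency-free.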

Exploration of line depths and the relationship with magnetic observables
Changes in spectral line profile shape are primarily driven by convection (Dravins et al. 1981), associated with velocity and intensity variations in stellar granules (Hathaway et al. 2000). As plasma in active regions (sunspots, faculae, plage) interacts with the solar magnetic field, the convective blueshift is inhibited (see Section 2.9), and this effect varies with line depth (Gray 2009). Deeper lines are less blueshifted than shallow lines (Gray 2009), so we expect to see a stronger inhibition of the convective blueshift in RVs from the mask built with shallow lines.
We compare deep and shallow lines to determine how each correlates with both the bulk NEID RVs, as calculated by the NEID pipeline using the ESPRESSO G2 mask, and the unsigned magnetic flux computed by SolAster. We first applied a binary split between 'deep' lines and 'shallow' lines (deep lines having mask weights ≥ 0.5 and shallow lines weights < 0.5) to see whether the correlation between the NEID RVs and the unsigned magnetic flux changed measurably (see Figure 7). The RVs calculated using the deep-line mask correlate strongly with the NEID RVs (Spearman correlation coefficient of 0.92) but only weakly with the unsigned magnetic flux (0.33). For shallow lines, we similarly find a strong correlation with the NEID RVs (0.98) and a weak correlation with the unsigned flux (0.28). The rms amplitude of the RVs calculated using the depth-cut masks is ∼90 cm s⁻¹ for deep lines and ∼85 cm s⁻¹ for shallow lines. We also look at the residual scatter between the integrated ground-based RVs (RV_NEID) and the RVs from our curated masks, finding an rms scatter of ∼22 cm s⁻¹ for deep lines and ∼10 cm s⁻¹ for shallow lines. Overall, RVs calculated using either deep or shallow lines show similar correlations with the NEID RVs and the unsigned flux, along with similar rms amplitudes (Figure 7).
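The depth split and the statistics quoted here can be sketched as follows. This is illustrative only. It uses the mask weights as the depth proxy, as described above, and assumes that the residual scatter is computed after removing the mean offset between RV time series from different masks, since these have different zero points. The function names are hypothetical. The Spearman coefficient is computed as the Pearson correlation of average ranks, which is its standard definition and gives the same value as `scipy.stats.spearmanr`.

```python
import numpy as np


def split_mask_by_weight(line_centers, line_weights, threshold=0.5):
    """Split a CCF mask into 'deep' (weight >= threshold) and 'shallow' lines."""
    line_centers = np.asarray(line_centers, dtype=float)
    line_weights = np.asarray(line_weights, dtype=float)
    deep = line_weights >= threshold
    return ((line_centers[deep], line_weights[deep]),
            (line_centers[~deep], line_weights[~deep]))


def average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def residual_rms(rv_a, rv_b):
    """RMS of the RV difference after removing the mean offset between masks."""
    diff = np.asarray(rv_a, dtype=float) - np.asarray(rv_b, dtype=float)
    return np.sqrt(np.mean((diff - diff.mean()) ** 2))
```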

Derivation of ground-based RVs using physically-motivated masks
In addition to studying the variability as a function of line depth, we also constructed masks based on previously published studies that isolate activity-sensitive lines. To attempt to enhance the activity signature in the NEID spectra, we adapted the Wise et al. (2018) list of activity-sensitive lines and recomputed the NEID RVs. This list contains only those lines whose line depths showed significant correlations with the chromospheric Ca II H&K index for active stars (Wise et al. 2018). We then compared these RVs with our SDO/HMI pipeline measurements, as well as with the 'standard' NEID pipeline RVs. We found that the RVs from the ∼20 selected lines (those found in both the Wise list and the ESPRESSO G2 mask) correlate well with the NEID pipeline RVs but do not show a strong correlation with the SDO-derived unsigned magnetic flux. This may be unsurprising, given that Wise et al. (2018) built their list from observations of stars that are more active than the Sun at its current activity level and that show significantly stronger rotationally modulated behavior. As such, the observed relationship between activity and line variability may not hold, or may not be measurable, during this period of low solar activity. Figure 7 shows the correlation between the RVs derived using different line masks and the SDO-derived unsigned magnetic flux. The first column shows the correlations for the activity-sensitive line list (Wise et al. 2018), while the second column displays the correlations for the mask built using only Fe I lines within the ESPRESSO G2 mask. We filtered specifically for Fe I lines to compare against the SDO-derived RVs, which use a single Fe I line (6173 Å) to compute most of the observables used in SolAster. This Fe I line list contains significantly more features than the mask constructed from Wise et al. (2018), yet we find similar scatter in the RV time series from both masks, implying that neither list is significantly more activity-sensitive.
We do find that the integrated RVs from the Fe I lines show mildly stronger correlation with both the bulk NEID RVs and unsigned magnetic flux in comparison to the activity-sensitive lines from Wise et al. (2018).
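Selecting Fe I lines, or lines common to the Wise et al. (2018) list and the ESPRESSO G2 mask, amounts to cross-matching line wavelengths. The sketch below assumes that species labels come from a separate reference line list, which is a hypothetical input here. The 0.01 Å matching tolerance is also an assumption. The sketch keeps each mask line whose nearest reference line lies within that tolerance and has the requested species.

```python
import numpy as np


def crossmatch(mask_wave, ref_wave, tol_angstrom=0.01):
    """Index into ref_wave of the nearest line within tol for each mask line (-1 if none)."""
    mask_wave = np.asarray(mask_wave, dtype=float)
    ref_wave = np.asarray(ref_wave, dtype=float)
    order = np.argsort(ref_wave)
    sorted_ref = ref_wave[order]
    idx = np.searchsorted(sorted_ref, mask_wave)
    # candidate neighbours on either side of the insertion point
    left = np.clip(idx - 1, 0, len(sorted_ref) - 1)
    right = np.clip(idx, 0, len(sorted_ref) - 1)
    d_left = np.abs(mask_wave - sorted_ref[left])
    d_right = np.abs(mask_wave - sorted_ref[right])
    nearest = np.where(d_left <= d_right, left, right)
    dist = np.minimum(d_left, d_right)
    return np.where(dist <= tol_angstrom, order[nearest], -1)


def select_species(mask_wave, mask_weight, ref_wave, ref_species,
                   species="Fe I", tol_angstrom=0.01):
    """Keep mask lines matched to a reference line of the given species."""
    mask_wave = np.asarray(mask_wave, dtype=float)
    mask_weight = np.asarray(mask_weight, dtype=float)
    match = crossmatch(mask_wave, ref_wave, tol_angstrom)
    keep = match >= 0
    keep[keep] = np.asarray(ref_species)[match[keep]] == species
    return mask_wave[keep], mask_weight[keep]
```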

Comparing measurables from NEID Cross Correlation Functions
Beyond integrated RV measurements, we also studied an assortment of CCF parameters to determine which metrics best trace magnetic field variability. The primary CCF metrics we compared are the fitted amplitude, full width at half maximum (FWHM), skew, and integrated area of the line profile, in addition to the RV shift from a Gaussian fit to the CCF. Comparing the variation of these metrics with magnetic observables over time provides insight into the effects driving the observed changes in CCF shape. Figure 8 shows the full suite of computed CCF measurements compared to the SolAster data products. We computed the RV variations and the metrics listed above using the full ESPRESSO G2 mask, which covers wavelengths from 3700 to 7900 Å. We find that certain CCF metrics serve as better proxies for magnetic activity owing to their strong correlation with unsigned magnetic flux. The SDO data allow us to isolate days with higher magnetic activity, namely days with larger magnetic filling factors, and to compare these results to 'quiet-Sun' days.
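The CCF shape metrics can be computed from a sampled CCF as in the sketch below. The definitions are assumptions made for illustration:

* amplitude is the maximum depth below the continuum;
* FWHM comes from linearly interpolating the half-depth crossings;
* integrated area is a trapezoidal integral of the depth profile;
* 'skew' is the third standardized moment of the non-negative depth profile.

The paper's skew metric may be defined differently (for example, via line bisectors). The function name is hypothetical.

```python
import numpy as np


def ccf_shape_metrics(velocities, ccf, continuum):
    """Amplitude, FWHM, integrated area and skewness of a single-dip CCF."""
    v = np.asarray(velocities, dtype=float)
    depth = continuum - np.asarray(ccf, dtype=float)
    amplitude = depth.max()

    # FWHM: interpolate where the depth crosses half its maximum on each side
    # (assumes one dip with both crossings inside the velocity grid)
    half = 0.5 * amplitude
    core = np.nonzero(depth >= half)[0]
    i0, i1 = core[0], core[-1]
    left = np.interp(half, [depth[i0 - 1], depth[i0]], [v[i0 - 1], v[i0]])
    right = np.interp(half, [depth[i1 + 1], depth[i1]], [v[i1 + 1], v[i1]])

    # integrated area of the depth profile (trapezoidal rule)
    area = np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(v))

    # skewness: third standardized moment of the clipped depth profile
    w = np.clip(depth, 0.0, None)
    w = w / w.sum()
    mean_v = np.sum(w * v)
    var = np.sum(w * (v - mean_v) ** 2)
    skew = np.sum(w * (v - mean_v) ** 3) / var**1.5

    return {"amplitude": amplitude, "fwhm": right - left, "area": area, "skew": skew}
```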
We find the strongest proxies for magnetic activity to be the integrated area and amplitude, each showing moderate correlation coefficients (∼0.5, see Figure 8) with |B_obs|. This broadly supports the results of Costes et al. (2021), specifically with regard to the CCF FWHM and integrated area measurements, though we note that the base level of solar activity differs between the two studies. The moderately strong negative correlation between integrated area and unsigned magnetic flux is also supported by the work of Collier Cameron et al. (2019) and Costes et al. (2021), where integrated area was found to be a strong tracer of the evolution of the magnetic network, with the variation in CCF area showing little to no rotational modulation. The lack of rotational modulation in the integrated area implies that its variation is likely driven by structures distributed axisymmetrically over the solar surface, specifically the circulation of dispersed magnetic flux elements from regions that were once active (Collier Cameron et al. 2019).

Figure 7. [...] Wise et al. (2018). There is moderately strong correlation with the NEID pipeline RVs, but only a weak correlation with unsigned magnetic flux. Second column: Correlation for RVs derived using only Fe I lines in the ESPRESSO G2 mask, showing stronger correlation with both the bulk NEID RVs and the unsigned magnetic flux. Third column: Correlation for RVs derived using deep lines in the ESPRESSO G2 mask, showing very strong correlation with the bulk RVs. Fourth column: Correlation for RVs derived using shallow lines in the ESPRESSO G2 mask, showing the strongest correlation with the bulk RVs. All error bars report only photon noise, excluding any instrument systematics or additional stellar jitter (5-7 cm s⁻¹).

For days with sunspots, the correlation between photometric velocity and unsigned flux increases while the correlation between convective velocity and unsigned flux decreases.
This shows the effect of rotationally modulated spots on the photometric velocity component, as discussed in Section 2.8, and supports the correlation between spot filling factor and jumps in unsigned active-region magnetic flux seen in Figure 6.
To explore the activity behavior as a function of spectral line depth, we created several CCF masks using different depth cuts. As previously described (Section 4.1), we use the line 'weights' listed in the ESPRESSO mask as proxies for relative depth. We then recompute the CCF and resultant RVs for all lines within each depth cut. Across all depth masks, we see a strong correlation between the derived RVs and the bulk NEID pipeline RV measurements, indicating that the majority of lines shift in a similar manner. For CCF metrics derived with the shallow-line mask, we see stronger correlations between unsigned flux and both amplitude and integrated area than for the metrics calculated with the deep-line mask (Figure 9). In addition, our observed correlations broadly support previous results from Meunier et al. (2017) and Reiners et al. (2016) showing a stronger correlation between the CCF RVs from only the shallow lines and the calculated convective velocity component.

APPLICATION OF SolAster TO ARCHIVAL PLANETARY TRANSITS
Since the launch of SDO in 2010, three planetary solar transits have been imaged by SDO instruments: Mercury in 2016 and 2019, and Venus in 2012. Here we detail our attempts at recovering the corresponding Rossiter-McLaughlin (RM) signals due to the transits of Venus and Mercury. The measurement of these transits serves as a proof of concept for using spatially resolved disk images to calculate disk-integrated RVs at precisions currently unattainable from the ground. We apply the SolAster pipeline to these time frames in an attempt to 1) recover the effect of the planetary transit on the RV measurements and 2) empirically estimate the precision floor of our constructed 'Sun-as-a-star' RVs. Beyond testing our pipeline, these RV measurements also showcase the magnitude of the effect that stellar activity has on our ability to detect small RV signals over short timescales. For all three transits, we used SDO/HMI data products at a two-minute cadence and computed the unsigned magnetic flux and the convective and photometric velocities. We then reconstruct the overall model RV variation based on the parameterization outlined in Section 2.7. The two Mercury transits, on May 9, 2016 and November 11, 2019, are expected to produce extremely low-amplitude RM signals; by attempting to recover them with SolAster, we aim to understand the noise floor (both astrophysical and instrumental) of our RV measurements.
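The model reconstruction can be sketched as a linear least-squares problem. This sketch assumes a linear form, ∆RV_model = A ∆v_phot + B ∆v_conv + RV_0, with A and B fit against ground-based RVs and then applied to the SDO-derived components during a transit window. This form is our assumption, and the parameterization actually used (Section 2.7) may differ. The function names are hypothetical.

```python
import numpy as np


def fit_scaling(dv_phot, dv_conv, rv_obs):
    """Least-squares fit of A, B and RV0 in A*dv_phot + B*dv_conv + RV0."""
    dv_phot = np.asarray(dv_phot, dtype=float)
    dv_conv = np.asarray(dv_conv, dtype=float)
    design = np.column_stack([dv_phot, dv_conv, np.ones_like(dv_phot)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(rv_obs, dtype=float), rcond=None)
    return tuple(coeffs)  # (A, B, RV0)


def model_rv(dv_phot, dv_conv, a, b, rv0):
    """Reconstruct the model RV from the two velocity components."""
    return a * np.asarray(dv_phot) + b * np.asarray(dv_conv) + rv0
```

During a transit, A and B would be held fixed (e.g., at fitted values such as those in Table 2), and only the component velocities computed from the two-minute-cadence HMI data would change.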

Mercury 2016 Transit
The RV signal induced during the Mercury 2016 transit was expected to be at the ∼5 cm s⁻¹ level, while our constructed RV model from SDO/HMI (using the weighting factors in Table 2) shows significantly higher variability. Observations of the Sun have established that solar RV variations are driven by bright faculae (in regions of concentrated plage). From long-term surveys such as the Mt Wilson HK project (Baliunas et al. 1988), we know that the surfaces of old, slowly rotating Sun-like stars are faculae-dominated (Radick et al. 1983; Lockwood et al. 1984). These surveys monitored the optical photometric variations and Ca II H&K emission of FGK stars over decades, finding that brightness increases as a function of activity, just as observed for the Sun throughout its magnetic cycle. These stars are therefore faculae- rather than spot-dominated, and we expect their RV variations to be driven by the suppression of the convective blueshift, as seen for the Sun. This is generally consistent with our observed ∆v_conv in Figure 10, which has significantly more scatter than ∆v_phot. This result is a clear example of the ability of stellar activity to degrade RV sensitivity to planetary signals, even over short timescales.

Mercury 2019 Transit
Following the same procedure as for the Mercury 2016 transit, we also construct model RVs for the November 11, 2019 transit of Mercury. This transit period was free of sunspots and occurred as the Sun moved out of the absolute solar minimum of Solar Cycle 24, which occurred in October 2019, just prior to the transit. These low-activity conditions provided an ideal background for attempting to recover this ∼5 cm s⁻¹ signal. We show our recovery attempt, along with the component velocities, in Figure 11. Although this transit occurred at an even lower level of solar activity than the 2016 Mercury transit, the overall RV variability is still largely dominated by the convective velocity component.

Venus 2012 Transit
The Venus transit occurred during a period of high activity in the solar cycle; however, this particular day had relatively low solar activity, with small sunspot and facular filling factors that remained consistent throughout the transit. The transit lasted from 22:09 UTC on June 5, 2012 until 04:49 UTC on June 6, 2012. Using the SolAster pipeline, we calculate the RV components and reconstruct the model RV variation as outlined in Section 2.10. The overall velocity signal is largely dominated by the RM signal rather than by the convective velocity component (see Figure 12), allowing a clean recovery of the RM waveform.

DISCUSSION
Our Python-based, publicly available SDO/HMI analysis pipeline (SolAster) allows us to calculate both magnetic observables and 'Sun-as-a-star' RV variations from space-based data for comparison with ground-based measurements. Moving forward, these data products will aid studies aimed at deriving new stellar activity indicators from ground-based spectra. By looking at correlations between the space-based data and ground-based measurements from RV facilities such as NEID, we hope to find stronger proxies for stellar activity in Sun-like stars, which would help to improve planet detection sensitivity in future RV surveys. Leveraging the SDO/HMI and NEID data, there are a variety of paths we aim to explore in future studies, including:
• Comparing line-by-line (Dumusque 2018) and integrated CCF metrics with space-based observables, and exploring more closely the metrics that show the most promise as activity proxies (CCF integrated area and amplitude being two promising examples based on our preliminary study).
• Exploring the wavelength dependence of correlations between NEID measurements and SDO pipeline calculations, in both RVs and CCFs. This could allow us to determine whether the line-shape variability is driven by Zeeman effects (Reiners et al. 2013). If so, we would expect a stronger correlation between unsigned magnetic flux and the CCF depth/integrated area in the redder portions of the spectrum. If instead the variation is driven by the inhibition of the convective blueshift, then we expect a stronger correlation in the blue lines (Reiners et al. 2013).
• Detrending NEID RVs against the unsigned magnetic flux (|B_obs|) using more complex parameterizations, including the FF' technique (Aigrain et al. 2012), Gaussian processes (Haywood et al. 2014), and others. While our preliminary linear detrending (Figure 5) did improve the scatter for a subset of the NEID data, a more robust use of the SDO data to detrend the RVs, taking into account the solar rotation period and the lifetime of sunspots, could further reduce the activity signal.
• Revisiting our analysis for times of heightened solar activity. The time span used in this study (December 2020 to May 2021) fell during a period of low solar activity. While there were times with sunspots and slightly increased activity levels, the overall activity level was very low, both for the Sun itself and in comparison with other Solar-type stars. Studying periods of higher solar activity (which we are entering now) that have complementary ground-based spectra (e.g., 2013, with HARPS-N; Dumusque et al. 2021) would be useful for quantifying how the level of solar activity affects the correlations between ground-based metrics and space-based observables. It would also allow us to fine-tune our scaling factors (Table 2), derive more precise model RVs, and better understand how the amplitudes and relative contributions of the two velocity components, ∆v_conv and ∆v_phot, vary as a function of time and activity level.

CONCLUSION
Using SDO/HMI data products and ground-based spectra, we studied the 'Sun-as-a-star' to estimate RV variations due to various forms of solar activity. In doing so, we developed a standalone solar data analysis package, SolAster, using the methods outlined by Haywood et al. (2016) and Milbourne et al. (2019) to compute model RVs, and validated our results against these previously published datasets. We also calculated additional magnetic observables to search for correlations between these measurements and ground-based RVs from NEID. We found that while the RV variation is driven by a combination of convective blueshift suppression and the rotational Doppler imbalance due to sunspots and plage, the dominant component is the convective blueshift suppression, confirming previous results by Meunier et al. (2010b) and others. We found that the RMS scatter of both ∆v_conv and ∆v_phot is lower than in Haywood et al. (2016) and Milbourne et al. (2019), because our study took place during a period of minimal solar activity.
We found that plage regions are the dominant driver of the observed magnetic flux and convective blueshift suppression, as seen in Figure 6, and that the overall RV variation is dominated by the convective blueshift suppression. The photometric contribution, driven primarily by bright active regions and sunspots, is small and strongly dependent on the sunspot filling factor. These conclusions support previous work on this topic by Meunier et al. (2010b), Haywood et al. (2016), Milbourne et al. (2019), and Haywood et al. (2020).
Filtering the NEID data for only days with optimum observing conditions, we find that the ground-based RVs correlate with both the unsigned magnetic flux (Spearman coefficient of 0.43) and the model RVs (0.29). We removed the correlation between |B_obs| and the NEID RVs via a linear decorrelation, reducing the RMS scatter to the ∼60 cm s⁻¹ level.
To better understand which surface features drive active region magnetic flux, we compared a variety of activity indicators and proxies for solar activity. We found that large facular regions known as plage are the dominant component driving the temporal flux variation. Additionally, we found a strong correlation between these magnetic observables and RV variations, aligning with current work in this field (Haywood et al. 2016;Haywood et al. 2020). This provides additional evidence for the exploration of unsigned magnetic flux as a proxy for stellar activity, especially in Sun-like stars.
We also investigated correlations between spectral line shape and magnetic activity indicators. We built a variety of physically motivated line masks to better quantify the effect of magnetic activity on different spectral line parameters. We found that RVs calculated using masks built from either deep or shallow lines show similar correlations with both the NEID pipeline RVs and the unsigned magnetic flux.
Using SolAster, we are able to recover the planetary RM signal of the Venus 2012 transit at high SNR (see Figure 12). However, the RM signals of both the 2016 and 2019 Mercury transits are dwarfed by activity signals, which dominate the RV noise floor during transit. Although the 2019 Mercury transit took place during a period of low solar activity, we are still unable to measure the RM signal because of confounding noise from solar activity at levels ∼2-4× the RM signal. We used these transit events both to empirically gauge the noise floor of our integrated RV measurements, as both signals are expected to be of low amplitude (< 50 cm s⁻¹), and to study activity signals on short (< 8 hour) timescales. Even at low activity levels, we were unable to recover the RM signal of the Mercury transits, highlighting the extremely small amplitude of this signal.
While a simple linear detrending with unsigned flux enabled a measurable improvement in a subset of NEID RVs, down to the ∼60 cm s⁻¹ level, additional avenues must still be explored to reach the ∼10 cm s⁻¹ sensitivity needed to detect Earth-like planets orbiting Sun-like stars. We are able to recover a planetary transit (RM signal) with an amplitude of tens of cm s⁻¹, showing the promise of this method for improving our understanding of stellar activity and its effect on RV measurements on a variety of timescales. With more ground-based observations from NEID and the upcoming increase in solar activity, we will be able to continue improving our modeling methods and refine our understanding of how specific types of stellar activity affect ground-based high-resolution spectra.

ACKNOWLEDGEMENTS
The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). This research used version 3.1 (SunPy Community et al. 2020) of the SunPy open source software package.
Based on observations at Kitt Peak National Observatory, National Optical Astronomy Observatory, which is operated by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation. These results are based on observations obtained with NEID on the WIYN 3.5m Telescope. WIYN is a joint facility of the University of Wisconsin-Madison, Indiana University, NSF's NOIRLab, the Pennsylvania State University, Purdue University, University of California, Irvine, and the University of Missouri. The authors are honored to be permitted to conduct astronomical research on Iolkam Du'ag (Kitt Peak), a mountain with particular significance to the Tohono O'odham.