GRASS. II. Simulations of Potential Granulation Noise Mitigation Methods

We present an updated version of the GRanulation And Spectrum Simulator (GRASS) which now uses an expanded library of 22 solar lines to empirically model time-resolved spectral variations arising from solar granulation. We show that our synthesis model accurately reproduces disk-integrated solar line profiles and bisectors, and we quantify the intrinsic granulation-driven radial-velocity (RV) variability for each of the 22 lines studied. We show that summary statistics of bisector shape (e.g., bisector inverse slope) are strongly correlated with the measured anomalous, variability-driven RV at high pixel signal-to-noise ratio SNR and spectral resolution. Further, the strength of the correlations varies both line by line and with the summary statistic used. These correlations disappear for individual lines at the typical spectral resolutions and SNRs achieved by current extremely precise radial velocity spectrographs; so we use simulations from GRASS to demonstrate that they can, in principle, be recovered by selectively binning lines that are similarly affected by granulation. In the best-case scenario (high SNR and large number of binned lines), we find that a ≲30% reduction in the granulation-induced root mean square RV can be achieved, but that the achievable reduction in variability is most strongly limited by the spectral resolution of the observing instrument. Based on our simulations, we predict that existing ultra-high-resolution spectrographs, namely, ESPRESSO and PEPSI, should be able to resolve convective variability in other, bright stars.


INTRODUCTION
Intrinsic stellar variability is one of the chief obstacles limiting the detection of small, rocky planets in the habitable zones of Sun-like stars (i.e., "Earth twins") with the radial-velocity (RV) method (Crass et al. 2021).Various strategies have been devised for coping with different sources and manifestations of this intrinsic variability.For stellar pulsations (p-modes), Chaplin et al. (2019) devised theoretically-motivated observing strategies designed to mitigate residual p-mode amplitudes to the sub-10 cm s −1 level; these strategies have since been widely implemented in RV surveys (e.g., Blackman et al. 2020, Gupta et al. 2021) and explicitly validated by Gupta et al. (2022).For magnetic activity on rotation timescales, various works have employed Gaussian Processes (GPs; e.g., Haywood et al. 2014;Rajpaul et al. 2015Rajpaul et al. , 2016;;Gilbertson et al. 2020;and references therein), statistically-motivated methods (e.g., Collier Cameron et al. 2019;Zhao & Tinney 2020), and neural networks (e.g., de Beurs et al. 2022;Liang et al. 2024) to disentangle the velocity contributions of spots and faculae from those of planets with the aid of various activity indicators.Additionally, models such as SOAP-GPU (Zhao & Dumusque 2023) have been developed to numerically forward model activity-driven variability.However, aside from integrating over several hours, astronomers lack an effective strategy for mitigating the noise introduced by granules that has been definitively demonstrated and implemented in extant extremely precise radial velocity (EPRV) surveys (Cegla et al. 2019;Crass et al. 2021).Indeed, two recent works have shown that supergranulation, the manifestation of convection on larger spatial and longer temporal scales than granulation (see reviews such as Rieutord & Rincon 2010 and Cegla 2019), is the dominant cause of RV variability during solar minimum (Lakeland et al. 2024) and that (super)granulation will preclude the blind detection of Earth-twin exoplanets even if variability from magnetic activity can be perfectly mitigated (Meunier et al. 2023).
As RV spectrographs have begun to achieve instrumental precisions and stabilities at and below the ∼1 m s −1 level, granulation has become a greater cause for concern in the EPRV community.Recognizing that stellar oscillations and granulation would constitute large sources of RV noise problematic for discovery and characterization of Earth-mass planets with the HARPS spectrograph, Dumusque et al. (2011) evaluated various observations strategies in order to determine which ones optimally averaged out noise from these processes.Using simulated RV time series generated from observationally-derived stellar velocity power spectra, Dumusque et al. (2011) determined that averaging multiple observations per night spaced apart by hours more effectively mitigates granulation noise than a single long or multiple consecutive exposures.They conclude, quite optimistically, that "granulation phenomena and oscillation modes will not prevent us from finding Earthlike planets in habitable regions." Following the Dumusque et al. (2011) study, Meunier et al. (2015) took a differing but complementary approach for assessing the impact of (super)granulation on the measurement of precise RVs.Tiling a stellar hemisphere with simulated granules of varying sizes, intensities, and velocities given relations for these quantities derived from empirical laws and studies of hydrodynamical (HD) simulations (see §2.2 of Meunier et al. 2015 and references therein), they allowed the granules to probabilistically evolve in time.Over a simulated 12-year time span, they note a root mean square (RMS) RV of ∼0.8 m s −1 and a photometric RMS of 67 ppm.Simulating observation strategies for mitigating this variability, Meunier et al. (2015) show that the RMS from granulation cannot be sufficiently reduced for averaging timescales commensurate with the granule lifetime (∼10-15 minutes in the case of the Sun).After 30 minutes of averaging, they report an RMS RV of about ∼50 cm s −1 , which they state is in conflict with the claim made by Dumusque et al. (2011) that granulation (but not supergranulation) can be largely averaged out over these timescales.Nevertheless, they do report that performing multiple measurements over the course of a night is more successful in reducing the granulation RMS than backto-back observations, but not to the degree reported by Dumusque et al. (2011).
The results presented in both Dumusque et al. (2011) and Meunier et al. (2015) consistently suggest that spreading observations out over a night (or over several nights in the case of supergranulation; see §4 of Meunier et al. 2015) more effectively mitigates the impact of granulation than consecutive exposures.However, they disagree somewhat about the absolute effectiveness of their best-case scenario observation strategies, with Meunier et al. (2015) painting a notably more pessimistic picture.Considering more recent results, including Meunier et al. (2023) and Lakeland et al. (2024), it does appear that Dumusque et al. (2011) underestimated the amplitude and impact of granulation on precise RV measurements.
It is important to note, though, that the methods used for simulating and interpreting stellar velocities in these works differ fundamentally from how velocities are measured in reality.Whereas Dumusque et al. (2011) and Meunier et al. (2015) directly simulate disk-integrated stellar velocities, in practice RVs are inferred from the observed Doppler shifting of absorption lines in stellar spectra.These measured RVs will differ from the somewhat fictitious RVs studied by Dumusque et al. (2011) and Meunier et al. (2015) because lines form over a range of heights in stellar atmospheres, tracing different portions of the full 3D atmospheric velocity field with varying sensitivities as a function of height.
To address this complicating reality, a series of papers (Cegla et al. 2013(Cegla et al. , 2018(Cegla et al. , 2019) ) used 3D magnetohydrodynamic (MHD) simulations coupled with detailed radiative transfer modeling to faithfully reproduce stellar magnetoconvection and absorption profiles for the Fe I 6302 Å line (see also Cegla 2019 for a detailed review).Through this series of papers, the authors show that perturbations in the shapes of their model diskintegrated line profiles encode information about spurious Doppler shifts from granulation.More specifically, they find strong correlations between the apparent RV of the line studied and various summary statistics designed to describe the bisector curve, including, e.g., bisector inverse slope (BIS, Queloz et al. 2001).
In principle, such correlations are promising: precise measurements of individual bisector asymmetries predict the anomalous (i.e., granulation-induced) velocity shift, and could consequently be used to mitigate velocities from granulation.However, the authors of Cegla et al. (2019) consider only one absorption line in their work and consequently caution that the optimal bisector summary statistic (or the optimal regions of the bisector for measuring such summary statistics) will likely vary among lines.They further warn that the typical spectral resolutions and line spread function (LSF) sampling (i.e., pixelization) of extant spectrographs, together with the effects of photon noise (see Povich et al. 2001), will preclude measuring sufficiently precise bisectors for such a mitigation technique to be viable.
In order to address the limitations of and complement previous studies of granulation in the context of EPRVs, Palumbo et al. (2022) presented the GRanulation And Spectrum Simulator (GRASS), an open-source computation tool which uses time-and disk-resolved solar spectra first observed for Löhner-Böttcher et al. (2018, 2019) and Stief et al. (2019) to empirically model changes in line shapes created by granulation.Validating this tool, Palumbo et al. (2022) showed that GRASS reproduces the time-averaged line profile and bisector of the Fe I 5434 Å line observed in the disk-integrated Sun by Reiners et al. (2016b), as well as the degree of variability expected from granulation.
In this work, we present a large update to the GRASS software (v2.0), and use it to explore potential avenues for mitigating granulation noise in EPRV measurements.In §2, we describe the solar observations which are used to synthesize spectra ( §2.1) and provide an overview of the procedure used by GRASS to construct disk-integrated line profiles from the input solar observations ( §2.2).In §3, we validate our model for line synthesis by comparing our synthetic disk-integrated line profiles to the time-averaged profiles observed in the IAG Solar Atlas (Reiners et al. 2016b).In §4, we characterize the line-by-line variability in each of the solar lines studied in this work, and simulate avenues for recovering the line-shape vs. anomalous RV correlation with limitations imposed by finite spectral resolution and SNR.In §5, we interpret and discuss our results in the context of previous studies concerned with the impact of granulation on the measurement of precise RVs, comment on future avenues for demonstrating the mitigation techniques considered in this work, and highlight various limitations of GRASS and the scope of this study.Finally in §6, we briefly summarize our findings and em-phasize that granulation, strictly speaking, encodes coherent signals, rather than simple noise, in the shape of stellar lines.GRASS uses disk-and time-resolved observations of solar lines to synthesize time series model stellar (i.e., diskintegrated) spectra with variability from granulation.GRASS and supporting documentation are publicly available on GitHub1 .Tagged version releases corresponding to Palumbo et al. (2022) and this work are archived on both GitHub and Zenodo (Palumbo et al. 2023a).In this section, we briefly describe the solar observations used as input to GRASS ( §2.1), provide an overview of the procedure used by GRASS to synthesize disk-integrated spectra ( §2.2), and detail our process for measuring velocities from lines ( §2.3).Additional, software-specific implementation details for GRASS v2.0 (including the input data pre-processing, stellar grid tiling procedure, and the GPU implementation) are discussed in detail in Appendix A.

Summary of Simulation Input Data
In Palumbo et al. (2022), we used disk-and timeresolved observations of the Fe I 5434.5 Å line from Löhner-Böttcher et al. (2019) to produce synthetic timeresolved, disk-integrated line profiles.In this work, we use additional spectra originally presented in Löhner-Böttcher et al. (2018), Stief et al. (2019), andLöhner-Böttcher et al. (2019) to synthesize other disk-integrated lines.The observed spectral regions include 22 solar lines which are sufficiently unblended and deep to use as models for spectral variability from granulation.Example disk-center spectra from Löhner-Böttcher et al. (2018), Stief et al. (2019), andLöhner-Böttcher et al. (2019) are shown in Figure 1 with annotations for the lines used in this work.A tabulation of the atomic parameters for these lines is provided in Table 1.We below provide a brief summary of the processing procedure for these data; a more comprehensive description is provided in Appendix A.1.
LARS, the instrument that provided input spectra for this work, is described in detail in Löhner-Böttcher et al. (2017).The observing scheme and reduction procedure used to obtain these spectra is summarized in Palumbo et al. (2022) and explained in greater detail in Löhner-Böttcher et al. (2018), Stief et al. (2019), andLöhner-Böttcher et al. (2019).In brief, spectra were observed  2019), but are excluded from the analyses presented in this work, as explained in §2.Other lines are of telluric origin, significantly blended, and/or very shallow.Note the breaks in the wavelength axis; eight spectral regions spanning only a few Å each were observed.Spectroscopic properties and parameters for these lines are tabulated in Table 1.
at 41 locations along the North-South and East-West axes of the Sun; only quiet-Sun regions were observed.
The spectra have R ∼ 700,000, and are calibrated on an absolute wavelength scale with accuracy ∼0.02 m Å (or about 1 m s −1 ).On short timescales (e.g., over the duration of the ≳20-minute observations baselines), Löhner-Böttcher et al. (2017) report that the instrument instability is the largest source of uncertainty in the positions of lines; generally this error is at the level of a few cm s −1 , but can be larger in some cases.In this work, we quantify the impact of this uncertainty by repeatedly synthesizing many realizations of our synthetic spectra, from which we measure sample means and standard deviations for the RV variability of lines.This process is described further in §4.1.
As in Palumbo et al. (2022), we measure line bisectors and widths as a function of depth into the line for all observed lines for each 15-second bin, which together we refer to as the "input data," following the nomenclature established in Palumbo et al. (2022).These input data losslessly encode the temporal variability in the shape of each line and are used as the basis of our stellar surface simulation and integration described in the following section, §2.2.Our process for measuring these input data from the solar spectra, which has changed slightly from Palumbo et al. (2022), is described in detail in §A.1.This pre-processing of the solar observations was performed once, and the resulting data products are downloaded from Zenodo (Palumbo et al. 2023b) by GRASS upon installation.

Overview of GRASS Synthesis Procedure
Owing to the expanded library of solar templates lines available to GRASS, we have slightly updated and optimized the procedure it uses to create synthetic, diskintegrated spectra from these input data.Here, we describe the synthesis procedure used by GRASS, highlighting changes and updates to the synthesis procedure detailed in §3.2.1 of Palumbo et al. (2022).
Prior to the spectral synthesis step, GRASS first tiles a model stellar grid and computes the requisite weights and rotational velocities (see §A.2).These quantities are computed once and stored in memory, since they are assumed to be unchanging in time.Following the initial tiling and geometric computations, GRASS uses the input data to reconstruct disk-resolved line profiles in each tile on the model stellar surface for each time step of the simulation.As in Palumbo et al. (2022), GRASS selects the input data with the closest µ value and matching directional axis for each stellar surface tile.In the event that a certain axis and µ position lack data for a given template line, the next-nearest input data are used.
The temporal variability in each disk-resolved line profile is directly encoded from the time-series input observations.However, as described in §3.2.3 of Palumbo et al. (2022), we do apply a random phase offset to the input data in each stellar surface tile.This random  (2019).All other quantities are from the NIST Atomic Spectra Database (Kramida et al. 2020).Rows marked with a double dagger ( ‡ ) are excluded from the analyses presented in this work, as explained in §2.The number of significant figures reported matches those given in the NIST Atomic Spectra Database or Löhner-Böttcher et al. 2019 (for daggered quantities).phase offset ensures that the disk-resolved line profiles in adjacent tiles do not unrealistically move in concert.Because some template lines were observed contemporaneously (e.g., Fe I 5432 Å and Fe I 5434 Å are in the same observed spectral region, and so were always observed simultaneously; see Figure 1), the input data for these lines is, by default, kept in-phase within each stellar surface tile.The applied phase offsets also lead to the suppression of p-modes in the final disk-integrated spectrum, since the oscillations in the input data are added incoherently (see §3.2.4 of Palumbo et al. 2022).
Similarly to GRASS v1.0 (Palumbo et al. 2022), the disk-integrated flux as a function of time and wavelength is then calculated as the (weighted) sum over the individual tile intensities, where (as noted in Appenidx A.2) tile weights are given by the product of the limb-darkened continuum intensity and the projected tile area (analogous to Equation 18.3 of Gray 2008).As in Palumbo et al. (2022), these summations are performed in place to avoid excess memory allocation (i.e., the disk-resolved spectra computed for each tile are not retained), which would become quite unwieldy for larger spectra.

Measurement of Velocities from Spectra
As in Palumbo et al. (2022), we compute CCFs in order to measure velocities from individual lines and spectra.In implementation, EchelleCCFs (Ford et al. 2021) was used to cross correlate the considered line profile(s) with a Gaussian template mask centered at the known rest-frame wavelength of said line and width (in units of velocity) given by the speed of light divided by the spectral resolution.The mask was projected onto the line profile in steps of 100 m s −1 .The extremum of the resulting cross-correlation function (CCF) was fit with a Gaussian function via non-linear least squares, and the RV was taken as the mean of the best-fit Gaussian.
Our velocity-measurement procedure was tested by injecting known Doppler shifts into synthetic spectra; we find that the recovered velocities are accurate, with er-rors at the ≲1 cm s −1 level (compare to the expected amplitude of granulation noise at a few to several tens of cm s −1 ).We also tested other combinations of mask shapes (tophat vs. Gaussian) and functional forms for fitting the CCF peak (quadratic vs. Gaussian) and found that the chosen combination of Gaussian mask with Gaussian fit performed the best.

GRASS MODEL VALIDATION
In Palumbo et al. (2022), the line synthesis procedure used by GRASS was validated by comparison of timeaveraged line profiles to the IAG Solar Atlas (Reiners et al. 2016b).In the following subsections, we reperform this validation for each template line used in this work ( §3.1), and also discuss how effectively these template lines are able to model the shapes and behaviors of other lines ( §3.2).

Disk-Integrated Line Profiles and Bisectors
In Palumbo et al. (2022), we verified that GRASS could accurately reproduce disk-integrated line profiles and bisectors using disk-resolved spectra as input.Here, we validate the synthesis for the other lines presented in this work.Similarly to Palumbo et al. (2022), we compare time-averaged synthetic line profiles from GRASS to line profiles observed in the IAG Solar Atlas (Reiners et al. 2016b) in order to semi-quantitatively assess our synthesis accuracy.
To perform this comparison, we degrade the spectral resolution of the IAG Solar Atlas to the nominal LARS resolution of R ∼ 700,000 via convolution with a Gaussian LSF given by: where σ(λ) is given by and λ is the wavelength sampled by a given pixel.To compute residuals, we then perform a flux-conserving interpolation onto a common grid of wavelengths with ∼4 pixels per LSF FWHM (following the algorithm of Carnall 2017).Rather than computing bisectors directly from the line profiles, we instead measure bisectors from individual-line CCFs.These CCF bisectors are smoother (and consequently easier to compare) than bisectors measured directly from line profiles.The resulting line profiles and bisectors are discussed individually in Appendix B. An example set of comparisons is shown in Figure 10; the full set of comparison figures is available from the online journal.The full ensemble of reproduced bisectors are shown in Figure 2 and discussed further in §3.2.
In general, we find that we are able to synthesize line profiles that are faithful to those seen in the IAG Atlas with error within about 1% flux.Deviations are usually caused by blends in the wings of the lines of interest, which are modeled out in the pre-processing stage for the LARS data (see §A.1).Likewise, bisectors are most accurate where the first derivative of the line profile is largest, but tend to deviate in the top 20% or so of line, owing to blends and/or our imperfect modeling of the line wings in the preprocessing stage, as well as at the very bottom of the line, where interpolation noise and measurement error become large.Below ∼80% of the continuum, the velocity errors in the line bisectors are generally within a couple of m s −1 and within 1 m s −1 in the best cases.For lines that have blends in their wings, the velocity errors are larger toward the continuum (above 10 m s −1 in some cases).As noted in Palumbo et al. (2022), deviations in the cores of lines (particularly the deeper lines) could arise from chromospheric activity captured during the observation of the IAG Atlas during a period of heightened solar activity in 2014 (Hathaway 2015;Reiners et al. 2016b).Specific comments for each line are given in Appendix B, as well as Section 3 and Appendix A of Löhner-Böttcher et al. (2019).

Use of Template Lines as Generalized Line Models
Only a limited number of lines were observed by Löhner-Böttcher et al. (2018, 2019).Since thousands of absorption lines are generally used to measure velocities from spectra, v1.0 of GRASS (Palumbo et al. 2022) modeled lines of differing depths by truncating the deep Fe I 5434 Å line bisector (see §3.2.2 of Palumbo et al. 2022).This approach was motivated by the heuristic presented in Gray (2008), wherein it is noted that the bisectors of shallower lines tend to reflect the shape of the upper portions of bisectors of deeper lines (see their Figure 17.14).For comparison, we have overplotted the synthetic-disk integrated bisectors produced in this work in Figure 2.
It is readily apparent that the four blend bisectors at right in Figure 2 cannot be superimposed as readily as the classical "C"-shaped bisectors at left, with the potential exception of Ni I 5435 Å and Ca I 6169.5 Å.Further, even in the left-hand plot of Figure 2, there is notable variance between the bisectors, especially in the case of the three deepest lines: Fe I 5576 Å, Fe I 5250.6 Å, and Fe I 5434 Å.This imperfect correspondence is not entirely surprising, given that the limited number of lines considered were not cherry-picked for the similar- ities in their bisectors (as is the case in Figure 17.14 of Gray 2008).Rather, the lines and spectral regions observed for Löhner-Böttcher et al. ( 2019) and then used in this work were chosen based on their past usage in the heliophysics literature.
Although the picture painted by Figure 17.14 of Gray (2008) is certainly convenient, it is clear from Figure 2 that the correspondence of bisectors values at given intensities/depths is not universal.This reality is not entirely surprising.Lines do not form at a single representative "formation height", but rather over a run of heights that differs in accordance with the properties of the stellar atmosphere and the atomic/ionic properties of a line-forming species.Consequently, a given continuum-normalized intensity cannot be neatly mapped to a singular physical height in the solar atmosphere across differing lines.For the results presented in this study, we only use the 22 template lines to reconstruct their own disk-integrated profiles.Future works could explore which template lines used herein are bestsuited to model variability in other lines not observed by LARS, but such a study is beyond the scope of the analysis presented in this work (see also the discussion in §5.1 and §5.2).

SIMULATIONS OF GRANULATION MITIGATION
As is visually apparent from Figure 2, granulation affects lines differentially.Consequently, measured RV variability will differ line by line, and the amount of granulation noise measured in RV observations will differ depending on the set of lines used.In the following subsections, we examine and quantify this line-by-line variability for the set of solar lines modeled in this study ( §4.1), explore correlations between line-shape summary statistics and anomalous RVs ( §4.2), and simulate how these correlations might be used for mitigation given the constraints of existing instrumentation ( §4.3).

Line-by-line Variability
It has been widely shown that the measured extent of variability manifests differentially between lines and sets of lines.In the case of individual lines, Elsworth et al. (1994) and Pallé et al. (1999) report different RMS RVs for the K I 7699 Å line and the Na I D 1 and D 2 lines, respectively.Considering sets of lines, Al Moulla et al. (2022Moulla et al. ( , 2024) ) showed that the measured RMS RV varies among sets of solar lines binned by formation temperature.As suggested by these studies, simply measuring RVs from a minimally variable set of lines may constitute one potentially viable, albeit limited, method for granulation mitigation.To quantify the different levels of variability in lines, we synthesized many diskintegrated time series for each of the 22 lines considered in this work, from which we measured the RMS of the individual line RVs.
The procedure for determining these individual line RMS values is as follows: GRASS was used to generate 40-minute disk-integrated time series with a time resolution of 15 seconds for each considered line profile (see Appendix A.1 for details on the temporal cadence and baseline of the input data); velocities were measured from these line profiles as described in §2.3 (see also §3.3 of Palumbo et al. 2022); lastly, an RMS was measured for the resulting RV time series.This process was repeated many times with different realizations of the Although there is an appreciable difference between the most and least variable lines, it is not immediately apparent how the variability of a line could be predicted from the formation or atomic properties of a line.
synthetic line profiles to yield a robust mean and standard deviation for each RMS RV.
The resulting RMS values are shown in descending order in the left-hand panel of Figure 3 and are reported in Table 2.There is a notable difference in the RMS between the most and least variable lines.At the high end, the Fe I 5379 Å line exhibits a ∼75 cm s −1 RMS.At the low end, Mn I 5432 Å has a ∼45 cm s −1 RMS.Given this nearly ∼30 cm s −1 difference between the most and least variable lines, a fruitful granulation mitigation strategy might consistent of identifying the least variable lines and measuring RVs from only these lines, rather than from a larger line list also including highly variable lines.To enable such an approach, one would need to know the intrinsic variability of each line for a given star.If such a quantity can not be measured directly from spectra (owing to e.g., limitations from SNR, sampling, etc.), then predicting variability from a proxy variable may be the next-best approach.However, we note no significant trend connecting the RV variability of a line to its depth, wavelength, or the potential of the lower/upper state of its transition.
Recently  2024).However, given the small number of lines considered, the significance of such a trend in our data is questionable at best.We discuss these results and prospects for studying line-by-line variability further in §5.1.

Mitigation via Bisector Diagnostic Correlations
Using disk-integrated Fe I 6302 Å profiles derived from MHD simulations, Cegla et al. (2019) showed notable correlations between various line-shape summary statistics and the anomalous (i.e., granulation-induced) RV measured.An example of such a correlation between the bisector inverse slope (BIS) and velocity for synthetic Fe I 5434 Å profiles produced by GRASS are shown in Figure 4.Note that the amplitude of the bisector variations (shown in the left-hand panel of Figure 4) were amplified by 50× in order to make these variations (which are at the sub-m s −1 scale) visible on the scale of the mean bisector (which spans multiple hundreds of m s −1 from the red-most to the blue-most point).
Notably, the simulations performed by Cegla et al. ( 2019) considered only one line (the Fe I 6302 Å line) under an artificially strong vertical magnetic field, which they note inhibits convection (and therefore the ob-served variability of their line) relative to what is observed in the quiet Sun.To supplement the analysis presented Cegla et al. (2019), we here perform a similar bisector analysis for the 22 lines studied in this work.
To perform this analysis, we first used GRASS to synthesize time series for each of the 22 template lines considered, and then measured various bisector-shape summary statistics and RVs for each time snapshot of each simulation.RVs were measured from CCFs as in §4.1, and bisector shape diagnostics were measured from the CCF bisectors.As in §3.1, we measured bisectors from CCFs, rather than directly from line profiles, in order to produce smoother curves.In later sections (namely §4.3), we will also measure bisector diagnostics for CCFs computed from many lines; by measuring CCF bisectors for individual lines, we can directly compare the performance of the bisector diagnostics on lines individually and in aggregate.Finally, mitigated RVs were measured by subtracting off the RVs predicted by a linear fit between the raw measured RVs and each bisector diagnostic.This procedure was repeated many times for each line using many different realizations of the synthetic profile time series to measure robust averages for the correlation coefficients and improved RMS RVs.These initial simulations and correlations were constructed for line profiles synthesized at R = 700,000 with SNR = ∞ (i.e., no additional noise, photon or otherwise, was simulated in the synthesized spectra).We acknowledge that these simulation parameters are certainly not realistic; this initial analysis was performed in order to determine the relative usefulness of each bisector shape diagnostic in a hypothetical best-case scenario where granulation is the sole source of RV noise.The effects of various limitations imposed by more realistic observing conditions (including spectral resolution and photon noise) are evaluated in §4.3.
The considered bisector shape diagnostics include: bisector inverse slope (BIS), bisector curvature (C), and bisector span (or amplitude, a b ).Definitions for these diagnostics are given below; see also Figure 6 of Cegla et al. (2019) for an excellent illustrative demonstration of these shape diagnostics.The bisector inverse slope (BIS, Queloz et al. 2001) is given by: where v t is the average velocity of the bisector between 10% and 40% of the line depth, and v b is the average velocity between 55% and 90%.The bisector curvature (C, Povich et al. 2001) is given by: where v 3 , v 2 , and v 1 regions are bounded at 20%-30%, 40%-55%, and 75%-95% of the line depth, respectively.Lastly, the bisector amplitude (a b , Livingston et al. 1999) is given by: where v min is the blue-most velocity in the bisector curve and v 0 is the velocity in the line core (the bottom of the bisector).In practice, we calculate v 0 as the mean velocity in the bottom few points of the bisector, excluding the bottom-most measurement which is often highly erroneous to due to interpolation noise.We present and analyze the results of these simulations below in §4.2.1, where correlation coefficients and corrected RMS RVs were calculated using the default definitions of the bisector diagnostics given above.In §4.2.2 we follow (Cegla et al. 2019) and iteratively "tune" the definitions of these diagnostics to maximize the correlation for each line.

Default Bisector Diagnostics
The results of the granulation mitigation simulations described above are tabulated in Table 3.For each line and bisector diagnostic, the mean Pearson correlation coefficient (R), mean RMS RV, 1σ variation in RMS RV, and percent improvement (rounded to the nearest whole number) over the RMS RVs reported in Table 2 are shown.The reported improvements should be taken as approximate and representative; a 1% change in the RMS RV would amount to ≲1 cm s −1 at the ∼30-70 cm s −1 velocity scale of granulation.
The best-performing diagnostics are able to remove, optimistically, ∼25-33% of the intrinsic RV variability measured for each line.However, the level of correction achieved by each diagnostic was inconsistent across the considered lines.E.g., although BIS performed quite well for Fe I 5434 Å, this diagnostic's performance varied widely for other lines.Likewise, the bisector amplitude performed well for a small handful of lines, but failed to achieve any meaningful improvement for several lines.Interestingly, the bisector curvature performed poorly across the board, with the notable exception of Fe I 5383 Å.This variance is not surprising: Cegla et al. (2019) inferred that the BIS performed well when the v t and v b regions from which this quantity is calculated bound the bend of the "C" and bottom of the bisector (corresponding to the line core), respectively, for their synthetic Fe I 6302 Å line.This picture is generally corroborated by our results: BIS generally performs best for lines with classical "C"-shaped bisectors whose bend and core happen to be captured by the canonical v t and v b regions of 10%-40% and 55%-90% depth.Indeed, we find that BIS performs well for our Fe I 6302 Å line simulations, removing ∼24% of the line's intrinsic variability.
In the context of Cegla et al. (2019), the observed variance of the bisector amplitude is perhaps harder to explain.Yet, on closer inspection it is clear that the bisector amplitude tends to fail to predict the anomalous RV for lines whose bisectors deviate from the classical "C" shape.Two salient examples are Fe I 6170 Å (which has a "/"-shaped bisector owing to line blends; see the online journal figure set and Appendix B) and Fe I 5436.3Å (which has a bisector corresponding to only the top-most portion of the classical "C" shape; see the online journal figure set and Appendix B).In the latter case, the bisector amplitude hardly varies from 0, since the blue-most point and the core of the bisector are, in most snapshots in any given synthetic time series, one and the same.
The generally poor performance of the bisector curvature C, might suggest that increasing the number of averaging regions (three, compared to two for BIS) does not necessarily improve sensitivity to changes in bisector asymmetry.In some part, though, the poor performance of C may also be happenstance: the canonical averaging regions may not be ideally placed to probe changes in bisector curvature for every line.In the following section, we explore this possibility by adjusting the averaging regions used in both BIS and C in order to maximize their diagnostic power for each line.

Optimized Bisector Diagnostics
As noted by Cegla et al. (2019), these bisector diagnostics (except for a b ) are not "tailored" to the properties of a given line: although the v t region happens to capture the kink in the "C"-shaped curve of the Fe I 5434 Å line (see Figure 4), it may not for another line.To address this fact Cegla et al. (2019) iteratively varied the averaging regions used in the definitions of BIS and the bisector curvature C in order to optimize their correlations with the measured RVs of their Fe I 6302 Å line profiles.We have performed this tuning for each of the 22 lines studied in this work.
To tune the bisector diagnostics, we first drew values for the bounding intensities (which we refer to as b 1 , b 2 , b 3 , and b 4 for BIS; and c 1 , c 2 , c 3 , c 4 , c 5 , and c 6 for C) from uniform random distributions.As in Cegla et al. (2019), we required that each averaging region spanned at least 5% in depth, and we additionally restricted the bounding intensities to fall between 15% and 95% of the given line depth (in order to avoid the regions near the continuum and the core of line, where bisector measurement error is largest).Draws which violated these restrictions were rejected and re-drawn.For each accepted draw, we repeatedly calculated the Pearson R between the tuned BIS/tuned C and measured RV using many different realizations of each line time series.For a given draw, we retain the median Pearson R, and then choose the optimized bounding intensities as those with the highest median Pearson R. The optimized values and the corresponding improved RMS RVs are given in Table 4 and plotted as the colored diamonds with blue error bars in Figure 3.
Generally, tuning BIS is successful: the correlation strengths are improved, and the anomalous RVs for each line are better mitigated than in the case of the canonical BIS definition.Optimistically, the tuned BIS removes ∼25-30% of the intrinsic RV variability, depending on the line.Like for BIS, the bisector curvature C was successfully tuned for most lines.In fact, the tuned C vastly outperformed the canonical C, which generally failed to produce any significant improvement over the intrinsic RMS with the exception of the Fe I 5383 Å line.Optimistically, the tuned C diagnostics remove ∼20-30% of the intrinsic RV variability.At best, however, the tuned C diagnostics match the performance of the tuned BIS.As previously mentioned in §4.2.1, this would suggest that increasing the number of averaging regions does not increase the performance of a diagnostic.
It is notable, and quite visually apparent from Figure 3, that the lines with greater intrinsic variability see the largest absolute reduction in their RMS RV.Less variable lines see a reduction that is less signifi-cant.Interestingly, the full span in intrinsic (i.e., unmitigated) RMS RVs is about ∼30 cm s −1 (from ∼75 cm s −1 to ∼45 cm s −1 ), whereas the span in mitigated RMS RVs is only about ∼15 cm s −1 (from ∼50 cm s −1 to ∼35 cm s −1 ).This decreased span is interesting, and might suggest that simple linear correlations between bisector diagnostic and anomalous RV can only correct granulation velocities to some limit.This, and other possibilities, are discussed further in §5.2.

The Mn I 5432 Å Line
Compared to all other lines considered in our analysis of bisector diagnostics, the Mn I 5432 Å is an interesting outlier: the tuning of both BIS and C failed for this line.The standard BIS diagnostic produced a ∼5% improvement over the intrinsic RMS RV, compared to ∼7% for the tuned BIS.Likewise, the standard C produced a ∼6% improvement, compared to ∼7% for the tuned C. We do not consider these changes between the tuned and un-tuned diagnostics to be significant.On first inspection, it is not immediately clear why the tuning failed.Mn I 5432 Å is among the shallowest studied lines, which might suggest that the averaging regions are too small or too close to each other to meaningfully diagnose changes in bisector shape.Yet, both the BIS and tuned BIS performed well for other shallow lines (e.g., Fe I 5382 Å), suggesting that the modest depth of Mn I 5432 Å is not the implicating factor.
It is also plausible that the failure of the bisector diagnostic tuning is related to the intrinsic variability of this line: Mn I 5432 is the least variable line examined in this work (Table 2 and Figure 3), and so it might follow that the shape of this line is minimally perturbed by granulation.Lending credence to this narrative, it has been previously recognized in the literature that the properties of the Mn I 5395 and 5432 Å lines are somewhat peculiar: these lines produce anomalous abundances (Booth et al. 1984;Scott et al. 2015) and are curiously sensitive to global solar activity, despite forming in the photosphere (Doyle et al. 2001;Livingston et al. 2007).Vitas et al. (2009) demonstrate that this peculiar behavior is a consequence of the hyperfine structure of the Mn atom (Murakawa 1955), which drives significant non-thermal broadening of the Mn I 5395 and 5432 Å lines.It is this broadening, claim Vitas et al. (2009), that makes these lines less susceptible to "smearing" by convective motions.The minimal variability we observe for the Mn I 5432 Å line suggests consistency with these previous works; the failure of the diagnostic tuning can then be understood as a simple reflection of the fact that there is minimal line-shape variability to optimize on.Table 4. Optimized regions for bisector diagnostics (BIS and C) and corresponding improvement in RMS RV.As in Table 3, the improvement is relative to the "intrinsic" RMS of each line and rounded to the nearest whole percent.The parameters b1 and b2 refer to the depths bounding the vt region; b3 and b4 bound the v b region (see Equation 3).
Similarly for C: c1 and c2 bound v1, and so on (see Equation 4).

Line
Tuned BIS

Granulation Signal Aggregation
In practice, it is not feasible to measure precise bisector shapes (or RVs) for individual spectral lines for stars other than the Sun, owing to limitations imposed principally by photon noise, spectral resolution, and detector sampling.These limiting factors were not accounted for in the analysis presented in §4.2; re-performing these simulations at R ∼ 120,000 and per-pixel SNR ∼ 400 (values representative of those optimistically achieved with current EPRV surveys, such as those conducted with EXPRES or NEID), none of the considered bisector diagnostics retain their correlation with the measured line velocity (not shown), and we find RMS RVs well exceeding the m s −1 level.This imprecision is not surprising, given that photon noise dominates compared to the variability produced by granulation in this regime.
Of course, velocities are not measured from individual lines in RV surveys.Typically forward-modeling (e.g., Petersburg et al. 2020), a CCF-based approach (e.g., Pepe et al. 2002), or a line-by-line method (e.g., Dumusque 2018) is used to aggregate velocity information across lines in order to measure a precise RV.Schematically, it may be helpful to think of this methodology in terms of the central limit theorem: each line contains some amount of information about the bulk velocity of the star with some error (e.g., from photon noise).By measuring velocities from many lines together, one can estimate the "true" bulk velocity as the (weighted) sample mean of the individual line velocities.
However, this picture begins to fall apart when one considers the effects of granulation on spectral lines.In reality, bisector shapes (e.g., Figure 2) and bisector variability (e.g., Figure 3) will differ across lines.As a result, the bisector of a CCF constructed from many disparate lines will not be representative of an underlying "true" bisector common to each line.Therefore, any underlying correlation between individual line profile (or individualline CCF) bisector shape and velocity (e.g., Figure 4) is not retained in the case of the spectrum-CCF bisector shape and velocity.Indeed, one recent work (Sulis et al. 2023) examined ESPRESSO CCFs (which are measured from thousands of disparate lines) for two bright stars (a G0V and F7V), and was unable to find significant correlations between CCF bisector curvatures and RVs, as in Cegla et al. (2019) and this work.

Simulations of Binned Granulation Signals
In principle, it may be possible to overcome this present limitation by using much more selective line lists in CCF measurements.I.e., constructing CCFs for families of lines that share very similar bisectors.To test this possibility, we used GRASS to synthesize time series con-sisting of many copies of the Fe I 5432 Å line (a moderately deep Fe I line with a classical "C"-shaped bisector that achieved a fractional reduction in its RMS RV representative of the median improvement among the lines studied in this work; see Table 4).Only the central wavelengths of these copies were varied; their convective blueshifts were identical, as well as the depths of the disk-resolved line profiles (the disk-integrated line depths varied ≲1% due to the wavelength scaling of the rotational broadening; see, e.g., Carvalho & Johns-Krull 2023).The copies were initially placed 4 Å apart, such that they were totally unblended.To mitigate any impact from aliasing with the discretized grid of wavelengths, the position of each line was perturbed by a (known) random offset drawn from a Gaussian distribution with width 0.1 Å.Because these lines were synthesized as copies of one another with known absolute convective blueshifts, they share a common time-averaged bisector shape and they also are perturbed in an identical manner in each snapshot of the simulation.
Then, varying the spectral resolution, SNR, and number of identical lines considered, we calculated CCFs from which we measured velocities and bisector shape diagnostics.The resolutions considered are meant to reflect the (approximate) median resolutions of existing spectrographs: KPF (R ∼ 98,000 -Gibson et al.For the sake of comparison, we also ran simulations at R ∼ 350,000 to probe a fictionalized, best-case scenario as a fiducial.In all cases, the sampling was set at ∼4 pixels per LSF FWHM (i.e., twice the number of pixels necessary to satisfy the Nyquist criterion), and the width of the Gaussian CCF mask was set as the speed of light divided by the resolution.We found that changing the number of pixels per LSF FWHM to two or six pixels did not significantly impact our results.Velocities were measured from the resulting CCFs as described in §4.1.This process was performed repeatedly in order to measure sample means and standard deviations.The results of this exercise are visualized in Figure 5 for several spectral resolutions.
First, we consider the results of the simulation where no granulation mitigation was performed (the left-hand panels of the plots in Figure 5).Across all spectral resolutions, the observed RMS RV is dominated by photon noise when the SNR and number of lines used in the CCF are low.In the most optimistic scenario, where the number of lines is large and the SNR is high, photon noise is essentially entirely averaged out and the "intrin-   Note that the step sizes for SNR and number of lines are not of fixed size.Unsurprisingly, both the raw and mitigated RMS RVs improve as the SNR and number of lines are increased.Notably, though, the achievable mitigation is strongly dependent on spectral resolution -only the three highest resolution simulations achieve statistically significant reductions in the RMS RV (see Figure 6 and associated text).Combinations of SNR and number of binned lines denoted with a "-" showed no significant improvement in RMS RV; none of the three lowest-resolution simulations (R ∼ 98,000, R ∼ 120,000, and R ∼ 137,000) achieved significant improvements in RMS RV for any combination of SNR and number of binned lines and so are not shown.It is plausible that ESPRESSO in UHR mode and PEPSI could demonstrate a retrieval (and correction of) convective variability for a bright star if enough similarly-varying lines could be identified and binned, as discussed in §4.3.2.
sic" variability of the line from granulation entirely dominates the RMS RV.It is notable that the higher spectral resolution simulations do not necessarily perform better when no granulation mitigation is attempted, except in the most extreme scenarios where the SNR and number of lines are quite low.Beginning at moderate SNR and number of lines (e.g., SNR ∼ 400 and 200 lines) and up to the highest SNR and number of lines, the measured RMS RVs improve by only several cm s −1 at most.For each combination of SNR, spectral resolution, and number of lines, we also attempted to construct a correlation between the measured velocities and tuned CCF BIS (since the tuned BIS generally outperformed all other considered diagnostics).The linear fit to these quantities was then used to correct the measured velocities, from which the final RMS RV was measured and reported.The results of this procedure are shown in the right-hand panels of Figure 5, and the percent improvements in RMS RVs (with propagated errors) are shown in Figure 6 for the subset of simulations that achieved meaningful improvement.Notably, significant mitigation via bisector diagnostic correlation can only be achieved at higher spectral resolution.At the three lowest considered resolutions (R ≤ 137,000), no scientifically significant improvement in the RMS RV is achieved, even at the maximum considered SNR and number of lines.At R ∼ 190,000, the improvement in RMS RVs is at the ≲20% level in the best-case scenario, perhaps approaching detectability.At the highest considered resolutions, the most optimistic improvement in the RMS RV is consistent with that achieved in the fullresolution, noiseless simulation laid out in §4.2.2.We discuss some considerations for attempting the observational retrieval of such signals below.

Practical Considerations for Existing Instruments
The analysis presented in §4.3.1 and shown in Figures 5 and 6 suggests that existing instruments might might be able to capture (and correct for) granulationdriven variations in line shape.Whether this can be performed in reality will depend on a few factors.First, signal must be binned across lines which are perturbed by convection in a similar-enough manner.Some initial work (Dravins et al. 2017a,b) has already successfully shown that "similar" Fe lines can be binned in order to enable retrievals of time-averaged spatially resolved stellar line profiles during planet transits.Additionally, HD simulations performed by Dravins & Ludwig (2023) show that different absorption lines fluctuate in phase (with some dependence on line strength and spectral region), suggesting that it is indeed plausible that convective variability can be coherently binned across lines.
Second, the requisite SNR of the unbinned spectra may change depending on the number of lines that are available for binning.For the science case demonstrated in Dravins et al. (2017b), binning only 26 lines was sufficient; resolving convective variability in other stars will require much greater precision.Stenflo & Lindegren (1977) list over 400 particularly clean Fe lines in the 400-686 nm range in the solar spectrum, so it is opti-mistically plausible that multiple hundred lines could be available for binning, even after accounting for practical details such as loss due to detector gaps and lines falling in the edges of orders.It is apparent from Figure 6 that an implausible number of lines (>500) would need to be binned to compensate for even moderate SNRs.We caution that observations should (at least initially) target the highest achievable SNR (therefore favoring brighter stars), since the number of lines available for binning is not precisely known at present.
Finally, the resolution of the observing instrument appears to be the strongest constraint on the detectability of convective variability.As discussed in §4.3.1, the three lowest-resolution simulations achieved no reduction in RMS RV regardless of the SNR or number of binned lines.Optimistically, ESPRESSO in UHR mode has sufficient resolving power to detect this variability; prospects for PEPSI are slightly more promising.Additional gains can be made beyond the resolution of PEPSI, though increasing the resolution much further may offer diminishing returns (in addition to the growing challenge of reaching adequate SNR at such high resolutions).We discuss some considerations for the design of future instruments in §5.2.1.
We caveat that these simulations are not survey or truly realistic observation simulations and should not be interpreted as such.Notably, the simulated SNR is divorced from exposure time, and the instrument LSFs are modeled as simple Gaussians.Mirror size, throughput, limiting magnitudes of the observed stellar sample, etc. vary greatly across instruments, and it is beyond the scope of this study to consider these myriad factors.Instead, we emphasize that these simulations were carried out in order to assess how photon noise and spectral resolution interact and affect attempts to measure and mitigate granulation signals encoded in the shapes of lines.As shown by this exercise, retrieving these signals given realistic observing and engineering constraints will be quite challenging.Despite this challenge, we believe that recovering these granulation signals may be possible at present, and perhaps a requirement for achieving ∼10 cm s −1 wholesale RV precision, as discussed further in the following section.

DISCUSSION
Recent studies, particularly Meunier et al. (2023) and Lakeland et al. (2024), have shown that granulation and supergranulation will pose significant barriers to the detection of Earth-mass exoplanets, even in very magnetically inactive stars.Assessments of optimal observing strategies for mitigating granulation and supergranulation (particularly Dumusque et al. 2011 andMeunier et al. 2015) agree that binning multiple observations over timescales larger than the typical (super)granule lifetime perform better than consecutive long exposures, but disagree on the absolute effectiveness of brute-force binning alone as a mitigation strategy.As an alternative to a "beating-down-the-noise" approach, Cegla et al. (2019) used MHD-driven Sun-as-a-star simulations to show that various summary statistics capturing changes in individual line bisectors strongly correlate with the anomalous, granulation-induced velocity shift.To complement these previous studies, we have presented and used v2.0 of the GRASS tool to empirically synthesize stellar line profiles with perturbations from granulation.In the following subsections, we discuss the various insights into granulation mitigation that simulations from GRASS have offered.We additionally comment on what further work will be needed to enable and implement these mitigation methods in EPRV surveys.

Need for Broader Studies of Line-by-Line Variability
In §4.1, we used GRASS to empirically measure the intrinsic, granulation-driven variability for the 22 solar lines shown in Figure 1.As shown in Figure 3, the most variable line exhibits a ≳70 cm s −1 RMS RV, and the least variable line ≳40 cm s −1 RMS RV.That different lines exhibit different degrees of variability is unsurprising; past works have shown that measured RVs change with the sets of lines used to measure velocities (e.g., Meunier et al. 2017;Al Moulla et al. 2022, 2024).Lines that trace global activity have been the focus of many studies and good progress has been made toward understanding the physical mechanisms underpinning their ability to trace this activity (e.g., Vitas et al. 2009;Wise et al. 2022;Cretignier et al. 2024).
In comparison to studies of global activity, the magnetoconvective variability of individual lines has been somewhat understudied.Studies utilizing HD and MHD codes (e.g., Dravins et al. 2017a;Cegla et al. 2019;Dravins & Ludwig 2023) have made important contributions to our understanding of this "microvariability" (borrowing the language of Dravins & Ludwig 2023), but are limited by computational costs and our current lack of disk-resolved line profiles for other stars to use as a basis for validation.A growing number of works have catalogued and investigated the absolute convective blueshift of lines both in the Sun and in other stars.Studies of the Sun have noted trends in convective blueshift with line depth (e.g., Reiners et al. 2016b), temperature (Al Moulla et al. 2022), and wavelength (Ellwarth et al. 2023a).Studies of other stars have shown that the gradient of convective velocities increases with stellar effective temperature (e.g., Gray 2009 andLiebing et al. 2021).Though it is possible to estimate an approximate granulation noise amplitude with simple stellar scaling relations (Dalal et al. 2023), previous observational studies have thus far not probed the temporal variability in the blueshifts of individual lines, a key focus of this work.
Looking beyond the characterization of the 22 lines presented in this work, future studies should attempt to characterize the convective jitter in individual lines en masse.Such works could use existing solar data sets (e.g., solar observations from HARPS-N, NEID, and/or KPF) to empirically measure these jitters, though the lower spectral resolutions of such instruments may make direct measurement difficult for individual lines.Separating the influence of global and/or localized magnetic activity may also prove quite challenging.Despite these challenges, achieving a synoptic understanding of stellar velocity fields, especially the convective velocity field, will be a key goalpost toward the detection of Earth analogs.

Promise and Limitations of Correction with
Bisector Diagnostics Cegla et al. (2019) found that various bisector diagnostics (particularly the bisector inverse slope BIS, bisector amplitude a b , and the bisector curvature C) could be used to remove 50-60% of the RV noise from granulation in their synthetic Fe I 6302 Å profiles.They noted, however, that the strength of the correlations used to achieve this reduction would likely change line-to-line and star-to-star.To expand on this study, we have performed this same exercise for the 22 solar lines studied in this work.Following Cegla et al. (2019), we used both the canonical definitions of these bisector diagnostics (given in §4.2) and definitions tuned to maximize the correlation for each given line and diagnostic.
As Cegla et al. (2019) predicted, the canonical definitions of BIS and C need to be tuned to each line to maximize their predictive power.In general, we found that the best-performing bisector diagnostics could be used to improve the RMS RV for a given line by ∼25-35% (see Table 4).Cegla et al. (2019) were able to correct their observed variability at the 50-60% level in their most optimistic scenario, but it should be noted that their raw RMS RV was ∼10 cm s −1 , artificially low likely as a result of the enhanced magnetic field in their simulations (see discussion in §2.3 of Cegla et al. 2019).Although our best-case fractional improvement in line RMS RV is generally lower than that found by Cegla et al. (2019), our absolute improvement is relatively large.E.g., we achieve in excess of 20 cm s −1 for some lines using the optimized bisector diagnostics.
Interestingly, none of the diagnostics were able to correct the RMS RV of any line to below ∼30 cm s −1 , suggesting that not all of the observed RV variability is driven by changes in line asymmetry.Pure Doppler shifts of bisectors will produce no change in the considered bisector diagnostics (modulo small measurement errors owing to pixelization), and so the remaining RV variability likely constitutes an upper limit on the net RV shift introduced by granulation.Of course, the bisector-diagnostic correlations are imperfect, and so some asymmetry-driven variability likely remains.Instrumental jitter probably also contributes to this remaining variability (see §2.1 and Löhner-Böttcher et al. 2017).It is possible to envision more advanced methods for correcting the asymmetry-driven variability (which would enable tighter constraints on the purely shiftdriven granulation noise), but limitations introduced by finite sampling and photon noise will probably necessitate some degree of averaging within the bisector (as in the definitions of BIS and C).
If the residual, uncorrected RMS RV observed is indeed created primarily by pure shifts in bisector position, then another strategy may need to be devised to cope with this variability.One plausible avenue is in binning or smoothing of measurements over time.Although binning alone will likely not suffice as a granulation mitigation strategy (Meunier et al. 2015), in reality some combination of diagnostic-based granulation mitigation and averaging could be employed.As Cegla (2019) note in their conclusion, the path forward likely lies in a combination of empirically and theoretically motivated strategies.

Overcoming Observational Constraints
The correlations between bisector diagnostic and anomalous RV will not be observable for individual stellar lines given constraints introduced by the spectral resolution and typical SNR achieved in current instrumentation.These same limitations introduce large errors in the RVs measured from individual lines; in practice this is overcome by binning signal across lines via a forward-modeling, CCF, or line-by-line approach.However, as currently implemented, these techniques are not well-suited for measuring changing convective velocities: whereas a wholesale motion of the star will create a uniform Doppler shift in every line, changes in convection will manifest differentially in each line.
Although it is beyond the scope of this work to devise new methods for binning granulation signals across lines, we carried out an exercise in §4.3 to show that, in principle, these signals can be aggregated and retrieved.Synthesizing spectra consisting of varying numbers of identical lines with only differing central wavelengths at varying SNRs (owing to photon noise alone) and spectral resolutions, we found that correlations between CCF BIS and RV could be retrieved under reasonably realistic (if slightly difficult) conditions.Of the existing suite of high-resolution RV spectrographs, it is most plausible that ESPRESSO (in UHR mode) or PEPSI could resolve the sub-m s −1 stellar line-shape variations that characterize granulation-driven jitter, assuming an adequate number of similarly varying lines can be identified and binned.As discussed in §4.3.2, a recent work based on HD simulations (Dravins & Ludwig 2023) demonstrated that subsets of lines do indeed vary in phase, motivating our claim that variability in granulation signals would need to be coherently binned across lines.Line lists derived from or informed by such HD simulations could provide a valuable starting point for observational studies of convective variability in other stars.
Looking forward to future generations of instruments, particularly in the coming age of extremely large telescopes (ELTs), we emphasize that spectral resolution is fundamental to resolving the signals that granulation encodes in the shapes of spectral lines.Though the velocities that fall out of these changes in line shape and position are largely treated and discussed as noise within the EPRV community and literature at present, it is important to recognize that mitigating their impact on the measurement of cm s −1 -precise bulk motions of stars likely lies in treating them as signals that vary somewhat from line to line.As shown in §4.3 and Figure 5, the spectral resolution of an instrument determines its ability to resolve perturbations in the shapes of lines.In order to resolve this variability, future instruments should target higher resolutions: ESPRESSO in UHR mode (at median R ∼ 190,000) currently sets the bar for future instruments to surpass.As Dravins et al. (2017b) argue, such ultra-high resolution instruments would enable extremely precise and novel studies of both stars and planets.

Limitations and Caveats of GRASS
As discussed in §6.2 of Palumbo et al. (2022), GRASS has limitations that should be considered both in the context of this study and before using GRASS in future works.Principally, GRASS uses solar data to construct line profiles and spectra.The time-average shape of bisectors is known to be highly sensitive to the structure of the stellar atmosphere, and is therefore a strong function of the stellar spectral type and luminosity class (Gray 2008).An independent limitation of GRASS is imposed by the length of the time series used as input to the simulation.These observations are described in greater detail in §2 of Löhner-Böttcher et al. (2019); but, in brief, the minimum time baseline for most observation was ∼40 minutes.Consequently, GRASS is sensitive to frequencies corresponding to the maximum realistically expected lifetime of granules (see Hirzberger et al. 1999), but is completely insensitive to longer timescale phenomenon, namely supergranulation.Because of this limitation, we refrain from running simulations of smoothing and binning as in Meunier et al. (2015), instead focusing on lineby-line variabilities and bisector-diagnostic-based mitigation techniques.

CONCLUDING REMARKS
Compared to other drivers of intrinsic stellar variability, methods for mitigating granulation-driven RV variability are comparatively poorly developed.Because stellar granulation precipitates a ∼30-70 cm s −1 RV noise source (depending on the line or lines measured), current and upcoming RV surveys will need to develop methods for mitigating the effects of granulation in order to detect true Earth twins, i.e., Earth-mass planets orbiting in the habitable zones of Sun-like stars.In this work, we: 3. Quantify the line-by-line RV variability in 22 solar lines, showing that there is ∼30 cm s −1 difference between the most variable and least variable lines; 4. Show that diagnostics of bisector shape generally correlate with the granulation-induced RV (consistent with the MHD-driven simulations of Cegla et al. 2019), and can be used to remove 25-35% of the measured granulation noise; 5. Demonstrate that although these correlations can't be retrieved for individual lines at the resolutions and typical SNRs achieved by existing spectrographs, informed and selective binning of similar lines can overcome these limitations; 6. Show that on the basis of their ultra-high spectral resolution, ESPRESSO (in UHR mode) and PEPSI are the best-suited existing spectrographs to demonstrate a retrieval of such line-shape correlations; 7. Argue that future spectrographs should target ultra-high resolutions of R ≳ 190,000 in order to resolve convective variability in lines; 8. Emphasize that granulation encodes a signal in the shapes of lines, and that future instrument builders should be mindful of the high spectral resolution needed to faithfully resolve them.
in normalized flux, consistent with the methods used by Gray (1988) and Dall et al. (2006).As in Palumbo et al. (2022), we fit separate Voigt profiles to the red and blue wings of each line in order to measure a smooth line width into the continuum (because the deep solar lines of interest are often increasingly blended in their wings).We use a Voigt profile, rather than e.g., a Gaussian, since the different lines are shaped quite differently in the wings and core (see Figure 1).Voigt profiles are able to capture this diversity in shape more flexibly and accurately.The effects of this approach are discussed specifically for the Fe I 5434 Å line in §4.1 of Palumbo et al. (2022) and generally in §3.1.
Like the width measurements, the bisector measurements become fraught near the continuum, owing both to blends and measurement uncertainty which is inversely proportional to the first derivative of the spectrum at the intensity of interest (Gray 2008).To compensate for this, we fit a polynomial to the lower 80% of each bisector and model the remaining top 20% as the extrapolation of the best-fit polynomial.Similarly to the fitting procedure employed by Zhao & Dumusque (2023), we use a third-order polynomial for bisectors at µ > 0.4, and a first-order polynomial for bisectors at µ ≤ 0.4.Example disk-resolved spectra with model fits and input data with extrapolations are shown in Figure 7.
Rather than writing the pre-processed data for each limb position to separate FITS files, as in GRASS v1.0 (Palumbo et al. 2022), we instead write all data for a given line to an HDF5 file, in order to reduce the storage size of the input data and excess I/O latency during synthesis.The input data are available in a Zenodo record (Palumbo et al. 2023b), which GRASS automatically downloads upon installation.GRASS includes additional functions used to read and manipulate the data stored in these files.These functions are described in the package documentation available on GitHub.As explained in §3.2.3 and §3.2.4 of Palumbo et al. (2022), v1.0 of GRASS synthesized disk-integrated line profiles by integrating over an evenly sampled 2D stellar grid projected on the sky plane.To enable more complex modeling of stellar geometries and to be consistent with other Sun-as-a-star simulations, v2.0 of GRASS now tiles the surface of a sphere in stellar longitude and latitude, following from the procedures used in Vogt et al. (1987) and Piskunov & Kochukhov (2002).In brief, we first divide the star into N ϕ latitude slices.The number of longitudes sampled in each latitude slice is then given by:

A.2. Changes to Model Grid
where ϕ is the latitude at the center of a given latitudinal slice and ∆ϕ the latitude span of the slices.In each slice, N θ is rounded up to the nearest whole number.As noted in Vogt et al. (1987), this tiling scheme produces tiles whose areas varies minimally with latitude, except near the poles where the tiles become triangular.Tiles that are located entirely on the invisible hemisphere of the star do not contribute to the integrated flux.Because the tiles are no longer of equivalent projected area (as in GRASS v1.0, Palumbo et al. 2022), we now weight the contribution of each tile to the disk-integrated flux by the projected area of each tile, in addition to limb darkening.An example grid with color-coded weights is shown at left in Figure 8.The number of longitudinal elements is set by the latitude increment ∆ϕ, which is equal to 180 • /N ϕ ; consequently, we parameterize the resolution of the spatial grid in terms of only N ϕ .In previous works that have implemented the Vogt et al. (1987) tiling scheme (e.g., Reiners et al. 2016a), N ϕ is customarily set to some very large number in order to minimize errors introduced by the discretization of the stellar surface elements.However, as explained §4.2 of Palumbo et al. (2022), the resolution of the spatial grid must correspond to the (average) angular size of observed patches in Löhner-Böttcher et al. (2018) andLöhner-Böttcher et al. (2019); i.e., there is an optimal N ϕ that cannot be freely chosen.With our modified tiling procedure, we find that N ϕ = 197 yields tile sizes corresponding to the intensity-weighted average area of the observed patches.As in §4.2 and Figure 4 of Palumbo et al. (2022), we have verified that this resolution produces an RMS RV broadly consistent with those observed by Elsworth et al. (1994) and Pallé et al. (1999), with the exact RMS RV depending on the input data used to synthesize the spectra (see §4.1).
Generally, at N ϕ = 197 the star is not tiled densely enough and appreciable (∼m s −1 ) errors in the rotational velocity are introduced by the sparse tessellation.To circumvent this problem, we follow the procedure developed by Cegla et al. (2019) and compute the limb darkening, projected rotational velocity, and projected areas in each tile as the (weighted) average of those values computed on a 40-by-40 grid of sub-tiles.Within each larger tile, the sub-tiles are evenly spaced in stellar latitude and longitude.The limb-darkened intensity assigned to each large tile is then given as the mean intensity across the corresponding sub-tiles and the rotational velocity as the weighted mean of the sub-tile velocities.This sub-tiling scheme is illustrated at right in Figure 8.
To calibrate the necessary number of sub-tiles, we computed the disk-summed rotational velocity and projected tile area as a function of the number of sub-tiles.Ideally, as the number of sub-tiles is increased, the sum of the tile rotational velocities should tend to zero, and the sum of the projected areas should approach π.With a 40-by-40 grid of sub-tiles in each larger stellar tile, we find that the disk-summed rotational velocity error is ∼3.7 × 10 −9 m s −1 ; the error in the total projected area of the disk is ∼5.9 × 10 −9 .To assess the impact of these errors on the synthesized line profiles, we compared line profiles generated with an arbitrarily high density of sub-tiles (1600-by-1600 grid) and with the standard 40-by-40 grid.Taking the high-resolution line profile as the fiducial, the flux errors did not exceed ∼8.9 × 10 −10 ; likewise, the line velocity errors were sufficiently small at 8.9 × 10 −7 m s −1 .

A.3. Treatment of Convective Blueshift
Stellar absorption lines are known to exhibit convective blueshift that varies with both limb angle (Löhner-Böttcher et al. 2018, 2019) and depth (Reiners et al. 2016b;Ellwarth et al. 2023b).By default, GRASS uses only the convective blueshifts in the solar observations in its construction of line profiles (as is the case for the simulations and analysis presented in this work).However, if GRASS is used to model a line of arbitrary depth (following the procedure described in §3.2.2 of Palumbo et al. 2022), it may also be desirable to artificially prescribe the disk-integrated convective blueshift of the synthetic line profile.To enable this modeling, GRASS now optionally includes an additional convective blueshift term in order to reproduce the solar scaling relation given in Reiners et al. (2016b).Specifically, we use Equation 2 where ∆v conv is measured in units of m s −1 .In implementation, GRASS draws the appropriate convective blueshift from a look-up table tabulated from this polynomial relation, rather than evaluating the polynomial for each synthetic line.
A.4. GPU Implementation Residuals for each of the GPU precisions are shown in the middle and bottom panels.Marginal distributions of the residuals are shown at the right of the middle and bottom panels.Note the differences in y-axis scales between each panel.Compared to the double-precision synthesis, which is accurate to several parts in 10 15 , single-precision synthesis is only accurate to a few parts in 10 3 as the result of catastrophic cancellation in repeated interpolation calculations.This effect is most pronounced closer to the line core, where many repeated interpolations and in-place multiplications are performed, compared to the line wings and continuum.Because flux errors at the level of 1 × 10 −3 propagate to velocity errors of order ∼several cm s −1 , we strongly advise against the use of the single-precision GPU implementation of GRASS.Panel (b): GRASS performance by implementation architecture.GPU benchmarks were performed with an NVIDIA Tesla V100S GPU.For these benchmarks, the size of the synthesized spectrum (i.e., number of lines) was varied, with the number of time steps fixed at Nt = 50, and the number of latitude slices fixed at N ϕ = 197.Even for smaller problem sizes, the GPU implementation outperforms the CPU implementation because of the large overhead incurred by computing the necessary geometrical parameters and weights (see §A.2) on the CPU.The double-precision GPU implementation achieves a factor of ≳100 speed-up for larger problem sizes.Owing the large loss in accuracy in the Float32 implementation (see Figure 9), we do not recommend use of single-precision synthesis with GRASS.Palumbo et al. (2022) presented v1.0 of GRASS, which synthesized spectra serially on a CPU.Depending on the number of spectral resolution elements and the number of time steps in the synthesis, the required computation time could become quite large.As shown in another recent work (Zhao & Dumusque 2023), thoughtful use of parallelization can greatly increase the performance and usability of stellar modeling codes, particularly owing to the performance boost enabled by performing the necessary interpolations (see §3.2.3 of Palumbo et al. 2022) of spectra en masse.In order to compute spectra more efficiently, we have implemented a GPU version of GRASS.
The GPU kernels are written in Julia using the CUDA.jl(Besard et al. 2018b(Besard et al. , 2019) ) package for use with NVIDIA GPUs.Multiple steps of the updated synthesis procedure have been parallelized.Specifically, the spectral synthesis computations and interpolations for each spatial grid cell and wavelength element are performed in parallel, in addition to the computation of weights governed by the viewing geometry, limb darkening, and differential rotation of the model star.The same high-level functions used to generate synthetic spectra are able to interface with the GPU kernels such that no knowledge of GPU computation is needed by users of GRASS to take advantage of this implementation.As in the CPU implementation, the computation of the disk integrated spectrum is performed in place to avoid excess memory allocation, which could otherwise exceed the available VRAM at moderate problem sizes.
As shown in Panel (a) of Figure 9, the double-precision GPU implementation of GRASS produces flux values that are accurate to the CPU implementation within about ∼several parts in 10 15 when using the same random number generator (RNG) seed.Due to catastrophic cancellation arising in the interpolation step of the spectral synthesis (described in §3.2.3 of Palumbo et al. 2022), the flux values produced by the single-precision GPU synthesis are accurate to the CPU implementation at only about a couple parts in 10 3 .Propagating these errors in flux to velocity, the singleprecision flux error produces a velocity error of order ∼several cm s −1 , whereas the double-precision GPU velocity error amounts to only ∼1×10 −10 m s −1 .Given the appreciable velocity error in the single-precision implementation, only the double-precision GPU implementation of GRASS was used for computations and results presented in this work, and we strongly discourage the use of single-precision GPU implementation for synthesizing spectra with GRASS.To prevent unwitting single-precision computations, GRASS will throw a warning if any GPU allocations are made with single-precision floats.
To assess the relative performance of the CPU and GPU implementations, we performed benchmarks of GRASS with a 2.5 GHz Intel Xeon Gold 6248 CPU and a NVIDIA Tesla V100S GPU running with CUDA Driver Version 550.54.14 and CUDA Toolkit Version 12.4.Because the GPU implementation performs parallel computations over the spatial and spectral grid cells, we evaluated performance as a function of the number of spectral resolution elements (or somewhat equivalently, the width of the spectrum in units of length), because the number of spatial grid cells should be fixed for physical validity (see §4.2 of Palumbo et al. 2022).The number of time steps in the simulation was held fixed at N t = 50, corresponding to 12.5 minutes of time given the time step of 15 seconds.The results of these benchmarks are shown in Panel (b) of Figure 9.For larger problem sizes, the GPU implementation outperforms the CPU implementation by a factor of about 100×.

B. NOTES ON SPECIFIC LINES
In Palumbo et al. (2022), we demonstrated that GRASS accurately reproduced the line profile and bisector of the FeI 5434 Å line by comparison to the IAG Solar Atlas (Reiners et al. 2016b).In this work, we have performed this semi-quantitative comparison for all 22 lines used in the GRASS template line library (see §3.1).An example set of comparisons is shown in Figure 10; line profile and bisector comparison plots are available for all 22 lines in the online journal.Below, we provide comments on specific details relevant to each line.For additional discussion of the convective blueshift of these lines, as well as their historical use in the heliophysics literature, refer to Löhner-Böttcher et al. ( 2018

B.2. Lines around 5381 Å
Four lines in this region were studied: Fe I 5379 Å, Ti II 5381 Å, Fe I 5382 Å, and Fe I 5383 Å.A fifth line, C I 5380 Å, was noted but excluded from the analysis owing to its very modest depth.Two lines, Fe I 5379 Åand Ti II 5381 Å, exhibit classical "C"-shaped bisectors.Fe I 5383 Å, on the other hand, has a "\"-shaped bisector which are typically observed for disk-resolved bisectors near the limb.In this case, as noted by Löhner-Böttcher et al. (2019), the strong blueward trend is caused by blends in the blue wing of the line.Lastly, I 5382 is quite shallow, but its bisector is suggestive of the upper region of a "C"-shaped bisector.GRASS models Fe I 5379 Å and Ti II 5381 Å quite well, except for diminished curvature in the top ∼20% of the Fe I 5379 Åline.GRASS also reproduces the "\"-shaped bisector of Fe I 5383 Å quite well up to ∼60% of the continuum.However, the shape of the bisector of this region is shaped by the cores of weak lines in the wing of Fe I I 5383 Å; small changes in the relative depths of these lines (e.g., from activity, as the Sun was more active during the observation of the IAG Atlas; see Hathaway 2015 andReiners et al. 2016b) could lead to this discrepancy.Lastly, GRASS underestimates the amplitude of the shallow Fe I 5382 Å bisector.Being the shallowest line that we model with GRASS (with fractional depth ∼0.2), the amplitude underestimate is not entirely surprising.We note that the mismatch between the IAG and GRASS bisectors in general becomes most extreme above ∼80% continuum flux; the entirety of this line is above this level.

B.3. Lines around 5434 Å
Six lines in this region were studied: Mn I 5432 Å, Fe I 5432 Å, Fe I 5434 Å, Ni I 5435 Å, Fe I 5436.3Å, and Fe I 5436.6 Å. Fe I 5434 Å was the line modeled in the initial presentation of GRASS in Palumbo et al. (2022).Lines with classical "C"-shaped bisectors included Fe I 5432 and Fe I 5434.Lines whose bisectors visually correspond to the upper half of the classical "C" include Mn I 5432 Å, Fe I 5436.3Å, and Fe I 5436.6 Å.As discussed in §4.2.3, the Mn I 5432 Å line is particularly broad as the result of the prodigious hyperfine structure of Mn.Ni I 5435 Å is blended in the wing such that it's bisector is "\"-shaped, as seen for Fe I 5383 Å. GRASS generally models these lines well (with only very small differences in the curvature of bisectors).Somewhat notably, GRASS underestimates the curvature in the upper region of the "C" for Fe I 5434 Å, but this discrepancy is due to blends in the red line wing of the line which were modeled out in the pre-processing of the LARS data (as discussed in greater detail in §4.1 of Palumbo et al. 2022).
B.4. Lines around 5578 Å Two lines in this region were studied: Fe I 5576 Å and Ni I 5578 Å.Both lines exhibit classical "C"-shaped bisectors which are well-modeled by GRASS.However, GRASS does somewhat underestimate the curvature of Fe I 5576 Å above ∼70% flux.

B.5. Lines around 6150 Å
Two lines in this region were studied: Fe II 6149 Å and Fe I 6151 Å.Both lines exhibit bisectors which visually correspond to the upper ∼half of the classical "C" shape.Considering the modest depths of these lines (especially so in the case of Fe II 6149 Å), GRASS models the bisectors of these lines remarkably well.

B.6. Lines around 6171 Å
Four lines in this region were studied: Ca I 6169.0Å, Ca I 6169.5 Å, Fe I 6170 Å, and Fe I 6173 Å.Two lines, Ca I 6169.0Å and Fe I 6173 Å, exhibit classical "C"-shaped bisectors that are well modeled by GRASS.Fe I 6173 Å is notably one of the most magnetically sensitive lines in the visible portion of the spectrum, and such has been extensively used to study the solar line-of-sight magnetic field (e.g., the Helioseismic Magnetic Imager of the Solar Dynamics Observatory, Scherrer et al. 2012).GRASS does somewhat underestimate the curvature of this line.The remaining two lines, Ca I 6169.5 Å and Fe I 6170 Å, are blended such that their bisectors deviate from the classical "C" shape.Ca I 6169.5 Å exhibits a "\"-shaped bisector as in Ni I 5435 Å and Fe I 5383 Å. Fe I 6170 Å exhibits a bisector unlike those of the other blend lines studied in this work; it is shaped like a "/" up about ∼80% continuum before hooking back to the blue.GRASS faithfully reproduces the majority of these two bisectors, but deviates slightly in the uppermost regions.

B.7. Lines around 6302 Å
Two lines were studied in this last region: Fe I 6301 Å and Fe I 6302 Å.The Cegla et al. (2013Cegla et al. ( , 2018Cegla et al. ( , 2019) ) series of papers synthesized the Fe I 6302 Å, which has also been widely used in the solar physics.Both lines are otherwise widely used in the solar physics literature (see, e.g, the Introduction of Smitha et al. 2020 for a review).Both lines have classical "C"-shaped bisectors and are notably flanked by deep telluric oxygen lines.The bisectors of these lines are well-modeled by GRASS up to ∼70% of the continuum; above this level, GRASS notably underestimates the redward curvature of the bisector relative to the IAG bisector.It is plausible that these deviations are caused by variations in the strength of the oxygen telluric lines between the respective observing sites of the IAG Atlas (Göttingen, Germany; altitude ∼150 meters) and the LARS data used as input for GRASS (Teide Observatory, Canary Islands; altitude ∼2400 meters).

Figure 1 .
Figure 1.Collage of continuum-normalized, disk-center solar spectra originally observed for Löhner-Böttcher et al. (2018), Stief et al. (2019), and Löhner-Böttcher et al. (2019) with annotations for the lines used as input in this work.Lines denoted with a double dagger (C I 5380 Å and Na I 5896 Å) were included in the convective-blueshift analysis presented in Löhner-Böttcher et al. (2019), but are excluded from the analyses presented in this work, as explained in §2.Other lines are of telluric origin, significantly blended, and/or very shallow.Note the breaks in the wavelength axis; eight spectral regions spanning only a few Å each were observed.Spectroscopic properties and parameters for these lines are tabulated in Table1.

Figure 2 .
Figure 2. Ensemble of synthetic disk-integrated bisectors for the 22 lines analyzed in this work; compare to Figure 17.14 of Gray (2008).These bisectors were measured from significantly oversampled spectra synthesized at R ∼ 700,000 to ensure smoothness.Classical "C"-shaped bisectors are shown in the left-hand figure, and bisectors of deeply blended lines are shown in the right-hand figure.The velocity offsets in both panels of each figure are arbitrary and for illustrative purpose only.As in Figure 17.14 of Gray (2008), the bisectors are composited at right in each figure to show their (dis)similarity.

Figure 3 .
Figure 3. Line-by-line differences in granulation-induced variability.Left: Colored circles indicate the mean RMS RV measured from time-series of individual disk-integrated line profiles, following from the procedure outlined in §4.1.The colored diamonds show the mean residual RMS RV after using the tuned BIS to correct granulation velocities, as detailed in §4.2 and §4.2.2.In both cases, the error bars are the 1σ width of the RMS RV distributions measured from many trials, as described in §4.1.Right:The amount of variability (expressed as the RMS RV) exhibits no clear correlation with T 1/2 (see AlMoulla et al. 2022), or any other single quantity, including: wavelength, line depth, and potential of the lower and upper states of each transition (not shown).Although there is an appreciable difference between the most and least variable lines, it is not immediately apparent how the variability of a line could be predicted from the formation or atomic properties of a line.
, Al Moulla et al. (2022) and Al Moulla et al. (2024) used solar and stellar data, respectively, to show that the measured RMS RV has a dependence on the formation temperature of the lines used.Following the methods of Al Moulla et al. (2022), we also computed their T 1/2 parameter (the temperature at the height in the atmosphere corresponding to 50% of a line's cumulative contribution function; see Figure 1 of Al Moulla et al. 2022) at the central wavelength of each line.The RMS RVs for each line are plotted against their T 1/2 in the right-hand panel of Figure 3.As with the other variables, we note no simple trend connecting a line's variability to its T 1/2 .Excluding the two lines with the highest variability (around T 1/2 ∼ 4700 K), one might plausibly deduce hints of the same "high-low-high" trend of RMS with temperature noted by Al Moulla et al. (2022) and Al Moulla et al. (

Figure 4 .
Figure 4. Example correlations between bisector shape summary statistic and anomalous, granulation-induced velocity shift for a synthetic Fe I 5434 Å time series.Left: Time-averaged line bisector (opaque blue curve) and bisector variations (transparent blue curves).The amplitude of the bisector variations are exaggerated 50× for visual effect.The regions bound by the horizontal dashed lines and dotted lines represent the vt and v b regions used in the calculation of BIS, respectively.Middle: Correlation between BIS and apparent RV excursion with best-fit relation.Each point corresponds to one snapshot of the synthetic time series.Right: Same as middle panel, but for bisector amplitude a b .

Figure 5 .
Figure5.Results of the shape-based granulation mitigation simulation described in §4.3.Each set of plots corresponds to simulations of spectra observed at spectral resolutions approximately representative of those of existing instruments, except for the bottom-right panel which is meant to represent a fictionalized ultra-high-resolution instrument as the best-case, fiducial scenario.The RMS RVs reported in left-hand panels of each figure correspond to measurements performed without any granulation mitigation.The RMS RVs in the right-hand panels were calculated from RVs corrected using the procedure described in §4.3.1.The reported RMS RVs are the mean of many trials; the typical errors on the mean are at the ≲1.5% level (not shown).Note that the step sizes for SNR and number of lines are not of fixed size.Unsurprisingly, both the raw and mitigated RMS RVs improve as the SNR and number of lines are increased.Notably, though, the achievable mitigation is strongly dependent on spectral resolution -only the three highest resolution simulations achieve statistically significant reductions in the RMS RV (see Figure6and associated text).

Figure 6 .
Figure6.Percent improvement in RMS RV for a subset of the simulations described in §4.3.1 and shown in Figure5.Combinations of SNR and number of binned lines denoted with a "-" showed no significant improvement in RMS RV; none of the three lowest-resolution simulations (R ∼ 98,000, R ∼ 120,000, and R ∼ 137,000) achieved significant improvements in RMS RV for any combination of SNR and number of binned lines and so are not shown.It is plausible that ESPRESSO in UHR mode and PEPSI could demonstrate a retrieval (and correction of) convective variability for a bright star if enough similarly-varying lines could be identified and binned, as discussed in §4.3.2.

1.
Present v2.0 of the GRanulation And Spectrum Simulator -GRASS -and document the changes and upgrades since v1.0 (Palumbo et al. 2022); 2. Verify that GRASS empirically reproduces the timeaveraged, disk-integrated line profiles and bisectors observed by Reiners et al. (2016b);

Figure 8 .
Figure 8. Tiling (left-hand plot) and sub-tiling (right-hand plot) of an example stellar surface viewed at slight inclination.In both plots, the tiles and sub-tiles are drawn artificially large for illustrative clarity.The coloring indicates the fractional weight of each tile normalized to the largest weight value; individual weights are given by the product of the limb darkening and projected area of the tile.The stellar equator and the zero-and ninety-degree meridians are overplotted as white dashed lines.Left: The stellar surface is divided into tiles according to the prescription developed byVogt et al. (1987).The number of latitudinal slices (and consequently the total number of tiles) is set such that the average tile area corresponds to the average angular size of the input observations, as described in §A.2.Right: Zoom-in on the tile highlighted by the blue border in the left-hand plot, with example sub-tiling.Because the grid resolution dictated by the angular size of the input observations would normally introduce discretization errors in the rotation velocity, limb darkening, and projected tile area computations, these quantities are calculated at the centers of sub-tiles within each larger tile (indicated by the black dots).As inCegla et al. (2019), the sub-tiles are spaced evenly in stellar latitude and longitude within each larger tile.

Figure 9 .
Figure9.Panel (a): Synthesis accuracy for the GRASS GPU implementation assessed against the CPU implementation fiducial.Spectra synthesized at double precision on a CPU and at both single and double precision on a GPU are shown in the top panel.Residuals for each of the GPU precisions are shown in the middle and bottom panels.Marginal distributions of the residuals are shown at the right of the middle and bottom panels.Note the differences in y-axis scales between each panel.Compared to the double-precision synthesis, which is accurate to several parts in 10 15 , single-precision synthesis is only accurate to a few parts in 10 3 as the result of catastrophic cancellation in repeated interpolation calculations.This effect is most pronounced closer to the line core, where many repeated interpolations and in-place multiplications are performed, compared to the line wings and continuum.Because flux errors at the level of 1 × 10 −3 propagate to velocity errors of order ∼several cm s −1 , we strongly advise against the use of the single-precision GPU implementation of GRASS.Panel (b): GRASS performance by implementation architecture.GPU benchmarks were performed with an NVIDIA Tesla V100S GPU.For these benchmarks, the size of the synthesized spectrum (i.e., number of lines) was varied, with the number of time steps fixed at Nt = 50, and the number of latitude slices fixed at N ϕ = 197.Even for smaller problem sizes, the GPU implementation outperforms the CPU implementation because of the large overhead incurred by computing the necessary geometrical parameters and weights (see §A.2) on the CPU.The double-precision GPU implementation achieves a factor of ≳100 speed-up for larger problem sizes.Owing the large loss in accuracy in the Float32 implementation (see Figure9), we do not recommend use of single-precision synthesis with GRASS.
) and Löhner-Böttcher et al. (2019).B.1.Lines around 5251 Å Two lines in this region were studied: Fe I 5250.2Å and Fe I 5250.6.As noted in Löhner-Böttcher et al. (2019), other lines in this region were extremely weak or significantly blended in the wings.Both Fe I 5250.2Å and Fe I 5250.6 exhibit classical, smooth "C"-shaped bisectors, which are well-modeled by GRASS.

Figure 11 .
Figure 11.Same as previous figure.

Figure 12 .
Figure 12.Same as previous figure.

Figure 14 .
Figure 14.Same as previous figure.

Figure 16 .
Figure 16.Same as previous figure.

Table 1 .
Parameters for spectroscopic lines in this work.Quantities denoted with a dagger ( † ) are from Löhner-Böttcher et al.

Table 2 .
RMS RV and 1σ variability in RMS RV for each line considered in this work; the data in this table correspond to the values shown in Figure3.Velocity quantities are rounded to cm s −1 precision.

Table 3 .
Summary of granulation mitigation potential for individual lines using various diagnostics of bisector shapes.For each line and bisector shape diagnostic, R is the median Pearson correlation coefficient between the measured line RV and considered bisector diagnostic.The corrected RMS RV and 1σ variability in the RMS RV are reported, and the percent improvement (rounded to the nearest whole number) over the "intrinsic" RMS RVs reported in Table2are shown for each line and bisector shape diagnostic.