The CARMENES search for exoplanets around M dwarfs. Line-by-line sensitivity to activity in M dwarfs

Radial velocities (RVs) measured from high-resolution stellar spectra are routinely used to detect and characterise orbiting exoplanet companions. The different lines present in stellar spectra are created by several species, which are non-uniformly affected by stellar variability features such as spots or faculae. Stellar variability distorts the shape of the spectral absorption lines from which precise RVs are measured, posing one of the main problems in the study of exoplanets. In this work we aim to study how the spectral lines present in M dwarfs are independently impacted by stellar activity. We used CARMENES optical spectra of six active early- and mid-type M dwarfs to compute line-by-line RVs and study their correlation with several well-studied proxies of stellar activity. We are able to classify spectral lines based on their sensitivity to activity in five M dwarfs displaying high levels of stellar activity. We further used this line classification to compute RVs with activity-sensitive lines and less sensitive lines, enhancing or mitigating stellar activity effects in the RV time series. For specific sets of the least activity-sensitive lines, the RV scatter decreases by a factor of ~2 to 5 with respect to the initial one, depending on the star. Finally, we compare these lines in the different stars analysed, finding the sensitivity to activity to vary from star to star. Despite the high density of lines and blends present in M dwarf stellar spectra, we find that a line-by-line approach is able to deliver precise RVs. Line-by-line RVs are also sensitive to stellar activity effects, and they allow for an accurate selection of activity-insensitive lines to mitigate activity effects in RV. However, we find stellar activity effects to vary in the same insensitive lines from star to star.


Introduction
High-resolution stellar spectra are routinely used to study and characterise exoplanet companions orbiting stars through the Doppler spectroscopy or radial velocity (RV) technique. Stellar spectra are affected by the intrinsic variability of the stellar hosts. Stars are not quiet, homogeneous bodies, but they display variability on different timescales and amplitudes, including the effects of oscillations (e.g. Bedding et al. 2001; Bazot et al. 2012; Kunovac Hodžić et al. 2021), granulation (e.g. Meunier et al. 2015; Cegla et al. 2018), and magnetically active regions such as spots and faculae (e.g. Saar & Donahue 1997; Desort et al. 2007; Lagrange et al. 2010). These features distort the stellar spectra, introducing biases in the measured RVs that can be large enough to mimic or hide the signal caused by a planet. Magnetically active regions are especially important because they co-rotate with the star and hence have timescales of the order of the stellar rotation period (similar to the orbital periods of close-in planets) and they impact the RVs on the m s⁻¹ level.
Tables containing information about the sensitivity of the different lines to activity are available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/cgi-bin/qcat?J/A+A/
A&A proofs: manuscript no. lines. arXiv:2302.07916v1 [astro-ph.SR] 15 Feb 2023
The different absorption lines observed in stellar spectra are created by the different atomic or molecular species present in the stellar photosphere. Different atoms and molecules have different sensitivities to temperature, magnetic field strength, and convection pattern. These are parameters affected by photospheric stellar activity features: spots and faculae possess strong magnetic fields that inhibit convective motions and change the temperature in these regions. We therefore expect that changes in these parameters due to stellar activity will affect the profile of different absorption lines in different ways, depending on the sensitivity of the lines to these parameters. Line profile changes affect not only RV measurements used to study exoplanets, but also the determination of stellar properties and chemical abundances, especially in young, active stars (e.g. Reiners et al. 2016; Meunier et al. 2017; Passegger et al. 2019; Spina et al. 2020; Abia et al. 2020; Shan et al. 2021; Liebing et al. 2021).
Usual methods to determine RVs (either by cross-correlation or template-matching schemes, e.g. Baranne et al. 1996; Pepe et al. 2002; Anglada-Escudé & Butler 2012; Zechmeister et al. 2018) yield a global RV measurement for the full spectrum. This means that these RV measurements average over the different asymmetries and shifts experienced by individual lines. Consequently, information related to the different effects of activity on different spectral regions is lost. Some spectroscopic activity indicators are also determined from the entire spectral range of the observations. The full-width-at-half-maximum (FWHM) of the cross-correlation function (CCF) or bisector asymmetries (such as the bisector inverse slope, BIS) are measured from a CCF that averages a large number of lines (e.g. Baranne et al. 1996; Queloz et al. 2001; Lafarga et al. 2020). The chromatic index (CRX) or the differential line width (dLW) also come from a template-matching method that takes into account wide spectral regions at the same time (Zechmeister et al. 2018). Therefore, as occurs with RVs, the activity information that these indicators contain is also averaged among many absorption lines that may show different activity effects. Other indicators, such as those measuring the emission from the core of chromospheric lines like Ca II H&K or Hα (e.g. Noyes et al. 1984; Lovis et al. 2011; Schöfer et al. 2019), probe activity in the chromosphere and hence may not be perfectly correlated with the photospheric activity, which is what causes changes in the RVs.
Recently, a number of studies have started to focus on how line profiles change due to the effects of activity. Davis et al. (2017) used simulated time series of disc-integrated spectra with spots, faculae, and Doppler shifts due to planetary companions to study their different signatures. By applying a principal component analysis (PCA) on the simulated spectra, the authors found that spots and faculae induce variability in the spectral lines different from that introduced by pure Doppler shifts.
Several works have found line profile variations correlated with activity indices in HARPS observations (High-Accuracy Radial velocity Planetary Searcher, Mayor et al. 2003) of the nearby dwarf α Cen B, a moderately active K1 V star that shows a clear activity modulation in its RVs and activity indicators. By comparing spectra of high- and low-activity states of the star, Thompson et al. (2017) were able to identify lines whose profile changes depending on the activity level. The specific morphology of the variations differs on a line-to-line basis, but several lines show depth variations. The pseudo-equivalent widths measured from some of these features are rotationally modulated, and show correlations with the log R′_HK activity index. Wise et al. (2018) also studied line profile variations in HARPS observations of α Cen B and ε Eri, an active K2 V star. They found that the depth (or core flux) of about 40 absorption lines is correlated with the S index derived from the Ca II H&K lines, and that it changes periodically with the stellar rotation period. Ning et al. (2019) extended the previous work with an automated method to identify activity-sensitive lines with a Bayesian variable selection method, which accounts for dependencies between lines and uses different activity indicators (S index, Na I D, Hα, CCF BIS, and FWHM) to trace activity changes in the RVs. Lisogorskyi et al. (2019) used the α Cen B dataset to measure equivalent widths and asymmetries and compute their correlation with the S index, finding almost 350 activity-sensitive lines, which include the 40 lines compiled by Wise et al. (2018). Methods such as these could be used to find activity indicators derived from the properties of photospheric absorption lines for stars of other spectral types and for observations at different wavelength ranges.
Instead of studying line profile variations, Dumusque (2018) measured the RV of individual absorption lines present in stellar spectra (following the method described in Bouchy et al. 2001) and correlated them with an activity indicator. Similarly to previous studies, the stars used are relatively early-type cool dwarfs (G1 V to K1 V, including α Cen B) observed with HARPS. In the case of α Cen B, the author used the global RV as the activity indicator since, in principle, the RV variation of this star is solely due to activity. Different correlation strengths between the line-by-line RVs and the activity indicator were interpreted as the lines having different sensitivities to stellar activity. This work also showed that a judicious selection of the lines used to compute the total RV of a spectrum (taking into account the sensitivity of the lines to activity) can result in measurements where the activity signal is mitigated or amplified depending on the lines selected. Cretignier et al. (2020) continued the work presented in Dumusque (2018) by refining the method used to measure RVs from individual lines and studying the RV relation with the line properties. The authors show that, in α Cen B, lines with different depths display different effects due to activity; in particular, the RV effect is inversely proportional to the line depth. This agrees with the fact that shallow lines, which are formed deeper in the stellar photosphere where the convection velocity is larger, are more affected by the inhibition of this convection in the presence of an activity feature, while deep lines, which are formed in the outer regions of the photosphere where the convection velocity is lower, show a diminished effect. Cretignier et al. (2020) also propose a new activity indicator based on the RV difference between deep and shallow spectral lines. Siegel et al. (2022) used line-by-line RVs to build a novel activity indicator (the depth metric) based on the depth variations of activity-sensitive lines. The authors used this metric to study the effects of activity in the HARPS RVs of α Cen B and HD 13808, an active K2 V star hosting two Neptune-mass planets, finding it to be efficient at mitigating stellar activity in these Sun-like stars. Within each individual line, Al Moulla et al. (2022) further showed that RVs measured from different line segments (different parts of the lines formed at different temperatures) correlate with stellar activity in the Sun and α Cen B, observed with HARPS-N (Cosentino et al. 2012) and HARPS, respectively. Several line-by-line approaches were also tested within the EXPRES Stellar Signals Project (Jurgenson et al. 2016; Zhao et al. 2022) in four G and K dwarf stars, including the method we present here.
The aforementioned studies focused on Sun-like stars and did not include M dwarfs. M dwarfs display a spectrum with a higher density of features, including atomic lines and molecular bands, which makes it difficult to separate individual lines due to blending and the presence of the molecular pseudo-continuum. Despite that, a method such as the CCF, which uses a mask built from selecting 'individual' lines, is still able to deliver precise RVs (e.g. Lafarga et al. 2020), so it should be possible to study different activity effects on individual lines. Moreover, convective blueshift, which is affected by the presence of active regions, is different in M dwarfs and Sun-like stars (it is decreased for M dwarfs; e.g. Beeck et al. 2013; Baroch et al. 2020; Liebing et al. 2021). Therefore, lines in M dwarf spectra could show different activity-related effects.
Recently, Bellotti et al. (2022) studied the effect of using different combinations of lines to compute least squares deconvolution (LSD) profiles in three M dwarf stars (EV Lac, AD Leo, and DS Leo) observed with ESPaDonS (Echelle SpectroPolarimetric Device for the Observation of Stars, Donati 2003) and NARVAL (Aurière 2003). A line selection based on several line parameters (depth, wavelength, and Landé factor) does not result in stellar activity effects seen in the RVs derived from the LSD being mitigated. However, a randomised algorithm is able to find a subset of lines that minimises the RV scatter, without computing line-by-line RVs. Artigau et al. (2022) applied a line-by-line approach similar to the one presented by Dumusque (2018) to compute precise RVs of the M dwarf stars Proxima Cen, observed with HARPS, and Barnard's star, observed in the near-infrared with SPIRou (SpectroPolarimètre InfraRouge, Donati et al. 2020), finding similar RV precision as template-matching techniques. From their line-by-line framework, the authors also introduce an activity indicator similar to the differential line width implemented by Zechmeister et al. (2018). Martioli et al. (2022), Gan et al. (2022), and Cadieux et al. (2022) also used the same line-by-line framework to measure precise RVs of M dwarf stars observed with SPIRou, and Radica et al. (2022) applied the same method to measure precise RVs from HARPS and CARMENES (Calar Alto high-Resolution search for M dwarfs with Exo-earths with Near-infrared and optical Echelle Spectrographs, Quirrenbach et al. 2016, 2018) spectra of K2-18. Inspired by the method proposed by Dumusque (2018), in this work we apply a similar approach to observations of M dwarfs obtained with the high-resolution spectrograph CARMENES.
In a sample of six early to mid M dwarfs with different activity levels, we compute line-by-line RVs, classify lines according to their sensitivity to activity, and use this classification to compute RVs affected by activity to different degrees. We also study how the activity sensitivity of the same lines varies for the different stars studied. This article is structured as follows. In Sect. 2, we present the stars analysed. In Sect. 3, we explain the method followed to determine line-by-line RVs, including our initial line selection. Sect. 4 deals with the classification of lines based on their sensitivity to activity, and in Sect. 5 we use this classification to compute RVs in which the changes induced by activity have been removed or enhanced. We compare the activity sensitivity of the lines in the selected stars in Sect. 6. Finally, we discuss and summarise our findings in Sect. 7.

Targets
We used observations obtained as part of the CARMENES (Quirrenbach et al. 2016, 2018) main survey (guaranteed-time observations, GTO programme). CARMENES is installed at the 3.5 m telescope at Calar Alto Observatory in Almería, Spain, and consists of a pair of cross-dispersed, fibre-fed echelle spectrographs with complementary wavelength coverage, which allow simultaneous observations in the visible and the near-infrared wavelength range. The visible (VIS) channel covers the spectral range λ = 5200-9600 Å at a resolution of R = 94 600, with an average sampling of 2.5 pixels per spectral resolution element. The near-infrared (NIR) channel covers the range λ = 9600-17100 Å at a resolution of R = 80 400, and has an average sampling of 2.8 pixels per spectral element. The CARMENES survey has been ongoing since 2016. It monitors over 300 M dwarfs across all spectral subtypes with the main goal of detecting orbiting exoplanets with the Doppler method.

Notes. Values taken from the Carmencita database (Caballero et al. 2016a). We also show the number of CARMENES VIS observations obtained, and their RV scatter, measured as the standard deviation (std) of the serval RVs (instrumental drift and nightly average corrected). Stars are sorted with decreasing activity level as measured from pEW (Hα).
Here, we are interested in seeing the effect of activity in individual spectral lines, for different types of stars in the CARMENES sample. Therefore, we selected stars of different spectral types and different activity levels, as measured from the average pseudo-equivalent width of their Hα line (pEW (Hα), Schöfer et al. 2019). To select the appropriate targets, we considered the following criteria. To be able to properly characterise the individual lines in the spectrum, we limited our targets to bright stars (J ≤ 9 mag, to have high S/N per line) with low rotational velocity (v sin i ≤ 7 km s⁻¹, to avoid strong line blending, which hinders the identification of lines and reduces the number of lines available). We also selected only targets for which we had roughly 20 or more observations, which allowed us to derive reliable correlations between the RV of the individual lines and the activity indicators. In our early tests, we found that another limiting factor was the RV scatter of the observations. With the method that we used to measure line-by-line RVs, the error on the RV of each line is on average about 300 m s⁻¹ for bright targets. If we then average the RV values of about 1000 lines, we reach a precision of at best about 10 m s⁻¹ in the RV of each observation. For this reason, we also excluded stars with RV scatter (std) smaller than ∼15 m s⁻¹ (that is, stars showing small RV variability).
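The precision estimate quoted above can be checked with a short calculation. The sketch below assumes that the per-line errors are independent and of equal size, so that the error of the average scales as 1/√N; the function name is hypothetical and not part of any CARMENES pipeline.

```python
import math

def averaged_rv_precision(sigma_line, n_lines):
    """Error of the mean of n_lines RVs, each with uncertainty sigma_line.

    Assumes independent, equal-sized errors (an idealisation)."""
    return sigma_line / math.sqrt(n_lines)

# About 300 m/s per line, averaged over about 1000 lines
precision = averaged_rv_precision(300.0, 1000)
print(f"{precision:.1f} m/s")  # prints 9.5 m/s
```

Under these assumptions the result is consistent with the quoted limit of about 10 m s⁻¹, which is why stars with an RV scatter below ∼15 m s⁻¹ offer little signal to work with.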
These criteria left us with the six targets shown in Table 1: two early M dwarfs, J15218+209 (OT Ser) and J11201-104 (LP 733-099), and four mid M dwarfs, J07446+035 (YZ CMi), J05019+011 (1RXS J050156.7+010845), J22468+443 (EV Lac), and J10196+198 (AD Leo). The table shows the main properties of the selected stars together with the number of observations available and their RV scatter. Due to the constraint on the RV scatter, the selected stars have relatively high activity levels (from pEW (Hα) ≤ −1.8 Å, up to pEW (Hα) ∼ −7 Å). Owing to the constraint on brightness, our sample only includes early and mid spectral types.

Line selection: CCF mask
After the target selection, the second step in our analysis was to identify and select individual lines in the spectrum. To do that, we made use of the raccoon pipeline, which we previously developed to create CCF binary masks (Lafarga et al. 2020). To select lines, we looked for minima in a serval spectral template, characterised them by fitting a Gaussian, and chose the features with depth, FWHM, and contrast between certain cut values. serval (Zechmeister et al. 2018) is the default CARMENES pipeline to estimate RVs, based on a template-matching approach. It creates a high S/N stellar template by co-adding observations, and subsequently uses this template to compute a least-squares fit to each of the observations, from which the RV time series is obtained iteratively. Before co-adding, the observations are corrected for the corresponding barycentric motion of the Earth and any other known drift so that the stellar lines are optimally aligned. The templates have a format similar to that of the observations, and echelle orders are considered individually. Here, the templates were built with CARMENES VIS observations, and the cut values on depth, FWHM, and contrast were the same as the ones used for the standard masks (see Table 1 in Lafarga et al. 2020). We also took into account the position of telluric lines and the varying position of the spectra on the CCD due to the barycentric movement of the Earth, as explained in Lafarga et al. (2020). In summary, to account for tellurics, we broadened the features of a telluric mask by the maximum barycentric Earth RV (BERV) of the observations, and removed from the line list those lines overlapping with telluric features, taking into account the absolute RV of the target star. We also removed lines at the order extremes, which are not always present in the observed spectra due to the varying BERV of different observations.
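The telluric rejection step summarised above can be illustrated with a minimal sketch. It is not the raccoon implementation: the function name, the representation of the telluric mask as lists of interval edges, and the sign convention (positive stellar RV shifts lines to the red) are assumptions made here for illustration.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def remove_telluric_lines(line_wl, tell_lo, tell_hi, berv_max_kms, rv_star_kms=0.0):
    """Return the stellar lines that do not overlap a broadened telluric mask.

    line_wl          : rest-frame line wavelengths (same units as the mask)
    tell_lo, tell_hi : lower and upper edges of the telluric features
    berv_max_kms     : maximum |BERV| of the observations, used for broadening
    rv_star_kms      : absolute RV of the star (assumed sign: redshift > 0)
    """
    line_wl = np.asarray(line_wl, dtype=float)
    # Move the lines to where they fall relative to the telluric features
    shifted = line_wl * (1.0 + rv_star_kms / C_KMS)
    # Broaden each telluric feature on both sides by the maximum BERV
    lo = np.asarray(tell_lo, dtype=float) * (1.0 - berv_max_kms / C_KMS)
    hi = np.asarray(tell_hi, dtype=float) * (1.0 + berv_max_kms / C_KMS)
    inside = (shifted[:, None] >= lo[None, :]) & (shifted[:, None] <= hi[None, :])
    contaminated = np.any(inside, axis=1)
    return line_wl[~contaminated]
```

For a mask with features at 6869-6870 Å and 7595-7700 Å and a maximum BERV of 30 km s⁻¹, a line at 6870.3 Å is rejected even though it lies just outside the unbroadened feature, because the broadening at that wavelength amounts to about 0.7 Å.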
In this work, we only need the wavelength position of the lines, that is, we do not need the full information of a binary mask, which includes wavelength and weight. The wavelength positions are given by the minimum of a Gaussian fit to the line.
We used two different line lists, depending on the spectral type of the star. For J15218+209 and J11201-104 (spectral types M1.5 V and M2.0 V, and similar v sin i) we used a line list created from the serval template of J15218+209, with 1712 lines. For the other four stars (spectral types between M3.0 V and M4.5 V, and also similar v sin i), we used a line list obtained from the J07446+035 template, with 2207 lines. Both J07446+035 and J22468+443 are relatively bright and have over 20 observations, so the line list could have been built from either star. We selected J07446+035 because it has a v sin i close to the mean v sin i of the four stars, which can affect the lines present in the spectrum (e.g. Lafarga et al. 2020). However, given the uncertainties in the v sin i, we expect a line list made from a template of J22468+443 to yield results similar to those from the template used here.

Line-by-line RV
Next, for all the available observations of each target, we computed an RV for each of the lines in the line list, that is, a line-by-line RV. We used observations reduced with caracal, the standard CARMENES reduction pipeline (Caballero et al. 2016b), which reduces the spectra by flat-relative optimal extraction (Zechmeister et al. 2014) and outputs the different spectral orders in vacuum wavelength (throughout this paper all wavelengths are in vacuum). The spectra were corrected for the BERV, secular acceleration, and instrumental drifts, as measured by the standard CARMENES pipelines (Trifonov et al. 2018; Tal-Or et al. 2019).
There are several ways to measure line-by-line radial velocities. To obtain the RV of a specific line l in one of the observed spectra, we could compute its CCF using a single-line mask, or apply a template-matching algorithm using only a small predefined region around the line. We opted for a more straightforward method. First, we obtained the position of the line l in the observed spectrum, λ_obs,l, by fitting a Gaussian to the region around the line in the observed data (similarly to the process of line characterisation when building CCF masks mentioned above). The line position is given by the minimum of the best fit. Then, we computed the RV of the line as the Doppler shift between the position of the line in the observation, λ_obs,l, and the position of the line in the line list, λ_tpl,l (i.e. as measured in the template, also from a Gaussian fit),

$$\mathrm{RV}_l = c \, \frac{\lambda_{\mathrm{obs},l} - \lambda_{\mathrm{tpl},l}}{\lambda_{\mathrm{tpl},l}},$$

where c is the speed of light. This is illustrated in Fig. 1. To estimate the uncertainty on the individual RV measurements, we used the formal error of the Gaussian fit to the observation. We did not consider the uncertainty of the initial Gaussian fit to the high S/N template (i.e. we did not propagate this uncertainty) because we are interested in the relative RV measurement. Although we are fitting lines that are not completely Gaussian-shaped, this error gives an indication of the goodness of the fit for the different lines, which reflects the S/N of the different regions of the observed spectra. The spectral region used in the Gaussian fit is constrained by the adjacent local maxima at each side of the line minimum, as measured in the serval template (shaded grey area in Fig. 1). In this way, we made sure to always use the same region around each line. This may not happen if, instead, we measured the maxima in each different observation, because their position could change depending on the S/N, activity effects, or tellurics that may not have been considered by our mask.
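As an illustration of this measurement, the following minimal sketch fits a Gaussian to the region around one line and converts the fitted centre into an RV with the non-relativistic Doppler formula. It is not the code used for the paper: the function names, the four-parameter Gaussian with a constant offset, and the initial guesses are assumptions. The input arrays are taken to be already restricted to the fixed region between the adjacent local maxima of the template.

```python
import numpy as np
from scipy.optimize import curve_fit

C_MS = 299_792_458.0  # speed of light in m/s

def gaussian(wl, amp, mu, sigma, offset):
    """Gaussian profile on a constant level (amp < 0 for absorption)."""
    return offset + amp * np.exp(-0.5 * ((wl - mu) / sigma) ** 2)

def line_rv(wl, flux, wl_tpl, flux_err=None):
    """RV (m/s) of one line from a Gaussian fit, with its formal error.

    wl, flux : observed spectrum within the fixed line region
    wl_tpl   : line position in the line list (template Gaussian fit)
    """
    wl = np.asarray(wl, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Rough starting values (assumption): depth, deepest pixel, width, level
    p0 = [flux.min() - flux.max(), wl[np.argmin(flux)],
          (wl[-1] - wl[0]) / 6.0, flux.max()]
    popt, pcov = curve_fit(gaussian, wl, flux, p0=p0, sigma=flux_err,
                           absolute_sigma=flux_err is not None)
    wl_obs, wl_obs_err = popt[1], np.sqrt(pcov[1, 1])
    rv = C_MS * (wl_obs - wl_tpl) / wl_tpl
    # Only the error of the fit to the observation is propagated
    rv_err = C_MS * wl_obs_err / wl_tpl
    return rv, rv_err
```

Keeping the fitting window fixed from the template, rather than re-deriving it in each observation, is what makes the RVs of a given line comparable from epoch to epoch.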
For J11201-104, we used the line limits obtained from the template of J15218+209, instead of using its own template, which did not have a high S/N because the star is relatively faint (J > 7 mag) and we did not have a large number of observations available. J05019+011 is the faintest star in our sub-sample (J > 8 mag) and has only 19 observations, therefore we also tried to use the line limits from the template of another star, in this case J07446+035. However, for J05019+011, we obtained better results (i.e. a better match between the template and observed lines) using the line limits obtained from its own template, which could be due to the fact that this star has a slightly larger v sin i than J07446+035.
During the creation of the line list (Sect. 3.1 above), we already took into account the regions contaminated by tellurics affecting the template. However, depending on the absolute RV of the target star and the BERV correction of the target observations, regions different from the ones in the template can be affected by tellurics. Therefore, before computing the RVs, we further removed the affected lines following the steps described above.
Since we are using observations of an echelle spectrograph, the observed spectrum is divided into different orders. For most orders, a wavelength region larger than the free spectral range falls on the detector and is extracted by the pipeline (i.e. there is a wavelength overlap between consecutive orders). For the lines in the overlap regions, we only used the redder part of the bluer order, as opposed to the bluer part of the redder order in each overlap. This is because, in general, we found that the redder part of most of the orders has better S/N than the bluer part of the next order.
As an example, in Fig. 2 we show, for each line, the scatter of the RVs of all the observations of J07446+035, measured as the weighted standard deviation of the RVs of each epoch. We also show the median RV error of each line. We see that most lines show a scatter close to 300 m s⁻¹, and the typical error in the individual line RV is also about 300 m s⁻¹. Lines located in the bluer region of the spectral range, with wavelengths shorter than ∼6400 Å, are the ones that show a larger RV scatter and error. This happens because the bluer part of the spectral range is where observations of M dwarfs have lower S/N, which makes it difficult to correctly identify the lines and measure their RV. We obtain similar values, about 300-400 m s⁻¹, for the rest of the stars.
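The per-line scatter can be computed along the following lines. The text does not give the exact definition of the weighted standard deviation, so this sketch assumes inverse-variance weights and the simple (biased) weighted form; the function name is hypothetical.

```python
import numpy as np

def weighted_std(rv, rv_err):
    """Weighted standard deviation of one line's RVs over all epochs.

    Assumes weights w = 1 / rv_err**2 and the biased estimator
    sqrt(sum(w * (rv - mean)**2) / sum(w))."""
    rv = np.asarray(rv, dtype=float)
    w = 1.0 / np.asarray(rv_err, dtype=float) ** 2
    mean = np.sum(w * rv) / np.sum(w)
    return np.sqrt(np.sum(w * (rv - mean) ** 2) / np.sum(w))
```

With equal errors at every epoch this reduces to the ordinary standard deviation, so the weighting only matters when some epochs have much poorer fits than others.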

Total RV
We averaged the line-by-line RVs to compute a total RV per observation and compare it to the RVs obtained with standard methods that consider all the lines simultaneously: the CCF obtained with raccoon and the template-matching scheme from serval. In the following, we refer to the method of computing the total RV by averaging the line-by-line RVs as the lines average (LAV) method. The LAV RV uncertainties are given by the standard error of the mean. Before computing their mean, we discarded some data points. We did not use line-by-line RVs corresponding to bad Gaussian fits to the spectra, which we identified as those with RV errors larger than 1000 m s⁻¹ or smaller than 20 m s⁻¹ (i.e. points where the fit was clearly not successful). This procedure typically removed less than 1% of the total lines. We note that these cuts work for the stars in our sample but, in general, they will depend on the properties of each specific star, such as the S/N or the rotational velocity. A more general way to remove data points with bad Gaussian fits would be to directly reject outliers in the χ² distribution of all the Gaussian fits. After removing these data points, we performed, in each observation separately, a 4σ clipping on the RVs of all the lines to discard outliers. This procedure typically discards about 1% of all lines. Finally, we discarded lines that did not have a reliable RV measurement in more than ten observations. This process mainly removed lines in the bluer part of the spectrum, where the S/N is lowest, and weak lines, which can be properly identified and characterised in a co-added spectrum template such as the one produced by serval, but not in single observations that have much lower S/Ns.
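The selection and averaging steps of the LAV method can be sketched as follows. This is a simplified illustration, not the code used for the paper: the function name and array layout are hypothetical, and the 4σ clipping is assumed to be a single pass around the median of each observation, which the text does not specify.

```python
import numpy as np

def lav_time_series(rv, rv_err, err_min=20.0, err_max=1000.0,
                    nsigma=4.0, min_obs=10):
    """Lines-average (LAV) RV per observation.

    rv, rv_err : arrays of shape (n_obs, n_lines), in m/s
    Returns the mean RV, its standard error, and the number of lines used.
    """
    rv = np.asarray(rv, dtype=float)
    rv_err = np.asarray(rv_err, dtype=float)
    # 1. Reject bad Gaussian fits through cuts on the formal RV error
    good = np.isfinite(rv) & (rv_err >= err_min) & (rv_err <= err_max)
    # 2. Sigma clipping on the line RVs, in each observation separately
    for i in range(rv.shape[0]):
        vals = rv[i, good[i]]
        if vals.size == 0:
            continue
        centre, scatter = np.median(vals), np.std(vals)
        good[i] &= np.abs(rv[i] - centre) <= nsigma * scatter
    # 3. Keep only lines measured reliably in more than min_obs observations
    good &= (good.sum(axis=0) > min_obs)[None, :]
    # 4. Average the surviving lines; uncertainty = standard error of the mean
    n_used = good.sum(axis=1)
    masked = np.where(good, rv, np.nan)
    lav = np.nanmean(masked, axis=1)
    sem = np.nanstd(masked, axis=1, ddof=1) / np.sqrt(n_used)
    return lav, sem, n_used
```

The default thresholds are the values quoted in the text for this sample; as noted there, they would need to be adapted to the S/N and rotational velocity of other stars.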
In Figs. A.1 to A.6 (in Appendix A), we show the average RV of all the lines compared to the RV obtained with the CCF method and serval's template matching, for the targets under analysis. The masks used to compute the CCFs were the same as the ones used to define the individual line lists. That is, masks created from serval templates of J15218+209 (used in the CCF of J15218+209 and J11201-104) and J07446+035 (used in the CCF of J07446+035, J05019+011, J22468+443, and J10196+198). The serval RVs of each target are computed using a template made by co-adding the observations of the target itself. We obtain comparable RV values for the three different methods (LAV, CCF, and serval, but see next paragraph on the RV uncertainties), but the LAV RVs are in general closer to the CCF RVs than to those obtained with serval. This is probably due to the fact that the line list used to compute the individual line RVs is obtained from the mask used in the CCF method, while serval considers the entire wavelength domain, not just a set of lines. We also note that serval includes the weakest lines in the spectrum, which are excluded in the CCF masks. Therefore, serval might result in higher RV scatter for active stars. For simplicity, in the following we only use the CCF RVs and not the serval ones, but the results obtained are comparable.
Despite the LAV RV values being similar to those obtained with the CCF method, and also despite using the same line list as the CCF, the LAV RV uncertainties (of the order of 10 m s⁻¹) are larger than the CCF ones (∼4 m s⁻¹). This difference could indicate that the individual line RV uncertainties are overestimated. As explained above, we used the formal uncertainty of the Gaussian fit to the individual line, and not an estimate of the actual RV content of the line, which could lead to smaller uncertainties. Using the formal uncertainty of the fit allows us to compare different lines, that is, the formal uncertainty is adequate for relative measurements, but is not ideal when comparing with other methods such as the CCF. It is also possible that the average of the Gaussian fits of the individual lines introduces noise and is less reliable than computing the CCF with all the lines and then fitting a Gaussian to the averaged profile, an effect that would be enhanced if the individual line RV uncertainties are not accurate. Hence, we only obtained comparable RV values between the LAV and the CCF RVs because the dispersion due to the stellar activity of the stars is large. Considering the uncertainty, the actual precision of the LAV RVs is worse than that of the CCF RVs.
There are some observations for which the difference between the LAV RVs and the other two datasets (i.e. CCF and serval RVs) is significantly larger than for the rest. These observations are the ones with the lowest S/N within each time series. We can observe this difference of S/N in the RV errors of the three datasets, which are significantly larger for these observations, and also in the number of lines used to compute the average RV, which is significantly lower than for the rest of the epochs (again see the figures in Appendix A). For instance, in J07446+035 (Fig. A.1), the LAV RV of the fourth observation starting from the end of the time series deviates from the CCF and serval values, and uses fewer lines than the rest of the epochs. As another example, in J15218+209 (Fig. A.5), there is an observation near BJD 2457800 with a LAV RV that deviates from the CCF and serval ones, has larger uncertainties, and uses fewer lines than the rest of the data points. This decrease in the number of lines is related to the lower S/N of these observations, which makes it difficult to correctly identify the lines in the observed spectrum and obtain a reliable Gaussian fit of the line. This seems to indicate that computing total RVs by averaging the RVs of individual lines (as computed here) is more sensitive to the S/N of the data than the other two methods. In the following analysis, we did not consider these observations with low S/N (specifically, we discarded observations with S/N < 25 to 50 in the CARMENES VIS reference order 82, centred at about 7500 Å, depending on the average S/N of the different stars).
We also discarded observations with strong flares. Flares add continuum flux over the whole spectral range, and can be easily identified by an increase in the Hα core emission. This change in flux can have a strong effect on RVs, introducing drifts of several hundred m s −1 (see e.g. Reiners 2009). Aside from stronger than average Hα emission, flares also affect the other activity indicators, which show extreme values in their time series, or clear outliers. Therefore, to avoid RV biases due to strong flares when computing the correlations with the activity indicators, we discarded observations with strong Hα emission by performing a 3-sigma clipping on the Hα index, I Hα , which measures the ratio of flux around the centre of the Hα line to the flux in reference bandpasses on either side of the line (as defined in Zechmeister et al. 2018). We observed that other activity indicators, such as the contrast of the CCF and the differential line width (dLW, which accounts for changes in the line widths of the observed spectrum compared to a spectrum template, also defined in Zechmeister et al. 2018), show clear outliers corresponding to strong flare events that do not have an I Hα value large enough to be removed by the sigma clipping procedure. Therefore, we also performed a 3-sigma clipping on the dLW time series.
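The flare rejection described above can be sketched as a boolean mask that keeps only the epochs lying within 3 sigma in both indices. This is a minimal sketch rather than the procedure actually used: the function name `sigma_clip_mask`, the single-pass clipping around the mean, and the toy values are all assumptions.

```python
import numpy as np

def sigma_clip_mask(values, nsigma=3.0):
    """Return True for epochs within nsigma standard deviations of the mean.

    Single-pass clipping around the mean is an assumption; an iterative or
    median-based clip would be an equally plausible choice.
    """
    values = np.asarray(values, dtype=float)
    centre = np.mean(values)
    spread = np.std(values)
    return np.abs(values - centre) <= nsigma * spread

# Toy time series (hypothetical values) with one flare-like outlier per index.
rng = np.random.default_rng(0)
i_halpha = rng.normal(1.8, 0.05, size=50)
dlw = rng.normal(0.0, 5.0, size=50)
i_halpha[10] = 3.0   # strong H-alpha flare
dlw[25] = 80.0       # flare visible in dLW but not in the H-alpha index

# An epoch is kept only if it survives the clipping in both indicators.
keep = sigma_clip_mask(i_halpha) & sigma_clip_mask(dlw)
```

Combining the two masks with a logical AND mirrors the text: an epoch flagged by either the Hα index or the dLW is discarded.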

Correlation between line RV and activity indicators
A way to study how stellar activity affects different lines is to check for correlations between the RV of each individual line and an activity indicator. Strong correlations would indicate that a certain line is highly affected by activity, while no correlation could mean that the line is not very sensitive to activity effects (Dumusque 2018).
We checked the linear correlation between the line-by-line RVs and several activity indicators: CCF FWHM, contrast, and BIS (computed with raccoon as in Lafarga et al. 2020), and CRX, dLW and I Hα (computed with serval as in Zechmeister et al. 2018). To quantify the correlations, we computed the Pearson's correlation coefficient R. A value of R close to 1 indicates a strong linear correlation, −1, a strong anti-correlation, and R close to 0 indicates no correlation. In Fig. 3 we show, as an example, the time series and correlations of two different lines of J07446+035: an 'active' line that shows strong correlations with several activity indicators, λ tpl,l = 6661.25 Å, and an 'inactive' line, for which we do not observe any clear correlation, λ tpl,l = 7855.92 Å.
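As an illustration of the correlation step, the sketch below computes one Pearson R per line against a single indicator. The array layout (`line_rvs` with one row per line and one column per epoch), the use of NaN for epochs where a line could not be fitted, and the function name are hypothetical conventions chosen for this example.

```python
import numpy as np

def pearson_r_per_line(line_rvs, indicator, min_epochs=3):
    """Pearson R between each line's RV time series and one activity indicator.

    line_rvs  : array (n_lines, n_epochs), NaN where a line was not measured
    indicator : array (n_epochs,)
    """
    line_rvs = np.asarray(line_rvs, dtype=float)
    indicator = np.asarray(indicator, dtype=float)
    r = np.full(line_rvs.shape[0], np.nan)
    for i, rv in enumerate(line_rvs):
        ok = np.isfinite(rv) & np.isfinite(indicator)
        if ok.sum() < min_epochs:
            continue  # too few epochs to define a correlation
        if np.std(rv[ok]) == 0 or np.std(indicator[ok]) == 0:
            continue  # constant series, R is undefined
        r[i] = np.corrcoef(rv[ok], indicator[ok])[0, 1]
    return r

# Toy example: an indicator modulated over one rotation, an 'active' line that
# is anti-correlated with it, and a line whose RV is in quadrature with it.
phase = 2 * np.pi * np.arange(40) / 40
indicator = np.sin(phase)
line_rvs = np.vstack([-100.0 * indicator, 100.0 * np.cos(phase)])
r_values = pearson_r_per_line(line_rvs, indicator)
```

The second toy line also shows why a phase-shifted indicator can yield R close to 0 even though the line is clearly modulated at the rotation period.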
BIS and CRX are known to show clear anti-correlations with the total RV of the spectrum if it is affected by activity (see e.g. Zechmeister et al. 2018; Tal-Or et al. 2018; Lafarga et al. 2021, but also Kossakowski et al. 2022, for an example of a positive correlation). Here, we also observe a strong anti-correlation with the individual RVs of several lines, such as the example line λ tpl,l = 6661.25 Å shown in Fig. 3. For the other indicators, such as FWHM and dLW, we observe some loop- or circular-like shapes following the phase of the stellar rotation modulation (the circular-like shape being due to a phase shift) but not a clear linear positive or negative correlation. The same applies to the correlation between total RV and these indicators (again see e.g. Zechmeister et al. 2018; Lafarga et al. 2021; Jeffers et al. 2022). Indicators such as the FWHM or the dLW measure the width of the lines (i.e. second moment of the line profile), as opposed to other proxies such as the BIS, which are sensitive to line asymmetries (third moment of the line profile). Therefore, linear correlations between the line or total RV and the FWHM or dLW are not necessarily expected (see e.g. Jeffers et al. 2022; Cardona Guillén et al. 2022). For these indicators, other types of correlations should be investigated.
In addition to the usual activity indicators, we also computed the correlation between the line-by-line RVs and the total RV obtained from the CCF (also in Fig. 3). In very active stars, where the modulations observed in the total RV are clearly caused by activity, the total RV itself can also be considered an activity indicator. This seems to be the case for the targets under study (but see Sect. 5.4 about J10196+198), and, as expected, several lines show a clear strong correlation. It is important to note that if the target star hosts exoplanet companions, the total RV will also contain the Doppler shifts due to the gravitational pull of these companions. Hence, in such cases, the total RV is not a good proxy for stellar activity, and any correlations can be biased by the modulation caused by the orbiting companions, unless this modulation does not significantly contribute to the RV.

Correlation difference between activity indicators
In the following, we focus the analysis on the three indicators that show a simple, approximately linear, correlation with the line-by-line RVs: total CCF RV, CCF BIS, and serval's CRX. Aside from the correlation, BIS and CRX are the most effective indicators in tracing activity in very active, early- and mid-type M dwarfs such as those analysed here, at least within the CARMENES GTO sample (Lafarga et al. 2021). To see if the correlation strength of different lines is consistent among these three activity indicators, we compared the R values obtained for the correlation between the line-by-line RVs and the indicators (see Fig. 4).
For J07446+035, J05019+011, and J22468+443 (the stars with the largest average pEW (Hα) and largest RV scatter), the three indicators show similar R scatter and similar values for all the lines, that is, lines with a strong correlation with CCF RV also show a strong correlation (anti-correlation in these cases) with BIS and CRX. Therefore, we expect subsets of active and inactive lines selected based on the correlation with these indicators to be similar. J10196+198 and the earlier type stars, J15218+209 and J11201-104, show less well-defined correspondence between the R values of the three indicators (i.e. the data points in Fig. 4 show a larger spread than in the three previously-discussed stars). This means that the three activity indicators result in correlations of slightly different strength (i.e. different R) for the same line.

Correlation strength as a function of the line wavelength
Next, we study the distribution of R values as a function of wavelength. In Fig. 5, we show the distribution of R values of all the lines obtained from the correlations with the three selected indicators (CCF RV, BIS, and CRX) as a function of the line wavelength, for J07446+035. In the same figure, we also show the correlation of the R values with the RV scatter of each line, measured as the weighted standard deviation of the line RV in all the observations (wstd RV, as in Fig. 2).
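For concreteness, a weighted standard deviation of a single line's RV time series could be computed as follows. The inverse-variance weights and the normalisation by the sum of weights are assumptions of this sketch, since the exact definition of wstd RV is given elsewhere in the paper.

```python
import numpy as np

def weighted_std(rv, rv_err):
    """Weighted standard deviation of one line's RV over all epochs.

    Weights are 1 / rv_err**2 (an assumption); epochs with NaN RVs or
    non-positive uncertainties are ignored.
    """
    rv = np.asarray(rv, dtype=float)
    rv_err = np.asarray(rv_err, dtype=float)
    ok = np.isfinite(rv) & np.isfinite(rv_err) & (rv_err > 0)
    w = 1.0 / rv_err[ok] ** 2
    mean = np.sum(w * rv[ok]) / np.sum(w)
    return np.sqrt(np.sum(w * (rv[ok] - mean) ** 2) / np.sum(w))

# Toy line RVs in m/s (hypothetical), with one epoch where the fit failed.
rv = np.array([120.0, -80.0, 40.0, np.nan, -60.0])
err = np.array([50.0, 50.0, 50.0, 50.0, 50.0])
wstd = weighted_std(rv, err)
```

With equal uncertainties the result reduces to the ordinary standard deviation of the valid epochs, which is a convenient sanity check.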
For the three indicators, the distribution of R values is not constant in wavelength. The average value of |R| increases from short to long wavelengths, peaking for lines between λ ∼ 7000 and 8000 Å, and decreasing again for redder wavelengths. Regarding the RV scatter of each line, its average value decreases from the bluest wavelengths up to ∼ 7500 Å, and remains constant for longer wavelengths. Figure 5 only shows data of J07446+035, but we observe a similar behaviour for the other five stars.
The increase of |R| with wavelength in the blue part of the spectrum seems to be related to the decrease in RV scatter. In the blue, the spectrum has lower S/N than in the red. Due to this, the RV of the bluest lines has large uncertainties and is not as precise as the RV of lines at longer wavelengths. Hence, this decrease in precision due to low S/N results in an increase of the RV scatter. The low S/N also drives the decrease in |R|. If the line RV measurements are not precise enough, it is not possible to detect any correlation with activity, resulting in |R| values close to 0. Indeed, we see that blue lines with large RV scatter have |R| ∼ 0. We conclude that these |R| values close to 0 do not mean that these lines are not sensitive to activity, but instead reflect the lack of precision in their RV measurement. In other words, the R coefficient seems to be biased at low S/N (which is what we observe in the blue part of the spectrum).
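The bias of R at low S/N can be made explicit with a simple worked relation. As an illustrative assumption (not a model taken from this work), write the RV of line l as an activity-driven part s_l plus measurement noise n_l that is independent of both s_l and the indicator I. The noise inflates the variance of the line RV but not its covariance with I, so the measured coefficient is the noise-free one, R_0, attenuated by a factor that depends only on the noise-to-signal ratio:

```latex
\begin{align}
  v_l(t) &= s_l(t) + n_l(t), \qquad
  \mathrm{Var}(s_l) = \sigma_s^2, \quad \mathrm{Var}(n_l) = \sigma_n^2, \\
  R_{\mathrm{obs}} &= \frac{\mathrm{Cov}(v_l, I)}{\sigma_I \sqrt{\sigma_s^2 + \sigma_n^2}}
   = \frac{R_0}{\sqrt{1 + \sigma_n^2 / \sigma_s^2}},
  \qquad R_0 = \frac{\mathrm{Cov}(s_l, I)}{\sigma_I\, \sigma_s}.
\end{align}
```

For example, with hypothetical values σ_n = 3 σ_s, a line with R_0 = −0.9 would be measured with R_obs ≈ −0.28, so a strongly activity-sensitive line can appear only weakly correlated when its RV is dominated by photon noise.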
Regarding the red part of the spectrum, for wavelengths longer than ∼ 7500 Å, we do not expect the decrease in |R| to be caused by a decrease in precision or S/N, because the RV scatter of these lines remains approximately constant. This decrease in |R| could be due to wavelength-dependent physical effects, such as the temperature contrast effect. Activity features such as spots have cooler temperatures (and are hence darker) than the surrounding 'quiet' photosphere. This temperature contrast or difference in flux breaks the symmetric flux contribution from the blue-shifted and red-shifted hemispheres of the star, which results in a distortion of the spectral line profiles as the spot covers different parts of the rotating stellar disc. This temperature contrast effect decreases with wavelength, because the contrast in flux is less pronounced at redder wavelengths. That is, the effect on the spectral lines should be smaller at longer wavelengths. Therefore, this decrease in temperature contrast at red wavelengths could be the cause of the decrease in |R| with wavelength in the red part of the spectrum. This effect should be similar to what indicators such as the CRX trace. The decrease of |R| with wavelength is not seen bluewards of λ ∼ 7500 Å, perhaps because the decrease in S/N (and hence the loss in RV precision) dominates.
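The wavelength dependence of the spot contrast can be illustrated with a deliberately crude blackbody estimate. Both the blackbody approximation (M dwarf spectra are far from blackbodies) and the temperatures used (3100 K photosphere, 2800 K spot) are assumptions made only for this sketch.

```python
import numpy as np

H = 6.62607015e-34    # Planck constant [J s]
C = 2.99792458e8      # speed of light [m/s]
K_B = 1.380649e-23    # Boltzmann constant [J/K]

def planck_lambda(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T)."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C**2 / wavelength_m**5 / np.expm1(x)

def flux_contrast(wavelength_angstrom, t_phot=3100.0, t_spot=2800.0):
    """Fractional flux deficit of a spot relative to the quiet photosphere.

    t_phot and t_spot are illustrative values, not measurements.
    """
    lam = np.asarray(wavelength_angstrom, dtype=float) * 1e-10
    return 1.0 - planck_lambda(lam, t_spot) / planck_lambda(lam, t_phot)

# Contrast at blue, central, and red wavelengths of an optical spectrum.
wavelengths = np.array([5500.0, 7500.0, 9500.0])
contrast = flux_contrast(wavelengths)
```

Under these assumptions the contrast decreases monotonically towards the red, which is the qualitative behaviour invoked in the text.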
Another explanation for lines with no correlation (|R| ∼ 0) and large RV scatter could be that these lines are affected by activity in different ways, for instance, there could be a chromospheric component. Therefore, these lines would still show significant RV scatter, but would not be correlated with photospheric activity indicators such as those used here. Due to the fact that most lines with large RV scatter also show large RV uncertainties (implying a low S/N as the cause of the scatter), we do not expect such effects to affect a significant number of lines, but this is something that requires further work.

Selection of active and inactive lines
The strength of the correlation with the activity indicator R allows us to classify the lines based on their sensitivity to activity. We have selected several sets of inactive and active lines depending on their R value. We considered as inactive lines with R values around 0, |R| ≤ R cut , for several R cut from 0.1 to 0.4. As active lines, we selected lines with extreme R values. In the case of the correlation with the total RV, we selected lines with R close to 1, R > R cut , for several positive R cut , such as 0.4, 0.6, 0.8. For the correlations with BIS and CRX, we expect an anticorrelation if there is a modulation due to activity, therefore we selected lines with R < R cut , for several negative R cut , such as −0.4, −0.6, −0.8.
We also selected lines based on their RV scatter. As mentioned above, lines with the largest RV dispersion have an R close to 0 (Fig. 5), probably due to the fact that they do not contain enough RV information to yield a precise measurement, and not because they are in fact less sensitive to activity. Therefore, when selecting inactive lines, we also performed some cuts discarding lines with large RV scatter (wstd RV).
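The two kinds of cuts described above can be written as boolean masks over the line list. This is a sketch under assumptions: the function names, default thresholds, and toy values are hypothetical, and `r` and `wstd` stand for per-line arrays of the correlation coefficient and the RV scatter.

```python
import numpy as np

def select_inactive(r, wstd, r_cut=0.2, wstd_max=200.0):
    """Lines with |R| <= r_cut and RV scatter <= wstd_max (m/s)."""
    r = np.asarray(r, dtype=float)
    wstd = np.asarray(wstd, dtype=float)
    return (np.abs(r) <= r_cut) & (wstd <= wstd_max)

def select_active(r, r_cut):
    """Lines with extreme R: R > r_cut for a positive cut (total RV),
    R < r_cut for a negative cut (BIS, CRX)."""
    r = np.asarray(r, dtype=float)
    return r > r_cut if r_cut > 0 else r < r_cut

# Toy per-line values (hypothetical).
r = np.array([0.05, -0.15, 0.85, 0.02, -0.70])
wstd = np.array([150.0, 180.0, 90.0, 900.0, 120.0])

inactive = select_inactive(r, wstd)        # fourth line fails the scatter cut
active_rv = select_active(r, 0.8)          # correlation with total RV
active_bis = select_active(r, -0.6)        # anti-correlation with BIS or CRX
```

Lines with NaN in either array fail every comparison and are therefore excluded from all sets.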
Another way to select correlated or uncorrelated lines reliably would be to take into account the line-by-line RV uncertainty, or the S/N of the line. Similarly to the cuts based on the RV scatter above, we could have discarded lines with a median RV uncertainty above a specific value, which could have helped us avoid selection biases. Avoiding selection biases could be relevant especially for the active line selection, since we did not use the scatter of the line-by-line RVs (because active lines would have a larger scatter), and hence, our only selection criterion is the value of R, which can be biased at low S/N. We performed several tests of active line selection by limiting the median RV uncertainty of the lines to several values around the overall median RV uncertainty (∼200-300 m s −1 ). These tests show that, despite the limited number of active lines used, the LAV RV scatter and the periodicities present in the LAV RVs are very similar to those obtained without this extra cut on the RV uncertainty. Hence, we decided to only include here the simpler case of a single cut on the R value.
To test if the correlations obtained are statistically significant (i.e. to know if the R values are expected from random fluctuations), we computed their p-value. We see that for all stars and indicators, |R| values ≥ 0.3 have p-values close to 0 (<0.05), and that there are no high |R| values (i.e. close to 1 or −1) with a high p-value. This indicates that the correlation is statistically significant. Also as expected, the smaller the |R| value, the higher the p-value. Therefore, when performing cuts on the distribution of R values, we are indeed selecting or rejecting lines with a significant correlation with the activity indicators.
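The text does not state how the p-values were obtained. One simple way to estimate the significance of a given R, used here purely as a stand-in, is a permutation test: the indicator values are shuffled many times and the p-value is the fraction of shuffles that give |R| at least as large as the observed one. The function name and parameters below are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_perm):
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    # Add-one correction so that the estimate is never exactly zero.
    return r_obs, (count + 1) / (n_perm + 1)

# Toy data: a strongly correlated pair and an uncorrelated (quadrature) pair.
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
r_strong, p_strong = permutation_pvalue(x, 2.0 * x + rng.normal(0.0, 1.0, 30))

phase = 2 * np.pi * np.arange(40) / 40
r_none, p_none = permutation_pvalue(np.sin(phase), np.cos(phase))
```

A permutation test makes no assumption about the distribution of the RVs, at the cost of a p-value resolution limited to about 1/n_perm.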
Tables containing information about the sensitivity to activity of the different lines are available online, one table for each of the six stars studied in this work. Specifically, each table includes the central wavelength of the line as measured in the spectral template used, the scatter of the line RV, and the Pearson's correlation coefficient R obtained for the correlation between the line RV and the three activity indicators considered (that is, three different R values, one per indicator). An example of the information contained in these tables can be seen in Figure 5. We note again that even though we use the term line, these lines correspond to minima in the spectrum (as identified in Lafarga et al. 2020) and are the result of blends of true atomic lines or features in molecular bands.

Total RV computation with selected lines
We used the different sets of active and inactive lines to recompute the total RV of each observation. As before, we used the LAV method (average of the individual line RVs). In this section, we show the results obtained for the six targets studied. To evaluate the effect of activity in the new total RVs, we analysed the change in the time series RV scatter, as well as the change of the activity-related signals present in the generalised Lomb-Scargle periodogram (Zechmeister & Kürster 2009). Here we only include figures of the results obtained with the correlation with the total RV for the first target analysed, J07446+035. For clarity and completeness, figures obtained with the correlation with the BIS and CRX for J07446+035, and all figures of the rest of the targets, can be found in Appendix B.
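Recomputing the total RV from a subset of lines amounts to averaging the selected line RVs epoch by epoch. The sketch below uses an inverse-variance weighted mean. The weighting scheme, the array layout (lines along rows, epochs along columns), and the function name `lav_rv` are assumptions of this example and may differ from the LAV implementation used in this work.

```python
import numpy as np

def lav_rv(line_rvs, line_errs, mask):
    """Average the RVs of the selected lines for each epoch.

    line_rvs, line_errs : arrays (n_lines, n_epochs), NaN where not measured
    mask                : boolean array (n_lines,), True for selected lines
    Returns the mean RV, its formal uncertainty, and the number of lines
    used per epoch.
    """
    rv = np.asarray(line_rvs, dtype=float)[mask]
    err = np.asarray(line_errs, dtype=float)[mask]
    with np.errstate(divide="ignore", invalid="ignore"):
        w = 1.0 / err**2
    valid = np.isfinite(rv) & np.isfinite(w)
    w = np.where(valid, w, 0.0)      # zero weight for missing measurements
    rv = np.where(valid, rv, 0.0)
    wsum = w.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = (w * rv).sum(axis=0) / wsum
        mean_err = 1.0 / np.sqrt(wsum)
    n_used = valid.sum(axis=0)
    return mean, mean_err, n_used

# Toy example: three lines, two epochs; the third line is not selected.
line_rvs = np.array([[10.0, 20.0], [30.0, np.nan], [500.0, -500.0]])
line_errs = np.ones_like(line_rvs)
mask = np.array([True, True, False])
rv_mean, rv_err, n_lines = lav_rv(line_rvs, line_errs, mask)
```

Returning the number of lines per epoch makes it easy to spot low-S/N observations, for which fewer lines contribute to the average.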

J07446+035 (YZ CMi, GJ 285)
J07446+035 (YZ CMi, GJ 285) is a mid-type M dwarf and one of the most active stars in our sample. It shows a large RV scatter, of almost 90 m s −1 , due to stellar activity. Global RVs and both CRX and BIS show modulations at the stellar rotation period, and these two activity indicators show strong linear correlations with the RVs (see Zechmeister et al. 2018; Tal-Or et al. 2018; Lafarga et al. 2021; Schöfer et al. 2022). Baroch et al. (2020) used the deviations from a straight linear correlation to constrain starspot and convective motion parameters for this star. As mentioned above, for this star we used an initial line list derived from a template of observations of J07446+035 itself, with ∼ 2000 lines.
Inactive lines We show a summary of the LAV RV time series obtained for the different sets of inactive lines tested in Fig. 6. As we decrease the range of the R values of the averaged lines towards zero, and remove lines with large RV dispersion (wstd RV), the scatter of the total RV time series decreases. For the three indicators, the smallest RV scatter occurs when the RVs are computed using lines with |R| ≤ 0.1 or ≤ 0.2 and wstd RV ≲ 200 m s −1 , but the scatter is not minimised exactly for the same cuts. For the correlation with the total RV, the scatter becomes ∼ 5 times smaller than the initial one, and for the correlation with BIS and CRX, ∼ 4 times smaller. This decrease in RV scatter occurs up to a specific point. After that, if we further decrease the maximum |R| and/or wstd RV of the selected lines, the scatter starts to increase. As we decrease the number of averaged lines, the uncertainty of the averaged RV tends to increase too, due to the fact that we have less RV content. However, when the number of averaged inactive lines is smaller than ∼ 100, the uncertainty in the averaged RV can be larger than the dispersion of the data points (see e.g. Fig. B.6 and other figures in Appendices B and C). This increase could indicate that the LAV RV errors are overestimated if a small number of lines is used to compute the average, and a more robust or accurate method of determining the errors would be needed.
To further analyse the recomputed RVs and their modulations, we computed the periodogram of the different RV datasets. Fig. 7 shows the LAV RV time series, the number of lines used, and the corresponding periodograms for four sets of lines selected using the correlation with the total RV. The different RV datasets shown correspond to those obtained with lines having wstd RV ≲ 200 m s −1 and four different maximum values of |R|, 0.4, 0.2, 0.1, and 0.05, which include some of the datasets with the smallest time series RV scatter. For comparison, we also show the original RV obtained using all the lines. Figs. B.1 and B.2 show the same but for the LAV RVs obtained with BIS and CRX, respectively. We observe a similar behaviour for the lines selected based on the correlations with the three indicators, which was expected, since for J07446+035 all three indicators show very similar R values for the same lines (as seen in Fig. 4).
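A periodogram of this kind can be sketched with plain NumPy by fitting, at each trial period, a sinusoid plus a constant offset by weighted least squares and recording the fractional reduction in chi-squared relative to a constant model, which is the normalised power of the generalised Lomb-Scargle periodogram. The function name, the period grid, and the synthetic data (a hypothetical 3 d signal) are assumptions. A dedicated implementation would be faster and would also provide false-alarm probabilities.

```python
import numpy as np

def gls_power(t, y, err, periods):
    """Normalised power of a floating-mean, weighted sinusoid fit."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(err, dtype=float) ** 2
    w = w / w.sum()
    sw = np.sqrt(w)
    ybar = np.sum(w * y)
    chi2_const = np.sum(w * (y - ybar) ** 2)   # constant (no signal) model
    power = np.empty(len(periods))
    for k, period in enumerate(periods):
        arg = 2.0 * np.pi * t / period
        design = np.column_stack([np.cos(arg), np.sin(arg), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        resid = y - design @ coef
        power[k] = 1.0 - np.sum(w * resid**2) / chi2_const
    return power

# Synthetic RV time series (hypothetical): 3 d modulation plus white noise.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 60.0, 60))
rv_err = np.full(t.size, 5.0)
rv = 50.0 * np.sin(2.0 * np.pi * t / 3.0) + rng.normal(0.0, 5.0, t.size)

periods = np.linspace(1.5, 10.0, 2000)
power = gls_power(t, rv, rv_err, periods)
best_period = periods[np.argmax(power)]
```

Running the same function on RVs recomputed from different line sets allows a direct comparison of the power at the rotation period, as done in the figures discussed here.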
We see that as the RV scatter decreases when restricting the lines used, the power of the periodogram peak at P rot is also reduced. This seems to indicate that, by rejecting lines with large |R|, we are effectively reducing the activity signal present in the RV measurements. If all the lines were equally affected by activity, we would expect the RV scatter to become larger as we restrict the R of the averaged lines. This is because, as we decrease the range of allowed R values, we are decreasing the number of lines used, and hence, we are degrading the precision of the RV measurement. Despite that, since the RV scatter decreases as we reduce the number of lines used, it seems that, for this star, reducing the red noise caused by activity has a larger effect than the increase in white photon noise due to the reduction in RV content, as also found by Dumusque (2018) for α Cen B. We note that some of the smallest RV scatters are obtained using datasets with only ∼ 200 lines, about 10% of the original ∼ 2000. With more restrictive cuts, the RV scatter starts to increase. This probably occurs because the number of lines is too small to obtain sufficient precision in the RV measurements, and hence the photon noise dominates. For these sets of lines, the periodogram peak at P rot also becomes less significant, and the highest peak in the periodogram is not related to P rot any more. This could reflect the fact that the RVs of the datasets with the smallest number of lines contain mostly noise.
Active lines Figures 8, B.3, and B.4 show the LAV RV time series, number of lines, and periodograms of three sets of active lines, selected using the correlations with the total RV, BIS, and CRX, respectively. Here, the scatter increases significantly as we restrict the minimum value of R towards 1 for the correlation with the total RV, or towards −1 in the case of BIS and CRX, becoming more than 2 times larger for the most extreme sets. This increase could be due to the fact that we are only using the most active lines, but also to an increase in the photon noise, because we are reducing the number of lines used. In all cases, the periodogram shows the highest peak at the stellar rotation period, all with similar power. We note that for R cut = 0.8, using 10 to 20 lines (0.5 to less than 1 % of the initial set of lines), depending on the activity indicator, the periodogram still clearly displays a signal due to the stellar rotation, very similar to the periodogram obtained using the ∼ 2000 initial lines.

J05019+011 (1RXS J050156.7+010845)
J05019+011 (1RXS J050156.7+010845) is, as one can expect from its name, a relatively strong X-ray emitter (Haakonsen & Rutledge 2009). Based on its kinematics and activity indicators, it has repeatedly been proposed as a member of the young β Pictoris moving group (e.g. Schlieder et al. 2012; Alonso-Floriano et al. 2015a). In the CARMENES data, it has an initial RV scatter of about 90 m s −1 from the serval RVs. The LAV method results in a larger scatter, close to 106 m s −1 . We only have 19 observations for this target; however, this is sufficient to see that the global RVs, BIS, and CRX show a significant modulation at the stellar rotation period, and that the activity indicators and the RVs are linearly correlated, as in J07446+035 (see Lafarga et al. 2021). For this star we used the line list built from the J07446+035 template.
Inactive lines Fig. B.5 shows the RV scatter obtained for the different sets of inactive lines considered. Figs. B.6, B.7, and B.8 show the RV time series, number of lines, and periodograms of four datasets, obtained using the correlation with the total RV, BIS, and CRX. In this case, the RV scatter is minimised for the datasets that include lines with |R| ≤ 0.3 or ≤ 0.4 and wstd RV ≤ 300 m s −1 , for the correlation with the total RV, BIS, and CRX. The RV scatter decreases by a factor of ∼ 2.5 compared to the initial set of lines. The periodogram shows a significant peak at 2.09 d, close to but not exactly at P rot , 2.12 d, whose power decreases as we restrict the lines used. We note here that Revilla (2020) finds a P rot of 2.088 d using TESS data, which is closer to the value derived from the RVs here.
Active lines Regarding the active lines (Figs. B.9, B.10, and B.11), the RV scatter increases as we restrict the number of lines used, and the periodogram power of the peak at 2.09 d remains significant. Contrary to the case of J07446+035, the periodogram of the RVs obtained by the average of the lines with R > 0.80 for the total RV, or R < −0.80 for BIS and CRX, is less clear than the previous two cuts, and the peak at P rot is not as significant.

J22468+443 (EV Lac, GJ 873)
J22468+443 (EV Lac, GJ 873) is a well-known flaring mid-type M dwarf. In the data analysed here, the periodograms of the RVs and the indicators show signals at both P rot (4.38 days) and 1/2 P rot , with the strongest signal at 1/2 P rot , and the indicators and the RVs show linear correlations (Lafarga et al. 2021; Schöfer et al. 2022; Jeffers et al. 2022; Cardona Guillén et al. 2022). The stronger signal at 1/2 P rot is probably due to the fact that RVs and some indicators (including BIS and CRX) show a modulation with a double dip structure within one rotation period, which favours 1/2 P rot over P rot . Interestingly, for the same set of observations, other indicators show a single dip structure. This could be due to different indicators tracing different moments of the line profile (Lafarga et al. 2021; Schöfer et al. 2022; Jeffers et al. 2022). CCF and LAV RVs have a scatter of ∼ 40 m s −1 , about 10 m s −1 (1.2 times) smaller than the one obtained with serval (Fig. A.3). We used the line list built from the J07446+035 template. Similarly to the two previous stars, the correlations obtained with the total RV, BIS, and CRX are very similar, and hence, we obtain similar results for the three indicators (Fig. 4).
Inactive lines The smallest RV scatter occurs when using lines with |R| ≤ 0.1 or ≤ 0.2 and wstd RV ≲ 200 or 400 m s −1 (Fig. B.12). The minimum scatter is ∼ 18 − 19 m s −1 , 2.2 − 2.4 times smaller than the initial ∼ 40 m s −1 , depending on the indicator. Regarding the activity modulation present in the RVs (Figs. B.13, B.14, and B.15), the periodogram shows a very significant signal at 1/2 P rot , 2.19 d, and a less significant one at P rot , 4.38 d. As we restrict the lines used, the peak at 1/2 P rot decreases in power, becoming non-significant for the datasets that result in some of the smallest time series RV scatter (dataset with lines with |R| ≤ 0.1 and wstd RV ≤ 200 m s −1 ). We see that for some datasets, for instance the one with lines with |R| ≤ 0.2 and wstd RV ≤ 200 m s −1 , the peak at P rot increases its power, becoming significant.
Active lines For this star, the strength of the correlations between the individual line RVs and the activity indicators does not reach values as large as R = 0.8, as for J07446+035 or J05019+011, so the most restrictive cut performed to select active lines is at R > 0.60 (Figs. B.16, B.17, and B.18). The RV scatter increases as we restrict the number of lines used. The power of the periodogram peak at 1/2 P rot slightly increases, too. We observe a similar behaviour for the correlations obtained with the three indicators.

J10196+198 (AD Leo, GJ 388)
J10196+198 (AD Leo, GJ 388) has been the subject of recent, in-depth analyses to determine whether its RV signal is due to a planetary companion or to stellar activity (e.g. Carleo et al. 2020; Kossakowski et al. 2022). It shows an activity level similar to that of J22468+443 (similar pEW (Hα) and log(L Hα /L bol )), but has an RV scatter significantly smaller than J22468+443 or the other two previous stars (∼ 18 m s −1 for J10196+198, while for the previous stars it is > 40 m s −1 ). The number of observations is relatively small, 26, but the periodograms of the RVs, BIS, and CRX show a significant peak at ∼2.24 d, close to P rot , so the number of observations and sampling are sufficient for observing activity-related modulations.
The different RV amplitudes between J10196+198 and J22468+443 could be caused by different visible spot configurations. The spin axis of J10196+198 has a relatively low inclination (i ∼ 14°, Kossakowski et al. 2022) in comparison to J22468+443 or the previous stars (which have i ≥ 60°, see e.g. Morin et al. 2008). This close to pole-on orientation could cause any visible co-rotating spots to induce a smaller modulation in the RVs, since they would not abruptly appear and disappear as the star rotates. Also, the photosphere of J10196+198 could be more homogeneously spotted, which would also induce smaller RV modulations.
As seen in Sect. 4.1, Fig. 4, for this star there are no lines whose RV shows a very strong correlation with the activity indicators. The correlation coefficients R do not reach very large values, contrary to the findings for the three previous stars. This means that, by using these correlations, we are not able to identify lines that have a strong contribution to activity, even though the activity indicators show significant peaks at P rot . Aside from this, the R values obtained for each line depend on the activity indicator used to compute the correlation. All this could indicate that the correlations that we find do not have much information related to the activity of the star.
As mentioned before, this star has been speculated to host an exoplanet with an orbital period similar to the stellar rotation, ∼2.23 d, in a 1:1 spin-orbit resonance (Tuomi et al. 2018), although this claim has been challenged by further studies (Carleo et al. 2020;Robertson et al. 2020). However, since the stellar rotation and the hypothetical planet have the same period, it is difficult to completely rule out the planet's existence (Kossakowski et al. 2022). Since the RVs of this star could potentially contain the signal induced by the presence of an orbiting planet, using the total RV as an activity indicator is not a good choice here, because the correlations with the line RVs would not solely reflect the effect of activity. It could be argued that the planet amplitude is much smaller than the modulation due to activity. However, it would be very challenging to discern the amplitude of said planet from the residual activity present in the RVs computed with inactive lines. We presented an initial analysis of J10196+198 in Kossakowski et al. (2022), where we studied activity-insensitive lines. Here we summarise it and show in addition the effect of selecting activity-sensitive lines.
Inactive lines As we restrict the lines used to those with R closer to 0, we do not see a significant decrease in the RV scatter, but an increase (Fig. B.19). Only for a few datasets that contain almost the same lines as the initial one, obtained with the correlation with RV and CRX, the scatter decreases, but not significantly (about 1 m s −1 less than the initial 16.6 m s −1 ). For the datasets obtained using the correlation with BIS, none shows a decrease in the RV scatter. We note that the initial RV scatter obtained with the LAV method is lower than that obtained with the serval RVs, 18.4 m s −1 , but larger than the one obtained with the CCF RVs, 15.0 m s −1 , which is close to the smallest value obtained with the inactive line datasets.
The periodograms show a peak at the P rot of the star, 2.24 d, which decreases in power as we restrict the lines used (Figs. B.20, B.21, and B.22). This applies to all three indicators, but for the BIS the decrease is smaller than for the total RV and the CRX. Since the RV scatter does not decrease significantly, but remains the same or increases, we attribute this decrease in the significance of the peak at P rot to the increase in photon noise due to the smaller number of lines used in the datasets.
Active lines As we restrict the line selection to those showing the strongest correlations, we see that the RV scatter increases significantly (Figs. B.23, B.24, and B.25). The activity signal present in the RVs, however, does not remain constant. The peak at P rot shows a decrease in power for the datasets using the lines with the strongest correlations, and even completely disappears in the RV datasets computed using lines that show strong correlations with BIS. This could indicate that the lines that we identified as active are in fact not related to activity, which agrees with the fact that the strength of the correlations between the individual line RVs and the activity indicators was not large (Fig. 4).

J15218+209 (OT Ser, GJ 9520)
J15218+209 (OT Ser, GJ 9520) is one of the two early M dwarfs analysed. It has a large RV scatter, 37 m s −1 . RV, BIS, and CRX periodograms show a peak at P rot , 3.37 d, but it is only significant (FAP < 0.1 %) in the case of the RVs (Lafarga et al. 2021). For this star we used the line list built from the J15218+209 template itself.
Inactive lines The smallest scatter of the RV time series occurs when using lines with |R| ≤ 0.1 − 0.2 and wstd RV ≤ 100 − 300 m s −1 (Fig. B.26). The RV scatter attains values of 13, 19, and 17 m s −1 , 2.7, 2.0, and 2.2 times smaller than the initial 37.2 m s −1 , for the correlations with the total RV, BIS, and CRX, respectively. The periodogram of the initial dataset shows a significant peak at P rot , 3.37 d (Figs. B.27, B.28, and B.29). For the datasets computed from the correlations with the total RV and CRX, the P rot peak decreases in power and reaches a FAP of 0.1, which corresponds to a dataset with one of the smallest RV scatters. For the BIS datasets, the decrease in power is not as clear, and for the dataset with the smallest RV scatter, the peak has a power similar to that obtained using all the lines.
Active lines The time series RV scatter increases as we restrict the lines towards those with stronger correlations (Figs. B.30, B.31, and B.32). For the datasets obtained with the correlations with the total RV and CRX, the power at the P rot peak remains significant but decreases slightly. For the BIS datasets, the power decreases significantly. This seems to indicate that the BIS correlations are not as reliable as those obtained from the total RV or CRX.

J11201-104 (LP 733-099)
J11201-104 (LP 733-099) is the other early-type star analysed. The average log(L Hα /L bol ) indicates that J11201-104 is less active than J15218+209, and it shows a smaller scatter in its RV time series, of about 18 m s −1 , in the RVs obtained with serval, the CCF, and the LAV method (Fig. A.6). It has a P rot of 5.643±0.005 d (Revilla 2020; Shan et al. 2022); however, its periodograms do not show significant signals. CRX and BIS show linear correlations with the RVs, but they are less clear than in the previous stars. For this star, we used the set of lines derived from the J15218+209 template.
Inactive lines The RV time series with the smallest scatters are obtained for the line sets with |R| ≤ 0.1 − 0.3 and wstd RV ≤ 200 m s −1 (Fig. B.33), in the case of the correlations with the total RV and CRX. The scatter decreases from ∼ 19 m s −1 to ∼ 11 m s −1 for the RV datasets and to ∼ 13 m s −1 for the CRX ones, 1.7 and 1.5 times smaller, respectively. In the case of BIS, there are several datasets that show a small scatter close to the minimum one, which is ∼ 15 m s −1 , 1.3 times smaller than the initial one. The periodogram does not show any significant peaks (Figs. B.34, B.35, and B.36).
Active lines Regarding the active lines, the scatter increases significantly, but the periodogram does not show any significant peaks for any of the datasets used (Figs. B.37, B.38, and B.39).

J07446+035, J05019+011, J22468+443, and J15218+209
For these stars, the scatter of the LAV RV time series significantly decreases when restricting the lines used towards those whose RV shows no correlation with the activity indicators (|R| ∼ 0), and by removing lines with large RV scatter (limited wstd RV), that is, when using only inactive lines to compute the RV. The activity-related signals in the RV periodogram also lose significance when using only inactive lines. This indicates that the modulation due to activity present in the RVs is mitigated. These four stars are those with the largest time series RV scatter, from ∼ 37 to ∼ 100 m s −1 , and those whose line RVs show the strongest correlation with the activity indicators. When using sets of lines in which the conditions are more restrictive than those mentioned above (i.e. sets with fewer lines), the RV scatter increases, probably because the photon noise starts to dominate over the activity-driven variability.
There are specific sets of lines for which the RV scatter is minimised. These sets change depending on the star and the activity indicator used, but in general, the minimum scatter occurs when using lines with |R| ≲ 0.1 or 0.3 and wstd RV ≲ 150 or 300 m s −1 . For J07446+035, the scatter can be decreased ∼ 5 times with respect to the initial one. For the other three stars, the maximum decrease is between ∼ 2 and 3 times. The number of lines in the 'best' sets of lines is ∼ 100 to 200 for the mid-type stars J07446+035, J05019+011, and J22468+443, and ∼ 400 for the early-type J15218+209.
For J07446+035, J22468+443, and J15218+209, RVs computed using line sets based on the correlation with the total RV are those that result in the lowest scatter, compared to RVs obtained from line sets based on the correlation with the other two indicators (BIS and CRX). Between BIS and CRX, for J15218+209, the CRX line sets result in smaller RV scatters than those of BIS. For J07446+035 and J22468+443, both BIS and CRX datasets result in similar minimum scatters. For J05019+011, the datasets of the three activity indicators reach similar minimum values.
Doing the same but selecting lines that show a strong correlation with an activity indicator (R ∼ 1 or −1, depending on the activity indicator used), the time series RV scatter increases significantly. This could be due to the fact that we are enhancing the activity signal, but also to the increase in photon noise caused by the decrease in the number of lines (RV content) used. For most of the line sets tested, the activity-related signals in the RV periodograms show similar power as in the periodogram computed from the original RVs. However, for the most restrictive sets of lines (R ≥ 0.6 to 0.8 or −0.6 to −0.8, depending on the indicator), the signal loses significance, which could reflect the fact that the photon noise has increased due to the low number of lines used.
We chose the cuts in R and wstd RV by minimising the scatter of the LAV RV time series. We note that this might not be the optimal criterion to select the cuts in these parameters, because by minimising the LAV RV scatter we might eliminate or underestimate the amplitude of other astrophysical signals such as planets. A way to overcome this caveat could be to adopt general thresholds when selecting inactive lines based on the typical R and wstd RV cuts obtained for similar stars without planets.
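The selection of cuts described above can be illustrated with a minimal sketch. It assumes that each line carries a correlation coefficient R, a per-line scatter wstd, and an RV time series; the dictionary layout, the function name `scan_cuts`, and the toy numbers are hypothetical and are not taken from the raccoon pipeline. For simplicity, the global RV per epoch is an unweighted mean of the selected lines.

```python
import statistics

def scan_cuts(lines, r_cuts, wstd_cuts, min_lines=2):
    """Scan a grid of |R| and wstd upper limits and return the pair of
    cuts giving the smallest scatter of the line-averaged RV time series.

    lines: list of dicts with keys 'R', 'wstd' (m/s) and 'rv'
           (list of per-epoch RVs in m/s, same length for all lines).
    Returns (scatter, r_cut, wstd_cut, n_lines) or None.
    """
    best = None
    for r_cut in r_cuts:
        for w_cut in wstd_cuts:
            sel = [ln for ln in lines
                   if abs(ln["R"]) <= r_cut and ln["wstd"] <= w_cut]
            if len(sel) < min_lines:
                continue
            n_epochs = len(sel[0]["rv"])
            # Unweighted average of the selected line RVs at each epoch
            series = [statistics.fmean([ln["rv"][i] for ln in sel])
                      for i in range(n_epochs)]
            scatter = statistics.pstdev(series)
            if best is None or scatter < best[0]:
                best = (scatter, r_cut, w_cut, len(sel))
    return best

# Toy data: two activity-dominated lines, two quiet lines, one noisy line
lines = [
    {"R": 0.90, "wstd": 150.0, "rv": [-50.0, 0.0, 50.0]},
    {"R": 0.80, "wstd": 180.0, "rv": [-40.0, 0.0, 40.0]},
    {"R": 0.10, "wstd": 120.0, "rv": [-5.0, 0.0, 5.0]},
    {"R": -0.05, "wstd": 140.0, "rv": [4.0, 0.0, -4.0]},
    {"R": 0.15, "wstd": 400.0, "rv": [-90.0, 30.0, 60.0]},
]
best = scan_cuts(lines, r_cuts=[0.2, 0.5, 1.0], wstd_cuts=[200.0, 500.0])
print(best)
```

As noted in the text, a criterion of this kind minimises all RV variability, so it would also suppress a planetary signal if one were present.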
J10196+198. This star shows a similar activity level to J22468+443; however, its initial RV scatter is significantly smaller (∼ 18 m s −1 compared to ∼ 40 m s −1 for J22468+443), which may in part be due to the low estimated inclination of the star or to a homogeneously spotted photosphere. The line RVs are less correlated with the activity indicators than in the four previous targets (the correlations between the individual line RVs and the activity indicators show smaller strengths, i.e. R values less close to 1 or −1). The different datasets tested did not result in a significant decrease in RV scatter, probably due to the fact that the correlations between the line RVs and the indicators are not sufficiently strong. The periodogram shows a decrease of power at P rot , probably due to an increase of photon noise in the recomputed RVs. Regarding the sets of active lines, power at P rot decreases significantly, which agrees with the fact that the correlation between the line RVs and the indicators is not strong. We then also attribute this decrease in power to increasing photon noise.
J11201-104. This object shows a similar initial scatter to J10196+198 (∼ 19 m s −1 ) and, as in the case of J10196+198, the correlations between the line RVs and the activity indicators are not strong. Despite that, there is a decrease in the RV scatter for some sets of lines obtained with similar selection criteria as the sets that minimised the scatter in the four stars J07446+035, J05019+011, J22468+443, and J15218+209. The maximum scatter decrease in this case is ∼ 1.7 times the initial one, with a set of about 500 lines. The RV periodogram shows no significant signal.

Lines in different stars
Next, we investigate whether the sensitivities to activity of the different lines (i.e. their R values) are similar in different stars. We performed pairwise comparisons of the stars in two groups: the mid-type stars J07446+035, J05019+011, and J22468+443, which used the same initial line list created from a J07446+035 template, and the early-types J15218+209 and J11201-104, which used the initial line list created from a J15218+209 template.
We exclude J10196+198 from this analysis because we were not able to find a set of lines that mitigated the activity signal present in the RVs. For clarity and completeness, we include the corresponding RV time series and periodogram figures in Appendix C.

J07446+035 and J05019+011
We compared the distribution of R values of J07446+035 and J05019+011 as a function of the line wavelength in Fig. 9. The R values shown are those obtained from the correlation between the individual line RVs and the total RV. We show the correlation with this indicator as an example, but we obtained similar results for the correlations with CRX and BIS. Of the 2207 initial lines of the J07446+035 line list, we show here 2028 lines, which are those for which we were able to measure a reliable RV for both stars (i.e. after removing those with low S/N and non-common lines due to different overlap with tellurics or order ends). The distribution of R values of J07446+035 is slightly narrower and shifted towards 1 with respect to that of J05019+011. Many of the lines show different R values in the two stars, since there is only a weak correlation between the two sets (R ∼ 0.3).
We also compare the sets of lines for which we obtained the smallest RV scatter: lines with |R| ≤ 0.2 and wstd RV ≤ 200 m s −1 for J07446+035, and |R| ≤ 0.3 and wstd RV ≤ 300 m s −1 for J05019+011, which include 188 and 218 lines, respectively. Of these lines, 53 are common to both sets. This represents 24−28 % of the selected lines, and 2.6 % of the initial 2028 lines.
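The overlap fractions quoted above follow directly from the line counts. The short sketch below reproduces the arithmetic; the line identifiers are hypothetical integer indices chosen only so that the set sizes (188 and 218) and the number of common lines (53) match the text, and the function name `overlap_stats` is an assumption of this example.

```python
def overlap_stats(set_a, set_b, n_initial):
    """Number of lines common to two 'best' line sets and the fraction
    they represent of each set and of the initial line list."""
    common = set_a & set_b
    return {
        "n_common": len(common),
        "frac_of_a": len(common) / len(set_a),
        "frac_of_b": len(common) / len(set_b),
        "frac_of_initial": len(common) / n_initial,
    }

# Hypothetical line indices: 188 lines for one star, 218 for the other,
# arranged so that 53 indices are shared
best_a = set(range(0, 188))
best_b = set(range(135, 353))
stats = overlap_stats(best_a, best_b, n_initial=2028)
print(stats["n_common"])
print(round(100 * stats["frac_of_a"]), round(100 * stats["frac_of_b"]))
print(round(100 * stats["frac_of_initial"], 1))
```

The same function applies to the other star pairs by replacing the set sizes and the size of the initial common line list.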
Next, we recomputed the RV of each star using the selected lines of the other star, as well as using only the common selected lines. Fig. C.1 shows the RV time series, lines used, and periodograms of the RVs of J07446+035, computed using these datasets. The RVs recomputed using the best dataset of J05019+011 (blue data points in the figure) have a scatter smaller than the initial one, ∼ 54 m s −1 compared to ∼ 86 m s −1 , but the periodogram shows a peak at P rot almost as significant as in the initial dataset. The RVs recomputed using the 53 common lines (green data points) show a scatter similar to, but slightly larger than, that obtained using the best dataset of J07446+035 (orange data points), ∼ 23 m s −1 compared to ∼ 20 m s −1 , and a periodogram with a peak at P rot with a low significance. Figure C.2 shows the same as Fig. C.1 but for J05019+011. For the best dataset of J07446+035 (orange), the scatter decreases from the initial ∼ 106 to ∼ 60 m s −1 , and the peak close to P rot decreases in power significantly. The RVs obtained from the common lines have a scatter very similar to the one obtained with the best lines of J05019+011 (blue and green), and both periodograms show no significant peaks.

J07446+035 and J22468+443
Figure 10 compares the R values of J07446+035 and J22468+443. In this case, the R distribution of J07446+035, which is the more active of the two stars, is wider and reaches values closer to 1 than that of J22468+443. The R values of the same lines for the two stars are more similar than for the previous two stars (J07446+035 and J05019+011), since now the correlation between the two sets of R values is stronger (R = 0.6). The cuts yielding the lowest RV scatter are |R| ≤ 0. [...] lines, there are 52 common to both stars (28 − 30 % of the selected lines, and 2.5 % of the initial 2052 lines).
Figures C.3 and C.4 show the RV time series and periodograms recomputed using the best set of lines of the other star, and using the common selected lines, for J07446+035 and J22468+443, respectively. In both cases, using the line list that minimises the RV scatter of the other star results in a significantly smaller RV scatter than initially, about 1.8 − 2.1 times smaller. For J07446+035 (blue), this roughly twofold decrease is far from the minimum scatter obtained with its own best dataset (orange), which is about 4.4 times smaller than the initial one, and the periodogram continues to show a very significant peak at P rot . For J22468+443 (orange), however, the decrease is close to the one obtained with its own best dataset (blue), which is about 2.4 times smaller, and the periodogram peak at 1/2 P rot disappears. As with its own best dataset, for J22468+443 there is now more power at P rot than at 1/2 P rot (but the peak at P rot does not become significant in this case). Regarding the common selected lines, in both cases the scatter is close to the minimum one obtained with the best dataset of each star, and the periodogram does not show significant peaks related to activity.
[Figure caption fragment] ... Fig. 9, but for J22468+443 and J05019+011.

J22468+443 and J05019+011
For J22468+443 and J05019+011, the correlation between R values is low (R = 0.3, Fig. 11). There are 40 common selected lines, which represent 18 − 23 % of the 172 and 218 lines that minimise the RV scatter of each star, and 2.0 % of the initial 1963 lines.
The recomputed RV time series and periodograms are shown in Figs. C.5 and C.6. For J22468+443, using the best dataset of J05019+011 (blue) results in a decrease in the RV scatter compared with the initial one (1.4 times smaller), but not as small as the minimum obtained with its own best dataset (2.4 times smaller). The periodogram continues to show a significant peak at 1/2 P rot , although with less power than initially, and in this case, the power at P rot does not increase. For J05019+011, using the best J22468+443 dataset (orange) does not decrease the RV scatter significantly (1.2 times smaller compared to 2.5 times smaller for its own best dataset), but the periodogram no longer shows a peak at P rot . Using the common selected lines, for both stars the RV scatter decreases, but not as much as using their own best dataset, and the periodogram does not show any significant peaks.

J15218+209 and J11201-104
For these early-type stars, the correlation between R values is even lower than for the mid-type stars (R ∼ 0.2, Fig. 12). The selection cuts that result in the smallest RV scatter are |R| ≤ 0.1 and wstd RV ≤ 300 m s −1 for J15218+209 and |R| ≤ 0.2 and wstd RV ≤ 200 m s −1 for J11201-104, which correspond to 355 lines for J15218+209 and 482 lines for J11201-104. Of these, 122 lines are common to both sets (25 − 34 % of the selected lines, and 7.6 % of the initial 1610 lines).
In both cases, using the best dataset of the other star to recompute the RVs results in a scatter very similar to the initial one (Fig. C.7, blue, for J15218+209, and Fig. C.8, orange, for J11201-104). In the case of J15218+209, the periodogram also looks similar to the original one, with a significant peak at P rot . For J15218+209, the RVs computed with the common selected lines (green) have a scatter smaller than the initial one (1.3 times smaller), but larger than the one obtained using the best line set of the star itself (which was 2.7 times smaller, orange dataset). In the case of J11201-104, the scatter of the RVs computed with the common selected lines is larger than the original one.
If we focus on the line sets for which the RV activity signal is most strongly mitigated, in general the number of lines common to the star pairs is low. For the pairs of mid-type stars (J07446+035 and J05019+011, J07446+035 and J22468+443, and J22468+443 and J05019+011), only 2.0 % to 2.6 % of the initial lines are common to the best sets of the respective stars. For the early-type pair (J15218+209 and J11201-104), this number is larger, almost 8 % (but their best line sets already have a few hundred lines more than those of the mid-type stars). These 'common' sets have ∼ 50 lines in the case of the mid-type pairs, and 122 lines for the early-type one.

Overview of the results
For the stars in each pair, we used sets of inactive lines based on the correlation of the other star in the pair to compute RVs. This results in small changes in the RV time series compared to using all the lines, due to the lack of correspondence between the correlation strength of the same lines in different stars. In general, the scatter decreases, but not as much as using a set of inactive lines based on the correlations of the star itself, and the periodogram peaks related to P rot remain significant.
We also recomputed RVs using inactive lines common to the sets that minimised the RV scatter in the two stars of the pair. As mentioned above, these sets of common inactive lines have from ∼ 50 to 122 lines, depending on the pair of stars, which represents from 18 to 34 % of the lines in the best sets, a significant decrease in the number of lines. In general, using these sets results in RVs and periodograms similar to those obtained using the best set of lines from the star itself, but with slightly larger RV scatters.
The stars in the two groups used here all show high activity levels, but do not have the exact same properties. The spectral types, rotational velocities, and metallicities are similar but not identical. Based on the results for these stars, we are not able to obtain a general set of lines, even when considering spectral type intervals, for which the effect of activity in the RVs is minimised.

Discussion and final remarks
We have studied activity effects on individual spectral lines in a set of six active early- and mid-type M dwarfs observed with CARMENES-VIS as part of the CARMENES GTO sample.
Here we summarise and put our findings into context.
We used the raccoon pipeline to select lines in the stellar spectra and compute line-by-line RVs by comparing the centroid of the line with a reference. By averaging these line-by-line RVs, we computed a global RV per observation, obtaining values comparable to those resulting from the CCF and template-matching techniques. However, this similarity only holds because the stars selected show large RV dispersions due to stellar activity, which are significantly larger than the RV uncertainties. In fact, the uncertainties of the LAV RVs are about one order of magnitude larger than those obtained with the CCF or template-matching methods.
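As an illustration of how individual line RVs can be combined into a global RV, the sketch below uses an inverse-variance weighted mean. The weighting scheme, the function name `lav_rv`, and the toy values are assumptions of this example and may differ from the averaging actually implemented in the pipeline.

```python
import math

def lav_rv(line_rvs, line_errs):
    """Combine per-line RVs (m/s) of one observation into a global RV.

    Uses an inverse-variance weighted mean (an assumption of this sketch)
    and returns the mean together with its formal uncertainty.
    """
    weights = [1.0 / e ** 2 for e in line_errs]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, line_rvs)) / wsum
    err = math.sqrt(1.0 / wsum)
    return mean, err

# Toy per-line RVs and uncertainties (m/s) for a single observation
rvs = [120.0, -80.0, 40.0, 10.0]
errs = [100.0, 200.0, 100.0, 50.0]
rv, rv_err = lav_rv(rvs, errs)
print(rv, rv_err)
```

Because the formal uncertainty scales with the inverse square root of the summed weights, restricting the average to a small subset of lines increases the uncertainty of the global RV, which is the trade-off discussed in the text.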
We analysed the correlation strength between these line-by-line RVs and several spectroscopic activity indicators, following a method analogous to that described in Dumusque (2018). Amongst the different indicators analysed (global RV of the spectrum, CCF FWHM, contrast and BIS, CRX, dLW, and I Hα ), we find that only the global RV, BIS, and CRX show significant linear correlations with the line-by-line RVs (Lafarga et al. 2021; Jeffers et al. 2022; Cardona Guillén et al. 2022).
Using the strength of these correlations, which we measured with Pearson's correlation coefficient R, we classified the lines according to their sensitivity to activity. We find that the R coefficient is a biased indicator of the correlation with activity when measurements have low precision. This is true for lines located in low S/N regions of the spectrum, especially in the bluer region, which leads to large RV uncertainties.
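The classification step can be sketched as follows. The function computes Pearson's R between the RV time series of one line and an activity indicator, and then assigns a label according to thresholds on |R|. The threshold values (0.2 and 0.6), the label names, and the toy time series are illustrative assumptions and do not correspond to a specific star.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def classify_line(r, inactive_cut=0.2, active_cut=0.6):
    """Label a line from its |R|; the cut values are illustrative only."""
    if abs(r) <= inactive_cut:
        return "inactive"
    if abs(r) >= active_cut:
        return "active"
    return "intermediate"

# Toy time series (m/s): the global RV used as activity indicator,
# one line that follows it and one line that does not
total_rv = [-30.0, -10.0, 0.0, 10.0, 30.0]
line_a = [-60.0, -20.0, 0.0, 20.0, 60.0]
line_b = [5.0, -5.0, 0.0, -5.0, 5.0]

r_a = pearson_r(total_rv, line_a)
r_b = pearson_r(total_rv, line_b)
print(r_a, classify_line(r_a))
print(r_b, classify_line(r_b))
```

In this noise-free toy case the two lines fall cleanly into the two classes; with real data, large per-line RV uncertainties dilute R, which is the bias mentioned above.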
We then used different sets of lines to compute a new global RV of each observation. With lines having a strong correlation with the activity indicators (i.e. activity-sensitive lines), we obtained activity-dominated RVs, while with sets of lines showing weak correlations (i.e. activity-insensitive lines), we decreased the effect of activity in the RVs. We note here that, despite referring to them as 'activity-insensitive' lines, the recomputed RVs still have a significant scatter, larger than the RV uncertainties, and in some cases the RV periodograms still show low-significance signals related to P rot . Therefore, these lines still contain some contribution from activity, that is, they are the least activity-sensitive lines. We use the term 'activity-insensitive' throughout the text only for clarity. The decrease in activity has been evaluated by analysing the total RV scatter and the presence of significant activity-related signals in the RV periodogram. We have been able to effectively find lines with different correlation strengths and mitigate activity effects on the RV for five stars of our sample of six: J07446+035, J05019+011, J22468+443, J15218+209, and J11201-104. The maximum decrease in RV scatter obtained using sets of inactive lines is ∼ 2 to 5 times the initial one. These sets of inactive lines have of the order of 100 lines, while initially we started with about 2000 lines.
For J10196+198 (AD Leo), which shows a lower initial RV scatter and weaker correlations, the method used here is not able to distinguish between active and inactive lines, and hence we did not see an improvement in the RV time series by using subsets of lines (see also Kossakowski et al. 2022). This could be because the precision of the individual line RVs is not sufficiently high to deliver global RV time series with smaller scatters. It is also possible that the correlations with the activity indicators are not reliable, either because of the low precision of the RVs or the indicators, or because of the way we quantified the correlations, which could have hampered the classification of lines according to their sensitivity to activity.
We also studied whether the same lines show similar correlation strengths between the line-by-line RV and the activity indicators in different stars. By doing this, we aimed to see if a set of inactive lines could be generalised for stars with similar characteristics. We performed pairwise comparisons of the correlation strength of the mid-type M dwarfs, J07446+035, J05019+011, and J22468+443, and, separately, the early-type M dwarfs, J15218+209 and J11201-104. We find that the same insensitive lines in different stars, in general, do not show similar correlation strengths, that is, there is a different activity dependence for the same lines in different stars even if the stars are similar in terms of spectral type, activity level, and rotation. Such a lack of consistency could be due to the fact that most absorption features in M dwarf spectra are line blends rather than well-isolated, single lines. Lines in different stars could be blended to varying degrees, and the combination of the activity-affected profiles of the blended lines could result in different RV variations depending on the specific target star. Further analysis is needed to understand these results. Using the best set of inactive lines obtained for a specific star to recompute RVs of another target star, even if they have similar properties, is less effective (i.e. results in a less significant reduction of the RV activity signal) than using a line set obtained from the target star itself. Using only the lines common to the best sets of two similar stars achieves better results. However, since the number of common lines is very small, the RV scatter is still large, probably due to the fact that photon noise starts to dominate (i.e. a larger number of lines would be needed to improve the RV precision). Therefore, to achieve the maximum mitigation of activity, the best strategy appears to be selecting lines based on the correlations of each individual star.
The same observations of the six stars analysed here have also been studied by Cardona Guillén et al. (2022). Similarly to Tal-Or et al. (2018), Cardona Guillén et al. (2022) studied the correlation between the total RV and several indicators of stellar activity, but increased the sample of stars used by including most young stars in the CARMENES-GTO sample and analysed a larger number of activity proxies. Moreover, Cardona Guillén et al. (2022) went one step further and used the correlation between RV and activity indicators to correct for activity effects in the RVs. Jeffers et al. (2022) also performed a similar de-correlation with the CRX and the centre of light for several sets of J22468+443 (EV Lac) observations. Table 2 shows a comparison of the results obtained with the line-by-line approach presented here and the de-correlation performed by Cardona Guillén et al. (2022). The decrease in RV scatter obtained by both methods is similar. Similarly, for J22468+443 (EV Lac), Jeffers et al. (2022) obtained a reduction factor of ∼3 to 4, depending on the set of observations used. This was expected because both approaches rely on the use of linear correlations between RV and activity indicators. As opposed to the de-correlation approach, the line-by-line approach can make use of the total RV as a proxy of activity, and hence it does not depend on extra indicators of activity. However, as mentioned above, the total RV modulation can be affected by unknown companions and, for different types of star, other indicators might be better tracers of activity, so using the total RV is not necessarily the best option. On the other hand, the de-correlation approach is simpler and is not limited to bright, relatively slowly rotating M dwarfs showing significant RV scatter.
Our results from combining lines for different stars are in contrast with those obtained by Bellotti et al. (2022) with M dwarf spectra from ESPaDOnS and NARVAL. Bellotti et al. (2022) were able to achieve a reduction in the total RV scatter of EV Lac (J22468+443), AD Leo (J10196+198), and DS Leo by applying a line list of activity-insensitive lines based on spectra of EV Lac. These results, however, cannot be immediately compared with ours. Bellotti et al. (2022) used a line list based on the VALD database and did not always remove telluric features initially (their algorithm is able to do that afterwards), while our lines have been empirically found in the observed spectra and we have removed any lines overlapping with tellurics. Their initial RV scatters (i.e. using all the lines) are higher than those we obtained with CARMENES-VIS. For EV Lac, the initial RV scatter of the dataset in Bellotti et al. (2022) is 182 m s −1 , and for AD Leo, 110 m s −1 , while with CARMENES-VIS we have 40 m s −1 and 17 m s −1 , respectively. The final RV scatters (i.e. using a line list that minimises the scatter) obtained by Bellotti et al. (2022) are still larger than our starting values.
One of the reasons for these differences could be the different wavelength ranges of the instruments. The wavelength range of both ESPaDOnS and NARVAL includes bluer wavelengths than CARMENES-VIS (they start at ∼ 3700 Å, while CARMENES-VIS starts at 5200 Å), and hence, the masks the authors used contain a significantly higher number of lines in the blue than in the red wavelength range. The use of these bluer lines could lead to differences between the RVs measured with different instruments. However, to test variability with wavelength, the authors computed RVs using lines (in this case with tellurics having been removed previously) bluewards and redwards of 5500 Å separately (i.e. the red set being comparable to the CARMENES-VIS wavelength range), and they found no decrease in the RV scatter with either set of lines. Therefore, it is not clear whether the bluer range of the instruments has a significant impact on the differences mentioned above. Another factor that could be playing a role is the fact that ESPaDOnS and NARVAL have a slightly lower resolution (R ∼ 70 000) than CARMENES-VIS (R = 94 600), making the identification of single lines harder. Furthermore, the datasets in Bellotti et al. (2022) and in this work were observed at very different times (between 2005 and 2016 for EV Lac and in 2008 for AD Leo); hence, the stars could have intrinsically different activity levels, which would also contribute to the differences in RV.
In this work we show that for a small sample of active stars with relatively large RV scatters, it is possible to select specific spectral lines to compute RVs that are affected by stellar activity to varying degrees. Our method can be expanded in several ways.
- The line-by-line RVs were computed by fitting a Gaussian model to the lines, finding their centroids, and comparing them to a list of reference wavelengths. In most cases, our lines do not have clear Gaussian profiles, so we expect to obtain more precise individual RVs by fitting other kinds of functions (e.g. a parabola in the line core such as in Reiners et al. 2016; Liebing et al. 2021) or a model of the spectrum to the region around each individual line (e.g. Dumusque 2018; Artigau et al. 2022). Instead of working on a line-by-line basis, for M dwarfs we could also consider groups of lines, such as those from molecular bands, and measure RVs by template matching. These modifications of the method could improve the precision of line-by-line RVs, and the reliability of the correlations between the line RVs and the activity indicators, allowing for a better classification of the sensitivity of the lines to activity.
- To quantify the correlations, we used Pearson's correlation coefficient R, which measures the strength of linear dependencies (proportional changes between variables). We observed that the R coefficient could be biased due to the low precision of the RV measurements, as it did not provide reliable values in lines with large RV scatter (mainly lines in low-S/N regions). Moreover, we performed cuts on R arbitrarily, without accounting for uncertainties on the exact R value. For instance, when selecting active lines, a cut of R < −0.80 would select the example line at 6661.25 Å in Fig. 3 for the CRX (R=−0.81), but not for the BIS (R=−0.77) correlation. A way to account for biases in R could be to use different R limits as a function of the local S/N of the spectrum. One could also use the slope (and its uncertainty) between the line RV and activity indicator as a quantitative indicator for the quality of the correlation. It is also possible to estimate uncertainties on R with Monte Carlo simulations as in Cretignier et al. (2020), which could then be used to further select reliable R values by avoiding selection biases. These approaches could complement the selection based on per-line wstd RV and RV uncertainty presented here.
- The relationship between the individual line RVs and the three final indicators used (global RV, CCF BIS, and CRX) appears to be linear; however, to further improve the accuracy of the correlations, we could also test if other methods are able to capture these dependencies better, such as Spearman's rank correlation coefficient. This correlation coefficient assesses the strength of a monotonic relationship between two variables and hence it is not limited to linear dependencies. The other activity indicators investigated (CCF FWHM, contrast, dLW, and chromospheric lines) clearly show non-linear (circular) relationships with the line-by-line RVs, which seem to arise from phase differences (e.g. Bonfils et al. 2007; Santos et al. 2014; Perger et al. 2017). If further methods to quantify these dependencies are studied, these indicators could also be used to assess the sensitivity of the different lines to activity. Quantifying the correlation in another way with different indicators could be relevant especially for stars with low levels of stellar activity, because their linear correlations might not be as strong as those presented here, and indicators other than BIS and CRX might be better at tracing activity (see e.g. Lafarga et al. 2021; Cardona Guillén et al. 2022).
- The method we used to compute the global RVs was to simply average the RVs of the selected individual lines (i.e. compute LAV RVs). However, the selected lines could also be used to build cross-correlation masks (e.g. Lafarga et al. 2020; Rainer et al. 2020; Bellotti et al. 2022) or select specific regions in template-matching approaches, which could deliver even more precise RVs (i.e. decrease the uncertainty of the LAV RVs). By doing that, it could also be tested if stellar activity indicators such as those derived from the CCF or the CRX also vary and show a weaker dependence on the recomputed RV and the activity signals, to further probe the presence of activity. By having more precision in the LAV RVs, one could also focus on specific regions of the spectrum, for instance the red part where lines seem to be less active, as opposed to what is presented in this work, where we study the spectrum as a whole. With our approach of averaging line RVs, we are limited in precision to study the behaviour of RV or CRX computed over a small spectral range.
- Ideally, the LAV approach would always average the same number of lines. However, depending on the S/N of each line in each observation, the number of averaged lines slightly differs from observation to observation. Rejecting some lines in some observations can lead to a bias in the averaged RV. Artigau et al. (2022) accounted for this issue by applying an iterative debiasing process, in which the offset introduced by rejecting specific lines was taken into account when computing the global average RV. We assessed the effect introduced by averaging different lines in our results by computing LAV RVs using only lines common to most observations, as opposed to using as many lines per observation as possible, which is what we did to achieve the results presented above. For all cases tested (i.e. using all lines and applying cuts in R and wstd), the RV scatter decreases by less than about 10 % when using only common lines, compared to using all possible lines. The periodogram structure remains the same, except for cases with a small number of lines (i.e. |R| ≤ 0.10), probably because in these cases noise dominates. Hence, a debiasing such as the one presented in Artigau et al. (2022) would probably slightly decrease the final LAV RV scatter obtained, and change the exact cut values that minimise it.
- Our line selection followed an empirical approach, that is, we selected minima present in the spectrum based on their profile and then classified the lines according to their correlation with an activity indicator, without any knowledge of the origin of the lines. By cross-matching the lines selected in our datasets with line databases, one could study dependencies between the sensitivity to activity and physical parameters (such as the excitation potential or the species giving rise to the line), or changes in the line profile (see e.g. Wise et al. 2018; Cretignier et al. 2020; Bellotti et al. 2022). Spectra observed at a higher resolution could help to characterise shape changes better and identify line blends.
- Longer time coverage and/or denser sampling of the observations could also help to characterise short- and long-term changes in activity, including better correlations between the RV and activity proxies.
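One of the extensions listed above, estimating an uncertainty on R with Monte Carlo simulations, can be sketched in a simple form. The example perturbs the measured values with Gaussian noise drawn from their uncertainties and recomputes R for each realisation. This is a generic resampling scheme assumed for illustration, not necessarily the procedure of Cretignier et al. (2020); the function names and the toy values are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_with_uncertainty(x, y, x_err, y_err, n_draws=2000, seed=1):
    """Mean and standard deviation of R over noise realisations in which
    each point is perturbed by a Gaussian with its quoted uncertainty."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        xs = [rng.gauss(v, e) for v, e in zip(x, x_err)]
        ys = [rng.gauss(v, e) for v, e in zip(y, y_err)]
        draws.append(pearson_r(xs, ys))
    mean = sum(draws) / n_draws
    std = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n_draws - 1))
    return mean, std

# Toy indicator and line RV series (m/s) with a perfect linear relation
indicator = [-30.0, -10.0, 0.0, 10.0, 30.0]
line_rv = [-60.0, -20.0, 0.0, 20.0, 60.0]
ind_err = [2.0] * 5

# Same line measured with small and with large RV uncertainties
r_precise, s_precise = r_with_uncertainty(indicator, line_rv, ind_err, [5.0] * 5)
r_noisy, s_noisy = r_with_uncertainty(indicator, line_rv, ind_err, [100.0] * 5)
print(r_precise, s_precise)
print(r_noisy, s_noisy)
```

Comparing the two cases shows how a large per-line RV uncertainty both lowers the typical R and widens its distribution, so a fixed cut on R treats precise and imprecise lines unequally.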
In conclusion, in this work we have presented an analysis of the sensitivity to activity of different individual lines in M dwarf stars, which provides a methodology for exploiting the wealth of information contained in the stellar spectra. We have shown that it is possible to identify lines that correlate with activity to varying degrees in several active M dwarf stars, and that this information can be used to effectively mitigate or enhance the effect of activity on RV measurements. With the current and next generation of high-resolution spectrographs reaching increasingly better RV precisions, stellar activity becomes the ultimate obstacle in RV observations (e.g. Crass et al. 2021). Studies of the effects of activity on spectroscopic observations, such as the one presented here, will therefore be key in the quest for small Earth-like exoplanets and planets around young stars.
Acknowledgements. We thank the anonymous referee for a constructive and timely report which has helped improve the contents of this article. CARMENES is an instrument for the Centro Astronómico Hispano-Alemán de Calar Alto (CAHA, Almería, Spain). CARMENES is funded by the German Max-Planck-Gesellschaft (MPG), the Spanish Consejo Superior de Investigaciones Científicas (CSIC), the European Union through FEDER/ERF FICTS-2011-02 funds, and the members of the CARMENES Consortium (Max-Planck-Institut für Astronomie, Instituto de Astrofísica de Andalucía, Landessternwarte Königstuhl, Institut de Ciències de l'Espai, Institut für Astrophysik Göttingen, Universidad Complutense de Madrid, Thüringer Landessternwarte Tautenburg, Instituto de Astrofísica de Canarias, Hamburger Sternwarte, Centro de Astrobiología and Centro Astronómico Hispano-Alemán), with additional contributions by the Spanish Ministry of Economy, the German Science Foundation through the Major Research Instrumentation Programme and DFG Research Unit FOR2544 "Blue Planets around Red Stars", the Klaus Tschira Stiftung, the states of Baden-Württemberg and Niedersachsen, and by the Junta de Andalucía. Based on data from the CARMENES data archive at CAB (INTA-CSIC). We acknowledge financial support from the Agencia Estatal de Investigación 10.13039/501100011033 of the Ministerio de Ciencia e Innovación and [...]

[Figure caption] [...] Fig. 7, but using the following datasets: initial line list (black), lines that minimise the RV scatter of J07446+035 (orange), lines that minimise the RV scatter of J05019+011 (blue), and common lines in the two previous sets (green).

[Figure caption] [...] 1, but for J22468+443 and using the following datasets: initial line list (black), lines that minimise the RV scatter of J22468+443 (orange), lines that minimise the RV scatter of J05019+011 (blue), and common lines in the two previous sets (green).