Self-consistent Stellar Radial Velocities from LAMOST Medium-Resolution Survey (MRS) DR7

Radial velocity (RV) is among the most fundamental physical quantities obtainable from stellar spectra and is rather important in the analysis of time-domain phenomena. The LAMOST Medium-Resolution Survey (MRS) DR7 contains 5 million single-exposure stellar spectra at spectral resolution $R\sim7\,500$. However, the temporal variation of the RV zero-points (RVZPs) of the MRS survey, which makes the RVs from multiple epochs inconsistent, has not been addressed. In this paper, we measure the RVs of the 3.8 million single-exposure spectra (for 0.6 million stars) with signal-to-noise ratio (SNR) higher than 5 based on cross-correlation function (CCF) method, and propose a robust method to self-consistently determine the RVZPs exposure-by-exposure for each spectrograph with the help of \textit{Gaia} DR2 RVs. Such RVZPs are estimated for 3.6 million RVs and can reach a mean precision of $\sim 0.38\,\mathrm{km\,s}^{-1}$. The result of the temporal variation of RVZPs indicates that our algorithm is efficient and necessary before we use the absolute RVs to perform time-domain analysis. Validating the results with APOGEE DR16 shows that our absolute RVs can reach an overall precision of 0.84/0.80 $\mathrm{km\,s}^{-1}$ in the blue/red arm at $50<\mathrm{SNR}<100$, while 1.26/1.99 $\mathrm{km\,s}^{-1}$ at $5<\mathrm{SNR}<10$. The cumulative distribution function (CDF) of the standard deviations of multiple RVs ($N_\mathrm{obs}\geq 8$) for 678 standard stars reach 0.45/0.54, 1.07/1.39, and 1.45/1.86 $\mathrm{km\,s}^{-1}$ in the blue/red arm at 50\%, 90\%, and 95\% levels, respectively. The catalogs of the RVs, RVZPs, and selected candidate RV standard stars are available at \url{https://github.com/hypergravity/paperdata}.

The LAMOST, after a five-year low-resolution spectroscopic survey (LRS, R ∼ 1 800, 3 800Å < λ < 9 000Å), has started a five-year medium-resolution spectroscopic survey (MRS, R ∼ 7 500, 4 950Å < λ < 5 350Å and 6 300Å < λ < 6 800Å, see Liu et al. 2020) since Sep. 2017. The DR6 1 , the first data release of the MRS, contains the data obtained during Sep. 2017 to Jun. 2018 and is already public to the international astronomical community. The DR7 2 including the data from Sep. 2017 to Jun. 2019 (∼5 million spectra for over 800 000 stars) is currently open to the Chinese astronomical community only.
At the beginning Sc lamp was used to calibrate the wavelength of LAMOST MRS spectra, and later switched to ThAr lamp in 2018. Due to the short wavelength coverage, sky lines are not used in wavelength calibration as in LRS (Wang et al. 2010). The released spectra have been applied the barycentric correction and the wavelength uses vacuum standard. Both the LAM-OST pipeline and Wang et al. (2019) have measured RVs from DR7 spectra and estimated a static RV zero-point (RVZP) by comparing the measured RVs to those in the literature of a sample of RV standard stars (Huang et al. 2018) for each spectrograph. However, the temporal variation of the RVZPs (between exposures) is not clear so far despite a few trials. For example, Liu et al. (2019b) and Zong et al. (2020) show the temporal variation of RVZPs based on the data from the Kepler field using a sample of roughly selected RV-invariant stars, and Ren et al. (2020) shows that the RVZPs of the red arm vary between exposures and the difference can reach 4 km s −1 in the MRS-N fields (nebula survey, Wu et al. 2020).
Physically, the variation of the LAMOST MRS RVZPs can be explained by several reasons.
1. The LAMOST telescope has a long optical path and a large focal plane (1.75m diameter) on which 4 000 fibers are installed Cui et al. (2012), temperature variation is unavoidable.
2. Presently, arc lamp exposures are taken every about 2 hours (typical observing time of a plate) which is not frequent enough. The instrument might change its state between lamp exposures.
3. Since LAMOST MRS needs ∼20 ThAr lamps to illuminate the big focal plane simultaneously, the decay and replacement of the lamps and the adjustment of lamp exposure time are very often. They affect the signal-to-noise ratio of the lamp spectra and thus the wavelength calibration consistency over time.
4. The MRS spectrographs are mounted and dismounted monthly because the MRS survey is scheduled in the 14 bright/gray nights, while the other nights are for LRS survey. Therefore, the focuses are adjusted monthly.
All these factors and other potential defects in data reduction pipeline are finally reflected in the RVZPs of the spectra. Therefore, it is insufficient to use the RVs either from the LAMOST pipeline or Wang et al. (2019) at different epochs and perform a time-domain analysis, such as in studies of pulsating stars (the LAMOST-Kepler project, Zong et al. 2020;Fu et al. 2020), spectroscopic binaries (Gao et al. 2014(Gao et al. , 2017Liu 2019) and even searching for black holes (Liu et al. 2019a;Gu et al. 2019), which are important scientific goals of the MRS survey . In this paper, aiming at subsequent time-domain analysis, we measure the RVs for the 3.8 million singleexposure spectra with SNR > 5 in the LAMOST MRS DR7 v1.1 3 , and propose a robust method to selfconsistently determine the absolute RVZPs with the help of the Gaia DR2 data. In Section 2, we briefly describe the observaional rules of the LAMOST MRS survey and the instrumental parameters. In Section 3, we describe how we measure RVs. In Section 4, we show the algorithm that determines the RVZPs self-consistently. In Section 5, we present our RV and RVZP measurements and assess the precision and self-consistency, and select a sample of candidate RV standard stars based on our results. The information and instruction of our data products are described in Section 6, and the summary of this work is given in Section 7. Hα zoomed-in  The scientific goals of the LAMOST MRS survey mainly include stellar multiplicity, stellar pulsation, star formation, emission nebulae, Galactic archaeology, host stars of exoplanets, and open clusters. The details of the scientific plan and the survey strategy are described in . In this section, we summarize them briefly.
Each scientific goal has a PI who is responsible for its targeting.
A planid is assigned to each MRS plate (pointing), which has a form of [TD/NT]hhmmss[N/S]ddmmssXnn, where the first two digits denote time-domain (TD, will be repeatedly observed during the five-year survey) or non-timedomain (NT, just observed in a single night), hhmmss[N/S]ddmmss represents for the equatorial coordinate of the plate center and [N/S] means north/south, the digit X is used to denote the scientific goal (see Table 1), and the last two digits represent a serial number. For example, the planid TD062610N184524B01 means it is a time-domain plate dedicated to binarity/multiplicity research. There also exists some irregular planids which are testing fields, such as HIP8426401, NGC216801, etc. HIP8426401 means the central star of this plate is HIP84264, and NGC216801 means a plate toward the star cluster NGC2168.
The MRS survey uses the Local Modified Julian Minute (LMJM), an 8-bit integer defined as 1440 × the Local Modified Julian Date (LMJD) at the beginning of each exposure, as the stamps of each exposure. Typical exposure time is set to 1200s, while 900s and 600s exposures also exist, depending on the luminosity of targets. Each NT plate is observed with consecutive 3 exposures while each TD plate is observed until it is unobservable (usually 5-6 1200s exposures are allowed). An arc lamp exposure is taken at the beginning of each plate, and at the end of an observing night (every ∼2 hours). The targeting is mostly based on Gaia DR2 source catalog (Gaia Collaboration et al. 2018). For a 1200s exposure, the magnitudes corresponding to S/N = 5 in blue and red arm spectra are G ∼14.5 and 15.2 mag, respectively (see Figure 1 in Liu et al. 2020). Brighter than this magnitude covers the majority of the objects observed by the LAMOST MRS survey.

The LAMOST MRS Spectra
The whole LAMOST MRS DR7 (R ∼ 7 500) contains over 5 million single-exposure spectra of over 800 000 stars obtained from Sep. 2017 to Jun. 2019. In this work, 3 753 659 spectra with SNR > 5 either in blue arm or red arm for 600 771 stars are selected. Their distributions on number of exposures and time span is presented in Figure 3. The spectra are oversampled. Typical wavelength steps are 0.11 and 0.14Å at the blue and red arm, respectively. The sampling rate (λ/∆λ ∼ 45 000) is as high as 6 times the spectral resolution (R ∼ 7 500). In Figure 1, we show a spectrum of a K-type giant star as a demo of the LAMOST MRS survey. The blue arm and red arm are designed mainly for the Mg I b triplet at around 5 175Å and Hα at around 6 564Å. The released spectra are not corrected with response curves.

Preparation for RV Measurements
We noticed that cosmic rays frequently pollute the MRS single-exposure spectra and are neither identified nor removed by the LAMOST pipeline.Therefore, in our method, the first step is to carefully remove the cosmic rays in spectra. We smooth spectra with a 21pixel median filter followed by a 9-pixel Gaussian filter, and remove the original pixels which deviate from the smoothed spectrum by 4 and 8 times the local standard deviation in the upper and lower direction. The parameters of the filters are set empirically so that the absorption lines in the spectra of A-, F-, G-and K-type stars are not affected. The removed pixels are replaced by values linearly interpolated using neighboring pixels. We also note that the two ends of both blue and red arms are sometimes tilted and show unreasonable flux values probably due to extrapolation of the sky flux modeling, so we trim the 50 pixels at both edges of each arm.
The second step is to normalize spectra to pseudocontinuum to place the spectral features (usually absorption lines) on the same scale. We iteratively fit the spectrum with a smoothing spline function and clip the pixels away from the median values by 3 times the standard deviation in each 100Å window. The number of iterations is set to 3. As shown in Figure 1, for a typical K-type star with medium SNR, the normalization is quite adequate for the subsequent RV measurements.

Spectral Templates
We adopt the synthetic grid published by Allende Prieto et al. (2018) based on the ATLAS9 stellar atmosphere model (Kurucz 1979;Mészáros et al. 2012) as our spectral library. We degrade the spectral resolution from 10 000 to 7 500 to fit the MRS configuration and converted from air wavelength to vacuum wavelength using the formula proposed by Morton (2000). To limit the computational cost of the subsequent RV measurements at a reasonable amount, we interpolate the synthetic library to generate 100 spectral templates with stellar parameters randomly drawn from a uniform distribution in the range 3500 < T eff /K < 15000, 0 < log g/dex < 5, −2 < [Fe/H]/dex < 0.5 and −0.5 < [α/Fe]/dex < 0.7. Extremely metal-poor and extremely hot templates are not considered because the spectral features are not significant. Tests on high-SNR MRS spectra show that the sparsity of our templates induces a statistical error ∼ 0.10 and 0.20 km s −1 in the blue and red arm, respectively, which are negligible compared to other sources of uncertainties. Figure 2 shows the distribution of parameters of the 100 spectral templates and a series of PAR-SEC isochrones (Bressan et al. 2012) with solar metallicity and logarithmic age log 10 τ = 7, 8, 9 and 10. These spectral templates are normalized to pseudo-continuum using the same way as in Section 3.1.

RV Estimates
The cross-correlation function (CCF, Tonry & Davis 1979) method is widely used in spectroscopic surveys to measure stellar RVs (e.g., Nidever et al. 2015;Steinmetz et al. 2020b). One important advantage is that the CCF can be accelerated using Fast Fourier Transformation (FFT) once the spectrum is continuum-subtracted and re-sampled to a logarithmic wavelength grid. However, the drawback of such a scheme is that the sampling of the resulting CCF is generally very sparse. To evaluate the CCF at a smaller RV step, we do not follow the FFT way. In our implementation, the CCF at an RV of v is evaluated as where F is the vector of the normalized observed spectrum, G(v) is the vector of normalized synthetic spectrum shifted by an RV (v) and resampled to the wavelength grid of F , Var represents the variance operator, and Cov represents the covariance operator (see Appendix A for more details). Deriving the final RV consists of three steps.
1. The initial estimates are made from an RV grid from −1500 to 1500 km s −1 , with a step of 10 km s −1 . The template with the maximum CCF value is selected as the best-match template, and the corresponding radial velocity is adopted as the initial guess of the final RV of the observed star. And the parameters of the template (T eff , log g, [Fe/H], [α/Fe]) are recorded, which make a good prior for some following analysis such as stellar atmospheric parameter determination.
2. With the best-match template, we maximize the CCF to determine the final RV (v obs ) using the optimization routine scipy.optimize.minimize with the Nelder-Mead algorithm (Nelder & Mead 1965). The corresponding CCF value is recorded as CCFMAX to assess the likelihood between the best-match template and the observed spectrum. The SNR-CCFMAX relations are shown in Figure  4.
3. To obtain the measurement error σ v,obs , we use a Monte Carlo method, namely, we repeat this process 100 times and in each time we add Gaussian random noise to the spectrum according to the flux error. The measurement error is computed using 16th and 84th percentiles, i.e., σ v,obs = (v 84 − v 16 )/2. The SNR-σ v,obs relations are shown in Figure 5.
We avoid the Gaussian fitting to the CCF which is widely used in literature (e.g., Nidever et al. 2015;Wang et al. 2019). The reasons for doing this include that the CCF peak is obviously non-Gaussian and that at low signal-to-noise ratio (SNR) the fitting process is fragile. The final error of RV includes measurement error and a noise floor term which is due to (Replaced: systemtics replaced with: systematics) (e.g., background, detector imperfections, temperature changes, focusing issues, etc) and will be assessed in the following part of this paper.
In this work, one RV is estimated for the blue arm (v B ) and two for the red arm (v R and v Rm ). The v Rm is measured using the Hα-masked red arm spectrum. As shown in Figure 4 and 5, these RV measurements deteriorate rapidly at SNR < 20. For cool stars (e.g., FGK type), at a given SNR, v B is more precise than v R because of rich spectral features in blue arm (for a detailed discussion of spectral information content in MRS spectra, see Zhang et al. 2020b). However, for hot stars, v R is more reliable because of the Hα feature and v Rm is significantly less precise than v R Hα. Therefore, we recommend the readers to only consider v Rm when the targets have Hα emission.

RADIAL VELOCITY ZERO-POINTS
In essence, the RV zero-point (RVZP) is the bias of the wavelength solution of a specific spectrum compared to its true wavelength solution in terms of radial velocity. It is affected by many factors, e.g., the condition of the instrument, the quality of the arc lamp exposure, the reduction algorithm, and the non-simultaneous nature of the arc lamp exposure and the object exposure, etc. In this paper, we define the RVZP correction value ∆v by v abs = v obs + ∆v, where v abs is the absolute radial velocity and v obs the radial velocity directly measured from a spectrum.

R.A. [deg]
Grouped pointings of LAMOST MRS DR7 Figure 6. Grouped pointings of the LAMOST DR7 MRS observations using a friends-of-friends method with a linking length of 5 degrees. The field-of-view of LAMOST is a circle with 2.5-degree radius if all spectrograph are at work. The figure uses equatorial coordinate system with Mollweide projection, and the pointings in a particular group are shown with the same color.
In this work, assuming that the fibers in a spectrograph in one exposure (hereafter, we refer to it as a spectrograph-exposure-unit, or an SEU) share similar RVZPs, the systematic RVZP ∆v i,j can be determined as long as a homogeneous reference set of RVs can be found for a fraction of fibers in that SEU. The assumption is quite reasonable given the fact that the wave-length calibration of a multi-fiber spectrograph is done by fitting a 2D grating equation. And, as we will see in Section 4.2, the Gaia DR2 RVs (Katz et al. 2019) meet our needs for the reference set.
As a constrast, both the LAMOST pipeline and Wang et al. (2019) calculate ∆v j assuming the RVZPs for a specific spectrograph do not vary with time, and, therefore, get around the temporal variation of RVZPs. However, we noticed that there exist some weird absolute RVs (as we will see in the results in Section 5). Thanks to the ESA Gaia mission (Gaia Collaboration et al. 2016), providing us the largest RV dataset that matches the LAMOST MRS survey in velocity precision and magnitude limit. The spectral resolution of Gaia-RVS (R ∼ 11 500, Cropper et al. 2018) is slightly higher than the MRS (R ∼ 7 500). The magnitude limit of the Gaia DR2 RV catalog (Katz et al. 2019) is at G RVS = 12 mag or G ∼ 14 mag depending on spectral type and line-of-sight interstellar extinction, which is slightly brighter than the LAMOST MRS magnitude limit. The Gaia DR2 contains qualified median radial velocities for 7 224 631 stars derived from the Gaia-RVS spectra, with T eff in the range [3550, 6900] K excluding large RV variant stars (see Katz et al. 2019, for details). At the faint end, G RVS = 11.75 mag, the precisions for T eff = 5000 and 6500 K are 1.4 and 3.7 km s −1 , respectively.

The Gaia DR2 RVs as the Reference Set
Aiming at studying time-domain astrophysical phenomena, e.g, spectroscopic binaries, we proceed to carry out the second-best scheme -∆v i,j . Crossmatching the LAMOST MRS DR7 catalog with the Gaia DR2, 1 582 948 out of 3 753 659 single-exposure spectra (42.1%) have Gaia RVs, and the common objects usually have good SNR in MRS because they are relatively bright in LAMOST MRS. The number of objects in Gaia DR2 is ∼ 1 000 times larger than that in catalogs of RV standard stars such as Huang et al. (2018). The challenge arises because not all of the 7 million objects in the Gaia-RVS catalog are RV invariant, i.e., quite a number of them are pulsating stars or binary/multiple systems that have periodic/non-periodic RV variations. In the below, we demonstrate a robust method that can determine the RVZP self-consistently for each spectrograph in each exposure (∆v i,j ) by com-paring the observed RVs to the Gaia DR2 RVs without identifying RV standard stars.

Self-consistent RVZPs
Assuming that the RV variables are varying with random periods at random phases or non-periodically, and are not the majority of the observed stars, we can regard them as outliers and use a robust method, e.g, the Least Absolute Residual (LAR) regression (or Least Absolute Deviation regression, see Press et al. 2002), to estimate the ∆v i,j (the common RV bias shared by the objects in an SEU). From a Bayesian perspective, the LAR regression originates from an exponential likelihood while Least Squares (LSQ) regression comes from a Gaussian likelihood. Utilizing the LAR technique, extreme values have a lesser influence on the fit compared to the LSQ regression. Besides, since we aim at time-domain analysis, as long as our RVZPs are temporally self-consistent, the absolute scales are not very important.

Tricks to Accelerate the Algorithm
Several tricks are used to accelerate the algorithm. The first trick is to get a good initial estimation of ∆v. A good approximation can be made by igoring the second term in Eq. (4), so that with Gaia DR2 RVs we can roughly estimate ∆v i,j by minimizing (5) We note that if an SEU has only a few objects common with the Gaia catalog, the estimation is risky. Therefore, we require that an SEU at least have 10 objects common with Gaia DR2 to proceed otherwise we calculate the initial guess ∆v i,j,init but exclude this SEU in the iteration process.
The second trick is to cut down N SEU by separating physically detached SEUs, which fastens the index evaluation in each iteration. In the Eq. (3) and (4), the second terms contain cross terms, meaning that when solving the N SEU elements, an iteration process is needed to guarantee that the solution of ∆v is stable. However, the evaluation of indices is computationally expensive when the N SEU grows. Therefore, before performing the optimization process, physically detached sky areas can be separated, so that the many index evaluation processes can be accelerated by cutting down the array sizes. We group the pointings of LAMOST MRS DR7 using a friends-of-friends algorithm with a 5-degree linking length which is double of the radius of the field-ofview of LAMOST in case of any possible common stars between them. Eventually, 137 groups are obtained, as shown in Figure 6. Initial guesses of ∆v for each SEU is made by optimizing the cost function Eq. (3) for each group of plates.
In practice, we find that if the is too small, there is the possibility that the ∆v jumps back and forth between two solutions and does not converge. Further analysis shows that this is an optimization method related problem (we used the Nelder-Mead solver, changing it to the Powell solver does not solve the problem but the two solutions are different). We guess that this might due to the numerical problem of the optimization routine. To avoid such a situation, we then added random processes into the algorithm, i.e., if in lth iteration the solution is ∆v l and after looping over all related SEUs the solution is ∆v l,opt , we evaluate the l + 1th solution by ∆v l+1 = η(∆v l,opt − ∆v l ) + ∆v l , where the η is a learning rate randomly generated between η 0 and η 1 . We set η 0 = 0.5, η 1 = 1.0, and = 0.075, considering the expectation of η is 0.75, the effective tolerance of our solution of ∆v is 0.10 km s −1 , which is acceptable when compared to the typical precision of measured radial velocities (e.g., ∼ 1.5 km s −1 reported by Wang et al. 2019).
Finally, the RVZP corrections for all the 137 groups of pointings converge after several tens of iterations with a Dell Precision R740 workstation with 2 Intel Xeon Platinum 8260 CPUs (2.40GHz), among which the longest solution takes ∼ 10 hours. Compared to the computational cost of RV measurements using CCF for 3.8 million spectra including blue and red arms (∼ 1 week on the same machine), computing the RVZPs is quite fast.

Uncertainty Estimation
Rigorous uncertainties are very difficult to obtain for our RVZP corrections. The uncertainties of the RVZP correction values consists of two parts, namely the tolerance in the iteration process and the formal error. The tolerance is = 0.1 km s −1 as mentioned above. For the latter part, based on the discussion presented in Appendix B, we use the 16th and 84th percentiles to construct a fiducial error of our RVZP correction values ∆v i,j divided by an empirical correction ξ to construct the formal error. Hence, the total uncertainties of the RVZP correction values are evaluated via where i, j index the SEUs, q16 and q84 denote the 16th and 84th percentiles of the residuals of the Gaia DR2 RVs and the RVZP-corrected LAMOST MRS RVs, the N i,j is the number of Gaia DR2 objects with RVs, and ξ is the empirical correction factor for small number statistics.

RESULTS AND VALIDATION
In total, we have measured RVs from 3 181 157/3 723 934 single-exposure blue/red arm spectra in LAMOST MRS DR7 with SNR higher than 5. For 36 301/37 122/37 122 (B/R/Rm) out of 37 624 SEUs, we have successfully derived initial values of the RVZPs. After eliminating the bad SEUs with criteria described at the end of Section 4.3, we estimate the final RVZPs

The Temporal Variation of RVZPs
In Figure 7, we present the ∆v i,j for each SEU for Sc and ThAr arc lamps versus date. The Sc lamp was in use until Oct. 2018, after when it is totally replaced by ThAr lamp. The mean uncertainties of ∆v B , ∆v R and ∆v Rm are all ∼ 0.38 km s −1 which are quite good.
The median uncertainty is even 20% less. The ∆v B and ∆v R have different patterns while the ∆v R and ∆v Rm are very similar. In Figure 8 we show the distribution of the ∆v B , ∆v R and ∆v Rm of the SEUs solved. The µ here is estimated using the median, and σ estimated using (q84 − q16)/2. The µs are 0.49 and 6.47 km s −1 for ThAr and Sc lamp in the blue arm, while in the red arm they are 0.26 and 4.91 km s −1 , respectively. The σs are 1.07 and 1.06 km s −1 for Thar and Sc lamp in blue arm, and in red arm are 0.85 and 0.68 km s −1 . Generally, the different systematics for Sc and ThAr lampcalibrated data, which is consistent with (Wang et al. 2019). Despite the large systematics, we find that the precision of the Sc lamp calibrated data is no worse than those calibrated by ThAr lamp. The Rm results are very similar to that of R, except their µs have a 0.3 km s −1 difference. We note that this is reasonable considering that the systematics vary with wavelength as shown in Ren et al. (2020). In addition, ∼ 4 000 single-exposure spectra calibrated using Ne lamp are also found in DR7 v1.1. We confirm that the Ne lamp is used to calibrate the LRS spectra and these mistakenly calibrated data will be removed in DR7 v2 (the internationally public version), so we exclude these data in the following analysis. It is clear that the RVZPs are reasonably stable except after ∼ 1 May 2019, which seems to be correlated with the arc lamp exposure flux 4 . We plot the ∆v for each spectrograph for the time interval with available arc lamp intensity in Figure 9 and 10. Since our mean uncertainty of ∆v i,j is 0.38 km s −1 , we regard the ∆v i,j larger than 1 km s −1 as significant values, including the 7, 12, 13, 15th spectrographs of the blue arm and 7, 9, 15th spectrographs of the red arm. It is currently not sure whether these shifts are due to that the new ThAr lamps are brought into use at around 1 May 2019 or some other issues induced in the maintenance. Probably in DR8, with the RVZP data in a longer time baseline we can address the problem.

Validating with Other Datasets
We also validate our RVs with the stars common in other datesets, namely, the Gaia DR2 (Katz et al. 2019), APOGEE DR16 (Column VHELIO AVG, Jönsson et al. 2020), RV standard stars from Huang et al. (2018), GALAH DR3 (Column rv galah, Buder et al. 2020), and RAVE DR6 (Column hrv sparv, Steinmetz et al. 2020b). The mean µ and scatter σ derived from Gaussian fitting to the residuals are tabulated in Table 2 and shown as functions of SNR in Figure 11. Also shown are the results of the LAMOST pipeline and Wang et al. (2019). Note that, in DR7 v1.1, the LAMOST pipeline only provides two RVs measured from the blue arm using ELODIE empirical templates (Moultaka et al. 2004) and ATLAS9 synthetic templates (Castelli & Kurucz 2003), respectively. In the next version of DR7 (v1.2) and future data releases, the RVs using ELODIE templates will be removed, and the RVs based on ATLAS9 will be provided for both blue and red arms for better performance. Therefore, we use their calibrated RVs of blue arms based on ATLAS9 templates in our comparison (Column rv ku1). The Wang et al. (2019) catalog is a subset of ours (including data taken during the first one year and a half with SNR cut at 10). The spectra without our measurements (i.e., either SNR < 5 or ∆v is invalid) are excluded in this comparison, so that it is fair to the other two RV sources. At high SNR end (50 to 100), we found the standard deviations derived from Gaussian fitting for our results can reach 1.00/1.10, 0.84/0.80, 0.69/0.74, 0.72/0.78, and 1.77/1.88 km s −1 with respect to the Gaia DR2, APOGEE DR16, Huang et al. (2018), GALAH DR3, and RAVE DR6 in blue/red arm. The common stars among LAMOST MRS DR7, Gaia DR2 and the other four reference sets are used to calculate the fiducial accuracy of Gaia DR2. At the high-SNR end (50 < SNR < 100), the LAM-OST MRS gets close to the performance of Gaia DR2 (see Figure 11). Note that, in our algorithm, the Gaia DR2 RVs are used as a reference set, so the comparison with Gaia DR2 is not an independent validation. These results of the comparisons are quite fascinating. At high-SNR end, we outperform Wang et al. (2019) and LAMOST pipeline by ∼ 20% and 30%, respectively, according to the blue arm results compared to APOGEE DR16. At low-SNR end (10 to 20), this advantage increases to 58% and 47%, indicating that our algorithm of RV measurements and RVZP determinations are quite efficient. Since the Huang et al. (2018) sample is from APOGEE DR14, the comparison to Huang et al. (2018) has a 0.4 km s −1 systematic bias which does not exist in our comparison to APOGEE DR16. This may be due to the update of the APOGEE data release. The GALAH DR3 has a 0.23 km s −1 systematic bias compared to Gaia DR2. The comparison with RAVE DR6, whose spectral resolution is the same as the LAMOST MRS but lower than Gaia-RVS, APOGEE and GALAH, shows a large scatter but is still reasonable.

The Self-consistency
Besides the precision, a check on the self-consistency is necessary before using RVs in time-domain research. As a demonstration, in Figure 12 and 13 we validate the temporal RV variations of the standard stars (whose RVs are assumed invariant) from Huang et al. (2018) in two pointings, namely, planid=HIP8426401 and HIP4312101, which have 10 and 17 exposures and contain 27 and 41 RV standard stars, respectively. It is obvious that the RVs from the LAMOST pipeline show large fluctuations, which is as expected from the comparison in Section 5.2. It turns out that for many stars, the RVs of multiple exposures from Wang et al. (2019) catalog have exactly the same values. This is due to the failure of their Gaussian fitting process and then they fall back on best estimation from the 1 km s −1 RV grid, which is a defect in the algorithm. By comparing the measured RVs and the RVZP-corrected RVs, we find that the RVZP-corrected RVs are much cleaner, indicating that the RVs do benefit from our algorithm for ∆v i,j determination. The LAMOST pipeline and Wang et al. (2019) assume that the RVZP for a spectrograph is static, so that the RVZP corrections are basically a shift of the RVs.
To quantify the performance in self-consistency, we select RV standard stars from Huang et al. (2018) with at least 8 exposures have valid RVZP-corrected RVs in our catalog and calculated their standard deviation empirically corrected for the small-number-statistic effect which is discussed in Appendix B. The cumulative distribution function (CDF) of the standard deviations is shown in Figure 14. The RVs measured from blue arm (B) shows the best consistency while those from red arm (R) and Hα-masked red arm (Rm) follow. By calculating the 50th, 90th and 95th percentiles of the CDFs for the blue arm results, we find our RVs have significant advantages over the LAMOST pipeline, namely 16.5%, 35.5% and 37.5% better, while Wang et al. (2019) is at nearly the same level as LAMOST pipeline. This reveals the excellent self-consistency of our RVs in the time-domain analysis. Note that, these estimations of advantages are very conservative due to the facts such as Wang et al. (2019) dataset cut SNR at 10 but LAMOST   Note-Each column shows the RV standard deviation for the 678 standard stars at corresponding levels of the CDF. In the parentheses we show the advantages over the LAMOST blue arm results. and our sample cut at 5. Besides, we do not exclude any of the "exactly the same" RVs from Wang et al. (2019) as readers may find that the CDFs for LAMOST and Wang et al. (2019) jump at around 0 km s −1 position. With the self-consistency performance, we select 10 320 candidate RV standard stars by requiring that in blue arm (B) and red arm (R) These stars can be useful for RV calibration of lowresolution surveys such the LAMOST LRS survey (R ∼ 1 800).

DATA PRODUCTS
The data products of this work include 1. a catalog of ∼3.8 million measured RVs (but 5 million rows for completeness), associated errors and the information of observations for ∼ 0.8 million stars (Table 4), 2. a catalog of the ∆v i,j for B, R, and Rm measurements in each SEU, and their uncertainties corresponding to ∼3.6 million RVs (Table 5).
3. a catalog of 10 320 candidate RV standard stars with more than 8 exposures and standard deviation less than 1.45/1.86 km s −1 in blue/red arm over a time baseline longer than 180 days ( Table  6).
The catalogs can be cross-matched using the columns spid and lmjm. They will be available with the journal and also at https://github.com/hypergravity/paperdata once the paper is accepted. A few tips: the users who want to correct Doppler effects of their spectra (e.g., Zhang et al. 2020a) should use RVs without RVZP corrections, while the users who want to use absolute RVs can obtain them from our catalog via Eq (2). The uncertainties of the absolute RVs can be evaluated via σ 2 abs = σ 2 v,obs + σ 2 min + σ 2 ∆v + σ 2 mod , where σ v,obs is the measurement error, σ min is the wavelength calibration error floor which we can infer from the comparison to APOGEE DR16 that it is approximately 0.85 km s −1 or conservatively 1 km s −1 , σ ∆v is the uncertainty of the RVZP, and σ mod is the contribution from the sparsity of the spectral templates (0.10 km s −1 for blue arm and 0.20 km s −1 for red arm).

SUMMARY
In this paper, we measure the RVs from LAMOST Medium-Resolution Survey (MRS) DR7 stellar spectra and determine the RVZPs with the help of Gaia DR2 RVs, aiming at making the absolute RVs self-consistent and proper for time-domain analysis. More specifically, 1. we have measured RVs of ∼ 3.8 million singleexposure spectra for more than 0.8 million stars obtained from the LAMOST MRS DR7, including the blue arm, red arm (with and without Hα), 2. we determine the RVZPs exposure by exposure (for 3.6 million spectra) by comparing the measured RVs to the Gaia DR2 and multiple MRS exposures using a robust method to a mean precision of 0.38 km s −1 , 3. we find the RVZP vary significantly for some spectrographs before/after 1 May 2019, which confirmed the necessity of our algorithm to determine the RVZPs, 4. we find good agreements in the comparisons of our absolute RVs with APOGEE DR16, RV standard stars (Huang et al. 2018), GALAH DR3, and RAVE DR6, and the precision at 50 < SNR < 100 can reach 1.00/1.10, 0.84/0.80, 0.69/0.74, 0.72/0.78, and 1.77/1.88 km s −1 in the blue/red arm, respectively, 5. we show that our absolute RVs have 16.5, 35.5, and 37.5% better self-consistency at 50%, 90%, and 95% levels of the CDF of standard deviations, respectively, which benefits subsequent time-domain analysis.
6. we select a set of (Replaced: 9 708 replaced with: 10 320) candidate RV standard stars whose standard deviations of RVs are less than 1.45 and 1.86 km s −1 in the blue arm and red arm, respectively, over a time baseline of at least 180 days.
The algorithms and results presented in this work will be used in the subsequent works on spectroscopic binaries (Xiong et al. in prep., Zhang et al. in prep.). The LAMOST MRS DR7 v1.2 and v1.3 have been released. We confirm that, in DR7 v1.2/1.3 the spectra are the same as v1.1 but catalogs and parameters Note-Column 1 is the observational ID in LAMOST, Column 2 is the local Modified Julian Minite (1440 × the local Modified Julian Date) of the beginning of exposure which is an 8-bit integer assigned to each exposure, Column 3 is the barycentric Julian Day of the middle of exposure, Column 4 is the plan ID, Column 5 is the spectrograph ID, Column 6 is the fiber ID, Column 7-8 are equatorial coordinates, Column 9-10 are the SNRs in blue and red arms, Column 11-12 are the calibration arc lamp. Column 13-16 are the measured RV, associated measurement error, the effective temperature of the best-match template, the maximum of the CCF, respectively, for the blue arm. Column 17-20 and 21-24 are for the red arm and Hα-masked red arm. We refer readers to http://dr7.lamost.org/v1.1/doc/mr-data-production-description for detailed explanations of the Column 1-6.
will have some changes 56 . Therefore, our results can be cross-matched with the v1.2/1.3 catalogs directly. And we will release a new version of RVs on github once DR8 is released. On the other hand, since Gaia eDR3 is the same as in DR2 but with moderate filtering, our absolute RVs should be consistent with Gaia eDR3. In future LAMOST MRS data releases, we will update our RVs using the most recent Gaia RVs as reference set. Note-Column 1 is the plan ID, Column 2 is the local Modified Julian Minute, Column 3 is the spectrograph ID. Column 4-12 are the initial RVZP correction value (∆vi,j,init), number of objects, final RVZP correction value (∆vi,j ), number of objects for the first term of the Eq. (4), number of objects for the second term of the Eq. (4), the median number of exposures, the maximum number of exposures, the minimum number of exposures, the estimated uncertainty, respectively, for the blue arm. Column 13-21 and 22-30 are for the red arm and Hα-masked red arm, respectively. In this section, we explain our definitions of mean, variance, covariance, and CCF. Let X denote a vector containing N elements {X i } (e.g., a continuumnormalized spectrum with N pixels), the mean is defined as and the variance as and the covariance of two vectors X and Y as A normalize cross-correlation function can be calculated with standardized f and g, namely where F is one vector and G(v) is the other vector but shifted by radial velocity v. When utilizing this CCF to estimate stellar RVs, the G(v) is usually a spectral template whose signal-to-noise ratio is infinite and covers the wavelength range of F . Therefore, the shift could be implemented with an interpolation. The CCF in this form is essentially the linear correlation coefficient and varies between −1 and 1.

B. BIAS IN SMALL NUMBER STATISTICS
The estimators that characterize dispersion are often underestimated when the number of samples is small. For example, if we only have 3 or 5 measurements of a physical quantity, the standard deviation could be underestimated. In this section, we propose an empirical correction of this bias for the error of mean and standard deviation assuming Gaussian distribution P (x|µ, σ) where µ is its position and σ is its standard error.

B.1. The Deviation of Mean
We can minimize an L 1 -norm or L 2 -norm cost function, namely, i |x i −μ| and i (x i −μ) 2 /2, respectively, to get an estimate of the meanμ. The true deviation is by definition δ true =|μ − µ | .
However, in practice we do not know µ when we tackle such a problem. A fiducial deviation associated withμ can be constructed using the 84 and 16 percentiles (or interquantiles, see Lupton 1993;Ivezić et al. 2014), i.e., where N is the sample size. To obtain an empirical relation between δ est and δ true , we assume the following form, where the ξ(N ) is the empirical correction factor and is a function of N . Then, we draw mock data from a standard Gaussian distribution with the numpy.random module. In each experiment, we draw N samples and calculated δ est and δ true , and derive ξ. We repeat this experiment for 3000 times for each N ranges from 2 to 250 which is enough for our purpose, and show the 16, 50 and 84th percentiles of the results for each N in the left panel of Figure 15. It is obvious that this fiducial deviation is underestimation of δ true . The relation between medians of log ξ and log N is fitted with a 5th order polynomial function, whose coefficients are tabulated in Table B.1. With this relation, we can scale the fiducial deviation to a standard that is less affected by the sample size N .

B.2. The Standard Error
For a Gaussian distribution P (x|µ, σ), we can use sample-based standard deviation and the 16 and 84th percentiles to estimate the true standard error σ, i.e., and respectively. We define ζ by The results clearly show that both method underestimate the standard error. Similar to the procedures in the previous test, we fit the relation with a 5th order polynomial to the medians of ζ and log 10 N , the best-fit polynomials are shown in the right panel of Figure 15 and coefficients are tabulated in Table B.2. Compared to Eq. (B8), Eq. (B9) is more robust to outliers but suffers from more significant underestimation when N is small.   spectral convolution, continuum normalization, removal of cosmic rays, cross-correlation function and the empirical correction evaluation (Appendix B).
2. regli (Zhang 2020b): REgular Grid Linear Interpolator, a multi-dimensional linear interpolator based on gridded data. It is faster than the scipy.interpolate.LinearNDInterpolator in the python standard library in our performance test.
The source code and some tutorials of these packages can be found at https://github.com/hypergravity. Readers who are interested in LAMOST MRS spectra might find them useful for their research.