A Catalog of RV Variable Star Candidates from LAMOST

Radial velocity (RV) variable stars, such as binaries and pulsating stars, are important in astrophysics. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) spectroscopic survey has provided ∼6.5 million stellar spectra in its Data Release 4 (DR4). During the survey, ∼4.7 million unique sources were targeted and ∼1 million stars were observed repeatedly. The probability that a star is an RV variable is estimated by comparing its observed RV variation with simulated ones. By analyzing the multi-epoch sources in LAMOST DR4, we build a catalog of 80,702 RV variable candidates with probabilities greater than 0.60. Simulations and cross-identifications show that the purity of the catalog is higher than 80%. The catalog consists of 77% binary systems and 7% pulsating stars, with 16% contamination by single stars. In total, 3,138 RV variables are classified through cross-identification with published catalogs in the literature. Using these 3,138 sources common to LAMOST and a collection of published RV variable catalogs, we analyze LAMOST's RV variable detection rate. The efficiency of the method adopted in this work depends not only on the sampling frequency of the observations but also on the periods and amplitudes of the RV variables. With the progress of LAMOST, Gaia and other surveys, more and more RV variables will be confirmed and classified. This catalog is valuable for other large-scale surveys, especially for RV variable searches. The catalog will be released according to the LAMOST Data Policy via http://dr4.lamost.org.


INTRODUCTION
Binary stars play a crucial role in astrophysics. The statistics and identification of binary systems are significant for several reasons, chiefly because basic issues such as star formation and evolution, the initial mass function (IMF) and Galactic chemical evolution are all influenced by the binary properties of the stellar population. Despite the high fraction of binary stars (∼50% for main-sequence stars), our understanding of the physics of binary stars is still at a basic stage. Raghavan et al. (2010) present the multiplicity of 454 solar-type stars within 25 pc at high completeness. They show that early-type and metal-poor stars have higher binary fractions than late-type and metal-rich stars. The period distribution of their sample follows a log-normal distribution with a median of about 300 years. Meanwhile, early- and late-type stars do not stem from the same parent period distribution (Kroupa & Petr-Gotzens 2011). This discrepancy has yet to be explained and could be related to the mechanism of binary formation.
Duchêne & Kraus (2013) summarize the empirical knowledge of stellar multiplicity for embedded protostars, pre-main-sequence stars, main-sequence stars and brown dwarfs. They demonstrate that the multiplicity rate and the breadth of the orbital period distribution are steep functions of primary mass and environment. Binary fractions have also been analyzed using large samples of survey data (e.g. Duquennoy & Mayor 1991; Gao et al. 2014, 2017; Yuan et al. 2015a; Badenes et al. 2018; Tian et al. 2018, hereafter Paper I). These works investigate binary fractions as functions of stellar parameters, i.e. mass, T_eff and abundance, and all indicate that metal-poor stars have a higher binary fraction than metal-rich stars. However, metal-rich disk stars are found to be 30% more likely than metal-poor halo stars to have companions with periods shorter than 12 days (Hettinger et al. 2015). The binary fraction is thus related not only to stellar parameters but also to orbital period (Maxted et al. 2001; Moe & Di Stefano 2017).
Besides estimating binary fractions in large samples, efforts have been made to identify individual binary systems. The American Association of Variable Star Observers (AAVSO) maintains the International Variable Star Index (VSX; Watson et al. 2006). A database of thousands of eclipsing binaries has been established (Matijevič et al. 2012, and references therein) using Kepler light curves (Koch et al. 2010). Drake et al. (2014) present ∼47,000 periodic variables found during the analysis of 5.4 million variable star candidates covered by the Catalina Surveys Data Release-1 (CSDR1, Drake et al. 2012), and investigate the rate of confusion between objects classified as contact binaries and type c RR Lyrae (RRc's) based on periods, amplitudes, radial velocities and stellar parameters. The General Catalog of Variable Stars (GCVS), which includes binary stars, has been released in its latest version (GCVS Version 5.1, Samus' et al. 2017). The Binary star DataBase (BDB) collects data on the physical and positional parameters of 240,000 components of 110,000 multiple-star systems (Kovaleva et al. 2015). Price-Whelan et al. (2018) make use of multi-epoch data obtained with APOGEE (Majewski et al. 2017; Abolfathi et al. 2018) and select ∼5000 evolved stars with probable companions. To build a sample of distant halo wide binaries, Coronado et al. (2018) search for stellar pairs with small differences in proper motion and small projected separations on the sky, and validate the sample using RVs from medium- and low-resolution spectra obtained with SDSS (York et al. 2000). Binaries and triples can also be identified from high-dispersion spectra that are much better fit by a superposition of two or three model spectra, drawn from the same isochrone, than by any single-star model: El-Badry et al. (2018) apply such a data-driven spectral model to APOGEE DR13 spectra of main-sequence stars and identify unresolved multiple-star systems.
Gaia Data Release 2 (Gaia DR2, Gaia Collaboration et al. 2018) has enabled catalogs of variable stars (Clementini et al. 2019; Mowlavi et al. 2018; Roelens et al. 2018; Rimoldini et al. 2019).
However, binary identification based on RVs from low-resolution spectroscopic surveys remains largely unexplored. Fortunately, the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has provided millions of stellar spectra, and about 20% of its targets have been observed repeatedly. These spectra, together with the derived stellar parameters and RVs, can enhance time-domain studies of stars and help select and confirm variable candidates. Here we build a catalog of RV variable candidates detected with LAMOST. In Section 2, we describe the data used in this work. The method is presented in Section 3. The results are shown in Section 4, followed by discussions and conclusions in Section 5.

DATA
The LAMOST spectroscopic survey provides the largest database of low-resolution (R∼2000) spectra for measuring the stellar atmospheric parameters and radial velocities of millions of stars (Cui et al. 2012; Zhao et al. 2012; Deng et al. 2012; Liu et al. 2014; Yuan et al. 2015b). The survey has obtained ∼6.5 million spectra of ∼4.7 million unique stars in its DR4, of which ∼1 million stars have been observed at 2 to more than 40 epochs. As presented in Fig. 1, the number of stars decreases nearly exponentially with the number of epochs. By comparing multi-epoch observations, especially the RVs, variable stars can be detected. We adopt the RVs and stellar parameters produced by the LAMOST Stellar Parameter Pipeline at Peking University, LSP3¹ (Xiang et al. 2015, 2017), to select variable stars. The pipeline estimates RVs by cross-correlating each spectrum with an ELODIE template (Prugniel & Soubiran 2001; Prugniel et al. 2007) of similar atmospheric parameters. For the determination of stellar atmospheric parameters (e.g. T_eff, log g and [Fe/H]), templates from the MILES library (Sánchez-Blázquez et al. 2006; Falcón-Barroso et al. 2011), which were obtained with a spectral resolving power similar to that of LAMOST spectra and are accurately flux calibrated, are used instead. As discussed in Xiang et al. (2015), the low-resolution MILES spectra are wavelength calibrated to an accuracy of only approximately 10 km s⁻¹, which is not good enough for RV determination of LAMOST spectra, whereas the high-resolution ELODIE spectra are much better suited as RV templates. Furthermore, LSP3 estimates RVs before the atmospheric parameters, which avoids systematic RV uncertainties caused by adopting different spectral libraries within the pipeline. In the latest version of LSP3, 267 new template spectra obtained with the NAOC 2.16-m telescope and the Yunnan Astronomical Observatory (YAO) 2.4-m telescope (Wang et al. 2018) have been added to the MILES library for parameter estimation (Xiang et al. 2017).
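To make the cross-correlation step concrete, here is a generic sketch of RV estimation by cross-correlation. It is not LSP3's code. It assumes both spectra are continuum-normalized and resampled onto a common grid that is uniform in ln λ (step `dlnlam`), so a Doppler shift becomes a pure translation. The correlation peak is refined with a parabola. The function name `ccf_rv` and the synthetic Gaussian line in the example are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def ccf_rv(flux, template, dlnlam):
    """Sketch: RV [km/s] of `flux` relative to `template` by cross-correlation.

    Assumes both arrays are continuum-normalized and sampled on the same grid
    that is uniform in ln(wavelength) with step `dlnlam`.
    """
    f = np.asarray(flux, float) - np.mean(flux)
    t = np.asarray(template, float) - np.mean(template)
    ccf = np.correlate(f, t, mode="full")
    lags = np.arange(-(len(t) - 1), len(f))  # pixel lag of each CCF element
    k = int(np.argmax(ccf))
    shift = float(lags[k])
    if 0 < k < len(ccf) - 1:
        # Parabola through the peak and its two neighbours gives a sub-pixel lag.
        y0, y1, y2 = ccf[k - 1], ccf[k], ccf[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            shift += 0.5 * (y0 - y2) / denom
    # A lag of one pixel corresponds to a velocity of c * dlnlam.
    return C_KMS * shift * dlnlam


# Example: a synthetic Gaussian absorption line shifted by +50 km/s.
lnlam = np.arange(np.log(4000.0), np.log(5000.0), 1e-4)


def gaussian_line(center):
    return 1.0 - 0.5 * np.exp(-0.5 * ((lnlam - center) / 3e-4) ** 2)


template = gaussian_line(np.log(4861.3))
observed = gaussian_line(np.log(4861.3) + np.log1p(50.0 / C_KMS))
print(ccf_rv(observed, template, 1e-4))
```

A real pipeline would also mask bad pixels and choose the template by atmospheric parameters. The log-λ grid is the key design choice, because it makes the velocity-to-lag conversion independent of wavelength.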
LSP3 ignores the effects of binarity when estimating RVs and other stellar parameters. Most stars have RV errors of a few km s⁻¹. However, some stars, mostly hot stars with low signal-to-noise ratios (SNRs), have errors as large as 20 km s⁻¹ (Xiang et al. 2015, 2017). To identify binary systems or candidates reliably, we only use spectra with SNR greater than 10.

RVs and their uncertainties
As discussed in Xiang et al. (2015, 2017), σ_RV is quite sensitive to SNR and also depends on other stellar parameters. LSP3 estimates σ_RV by comparing RVs from multi-epoch observations of similar SNRs and spectral types. It assumes that σ_RV consists of a systematic error and a random error following a Gaussian distribution. Because it treats all stars as single and ignores the influence of binarity on RV, it attributes RV variations to measurement uncertainty and therefore over-estimates σ_RV. σ_RV was reappraised in Paper I when estimating the binary fraction (f_B) of dwarfs with SNR > 50, taking into account the degeneracy between f_B and σ_RV. A comparison of the RV uncertainties from LSP3 (σ_RV−LSP3) and from Paper I (σ_RV−I) is presented in Fig. 2. It shows that LSP3 over-estimates the RV uncertainties: the median σ_RV of dwarfs with SNR > 50 is around 2.9 km s⁻¹ in Paper I, while the LSP3 value is ∼1.5 times higher at 4.3 km s⁻¹. The precision of RVs at high SNR is adequate to detect short-period binaries. Figure 3 presents the distribution of mean σ_RV in the Hess diagram, which shows that σ_RV is higher for hot stars than for cooler stars. The distribution of the average number of epochs in the Hess diagram is shown in Fig. 4. The multi-epoch observations are uniformly distributed, which indicates that σ_RV is not biased by selection effects in the number of epochs.

¹ http://dr4.lamost.org/v2/doc/vac

Reliability of the data
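In this work, we adopt the RVs and σ_RV from LSP3 for binary identification. For a single star observed multiple times under the same conditions, the RVs follow a Gaussian distribution with mean $\overline{RV}$ and variance σ_RV². However, a star observed repeatedly under different conditions yields a sample of RVs, RV_1, RV_2, ..., RV_n. Each value is drawn from a Gaussian distribution with the same mean $\overline{RV}$ but a different standard deviation σ_RV,i. Weighting each measurement by the inverse of σ_RV,i², the weighted mean radial velocity is

$$\overline{RV} = \frac{\sum_{i=1}^{n} RV_i/\sigma_{RV,i}^2}{\sum_{i=1}^{n} 1/\sigma_{RV,i}^2}, \tag{1}$$

the error of $\overline{RV}$ is

$$\hat{\sigma}_{\overline{RV}} = \left(\sum_{i=1}^{n} \frac{1}{\sigma_{RV,i}^2}\right)^{-1/2}, \tag{2}$$

and the weighted error is

$$\sigma_{RV} = \left(\frac{n}{\sum_{i=1}^{n} 1/\sigma_{RV,i}^2}\right)^{1/2}. \tag{3}$$

The normalized variance of the RVs is

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \frac{(RV_i - \overline{RV})^2}{\sigma_{RV,i}^2}. \tag{4}$$

Figure 5. The distribution of the multi-epoch sources against S² and the number of epochs.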
The distribution of S² for the sources with multiple epochs is shown in Fig. 5. S² converges to 1 given enough epochs, which supports the validity of the RV errors. Although the RVs of a binary or other RV variable star do not follow a normal distribution, we can still define its mean RV and σ_RV through Eqs. (1) and (3).
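The per-star quantities can be computed as below. This is a minimal sketch. It assumes the inverse-variance forms written out in the docstring: weighted mean, error of the mean, a typical per-epoch error σ_RV = sqrt(n / Σ σ_RV,i⁻²), and the normalized variance S². The function name `rv_summary` is hypothetical. The simulated constant star illustrates why S² should scatter around 1 when the quoted errors are correct.

```python
import numpy as np


def rv_summary(rv, err):
    """Return (mean RV, error of the mean, sigma_RV, S^2) for one star.

    Assumed forms, with weights w_i = 1 / sigma_i^2:
      mean     = sum(w_i RV_i) / sum(w_i)
      err_mean = 1 / sqrt(sum(w_i))
      sigma_RV = sqrt(n / sum(w_i))            (typical per-epoch error)
      S^2      = sum(w_i (RV_i - mean)^2) / (n - 1)
    """
    rv = np.asarray(rv, dtype=float)
    w = 1.0 / np.asarray(err, dtype=float) ** 2
    n = rv.size
    if n < 2:
        raise ValueError("need at least two epochs")
    mean = np.sum(w * rv) / np.sum(w)
    err_mean = 1.0 / np.sqrt(np.sum(w))
    sigma_rv = np.sqrt(n / np.sum(w))
    s2 = np.sum(w * (rv - mean) ** 2) / (n - 1)
    return mean, err_mean, sigma_rv, s2


# Constant stars with correct errors: S^2 averages to ~1 over many stars.
rng = np.random.default_rng(42)
errs = rng.uniform(2.0, 8.0, size=(20000, 5))        # per-epoch sigma [km/s]
rvs = 15.0 + rng.normal(0.0, 1.0, errs.shape) * errs  # true RV = 15 km/s
s2_values = [rv_summary(r, e)[3] for r, e in zip(rvs, errs)]
print(np.mean(s2_values))
```

If the errors were systematically over-estimated, as the text notes for LSP3, the mean S² would fall below 1.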
To analyze the feasibility of detecting binaries through ∆RV_max, we perform a simulation. We construct a sample of 1 million binary stars and count the percentage of stars that are detectable given LAMOST's capability. For the binary systems M_B, we assume that:

(1) the RVs are contributed by their primary stars;
(2) their orbital orientations are isotropic in 3D space and their initial phases follow a uniform distribution;
(3) their primary masses follow the measured mass distribution of the LAMOST sample, which is determined by fitting the atmospheric parameters with the Yonsei-Yale (YY) isochrones (Demarque et al. 2004, and references therein);
(4) the mass ratio q follows a power-law distribution, f(q) ∝ q^{0.3±0.1} (e.g. Duchêne & Kraus 2013);
(5) the orbital periods follow a log-normal distribution with a mean of log P = 5.03 and a dispersion of σ_logP = 2.28, where P is in days (Raghavan et al. 2010).

The σ_RV adopted in the simulation follow those derived from the LAMOST DR4 data. As shown in Fig. 6, the amplitudes of the simulated binary stars depend strongly on the period distribution. The typical exposure time of each observation is about one hour and the time span of LAMOST DR4 is less than 5 years. Detection is therefore more efficient for binary systems with periods between 0.1 day and 5 years than for those with extremely short or long periods. We adopt 10 km s⁻¹ (∼3.0σ_RV for dwarfs with SNR higher than 50) as the RV amplitude threshold for recognizing RV variable stars in the simulation. The box in the figure marks the 12% of simulated binaries that are detectable with LAMOST under these thresholds. This demonstrates that a certain proportion of binary stars is detectable in the LAMOST observations.

Figure 6. The joint distribution of periods and amplitudes for the mock binaries. The box marks the detection limit based on LAMOST's capability.
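A stripped-down version of this Monte Carlo is sketched below. It departs from the setup in the text in several labeled ways:

- Orbits are circular, so the RV curve is a sinusoid of semi-amplitude K, and "amplitude" is taken to be K.
- The primary mass is fixed at a hypothetical 1 M_sun instead of following the LAMOST mass distribution.
- q is drawn from f(q) ∝ q^0.3 by inverse-transform sampling.
- Measurement errors and the real LAMOST sampling are ignored.

Its output therefore need not reproduce the 12% quoted above. The function names are hypothetical.

```python
import numpy as np

G = 6.674e-11       # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30    # solar mass [kg]
DAY = 86400.0       # [s]


def semi_amplitude(p_days, m1, m2, sin_i):
    """Primary's RV semi-amplitude K [km/s] for a circular orbit (masses in M_sun)."""
    p = np.asarray(p_days, float) * DAY
    m1_kg, m2_kg = np.asarray(m1) * M_SUN, np.asarray(m2) * M_SUN
    k = (2.0 * np.pi * G / p) ** (1.0 / 3.0) * m2_kg * sin_i / (m1_kg + m2_kg) ** (2.0 / 3.0)
    return k / 1e3


def mock_detectable_fraction(n=1_000_000, m1=1.0, k_min=10.0, seed=0):
    """Fraction of mock binaries with 0.1 d <= P <= 5 yr and K > k_min (simplified setup)."""
    rng = np.random.default_rng(seed)
    p = 10.0 ** rng.normal(5.03, 2.28, n)   # log-normal periods [days]
    q = rng.random(n) ** (1.0 / 1.3)        # f(q) ~ q^0.3 via inverse CDF q = u^(1/1.3)
    cos_i = rng.random(n)                   # isotropic orientations: cos i uniform
    sin_i = np.sqrt(1.0 - cos_i ** 2)
    k = semi_amplitude(p, m1, q * m1, sin_i)
    in_window = (p >= 0.1) & (p <= 5.0 * 365.25)
    return float(np.mean(in_window & (k > k_min)))


print(semi_amplitude(4332.6, 1.0, 9.543e-4, 1.0))  # Sun-Jupiter check, ~0.0125 km/s
print(mock_detectable_fraction(n=200_000))
```

The Sun-Jupiter line is a sanity check of the K formula (about 12.5 m/s). The detectable fraction is governed mostly by how much of the broad log-normal period distribution falls inside the 0.1 d to 5 yr window.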

Probability of belonging to a binary system
A binary system can be identified by comparing ∆RV_max with σ_RV, where ∆RV_max denotes the maximum RV difference between any two epochs of the same object (e.g. Maoz et al. 2012). To test the effectiveness of this method, we simulate three mock samples and count the percentages of detected stars at different thresholds. The three samples are:

(1) SSP, a sample of single stars;
(2) BSP, a sample of binary systems;
(3) CSP, a mixed sample of single stars and binaries with a binary fraction of 55%.

The assumptions for the simulated binaries are the same as those described in Section 3.1, and the time separations of the multi-epoch observations are taken from the LAMOST DR4 data. The binary fraction of 55% adopted in the CSP is the median value derived from LAMOST data (Yuan et al. 2015a; Gao et al. 2014; Tian et al. 2018). Each sample consists of 1 million stars or systems. Note that intrinsic variables, e.g. pulsating stars, are ignored in these simulations. Under these assumptions, the distributions of ∆RV_max for the SSP, BSP and CSP samples are constructed and presented in Fig. 7. The vertical dashed lines, from left to right, mark cutoffs of ∆RV_max/σ_RV equal to 1, 2 and 3, respectively. The BSP sample has fewer low ∆RV_max values and more high ∆RV_max values than the SSP sample. Low values are dominated by random errors, while high values are produced by orbital phase changes of binaries in the BSP (and the CSP). The detection rate (DR), false positive rate (FPR), true positive rate (TPR) and fraction of real RV variables (purity) of the identified binaries are presented against the ∆RV_max/σ_RV cutoff in Fig. 8. Raising the ∆RV_max/σ_RV threshold increases the purity of the catalog but reduces the DR at the same time.
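The statistics plotted against the cutoff can be sketched as follows. The definitions used are assumptions, since the text does not spell them out. FPR is the flagged fraction of single stars and TPR the flagged fraction of binaries. DR is the flagged fraction of a mixed sample with binary fraction f_B, and purity is the binary share of the flagged stars. The helper names are hypothetical, and the mock single stars here have only Gaussian RV noise with a fixed number of epochs.

```python
import numpy as np


def single_star_ratios(n_stars, n_epochs, sigma=1.0, seed=1):
    """dRV_max / sigma for mock single stars whose RVs differ only by Gaussian noise."""
    rng = np.random.default_rng(seed)
    rv = rng.normal(0.0, sigma, size=(n_stars, n_epochs))
    delta_rv_max = rv.max(axis=1) - rv.min(axis=1)  # largest difference between any two epochs
    return delta_rv_max / sigma


def rates(ratio_single, ratio_binary, cut, f_b=0.55):
    """Assumed definitions of FPR, TPR, DR and purity at one dRV_max/sigma cutoff."""
    fpr = np.mean(np.asarray(ratio_single) > cut)  # single stars flagged as variable
    tpr = np.mean(np.asarray(ratio_binary) > cut)  # binaries flagged as variable
    dr = f_b * tpr + (1.0 - f_b) * fpr             # flagged fraction of the mixed sample
    purity = f_b * tpr / dr if dr > 0 else float("nan")
    return {"DR": dr, "FPR": fpr, "TPR": tpr, "purity": purity}


for n in (2, 5, 20):
    print(n, np.mean(single_star_ratios(200_000, n) > 3.0))
```

With two epochs, the single-star false-positive fraction at 3σ is P(|Z| > 3/√2) ≈ 3.4%. It grows with the number of epochs because the range of n Gaussian draws widens, so the real LAMOST epoch distribution matters. Feeding the rounded rates quoted below (TPR = 0.11, FPR = 0.03, f_B = 0.55) into these definitions gives DR ≈ 0.074 and purity ≈ 0.82.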
Here the threshold ∆RV_max > 3.0σ_RV is adopted to identify RV variable stars. In the SSP, BSP and CSP samples, 3%, 11% and 8% of the stars, respectively, have ∆RV_max greater than 3.0σ_RV. The stars with ∆RV_max/σ_RV > 3.0 in the CSP consist of 20% single stars and 80% binary systems. This indicates that the RV variable stars detected with this method may be contaminated by single stars. Given an observed ∆RV_max, the probability that the star is a binary can be calculated from the CSP simulation using Bayes' theorem:

$$P_v = p(M_B \mid \Delta RV_{max}) = \frac{p(\Delta RV_{max} \mid M_B)\, p(M_B)}{p(\Delta RV_{max} \mid M_B)\, p(M_B) + p(\Delta RV_{max} \mid M_S)\, p(M_S)}, \tag{5}$$

where p(M_B) and p(M_S) = 1 − p(M_B) denote the prior binary and single-star fractions, respectively. Here we adopt p(M_B) = 55%, as derived from LAMOST. p(∆RV_max|M_B) and p(∆RV_max|M_S) are the probabilities of obtaining ∆RV_max under the BSP and SSP models, respectively. Their values as functions of ∆RV_max/σ_RV are shown in Fig. 9. Stars with ∆RV_max/σ_RV < 1.9 are more likely to be single stars than binary systems. The probability of being a binary system, P_v, calculated through Eq. (5), is presented as a function of ∆RV_max/σ_RV in Fig. 10: the higher ∆RV_max/σ_RV, the higher the probability that the star belongs to a binary system. This method is more sensitive to short-period binaries, since their RVs vary more rapidly than those of long-period ones. For long-period binaries (e.g. ∼300 years), the time span of the LAMOST DR4 observations (∼5 years) is too short to produce a large ∆RV_max, so their binarity cannot be tested efficiently.
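The Bayes step reduces to a one-line posterior. The sketch below takes the two likelihoods as given numbers. In practice they would be read off the simulated BSP and SSP distributions, which is not modeled here. The function names are hypothetical, and p(M_S) = 1 − p(M_B) because only single and binary stars are simulated.

```python
def binary_probability(like_binary, like_single, prior_binary=0.55):
    """Posterior P_v = p(M_B | dRV_max) from Bayes' theorem.

    like_binary = p(dRV_max | M_B), like_single = p(dRV_max | M_S),
    and p(M_S) = 1 - p(M_B) because only single and binary stars are modeled.
    """
    num = like_binary * prior_binary
    return num / (num + like_single * (1.0 - prior_binary))


def required_likelihood_ratio(p_v, prior_binary=0.55):
    """Likelihood ratio p(dRV|M_B) / p(dRV|M_S) needed to reach posterior p_v."""
    return (p_v / (1.0 - p_v)) * ((1.0 - prior_binary) / prior_binary)


print(binary_probability(0.2, 0.1))     # likelihoods favour a binary 2:1
print(required_likelihood_ratio(0.6))   # ratio needed at the P_v = 0.6 cut
```

With equal likelihoods the posterior simply returns the 55% prior. Reaching P_v = 0.6 requires the binary likelihood to exceed the single-star one by a factor of (0.6/0.4)(0.45/0.55) ≈ 1.23.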
The Balmer lines are covered by the blue arm (3700-5900 Å) of LAMOST. Figure 11 plots normalized LAMOST spectra of a representative star at two different epochs. The shift of Hβ is clearly seen in the bottom panel, demonstrating LAMOST's capability to measure ∆RV_max. We measure the Hβ depth from the normalized spectra and, to ensure reliable RV measurements, eliminate sources with Hβ depths less than 0.3. In addition, sources with high ∆RV_max values are visually inspected to identify and remove spectra affected by cosmic rays.
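Two quantities in this step can be made concrete:

- the wavelength shift of Hβ implied by an RV difference, using the non-relativistic Doppler formula Δλ = λ₀·ΔRV/c;
- a line depth, taken here as 1 minus the minimum normalized flux within a window around the line.

The ±15 Å window, the 4861.33 Å rest wavelength (air) and the function names are assumptions for illustration. The text does not specify how its Hβ depth is measured.

```python
import numpy as np

C_KMS = 299792.458   # speed of light [km/s]
H_BETA = 4861.33     # approximate rest wavelength of H-beta in air [Angstrom]


def doppler_shift(rest_wavelength, delta_rv_kms):
    """Wavelength shift [Angstrom] produced by an RV change (non-relativistic)."""
    return rest_wavelength * delta_rv_kms / C_KMS


def line_depth(wave, norm_flux, center=H_BETA, half_width=15.0):
    """Depth = 1 - minimum continuum-normalized flux within +/- half_width of center."""
    wave = np.asarray(wave, float)
    in_window = np.abs(wave - center) <= half_width
    if not np.any(in_window):
        raise ValueError("no pixels inside the line window")
    return 1.0 - np.min(np.asarray(norm_flux, float)[in_window])


# A 50 km/s change moves H-beta by ~0.8 Angstrom, roughly a third of a
# LAMOST resolution element (about 4861 / 2000 ~ 2.4 Angstrom at R ~ 2000).
print(doppler_shift(H_BETA, 50.0))

wave = H_BETA + np.arange(-60.0, 60.5, 0.5)
flux = 1.0 - 0.4 * np.exp(-0.5 * ((wave - H_BETA) / 3.0) ** 2)
print(line_depth(wave, flux) >= 0.3)   # passes a depth cut of 0.3
```

Because the shifts of interest are a fraction of a resolution element, a deep, well-defined line such as Hβ helps the cross-correlation, which is one plausible reason for rejecting shallow-line spectra.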

CATALOG OF RV VARIABLE STARS
We apply the method to the LAMOST DR4 data and estimate the binary probabilities of the stars. We adopt a threshold of P_v > 0.6 (∆RV_max > 3.0σ_RV) to identify binary stars and build a catalog of binary candidates. According to the CSP simulation in Section 3.2, the FPR at this threshold is about 3% given LAMOST's capability. Since the observing time span of LAMOST is much shorter than the mean period of binary systems, the LAMOST data are not suitable for detecting long-period binaries. There are ∼120,000 stars with P_v > 0.6 (∆RV_max > 3.0σ_RV) among the LAMOST DR4 sources with multiple epochs. After applying the Hβ depth criterion and visual inspection, 80,702 RV variable star candidates remain in our final catalog, as listed in Table 4. Note that the simulation only considers single and binary stars, whereas the observed LAMOST sample also includes intrinsic variables such as pulsating stars.
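The automated part of the selection can be summarized as a filter. This is a minimal sketch with a hypothetical per-star record; the field names are illustrative only. It simplifies the SNR > 10 requirement, which applies to individual spectra, to a star-level minimum SNR. The visual inspection for cosmic rays is not modeled.

```python
from dataclasses import dataclass


@dataclass
class StarSummary:
    """Hypothetical per-star record; field names are illustrative only."""
    name: str
    p_v: float          # probability of being an RV variable
    hbeta_depth: float  # depth of H-beta in the normalized spectra
    min_snr: float      # lowest SNR among the epochs used (simplification)


def select_candidates(stars, p_min=0.6, depth_min=0.3, snr_min=10.0):
    """Apply the automated cuts; visual inspection for cosmic rays is not modeled."""
    return [
        s for s in stars
        if s.p_v > p_min and s.hbeta_depth >= depth_min and s.min_snr > snr_min
    ]


stars = [
    StarSummary("a", 0.95, 0.45, 35.0),
    StarSummary("b", 0.95, 0.10, 40.0),   # H-beta too shallow
    StarSummary("c", 0.40, 0.50, 60.0),   # P_v below threshold
    StarSummary("d", 0.75, 0.35, 8.0),    # SNR too low
]
print([s.name for s in select_candidates(stars)])  # ['a']
```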
The distribution of the repeatedly observed stars in the ∆RV_max versus P_v plane is shown in Fig. 12. Most repeated targets have low P_v (< 0.6) and are single stars or unrecognized RV variables. We also present, in Fig. 13, the fraction f_v of stars with P_v > 0.6 in bins of 0.02 dex in log T_eff by 0.2 dex in log g. As shown in the figure, the main sequence traced by stars with P_v > 0.6 is broader than that of stars with P_v < 0.6. Stars with high P_v are more likely to be binaries than those with low P_v.
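The binned fraction f_v can be computed with two 2D histograms: all stars and stars above the P_v cut. This sketch assumes bins start at the minimum of each axis. Empty bins are left as NaN. The function name is hypothetical.

```python
import numpy as np


def variable_fraction_map(log_teff, log_g, p_v, dx=0.02, dy=0.2, p_cut=0.6):
    """Fraction f_v of stars with P_v > p_cut in (log Teff, log g) bins.

    Bins of dx by dy dex start at the minimum of each axis; empty bins give NaN.
    """
    log_teff, log_g, p_v = (np.asarray(a, float) for a in (log_teff, log_g, p_v))
    nx = int(np.ceil((log_teff.max() - log_teff.min()) / dx)) + 1
    ny = int(np.ceil((log_g.max() - log_g.min()) / dy)) + 1
    x_edges = log_teff.min() + dx * np.arange(nx + 1)
    y_edges = log_g.min() + dy * np.arange(ny + 1)
    n_all, _, _ = np.histogram2d(log_teff, log_g, bins=[x_edges, y_edges])
    var = p_v > p_cut
    n_var, _, _ = np.histogram2d(log_teff[var], log_g[var], bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        f_v = n_var / n_all
    return f_v, x_edges, y_edges


f_v, x_edges, y_edges = variable_fraction_map(
    [3.70, 3.70, 3.75], [4.5, 4.5, 4.5], [0.9, 0.1, 0.7]
)
print(f_v[:, 0])
```

Building the edges from an integer bin count, rather than from `np.arange` with a float step, avoids losing the last edge to rounding.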

The purity of the catalog
To verify the purity of the catalog and estimate contamination by single stars, we cross-identify the LAMOST multi-epoch sources with the catalog of RV standard stars published by Huang et al. (2018) based on APOGEE data (Majewski et al. 2017; Abolfathi et al. 2018). There are 1,274 common sources, of which 103 RV standard stars have P_v > 0.6. This implies a single-star contribution of ∼8% to our catalog, i.e. a purity of approximately 92%, which agrees with the simulation in Section 3.2. Considering both this cross-identification with Huang et al. (2018) and the single-star contamination in the simulation of Section 3.2, the purity of our catalog is estimated to be higher than ∼80%.

Cross-match with Kepler Eclipsing Binaries
A database of thousands of Kepler eclipsing binaries (KEBs) has been released by Matijevič et al. (2012, and references therein). In total, 520 KEBs have been observed repeatedly by the LAMOST-Kepler project, which uses LAMOST for spectroscopic follow-up of Kepler targets (De Cat et al. 2015; Zong et al. 2018). Of these, 255 stars are detected as binaries in our catalog. To test whether this detection rate is reasonable, we simulate a sample of 1 million eclipsing binaries and count the fraction that is detectable. The assumptions for the mock sample are similar to those described in Section 3.2. The exceptions are that the orbital inclination of the simulated eclipsing binaries is fixed at π/2 and their orbital period distribution is adopted from that of the KEBs. The joint distribution of periods and ∆RV_max for the mock eclipsing binaries is shown in Fig. 14. The box in the figure marks the detectable stars, with periods in the range 0.1 day to 5 years and RV amplitudes higher than 10 km s⁻¹. About 60% of the simulated eclipsing binaries are detected; the detection rate drops to 44% if the period range is restricted to 0.5 day to 5 years. The simulation thus explains the ∼50% detection rate of KEBs by LAMOST.
KEBs such as KIC 11084782 and KIC 9953894 have been observed at 11 and 7 epochs by LAMOST, respectively. Their RV time series are plotted in the top panels of Figs. 15 and 16. Given the orbital periods measured by Kepler, we can fit the RVs of these binary systems accurately with rvfit. This code fits the RVs of stellar binaries and exoplanets using an adaptive simulated annealing (ASA) global minimization method, which quickly converges to the global minimum without requiring initial parameter estimates. Its efficiency and reliability have been verified by Iglesias-Marzoa et al. (2015a,b). The observed and fitted RVs are plotted against phase in the middle panels of Figs. 15 and 16, and the residuals (O-C) in the bottom panels. RVs from spectroscopic observations, combined with periods from photometric observations, can constrain the orbital parameters well.
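The sketch below is not rvfit, which fits full Keplerian orbits with ASA. It is a much simpler illustration of why a known photometric period helps. If the orbit is assumed circular and P is fixed, the RV model γ + a·cos(2πt/P) + b·sin(2πt/P) is linear in its parameters. Weighted least squares then recovers the systemic velocity γ and the semi-amplitude K = √(a² + b²) even from a handful of epochs, such as the 7 to 11 available here. The function name is hypothetical.

```python
import numpy as np


def fit_circular_orbit(t, rv, period, rv_err=None):
    """Fit RV(t) = gamma + a*cos(2*pi*t/P) + b*sin(2*pi*t/P) with P held fixed.

    Returns (gamma, K, residuals) where K = hypot(a, b) is the semi-amplitude.
    """
    t = np.asarray(t, float)
    rv = np.asarray(rv, float)
    w = np.ones_like(rv) if rv_err is None else 1.0 / np.asarray(rv_err, float)
    phase = 2.0 * np.pi * t / period
    design = np.column_stack([np.ones_like(t), np.cos(phase), np.sin(phase)])
    # Weighted linear least squares: scale each row by 1/sigma.
    coef, *_ = np.linalg.lstsq(design * w[:, None], rv * w, rcond=None)
    gamma, a, b = coef
    residuals = rv - design @ coef          # O - C
    return gamma, np.hypot(a, b), residuals


# Noise-free check: 7 epochs of a circular orbit with P = 1.7 d.
t_obs = np.array([0.1, 0.7, 1.3, 2.2, 3.9, 5.0, 6.4])
rv_obs = -20.0 + 45.0 * np.sin(2 * np.pi * t_obs / 1.7 + 0.8)
gamma, k, oc = fit_circular_orbit(t_obs, rv_obs, period=1.7)
print(gamma, k, np.max(np.abs(oc)))
```

Eccentric orbits make the model nonlinear in the eccentricity and periastron parameters, which is where a global optimizer such as rvfit's ASA becomes useful.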

Cross-match with GDR2 variables
Because some stars exhibit RV variations due to periodic contraction and expansion, they will contaminate the catalog of binary candidates unless further characterized. We cross-match the variable star candidates with Gaia DR2 (GDR2) variables, including Cepheids, RR Lyrae, long-period variables (LPV) and short-period variables (SPV) (Clementini et al. 2019; Mowlavi et al. 2018; Roelens et al. 2018). The distribution of P_v for the common stars is presented in Fig. 17. Of the 498 common sources, 198 variable stars are detected (P_v > 0.6) with LAMOST. The common sources include 19 Cepheids, 442 RR Lyrae, 34 LPVs and 3 SPVs detected by Gaia. Of these, 10 Cepheids, 179 RR Lyrae, 4 LPVs and 0 SPVs are identified as RV variables in our database. The true positive rate of the catalog is thus about 39% for these intrinsic variables. This value differs from that for binary systems because the period distributions of intrinsic and extrinsic variables differ. The period-∆t diagrams for the common Cepheids and RR Lyrae are presented in Figs. 18 and 19, respectively. The periods are taken from the Gaia variable catalogs, while ∆t comes from the LAMOST observations. The figures show that the detection rates depend on the sampling of the observations as well as on stellar periods. Figure 20 shows the detection rate f_v as a function of (∆t mod Period)/Period for the RR Lyrae common to the GDR2 variables and the LAMOST multi-epoch targets. A Gaussian fit to f_v, with a mean of 0.48 and a variance of 0.29², illustrates that the detection rate depends on both the sampling of the observations and the stellar periods.
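A simple argument shows why detectability should peak near a phase separation of 0.5, as in the Gaussian fit with mean 0.48. It assumes a sinusoidal RV curve; real RR Lyrae curves are more saw-toothed. For RV = K sin θ and two epochs separated by φ = (∆t mod P)/P of a period, RV(θ + 2πφ) − RV(θ) = 2K sin(πφ) cos(θ + πφ). The largest possible difference is therefore 2K|sin(πφ)|, which vanishes at φ = 0 or 1 and is maximal at φ = 0.5. The helper names below are hypothetical, and the brute-force loop checks the formula over the unknown starting phase θ.

```python
import numpy as np


def phase_separation(dt, period):
    """(dt mod P) / P for a pair of epochs separated by dt (same time units as P)."""
    return np.mod(dt, period) / period


def max_pair_difference(k, phi):
    """Largest |RV1 - RV2| for RV = k*sin(theta) sampled at phase separation phi."""
    return 2.0 * k * np.abs(np.sin(np.pi * phi))


theta = np.linspace(0.0, 2.0 * np.pi, 100001)
brute_force = {}
for phi in (0.1, 0.25, 0.5, 0.9):
    diff = np.abs(30.0 * np.sin(theta + 2 * np.pi * phi) - 30.0 * np.sin(theta))
    brute_force[phi] = diff.max()
    print(phi, brute_force[phi], max_pair_difference(30.0, phi))
```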
We also cross-match the LAMOST multi-epoch targets with the catalog of radial velocity standard stars from GDR2 (Soubiran et al. 2018). None of the 7 common stars were identified as RV variables in our catalog.
Fig. 21 shows, as examples, the phased variations of RV, T_eff and log g for RR Lyrae stars. Their periods are measured by Gaia, while their stellar parameters and RVs are derived from LAMOST spectra. As an RR Lyrae star pulsates, its log g changes with its radius, and its T_eff changes as the star contracts and expands. These variations of stellar parameters and RVs can be detected in the LAMOST observations. A detailed analysis of RR Lyrae observed with LAMOST is presented in Liu et al. (2020), to which interested readers are referred. Note that, since the typical exposure time of LAMOST is of the order of an hour, LAMOST is not adequate for detecting RV variables with periods shorter than about two hours according to the Nyquist sampling theorem, especially extremely short-period ones. We list the 3 common sources between Gaia SPVs and multi-epoch LAMOST targets in Table 2. The table shows that short-period (high-frequency) SPVs cannot be detected as variables with LAMOST, demonstrating that the detection probability depends on the period (or frequency) of the target.

Figure 19. Same as Fig. 18, but for RR Lyrae.
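A related consequence of hour-long exposures is that each spectrum averages the velocity over the exposure. Assuming a sinusoidal RV curve of semi-amplitude K and period P, a box average over an exposure of length T starting at t₀ gives (1/T)∫ K sin(2πt/P) dt = K·[sin(πT/P)/(πT/P)]·sin(2π(t₀ + T/2)/P). The observable amplitude is thus reduced by the factor sin(πT/P)/(πT/P). For a one-hour exposure this retains about 64% of the amplitude at P = 2 h and essentially none at P = 1 h. This is a simplified illustration, not a calculation from the text, and the function name is hypothetical.

```python
import numpy as np


def smearing_factor(t_exp, period):
    """Fraction of a sinusoidal RV amplitude that survives a box average of length t_exp.

    np.sinc(u) = sin(pi*u)/(pi*u), so np.sinc(t_exp/period) = sin(pi*T/P)/(pi*T/P).
    """
    return np.abs(np.sinc(t_exp / period))


# Numerical check: average K*sin(2*pi*t/P) over one exposure starting at t0.
K, P, T, t0 = 30.0, 2.0, 1.0, 0.37   # km/s, hours, hours, hours
t = np.linspace(t0, t0 + T, 200001)
mean_rv = np.mean(K * np.sin(2 * np.pi * t / P))
predicted = K * smearing_factor(T, P) * np.sin(2 * np.pi * (t0 + T / 2) / P)
print(mean_rv, predicted)

for period_h in (1.0, 2.0, 4.0, 24.0):
    print(period_h, smearing_factor(1.0, period_h))
```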

Cross-match with VSX
To investigate our catalog further, we cross-match it with other variable star catalogs published in the literature. VSX is a comprehensive relational database of known and suspected variable stars gathered from a variety of respected published sources (Watson et al. 2006). It contains about 600,000 variable stars, about three-fourths of which have types and periods. There are 10,557 sources shared between VSX and the LAMOST repeatedly observed targets. Of these, 3,044 are detected as RV variables in our catalog, including both binary and pulsating stars. The overall LAMOST detection rate for VSX variables is therefore about 29%.
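Each cross-match in this section is a positional match. The brute-force sketch below finds, for every source in one list, the nearest source in another list using the haversine angular separation, and keeps it only within a matching radius. The 3 arcsec radius and the function name are hypothetical, since the matching radius is not stated here. For catalogs of this size, a tree-based matcher such as Astropy's `SkyCoord.match_to_catalog_sky` would be the practical choice.

```python
import numpy as np


def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec=3.0):
    """For each source in list 1, index of the nearest list-2 source within radius, else -1.

    Coordinates in degrees. Brute force, O(n1 * n2) memory: a sketch, not for full catalogs.
    """
    ra1, dec1, ra2, dec2 = (np.radians(np.asarray(a, float)) for a in (ra1, dec1, ra2, dec2))
    dra = ra1[:, None] - ra2[None, :]
    ddec = dec1[:, None] - dec2[None, :]
    # Haversine formula for the angular separation, shape (n1, n2).
    h = np.sin(ddec / 2) ** 2 + np.cos(dec1)[:, None] * np.cos(dec2)[None, :] * np.sin(dra / 2) ** 2
    sep = 2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
    nearest = np.argmin(sep, axis=1)
    best = sep[np.arange(len(ra1)), nearest]
    limit = np.radians(radius_arcsec / 3600.0)
    return np.where(best <= limit, nearest, -1)


idx = crossmatch(
    [10.0, 200.0, 100.0], [20.0, -30.0, 5.0],          # e.g. LAMOST positions
    [200.0 + 1 / 3600, 10.0005, 50.0], [-30.0, 20.0, 0.0],  # e.g. variable-star catalog
)
print(idx)
```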

Cross-match with GCVS
GCVS is another catalog of variable stars. Version 5.1 contains data for 53,626 individual variable stars, located mainly in the Galaxy, that had been discovered and named as variables by 2017 (Samus' et al. 2017); 33,264 of them have types and periods. Among 924 sources common to GCVS 5.1 and the LAMOST multi-epoch sources, 453 stars are recognized as RV variables in our catalog. This gives an overall LAMOST detection rate of about 49% for GCVS variables.

Cross-match with ASAS-SN
The All-Sky Automated Survey for SuperNovae (ASAS-SN) scans the extragalactic sky visible from Hawaii roughly once every five nights in the V band (Shappee et al. 2014). Catalogs of variable stars based on ASAS-SN have been released by Jayasinghe et al. (2018, 2019a). These catalogs contain 542,526 variable stars, 334,095 of which have types and periods. There are 5,113 common sources between the ASAS-SN variable catalogs and the LAMOST multi-epoch targets. Of these, 2,011 are recognized as RV variables in our catalog, giving an overall LAMOST detection rate of about 39% for ASAS-SN variables.

Characteristics of the catalog
The numbers of common sources between the published catalogs and the LAMOST multi-epoch targets are summarized in Table 3. Note that some variable stars appear in more than one published catalog. In total, there are 11,035 common sources between the LAMOST multi-epoch targets and the referenced variable catalogs (KEBs, GDR2 variables, VSX, GCVS and the ASAS-SN variable catalogs). Of these, 3,163 are detected as RV variables in our catalog, i.e. a detection rate of 29% for the variables in these catalogs. Variable stars fall into two categories, intrinsic and extrinsic. Binaries (extrinsic variables) and pulsating stars (intrinsic variables) can both be detected through RV variations given LAMOST's capability. In total, 80,702 of the 818,136 stars with multiple LAMOST epochs are detected as RV variables. As discussed in Sections 4.2 and 4.3, the catalog includes not only binaries but also intrinsic variables such as RR Lyrae and Cepheids. According to the CSP simulation in Section 3.2, about 8% of the sample would be detected as binaries with a purity of 80%. If the LAMOST multi-epoch targets consisted only of single and binary stars, this would imply that 6.4% of them are detected binaries and 1.6% (∼13,000) are contaminating single stars. However, the 80,702 detected stars make up about 10% of the LAMOST multi-epoch targets, higher than the ∼8% detection rate in the CSP simulation. The remaining 15,251 stars in the catalog, approximately 2% of the LAMOST multi-epoch sources, therefore probably include intrinsic variables. Applying the curve of pulsating-star fraction against T_eff (see Fig. 11 in Murphy et al. 2019) to the LAMOST targets with multiple epochs, we expect approximately 20,000 pulsating stars among them.
However, LAMOST can detect only pulsating stars with periods and RV amplitudes in a specific range. Assuming a typical detection rate of 30% for pulsators, our catalog contains approximately 6,000 pulsating stars. The 15,251 excess stars are thus probably dominated by binaries and pulsating stars. Therefore, the catalog consists of ∼62,000 binaries (77%) and ∼6,000 pulsating stars (7%), with contamination by ∼13,000 single stars (16%).
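The bookkeeping behind these percentages can be reproduced with a few lines of arithmetic. It uses only numbers stated in this section: 818,136 multi-epoch stars, 80,702 candidates, an 8% CSP detection rate with 80% purity, ∼20,000 expected pulsators and an assumed 30% pulsator detection rate. The only extra assumption, labeled in the code, is that whatever part of the excess is not a pulsator is a binary.

```python
n_multi = 818_136          # LAMOST DR4 stars with multiple epochs
n_detected = 80_702        # RV variable candidates in the catalog
csp_rate, csp_purity = 0.08, 0.80
n_pulsators_expected, pulsator_detection = 20_000, 0.30

expected = csp_rate * n_multi                     # detections if only singles + binaries
binaries_csp = csp_purity * expected              # ~6.4% of n_multi
singles = (1 - csp_purity) * expected             # ~1.6% of n_multi
excess = n_detected - expected                    # candidates beyond the CSP expectation
pulsators = pulsator_detection * n_pulsators_expected
binaries = binaries_csp + (excess - pulsators)    # assume the rest of the excess are binaries

for label, n in [("binaries", binaries), ("pulsators", pulsators), ("singles", singles)]:
    print(f"{label:9s} {n:8.0f} {n / n_detected:6.1%}")
```

Unrounded, the binary share comes out at about 76%; rounding the binary count to ∼62,000 first, as in the text, gives the quoted 77%. The whole split rests on the assumed 30% pulsator detection rate.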
Based on the BSP simulation in Section 3.2, the detection rate of binaries as a function of period is presented in Fig. 22; it drops exponentially with increasing period. Figure 23 displays the detection rates of the sources common to the LAMOST repeated targets and the published catalogs mentioned above. The classifications of the shared stars from cross-identification with these catalogs are listed in Table 4. The distribution of detection rates indicates that the ∆RV_max-based method adopted in this work is sensitive to short-period RV variables such as short-period binaries and RR Lyrae. Nevertheless, various types of variable stars appear in our catalog, although most of them cannot yet be classified based on LAMOST spectra or data from other surveys.

Figure 22. The detection rate versus orbital period for the BSP simulation.

CONCLUSIONS AND DISCUSSIONS
We estimate the probabilities that LAMOST DR4 targets are RV variables using their repeated observations, and construct a catalog of 80,702 RV variable star candidates. The false positive rate of the catalog is about 3% given LAMOST's capability. Its purity is estimated to be better than ∼80% from simulations and cross-identifications. Both intrinsic and extrinsic variable stars are included: the catalog consists of 77% binary systems and 7% pulsating stars, with 16% contamination by single stars. The catalog is a powerful database of RV variable candidates and can serve as an input source for RV variable surveys.
Since some intrinsic variables exhibit RV variability, the catalog is blended with pulsating stars such as Cepheids, RR Lyrae, LPVs and SPVs. Cross-identification and classification are carried out by matching with Kepler eclipsing binaries, GDR2 variables, VSX, GCVS and ASAS-SN variables, and 3,138 stars in our catalog are classified in this way. Although recognized as RV variables, most of the stars in the catalog have not been classified based on LAMOST data or other surveys. The efficiency of the method adopted in this work depends not only on the sampling frequency of the observations but also on the periods and amplitudes of the variable stars.
The key foundation of this work is the accuracy of the RVs and their uncertainties. Fortunately, over-estimated uncertainties do not compromise the reliability of the identified RV variables or candidates, although some variables will be missed. In future work, we will use spectroscopic and photometric data from LAMOST and other surveys to classify the RV variable stars in this catalog. The spectra of classified stars may be used as a training set for machine-learning methods to recognize the spectra of unclassified RV variables. Meanwhile, sources common to the RV variables and X-ray sources will provide more clues to binary interactions.

This work has made use of data products from the Guoshoujing Telescope (the Large Sky Area Multi-Object Fibre Spectroscopic Telescope, LAMOST). LAMOST is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.
This work is partially supported by the National Natural Science Foundation of China (11803030, 11443006, 11773005, 11803029). We would like to thank the anonymous referee for valuable comments which improved the manuscript.

Table 4. Catalog of RV variable star candidates. The RA and Dec of the stars are listed in columns 2-3. The number of epochs and the time span of the observations for each star are given in columns 4-5. The maximum RV variation and the weighted error are listed in columns 6-7. The SNRs and observation times corresponding to the maximum and minimum RVs are listed in columns 8-11. The probability of being an RV variable star is provided in the last column. The LAMOST unique spectral IDs, SNRs, exposure times, stellar parameters and RVs with their errors for each epoch are available in the full online version of the catalog.

ᵃ The Notes label marks the common sources between LAMOST and other surveys.