A Sample of Am and Ap Candidates from LAMOST DR10 (v1.0) Based on the Ensemble Regression Model

Large samples of Am and Ap stars are helpful in studying the interplay between phenomena like atomic diffusion, magnetic fields, and stellar rotation in stellar astrophysics. Existing samples of Am and Ap stars, mostly obtained from spectral data with a signal-to-noise ratio in the g band (S/Ng) greater than 50, can benefit from expansion by exploring spectra with lower S/Ng. Therefore, this paper proposes an ensemble regression model applicable to spectra with a minimum S/Ng of 30. Using the model, we identify 21,361 Am candidates, of which 11,614 are new, and 6182 Ap candidates, of which 4978 are new, from LAMOST DR10. The Am sample size has increased by 60% and the Ap sample size has increased by 180% compared to the previous sample. In terms of effective temperature (T_eff), the Am candidates range mainly from 6000 to 8500 K, while the Ap candidates range from 6000 to 11,700 K. The surface gravity (log g) distributions for Am and Ap candidates differ in the range of 3.25–4.75 dex. The number of Am candidates increases stepwise, in contrast to the relatively uniform distribution of Ap candidates across the entire surface gravity range. Regarding metallicity ([Fe/H]), Am candidates typically range from −0.75 to 0.38 dex, peaking near 0 dex, while Ap candidates are distributed from −1.38 to 0.38 dex, with a peak near −0.5 dex.


Introduction
Chemically peculiar (CP) stars are present in the upper main sequence of the H-R diagram and can provide a natural laboratory for understanding stellar evolution (Gray & Corbally 2009; Krticka et al. 2009). Generally, CP stars consist of four main types: Am stars, Ap stars, HgMn stars, and He-weak stars, as shown in Table 1. These types are defined by abnormally strong or weak absorption lines that reflect particular surface abundances (Preston 1974; Wolff & Preston 1978; Chojnowski et al. 2020; Paunzen et al. 2021).
Am stars, also known as metallic line A-type stars, exhibit an apparent underabundance of Ca (and/or Sc), along with an overabundance of Fe-group and heavier elements (Conti 1970). In terms of spectral classification, typical Am stars have a Ca II K line type that is at least five spectral subclasses earlier than the metallic line type (Roman et al. 1948). Nearly all heavy elements (with a few special exceptions) in Am stars are enhanced in the stellar photosphere, while in Ap stars, or peculiar A-type stars, only specific elements, such as silicon (Si), strontium (Sr), europium (Eu), chromium (Cr), or rare-earth elements, are greatly enriched in abundance. At the same time, Ap stars are the only class of early-type stars in which globally ordered and periodically varying strong magnetic fields of up to several tens of kilogauss can be universally observed (Babcock 1947; Aurière et al. 2007). The origin of this phenomenon is still somewhat controversial (Moss 2004), but much evidence has been established to support the idea that the magnetic field is a relic of the "frozen-in" interstellar magnetic field, known as the fossil field theory (Braithwaite & Spruit 2004).
Several researchers have proposed various models to explain the formation of CP stars, such as the nuclear synthesis model (Fowler et al. 1965), the supernova model (Stothers 1963; Guthrie 1967), the atomic diffusion model (Browne 1968; Michaud 1970; Michaud et al. 1983), the magnetic field accumulation model (Havnes & Conti 1971), and the collision model (Cowley 1977). Among these, the explanation provided by the atomic diffusion model is widely accepted (Wolff 1983). The model suggests that the emergence of CP stars is caused by the layered motion of elements due to gravitational settling and radiative levitation. This process causes most elements to sink under gravity, but those with significant absorption lines are accelerated toward the surface of the star by a process of diffusion. However, the atomic diffusion model alone cannot account for all CP phenomena. This suggests that a combination of multiple theories, along with a substantial body of observational evidence on CP stars, may be necessary to explain them.
The search for and identification of Am and Ap stars are crucial as the basis for an in-depth study of the evolutionary mechanisms and properties of these stars. A great deal of research and identification work (Renson & Manfroid 2009; Hümmerich et al. 2018; Sikora et al. 2019) has been done. Considering only the work related to the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), we can clearly observe the efforts of various scholars who have advanced the search for Am and Ap stars. Two years after the commencement of the LAMOST survey program, Hou et al. (2015) first derived empirical segmentation curves for Am stars using Ca II K lines and nine sets of iron lines, identifying 3537 candidates in DR1. Gray et al. (2015) utilized MKCLASS, a tool that emulates the human decision-making process for automated star classification, to categorize the spectra of stars observed by LAMOST in the Kepler sky region. This process resulted in the identification of 1067 Am stars with H-line spectral types ranging from A4 to F1. Subsequently, Qin et al. (2019) identified 9372 Am and 1132 Ap candidates from LAMOST DR5 using the random forest (RF) algorithm (Breiman 2001). Hümmerich et al. (2020) collected 1002 Ap candidates from LAMOST DR4 and performed spectral classification by identifying the 5200 Å flux depression, in combination with an improved version of MKCLASS. Following their approach, Shang et al. (2022) utilized XGBoost to newly identify 6917 Am stars and 1652 Ap stars from LAMOST DR8 data. The most recent study was conducted by Tian et al. (2023), who utilized the MKCLASS software to find over 21,600 Am candidates in the LAMOST DR8 (v1.0), DR9 (v0), and DR10 (v0) data sets.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Their outstanding work has provided a strong boost to the study of this specific group of stars. However, there is still more work to be done on the searches for Am and Ap stars. On the one hand, previous researchers set the signal-to-noise ratio in the g band (S/Ng) above 50 for search convenience, thereby missing the opportunity to discover Am and Ap stars in the lower-S/Ng data, which represent more than 63% of the data. On the other hand, the release of nearly 3.5 million low-resolution spectra of early-type stars in LAMOST DR10 (v1.0) presents an important opportunity to identify more Am and Ap stars. Therefore, based on previous studies, we combine the respective abundance patterns of the Am and Ap stars, integrate regression analysis (Owen 2007; Yang et al. 2022b) and ensemble learning techniques (Zhou 2012; Yang et al. 2022a; Cai et al. 2023), and construct their respective exclusive ensemble regression models (ERMs; Am_ERM and Ap_ERM). Finally, we obtain and statistically analyze 21,361 Am candidates and 6182 Ap candidates.
The remaining sections of this paper are structured as follows. Section 2 focuses on presenting our model and the associated experimental evaluations. The data are described in detail in Section 3. The specific statistical analyses of the Am and Ap star search results are presented in Section 4. We summarize this work and discuss future prospects in Section 5.

ERM
The ERM, a classifier we constructed by combining regression analysis and ensemble learning, is inspired by two key aspects. On the one hand, there are potential correlations between various features in the abundance patterns of the Am and Ap stars. At the same time, the primary characteristics of Am and Ap stars are concentrated at the blue end, which is more susceptible to noise. It is possible to obtain more accurate models by utilizing feature correlations in conjunction with the features themselves to construct classifiers using regression. On the other hand, there are various combinations of features, and it is essential to integrate different classifiers to fully represent the correlation patterns of all combinations. As shown in Figure 1, ERM mainly consists of data preprocessing, feature extraction, model training, and classification. Model training specifically includes four parts: feature grouping, feature crossover, regression ensemble, and positive and negative error intervals. Different ERMs can be obtained depending on the number of selected features, and Figure 1 shows the case where the number of selected features is four. The ERM independently trains on positive and negative case samples to derive their respective prediction error intervals. It subsequently calculates the prediction errors across all equations for the test samples and evaluates whether the proportion of prediction errors falling within the positive case intervals surpasses the proportion falling within the negative case intervals. The test sample is then categorized as positive or negative accordingly.
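The decision rule at the end of this paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the representation of each error interval as a `(low, high)` pair, and the choice that ties are labeled negative are all hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (low, high) error bounds for one regression equation.
Interval = Tuple[float, float]


def fraction_within(errors: Sequence[float], intervals: Sequence[Interval]) -> float:
    """Fraction of regression equations whose prediction error lies inside its interval."""
    if len(errors) != len(intervals) or not errors:
        raise ValueError("need one interval per regression equation")
    hits = sum(1 for err, (low, high) in zip(errors, intervals) if low <= err <= high)
    return hits / len(errors)


def classify(pos_errors: Sequence[float], neg_errors: Sequence[float],
             pos_intervals: Sequence[Interval], neg_intervals: Sequence[Interval]) -> bool:
    """Return True (positive) when more errors satisfy the positive-case intervals.

    pos_errors: prediction errors of the sample under the positive-case equations.
    neg_errors: prediction errors of the sample under the negative-case equations.
    Ties are labeled negative here; that tie rule is an assumption of this sketch.
    """
    p_pos = fraction_within(pos_errors, pos_intervals)
    p_neg = fraction_within(neg_errors, neg_intervals)
    return p_pos > p_neg
```

For example, a sample whose errors fall inside three of four positive-case intervals but only one of four negative-case intervals would be labeled positive by this rule.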
Our method's primary advantage lies in achieving a higher classification accuracy even with low-S/Ng data. Compared to empirical segmentation curves (Hou et al. 2015), ERM can obtain samples in a more precise manner. In contrast to using the MKCLASS method (Hümmerich et al. 2020), ERM demonstrates superior discriminative ability on low-resolution and low-S/Ng data. Compared to RF (Qin et al. 2019) and XGBoost (Shang et al. 2022), ERM achieves a classification accuracy similar to that reported in their respective literature, with fewer yet more accurate relevant features. Additionally, our method can directly derive indicators representing the reliability of the samples in the model, facilitating further research on the data.

Divergence Score
The divergence score serves as the foundational metric in this study for sample classification. It is derived through a weighted average of ERMs with varying numbers of selected features. In the final classification step, a divergence score threshold is set; samples with a score equal to or higher than the threshold are classified as positive, while those below the threshold are classified as negative. The score is designed to meet two criteria:
1. The divergence score should be a value between 0 and 1, with a higher value meaning that the model is more certain that the sample is a positive example; and
2. The divergence score needs to consider the impact of the S/Ng of the raw spectra to balance the varying reliability levels of the initial features.
Taking the above conditions fully into account, our divergence score formula is as follows: where D is the divergence score; S is the degree to which the sample is affected by the S/Ng, with S = 1 when S/Ng ≥ 50 and S = (S/Ng)/50 when S/Ng < 50; M is the number of features of the sample; K is the number of features selected in a given model; P_K^+ is the proportion of equations that the sample satisfies in the positive-case model of choice K; P_K^- is the proportion of equations that the sample satisfies in the negative-case model of choice K; and F_K is the label assigned to the sample by the model that selects combinations of K features, with a value of 1 when the sample is classified as a positive case and a value of 0 when it is classified as a negative case.
Consider the computation of the divergence score for the Am sample, which involves 10 features and a selectable range of values for K from 2 to 10, resulting in a total of nine models. The discriminative capacity of the model generally correlates positively with the number of equations produced; thus, a higher number of equations indicates a stronger discriminative capacity. Ideally, an Am sample achieves a divergence score of 1, while a non-Am sample obtains a score of 0, when the S/Ng exceeds 50. For samples with S/Ng below 50, the divergence score is directly proportional to S/Ng, decreasing as S/Ng decreases. Therefore, the proposed divergence score formula meets our specified criteria.
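Two pieces of the scoring scheme are fully specified by the text and can be sketched directly: the S/Ng weight S and the final threshold rule. The sketch below deliberately does not reproduce the full divergence score formula; the function names and default arguments are hypothetical.

```python
def snr_weight(snr_g: float, full_weight_snr: float = 50.0) -> float:
    """S/Ng weight S: 1 for S/Ng >= 50, otherwise S/Ng divided by 50."""
    if snr_g < 0:
        raise ValueError("S/Ng must be non-negative")
    return min(1.0, snr_g / full_weight_snr)


def is_positive(divergence_score: float, threshold: float = 0.2) -> bool:
    """Threshold rule: scores equal to or above the threshold are positive."""
    return divergence_score >= threshold


# Number of Am models: K runs from 2 to 10 selected features, giving nine models.
AM_MODEL_CHOICES = list(range(2, 11))
```

With these definitions, a spectrum at S/Ng = 30 receives a weight of 0.6, so even an otherwise ideal candidate scores lower than it would at S/Ng ≥ 50, which is the behavior the second criterion asks for.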

Reliability Factor
The reliability factor indicates how certain the model is that a sample classified as positive is truly positive. It is obtained by transforming the divergence score with the following formula: where R is the reliability factor, D is the divergence score, and d is the divergence score threshold. A sample is classified as positive if D is greater than or equal to d; otherwise it is classified as negative. Furthermore, α is a parameter to be determined and represents the precision of the model at the divergence score threshold d. Based on the results of the experiments described below, the parameter α is set to 0.95 for a divergence score threshold of 0.2 and to 0.90 for a divergence score threshold of 0.3.

Model Evaluation
This section emphasizes the primary performance metrics of our training models (Am_ERM, Ap_ERM), while a comprehensive exposition of the data processing is presented in Section 3. Subsequently, we delineate the procedure for preparing the training and test sets, along with the corresponding evaluation results of the models.

Training and Testing Data Sets
For each of the Am and Ap stars, three types of data sets are constructed for training and evaluating the model: the training set, the original test set, and the test set with noise. Taking the Am data set as an example, in constructing the training and test sets, we first remove the Am data given in the Shang et al. (2022) catalog from the preprocessed DR10 (v1.0) data, and the remaining data are treated as non-Am. Then, according to the amount of data given in Table 2, we perform random sampling without replacement, in which the Am training set is obtained from the data with S/Ng in the range of 50 to 200, and the original Am test set is obtained from the data with S/Ng in the range of 200 to 999. The Am noise test set is constructed differently from the previous two, in that it simulates low-S/Ng data by adding the appropriate noise to the original Am test samples we have chosen.
Since most previous studies use an S/Ng greater than or equal to 50 as the cutoff point, it is not possible to effectively determine whether an Am sample is present in data with S/Ng below 50. To verify the performance of the model on low-S/Ng data, we generate data with low S/Ng, based on the selected test samples and with reference to the description of Du et al. (2012) and Li et al. (2020). We assume that data with S/Ng greater than 200 are similar to the theoretically normalized spectrum with a signal strength of 1. Then, data with different S/Ng values are simulated by adding normally distributed random noise with a mean of 0 and a standard deviation of 1/(S/Ng), as shown in Table 3. In the end, we obtain a total of 49 sets of simulated Am noisy test data with S/Ng values ranging from 1 to 49 and spaced at intervals of 1.
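The noise injection described here is simple enough to sketch. The following is an illustrative sketch only, assuming a normalized flux array with signal strength 1; the function names and the fixed seed are hypothetical and are not taken from the original pipeline.

```python
import random
from typing import Dict, Iterable, List, Optional


def add_noise(flux: List[float], snr: float,
              rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation 1/snr to a normalized spectrum."""
    if snr <= 0:
        raise ValueError("S/N must be positive")
    rng = rng or random.Random()
    sigma = 1.0 / snr  # signal strength is assumed to be 1, so sigma = 1/(S/N)
    return [f + rng.gauss(0.0, sigma) for f in flux]


def make_noisy_sets(flux: List[float], snr_values: Iterable[int] = range(1, 50),
                    seed: int = 0) -> Dict[int, List[float]]:
    """Build one noisy copy of the spectrum per target S/N (default: 1 to 49 in steps of 1)."""
    rng = random.Random(seed)  # fixed seed only so that the sketch is reproducible
    return {snr: add_noise(flux, snr, rng) for snr in snr_values}
```

Calling `make_noisy_sets` on one high-S/Ng test spectrum yields 49 noisy versions, one for each target S/Ng from 1 to 49.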
For Ap, the corresponding data set is acquired in a similar way to Am, except that Ap has fewer data in the catalog given by Shang et al. (2022) than Am.

Evaluation
When applying Am_ERM and Ap_ERM to the sample retrieval, determining the divergence score threshold is imperative. As depicted in Figures 2 and 3, choosing a divergence score threshold of 0.2 yields precisions exceeding 0.95 for Am_ERM on the Am training set and Ap_ERM on the Ap training set. Raising the divergence score threshold above 0.2 keeps the precisions above 0.95 all the way to a threshold of 1, but at the cost of a decreasing recall. Thus, to effectively balance precision and recall, we opt for a divergence score threshold of 0.2.
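The trade-off behind this choice can be illustrated by computing precision and recall at several thresholds. This is a generic sketch rather than the evaluation code used for Figures 2 and 3; the function name and the toy scores in the usage note are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when samples with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    # NaN marks an undefined ratio (no predicted positives or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall
```

On a toy set with scores `[0.9, 0.5, 0.25, 0.1, 0.05, 0.3]` and labels `[True, True, True, True, False, False]`, raising the threshold from 0.2 to 0.4 removes the one false positive but also drops a true positive, so precision rises while recall falls.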
Subsequently, this threshold is applied to the original Am test set and the original Ap test set, generating a precision of 0.96 and a recall of 0.8 for Am_ERM on the original Am test set and a precision of 0.97 and a recall of 0.7 for Ap_ERM on the original Ap test set.
To evaluate the performance of the model on data with S/Ng < 50, we further utilize Am_ERM and Ap_ERM for classification on the Am noise test set and the Ap noise test set, respectively. The outcomes are visualized in Figures 4 and 5. These figures illustrate that for data sets with S/Ng above 30, the precision improves with an increasing divergence score threshold, remaining above 0.9 and approaching or reaching 1 once the threshold exceeds 0.3. This trend intensifies with higher S/Ng values. In essence, when applying the model to data with S/Ng < 50, both S/Ng and the divergence score threshold warrant consideration. A higher S/Ng and a raised divergence score threshold signify greater sample reliability. Notably, the results from simulated data cannot substitute for real data; however, setting stringent criteria based on their pattern aids in identifying purer candidate samples.
In conclusion, analyzing the performance of the model on diverse data sets, we propose the search criteria outlined in Table 4 to identify potential candidates for Am and Ap stars.

Data
The Guo Shoujing Telescope is a new type of large-field-of-view and large-aperture telescope independently developed and designed by China, known as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope and abbreviated as LAMOST (Bu et al. 2017). As a special reflecting Schmidt telescope with 4000 optical fibers mounted on its focal plane, it can simultaneously observe 4000 targets in a 20 deg² field of view, greatly improving the acquisition rate of target spectra. With the application of thin-mirror active optics and spliced-mirror active optics, LAMOST can observe objects as faint as 20.5 mag within 1.5 hr of exposure, making it the world's largest optical telescope that combines a large aperture with a large field of view (Cui et al. 2012; Luo et al. 2012; Zhao et al. 2012). The data in this paper are from LAMOST DR10 (v1.0), a data set released on 2023 March 30 that includes 11,473,644 stellar spectra covering the wavelength range of 3700-9000 Å, with a spectral resolution of R ≈ 1800 at 5500 Å.

LAMOST Subclass Scope
The LAMOST subclasses in the Am and Ap catalogs provided by Shang et al. (2022) show that Am and Ap stars are mainly distributed from B to F2 (Figure 6). Therefore, we select stellar samples of types B, A, F0, and F2 with S/Ng greater than or equal to 0 to carry out the relevant work, and the statistics of the specific search information are shown in Table 5. Since there are repeated observations of the same target in LAMOST, in order to avoid the interference of duplicate data on the final classification results, we apply a 3″ crossmatch to the initial data before proceeding to the subsequent tasks, and retain only the spectrum with the highest S/Ng among the duplicates.
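One way to realize this duplicate removal is sketched below. It is an illustration under assumptions, not the procedure actually used: the dictionary keys (`ra`, `dec`, `snr_g`), the greedy highest-S/Ng-first strategy, and the brute-force pairwise search are hypothetical choices; a production pipeline would normally rely on a spatial index or a dedicated crossmatching tool.

```python
import math
from typing import Dict, List


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def deduplicate(spectra: List[Dict], radius_arcsec: float = 3.0) -> List[Dict]:
    """Keep, for each group of spectra within the radius, the one with the highest S/Ng.

    Greedy strategy: visit spectra from highest to lowest S/Ng and keep a spectrum
    only if no already-kept spectrum lies within the matching radius. O(n^2) cost.
    """
    kept: List[Dict] = []
    for spec in sorted(spectra, key=lambda s: s["snr_g"], reverse=True):
        if all(angular_sep_arcsec(spec["ra"], spec["dec"], k["ra"], k["dec"]) > radius_arcsec
               for k in kept):
            kept.append(spec)
    return kept
```

In this sketch, two observations separated by about 1″ collapse to the one with the higher S/Ng, while a spectrum a degree away is retained as an independent target.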

Preprocessing
With reference to previous studies (Qin et al. 2019; Shang et al. 2022; Cai et al. 2022; Yang et al. 2023), all raw spectra are preprocessed. The results of the data processing are displayed in Figure 7, and the specific steps are as follows:
1. Spectra are removed if they have significant errors and zero flux values in the wavelength range 3800-5600 Å;
2. Based on the radial velocities provided by the raw data, the spectra are moved to the rest frame, while the spectra for which radial velocities are not provided are deleted;
3. The spectra are interpolated to obtain wavelengths in the range 3800-5600 Å, spaced at intervals of 1 Å; and
4. The spectra are divided by the pseudocontinuum spectrum to obtain the final normalized spectra.
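Steps 2 to 4 can be sketched as follows. This is a hedged illustration, not the pipeline used here: it assumes the non-relativistic Doppler correction λ_rest = λ_obs / (1 + v/c), linear interpolation, strictly increasing input wavelengths, and a pseudocontinuum supplied by the caller (the fitting method is not specified in the text). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

C_KM_S = 299792.458  # speed of light in km/s


def to_rest_frame(wavelengths: Sequence[float], rv_km_s: float) -> List[float]:
    """Shift observed wavelengths to the rest frame (non-relativistic approximation)."""
    return [w / (1.0 + rv_km_s / C_KM_S) for w in wavelengths]


def resample(wavelengths: Sequence[float], flux: Sequence[float], start: float = 3800.0,
             stop: float = 5600.0, step: float = 1.0) -> Tuple[List[float], List[float]]:
    """Linearly interpolate the flux onto a regular grid from start to stop (inclusive)."""
    grid = [start + i * step for i in range(int(round((stop - start) / step)) + 1)]
    if grid[0] < wavelengths[0] or grid[-1] > wavelengths[-1]:
        raise ValueError("spectrum does not cover the requested wavelength range")
    out: List[float] = []
    j = 0
    for g in grid:
        while wavelengths[j + 1] < g:  # advance to the input pixel pair bracketing g
            j += 1
        w0, w1 = wavelengths[j], wavelengths[j + 1]
        t = (g - w0) / (w1 - w0)
        out.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return grid, out


def normalize(flux: Sequence[float], pseudocontinuum: Sequence[float]) -> List[float]:
    """Divide the flux by a pseudocontinuum sampled on the same wavelength grid."""
    return [f / c for f, c in zip(flux, pseudocontinuum)]
```

The default grid contains 1801 points (3800 to 5600 Å in 1 Å steps), matching the wavelength range and sampling stated in step 3.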

Feature Selection
From the definitions of Ap and Am in Section 1, it can be seen that the main features of Ap and Am differ. At the same time, due to the significant difference in the number of known samples between Ap and Am, if the Ap and Am samples are mixed for model training, the final discrimination may favor Am and ignore Ap, which has fewer samples. Therefore, we take full advantage of their features and construct the search models for Am and Ap separately. We obtain the sample data available for the models based on the Am features chosen by Hou et al. (2015) when they built the empirical curves and the Ap features chosen by Hümmerich et al. (2020) when they improved MKCLASS, using an equivalent width calculation consistent with Hou et al. (2015).
The 10 Am features that we choose are shown in Table 6 and the 14 Ap features that we choose are shown in Table 7; they are band ranges or wavelength points. To calculate their equivalent widths, we first look for peaks within a range of 5 Å (for Ca II K, 10 Å) on either side of the center of the given feature and use the straight line joining the peak points as a local continuum. The following formula is then used for the calculation for each feature:

$$EW = \int_{\lambda_0}^{\lambda_1} \frac{F_{C\lambda} - F_{I\lambda}}{F_{C\lambda}} \, d\lambda$$

where λ_0 and λ_1 are the wavelengths of the left and right peaks of the feature band, and F_Iλ and F_Cλ are the observed flux and the local continuum, respectively.

Note to Table 5. The "Type" column in the table shows the stellar type of the search sample, with the "F" type only containing the total amount of data for F0 and F2 in LAMOST. The "Total" column indicates the total number of data given by the website, the "Ind_sample" column gives the number of independent samples available after the 3″ crossmatching, and the remaining amount of data is displayed in the "Dup_sample" column. In addition, the 3″ crossmatching is only used within each type of data given in the table, and different data types may still contain spectra with duplicate observations, so we apply the 3″ crossmatching again to the search results to obtain a final reasonable sample before the final data analysis.
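The equivalent width procedure can be sketched as below. The sketch assumes the standard definition, the integral of (1 − F_I/F_C) between the two peaks, evaluated with the trapezoidal rule; it treats each feature as a single central wavelength, which is a simplification for features given as band ranges. The function name and arguments are hypothetical.

```python
from typing import Sequence


def equivalent_width(wave: Sequence[float], flux: Sequence[float],
                     center: float, window: float = 5.0) -> float:
    """Equivalent width of a feature using a straight-line local continuum.

    The continuum joins the highest-flux points found within `window` angstroms
    on the left and on the right of `center` (use window=10.0 for Ca II K).
    """
    left = [i for i, w in enumerate(wave) if center - window <= w < center]
    right = [i for i, w in enumerate(wave) if center < w <= center + window]
    if not left or not right:
        raise ValueError("feature window is not covered by the spectrum")
    i0 = max(left, key=lambda i: flux[i])   # left peak
    i1 = max(right, key=lambda i: flux[i])  # right peak
    w0, w1, f0, f1 = wave[i0], wave[i1], flux[i0], flux[i1]

    ew = 0.0
    prev_w = prev_depth = None
    for i in range(i0, i1 + 1):
        continuum = f0 + (f1 - f0) * (wave[i] - w0) / (w1 - w0)
        depth = 1.0 - flux[i] / continuum
        if prev_w is not None:
            ew += 0.5 * (depth + prev_depth) * (wave[i] - prev_w)  # trapezoidal rule
        prev_w, prev_depth = wave[i], depth
    return ew
```

For a normalized spectrum with a small triangular absorption line of central depth 0.5 and a base width of 4 Å, this returns an equivalent width of about 1 Å, as expected from the area of the triangle.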

List of Am and Ap Candidates
Based on the Am_ERM, the Ap_ERM, and the criteria shown in Table 4, we search for candidates for Am and Ap.
First, we obtain a total of 1,559,737 spectra, with subtypes ranging from B to F2, from the official LAMOST website. After data preprocessing, an initial search set including 1,070,750 spectra is obtained. Then we process the initial search set using Am_ERM and Ap_ERM to obtain the divergence score for each sample belonging to Am or Ap, respectively. Finally, we obtain a total of 21,361 Am candidates and 6182 Ap candidates, after removing candidates for which parameter information is not available and restricting them according to the criteria mentioned above. Among them, there are 9747 Am candidates related to the previous literature and 11,614 new candidates; there are 1205 Ap candidates related to the previous literature and 4978 new candidates.

Figure 7. In the figure, the black line is the original spectrum, the red line is the spectrum after removing the radial velocity, which is so small that the two almost overlap, and the green line is the fitted pseudocontinuum spectrum. The bottom half of the figure shows the results of the completed processing of the sample, where the vast majority of the spectral fluxes have been normalized to between 0 and 1, while the main features of the spectra are well preserved.

The Astrophysical Journal Supplement Series, 272:43 (14pp), 2024 June, Yang et al.
Additionally, looking at the final data in terms of S/Ng, the Am candidates contain 1123 candidates with S/Ng < 50, and the Ap candidates contain 713 candidates with S/Ng < 50. Table 8 shows some sample information for the Am and Ap candidates and mainly provides 13 relevant parameters. The columns give the Obsid, S/Ng, R.A., decl., SpT_lamost, log g, T_eff, [Fe/H], Gaia_G, SpT_mkclass, SpT_Quality, Reliability Factor, and Star Type, respectively. The Obsid column is the unique number ID that identifies the spectra in LAMOST; the S/Ng column represents the S/N of the spectra provided by LAMOST in the g band; the SpT_lamost column is the spectral type from LAMOST; the atmospheric parameters (log g, T_eff, and [Fe/H]) of our sample stars are determined by comparing the observed spectra with the KURUCZ library of theoretical spectra (Castelli & Kurucz 2003); the Gaia_G column represents the G magnitudes provided by Gaia DR3 (Eyer et al. 2023); and SpT_mkclass and SpT_Quality are reclassifications of the candidates obtained by using the MKCLASS code in conjunction with the libnor36 standard library (Gray et al. 2015), where they represent the spectral subtypes and the quality evaluation of the spectral subtypes, respectively.
Besides the atmospheric parameters shown in Table 8, we have also added atmospheric parameters from LAMOST DR11 and those obtained by cross-validation from Gaia DR3 (Stassun et al. 2019; Anders et al. 2022; Fouesneau et al. 2023) to the star catalog. As shown in Figure 8, we compared the parameters derived from the KURUCZ templates with those from Gaia. From the figure, it can be observed that the two sources of parameters show relatively consistent results in effective temperature, but the consistency is poorer for the other two parameters. Furthermore, the two methods exhibit better consistency in parameter estimation for Am spectra than for Ap spectra. This discrepancy may arise from various factors, including the methods themselves, different data sources, and differences in spectral quality. To ensure smooth comparisons with previous studies and to meet the need for available parameters for our low-S/Ng data, all subsequent analyses are based on the parameters presented in Table 8.

Statistical Analysis
In the following sections, we provide a statistical analysis of the parameters related to the candidates.In addition to the overall statistics of the candidates, we also include a comparison between new and old candidates.As shown in Table 8, the data without asterisks represent candidates obtained by us that are related to the star catalog provided by Shang et al. (2022), hereafter referred to as "the old candidates."

Spectral Analysis
We show the mean values of the normalized spectra of the Am and Ap candidates in Figure 9. The spectra in the figure are the averages of 1000 randomly selected spectra from each of the Am and Ap candidate samples, within a T_eff range from 7000 to 8000 K. From the figure, we can clearly see that the Ca II K line of Am is deeper than the Ca II K line of Ap, which is consistent with the results obtained by Shang et al. (2022). Additionally, we mark the features used to classify Am with a black line and the features used to classify Ap with an orange line in the figure. It can be seen that although the mean spectra of Am and Ap are broadly similar on the whole, Am and Ap show different shapes near the features we identified, which reflects the rationality of our feature selection. Furthermore, if the equivalent widths of the corresponding features of Am and Ap are calculated with Equation (1), it can be seen that the equivalent widths of all features are small, except for Ca II K. This means that the main features of both Am and Ap are susceptible to noise interference, which can lead to classification errors. Therefore, when lowering the S/Ng limit in the search for candidates, it is necessary to consider the S/Ng as part of the sample divergence score.

Sample Subclass Distribution
We show the subclass distributions for the Am and Ap candidates in Figure 10. In the figure, orange represents the new candidates, whereas blue indicates the candidates in the current results that are related to the previous catalog. It can be seen that the distribution of the blue candidates is basically the same as that given by Shang et al. (2022), which indicates that Am_ERM and Ap_ERM model the training candidates well. However, in terms of the general distribution, the left and right subplots differ in F0 and F2 from the distribution of the sample subclass provided by Shang et al. (2022). Additionally, the overall distribution of the additional sample is similar to that of Shang et al. (2022). Furthermore, from the point of view of the comparison of Am and Ap, the Am subclass is mostly concentrated in A5, A6, A7, F0, and F2, while the Ap subclass is mostly concentrated in A1, A2, A5, A6, A7, and F0, and the peaks of the distributions are F0 and A1, respectively.

Combining the two figures shows that the temperature distribution of the Am candidates ranges from ∼6000 to ∼8500 K, while that of the Ap candidates ranges from ∼6000 to ∼11,200 K. Although most of the candidates conform to the temperature ranges given in the Am and Ap definitions cited earlier in the article, a small number of candidates still fall below the lower limit of 7000 K, and there are individual candidates that exceed the upper limit. Furthermore, from the perspective of the temperature distributions of the respective old and new candidates of Am and Ap, the new candidates of Am have a lower overall temperature than the old candidates and a new peak at ∼6100 K, while the new candidates of Ap have a temperature distribution similar to that of the old candidates. The reason may be the addition of more F0 candidates among the new candidates. The differences shown here are worth exploring further.

Distribution of Atmospheric Parameters
In terms of surface gravity, the log g distribution of Am ranges from ∼3.2 to ∼4.8 dex, with a peak at 4.6 dex, while the log g distribution of Ap ranges from ∼3.2 to ∼4.8 dex, with a peak at about 3.9 dex, and is more uniformly distributed over a limited range compared to the distribution of Am. When the log g distributions of the new and old candidates are compared, the distribution of the new candidates for Am is concentrated between ∼3.6 dex and ∼4.8 dex, with a median of ∼4.2 dex, while the distribution of the old candidates for Am is concentrated between ∼4.2 dex and ∼4.8 dex, with a median of ∼4.5 dex. Meanwhile, the old and new Ap candidates show an approximate antisymmetry in the distribution of log g. The new Ap candidates have a median value of ∼4.02 dex, compared with ∼3.78 dex for the old Ap candidates.
In terms of the distribution of [Fe/H], the distributions of the old and new candidates of Am are similar in shape, with relatively close peaks. In general, the main distribution of [...]

To demonstrate the differences between Am and Ap stars on the H-R diagram, we present Figure 14, where the isochrones on the left and right panels are derived from the PARSEC model. In conclusion, combining the above analysis, we can see that there is a certain regularity in the distribution of all atmospheric parameters of Am, but the distributional characteristics of Ap are not obvious in any of the metrics, except for the regularity exhibited in the effective temperature. In other words, the candidate profile of Ap is more complex than that of Am, and it makes sense to study Ap candidates at a deeper level of detail.

Space Distribution
The space distribution of the candidates for Am and Ap is shown in Figure 15. It can be seen that the Am and Ap candidates, both old and new, show a significantly higher density in the Galactic anticenter (GAC) than in other regions. On the one hand, it shows that the density distributions of the Am candidates we provide are similar to those obtained by Hou et al. (2015), Qin et al. (2019), and Shang et al. (2022). On the other hand, it shows that the density distributions of the data we provide with different S/Ng values are not significantly different. This density can be explained in terms of the observation strategy and the actual spatial distribution. First, the GAC survey is an important part of the LAMOST survey, covering Galactic longitudes of 150° ≤ ℓ ≤ 210° and latitudes of |b| ≤ 30° (Luo et al. 2015), so more observations are carried out in this region; and second, stars are mainly born in the Galactic disk, where more young objects are concentrated. Our work greatly increases the sample size of known Galactic Am and Ap stars and contributes to subsequent in-depth statistical studies.

[...] achieving high-precision classification results. Concurrently, we verify the applicability of the proposed model to data with lower S/Ng. Following this, the trained Am_ERM and Ap_ERM are applied to the preprocessed LAMOST DR10 (v1.0) data set. This application results in a total of 21,361 Am candidates and 6182 Ap candidates.
Ultimately, we conduct a statistical analysis of the relevant properties of Am and Ap based on the candidates obtained. While Am and Ap exhibit some similarities in spatial distribution, they show distinct properties in terms of atmospheric parameters and subclass distribution. A preliminary analysis indicates that the case of Ap candidates is more intricate than that of Am candidates, warranting a detailed investigation into Ap candidates.
In summary, the primary contribution of this paper lies in introducing a novel method for classifying Am and Ap stars that is applicable to spectra with significantly lower S/Ng. A substantial, labeled sample of Am and Ap candidates is acquired, and their primary statistical properties are obtained through preliminary analysis. In the future, we plan to delve deeper into the specific properties of Am and Ap candidates and explore aspects that remain unexplained in this article.

Figure 1. The four components of the ERM are distinguished by the different background colors. In this paper, the data preprocessing includes the normalization of spectral data. In feature extraction, the different colored blocks represent the various features extracted from the samples; the model training panel shows an example with the number of groups set to four. In the model training, the different colored blocks in feature grouping and feature crossover represent various features of the samples. The multiplication sign indicates the multiplication of the features at the corresponding position, which is used to obtain the crossover features. The different colored blocks in the regression ensemble represent the regression equations in which the features of the corresponding colored blocks are used as the dependent variables. The colored blocks within positive and negative error intervals depict the error ranges of the regression equations. The top half of the model, with a light yellow background, represents the error intervals for positive examples, while the bottom half, with a light gray background, represents the error intervals for negative examples. Various data flows are illustrated using distinct colors. The brown arrows denote the flow of data from the samples to be classified, the black arrows signify the data flow during the model training phase, and the thin black lines indicate the flow of the training data.

Figure 3. Changes in accuracy, precision, and recall for Ap_ERM on the Ap training set as the divergence score threshold changes.

Figure 4. Changes in precision for Am_ERM on the Am noisy test set with different S/Ng as the divergence score threshold changes.

Figure 5. Changes in precision for Ap_ERM on the Ap noisy test set with different S/Ng as the divergence score threshold changes.
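The threshold curves described in the captions of Figures 3 to 5 can be reproduced in outline by sweeping a threshold over per-sample scores and recomputing the metrics at each step. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function name `sweep_thresholds` is hypothetical, the scores and labels are synthetic, and the decision rule (a sample is flagged as a candidate when its divergence score is at or below the threshold) is an assumption that may differ from the rule used by Am_ERM and Ap_ERM.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, accuracy, precision, recall) for each threshold.

    Assumption: a sample is flagged as a candidate when its score is less
    than or equal to the threshold. The paper's actual rule may differ.
    """
    rows = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for score, label in zip(scores, labels):
            predicted = score <= t
            if predicted and label:
                tp += 1
            elif predicted and not label:
                fp += 1
            elif label:
                fn += 1
            else:
                tn += 1
        accuracy = (tp + tn) / len(scores)
        # Precision and recall are undefined when their denominators are zero.
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        rows.append((t, accuracy, precision, recall))
    return rows


# Synthetic scores and labels (1 = positive example), for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
rows = sweep_thresholds(scores, labels, [0.25, 0.5, 0.75])
for t, acc, prec, rec in rows:
    print(f"threshold={t:.2f} accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

With this toy data, raising the threshold trades precision for recall, which is the kind of behavior such curves are used to inspect when choosing an operating point.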

Figure 6. Subclass distribution of the Am and Ap catalogs in LAMOST provided by Shang et al. (2022).

Figure 7. Normalization of the sample spec-56094-kepler05B56094_sp08-044.fits.gz. The upper half of the figure shows the raw spectrum in the wavelength range 3800–5600 Å: the black line is the original spectrum, the red line is the spectrum after removing the radial velocity (which is so small that the two almost overlap), and the green line is the fitted pseudocontinuum. The bottom half of the figure shows the fully processed sample, in which the vast majority of the spectral fluxes have been normalized to between 0 and 1 while the main features of the spectrum are well preserved.
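The two processing steps named in the Figure 7 caption, removing the radial velocity and dividing by a pseudocontinuum, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the running-maximum envelope is a simple stand-in for the paper's pseudocontinuum fit (whose method is not given in this section), and the flux values are synthetic.

```python
C_KMS = 299792.458  # speed of light in km/s


def to_rest_frame(wavelengths, rv_kms):
    """Shift observed wavelengths to the rest frame (non-relativistic Doppler)."""
    return [w / (1.0 + rv_kms / C_KMS) for w in wavelengths]


def running_max_continuum(flux, half_window):
    """Crude pseudocontinuum: the maximum flux inside a sliding window.

    This is a stand-in chosen for brevity, not the fit used in the paper.
    """
    n = len(flux)
    return [
        max(flux[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def normalize(flux, continuum):
    """Divide the flux by the pseudocontinuum, point by point."""
    return [f / c for f, c in zip(flux, continuum)]


# Synthetic flux with two absorption dips, for illustration only.
flux = [10.0, 9.0, 5.0, 9.0, 10.0, 8.0, 4.0, 8.0, 10.0]
continuum = running_max_continuum(flux, half_window=2)
normalized = normalize(flux, continuum)
rest = to_rest_frame([4000.0, 5000.0], rv_kms=30.0)
print(normalized)
print(rest)
```

Because an upper envelope is never below the flux, the normalized values cannot exceed 1, and for positive fluxes they stay above 0, consistent with the range described in the caption.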

Figure 8. Comparison of effective temperature, surface gravity, and metallicity between KURUCZ and Gaia for the Am and Ap candidates, respectively.

Figures 11 and 12 show the distributions of atmospheric parameters for the Am and Ap candidates from different angles.

Figure 9. Averaged spectral maps of Am and Ap stars.

Figure 10. Statistical distribution of subclasses of Am and Ap candidates provided by LAMOST.

Figure 11. Statistical distributions of different combinations of parameters for Am and Ap candidates.

Figure 12. Comparison of the distribution of old and new candidates for Am and Ap over different atmospheric parameters.
In general, the main distribution of [Fe/H] values for Am ranges from ∼−0.75 to ∼0.4 dex. The old and new Ap candidates show different shapes in the [Fe/H] distribution, with a median of ∼−0.5 dex for the new candidates and ∼−0.1 dex for the old candidates. For the Ap candidates overall, [Fe/H] ranges from ∼−1.4 to ∼0.4 dex. As a result, Ap does not show as distinct a distribution profile in [Fe/H] as Am does. Compared with the [Fe/H] range provided by Ghazaryan et al. (2018) for Am and Ap stars, the [Fe/H] values of some of our candidates are lower. Figure 13 helps explain this: a significant portion of the Am and Ap candidates with lower [Fe/H] values also have relatively lower S/Ng values, which indicates that the quality of the original spectral data affects the derived [Fe/H]. Furthermore, the surface temperature may be another factor influencing [Fe/H]: as the surface temperature decreases, the corresponding [Fe/H] values also tend to decrease.
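One way to inspect the dependence of [Fe/H] on S/Ng described above is to compute the median [Fe/H] in bins of S/Ng. The following is a minimal sketch, assuming hypothetical per-candidate lists `snr_g` and `feh`; the function name, bin edges, and all values are made up for illustration and are not taken from the catalog.

```python
from statistics import median


def median_by_bin(x, y, edges):
    """Median of y in each half-open bin [edges[i], edges[i + 1]) of x."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        values = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        # An empty bin has no median; record None instead of raising.
        result.append((lo, hi, median(values) if values else None))
    return result


# Synthetic S/Ng and [Fe/H] values, for illustration only.
snr_g = [32, 35, 41, 48, 55, 63, 72, 90, 120]
feh = [-0.9, -0.7, -0.6, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1]
bins = median_by_bin(snr_g, feh, [30, 50, 100, 200])
for lo, hi, med in bins:
    print(f"S/Ng in [{lo}, {hi}): median [Fe/H] = {med}")
```

The same helper can be applied with T_eff in place of S/Ng to look at the temperature trend mentioned in the text.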

Figure 13. The impact of S/Ng and T_eff on the [Fe/H] value.

Figure 14. The distributions of Am and Ap candidates in the H-R diagram are shown in the left and right panels, respectively. The color is coded by the metallicity [Fe/H]. The dashed lines in both panels represent the isochrones from the PARSEC model with the same age of 1 Gyr but different metallicities, i.e., −1.4, −0.8, −0.2, and 0.4 dex, respectively.

Figure 15. Spatial distribution of Am and Ap candidates.
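The concentration of candidates toward the Galactic anticenter can be quantified by counting how many fall inside the GAC survey footprint, taken here as 150° ≤ ℓ ≤ 210° and |b| ≤ 30° following Luo et al. (2015). The following is a minimal sketch: the function names are hypothetical and the coordinates are synthetic, so the printed fraction is not a result from the catalog.

```python
def in_gac_footprint(l_deg, b_deg):
    """True if Galactic (l, b) in degrees lies in 150 <= l <= 210, |b| <= 30."""
    l_wrapped = l_deg % 360.0  # accept longitudes outside [0, 360)
    return 150.0 <= l_wrapped <= 210.0 and abs(b_deg) <= 30.0


def gac_fraction(coords):
    """Fraction of (l, b) pairs that fall inside the GAC footprint."""
    if not coords:
        return 0.0
    inside = sum(1 for l_deg, b_deg in coords if in_gac_footprint(l_deg, b_deg))
    return inside / len(coords)


# Synthetic Galactic coordinates in degrees, for illustration only.
coords = [(180.0, 5.0), (200.0, -25.0), (90.0, 10.0), (170.0, 45.0)]
fraction = gac_fraction(coords)
print(f"Fraction inside the GAC footprint: {fraction:.2f}")
```

Computing this fraction separately for the old and new candidates, or for different S/Ng ranges, would give a simple numerical counterpart to the visual comparison in Figure 15.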

Table 2
Number of Different Data Sets
Note. The asterisk (*) indicates that this portion of the data is artificially modeled rather than given in the original data.

Table 3
Relationship between Partial S/Ng and Standard Deviation
Changes in accuracy, precision, and recall for Am_ERM on the Am training set as the divergence score threshold changes.

Table 4
Table of Guidelines for Am or Ap Candidates

Table 5
Statistics on the Amount of LAMOST Data to be Recognized

Table 6
Wavelength Ranges of the Nine Fe-group and Ca II K Lines Given by Hou et al. (2015) for the Am Classification
We also round the wavelength ranges in the original table to integers.

Table 7
Absorption Lines and Blends for Classifying Ap Given in Hümmerich et al.

Table 8
The Catalogs of Am and Ap Candidates in LAMOST DR10 (v1.0)
Note. Only a portion of the Am or Ap candidate information is shown in the table. The asterisk (*) in the "Star Type" column indicates that the sample is new. (This table is available in its entirety in machine-readable form.)