Prediction of Chinese suitable habitats of Panax notoginseng under climate change based on MaxEnt and chemometric methods

Notoginseng saponin R1; ginsenosides Rg1, Re, Rb1, and Rd; the sum of the five saponins; and underground-part fresh weight (UPFW) of single plants were used as quality evaluation indices for Panax notoginseng (Burk.) F. H. Chen (P. notoginseng). Comprehensive evaluation of P. notoginseng samples from 30 production areas was performed using the MaxEnt model. Spatial pattern changes in suitable P. notoginseng habitats were predicted for the current and future periods (2050s, 2070s, and 2090s) under the SSP126 and SSP585 scenarios. The results revealed that temperature, precipitation, and solar radiation were important environmental variables. Suitable habitats were located mainly in Yunnan, Guizhou, and Sichuan Provinces. The distribution core of P. notoginseng is predicted to shift southeast in the future. The saponin content decreased from the southeast to the northwest of Yunnan Province, which was contrary to the UPFW trend. This study provides the necessary information for the protection and sustainable utilization of P. notoginseng resources, and a theoretical reference for its application in the quality evaluation of Chinese medicinal products.

Panax notoginseng (Burk.) F. H. Chen is a perennial herb of the genus Panax (Araliaceae), whose medicinal parts are the dried roots and rhizomes 1. P. notoginseng is distributed in Yunnan Province, Guizhou Province, Guangxi Province, Chongqing City, and other regions 2. It disperses blood stasis, reduces swelling, and relieves pain. Promotion of blood circulation is the main function of its saponins, including notoginseng saponin R1 and ginsenosides Rg1, Re, Rb1, and Rd 3. To date, more than 100 saponins have been described in P. notoginseng, and they are often used to treat hematemesis (vomiting blood), traumatic bleeding, and tingling pain in the chest and abdomen 4. The medicinal parts of P. notoginseng grow underground, and underground-part fresh weight (UPFW) is related to the yield of Notoginseng Radix et Rhizoma. Therefore, the saponin content and UPFW reflect the quality and yield of Notoginseng Radix et Rhizoma, respectively. Both indices are therefore important in evaluating Notoginseng Radix et Rhizoma.
The distribution of medicinal plants in China has changed since ancient times, and climate change strongly affects this distribution. For example, the upslope distribution of many mountain biological communities has changed owing to climate warming, which was also one of the reasons for the changes in Eucommia ulmoides distribution 5. The primary and secondary metabolic pathways of medicinal plants are affected by climate change, and the active components in most herbs are derived from secondary metabolic processes. Therefore, studying the impacts of climate change on the distribution and quality of medicinal plants is conducive to their sustainable development.
P. notoginseng is subject to continuous cropping obstacles: it needs to grow for at least 3 years to reach the composition standard for medicinal use, and the land then needs to lie fallow 6. Recently, the demand for

Handling of environmental variables
Current climate data (1970-2000) were used as the baseline, including 19 bioclimatic variables and monthly averages (12 months each) of temperature, solar radiation, and precipitation 14. The future periods used for climate data were the 2050s, 2070s, and 2090s. These climate data were taken from the Beijing Climate Center Climate System Model, Medium Resolution (BCC-CSM2-MR) of the Coupled Model Intercomparison Project Phase 6 (CMIP6), under two shared socioeconomic pathways (SSP126 and SSP585) 13,15. These data were obtained from the WorldClim database (http://www.worldclim.org/) at a spatial resolution of 2.5 arc-minutes (1 × 1 km).
The elevation, slope, and aspect were also obtained from the WorldClim database (http://www.worldclim.org/) at the same spatial resolution. Thirty-six soil variables were obtained from the Harmonized World Soil Database (http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/harmonized-world-soil-database-v12/en/) 16,17. In total, 94 environmental variables (Suppl. Material Table S2) were used in the model analysis for the current period. Terrain, soil, and climate data were not available for future scenarios. ArcGIS was used to extract and transform the data into the format required by MaxEnt 18.
Environmental variables with a percentage contribution > 0 were retained. When the Pearson correlation coefficient between two variables was > 0.80, the variable with the lower contribution rate was eliminated, and 10 environmental factors were selected for use in the model analysis.
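The screening rule described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes the rule means "drop non-contributing variables, then keep the higher contributor of any pair with |r| > 0.80", applies that greedily in order of contribution, and uses made-up sample values and contribution scores. The function names `pearson` and `filter_variables` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def filter_variables(values, contribution, threshold=0.80):
    """Keep contributing variables; of any highly correlated pair,
    keep only the one with the larger contribution (greedy, assumed)."""
    ranked = sorted(
        (name for name in values if contribution[name] > 0),
        key=lambda name: contribution[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(values[name], values[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five occurrence points (not from the study).
values = {
    "BIO_7": [1.0, 2.0, 3.0, 4.0, 5.0],
    "BIO_4": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly collinear with BIO_7
    "PREC_8": [5.0, 1.0, 4.0, 2.0, 3.0],
    "SLOPE": [1.0, 1.0, 2.0, 2.0, 3.0],
}
contribution = {"BIO_7": 40.0, "BIO_4": 10.0, "PREC_8": 30.0, "SLOPE": 0.0}
selected = filter_variables(values, contribution)
```

With these illustrative inputs, SLOPE is dropped for contributing nothing and BIO_4 is dropped because it is almost perfectly correlated with the higher-contributing BIO_7.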

Ecological regionalization model establishment
Prediction model and parameter optimization process
Approximately 75% of the P. notoginseng distribution points were randomly selected as the training set to establish the prediction model, and the remaining 25% were used as the test set to verify the model 19.
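A random 75/25 partition of occurrence records can be sketched as below. MaxEnt performs this split internally when a random test percentage is set; the standalone function here (`split_occurrences`, a hypothetical name) only illustrates the idea, and the coordinates are placeholders rather than the study's records.

```python
import random

def split_occurrences(points, train_fraction=0.75, seed=0):
    """Shuffle occurrence points reproducibly and split them into
    training and test subsets."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder (longitude, latitude) pairs, not real sampling sites.
points = [(100.0 + 0.1 * i, 23.0 + 0.05 * i) for i in range(100)]
train, test = split_occurrences(points)
```

Fixing the seed makes a single split reproducible; repeating the run with different seeds gives replicate partitions.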
The jackknife method was used to measure the importance of each variable, response curves of the environmental variables were created, and the model was projected onto future environmental conditions. The parameter setting used for this was the optimized setting, and the run was repeated 10 times 20,21.
In the MaxEnt default settings, the regularization multiplier (RM) is 1. MaxEnt has five feature classes: linear (L), quadratic (Q), product (P), threshold (T), and hinge (H) 22. Model parameter optimization can avoid overfitting to a certain extent 23. The delta corrected Akaike information criterion (DELTA AICc) and the omission rate (OR) were used to evaluate the degree of model optimization, with R scripts (run in RGui) used to test the different parameter combinations. DELTA AICc, a standard measure of the goodness of fit of a statistical model, was calculated in R. It balances model complexity against goodness of fit, and the model with the lowest DELTA AICc is preferred. OR is the proportion of test presence records that the model fails to predict as suitable at a given threshold; a small OR indicates high prediction accuracy. The OR therefore provides information on model discrimination and overfitting at a specific threshold. The model parameters were considered optimal when the OR was < 5% and the DELTA AICc was minimal 24,25.
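The selection rule (OR below 5%, then lowest DELTA AICc) can be illustrated with a short sketch. It uses the standard small-sample formula AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1); the candidate models, their log-likelihoods, parameter counts, and omission rates are hypothetical placeholders, and the authors' R scripts are not reproduced here.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion for k parameters and n records."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)

def select_model(candidates, n, max_omission_rate=0.05):
    """Return the candidate with the lowest delta AICc among those whose
    omission rate is below the cutoff, or None if none qualifies."""
    scored = [dict(c, aicc=aicc(c["log_likelihood"], c["k"], n)) for c in candidates]
    best_aicc = min(s["aicc"] for s in scored)
    for s in scored:
        s["delta_aicc"] = s["aicc"] - best_aicc
    eligible = [s for s in scored if s["omission_rate"] < max_omission_rate]
    return min(eligible, key=lambda s: s["delta_aicc"]) if eligible else None

# Hypothetical candidates (feature class, regularization multiplier, fit statistics).
candidates = [
    {"fc": "LQ", "rm": 1.0, "log_likelihood": -500.0, "k": 10, "omission_rate": 0.03},
    {"fc": "LQH", "rm": 0.5, "log_likelihood": -495.0, "k": 25, "omission_rate": 0.02},
    {"fc": "L", "rm": 3.0, "log_likelihood": -505.0, "k": 5, "omission_rate": 0.08},
]
chosen = select_model(candidates, n=100)
```

In this toy set the simplest model has the lowest AICc but fails the omission-rate cutoff, so the rule falls back to the best-scoring model that passes it. In the study itself, the selected models both passed the OR criterion and had a DELTA AICc of 0.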

Classification of suitable habitat levels and changes in core distribution
The Reclassify tool in ArcGIS was used to classify the ecological suitability of P. notoginseng. Habitat suitability was divided into four levels: unsuitable (0-0.1), poorly suitable (0.1-0.25), moderately suitable (0.25-0.5), and highly suitable (0.5-1) 13. Four different colors were used to distinguish the habitats of each level.
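The four-level reclassification can be expressed as a simple lookup on the class breaks given above. This is only a sketch of the thresholding logic, not the ArcGIS tool; how values exactly on a break (for example 0.25) are assigned is not stated in the text, so assigning them to the higher class is an assumption, and `suitability_class` is a hypothetical name.

```python
from bisect import bisect_right

BREAKS = [0.1, 0.25, 0.5]  # class boundaries from the text
LABELS = ["unsuitable", "poorly suitable", "moderately suitable", "highly suitable"]

def suitability_class(probability):
    """Map a MaxEnt suitability value in [0, 1] to one of the four levels.
    Values equal to a break go to the higher class (assumed convention)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("suitability must lie in [0, 1]")
    return LABELS[bisect_right(BREAKS, probability)]
```

For example, `suitability_class(0.3)` falls in the 0.25-0.5 band and is labeled moderately suitable.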
SDMtoolbox v2.4 in ArcGIS was used to convert the MaxEnt outputs into binary (suitable/unsuitable) maps, reduce the suitable habitats of each period to a centroid, and then analyze the trajectory and distance of centroid movement 26.
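The centroid-shift idea can be sketched independently of the GIS toolbox. The example below thresholds a list of grid cells into suitable/unsuitable, averages the coordinates of the suitable cells, and measures the great-circle distance between two centroids. It is a simplified stand-in, not the toolbox's algorithm: the unweighted coordinate mean ignores differences in cell area, the threshold is a free parameter here, and the cell values are invented.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def binary_centroid(cells, threshold):
    """Mean (lon, lat) of cells whose suitability is at or above the threshold.
    Each cell is a (lon, lat, suitability) tuple."""
    suitable = [(lon, lat) for lon, lat, score in cells if score >= threshold]
    if not suitable:
        raise ValueError("no suitable cells at this threshold")
    n = len(suitable)
    return (sum(lon for lon, _ in suitable) / n, sum(lat for _, lat in suitable) / n)

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# Invented cells for two periods (lon, lat, suitability).
current = [(100.0, 20.0, 0.6), (102.0, 22.0, 0.3), (110.0, 30.0, 0.05)]
future = [(101.0, 20.0, 0.6), (103.0, 22.0, 0.3), (110.0, 30.0, 0.05)]
shift_km = haversine_km(binary_centroid(current, 0.1), binary_centroid(future, 0.1))
```

Comparing centroids period by period in this way gives the displacement trajectory of the overall suitable area.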
P. notoginseng saponins (PNS) were determined as reported by Qin et al. 31 and Liu 29. A UV 2800 ultraviolet-visible spectrophotometer (Sunny Hengping Instrument, Shanghai, China) was used to scan the 190-700 nm range, and the maximum absorption wavelength of 546 nm was selected as the determination wavelength. The linear relationship has been investigated previously 32,33.
Correlation and stepwise regression analyses were carried out between environmental factors and the contents of notoginseng saponin R1; ginsenosides Rg1, Re, Rb1, and Rd; the sum of the five saponins; and PNS. Models relating chemical constituents to environmental factors were established. Where no regression equation could be established with the environmental variables, the kriging spatial interpolation method was used instead 34. The chemical composition distribution map obtained from the model was superimposed on the reclassified ecological suitability map to obtain a quality regionalization map 30.

Analysis of UPFW of a single plant
The UPFW of P. notoginseng samples from each sampling site was measured using an electronic scale with 0.01 g accuracy, and the average value for each sampling site was used for analysis. The UPFW data and environmental factors were analyzed using stepwise regression, and a model relating UPFW to environmental factors was established. Where no regression equation could be established with the environmental variables, the kriging spatial interpolation method was used instead. The UPFW distribution map obtained from the model was superimposed on the reclassified ecological suitability map to obtain a quality regionalization map 35.

Credibility evaluation
The predictive accuracy of the MaxEnt model was measured using the area under the curve (AUC) 36. AUC values range from 0 to 1, with values of 0-0.5 indicating model prediction failure. AUC values of 0.6-0.7, 0.7-0.8, 0.8-0.9, and > 0.9 indicate poor, fair, good, and excellent predictive performance, respectively 37. For the current period, 1,160 candidate models were generated during parameter optimization, of which 193 met the OR criterion and 3 met the AICc criterion. However, only one statistically significant model met both criteria. The parameters of this model were FC = LPT and RM = 2.4. Its DELTA AICc was 0 and its AUC value was 0.965 ± 0.005 (Suppl. Material Fig. S2). For the future period, 1,160 candidate models were likewise generated, of which 400 met the OR criterion and 1 met the AICc criterion. Again, only one statistically significant model met both criteria. The parameters of this model were FC = LP and RM = 0.1. Its DELTA AICc was 0 and its AUC value was 0.9640 ± 0.008 (Suppl. Material Fig. S3). These results indicate a good model fit and reliable predictions.
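As a rough illustration of the AUC scale, the sketch below computes AUC as the probability that a presence point scores higher than a background point (the rank-based definition, with ties counted as one half) and maps the result to the bands quoted above. The scores are invented, the 0.5-0.6 interval is not rated in the text and is reported as such, and the function names are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs in which the
    presence point has the higher score (ties count as 0.5)."""
    wins = sum(
        (p > b) + 0.5 * (p == b)
        for p in presence_scores
        for b in background_scores
    )
    return wins / (len(presence_scores) * len(background_scores))

def auc_rating(value):
    """Map an AUC value to the qualitative bands given in the text."""
    if value <= 0.5:
        return "prediction failure"
    if value < 0.6:
        return "not rated in the text"
    if value < 0.7:
        return "poor"
    if value < 0.8:
        return "fair"
    if value <= 0.9:
        return "good"
    return "excellent"

# Invented suitability scores, for illustration only.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Under this scale, the reported values of 0.965 and 0.9640 both fall in the excellent band.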

Environmental variable analysis
The environmental variables were filtered as described in Section "Handling of environmental variables", and 14 environmental variables were obtained for modeling. A heat map of the correlations between these variables was generated (Fig. 1B and Suppl. Material Table S3). The permutation importance and percentage contribution of the 14 environmental variables in the model were compiled (Suppl. Material Table S4). In summary, BIO_7, SRAD_6, SRAD_7, SRAD_10, and PREC_8 were the key factors limiting P. notoginseng growth and development; their cumulative permutation importance and percentage contribution were 90.1 and 89.6, respectively.
According to the jackknife test results, the temperature, precipitation, and solar radiation variables had regularized training gain and test gain values greater than 1.1, with AUC > 0.9 (Fig. 2). These results suggest that environmental variables affect the geographical distribution of medicinal plants.

Potential suitable habitats and optimal environmental conditions of P. notoginseng in the current period
In the current period, the total suitable habitat area for P. notoginseng in China is 924,676.79 km² (Fig. 3A). Of this, the area of highly suitable habitats was 332,953.01 km², distributed in southeastern Yunnan Province, southern Guizhou Province, and eastern Sichuan Province. The area of moderately suitable habitats was 400,844.21 km², distributed in central and western Yunnan Province, most of Guizhou Province, northwestern Guangxi Province, southeastern Sichuan Province, and Chongqing City. The area of poorly suitable habitats was 190,879.57 km², distributed in the area surrounding the moderately suitable habitats. Moderately and poorly suitable habitats surrounded and radiated around highly suitable habitats (Fig. 3B). This is consistent with the results reported by Zhan et al. 13.
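As a quick consistency check, the areas reported above for the three suitable classes add up to the stated total:

```latex
\begin{aligned}
A_{\text{high}} + A_{\text{moderate}} + A_{\text{poor}}
  &= 332{,}953.01 + 400{,}844.21 + 190{,}879.57 \\
  &= 924{,}676.79~\text{km}^2
\end{aligned}
```

Highly suitable habitat therefore accounts for roughly 36% of the total suitable area, moderately suitable habitat for about 43%, and poorly suitable habitat for about 21%.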
According to the response curves of the environmental variables, conditions were most favorable for P. notoginseng growth at an annual temperature range of 10-15 ℃; an average precipitation in August of 500-600 mm; and an average solar radiation in June, July, and October of 1500-1600, 18,500-19,500, and 8000-9000 kJ·m⁻²·day⁻¹, respectively.

Changes of P. notoginseng suitable habitats in the future
Examination of the distribution of suitable P. notoginseng habitats in different future periods and climate scenarios revealed that, in the SSP126 model, the total area of suitable habitats first expanded, then contracted, and then expanded again (Fig. 4). Among these, the largest suitable habitats appeared in the 2070s (1,648,892.29 km²), which might be because of the increase in the area of moderately and highly suitable habitats of P. notoginseng in this period. In the SSP585 model, the total area of suitable habitats first expanded and then contracted. The largest suitable habitats appeared in the 2090s in the SSP126 model (1,732,954.80 km²), which might be owing to the relatively balanced proportions of poorly, moderately, and highly suitable habitats of P. notoginseng in this period. The changes in area for each level of suitable habitat are presented in Fig. 3A. Changes in the distribution, expansion, shrinkage, and stability of ecological regionalization in future periods and climate scenarios were compared with those in the current period (Table 1).

Prediction of changes in P. notoginseng saponins
Saponin content in P. notoginseng
Analyses of 300 samples from 30 production areas are presented in Suppl. Material Fig. S4. The notoginseng saponin R1 content of 179 samples was 0.5-1.5%. The ginsenoside Rg1 content of 127 samples was 2.5-4.0%. The ginsenoside Re content of 142 samples was < 0.5%. The ginsenoside Rd content of 140 samples was 0.5-1.0%. The sum of the five saponins in the samples was 7.5-11.5%. The PNS content of 122 samples was 6.0-10.5%.

Construction of a regression model
The contents of notoginseng saponin R1; ginsenosides Rg1, Re, Rb1, and Rd; the sum of the five saponins; and PNS from 30 regions were measured, and stepwise regression analysis against the environmental variables was performed to obtain the corresponding relationship models.
The relationship model between notoginseng saponin R1 and environmental variables was:
$$Y_1 = 1.709 - 0.073X_1 + 0.043X_2 \qquad (1)$$
where $Y_1$ is the notoginseng saponin R1 content, $X_1$ is the average precipitation in October, and $X_2$ is the average precipitation in June. In the regression equation F-test, P = 0.012 (< 0.05), F = 1.026, and $R^2$ = 0.280, indicating that the equation was statistically significant.
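To make the fitted relationship concrete, Eq. (1) can be evaluated directly. The sketch below simply encodes the reported coefficients; the input precipitation values are hypothetical, and the units of the inputs and output are those of the authors' variables, which are not restated in this section. The function name `predict_r1` is an illustrative choice.

```python
def predict_r1(prec_october, prec_june):
    """Evaluate Eq. (1): Y1 = 1.709 - 0.073 * X1 + 0.043 * X2,
    where X1 and X2 are the average precipitation in October and June."""
    return 1.709 - 0.073 * prec_october + 0.043 * prec_june

# Hypothetical inputs, chosen only to show the arithmetic.
example_r1 = predict_r1(prec_october=20.0, prec_june=10.0)
# 1.709 - 1.460 + 0.430 = 0.679
```

The signs of the coefficients mean that, in this model, higher October precipitation lowers the predicted R1 content while higher June precipitation raises it. With $R^2$ = 0.280, the equation explains only a modest share of the variation, so predictions of this kind indicate trends rather than precise values.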

The relationship model between ginsenoside Rg1 and environmental variables was:
where $Y_2$ is the Rg1 content; $X_1$ is the average precipitation in October; $X_2$ is the average precipitation in June; $X_3$ is the mean diurnal range; $X_4$ is the temperature seasonality; $X_5$ is the annual temperature range; $X_6$ is the average precipitation in August; $X_7$, $X_8$, and $X_9$ are the average solar radiation in October, June, and July, respectively; $X_{10}$ is the average temperature in April; and $X_{11}$ is the slope. In the regression equation F-test, P = 0.033 (< 0.05), F = 2.631, and $R^2$ = 0.617, indicating that the equation was statistically significant.
The relationship model between ginsenoside Rb1 and environmental variables was:
where $Y_3$ is the Rb1 content; $X_1$ is the average precipitation in October; $X_2$ is the average precipitation in June; $X_3$ is the mean diurnal range; $X_4$ is the temperature seasonality; $X_5$ is the annual temperature range; $X_6$ is the average precipitation in August; $X_7$, $X_8$, and $X_9$ are the average solar radiation in October, June, and July, respectively; $X_{10}$ is the average temperature in April; and $X_{11}$ is the slope. In the regression equation F-test, P = 0.027 (< 0.05), F = 2.770, and $R^2$ = 0.629, indicating that the equation was statistically significant.

The relationship model between ginsenoside Rd and environmental variables was:
where $Y_4$ is the Rd content, $X_4$ is the temperature seasonality, $X_{10}$ is the average temperature in April, and $X_{11}$ is the slope. In the regression equation F-test, P = 0.000 (< 0.05), F = 15.396, and $R^2$ = 0.640, indicating that the equation was statistically significant.
The relationship model between the sum of five saponins and environmental variables was:
where $Y_5$ is the sum of the five saponins, $X_1$ is the average precipitation in October, $X_4$ is the temperature seasonality, and $X_6$ is the average precipitation in August. In the regression equation F-test, P = 0.034 (< 0.05), F = 3.359, and $R^2$ = 0.279, indicating that the equation was statistically significant.
The stepwise regressions of ginsenoside Re and of PNS content against the environmental variables were not significant. Hence, no regression equation was established for either.

Distribution analysis of P. notoginseng saponins
Based on the regression equations established between saponin components and environmental variables, together with the kriging spatial interpolation method, ArcGIS was used for spatial calculations to obtain the spatial distribution of notoginseng saponin R1; ginsenosides Rg1, Re, Rb1, and Rd; the sum of five saponins; and PNS content (Fig. 5).

Spatial distribution analysis of UPFW of P. notoginseng
Stepwise regression analysis of the measured UPFW from 30 regions against the environmental variables yielded the corresponding relationship model. The relationship between fresh weight and environmental variables was as follows:
where $Y_6$ is the fresh weight and $X_1$ is the average precipitation in October. In the regression equation F-test, P = 0.045 (< 0.05), F = 4.399, and $R^2$ = 0.136, indicating that the equation was statistically significant.
Based on the regression equation established between the UPFW and environmental variables, ArcGIS was used for spatial calculations and then overlaid with the ecological suitability habitat map to obtain a distribution map of the UPFW of P. notoginseng (Fig. 5).
Comparison of the UPFW and PNS distribution maps revealed a significant inverse relationship between them. In Yunnan Province, UPFW was lower in low-elevation areas and in areas with high saponin content. Among these areas, Maguan, Wenshan, and Pingbian, in Wenshan and Honghe Prefectures, had both higher UPFW and higher saponin content, and are therefore candidates for expanded P. notoginseng cultivation.

Model precision analysis
Recently, machine learning-based modeling methods have been widely used to predict the ecological distribution of many species 38. MaxEnt is one of the most widely used and recognized models 39. It not only analyzes climate, soil, terrain, and other environmental factors, but can also take the impact of human activities into account in the modeling process. When predicting species distribution on a small scale, human factors such as vegetation cover and the urbanization index can be added to the model to make the prediction more consistent with the actual situation. In this study, each grid cell was limited to a single species distribution point, environmental variables likely to cause overfitting were removed, and R was used for parameter optimization. The AUC values of the optimized model for both the training and test sets were > 0.9, indicating improved model accuracy and reliability.

Migration of suitable habitats for P. notoginseng under climate change conditions
Four scenarios are available for BCC-CSM2-MR: SSP126, SSP245, SSP370, and SSP585. Among them, SSP126 and SSP585 differ markedly and are representative. Therefore, these two scenarios were chosen as the focus of this research 40. The distribution of suitable habitats for P. notoginseng was most closely related to annual temperature range, solar radiation, and precipitation. P. notoginseng grows in shaded and humid environments, which was confirmed by the environmental factors obtained from the model. Under the general trend of global warming, the minimum temperature will increase and solar radiation will intensify, so the future suitable habitats of P. notoginseng will be strongly affected and will change 41.
The predicted distribution of potentially suitable habitats for P. notoginseng in this study differed from that reported by Zhan et al. 13. The number of species distribution records input into MaxEnt and the selection of environmental variables affect the modeling performance and output 42. The ecological regionalization area in this study was larger than that in the previous study; however, the suitable habitat conditions of the covered areas were consistent. This might be because the distribution of P. notoginseng in Chongqing City was covered and the overall collection range of the sampling points was more extensive in this study. The distribution area of P. notoginseng in China has a subtropical monsoon climate. The rain and heat brought about by this climate are conducive to the growth of P. notoginseng and the accumulation of its effective components 43. The southeastern part of Yunnan Province and the southern part of Guizhou Province lie on the Yungui Plateau. Ocean currents from the Pacific Ocean can directly affect this area, resulting in a hot and humid climate that makes it the best growth area for P. notoginseng. However, western Yunnan and northern Guizhou are blocked by plateaus and mountains; the influence of ocean currents is greatly reduced, thereby lowering the suitability index of P. notoginseng. Although the genuine producing areas of P. notoginseng are Yunnan and Guangxi Provinces, the change in the distribution core occurred in Guizhou Province. This is because the position of the centroid was obtained by a binary operation on all suitable habitats rather than on highly suitable habitats only 44. Therefore, the centroid displacement reflected changes in the overall suitable habitats of P. notoginseng during different periods, which was also more consistent with reality.
According to ancient Chinese books, Panax notoginseng was originally cultivated in Guangxi Province and later moved to Yunnan Province. During the Qing Dynasty, Notoginseng Radix et Rhizoma from Guangxi and Sichuan Provinces was presented to the emperor; however, researchers did not find any records of P. notoginseng cultivation in Sichuan Province. Researchers will continue to expand the collection of historical distribution and climate records of P. notoginseng in China, which will be of great importance for studying P. notoginseng distribution.

Correlation analysis of saponin content and UPFW with environmental factors
In the quality regionalization analysis of P. notoginseng saponins and UPFW, the prediction scope was narrowed to Yunnan Province because the 37 sampling points were concentrated there. After the quality data were correlated with the environmental variable data of Yunnan Province, the quality regionalization prediction was more representative and accurate. TBtools v1.115 was used to draw the correlation heat maps. The correlation analysis of environmental factors and quality indices revealed that notoginseng saponin R1 was significantly positively correlated with TAVG_4 (Fig. 6). Ginsenoside Rd was positively correlated with temperature seasonality and negatively correlated with SRAD_10 and TAVG_4. UPFW was significantly and positively correlated with PREC_10 and PREC_8. Since the sampling points were concentrated in eastern Yunnan Province, quality regionalization could only predict changing trends. Given the distribution and number of sampling points, the quality regionalization of P. notoginseng could be further improved. Although we attempted to use multiple models for joint predictions, these shortcomings cannot be ignored. In addition, saponins are the main active components of P. notoginseng and play an important role in the treatment of trauma and cardiovascular and cerebrovascular diseases. In this study, the content of five saponins changed with the environment, which lays a foundation for the study of P. notoginseng quality and human health under climate change.
In this study, we predicted the potential distribution area and future changes of P. notoginseng to a certain extent, which has guiding significance for expanding its cultivation in China.The ecological regionalization was innovatively combined with chemical composition and UPFW analysis, to intuitively present the change trend, which has a certain reference value in clinical medicine and the development of precise Chinese medicine.

Conclusions
In this study, the MaxEnt model with optimized parameters was used to evaluate and predict the suitable habitat distribution for P. notoginseng. Based on the analysis results for the current period, the optimal P. notoginseng growth conditions were an annual temperature range of 10-15 ℃; an average precipitation in August of 500-600 mm; and an average solar radiation in June, July, and October of 1500-1600, 18,500-19,500, and 8000-9000 kJ·m⁻²·day⁻¹, respectively. Highly suitable habitats were mainly distributed in central and eastern Yunnan Province, southwestern Guizhou Province, and Chongqing City. Other suitable habitats radiated around them and were distributed throughout southwestern China. In the future, owing to climate change, the distribution core of P. notoginseng could shift to the southeast. Based on the analyses of saponin composition and single-plant UPFW, saponin content decreased from southeastern to northwestern Yunnan Province and was the highest in Honghe and Wenshan Prefectures, Yunnan Province. The UPFW was inversely related to saponin content. The results of this study provide a reference for the planting status and prediction of suitable habitats for P. notoginseng. They lay a solid foundation and offer practical guidance for promoting P. notoginseng yield and quality.

Figure 1. (A) Panax notoginseng distribution in China. (B) Correlation heat map of environmental variables.

Figure 3. Changes in the distribution of Panax notoginseng. (A) Area of P. notoginseng suitable habitats under different periods and climate scenarios. (B) Ecological regionalization distribution of P. notoginseng in the current period. (C) Centroid displacement of P. notoginseng ecological regionalization.

Figure 6. Correlation heat map of environmental variables and quality indices of Notoginseng Radix et Rhizoma.

Table 1. Stable, shrunken, and increased areas (km²) of suitable habitat for Panax notoginseng in the future.