Untargeted Metabolomics Analysis Revealed the Difference of Component and Geographical Indication Markers of Panax notoginseng in Different Production Areas

Panax notoginseng (P. notoginseng) is valued as both a medicine and a food. However, P. notoginseng carrying a protected origin label has become a target of fraud, because its true origin is easily confused or concealed. In this study, an untargeted nuclear magnetic resonance (NMR)-based metabolomics approach was used to discriminate the geographical origins of P. notoginseng from four major producing areas in China. Fifty-two components, including various saccharides, amino acids, saponins, organic acids, and alcohols, were identified and quantified from the NMR spectra, and the area-specific geographical indication components were further screened. P. notoginseng from Yunnan was characterized by high acetic acid, dopamine, and serine contents, components associated with hypoglycemic and cardiovascular protective effects, while P. notoginseng from Sichuan had a high content of fumarate, which has been linked to benefits in diseases of the nervous system. P. notoginseng from Guizhou and Tibet had high contents of malic acid, notoginsenoside R1, and amino acids. Our results can help to distinguish the geographical origin of P. notoginseng and can inform nutritional recommendations for human consumption.


Introduction
Panax notoginseng (P. notoginseng) is a regional medicinal herb with powerful abilities to promote blood circulation and relieve pain [1]. Its distribution in China is mainly concentrated in the southwest, including Yunnan, Guangxi, and Sichuan provinces. P. notoginseng contains a wide range of effective nutritional components, such as P. notoginseng saponins, flavonoids, amino acids, volatile oils, plant box alcohols, sugars, inorganic salts, inorganic ions, and other active components, and thus exerts extensive pharmacological effects and has many clinical uses [2]. Saponins, the main active components of P. notoginseng, can stimulate the brain center, promote blood circulation, and enhance memory [3,4]. Flavonoids have been shown to reduce vascular fragility, improve vascular permeability, lower blood lipids and cholesterol, and prevent and treat senile hypertension and cerebral hemorrhage [5]. P. notoginseng must be grown in warm, shady, and humid conditions to ensure high-quality production. Wenshan county in Yunnan Province of China is an important P. notoginseng producing area, where it enjoys a good reputation among consumers for its high quality. This reputation is closely related to the unique environmental conditions, such as the abundant rainfall and consistent annual temperature.
P. notoginseng was provided by the Technology Center of Xiamen Customs and collected in local markets from July to September 2020 in Southwest China, including Yanshan County, Wenshan City, Yunnan Province; Nanyang District, Guiyang City, Guizhou Province; Qingchuan County, Guangyuan City, Sichuan Province; and Naidong County, Shannan District, Tibet Autonomous Region (12 P. notoginseng samples per producing area, a total of 48 samples). All the P. notoginseng samples were ground into powder with a hammer mill (GH-20B, Jiangyin Kejia Machinery Manufacturing Co., Ltd., Jiangyin, China), dried in a ventilated oven at 70 °C for 24 h, and then sifted with a 60-mesh screen and stored in sealed plastic bags in the dark at 20 °C.
The powder samples of P. notoginseng were extracted by the Bligh-Dyer method [21,22] to obtain the freeze-dried powder. In brief, P. notoginseng powder samples (300 mg) were homogenized at 4 °C for 30 s in 1.2 mL methanol and 0.6 mL deionized water. The homogenates were transferred to a 10 mL glass tube, and 1.2 mL of chloroform and 1.2 mL of deionized water were added to each tube, which was then vortexed for 60 s. After resting on ice for 15 min, the samples were centrifuged at 4 °C and 10,000× g for 10 min. All supernatants were transferred to 5.0 mL tubes, frozen, blown with nitrogen, and finally lyophilized for 24 h to remove the methanol, chloroform, and water. The extracts were stored at −80 °C for NMR experiments.

1H-NMR Spectra Acquisition and Processing
The NMR spectra of all P. notoginseng samples were acquired on an 850 MHz Bruker Avance III NMR spectrometer (Bruker Corporation, Karlsruhe, Germany) equipped with a CPTCI probe operating at 850.13 MHz. The experiments were carried out with the ZGPR pulse sequence at 298 K. The other parameters were set as follows: spectral width of 20 ppm, 16 K data points, 64 scans, 8 dummy scans, relaxation delay of 4 s, and acquisition time of 1.58 s. In addition, three samples from each producing area were selected to collect a range of standard 2D NMR spectra, including COSY, TOCSY, HSQC, and HMBC, to facilitate the subsequent accurate assignment of components in P. notoginseng.
The original data of P. notoginseng were preprocessed with MestReNova software (V9.0.1, Mestrelab Research, Santiago de Compostela, Galicia, Spain). Before Fourier transformation, all free induction decays were zero-filled to 64 K and processed by applying an exponential function with a line-broadening factor of 1.0 Hz, followed by phase and baseline correction, and the entire spectrum was referenced to TSP at δ0.00. The baseline regions of the spectrum and the residual water resonance (δ4.66-4.92) were removed to eliminate the baseline effects of the water signal. The spectral interval of δ0.6-9.6 was then sectionally integrated with an integral width of 0.002 ppm and normalized to a total area of 100.
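The binning and normalization step described above can be illustrated with a minimal sketch. It is not the MestReNova procedure itself; the function name `bin_and_normalize` and its arguments are hypothetical, and the input is assumed to be a processed spectrum given as matching lists of chemical shifts (ppm) and intensities.

```python
def bin_and_normalize(ppm, intensity, lo=0.6, hi=9.6, width=0.002,
                      exclude=(4.66, 4.92), total=100.0):
    """Sum intensities into fixed-width bins over [lo, hi), skip the
    excluded water region, and scale the bins to a fixed total area.

    Hypothetical helper: the defaults mirror the values quoted in the
    text (0.6-9.6 ppm, 0.002 ppm bins, water at 4.66-4.92 ppm, total 100).
    """
    n_bins = int(round((hi - lo) / width))
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if not (lo <= x < hi):
            continue  # outside the integrated spectral interval
        if exclude[0] <= x <= exclude[1]:
            continue  # residual water region is dropped
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx] += y
    area = sum(bins)
    if area == 0:
        raise ValueError("no signal left after exclusion")
    # Normalize so that the bins sum to the requested total area.
    return [b * total / area for b in bins]


# Toy spectrum: three peaks plus a large water signal that is excluded.
toy_ppm = [1.0, 2.0, 4.8, 7.0]
toy_intensity = [1.0, 3.0, 50.0, 1.0]
normalized = bin_and_normalize(toy_ppm, toy_intensity)
```

Normalizing to a constant total area removes differences in overall sample concentration, so the multivariate models compare relative rather than absolute signal intensities.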

Statistical Analysis
The raw integral data were imported into SIMCA 14.1 (Umetrics, Umeå, Sweden) software for multivariate statistical analysis, including unsupervised principal component analysis (PCA) for explaining natural distribution trends, and supervised partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA) for the identification of geographical indication components in P. notoginseng corresponding to the different producing areas. The parameters R² and Q² were used to evaluate the fitting performance and predictive ability of the model, respectively [23]. A permutation test with 900 permutations was then conducted to check whether the models were overfitted. Moreover, cross-validated analysis of variance (CV-ANOVA) was performed to assess the significance of the models. p < 0.05 was considered to be statistically significant in all experiments.
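As a simplified illustration of how the two model parameters differ, the sketch below computes both from the same formula, 1 − (residual sum of squares)/(total sum of squares). R²Y uses the fitted values, whereas Q² uses predictions made for samples held out during cross-validation. The function name and the toy numbers are hypothetical, and SIMCA's component-wise cumulative calculation is not reproduced here.

```python
def explained_fraction(observed, predicted):
    """Return 1 - RSS/TSS for a response vector.

    With fitted values this is R2Y; with cross-validated predictions
    (samples predicted while left out of the model) it is Q2.
    """
    mean = sum(observed) / len(observed)
    tss = sum((y - mean) ** 2 for y in observed)
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - rss / tss


# Hypothetical class-membership response (1 = target origin, 0 = others).
y_true = [1, 1, 0, 0]
y_fitted = [0.9, 0.8, 0.1, 0.2]   # model fitted on all samples
y_cv = [0.7, 0.6, 0.3, 0.4]       # predictions for held-out samples

r2y = explained_fraction(y_true, y_fitted)
q2 = explained_fraction(y_true, y_cv)
```

Because held-out predictions are normally worse than fitted values, Q² is expected to be lower than R²Y, and a large gap between the two is a common sign of overfitting.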
Finally, a four-dimensional volcano plot was generated with MATLAB scripts (downloaded from http://www.mathworks.com (accessed on 27 June 2020)) with some in-house modifications to screen the geographical indication components of P. notoginseng from different geographical origins by combining univariate and multivariate statistical analyses. The univariate statistical analyses used the fold-change value and Student's t-test, where the fold-change was the concentration ratio of each component in P. notoginseng between one specific producing area and the three other areas, and the t-test yielded p-values to assess the significance of the change in each component. The four-dimensional volcano plot, a scatter plot of −log10 (p-value) against log2 (fold-change), was applied to identify the distinguishing metabolites between pairwise groups. The absolute correlation coefficient values |r| and the VIP values from the multivariate analysis served as two further indicators, represented by circle color and circle size in the volcano plot, respectively (a warmer color indicates a higher |r|, and a larger circle indicates a higher VIP value). The components that exhibited significant changes were screened out by combining three criteria: p < 0.05, |r| > 0.5, and VIP values in the top 10%. These components lie above the horizontal threshold line p = 0.05 and tend to be located in the upper zones of the plots, with larger circle sizes and warmer colors. On this basis, the components with at least a 1.2-fold increase in content and a correlation coefficient |r| greater than 0.65 were defined as geographical indication markers.
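The screening logic described above can be sketched as follows. This is a minimal illustration rather than the MATLAB scripts used in the study. The function `volcano_screen` and the dictionary keys are hypothetical, p-values are assumed to be supplied by a t-test computed elsewhere, and the p-values, VIP values, and the fumarate r value in the example are invented for illustration. Only the fold-changes (1.66 and 0.69) and the dopamine |r| (0.70) are taken from the text.

```python
import math


def volcano_screen(components, p_max=0.05, r_min=0.5, vip_top=0.10,
                   fc_marker=1.2, r_marker=0.65):
    """Compute volcano-plot coordinates and apply the screening criteria.

    Each component is a dict with keys: name, fc (fold-change), p, r, vip.
    """
    # VIP cutoff: the smallest VIP that still falls in the top fraction.
    n_top = max(1, math.ceil(vip_top * len(components)))
    vip_cut = sorted((c["vip"] for c in components), reverse=True)[n_top - 1]
    rows = []
    for c in components:
        significant = (c["p"] < p_max
                       and abs(c["r"]) > r_min
                       and c["vip"] >= vip_cut)
        marker = (significant
                  and c["fc"] >= fc_marker
                  and abs(c["r"]) > r_marker)
        rows.append({
            "name": c["name"],
            "x": math.log2(c["fc"]),     # horizontal axis: log2(fold-change)
            "y": -math.log10(c["p"]),    # vertical axis: -log10(p-value)
            "significant": significant,
            "marker": marker,
        })
    return rows


# Hypothetical input: two components of interest plus 18 unremarkable ones.
example = [
    {"name": "dopamine", "fc": 1.66, "p": 0.001, "r": 0.70, "vip": 2.1},
    {"name": "fumarate", "fc": 0.69, "p": 0.010, "r": -0.60, "vip": 1.9},
]
example += [
    {"name": f"filler_{i}", "fc": 1.0, "p": 0.5, "r": 0.1, "vip": 0.5}
    for i in range(18)
]
screened = volcano_screen(example)
markers = [row["name"] for row in screened if row["marker"]]
```

In this toy input, fumarate passes the three significance criteria but is not flagged as a marker, because its fold-change is below 1 and it therefore lies on the negative side of the horizontal axis.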
In addition, the absolute concentration of each component in P. notoginseng was quantified by comparing the integral of its characteristic peak with that of the internal standard (TSP) [14]:

$$C_x = \frac{A_x}{A_{TSP}} \times \frac{N_{TSP}}{N_x} \times C_{TSP}$$

where $C_x$ is the molar concentration of any component x in P. notoginseng, in mol/L; $A_x$ is the integral area of the characteristic peak of component x in the 1H-NMR spectrum; $N_x$ is the number of hydrogens contributing to that NMR signal; $N_{TSP}$ is the number of hydrogens corresponding to the singlet at δ0.00 (here, $N_{TSP}$ = 9); $A_{TSP}$ is the integral area of TSP at δ0.00; and $C_{TSP}$ is the molar concentration of TSP in P. notoginseng, in mol/L.
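As a worked example of the internal-standard relation defined by these symbols, C_x = (A_x / A_TSP) × (N_TSP / N_x) × C_TSP, the short sketch below evaluates it for one peak. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def concentration(a_x, n_x, a_tsp, c_tsp, n_tsp=9):
    """Molar concentration of component x from its peak integral.

    a_x, a_tsp : integral areas of the component peak and the TSP singlet
    n_x, n_tsp : numbers of hydrogens behind each signal (9 for TSP)
    c_tsp      : molar concentration of TSP, in mol/L
    """
    return (a_x / a_tsp) * (n_tsp / n_x) * c_tsp


# Hypothetical values: a one-proton signal with integral 2.0, compared with
# a TSP singlet of integral 9.0 at 1.0e-3 mol/L.
c_example = concentration(a_x=2.0, n_x=1, a_tsp=9.0, c_tsp=1.0e-3)
```

Here the component signal per proton is twice the TSP signal per proton (2.0 versus 9.0/9), so the component concentration is twice that of TSP, about 2.0 × 10⁻³ mol/L.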

1H-NMR Analysis of P. notoginseng from Different Areas
The high-resolution one-dimensional 1H-NMR spectrum provided a clear metabolic overview of P. notoginseng from different producing areas, so the differences in composition could be visualized through the peaks of the different chemical components in the normalized 1H-NMR spectra. The stacked NMR spectra of P. notoginseng from the four areas, Yunnan (YN), Guizhou (GZ), Sichuan (SC), and Tibet (TB), are compared in Figure 1. The spectra were assigned according to chemical shift, peak multiplicity, and related literature data [24], and further confirmed by 2D NMR (COSY, TOCSY, HSQC, and HMBC NMR spectra), the public Biological Magnetic Resonance Data Bank (http://bmrb.protein.osaka-u.ac.jp/deposit (accessed on 10 June 2020), BMRB), and the Traditional Chinese Medicine Systems Pharmacology Database (http://tcmspw.com/tcmsp.php (accessed on 10 June 2020), TCMSP) [25-27]. Fifty-two components were assigned from the 1H-NMR spectra, and three signals were not identified. Table 1 lists the spectral information in detail, including chemical shift and peak multiplicity. By comparing the integral of the characteristic peak with that of the internal standard (TSP), the concentration of each component in P. notoginseng from the different producing areas was obtained. Low signal-to-noise (S/N) ratios and obvious peak overlap could hinder the correct quantification of the components in P. notoginseng. Therefore, only the resonances of each component that were well resolved, or at least dominant in their local regions, were chosen as characteristic resonances (underlined peaks in Table 1) for quantification. In addition, S/N = 10 was set as the critical threshold for quantification. In total, 52 components across the four geographical origins of P. notoginseng were quantified. The quantitative results are tabulated in Table 1.
The P. notoginseng from the different producing areas displayed similar metabolic profiles, implying that the nutritional compositions were similar. The main components were carbohydrates and saponins, but it was difficult to distinguish P. notoginseng from the four main producing areas only by comparing the NMR peaks of the carbohydrates and saponins, because their relative proportions were similar. Further analysis showed, however, that the signals of low-content components in the P. notoginseng from different producing areas varied considerably, and thus made greater contributions to geographical indication. For example, the contents of fumarate (δ6.53) in P. notoginseng from SC and uridine (δ5.89) from TB were remarkably higher than in the other three producing areas, while the pyridoxine content (vitamin B6, δ7.67) of GZ P. notoginseng was clearly lower than in the other producing areas. Nevertheless, it is difficult to distinguish the origin of P. notoginseng solely by visual comparison of the spectra. Thus, pattern recognition methods were used to identify the geographical indicators of P. notoginseng in the four producing areas.

Geographical Origin Discrimination of P. notoginseng
PCA can identify possible outliers in samples, and it helped visualize the differences and similarities of P. notoginseng from the different producing areas in this study. A PCA score plot (Figure 2, left panel) showed the clustering of samples from the same geographical origin, although overlaps were present between different producing areas. For example, the P. notoginseng sample points from YN, SC, and TB were adjacent and overlapped with each other but were distinguished from the samples of GZ. The corresponding PLS-DA score plot (Figure 2, right panel) showed the separation of the P. notoginseng samples from the different producing areas with good fitting performance (R²Y = 0.862) and favorable predictive ability (Q² = 0.676). In general, the samples of P. notoginseng from the four producing areas were roughly distributed in four regions of the PLS-DA model. However, a slight overlap existed between the samples from SC, YN, and TB, suggesting similar but distinguishable nutrient compositions. YN, SC, and TB are geographically adjacent to each other, resulting in similar compositional characteristics of P. notoginseng, while the differences in climate, altitude, and rainfall led to distinct differences in P. notoginseng nutrients. In addition, compared with the other producing areas, the samples of P. notoginseng in the TB and SC groups were more dispersed, indicating that the composition of P. notoginseng in TB and SC was greatly affected by external factors.

Identification of Geographical Indication Components of P. notoginseng
To further explore the compositional differences of P. notoginseng and find significant geographical markers of P. notoginseng from the different producing areas, four pairwise comparison OPLS-DA models were constructed within the same variety (12 P. notoginseng samples per producing area, a total of 48 samples), each comparing one specific geographical origin with the other three origins. The OPLS-DA score plots with a confidence level of 95% (left panels), the 7-fold cross-validation and permutation tests (permutation number n = 900) (middle panels), and the corresponding volcano plots (right panels) derived from the NMR data of P. notoginseng are shown in Figure 3, and the model parameters, including R²X, R²Y, and Q², are also provided. All models used one predictive and four orthogonal (1 + 4) components and were further validated by CV-ANOVA (Figure 3).
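The idea behind the permutation test can be sketched in a few lines. This is a generic illustration, not the SIMCA procedure, which refits the model on permuted class labels and compares the resulting R² and Q² values with those of the original model. Here a simple stand-in statistic, the absolute difference in group means of a single variable, replaces the model statistics, and all names and values are hypothetical.

```python
import random


def mean_difference(values, labels):
    """Stand-in test statistic: |mean of class 1 - mean of class 0|."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(values, labels, n_perm=900, seed=0):
    """Share of label permutations whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between data and labels
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores: 12 samples from one origin versus 36 from the others.
toy_values = [5.0 + 0.1 * i for i in range(12)] + [0.1 * i for i in range(36)]
toy_labels = [1] * 12 + [0] * 36
p_perm = permutation_p_value(toy_values, toy_labels)
```

If the class separation were an artifact of overfitting, randomly relabelled data would often perform as well as the real labels and the resulting p-value would be large; a small value indicates that the separation depends on the true class assignment.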
The OPLS-DA model for YN compared with the other three producing areas is shown in Figure 3a (left), with the two sets of sample points clearly separated on either side of the vertical axis. The high predictive ability (Q² = 0.979), good fitting performance (R²Y = 0.904), and very low p-value from CV-ANOVA (7.49 × 10⁻¹⁶) further revealed the remarkable difference in nutrient composition of P. notoginseng between YN and the other three regions. The geographical indication components corresponding to the specific producing area were identified according to the screening criteria described in the Materials and Methods section. The volcano plot in Figure 3a (right) displays the ten potential geographical indication components of P. notoginseng in YN, and the detailed statistical parameters of these components are summarized in Table 2. Ten components in YN P. notoginseng, including acetic acid, dopamine, ginsenoside Rc, pentasonoside U, glucosamine 6-phosphate, fumarate, proline, corollactone, serine, and threonine, were significantly different from those in the other three producing areas. Among them, acetic acid (|r| = 0.66), dopamine (|r| = 0.70), serine (|r| = 0.65), and threonine (|r| = 0.65) were potential geographical markers of YN P. notoginseng. Furthermore, the contents of acetic acid (FC = 1.20), dopamine (FC = 1.66), glucosamine 6-phosphate (FC = 1.30), pentasonoside U (FC = 1.51), and serine (FC = 1.31) were significantly higher in P. notoginseng from YN than from the other producing areas. However, the relative content of fumarate in YN P. notoginseng (FC = 0.69) was much lower than in P. notoginseng from other areas. Studies have demonstrated that acetic acid has a certain antagonistic influence on type II diabetes caused by obesity [28]; serine plays a dominant role in ensuring the normal development of the central nervous system [29]; and dopamine is considered to be an essential regulator of central and peripheral biological functions in humans and animals [30] and plays an important role in cardiovascular regulation [31].

Table 2. Summary of statistical parameters of the differential components in P. notoginseng from different producing areas.
The OPLS-DA model (Figure 3b (left)) of GZ vs. the other three geographical origins produced a more distinct separation according to the model parameters (R²Y = 0.968 and Q² = 0.820); the permutation test and the very low p-value (1.07 × 10⁻⁹) from CV-ANOVA further indicate their differences in composition. The corresponding volcano plot (Figure 3b (right)) and the parameters of the various components (Table 2) indicated that threonine, alanine, malic acid, acetic acid, inositol, α-glucose, ribolactone, raffinose, notoginsenoside R1, ginsenoside Rb2, and glucosamine 6-phosphate in GZ P. notoginseng were significantly different from P. notoginseng of the other three producing areas. Except for malic acid and notoginsenoside R1, all the other potential components had positive correlation coefficients (r > 0). In addition, except for alanine (|r| = 0.63) and ginsenoside Rb2 (|r| = 0.64), the correlation coefficients of the other eight potential components were at least 0.65 (|r| ≥ 0.65). The contents of malic acid and notoginsenoside R1 in GZ P. notoginseng were higher than those of P. notoginseng from the other regions (FC > 1), while the contents of the other eight components were lower. One study has shown that notoginsenoside R1 has multiple effects, including hemostatic, hypolipidemic, antithrombotic, immunomodulatory, and cardiovascular protective effects [32]. Malic acid has been favored by the cosmetics industry, mainly for adjusting the pH of cosmetics, skin conditioning, and moisturizing, and has been used in nearly 50 cosmetic formulations [33]. According to the selection criteria mentioned above, malic acid (|r| = 0.80, FC = 1.48) and notoginsenoside R1 (|r| = 0.66, FC = 1.53), both present at higher contents, could be selected as geographical indication markers of GZ P. notoginseng.
Figure 3c shows the comparison between SC P. notoginseng and P. notoginseng from the other geographical origins. The OPLS-DA model (Figure 3c (left)) displayed a clear separation between the two sets of samples. The high values of R²Y (0.967) and Q² (0.784) and the very low p-value from CV-ANOVA (2.11 × 10⁻⁸) confirmed the obvious difference in the nutritional ingredients of P. notoginseng from SC compared to the other producing areas. The corresponding volcano plot (Figure 3c (right)) and the parameters of the various components (Table 2) revealed that there were eleven significant differential components between P. notoginseng from SC and the other regions, including ginsenosides Rb1, Rb2, Rc, Re, Rg1, ginsenoside Re, proline, threonine, fumarate, sucrose, and ethanolamine. It can be seen from Table 2 that only fumarate and sucrose had negative correlation coefficients, and ginsenoside Rg1 (|r| = 0.56), ginsenoside Re (|r| = 0.55), and sucrose (|r| = 0.54) had weak correlations (|r| < 0.65). Furthermore, the contents of fumarate and sucrose in P. notoginseng from SC were higher than in P. notoginseng from the other producing areas (FC > 1), and these components were located in the positive region of the x-axis, as shown in Figure 3c (right). Research has shown that fumarate plays a vital role in coping with diseases of the nervous system by modulating neurons [34]. Based on this comprehensive analysis, fumarate (|r| = 0.80, FC = 1.38) can be used as a geographical marker of P. notoginseng in SC.
The OPLS-DA score plot displayed an obvious and substantial separation of P. notoginseng samples between TB and the other three producing areas (Figure 3d (left)) with high values of R²Y (0.956) and Q² (0.836), which was further supported by the permutation test and the very low p-value from CV-ANOVA (1.26 × 10⁻¹¹). According to the volcano plot (Figure 3d (right)), there were significant differences in seven components, namely choline, alanine, proline, γ-aminobutyric acid, inositol, and ginsenosides Rb2 and Re, between P. notoginseng from Tibet and the other regions. These seven components had negative correlation coefficients (r < 0) (Table 2), and alanine (|r| = 0.72), γ-aminobutyric acid (|r| = 0.82), and proline (|r| = 0.73) had strong correlations (|r| > 0.65). Further analysis of the relative content change (FC value) of these seven components showed a higher abundance in TB P. notoginseng than in the others (FC > 1). Among them, the relative content of γ-aminobutyric acid was the highest, at 2.29 times that of the other P. notoginseng, and choline and alanine were 1.41 and 1.33 times that of the other P. notoginseng, respectively. These components with significant differences were all located in the positive region of the x-axis (right panel of Figure 3d). Studies have found that alanine has a potentially supportive effect on maintaining basic brain function in hypoglycemia, and γ-aminobutyric acid, one of the best-known neurotransmitter molecules [35], inhibits neural excitability. It has multiple physiological functions, such as promoting brain activity, relieving pain, and improving sleep [36,37]. Accordingly, alanine (|r| = 0.72, FC = 1.33) and γ-aminobutyric acid (|r| = 0.82, FC = 2.29), with high contents and high correlation coefficients, could be considered as geographical indication markers of TB P. notoginseng.
The geographical indication components of P. notoginseng corresponding to specific geographical origins are summarized in Table 2. P. notoginseng from the different producing areas displayed distinct nutritional characteristics. For example, YN P. notoginseng may offer cardiovascular benefits owing to its high dopamine content, and SC P. notoginseng may have a positive effect on neuron regulation owing to its high content of fumarate. However, the potential impact of batch, season, and production year on the components of P. notoginseng was not considered in this study. In future work, these effects ought to be investigated to validate and refine the conclusions.

Conclusions
In this study, an inexpensive and convenient untargeted metabolomics approach based on NMR technology was used to trace the origin of P. notoginseng from four geographical origins in China. Fifty-two main components, including amino acids, saponins, sugars, alcohols, and organic acids, were identified and quantified from the 1H-NMR spectra of P. notoginseng. 1H-NMR combined with multivariate statistical analyses, including PCA, PLS-DA, and OPLS-DA, was successfully applied to the classification and visualization of P. notoginseng from Yunnan, Guizhou, Sichuan, and Tibet, and to the identification of the geographical indication components of Chinese P. notoginseng corresponding to their specific geographical origins. The quantification of a wide range of nutrients in P. notoginseng can inform nutritional recommendations for human consumption. The geographical indication markers can help to quickly classify the geographical sources of P. notoginseng according to their nutritional characteristics. The results of this study indicate that NMR combined with pattern recognition can be used as an effective method to trace the geographical origin of P. notoginseng and can provide a reference for the analysis and identification of other Chinese herbs.