Analysis of volatile compounds by GCMS reveals their rice cultivars

Due to the similarity in grain appearance and the difference in market value among many rice varieties, deliberate mislabeling and adulteration have become a serious problem. To check authenticity, we aimed to discriminate rice varieties based on their volatile organic compound (VOC) composition by headspace solid phase microextraction (HS-SPME) coupled with gas chromatography mass spectrometry (GC-MS). The VOC profiles of Wuyoudao 4 from nine sites in Wuchang were compared to those of 11 rice cultivars from other regions. Multivariate analysis and unsupervised clustering showed an unambiguous distinction between Wuchang rice and non-Wuchang rice. Partial least squares discriminant analysis (PLS-DA) demonstrated a goodness of fit of 0.90 and a goodness of prediction of 0.85. The discriminating ability of the volatile compounds was also supported by random forest analysis. Our data revealed eight biomarkers, including 2-acetyl-1-pyrroline (2-AP), that can be used for variety identification. Taken together, the current method can readily distinguish Wuchang rice from other varieties and holds great potential for checking the authenticity of rice.

Rice (Oryza sativa L.) is the staple food for approximately 3.5 billion people worldwide, with an estimated production of 480 million tons annually 1. Rice provides up to 50% of the dietary caloric supply and a substantial part of the protein intake for about 520 million people living in poverty in Asia 2. The market value of rice is primarily determined by grain quality, which in turn depends on genetic background, agronomic management, geographic factors and postharvest factors 3,4. Although the grain quality, and thus the price, of rice from different geographic origins can differ vastly, the grains may look very much alike. Thus, deliberate mislabeling and adulteration represent a serious issue, which not only threatens the credibility of traders but also infringes the rights of consumers 5. To prevent such actions, a high-throughput, sensitive and precise method is urgently needed to discriminate rice of different geographic origins. The key to determining the geographic origin is to identify specific biomarkers for a specific rice variety. Previous studies have used a wide range of properties associated with geographic features, such as stable isotopes and mineral element content, to track the origin of rice 6,7. Notably, multiple parameters can be used together as they may provide complementary information. For instance, the isotope composition (δ13C, δ15N, δ2H, δ18O) has been employed in conjunction with multielemental concentration analysis to determine the origin of different rice varieties 8. For a more comprehensive comparison of the components among various rice cultivars, spectroscopy approaches such as near-infrared reflectance spectroscopy (NIR), nuclear magnetic resonance spectroscopy (NMR) and Raman spectroscopy (RS) have also been used [9][10][11]. However, these methods require a large amount of input for data collection and have no direct relationship with the quality of rice.
Volatile organic compounds (VOCs) are the main source of aroma in rice. Although difficult to describe in words, aroma differs greatly among varieties because of differences in VOC composition. Indeed, a previous study has reported the geographical discrimination of rice based on VOCs 12. Headspace solid phase microextraction (HS-SPME) has been applied to concentrate volatile substances and thereby increase the amount of analyte loaded onto the column 13. Due to its simplicity, high sensitivity and reproducibility, SPME is particularly suitable for the analysis of volatiles in food 14,15. Coupled with GC-MS, the aroma compounds can be identified precisely.
Different rice varieties have different aromatic characteristics, and the volatiles of rice can be considered biomarker candidates for determining geographical varieties. Rice with an attractive aroma and good taste is considered to be of high quality 16. Characterizing the aroma profile of fragrant rice is therefore meaningful not only for detecting adulteration but also for breeding. Wuchang rice, as its name suggests, grows in Wuchang county in Heilongjiang province of northeast China 17. It is known for its intense aroma and is listed as a National Geographical Indication Product of China 18. Wuyoudao 4 has been the dominant variety in this area since the 1990s. Despite large differences in composition and grain quality between Wuchang rice and other varieties, it is still challenging to distinguish them due to the lack of a robust and sensitive method [19][20][21].
In this study, we aimed to address this challenge by profiling Wuchang rice VOCs with HS-SPME extraction and GC-MS detection. First, an untargeted metabolomic approach was used to profile the VOCs in both Wuchang and non-Wuchang rice varieties. Subsequent multivariate analysis showed an unambiguous classification of the samples according to their geographic origins. In addition, our data led to the identification of eight VOCs that can be used as biomarkers in variety determination. The GC-MS approach described here is readily applicable to determining the geographic origin of other premium rice.

Materials and methods
Material. Twenty rice samples grown in 2018 were purchased from either Wuchang or other regions (see Table 1 for details). It was confirmed that all methods were performed in accordance with the relevant legislation. For each sample group, three biological replicates were obtained. All samples were stored at -80 °C until analysis. Headspace vials (20 mL in volume) were purchased from Agilent, each with a silver aluminum cap and a polytetrafluoroethylene (PTFE)/silicone rubber septum. Supplies for solid phase microextraction (SPME), including a manual injection handle and a 75 μm carboxen/polydimethylsiloxane (CAR/PDMS) extraction head, were purchased from SUPELCO (St. Louis, USA) 22.
Extraction of volatiles by HS-SPME. Four grams of rice grain were placed in a 20 mL headspace vial. The headspace vial was heated at 80 °C for 30 min in a water bath. Subsequently, an SPME fiber was inserted into the headspace portion of the vial and incubated for 60 min to absorb VOCs from the rice grain 23.
GC-MS analysis. GC-MS analysis was performed using a TRACE GC ULTRA (Thermo) coupled with a SQ QUANTUM XLS (Thermo). The SPME fiber was inserted into the inlet of the GC-MS with the fiber head pushed out, and VOCs were desorbed at 250 °C for 5 min. Helium was used as the carrier gas at a flow rate of 1 mL/min, and splitless injection was used. A DB-5 capillary column (30 m × 0.25 mm × 0.25 μm) was used for separation with the following temperature gradient: an initial temperature of 40 °C held for 3 min; a linear increase to 100 °C at 5 °C/min, held for 3 min; and a ramp to 250 °C at 10 °C/min, held for 4 min. Eluted compounds were ionized by electron ionization with an electron energy of 70 eV and an ion source temperature of 230 °C. Full MS scans were performed over the mass range of 40-300 m/z 24.
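The oven program above implies a fixed chromatographic run time per sample. The following is a minimal sketch of that arithmetic, assuming only the hold times, ramp rates and temperatures stated in the text; the function name and the tuple layout are illustrative and do not come from the instrument software.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total run time of a GC oven program.

    ramps is a list of (rate in degC/min, target temp in degC, hold in min).
    """
    total = initial_hold
    current = initial_temp
    for rate, target, hold in ramps:
        total += (target - current) / rate  # time spent ramping
        total += hold                       # isothermal hold at the target
        current = target
    return total


# 40 degC for 3 min; 5 degC/min to 100 degC, hold 3 min; 10 degC/min to 250 degC, hold 4 min
run_time = oven_program_minutes(40, 3, [(5, 100, 3), (10, 250, 4)])
print(run_time)  # 37.0
```

Under these settings the oven program lasts 37 min, in addition to the 30 min heating and 60 min fiber exposure of the extraction step.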
Chromatographic analyses. The obtained MS spectra were matched to the National Institute of Standards and Technology reference spectra (NIST 08 and Wiley 7). The retention index (RI) of each putative compound was further compared to previous reports (Table S2). The normalized peak area was used for calculating the compound concentration. Data from three biological replicates were reported (Table S3).
Clustering analysis. Prior to statistical analysis, the raw MS intensities were first subjected to log transformation and Pareto scaling. Feature selection was performed using standard methods such as fold change (FC), volcano plot, t-test and empirical Bayesian analysis of microarrays (EBAM). Features with a relative standard deviation (RSD) > 30% were removed from further analysis. A feature was considered to be differentially expressed if it met the following criteria: (1) FC < 0.8 or FC > 1.2; (2) p-value < 0.05; and (3) false discovery rate (FDR) < 0.1. Euclidean distance-based hierarchical cluster analysis (HCA) with the Ward method was used to explore the variance distribution and classification patterns among samples 25. Discriminant analysis of the rice samples was performed with principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), cross validation and heatmap analysis. The accuracy of these predictions was further evaluated by random forest analysis 26. Finally, putative biomarker candidates to discriminate Wuchang rice from other varieties were chosen based on (1) a variable importance in the projection (VIP) score ≥ 0.8; and (2) differential expression in the univariate analysis 27. All data analyses were conducted in R 4.2.2.
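The preprocessing and selection rules above can be illustrated with a short sketch. It is written in Python for illustration only (the analysis itself was run in R) and rests on several assumptions not stated in the text: a base-10 logarithm, the sample standard deviation, and an RSD computed across replicate measurements of one feature. The function names are hypothetical.

```python
import math
import statistics


def log_pareto(values):
    """Log10-transform one feature, then Pareto-scale it across samples.

    Pareto scaling mean-centers the data and divides by the square root
    of the standard deviation (rather than the standard deviation itself).
    """
    logged = [math.log10(v) for v in values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / math.sqrt(sd) for x in logged]


def rsd_percent(values):
    """Relative standard deviation in percent; features above 30% are dropped."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def is_differential(fold_change, p_value, fdr):
    """Apply the three criteria: FC < 0.8 or > 1.2, p < 0.05, FDR < 0.1."""
    return (fold_change < 0.8 or fold_change > 1.2) and p_value < 0.05 and fdr < 0.1


# Hypothetical peak areas, for illustration only (not data from this study).
scaled = log_pareto([10.0, 100.0, 1000.0])
keep_feature = rsd_percent([90.0, 100.0, 110.0]) <= 30.0
```

Pareto scaling reduces the dominance of high-intensity peaks less aggressively than unit-variance scaling, which is one common reason for choosing it with GC-MS peak areas.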

Results and discussion
The shape and VOC profile of the rice grain. The morphology of the rice grains is shown in Fig. 1. Some of the cultivars differ in shape from Wuyoudao 4, but others, such as JSJ, have a similar length and shape. Although Wuyoudao 4 is characterized by long grains, it is hard to identify it by grain shape alone with the naked eye. To prevent mislabeling and adulteration, we developed a method to profile VOCs from Wuchang rice and non-Wuchang rice. A representative total ion chromatogram is shown in Fig. 2 with the peaks annotated. In total, 22 VOCs were identified from our samples. These VOCs covered a chemically diverse range of compounds, including eight aldehydes, five alcohols, two ketones, two heterocyclic compounds, and five hydrocarbon compounds (Table 2). In addition to VOCs, siloxane derivatives were also observed due to the presence of PDMS in the SPME fibers (Table S2), which is consistent with previous reports 28,29.

Chemometric analysis of biomarkers for rice screening. To identify putative biomarkers for determining the geographic origin of rice, we performed both single factor and multivariate analyses on the VOC dataset. First, differential analysis revealed that the concentrations of eight features were significantly different among samples (p < 0.05, FDR < 0.1, and FC < 0.8 or FC > 1.2), as shown in the Venn diagram (Fig. 2). This indicates that the VOC profile could be used to distinguish rice varieties.
The eight features showing differential concentration between Wuchang and non-Wuchang rice were further evaluated by the variable importance in the projection (VIP) score. All of them passed the 0.8 cutoff and were thus considered putative biomarkers. They consisted of three aldehydes (heptanal, octanal, (E)-2-decenal), three alcohols (1-heptanol, (E)-2-decen-1-ol, 3,7,11-trimethyl-3-dodecanol), one ketone (3-octene-2-one), and one heterocyclic compound (2-acetyl-1-pyrroline, or 2-AP). Details of these compounds are summarized in Table S1. Compared to non-Wuchang rice, Wuchang rice showed a significantly lower level of 1-heptanol and 3-octene-2-one (P < 0.05), and a much higher level of the remaining six putative biomarkers (Fig. 3a). Among them, 2-AP has been reported as an important chemical trait to differentiate fragrant rice from non-fragrant rice. The non-Wuchang rice samples had a relatively low amount of 2-AP, except TG (Jasmine 105). The identification of 2-AP (Fig. 3b) not only validates the robustness of our method, but also aligns well with previous findings. Having observed a significant difference in 2-AP concentration between Wuchang and non-Wuchang rice, we next asked whether other volatiles show a similar pattern. Thus, we examined the correlations between 2-AP and other compounds. We observed a negative correlation for (E)-2-octenal, (Z)-heptenal, 1-octene-3-ol, dodecane, and heptadecane, and a positive correlation for the rest of the 22 VOCs (Fig. 3c).
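The sign of the relationship between 2-AP and another volatile can be checked with a correlation coefficient. The sketch below assumes Pearson's r, since the text does not state which coefficient was used, and the input values are hypothetical placeholders rather than measurements from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized peak areas across four samples (illustration only).
two_ap = [1.0, 2.0, 3.0, 4.0]
rises_with_2ap = [2.0, 4.0, 6.0, 8.0]
falls_with_2ap = [8.0, 6.0, 4.0, 2.0]

r_pos = pearson_r(two_ap, rises_with_2ap)  # close to +1
r_neg = pearson_r(two_ap, falls_with_2ap)  # close to -1
```

A compound with r < 0 would fall into the negatively correlated group described above, and one with r > 0 into the positively correlated group.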

Evaluation of biomarker efficiency. Previous studies reported the use of 2-AP in distinguishing fragrant rice from non-fragrant rice 30. However, the usefulness of other biomarkers remains unclear. To this end, we assessed the discrimination efficiency of the putative markers identified here using several multivariate analysis techniques.
Clustering and heat map analysis. HCA clustering classified the samples neatly into two groups with respect to their geographic origins: Wuchang rice and non-Wuchang rice (Fig. 4a). A large distance between the two groups was also observed (Euclidean distance > 20), supporting the significant difference in grain quality between them. By contrast, the Euclidean distances among the 8 Wuchang samples were less than 5, suggesting that only subtle variations exist for rice grown in the same area. Next, heatmap analysis was performed to visualize the correlation of the eight putative VOCs (Fig. 4b). The 20 samples were separated into two groups. A high degree of correlation was found for the first group, consisting of 9 samples, including all 8 varieties from Wuchang. Notably, samples in this group were also high in 2-AP, contributing to a better rice flavor. By contrast, a lower level of correlation in terms of VOC content was found for the second group of 11 samples.
Unsupervised PCA was used to transform the original data into orthogonal components. The first three PCs explained 33.9%, 26.8% and 11.1% of the variance, respectively (Fig. 5a). Consistent with the clustering analysis, a clear separation of Wuchang rice from other varieties was also observed using the first three PCs. Notably, the nine samples of Wuchang rice clustered closely in the three-dimensional PCA plot, supporting that no statistically significant difference was found among them (p > 0.05). Conversely, a wide distribution was observed for the 11 other rice samples, reflecting the fact that they were grown in a diverse range of geographic locations.
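The variance percentages reported for the first three PCs sum to 71.8% (33.9 + 26.8 + 11.1). As a minimal sketch of how such percentages arise, the following computes the explained variance ratio of each principal component from the singular values of a mean-centered data matrix. It uses NumPy and a random placeholder matrix of 20 samples by 8 features; neither the library nor the data reflects the actual analysis, which was run in R.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    X is a (samples x features) array that has already been scaled.
    """
    centered = X - X.mean(axis=0)
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Random placeholder data: 20 samples, 8 features (not data from this study).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
ratios = explained_variance_ratio(X)
cumulative_first_three = ratios[:3].sum()
```

The cumulative value for the first three components indicates how much of the total variation a three-dimensional score plot can display.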
Similar to clustering and PCA, the PLS-DA analysis showed a clear aggregation among Wuchang varieties (Fig. 5b), indicating that the composition and content of these nine samples were similar. By contrast, disperse
To further evaluate the ability of the eight putative biomarkers in rice classification, we built a model based on the random forest (RF) algorithm. RF is not only suitable for feature selection, but also provides useful information such as the out-of-bag (OOB) error, variable importance measurement, and outlier measurement 31. We found an excellent classification ability of the eight putative biomarkers, indicated by a classification error of essentially 0 (Fig. 5c).
Our study has demonstrated the feasibility of using HS-SPME/GC-MS to profile the VOCs in rice grains. In addition, we found that Wuchang rice can be unambiguously distinguished from non-Wuchang rice by eight putative biomarkers. Thus, this work represents an important contribution to the field of non-destructive geographical authentication of rice. Application of the method in industrial settings would have a broad impact on the production and management of Wuchang rice, commercial rice trading and consumer satisfaction. The putative biomarkers include 2-AP, consistent with its contribution to rice aroma and with previous findings that 2-AP can serve as a biomarker to distinguish fragrant rice from non-fragrant rice. More importantly, our untargeted metabolomic approach also identified other putative biomarkers with diverse chemical properties. In the following section, we discuss the relevance of these compounds in distinguishing rice varieties.
Among the eight putative biomarkers, 2-AP is the most important source of flavor in rice, followed by aldehydes and ketones. As the most important aroma compound in fragrant rice, 2-AP promotes appetite and improves human metabolism [32][33][34]. At 0.05 ppm, the odor of 2-AP has been described as popcorn-like, which is positively correlated with "butter" and "corn". The odor threshold for 2-AP in water and air has been determined to be 0.1 nL/L and 0.02 ng/L, respectively 35. The extremely low threshold makes 2-AP an important source of food aroma. In rice, 2-AP is mostly produced by the plants, although it was originally thought to be a product of the Maillard reaction during rice cooking 36,37. In line with this, we detected 2-AP by GC-MS in the rice grain, with the highest concentration observed in the Wuchang samples. These results are also consistent with previous reports that different levels of 2-AP are detected in distinct rice varieties 38,39. For instance, a 10-fold lower level of 2-AP was found in non-fragrant rice than in fragrant rice. Together with other compounds such as hexanal, octanal, (E,E)-2,4-decanedialdehyde, (E)-2-nonanedialdehyde, 4-vinyl-2-methoxy-phenol and hydrazine, 2-AP has been identified as a common aromatic compound among three fragrant rice varieties 40. Thus, the high level of 2-AP in Wuchang rice suggests that 2-AP is a major factor in determining the pleasant flavor of Wuchang rice.
Aldehydes are odorous compounds found in many plants and foods, especially in fragrant rice. They are generated either by the oxidation of free fatty acids or by the decomposition of linoleic acid 41. We found three aldehydes that show differential levels between Wuchang and non-Wuchang rice: heptanal, octanal, and (E)-2-decenal. Previous studies have shown that heptanal, together with other aldehydes such as valeraldehyde, hexanal, heptaldehyde, octanal, and furfural, is an odor-generating compound in rice. Interestingly, heptanal and octanal give rice a pleasant fresh grassy scent and a light fruit scent at low concentrations, while they produce an unpleasant rancid taste if the concentration is too high. In addition, (E)-2-nonenal has a pleasant orange aroma at an extremely low concentration. It will be of interest to explore the contribution of these compounds, either alone or combined with other compounds, to rice flavor.
Ketones can be generated through a variety of pathways in plants, including oxidative or thermal degradation of polyunsaturated fatty acids, amino acid degradation and microbial oxidation 42. Ketones give rice a pleasant aroma. Daygon et al. reported that compounds such as 2-heptanone, 2-hexanone, and 3-octene-2-one give a fruity/floral aroma in rice, contributing to rice flavor 43. We observed a narrow distribution of 3-octene-2-one content in Wuchang rice (Fig. 3a), indicating the potential of 3-octene-2-one in determining the geographic origin of rice. Interestingly, 3-octene-2-one is considered to be the most active ketone in rice, with a rose aroma and an excellent long-lasting flavor, leading to the most intense flavor of rice.

Data availability
The data that support the findings of this study are available from the corresponding author L M upon reasonable request.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.