Spatial Chemometric Analyses of Essential Oil Variability in Eugenia dysenterica

Chemovariations in essential oils were used for studying the spatial chemical structure of eight E. dysenterica populations in Central Brazilian Cerrado. Previously, multivariate Mantel autocorrelogram and chemical matrix variation partitioning, using the spatial and environmental data sets as predictors, suggested a highly significant spatial variation in essential oils. In the present study, spatial chemometric methods using variograms and probability maps detected and characterized the spatial chemical structure among populations, as well as the environmental factors responsible for it. All these strategies indicated that the populations differ chemically whenever the geographical distance exceeds 120 km, an indicator of the minimal distance between samples required for conserving the genetic diversity of populations. Although rarely used with secondary metabolites, these methodologies may be used in a wide range of applications in species management and may lead to an effective integration of genetic, chemical and ecological perspectives.


Introduction
Phenotypic variation patterns in secondary plant metabolites have strong ecological significance and are an important factor in understanding the evolutionary history of natural populations, as they affect both intra- and interspecific interactions. 1 Variation of chemical phenotypes can be explained by a combination of genetic or ontogenetic and environmental sources of variation. 2,3 Spatial factors can be critical for each of these processes, and the levels and spatial structure of phenotypic chemovariations may affect competition, 4 local adaptation to the presence of another plant species, 5 pollination, 6 nutrient cycling, 7 differential freezing resistance, 8 foraging behavior and habitat selection, 9 as well as influence the capacity of herbivores and pathogens to adapt and exert selection on plant chemicals. 10 Spatial patterns in essential oil variations have been described for E. dysenterica from different sites located in South East Goiás state, Brazil. 11 Multivariate Mantel autocorrelogram and trend surface analysis, with variance partitioning by partial redundancy analyses (pRDA), 12 have enabled the detection and quantification of spatial chemical patterns in the same E. dysenterica populations that had previously been described genetically. [13][14][15] All genetic and chemical markers showed a similar profile for populations located at around 120 km, in accordance with the isolation-by-distance model. This similarity decreases as distance increases, so that populations located more than 190-200 km away become different as far as these descriptors are concerned. 11,[13][14][15] Here, spatial chemometric methods were used to explore the spatial chemical patterns of phenotypic variation in previously reported E. dysenterica essential oil data sets. 11 In these techniques, theoretical variograms can be fitted to experimental ones, allowing the comparison of an observed structure with spatial chemical structures derived from chemometric models.
Theoretical modeling may be used both for understanding spatial structure and as a predictive tool for interpolation (kriging) or probability mapping. Based on such models, it would be possible to decide which groups of local populations should be given priority in sampling or preservation, 16 as the distribution of genetic (chemical) variation in the geographical space is an essential factor for the conservation and management of wild populations. 17 Nevertheless, models predicting the spatial distribution of secondary metabolites have been scarcely studied and are mostly restricted to phenolic variations, [18][19][20] despite their importance in sampling and species management, as well as for individual and community success.

Plant material and essential oil data
The complete data for the E. dysenterica sampling collection were described previously. 11 Briefly, essential oils were extracted from 121 E. dysenterica trees in eight populations in South East Goiás state, Brazil. Sampling sites and populations are shown in the Supplementary Information (SI) section (Figure S1). Essential oil data sets were represented by chemical constituents (121 samples × 49 variables) or by oil constituents rearranged according to biosynthetic carbon skeletons (121 samples × 13 variables).

Spatial chemometric analyses
In the spatial chemometric analyses, the fitted values on the first extracted axis of the redundancy analysis (RDA) were used to represent essential oil variability in the samples, since this axis summarizes the multidimensional chemical information of every individual. RDA first axes correspond to the main variance fractions explained by significant spatial and environmental predictor variables. 11 Spatial analyses were also conducted for the percentage values of total oxygenated terpenes (sum of monoterpenes and sesquiterpenes), total oxygenated monoterpenes and total sesquiterpene hydrocarbons, the main biosynthetic classes of essential oils.
To determine the strength and scale of spatial chemical dependence among populations, the variance (formerly the semi-variance) γ(h) was estimated by using equation 1: 21

$$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ Z_{x_i} - Z_{x_i + h} \right]^2 \qquad (1)$$

where n(h) is the number of sample pairs in the lag class at distance interval h, and Z_{x_i} and Z_{x_i + h} are the chemical parameter values at locations x_i and x_i + h, respectively. The number of lag classes (14) was determined by Sturges' rule, 22 an objective method which avoids arbitrarily inflating the explained spatial variation as a function of lag class size. The plot of γ(h) against h provides the variogram (Figure S2 in the SI section), which will show either purely random behavior or systematic behavior described by a theoretical model (linear, spherical, Gaussian or power-law). Most models start with a variance value above zero at the y-intercept, called the nugget (C_0), which is the unexplained variance. This can be attributed to variability at a smaller scale than the sampling resolution or to measurement errors. If there is spatial autocorrelation in the chemical data, the variance in the defined distance intervals increases from the y-intercept with distance until it reaches its maximum, known as the sill (C_0 + C). The variance between the nugget and the sill is called the partial sill (C) and is the part of the variance which is explained by the spatial structure of the chemical data. The ratio of the partial sill to the total sill [C / (C_0 + C)] determines the strength of the spatial autocorrelation (structural variance, Q), i.e., this statistic provides a measure of the proportion of sample variance (C_0 + C) that is explained by spatially structured variance (C). High values indicate a spatial pattern in the data.
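The estimation step described above can be illustrated with a minimal Python sketch. It assumes the classical estimator (half the mean squared difference between sample pairs in each lag class), equal-width lag classes up to the maximum pairwise distance, and Sturges' rule applied to the number of pairwise distances; these choices, and the function names `sturges_classes` and `empirical_variogram`, are illustrative assumptions and not the GS+ implementation used in this study.

```python
import math
from itertools import combinations


def sturges_classes(n_pairs):
    """Number of classes by Sturges' rule, 1 + log2(m), rounded to an integer."""
    return round(1 + math.log2(n_pairs))


def empirical_variogram(coords, values, n_lags):
    """Return a list of (mean distance, gamma, pair count) for each non-empty lag class.

    coords: list of (x, y) sample locations; values: chemical parameter at each location.
    """
    # Distance and squared difference for every pair of samples.
    pairs = []
    for i, j in combinations(range(len(values)), 2):
        d = math.dist(coords[i], coords[j])
        pairs.append((d, (values[i] - values[j]) ** 2))

    # Equal-width lag classes up to the largest observed distance (assumption).
    width = max(d for d, _ in pairs) / n_lags
    sq_sum = [0.0] * n_lags
    d_sum = [0.0] * n_lags
    count = [0] * n_lags
    for d, sq in pairs:
        k = min(int(d / width), n_lags - 1)  # the largest distance falls in the last class
        sq_sum[k] += sq
        d_sum[k] += d
        count[k] += 1

    # gamma(h) = sum of squared differences / (2 * number of pairs in the class)
    return [
        (d_sum[k] / count[k], sq_sum[k] / (2 * count[k]), count[k])
        for k in range(n_lags)
        if count[k] > 0
    ]


# 121 trees give 121 * 120 / 2 = 7260 pairwise distances.
print(sturges_classes(121 * 120 // 2))
```

Under the stated assumption that the rule is applied to the 7260 pairwise distances among the 121 trees, `sturges_classes` returns 14, which agrees with the number of lag classes reported here.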
Alternatively, the Cambardella index (I_c) was also used to compare the degree of spatial dependence. 23 This index estimates the proportion of variance which is erratic (not spatially structured) and gives a good indication of how the data set is spatially arranged, namely: I_c < 25%: strong spatial dependence and small erratic variance; 25% < I_c < 75%: moderate spatial dependence; and I_c > 75%: random spatial distribution. The distance at which the model reaches the sill is known as the effective range, which shows the extent of the spatial dependence in the chemical data. Furthermore, the mean correlation distance (MCD = 3 / 8 × range × Q) was computed for each variable to compare the distance over which a high spatial dependence occurred in the variogram. 20 If the model shows only the nugget, then there is no spatial structure in the data.
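The indices defined above follow directly from the fitted nugget, sill and effective range, as the following minimal Python sketch shows. It assumes that I_c is the nugget expressed as a percentage of the sill, which is consistent with the values reported in the Results; the function name, the treatment of the exact boundary values (25% and 75%) and the input numbers are hypothetical and serve for illustration only.

```python
def spatial_dependence(nugget, sill, effective_range):
    """Summarize a fitted variogram model.

    nugget: C_0; sill: C_0 + C; effective_range: distance at which the sill is reached.
    """
    partial_sill = sill - nugget               # C
    q = partial_sill / sill                    # structural variance, Q = C / (C_0 + C)
    ic = 100.0 * nugget / sill                 # Cambardella index in percent (assumed form)
    mcd = 3.0 / 8.0 * effective_range * q      # mean correlation distance

    # Boundary values (exactly 25 or 75) are assigned to "moderate" by assumption.
    if ic < 25.0:
        dependence = "strong"
    elif ic <= 75.0:
        dependence = "moderate"
    else:
        dependence = "random"
    return {"Q": q, "Ic": ic, "MCD": mcd, "dependence": dependence}


# Hypothetical model parameters, not values from Table 1.
print(spatial_dependence(nugget=2.0, sill=4.0, effective_range=100.0))
```

With these hypothetical inputs, half of the sill is nugget, so the sketch classifies the spatial dependence as moderate and the MCD is 3/8 of half the effective range.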
The coefficient of variation (CV) was calculated for all response variables, and the nugget, sill, effective range, Q, I_c, correlation coefficient (r), and MCD parameters were obtained from the model with the best fit to the variance data. All sample data were investigated for anisotropy by using directional variograms before calculating omnidirectional variograms. When spatial dependency was present, the spatial distribution of the chemical parameter (Z_x) was expressed by a kriged map. Kriging is a linear interpolation technique that provides the best linear unbiased estimation, which minimizes prediction error variances for spatial variables. Although the resulting maps provide a powerful visualization of the spatial pattern, maps of the probability of exceeding the mean value of each response variable were used to avoid the excessive smoothing of kriged maps. 18 Probabilities were calculated by performing 1000 conditional simulations on the predicted distribution of values from response variables.
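The last step of the probability mapping can be sketched in Python as follows. The sketch assumes that the conditional simulations have already been produced by geostatistical software and are available as a list of realizations, each holding one simulated value per map node; the simulation itself is not shown, and the function name and the toy numbers are hypothetical.

```python
def exceedance_probability(simulations, threshold):
    """Fraction of realizations in which each map node exceeds the threshold.

    simulations: list of realizations, each a list with one value per map node.
    threshold: for example, the mean value of the response variable.
    """
    n_sim = len(simulations)
    n_nodes = len(simulations[0])
    return [
        sum(1 for realization in simulations if realization[k] > threshold) / n_sim
        for k in range(n_nodes)
    ]


# Toy input: four realizations on a two-node map (hypothetical numbers).
toy = [[1.0, 5.0], [3.0, 5.0], [4.0, 0.0], [0.0, 6.0]]
print(exceedance_probability(toy, threshold=2.0))
```

In this toy case the first node exceeds the threshold in two of the four realizations and the second node in three of them, so the mapped probabilities are 0.5 and 0.75. With 1000 realizations, as used here, the same counting gives probabilities resolved to 0.001.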
To assess environmental influence on oil variability, leaf nutrients (environmental predictors: P, K+, Mg2+ and Cu2+), represented by the first axes of the partial RDA, 11 were mapped as the probability of exceeding the mean value. Thus, maps were obtained for the pure environmental influences and for the environmental influences shared with spatial descriptors. All calculations were performed with GS+ (Geostatistics for the Environmental Sciences), and variograms were plotted using the gstat library 24 in R version 2.15.0. 25

Results and Discussion
To identify the existence of spatial structures and describe the spatial variability of response variables, variograms of the RDA first axes from each oil data set and of the percentage values of oxygenated terpenes, oxygenated monoterpenes and sesquiterpene hydrocarbons were computed for the sampled populations. The study of variograms is important in natural product chemistry, as the nugget effect provides information on the error made by the measuring instruments and on chemovariations undetected in the sampling neighborhood. In addition, variograms offer valuable information regarding the spatial dependence of each response variable (nugget/sill ratio) and the effective range of spatial autocorrelation, which supports the establishment of operational chemical units, 11 similarly to genetic operational units for the purposes of conserving and managing populations. 26 In the present study, Gaussian models fitted the experimental variograms well for the majority of oil response variables (Table 1). This confirms the existence of a spatial structure in essential oil contents and rules out the possibility of a random distribution of essential oils throughout population sites. According to the Cambardella index, 23 the spatial dependence of essential oil content, as measured by each oil data set, was the highest among all response variables.
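For reference, a Gaussian variogram model of the kind fitted here can be written as a short Python function. The sketch assumes the common practical-range convention, in which the model reaches about 95% of the partial sill at the effective range (factor 3 in the exponent); geostatistical packages differ in how they parameterize the range, so this form and the function name are assumptions and do not reproduce the fitting routine used in this study.

```python
import math


def gaussian_variogram(h, nugget, partial_sill, effective_range):
    """Gaussian model: gamma(h) = C_0 + C * (1 - exp(-3 * h^2 / a^2)) for h > 0.

    a is the effective (practical) range; gamma(0) is 0 by definition,
    so the nugget appears as a discontinuity at the origin.
    """
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / effective_range) ** 2))


# Hypothetical parameters: C_0 = 1, C = 3, effective range = 100 distance units.
for h in (10.0, 50.0, 100.0, 200.0):
    print(h, gaussian_variogram(h, nugget=1.0, partial_sill=3.0, effective_range=100.0))
```

The slow initial rise of this curve near the origin is what distinguishes the Gaussian model from the spherical one; a model with a partial sill of zero reduces to the pure nugget case, in which no spatial structure is present.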
The nugget/sill ratios were 0.04% and 0.05% for the oil constituent and oil carbon skeleton data sets, respectively. Thus, at least 99.9% of total variance in essential oil data sets can be explained by spatially structured variance. On the other hand, the proportion of the nugget effect was greater for oxygenated monoterpenes (49.9%) than for the sum of oxygenated terpenes (8.9%), and in general reflected a moderate spatial dependence of the former biosynthetic class. A similar trend was observed for the coefficient of variation. For the oxygenated monoterpenes, the variability (CV = 62.8%) was more than 2.5-fold higher than the variability caused by changes in the oxygenated terpenes (CV = 24.9%). In contrast, the variogram model for sesquiterpene hydrocarbons (CV = 19.2%) showed only a nugget effect, which suggests there is no spatial structure in the data (Figure S3 in the SI section).
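The arithmetic behind these statements can be written out as a quick check. It assumes that the structural variance Q is the complement of the nugget/sill ratio, as defined in the Spatial chemometric analyses section; only values quoted in this paragraph are used.

```latex
\begin{align*}
Q &= \frac{C}{C_0 + C} = 1 - \frac{C_0}{C_0 + C} \\
Q_{\text{oil constituents}} &= 1 - 0.0004 = 0.9996 \quad (99.96\%) \\
Q_{\text{carbon skeletons}} &= 1 - 0.0005 = 0.9995 \quad (99.95\%) \\
\frac{\mathrm{CV}_{\text{oxygenated monoterpenes}}}{\mathrm{CV}_{\text{oxygenated terpenes}}} &= \frac{62.8}{24.9} \approx 2.52
\end{align*}
```

Both structural variances exceed 99.9%, and the CV ratio is slightly above 2.5, in agreement with the text.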
For both oxygenated biosynthetic classes, the effective range of the variograms, which is the distance at which a plateau is reached, progressively increased from monoterpenes to the sum of oxygenated terpenes. The MCD value is an informative parameter of the range of distances within which a high spatial dependence exists. For oxygenated monoterpenes, the MCD value (14588.4 m) was only about twice the lag class distance interval (7823.8 m), leading to a higher proportion of nugget effect with moderate spatial dependence (C_0 = 0.1852, I_c = 49.9%). In other words, the variability of the oxygenated monoterpene content in the essential oil occurred at shorter distances than those involving the other response variables, whenever two adjacent sampling sites were taken into account. These results confirm the change in the spatial pattern of the terpene content in the essential oil across different biosynthetic classes. The high nugget effect proportion may also reflect a strong degree of intra- or interspecimen variation. Furthermore, this effect should be related at least partially to the collection of leaf samples, which occurred in July, during the dry season. During this time, the peak of leafing activities, senescence and emission of new leaves occur, 27 thus requiring large amounts of carbon and macronutrients for protein and RNA synthesis, which is markedly increased in young leaves, which have a high capacity for essential oil biosynthesis. During leaf growth, leaf volatiles may provide a constitutive defense (by deterring potential herbivores) or an induced response to herbivore damage by attracting predators or parasites. 28 However, as biosynthetic terpenes changed, differences in oxygenated sesquiterpene content decreased, much in the same way that the effective range and the MCD value increased for this class. Presumably, this effect reveals that a plateau has been reached in the essential oil, which results in intra- and interpopulation homogeneity.
The observed effective range for the oil constituent data set (120.2 km) was similar to the geographical distance (117 km) previously determined by multivariate Mantel autocorrelogram. 11 These results contrast with numerical simulations, 29 in which the Mantel test and derived forms could not correctly estimate the proportion of the original data variation explained by spatial structures.
The spatial pattern for E. dysenterica essential oils is in agreement with the evolutionary isolation-by-distance model observed for morphological and isozymatic descriptors, 13,14 as well as for simple sequence repeats (SSR) and random amplification of polymorphic DNA (RAPD) genetic markers. 15 All of these genetic descriptors showed a similar genetic profile for populations located at around 120 km. This similarity between the chemical data and genetic markers is consistent with the findings of other studies which used terpenoids and isozymes, RAPD, amplified fragment length polymorphism (AFLP), SSR and inter-simple sequence repeat (ISSR) molecular markers. 30 Predictive maps (Figure 1) based on fitted models for each spatially structured biosynthetic class and for each essential oil data set were obtained using the probability of exceeding a threshold value (the mean value).
All the maps showed greater variation between populations with certain similarities in the spatial patterns of two well-differentiated areas: an area with high essential oil variation in populations located along the Corumbá River basin (north-south direction of the plot) and another area (east-west) with low oil complexity. This chemical variability may be explained as a result of localized inbreeding effects associated with a low gene migration rate among populations. 15,16,31 The Corumbá River basin separates the two North West populations, Goiânia (8) and Senador Canedo (7), through a depression formed by the river and its tributaries. This spatial barrier could contribute at least partially to ecological isolation, a pre-requisite for speciation and chemovariation between two sampling sites. In fact, the existence of chemotypic differentiation between populations from these sites could be confirmed by the fact that cultivated plants grown adjacently in the same environment displayed the typical composition of their wild populations. 32 Furthermore, samples from Senador Canedo and Goiânia tend to reveal a certain level of segregation for some response variables (Figures 1a and 1b). These variations were not previously observed and may be attributed to the Meia Ponte River basin that separates these sampling locations. On the other hand, samples from populations 4 (Campo Alegre de Goiás) and 5 (Cristalina) showed low oil variability (Figures 1a and 1b) or corresponded to a transition between the highest variability centers (Figures 1c and 1d). These populations are located between two geographical barriers, the mountain ridges of Cristais and Contraforte Central, which may be an important factor for isolation and speciation.
Another source of oil variation was nutrient resource variability: the environmental predictors were leaf nutrients (P, K+, Mg2+ and Cu2+), which account for 8% of the chemovariation. 11 The maps created for pure environmental influences and for those shared with spatial predictors coincide with the regions of oil variation (Figure S4 in the SI section) and are a relative indication of the divergence in terpenes. These nutrient effects should be associated with the strict requirement of sesquiterpene synthases for divalent metal ions as cofactors, especially Mg2+, which also influence the number of by-products obtained from these reactions. 33 Terpenoid accumulation has been related to high P and K+ soil contents and to culture media supplemented with increased P concentration. 34 On the other hand, populations located around the Meia Ponte River (7 and 8) and the mountainous populations (4 and 5) were discriminated by the pure spatial influence and by the spatial influence shared with environmental predictors along the sampling sites (Figure S5 in the SI section).
Unlike the frequent assessment of the geographical distribution of genetic diversity, 35 the description of the spatial structure of essential oil variability has been scarcely studied until now. 11,36 These studies have focused on spatial autocorrelation approaches, such as the Mantel test and other derived forms, which could not correctly estimate the proportion of the original data variation explained by spatial structures. 29 Trend surface analyses have faced difficulties in interpreting the terms of the polynomials, in addition to the non-independence of the monomials. 37 Furthermore, they have been unable to correctly model fine-scale patterns. 38 Thus, spatial chemometric methods which use variograms of essential oil chemovariations may be used as additional tools to establish in situ conservation areas or sampling areas for ex situ conservation. In addition, the ecological role of essential oils in neotropical Myrtaceae has not yet been investigated, 39 but the studies referred to here are very important for conservation and management strategies.

Conclusions
In the present study, spatial chemometric methods detected and characterized the spatial chemical structure among populations of E. dysenterica based on fitted models using variograms and probability maps. Together with the multivariate Mantel autocorrelogram and trend surface analyses, 11 all chemical strategies showed that E. dysenterica populations differ chemically whenever the geographical distance exceeds 120 km, an indicator of the minimal distance between samples required for conserving the genetic diversity of populations. Although scarcely used with essential oil variations, these methodologies show a wide range of applications in species management and may lead to an effective integration of genetic, chemical and ecological perspectives.

Supplementary Information
Supplementary data (Figures S1-S5) is available free of charge at http://jbcs.sbq.org.br as a PDF file.

Figure S3. Variograms for essential oils from eight E. dysenterica populations represented by oil constituent data set (a) and for the sesquiterpene hydrocarbon content (b) along the separation distance of sampling sites. Fitted models show spatially structured variance (a) and nugget effect only (b), with no spatial structure in the data.

Figure S4. Distribution of environmental predictors (leaf nutrients: P, K+, Mg2+, Cu2+) of oil constituent variations in the leaves from eight E. dysenterica populations from Central Brazilian Cerrado: pure environmental influence (a) and environmental factors shared by spatial predictors (b). The maps show the probability of finding a value higher than the mean content.
Vilela et al.

Vol. 24, No. 5, 2013 Figure S5. Distribution of spatial predictors of oil constituent variations in the leaves from eight E. dysenterica populations from Central Brazilian Cerrado: pure spatial influence (a) and spatial factors shared by environmental predictors (b). The maps show the probability of finding a value higher than the mean content.