Discrimination of Picea chihuahuana Martinez populations on the basis of climatic, edaphic, dendrometric, genetic and population traits

Background Picea chihuahuana, which is endemic to Mexico, is currently listed as “Endangered” on the Red List. Chihuahua spruce is only found in the Sierra Madre Occidental (SMO), Mexico. About 42,600 individuals are distributed in forty populations. These populations are fragmented and can be classified into three geographically distinct clusters in the SMO. The total area covered by P. chihuahuana populations is less than 300 ha. A recent study suggested assisted migration as an alternative to the ex situ conservation of P. chihuahuana, taking into consideration the genetic structure and diversity of the populations and the predictions regarding the future climate of the habitat. However, detailed background information is required to enable development of plans for protecting and conserving species and for successful assisted migration. Thus, it is important to identify differences between populations in relation to environmental conditions. The genetic diversity of populations, which affect vigor, evolution and adaptability of the species, must also be considered. In this study, we examined 14 populations of P. chihuahuana, with the overall aim of discriminating the populations and form clusters of this species. Methods Each population was represented by one 50 × 50 m plot established in the center of its respective location. Climate, soil, dasometric, density variables and genetic and species diversities were assessed in these plots for further analyses. The putatively neutral and adaptive AFLP markers were used to calculate genetic diversity. Affinity Propagation (AP) clustering technique and k-means clustering algorithm were used to classify the populations in the optimal number of clusters. Later stepwise binomial logistic regression was applied to test for significant differences in variables of the southern and northern P. chihuahuana populations. Spearman’s correlation test was used to analyze the relationships among all variables studied. Results The binomial logistic regression analysis revealed that seven climate variables, the geographical longitude and sand proportion in the soil separated the southern from northern populations. The northern populations grow in more arid and continental conditions and on soils with lower sand proportion. The mean genetic diversity using all AFLP studied of P. chihuahuana was significantly correlated with the mean temperature in the warmest month, where warmer temperatures are associated to larger genetic diversity. Genetic diversity of P. chihuahuana calculated with putatively adaptive AFLP was not statistically significantly correlated with any environmental factor. Discussion Future reforestation programs should take into account that at least two different groups (the northern and southern cluster) of P. chihuahuana exist, as local adaptation takes place because of different environmental conditions.

In a recent study, Mendoza-Maya et al. (2015) suggested assisted migration as an alternative to the ex situ conservation of P. chihuahuana, taking into consideration the genetic structure and diversity of the populations and also predictions regarding the future climate of the habitat. However, detailed background information is required to enable development of plans for protecting and conserving species and in order to achieve successful assisted migration. Thus, it is important to identify differences between populations in relation to environmental conditions (Aguilar-Soto et al., 2015). The vitality and genetic diversity of populations, which affect vigor, evolution, and adaptability of the species, must also be considered (Frankham, Ballou & Briscoe, 2002;Reed & Frankham, 2003). In other words, genetic diversity is vital for increasing population fitness by reducing inbreeding depression in the short term and, in the longer term, to develop new local adaptations in response to environmental changes (Reed & Frankham, 2003). Genetic diversity also affects ecological processes such as primary productivity, population recovery from disturbances, interspecific competition, community structure, and fluxes of energy and nutrients (Hughes et al., 2008). AFLP markers (amplified fragment length polymorphism) can be used to describe genetic diversity (Meudt & Clarke, 2007). Outlier AFLP markers were found in several studies (e.g., Nunes et al., 2012), that were associated with different abiotic and biotic conditions (e.g., Jump et al., 2006;Wehenkel, Corral-Rivas & Castellanos-Bocaz, 2010).
In this study, we examined fourteen P. chihuahuana populations with the overall aim of discriminating the populations and clusters of this unique tree species. For this purpose we: (i) determined 74 variables: 22 climatic, 27 edaphic, 10 dasometric, four density variables and other two population variables, as well as six genetic variables and three species diversity indices were tested by using putatively neutral and adaptive AFLP markers, (ii) identified suitable variables for separating populations, and (iii) tested for correlation between genetic diversity, dasometric, and environmental factors. Our purpose was seeking for any significant differences, in order to predict species distribution by discriminant analysis; the results led us to make proposals for ex situ conservation of P. chihuahuana.

Study area
The study was conducted in 14 populations of P. chihuahuana located in five municipalities of the state of Durango and two municipalities of Chihuahua, Mexico (Table 1 and Fig. 1). The 14 locations were selected in order to cover three geographically distinct clusters of the natural distribution along the species (north, center, and south). Each location was represented by one 50 × 50 m (0.25 ha) plot established in the center of the respective Notes. *Taken from Table 6 of Ledig et al. (2000). population. Following , all trees with diameter at breast height (DBH) ≥7.5 cm were scored in regard to position, DBH, height, and species affiliation. Field experiments were approved by the Secretariat of Environment and Natural Resources, Mexico (SEMARNAT; permit number SGPA/DGVS/02835/12).

Determination of climate variables
The climate model developed by Rehfeldt (2006), based on thin plate spline (TPS) of Hutchinson (1991) and Hutchinson (2004), and explained for its Mexican implementation

Determination of edaphic variables
In each location, a soil subsample (250 g) was collected at a depth of 0-15 cm at the base of the stems of four Picea chihuahuana trees. The four soil subsamples were combined to make a 1,000 g sample per population (14 samples in total) for analysis of 27 edaphic variables: the texture (relative proportion of sand, silt, and clay), density (Den) (g/cm 3 ), concentration of calcium carbonate (CaCO 3 ) (%), pH (CaCl 2 , 0.01 M), concentrations of potassium (K)(ppm), magnesium (Mg) (ppm), sodium (Na) (ppm), copper (Cu) (ppm), iron (Fe) (ppm), manganese (Mn) (ppm), zinc (Zn) (ppm), and calcium (Ca) (ppm) in the soil were determined by the methods described by Castellanos, Uvalle-Bueno & Aguilar-Santelises (1999). Phosphorus (P) (ppm) was determined by the method of Olsen et al. (1954). Nitrate (NO 3 ) (kg /ha) was determined by the method of Baker (1967) and the relative organic matter (%) (OM) contents were determined by the method of León & Aguilar (1987). Electrical conductivity (CE) (dS/m) was determined by the method described by Vázquez & Bautista (1993). Finally, the cation exchange capacity (meq 100 g soil) (CEC) and the relative proportions (%) of hydrogen, Ca, M, K, Na and other bases (o.b.) in the CEC were estimated on the basis of the Ammonium Acetate Method (pH 8.5).
The hydraulic conductivity (HC) (cm/h) was determined by the method of Mualem (1976) and percent saturation (Sat) (%) was estimated by the method of Herbert (1992). Edaphic variables are described in Table 3.

Determination of dasometric, density and population variables
For each of the 14 plots we estimated the individual diameter at breast height (DBH), basal area (G), height (H), maximum diameter at breast height (DBH max ), maximum height (H max ) of P. chihuahuana trees. For each plot we also estimated the following variables considering together all tree species found per plot (see details in : individual total diameter at breast height (DBH tot ) and individual total height (H tot ). Besides we registered the total maximum diameter at breast height considering together all tree species per plot (DBH max,tot ) and total maximum height for all tree species found per plot (H max,tot ), according to Assmann (1970). We also estimated the total number of individuals of P. chihuahuana per plot (N), quadratic DBH of P. chihuahuana per plot (D g ), total number of individuals per plot (N tot ), basal area per plot (G tot ) and quadratic DBH per plot (D g,tot ), according to   (Table 4). Two other population variables were considered: population size (T) and geographical distance between neighbor populations (d min ). T was taken from Table 6 of Ledig et al. (2000). d min was calculated by GenAlex 6.5 (Peakall & Smouse, 2006) (Table 1). All the 40 known populations, based on their geographical coordinates (Table 1), were included for the distance calculations.

Determination of genetic diversity variables
Needles were sampled from 669 individuals (seedlings, saplings and trees) of P. chihuahuana in the 14 populations (plots) studied (i.e., 17-57 individuals per plot), for determination of genetic diversity variables (Table 5). The DNA was extracted using the DNeasy 96 Plant Kit (QIAGEN, Hilden, Germany). The amplified fragment length polymorphism (AFLP) analysis was conducted according to a modified version of the protocol of Vos et al. (1995), described by Simental-Rodríguez et al. (2014). The restriction enzymes used were Eco RI (selective primer: 5 -GACTGC GTACCAATTCNNN-3 ) and Mse I (selective primer: 5 -GATGAGTCCTGAGTAANNN-3 ). The primer combination E01/M03 (EcoRI-A/MseI-G) was used in the pre-AFLP amplification. Selective amplification was carried out with the fluorescent-labelled (FAM) primer pair E35 (EcoRI-ACA) and M70 (MseI-GCT). The AFLP products were separated in an ABI 3100 Genetic Analyzer, along with the GeneScan 500 ROX internal lane size standard (Applied Biosystems, Foster City, California, USA). Selection of the amplified restriction products was totally automated, and only strong and high quality fragments were considered. The size of the AFLP fragments was determined with the GeneScan R 3.7 and Genotyper R 3.7 software packages (Applied Biosystems, Foster City, California, USA). Binary AFLP matrices were created from the presence (code 1) or absence (code 0) at probable fragment positions. The quality and reproducibility of the analysis were verified according to Ávila-Flores et al. (2016).  The AFLP data were used to calculate three genetic diversity indices (Table 5): the modified frequency-down-weighted marker value (DW), the polymorphism percentage (POLY) (Schönswetter & Tribsch, 2005), and, the mean genetic diversity (v 2 ) Gregorius (1978), where: p ij is the relative frequency of a variant from the i to the j locus and N is the sample number.
The value of DW is expected to be high when rare AFLPs are accumulated (Schönswetter & Tribsch, 2005). In order to equalize dissimilar sample sizes, the values of the three diversity indices were multiplied by a correction term (N /(N − 1)), (Gregorius, 1978).
The values of these three genetic diversity indices were also calculated for putatively adaptive AFLP markers under natural selection (adaptive AFLP), detected in P. chihuahuana by Simental-Rodríguez et al. (2014).
The values of tree species richness (ν sp,0 ), Simpson index (ν sp,2 ), and number of prevalent tree species (ν sp,inf ) in the 14 plots were taken from Simental- Rodríguez et al. (2014) who used the same sampling strategy as in the present study (Table 5).

Cluster analysis
First, in order to detect the optimal cluster set for population conditions which were almost homogeneous inside each cluster, but clearly different from any other clusters, we used the recent Affinity Propagation (AP) clustering technique, with the input preference to the 0 quantile (q) of the input similarities (Bodenhofer, Kothmeier & Hochreiter, 2011), along with the k-means clustering algorithm (k-means) (Hartigan & Wong, 1979). We also utilized the Calinski-Harabasz criterion (CHC) to determine the optimal number of clusters. CHC minimizes the within-cluster sum of squares and maximizes the between-cluster sum of squares. Therefore, the highest CHC value is related to the optimal set (of most compact clusters). The optimal set can be identified by a peak or at least an abrupt elbow on the linear plot of CHC values (Legendre & Legendre, 1998).
By contrast to the k-means, the conceptually new AP simultaneously includes all data points as potential exemplars. Furthermore, AP has several advantages over related techniques, such as k-centres clustering, the expectation maximization (EM) algorithm, Markov chain Monte Carlo procedures, hierarchical clustering and spectral clustering (see details in Frey & Dueck, 2007). More importantly, it does not need a pre-defined number of groups (Bodenhofer, Kothmeier & Hochreiter, 2011).
For all the P. chihuahuana populations, both the AP (q = 0) technique and the k-means clustering along with CHC were firstly applied to all the 74 predictor variables together, and then separately for the 22 climate variables, 27 soil variables, nine genetic and species diversity variables, 10 dasometric variables, four density variables, T and d min (Tables 2-5).
All analyses were implemented using the R Script for k-Means Cluster Analysis and ''apcluster'' software packages (Bodenhofer, Kothmeier & Hochreiter, 2011) executed in the R free statistical application (R Core Team, 2015).
The AP and k-means clustering techniques recommended only two clusters of P. chihuahuana populations under study, which were completely separated from each other by the latitude and several other predictor variables.

Principal component analysis and logistic regression
Stepwise binomial multivariate logistic regression was used, which accepts independent variables even with heteroscedasticity and without a multivariate normal distribution (Hosmer, Lemeshow & Sturdivant, 2013). This regression tested for significant differences in climatic, dasometric, soil, genetic and species diversity variables between the southern populations (value zero) and the northern (value one) populations of P. chihuahuana (Table 1). The R software (version 3.3.2) was used to conduct the analysis. A linear discrimination analysis (Fisher, 1936) was not applied, since not every independent variable was normally distributed.
From the 74 predictor variables in Tables 1-5 only those that were not highly correlated with other predictor variables were included, because logistic regression requires each variable to be independent from each other (i.e., little or no multicollinearity). These predictor variables were found applying a varimax-rotated Principal Component Analysis (PCA) (Pearson, 1901). Therefore, only one variable from each PCA factor and with the highest factorial loads was selected for logistic regression.
Variables were excluded from the models if the probability of incorrectness (p) was greater than or equal to 5%. Stepwise selection (forward and backward) was performed to select the most informative variables for inclusion in the models. This procedure was done using the glm (generalized linear model) (family = ''binomial''), the step AIC (Akaike information criterion) function and the exact AIC using the ''MASS'' package (Venables & Ripley, 2002) in R (R Core Team, 2015). The AIC, standard error (SE) and residual deviance were used to evaluate the goodness-of-fit.

Ordinary kriging analysis
Ordinary kriging (ordinary Gaussian process regression model) was used to illustrate the spatial distribution of genetic diversities (v 2 , POLY, and DW) in P. chihuahuana (Batista et al., 2016). The mathematical models for describing the semivariance were: the spherical model, exponential model, Gaussian model, and Stein's parameterization. The best interpolation model was detected using 10-fold cross validation point-by-point. Correlation between the observed and predicted values (r k ) and the Unbiased Root Mean Squared Error of the residual (URMSE) were used to assess the goodness-of-fit. Finally, the model with the best fit was selected to create the prediction surface map of genetic diversity.

Spearman correlations
Spearman's correlation (r s ) test (Hauke & Kossowski, 2011) was used to analyze the relationships between genetic diversity and the climatic, soil, dasometric variables, d min and T. The test was implemented using R 3.2.3 statistical software (R Core Team, 2015). A Bonferroni correction was applied to calculate the new critical significance level (α * = 0.00023), by dividing the proposed critical significance level (α = 0.05) by the number of comparisons (m = 213) (Hochberg, 1988).

Cluster analysis
The Affinity Propagation clustering technique and the k-means clustering algorithm recommended two clusters based on the 74 predictor variables; the same grouping was found by using only the 22 climate variables under study (Fig. 2). The first cluster included the nine most northern P. chihuahuana populations under study (TN, RC, CV, TY, TR, VN, LQ, PPR and QD). While the second group comprised the five most southern populations (CB, SJ, SB, ACH and LP) (Table 1, Fig. 1). A cluster analyses was also applied with respect to the 27 soil variables, six genetic diversity indices, three species diversity indices and 14 dasometric variables, but patterns related to the geographical coordinates (i.e., latitude and longitude) were not found.

Principal component analysis and logistic regression
Eight uncorrelated variables (Mmin, Gsdd5, v sp,0 , Dg tot , NO 3 , Zn, %Mg and H max,tot ) from the 14 P. chihuahuana populations were selected for logistic regression analysis. This selection was based on a PCA (Fig. 3). The logistic regression analysis revealed that the Mmin clearly separated the southern from northern populations (Fig. 4A).
However, Mmin is a variable from the PCA factor group 1 (F1), and was strongly correlated with other eight F1 variables with high factorial loads (Long, Map, Gsp, Mtcm, Mmax, Mmindd0, Smrp, and %Sand), indicating that these eight variables were also important for characterizing and separating the two clusters. Since these eight variables characterized to 100% the two clusters, we considered that the binominal logit models were no longer needed.
Moreover, the probability (P) of being a northern population is higher if the sand proportion in the soil (%Sand) was significant lower (SE of Intercept = 10.178, p = 0.0242, SE of %Sand = 0.146, p = 0.0199, residual deviance: 9.467 on 12 degrees of freedom, AIC: 13.467) (Fig. 4B). The model is: (1) Significant differences in genetic variables and species diversity between southern and northern populations and locations were not found, although higher v 2 and DW were more probable in the northern populations.
According to the most important variables for the separation of the two clusters (Tables S1-S8) the logistic regression analysis of the P. chihuahuana populations revealed that the southern locations were characterized by more abundant precipitation in the summer, in the growing season and in the annual average in comparison to the northern locations. The southern populations also showed higher mean temperature in the coldest month, lower mean maximum temperature in the warmest month and less degree-days below 0 • C (based on mean minimum monthly temperature).

Ordinary kriging analysis and Spearman correlations
There was not a genetic diversity gradient from the northern to the southern cluster. The best kriging model was found for v 2 using the exponential model (r k = 0.842; URMSE = 0.019) (Fig. 5A). On the other hand, the goodness-of-fit of both the PLOY and DW models was poorer, respectively (r k = 0.633, URMSE = 0.063 and r k = 0.4165, URMSE = 0.123). The prediction and standard error surface maps of v 2 are shown in Fig. 5B.
After Bonferroni correction, the mean genetic diversity v 2 of P. chihahuana was significantly correlated with the mean temperature in the warmest month ( • C) (Mtwm)   Table 2-Table 5. (p = 0.0002) ( Table 6, Fig. 6). Genetic diversity of P. chihuahuana calculated with putatively adaptive AFLP markers was not statistically significantly correlated with any environmental factor. Finally, no significant positive correlations were observed between any of the three genetic diversity indices and population size. The negative association between genetic diversity and geographical distance to the next population was not significant (r s (v 2 × d min ) = −0.46, p = 0.09; r s (PLOY × d min ) = −0.42, p = 0.13; r s (DW × d min ) = −0.24, p = 0.41).

DISCUSSION
Our main findings show that the southern and northern P. chihuahuana populations are characterized by different climate conditions. Seven climate variables, besides the geographical longitude and the sand proportion in soil (Fig. 3, Tables S1-S8) were identified    , Condit et al., 2013;Toledo et al., 2012;John et al., 2007).
The southern locations were characterized by more oceanic climate, probably caused by absence of the mountain barrier of Baja California peninsula, northwestern Mexico. The maximum temperatures in the northern locations of the P. chihuahuana populations were also higher than in the southern ones. However, the future climate conditions, i.e., likely even higher temperatures and less precipitation may strongly restrict biomass production and the vitality of the most northern populations. This was observed by Ledig et al. (2010) who identified the most northern locations as the first group that may be threatened with extinction in some climate change projections.
The genetic diversity in P. chihuahuana is mostly moderate compared with other Picea species (Simental-Rodríguez et al., 2014;Wehenkel & Sáenz-Romero, 2012). The genetic diversity across all the AFLPs studied was not an important variable for separating the two clusters of P. chihuahuana populations (Fig. 5). However, it was significantly correlated with Mtwm (Table 6 and Fig. 6), where sites with warmer Mtwm harbor populations with larger genetic diversity. The most northern populations in the municipality of Bocoyna, Chihuahua were the sites with the highest Mtwm and aridity (lower precipitation values) (Tables S1-S8).
The genetic diversity among the putatively adaptive AFLPs was not significantly related to other variables. The relationships observed were probably not determined by adaptation, but by differences in the degree of isolation, which could influence gene flow and genetic drift (Ledig et al., 1997;Jaramillo-Correa et al., 2006;Quiñones Pérez, Sáenz-Romero & Wehenkel, 2014). In comparison to the center and south, the most northern populations (municipality of Bocoyna) were much closer. After considering together the 11 documented populations in the Municipality of Bocoyna, Chih., the separation distances were: minimum 0.1 km, mean 13 km and maximum 25 km to each other. The fact that northern populations are located closer to each other may directly lead to a greater genetic exchange and a lower tendency for genetic drift and inbreeding and thus, to a higher level of genetic diversity (Hamrick, Godt & Sherman-Broyles, 1992;Ledig et al., 1997). This assumption was confirmed by the negative, but not significant association between genetic diversity and geographical distance between neighbor populations detected in our study.
Jaramillo -Correa et al. (2006) also found that the diversity of cpDNA in P. chihuahuana decreased from northern to southern areas (with the highest to the lowest Mtwm, respectively). These authors assumed that genetic drift, rather than selection, was the main factor determining the population diversity in the Chihuahua spruce. Moreover, the observations of Ledig et al. (1997), based on isozyme analysis, also suggest the importance of drift and inbreeding in the recent evolution of this tree species.
Measurement of these environmental variables may be useful to identifying suitable and similar sites to those where the original stands are still growing, which may help to improve reforestation success. However, it will be important to specifically consider local micro climatic conditions that are not easy modelled with simple macro climate models (Aguilar-Soto et al., 2015), but can be recorded at new local weather stations within the populations.

CONCLUSIONS
Our findings have three important practical implications in relation to ex situ conservation: first, at least two different groups (clusters of natural populations) of P. chihuahuana exist (according to the results of our cluster analysis), as local adaptation takes place because of the different climate and soil conditions. Climate has been recognized as the main driver of adaptation (Vander, Bischoff & Smith, 2010). These different groups are also designated by genetic differences between the southern and northern populations (Ledig et al., 1997;Jaramillo-Correa et al., 2006;Quiñones Pérez, Sáenz-Romero & Wehenkel, 2014), even if only most likely using neutral markers. Therefore, future reforestation programs should only be established with seed sources from the same geographical group. Second, there are not relevant climate environmental and genetic differences within each of the two clusters. Thus, seed from different populations of the same group could be mixed for improvement of genetic diversity levels. Third and finally, this study revealed the special macro-climate and soil conditions needed in the locations where P. chihuahuana is growing. Therefore, knowledge of these special conditions may be very helpful to find adequate reforestation locations in Mexico and other countries, which should have similar characteristics to the original sites.