Estimation of Genetic Diversity in Seven Races of Native Maize from the Highlands of Mexico

Characterizing the genetic diversity of maize (Zea mays L.) populations by their morphological and molecular attributes makes it possible to place populations into specific groups, thus facilitating the design of procedures for their optimum and sustainable use. In this study, data from two lines of evidence were analyzed simultaneously to robustly classify maize populations and to determine their genetic relationships. Seven maize races of the central high plateau of Mexico were characterized using a combined analysis of 13 morphological traits and 31 microsatellite loci. The germplasm assessed included samples of 119 accessions held at Mexican germplasm banks. Cluster and principal component analyses were performed, and the genetic and geographic relationships among the accessions were determined. Principal component analysis separated the accessions into well-defined groups using the first three principal components. The accessions of the Arrocillo Amarillo and Elotes Cónicos races did not show a grouping pattern, indicating greater genetic complexity. Better-grounded groupings and phylogenetic relationships were obtained when traits from both lines of evidence were used simultaneously.


Introduction
The evolutionary history of maize (Zea mays L.), as well as its diversity, has been the subject of numerous studies [1], which have focused mainly on morphological variation [2,3]. In other classic studies, Cervantes, Goodman, Casas and Rawlings [4] used genetic effects and the genotype × environment interaction, while Sánchez, Goodman and Rawlings [5] considered the genotype × environment interaction together with stability parameters, which made it possible to classify individuals more precisely into discrete units [6], although these parameters may not provide reliable information on possible phylogenetic relationships [7]. More recently, molecular tools such as DNA simple sequence repeats (SSRs) have been applied in studies of maize landraces [8–12], and the resulting data have added substantially to the understanding of maize evolution and diversification.
Some classification work on Mexican maize landraces has been based on phenetic analysis using morphological traits [12,13], in some instances combined with isozyme allelic frequencies [14,15], and combinations of morphological traits, isozymes and microsatellites have […]
[…] 2260 masl), state of Puebla, and Montecillo, State of Mexico (19°27′ N, 98°54′ W, 2250 masl). The experimental design was a randomized complete block design with two replications at each site. The experimental unit (plot) consisted of two rows, 5 m long and 0.8 m wide, in which 44 seeds were planted equidistantly.

Morphological Traits Measured
Phenotypic traits were recorded in each experiment. Five competitive plants per plot were chosen at random to record phenological, vegetative, tassel, ear and kernel traits. Some ratios among these traits were also calculated (Table 1).
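As a minimal illustration of how plot-level values and ratio traits could be derived from the five sampled plants, the Python sketch below averages each trait per plot and adds ratios such as ED/EL. The record layout, the reading of ED as ear diameter, and the choice to average per-plant ratios (rather than divide plot means) are assumptions; all values are hypothetical.

```python
from statistics import mean


def plot_means(plants, ratios):
    """Average each raw trait over the sampled plants of one plot and add
    ratio traits.

    plants: list of dicts, one per plant, mapping trait abbreviation -> value.
    ratios: list of (numerator, denominator) trait pairs, e.g. ("ED", "EL").
    ASSUMPTION: ratios are computed plant by plant and then averaged; the
    text does not say whether ratios were taken per plant or from plot means.
    """
    traits = list(plants[0])
    result = {t: mean(p[t] for p in plants) for t in traits}
    for num, den in ratios:
        result[f"{num}/{den}"] = mean(p[num] / p[den] for p in plants)
    return result


# Hypothetical measurements (cm) for five plants of one plot.
plot = [
    {"EL": 14.0, "ED": 4.9},
    {"EL": 15.5, "ED": 5.1},
    {"EL": 13.2, "ED": 4.7},
    {"EL": 16.0, "ED": 5.3},
    {"EL": 14.8, "ED": 5.0},
]
summary = plot_means(plot, [("ED", "EL")])
```

Averaging per-plant ratios and dividing plot means generally give slightly different numbers, so whichever convention is chosen should be applied consistently across all accessions.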

Assessment of Microsatellite Polymorphism
The 107 accessions were analyzed with 31 microsatellite loci distributed across the ten maize chromosomes; extensive information on these loci is available in the Maize Genetics and Genomics Database (MaizeGDB) at http://www.maizegdb.org/ssr.php# (Table 2). For the molecular analysis, genomic DNA was extracted individually from 100 mg of mesocotyl, coleoptile and young leaf tissue of 25 plants per accession with a commercial kit (ChargeSwitch® gDNA Plant, Invitrogen, Thermo Fisher Scientific, Carlsbad, USA) on a DNA extraction and purification robot (KingFisher Flex™, Thermo Scientific, Waltham, MA, USA). Each sample was amplified by PCR in a 25 µL volume containing 10 mM nucleotides, 25 mM MgCl₂, 5× buffer, 100 ng of template DNA, 1 unit of Taq DNA polymerase and 4 pmol of each primer. The PCR protocol consisted of an initial denaturation for 4 min at 95 °C, followed by 25 cycles of 1 min at 95 °C, 2 min at 55 °C and 2 min at 72 °C, and a final extension of 60 min at 72 °C. PCR products were separated by capillary electrophoresis in a DNA sequencer (Genetic Analyzer ABI 3130™, Applied Biosystems, Foster City, CA, USA); detection relied on the fluorescent labels 6-FAM, ROX or HEX at the 5′ end of the forward primers, with LIZ-500 as the internal size standard. Allelic profiles of the markers for each population were compiled with GeneMapper® v4.0 [28], and allelic variability was assessed with POPGENE 1.31 [29].
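Allelic variability was computed with POPGENE; as a rough sketch of the underlying bookkeeping, the Python function below turns diploid SSR genotype calls for the plants of one accession into allele frequencies at one locus. Coding alleles as fragment sizes in bp and skipping plants that failed to amplify are assumptions for illustration; this is not POPGENE's implementation.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one SSR locus within one accession.

    genotypes: one (allele1, allele2) tuple per plant, alleles given as
    fragment sizes in bp; None marks a plant that failed to amplify and
    is skipped. Assumes diploid, codominant scoring, so a homozygote
    contributes two copies of its allele.
    """
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)  # counts both alleles of the plant
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical calls for four plants at one locus.
freqs = allele_frequencies([(150, 150), (150, 154), (154, 158), None])
# freqs -> {150: 0.5, 154: 0.333..., 158: 0.166...}
```

Applying the function to every locus and accession yields the accession-by-allele frequency table that the later selection and distance steps operate on.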

Statistical Analysis
Using the averages of the morphological data from the three sites, a combined analysis of variance was performed with SAS V.9.0 [30]. The linear model used was:
$$Y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_{ij} + B(\gamma)_{k(j)} + \varepsilon_{ijk}$$
where $Y_{ijk}$ is the observation of the $i$th accession in the $k$th block of the $j$th environment, $\mu$ is the general mean, $\alpha_i$ is the effect of the $i$th accession, $\gamma_j$ is the effect of the $j$th environment, $\delta_{ij}$ is the interaction effect of the $i$th accession with the $j$th environment, $B(\gamma)_{k(j)}$ is the effect of the $k$th block nested within the $j$th environment, and $\varepsilon_{ijk}$ is the random error associated with the experimental unit [31].
Morphological traits were then selected for further analysis using two statistical methods. The first was based on the repeatability coefficient (r), as suggested by Sánchez, Goodman and Rawlings [5]; the second was based on the structure of the correlation matrix of the independent variables displayed in Gabriel biplots [32]. The repeatability coefficient, a criterion proposed by Goodman and Paterniani [6], discriminates among morphological variables according to the stability of the traits across environments. When r ≥ 1, a variable is considered stable and appropriate for classification.
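The text does not give formulas for the variance components, heritability or repeatability reported in Table 4. The Python sketch below shows one standard way to obtain them from the combined ANOVA mean squares, assuming a balanced design, random effects and the expected mean squares written in the docstring. It also assumes that r is the ratio of accession variance to accession × environment variance; that reading is consistent with the r ≥ 1 rule but is not stated in the source. The mean squares in the example are hypothetical.

```python
def variance_components(ms_g, ms_gl, ms_e, n_env, n_rep):
    """Variance components from the combined ANOVA mean squares.

    Assumes a balanced design, all effects random, and n_rep blocks nested
    in each of n_env environments, so that
        E[MS_G]  = s2e + n_rep*s2gl + n_rep*n_env*s2g
        E[MS_GL] = s2e + n_rep*s2gl
        E[MS_E]  = s2e
    Negative estimates are set to zero, a common convention.
    """
    s2e = ms_e
    s2gl = max((ms_gl - ms_e) / n_rep, 0.0)
    s2g = max((ms_g - ms_gl) / (n_rep * n_env), 0.0)
    # Broad-sense heritability on an accession-mean basis.
    phen = s2g + s2gl / n_env + s2e / (n_rep * n_env)
    h2 = s2g / phen if phen > 0 else 0.0
    # ASSUMED definition of repeatability: accession variance over
    # accession x environment variance (consistent with the r >= 1 rule).
    if s2gl > 0:
        r = s2g / s2gl
    else:
        r = float("inf") if s2g > 0 else 0.0
    return {"s2g": s2g, "s2gl": s2gl, "s2e": s2e, "H2": h2, "r": r}


# Hypothetical mean squares for one trait: 3 sites, 2 blocks per site.
vc = variance_components(ms_g=100.0, ms_gl=20.0, ms_e=10.0, n_env=3, n_rep=2)
stable = vc["r"] >= 1  # trait retained for classification if True
```

With these hypothetical inputs, the accession variance is about 13.33 and the interaction variance is 5, so r is about 2.67 and the trait would pass the r ≥ 1 screen.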
Alleles were selected on the basis of significant differences in frequency among populations (p ≤ 0.05); in addition, only alleles with a frequency above 2% were retained. This was done to avoid distorted distances between accessions, and problems of interpretation, that arise when low-frequency or single alleles are included in grouping analyses.
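A minimal Python sketch of the two allele filters follows, assuming both conditions must hold. The source does not name the test behind the p-values or the unit over which the 2% frequency is measured, so both enter as inputs; the allele labels and numbers are hypothetical.

```python
def select_alleles(mean_freq, p_values, alpha=0.05, min_freq=0.02):
    """Return allele labels that pass both filters:
    - frequency differs among populations at p <= alpha, and
    - frequency is above min_freq (2%).

    mean_freq, p_values: dicts keyed by allele label. Alleles without a
    p-value are treated as non-significant.
    """
    return sorted(
        allele
        for allele, freq in mean_freq.items()
        if freq > min_freq and p_values.get(allele, 1.0) <= alpha
    )


# Hypothetical inputs for three alleles.
freq = {"locus1-A": 0.15, "locus1-B": 0.01, "locus2-A": 0.30}
pval = {"locus1-A": 0.001, "locus1-B": 0.0001, "locus2-A": 0.40}
chosen = select_alleles(freq, pval)  # -> ["locus1-A"]
```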
A data matrix of 107 populations × 224 variables was then constructed from the selected morphological traits and the selected allele frequencies at the 31 microsatellite loci scored in each individual (13 morphological traits and the frequencies of 211 previously selected alleles). The variables were standardized by subtracting the mean and dividing by the standard deviation. From this dataset, Gower distances among populations, which are recommended when variables of different types are combined [33], were calculated with SAS V.9.0 [30]. A phylogenetic tree was then constructed with the Neighbor-Joining method [34] in NTSYSpc V.2.2 [35], with population Sina-2 of the Chapalote race, considered one of the ancient maize landraces [2], as the outgroup.
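For intuition about the distance step, the pure-Python sketch below standardizes a small numeric matrix and computes Gower distances for quantitative variables: each absolute difference is divided by that variable's range and the terms are averaged. It is an illustration, not the SAS procedure the authors used, and the data are hypothetical.

```python
from statistics import mean, stdev


def standardize(rows):
    """Column-wise z-scores: subtract the column mean, divide by its SD."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def gower_matrix(rows):
    """Pairwise Gower distances for quantitative variables:
    d(i, j) = (1/p) * sum_k |x_ik - x_jk| / range_k,
    where p counts the variables with non-zero range."""
    cols = list(zip(*rows))
    ranges = [max(c) - min(c) for c in cols]
    usable = [k for k, rg in enumerate(ranges) if rg > 0]
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    if not usable:
        return dist
    for i in range(n):
        for j in range(i + 1, n):
            s = sum(abs(rows[i][k] - rows[j][k]) / ranges[k] for k in usable)
            dist[i][j] = dist[j][i] = s / len(usable)
    return dist


# Hypothetical matrix: 3 accessions x 2 variables.
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
d_raw = gower_matrix(data)
d_std = gower_matrix(standardize(data))
```

Because each term is scaled by the variable's range, Gower distances computed on raw values and on z-scores coincide (d_raw equals d_std here); standardization matters mainly for the correlation-based steps. The resulting matrix is the kind of input a Neighbor-Joining routine, such as the one in NTSYSpc, takes.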
Allele frequencies and morphological data were used to construct a correlation matrix among the variables, with each morphological trait and each allele treated as an independent variable. The principal component analysis was based on this correlation matrix and was performed with SAS V.9.0 [30].
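The following NumPy sketch shows PCA on a correlation matrix: variables are standardized, the eigenvalues and eigenvectors of the correlation matrix give the component variances and loadings, and each eigenvalue divided by their sum gives the proportion of variance explained. It is an illustration on random hypothetical data, not a re-creation of the SAS analysis; columns with zero variance would have to be dropped first.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X.

    X: rows = accessions, columns = variables (morphological traits and
    allele frequencies). Columns with zero variance must be removed first.
    Returns eigenvalues (descending), loadings (one column per component),
    component scores, and the proportion of variance explained.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    explained = eigvals / eigvals.sum()  # eigvals.sum() equals trace(R) = p
    return eigvals, eigvecs, scores, explained


# Hypothetical data: 20 accessions x 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
eigvals, loadings, scores, explained = pca_correlation(X)
first_three = explained[:3].sum()  # share captured by PCs 1-3
```

With 107 accessions, the combined 224-variable correlation matrix has rank at most 106, so the remaining eigenvalues are zero up to rounding error.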

Analysis of Variance
Analysis of variance revealed highly significant differences among populations for all the measured traits (Table 3). The population × environment interaction was not significant for plant height, peduncle length of the tassel, length of the branched part of the tassel, ear length, kernel width, ear grain percentage, weight of 100 kernels, volume of 100 kernels, kernel thickness/kernel length, and weight of 100 kernels/volume of 100 kernels. Finally, the variance component estimates gave repeatability values of r ≥ 1 for 19 of the 32 traits (Table 4).
Table 4. Variance components for populations ($\sigma^2_g$), localities ($\sigma^2_l$), the populations × localities interaction ($\sigma^2_{g\times l}$) and error ($\sigma^2_e$), broad-sense heritability ($H^2$) and repeatability (r) of the traits studied.

Cluster and Principal Component Analyses
To define the final set of traits, a correlation matrix was computed and, for each pair of traits with a correlation coefficient above 0.7, one trait was eliminated. In this way, 13 final traits (FF, LCS, NLE, KR, EL, ED/EL, KW, W100K, KW/KL, KT/KL, KT, KL and W100K/V100K) were chosen for the assessment of racial diversity and classification.
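A minimal Python sketch of this redundancy filter, assuming that "above 0.7" refers to the absolute value of the correlation and that the trait appearing earlier in the list is the one kept; the source says neither. Trait names and correlations are hypothetical.

```python
def prune_correlated(names, corr, threshold=0.7):
    """Greedy removal of redundant traits.

    Traits are visited in the order given; a trait is kept only if its
    absolute correlation with every trait kept so far is <= threshold.
    names: trait labels; corr: square correlation matrix (list of lists)
    in the same order as names.
    """
    kept = []
    for i in range(len(names)):
        if all(abs(corr[i][j]) <= threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


# Hypothetical correlations among four traits.
names = ["trait_a", "trait_b", "trait_c", "trait_d"]
corr = [
    [1.00, 0.85, 0.20, -0.75],
    [0.85, 1.00, 0.10, -0.60],
    [0.20, 0.10, 1.00, 0.30],
    [-0.75, -0.60, 0.30, 1.00],
]
selected = prune_correlated(names, corr)  # -> ["trait_a", "trait_c"]
```

The result depends on the visiting order. Ordering traits by decreasing repeatability before filtering would keep the more stable member of each correlated pair; whether the authors did this is not stated.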
According to the cluster analysis, seven groups were observed (Figure 1). Group I comprised accessions of the Purépecha, Palomero Toluqueño and Elotes Cónicos races. Group II, placed in the upper part of the phylogram, comprised accessions Hgo-116, Mex-6, Pueb-618, Ver-537 and Tlax-255, all of the Elotes Cónicos race. Group III included nine Cónico populations from the states of Hidalgo and Tlaxcala and three Palomero Toluqueño accessions from the states of Mexico and Tlaxcala. Group IV was formed by a mixed set of races (Cacahuacintle, Elotes Cónicos, Arrocillo Amarillo, Chalqueño and Purépecha). Group V comprised eight Chalqueño populations, predominantly from the states of Durango, Guanajuato, Jalisco, Zacatecas and Querétaro, and two Cacahuacintle populations. Group VI was formed by Cónico accessions from the states of Mexico, Hidalgo and Tlaxcala and by Arrocillo Amarillo accessions from the states of Veracruz and Mexico. Group VII comprised Cónico and Chalqueño populations, predominantly from the states of Mexico, Hidalgo, Puebla and Tlaxcala, and one population (Pue-512) of the Elotes Cónicos race, notable for its recent formation.
The dispersion of the 107 maize accessions from the highlands of Mexico was represented in a three-dimensional space defined by the first three principal components (Figure 2), which showed broad variation within each racial group as well as groups defined by race. The first principal component explained 8.7% of the total variation; the alleles that contributed most to it were phi072-C, phi121-B, phi121-H, phi346482-J, phi093-E, phi015-K, phi050-H, phi024-Q, phi427913-F, phi346482-I, phi402893-G and phi127-B toward the positive end, while the morphological traits with the largest influence were ED/EL, EL and KT/KL. The second principal component explained 4.9% of the total variation and was influenced mostly by high frequencies of the alleles phi033-L, phi115-C, phi265454-M, phi402893-B, phi115-A, phi051-D, phi053-G, phi96100-C, phi101249-B, phi96100-N and phi265454-P at the positive end of the component. The third principal component explained 4.6% of the total variation and was more closely associated with morphological traits (LCS, EL, KW, W100K, W100K/V100K, KT and FF) and with some allele frequencies (phi050-G and phi427913-I).

Discussion
The high degree of variation found suggests broad genetic diversity among genotypes. Based on the value of repeatability, a total of 32 traits were selected at the first stage. With the exception of PH and LBT, the traits selected at this stage showed no significant genotype × locality interaction, a property that is desirable for classification. According to Sánchez, Goodman and Rawlings [5], traits that are less affected by the environment are more useful for characterizing populations. Some of the traits identified here as appropriate for classification were also selected by [5], who indicated that ear traits, and reproductive traits in general, are the most suitable for characterizing maize races. These include NL, LCS, KW, ED/EL and KW/KL.
The dendrogram differentiated seven groups, with subgroups within two of them. Group I had three subgroups. Subgroup I-A included six accessions of the Elotes Cónicos race and two of the Chalqueño race. Subgroup I-B comprised most of the Purépecha populations, which formed a well-defined cluster of poorly differentiated native populations from the Sierra Purépecha region (Salvador Escalante, Charapan, Tingambato, Nahuatzen and Paracho, Michoacán). This region is isolated by several geographic barriers created by its mountainous terrain, and it has a strong cultural identity and semi-collective forms of land use [38]. The results suggest that these populations constitute a genetic group distinct from the Chalqueño group, and they strongly support their integration into the newly described Purépecha race, as proposed by Romero, Castillo and Ortega [39] and later confirmed by […]. Subgroup I-C comprised four Palomero Toluqueño accessions.
The nine Cónico populations of Group III were placed between the Palomero Toluqueño populations (Subgroup I-C) and those of Cacahuacintle (Subgroup IV-A). This agrees with [2], who proposed that the Cónico race could be the product of hybridization between the Palomero Toluqueño and Cacahuacintle races. Within this group, the Cónico accession Pue-116 stands out because it appeared to be a recently formed population.
Group IV is divided into two subgroups. Subgroup IV-A comprised six Cacahuacintle accessions, three Elotes Cónicos, two Arrocillo Amarillo, two Chalqueño and one Purépecha population (Pur-86). Subgroup IV-B was defined by geographic origin, comprising populations from the state of Puebla (five Arrocillo Amarillo, three Cacahuacintle and one Cónico) and one Chalqueño population from Totolapan, Morelos. The Cacahuacintle and Arrocillo Amarillo races are associated in both subgroups (IV-A and IV-B), indicating a higher degree of diversity [2,13,40]; the two races are closely related and share similar traits.
The interspersion of the Cónico and Chalqueño races in Groups V, VI and VII indicates that the two races are closely related owing to constant gene flow among their populations. Because their geographic distributions are almost identical, the populations could not be placed in well-defined groups. The results indicate that the principal maize races of the Mexican highlands share a common origin and have a diffuse genetic background extending throughout the populations, as also shown by the shared alleles of the molecular markers and the similar values of the morphological traits. It is therefore inferred that the differentiation of these races has been influenced mainly by natural and artificial selection pressure in relatively recent times.
The Cónico population Mor-93 was placed by itself at the innermost part of the phylogram. This population has probably not been in contact with those of other regions, or it may have been influenced by materials not included in this study.
Regarding the principal component analysis of the combined dataset (frequencies of 211 SSR alleles and 13 morphological traits), the first 20 principal components explained 55.1% of the variance. When allele frequencies are included in the analysis, the percentage of variance explained commonly decreases markedly; in contrast, when only morphological characterization is used, a higher percentage is reached with fewer principal components [37,41]. This reduction could be due to rare alleles present in the population, the effect of molecular markers on the genetic architecture of the traits, and linkage between the molecular markers and major or minor genes underlying the phenotypic diversity in the population [42].
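A short identity helps explain the lower percentages. In a PCA based on a correlation matrix, the eigenvalues sum to the number of standardized variables p, so each component's share is its eigenvalue divided by p; adding many allele-frequency variables enlarges the denominator. The first eigenvalue below is back-calculated from the rounded 8.7% figure, and the eigenvalue of 3 is a hypothetical value used only for comparison.

```latex
\sum_{k=1}^{p} \lambda_k = \operatorname{tr}(\mathbf{R}) = p,
\qquad
\text{share of PC}_k = \frac{\lambda_k}{p}

p = 13 + 211 = 224: \quad \lambda_1 \approx 0.087 \times 224 \approx 19.5

\lambda = 3: \quad \frac{3}{13} \approx 23.1\% \quad \text{vs.} \quad \frac{3}{224} \approx 1.3\%
```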
The principal component analysis revealed grouping patterns very similar to those observed in the phylogenetic analysis, indicating that the associations are highly consistent. The Cacahuacintle, Purépecha, Elotes Cónicos and Palomero Toluqueño races constitute well-defined groups, whereas the Chalqueño race forms two groups, in which accessions from the states of Durango, Zacatecas, Jalisco and Querétaro can be distinguished from those of the traditional core of this race in the highlands of central Mexico. The Cónico accessions overlapped with the Chalqueño race, indicating genetic complexity as detected by selectively neutral markers. The spatial distribution of populations in the principal component analysis was similar to that reported by Wellhausen, Roberts, Hernández and Mangelsdorf [2], who found that the Cónico race was located between Palomero Toluqueño and Cacahuacintle. This result suggested that the Cónico race was derived from a cross between those two races, or that the three races are closely related. In addition, a strong grouping was observed between the Cónico and Elotes Cónicos races; some authors, such as Wellhausen, Roberts, Hernández and Mangelsdorf [2], even considered Elotes Cónicos to be only a pigmented sub-race of Cónico.
The spatial distribution of the Purépecha race on the first three principal components shows that its populations are very similar, as seen in their compact group toward the positive end of the first principal component. Populations of the Cónico race, on the other hand, lie far apart, indicating either that the variation within this race is very broad or that the populations are not appropriately classified in the Mexican germplasm banks. Most maize accessions in Mexican germplasm banks were classified according to morphological variables. However, morphological variables are often susceptible to phenotypic plasticity [43], which has biased the classification of some populations in the Mexican germplasm banks. Our study showed that this limitation can be overcome by the combined use of molecular markers and morphological variables.
Similarly, accessions Mex-122 of the Cacahuacintle race and Pur-86 of the Purépecha race do not fall within their respective groups, suggesting that they may be incorrectly classified or may have received genetic influence from other races.

Conclusions
Analyzing molecular and morphological information together for racial classification provided strong evidence for understanding the diversification of, and the evolutionary relationships among, the maize landraces of the highlands of Mexico. In general, the results of this study agree with the description of the Pyramidal complex (races Cónico, Chalqueño, Palomero Toluqueño, Arrocillo Amarillo and Elotes Cónicos) as a group of more closely related races, while the Cacahuacintle and Purépecha races are not closely associated with this complex.

Conflicts of Interest:
The authors declare no conflict of interest.