INTRODUCTION
Coffea arabica L. is a shrubby autogamous species originating from the southeast of Ethiopia and the north of Kenya on the African continent. It derived from the hybridization between Coffea canephora Pierre ex A. Froehner and Coffea eugenioides S. Moore, therefore its genome is the fusion of the two genomes (Romero et al., 2010). The C. arabica cultivars are responsible for 70% of the coffee trade worldwide; while the remaining 30% comes from C. canephora and others related species. In addition, the main cultivars grown in coffee-producing countries, and particularly in the Americas, have a narrow genetic pool, because their propagation started from very few plants that belonged to the Typica and Bourbon cultivars (Aerts et al., 2012). Accordingly, the Arabica cultivars that are distributed worldwide rose from the mutation and hybridization of those materials (Van Der Vossen et al., 2015). However, in the 1960s coffee genetic improvement programs performed interspecific crossbreeding among traditional cultivars of C. arabica and C. canephora to introduce new traits into the crop, such as pest and disease tolerance, especially leaf rust caused by Hemileia vastatrix. For this reason, a spontaneous hybrid between C. arabica and C. canephora, discovered on the Timor Island in Southeast Asia, was used as an intermediary (Romero et al., 2014). These programs have increased the genetic diversity of C. arabica by crossbreeding traditional cultivars and Timor hybrid descendants therefore introducing genes of interest along with unwanted traits.
Peru is an important producer of specialty coffee beans in the world. However, when and how did coffee cultivars arrive in Peru? It is still unclear. According to the Junta Nacional de Café, the first plants arrived in the mid-18th century and were sown in the province of Chinchao (Huanuco Region). Shortly after, others cultivars arrived from Guayaquil, Ecuador, and were taken to the forest in central Peru. However, at different times throughout the history of coffee cultivation in Peru, improved genotypes might have arrived from neighboring coffee producing countries without being registered. Furthermore, according to Palomino et al. (2014) a mixture of cultivars is believed to exist. Despite the long history as coffee producer, Peru has no records of any genetic improvement program similar to those implemented in Colombia, Brazil and other countries in Central America. However, it is known that the coffee grown in the Peruvian forest produces grains with high organoleptic qualities, which are an important requirement for a progenitor in genetic improvement programs. Therefore, knowing the genetic potential of genotypes in a germplasm collection is relevant in order to perform efficient crossings.
Molecular markers have been very useful to perform genetic diversity and population structure analysis of germplasm collections in most cultivated species. In C. arabica crop the genetic analysis has been performed with molecular markers such as random amplified polymorphic DNA (RAPD), inter-simple sequence repeats (ISSR) (Yan et al., 2019), amplified fragment length polymorphism (AFLP), restriction fragment length polymorphisms (RFLP), simple sequence repeat (SSR or microsatellites) and expressed sequence tag-derived simple sequence repeats (EST-SSRs). Microsatellites, despite showing some limits in their use in tetraploid species such as coffee, have been considered very efficient to determine the genetic diversity, population structure and germplasm characterization in coffee collections (Vieira et al., 2010; Teressa et al., 2010; Geleta et al., 2012; Al-Murish et al., 2013; Mishra et al., 2014; Motta et al., 2014; Ferrao et al., 2015; Alemayehu, 2017; Vieira et al., 2017). Microsatellites have also been shown to be useful in linkage mapping (Pestana et al., 2015). Sequence-related amplified polymorphism (SRAP) markers have recently been used efficiently in studies of genetic diversity and population structure in germplasm collections of coffee (Al-Murish et al., 2013; Mishra et al., 2014; Jingade et al., 2019; Yan et al., 2019). These are characterized by a forward primer containing a GGCC sequence close to the 3’ terminus, which preferably aligns with GC-rich sequences in coding regions. On the contrary, the reverse primer contains an AATT sequence, which preferably aligns with intron sequences. In addition, SRAP markers are cheap when compared to others molecular markers, require little amount of DNA, their band patterns are unambiguous and highly reproducible (Jingade et al., 2019). Consequently, in this work the genetic diversity and population structure of a Peruvian collection coffee germplasm were analyzed using microsatellite and SRAP molecular markers.
MATERIALS AND METHODS
Plant material
In this study 54 genotypes were used belonging to 17 Coffea arabica L. and one Coffea canephora Pierre ex A. Froehner accessions (Table 1) from the coffee germplasm collection of the coffee estate Santa Teresa (47º4’26” S, 88º5’20” W; 1513-1750 m a.s.l.), located in Villa Rica (89% RH annual average; 23.4 ºC maximum, 15.8 ºC minimum annual average temperature; 1978 mm annual precipitation), Provincia Oxapampa (Pasco Region), Peru. The origin of the genotypes is unknown, nevertheless, they were grouped in accessions according to the similarity of plant morphology and the cultivar name that was said to the farmer when acquired the seeds. Furthermore, Santa Teresa is a main producer and supplier of coffee botanical seeds to many farms in Peru. Besides that, their germplasm is included in the denomination of origin for Villa Rica coffee.
DNA extraction
Genomic DNA was extracted from 0.1 g young and healthy leaves tissue of unstressed coffee plants. For this purpose, the protocol of Doyle and Doyle (1990) was used with the CTAB buffer (hexadecyltrimethylammonium bromide 3%,
2.1 mM NaCl, 250 mM Tris HCl pH 8, 50 mM EDTA pH 8) and some modifications (Palomino et al., 2014). Each sample was then digested with RNAse (10 mg mL-1) to eliminate the remaining RNA. The concentration and purity were determined by measuring absorbance at 260 nm and the ratio 260/280 nm, respectively. The DNA integrity was determined by electrophoresis in 1% agarose gels using the TBE buffer solution (0.9 M Tris base, 0.9 M boric acid, 0.5 M EDTA, sterile milli-Q water), dyed with a nucleic acid gel stain (GelRed, Biotium, Fremont, California, USA) and visualized under UV light.
Microsatellite markers
The PCR amplification was performed using 18 pairs of microsatellite primers (Table 2). The reaction mixture contained 60 ng DNA in 10 μL final volume. The master mix and the PCR amplification profiles were similar to those reported for each primer pair (Combes et al., 2000; Baruah et al., 2003; Silvestrini et al., 2007; Teressa et al., 2010; Geleta et al., 2012; Romero et al., 2014) but in some pairs the alignment temperatures were modified (Table 2). The amplification products were separated in 6% denaturing polyacrylamide gel and visualized with silver staining. The size of DNA fragments was estimated using a 100 bp marker.
SRAP markers
The PCR amplification was obtained with 14 combinations of 12 primers (5 forward and 7 reverse) (Table 3) which were selected from Mishra et al. (2014) to show the highest percentages of polymorphism. The PCR amplification was carried out in 10 μL containing 60 ng DNA, 3 mM each primer, 2.5 mM MgCl2, 0.2 mM dNTPs, 1 U Taq DNA polymerase and 1X buffer solution. The amplification profile was set according to Mishra et al. (2014). The amplification products were separated by electrophoresis on 2.5% agarose gels and visualized under UV light. A DNA marker (50 to 1000 bp) was used to estimate the size of the fragments.
Data analysis
Amplicons from both microsatellite and SRAP markers were evaluated in the same way, by assigning 1 as presence and 0 as absence (Motta et al., 2014), because the microsatellite analysis of tetraploid genotypes like C. arabica does not allow to distinguish between different combinations of diallelic and triallelic genotypes in electrophoresis gels. Therefore, electrophoresis bands were considered as dominant markers in both marker systems (microsatellites and SRAPs). Hence, the genetic diversity and population structure could be calculated using both molecular marker systems together. In addition, the reproducibility of molecular markers was verified by repeated amplification.
Polymorphic information index (PIC), effective multiplex ratio (EMR) and marker index (MI) (Chesnokov and Artemyeva, 2015) were calculated to assess the molecular marker system.
In the genetic diversity analysis of coffee germplasm, the following descriptive parameters of the population were calculated: percentage of polymorphic loci, effective number of alleles (Ne), gene diversity (h) (Nei, 1973) and Shannon information index (I) using the POPGENE software Version 1.32 (Yeh et al., 1999).
The population structure was analyzed by the coefficients of genetic differentiation (GST) and the gene flow (Nm), performed using POPGENE version 1.32 (Yeh et al., 1999). The composition of the variability was determined by analysis of molecular variance (AMOVA), using the Arlequin software view. 3.5 (Excoffier and Lischer, 2010). In addition, Nei’s genetic distance matrix (Nei, 1978) of accessions was calculated with the POPGENE version 1.32 (Yeh et al., 1999). The grouping analysis with the unweighted pair group method with arithmetic mean (UPGMA) algorithm and the principal coordinate analysis were performed using DARWIN version 6.0.021 (Perrier and Jacquemoud- Collet, 2006). The population structure was corroborated with the Bayesian analysis through a bar chart, which was built by STRUCTURE software version 2.2 (Pritchard Lab, Department of Genetics, Stanford University, Stanford, California, USA). This analysis uses a Bayesian model that determines the groups of genetically similar genotypes (Pritchard et al., 2000). The analysis was performed with K values from one to 15, running 15 repetitions to each K, for which the mixing model (admixture) was applied with 400 000 iterations of the Markov Monte Carlo Chain (MCMC) and a period of 1 000 000 burn-in. The appropriate K-value was determined using the online program STRUCTURE HARVESTER web v0.6.94, which perform the Evanno method (Earl and vonHoldt, 2012).
RESULTS
Molecular marker assessment
Using 18 pairs of microsatellite marker primers, 100 bands were generated from the whole coffee germplasm collection, 97 of them were polymorphic (97% of the total); moreover, the means of PIC, ERM and MI were 0.168 ± 0.09, 5.285 ± 2.72 and 0.909 ± 0.65, respectively (Table 4). Table 4 shows that the largest number of polymorphic bands per primer pair was 14, which was obtained by CaM03, the smallest was two and was obtained by E12-3CTG. All the SSR markers had polymorphic bands.
The 14 combinations of SRAP primers generated 245 bands, 175 of which were polymorphic, representing 71.43% of the polymorphism; moreover, the means of the PIC, EMR, MI were 0.112 ± 0.04, 9.133 ± 2.77 and 1.079 ± 0.62, respectively (Table 4). In addition, all bands showed high quality and reproducibility (Figure 1). The largest number of polymorphic bands per primer pairs was 17 and was obtained with the Me6-Em5 combination, the lowest was six and was obtained with the Me2-Em6 combination (Table 4). The highest proportion of polymorphic bands was 91.67% which was obtained with the Me3-Em12 and Me1-Em6 combinations (Table 4).
Genetic diversity analysis
Genetic diversity analysis of coffee accessions was carried out using both marker systems together. Nei (1973) genetic diversity index, Shannon index, effective number of alleles and percentage of polymorphic loci were calculated for each accession and for the total population (Table 5). Table 5 also shows that the PchA accession, belonging to C. arabica species, shows the highest values of the genetic diversity index (h = 0.0967), Shannon index (I = 0.137) and effective number of alleles (Ne = 1.1834). The Lim accession on the other hand has the lowest values for such estimators (h = 0.006, I = 0.0087, and Ne = 1.0108), while the Rob accession, belonging to the C. canephora species, showed the highest values for number and percentage of polymorphic loci (74 and 21.45, respectively).
MB: Number of monomorphic bands; PB: number of polymorphic bands; TB: total number of bands; APB: average of polymorphic bands; PIC: polymorphic index content; EMR: effective multiplex ratio; IM: marker index; SRAP: sequence-related amplified polymorphism. The values in parentheses are standard deviation
Table 6 reveals a slight decrease in genetic diversity values (h = 0.096; I = 0.1614; Ne = 1.1428; 178 polymorphic loci, representing 51.59% of the total bands) when the C. canephora accession was excluded. In addition, the calculation of genetic diversity estimators in C. arabica accessions for each molecular marker systems separately showed that SSR markers had higher average values than SRAP markers (Table 6).
Analysis of the population structure
Analysis of the population structure was carried out using only the C. arabica accessions. The genetic differentiation GST showed high values (GSTSRAP = 0.6033, GSTSSR = 0.7568 and GSTSRAP+SSR = 0.6584); while gene flow (Nm) had low values (NmSRAP = 0.3288, NmSSR = 0.1606 and NmSRAP+SSR = 0.2594) (Table 6). In addition, AMOVA analysis allowed to understand the genetic potential in term of accessions variability (Table 7): the variation among and within accessions was 43.05% and 56.95%, respectively, while the fixation index (FST) was 0.43045 (p-value < 0.0001).
Ne: Number of effective alleles; h: Nei (1973) diversity index; I: Shannon index; PL: polymorphic loci; PPL: percentage of polymorphic loci. The values in parentheses are standard deviation.
*Coffea canephora.
PL: Polymorphic loci; PPL: percentage of polymorphic loci; Ne: Number of effective alleles; h: Nei (1973) diversity index; I: Shannon index; GST: genetic differentiation; Nm: gene flow.
The values in parentheses are standard deviation.
The genetic distance matrix was then used to build an UPGMA dendrogram (Figure 2) to assess the relationships among 18 accessions from the coffee collection. There are no duplicate accessions in the dendrogram, therefore each accession is different from the other. In addition, at 0.1 of distance, two groups appear well separated and structured: the first group contains only the C. canephora accession while the second contains all C. arabica accessions (Arabica). Furthermore, the Arabica coffee group can be further divided into six subgroups: the first two are composed of a single accession (Marg and GrC, respectively); the third is the largest and includes CatA, Visar, CatR, Caui, TypA and Pch accessions; the fourth consists of GeiA, BorA, GeiR and JBrack; the fifth is made up of Lim and Cat, while the sixth is composed by PchA and Pac. The principal coordinate analysis was performed using the same distance matrix (Figure 3), with the x-axis corresponding to the coordinate one and the y-axis corresponding to the coordinate two. In Figure 3, the Rob accession of C. canephora is separated from the C. arabica accessions. Likewise, the analysis of the genetic structure performed by the Bayesian method on the 54 coffee tree genotypes from the collection was carried out using the mixing model. In the determination of the number of groups with the delta K of Evanno method, a first peak was generated at K = 2, therefore two groups can be identified in the collection: the first with the C. canephora genotype and the second containing the C. arabica accessions. However, the delta K method also generated a second peak at K = 7, suggesting the existence of the same six subgroups of C. arabica generated by the UPGMA dendrogram (Figures 2 and 4).
Groups are numbered in Roman numerals, from I to VI are Coffea arabica accessions while VII is C. canephora.
The x-axis corresponds to component 1 and the y-axis to component 2. Rob is the Coffea canephora accession.
DISCUSSION
The evaluation of the microsatellite and SRAP markers systems in terms of percentage of polymorphism, PIC and MI allowed their combined use to analyze the population structure and genetic diversity of the coffee germplasm collection under study. Firstly, the percentages of polymorphism in both systems were high, even when only C. arabica accessions were evaluated (47.35% in SRAP and 62% in SSR) (Table 6). These results are congruent with the studies by Teressa et al. (2010) and Jingade et al. (2019) where SSR and SRAP markers were used, respectively. In addition, the analysis revealed a PIC index for SSR (PIC = 0.1679) higher than for SRAP (PIC = 0.1123), while, on the other hand, the MI index, which measures the efficiency of the molecular marker system, was lower for microsatellites (MI = 0.9089) than for SRAP (MI = 1.0788) (Table 4). Therefore, based on the IM, the SRAP would be a more efficient tool to determine the polymorphism, even if its PIC average value is low. Moreover, the electrophoresis in agarose gel used in SRAP markers is faster and less laborious than the electrophoresis in polyacrylamide gel for microsatellites. On the other hand, Casadevall et al. (2011), using microsatellites and SRAP markers to study the genetic diversity of Cynara cardunculus, recommended to use microsatellite markers for their easy visualization and codominance information. The fact that in this study as well as in our research (Table 6) the estimators of genetic diversity obtained by SRAP appear lower than those obtained by microsatellite might be due to the characteristics of the marker system. The SRAP markers are focused on chromosome regions rich in genes and the corresponding polymorphic bands are based on introns and exons, whilst, according to Abdurakhmonov (2016), the microsatellites markers are mainly linked to non-coding genomic regions like intergenic spaces and introns. Therefore, considering the polyploidy of C. arabica and the results of our analysis of marker systems, we believe that SSR and SRAP marker systems should both be used to estimate the genetic diversity and structure population of C. arabica Peruvian germplasm.
To study the genetic diversity of the coffee accessions, only one database was generated by screening the gel electrophoretic profiles of both markers system. The percentage of polymorphic bands obtained was high (51.59%) (Table 6) and is within the range of polymorphism reported in previous studies of C. arabica genetic diversity (Teressa et al., 2010; Geleta et al., 2012; Al-Murish et al., 2013; Mishra et al., 2014; Romero et al., 2014; Jingade et al., 2019). Furthermore, this would indicate the existence of a high molecular variability in the germplasm evaluated.
On the other hand, other genetic diversity estimators obtained in the germplasm of C. arabica were low. The index of genetic diversity value obtained (Table 6) was lower than the one reported by Geleta et al. (2012), who analyzed a bigger population from the main coffee producer areas of Nicaragua. The Shannon index (I = 0.1614) was lower to the one reported by Silvestrini et al. (2007) in a collection from Ethiopia, Eritrea, Yemen and Brazil (I = 0.21) and the one by Aga et al. (2003) (I = 0.3) in Arabic coffee from Ethiopian forests. However, Silvestrini et al. (2007) and Scalabrin et al. (2020) both suggested that the C. arabica diversity is very low as compared to other crops because of the recent hybrid origin of the species and of the way the world commercial cultivars have been generated from a very low number of genotypes. In addition, the preferred autogamic reproductive fashion of the species allows only a 10% of cross fertilization. On other side, the low genetic diversity index obtained for C. arabica is in contrast to the high index obtained for C. canephora (Table 6), similarly to Silvestrini et al. (2007) and Motta et al. (2014).
The high index of genetic differentiation (GST = 0.6584), similar to what obtained by Silvestrini et al. (2007), might be mainly due to the autogamic reproduction of C. arabica. Therefore, the different accessions of the germplasm collection might be all different from each other and, even if they grow in a same farm, their genetic flux was low (Nm = 0.2594). Furthermore, these accessions might not be very different from the cultivars from which they were generated, after being introduced to Peru. Furthermore, the genetic diversity index higher than zero for each accession would indicate that the level of genetic diversity among the Peruvian cultivars is relatively high, similarly to what reported by Geleta et al. (2012) in Nicaragua. In terms of the origin of cultivars grown in Peru, there are written reports only limited to the Spanish viceroyalty and the beginning of the republican period. Furthermore, it cannot be ruled out that improved plants could have also been introduced from Central America, Brazil and Colombia, which would present, among others, genes introgressed mainly from the Timor hybrid. It is also possible that in some areas of Peru there are still descendants from a group of genotypes arrived in Peru from Ethiopia and Eritrea thanks to a FAO expedition of 1964-1965, according to Silvestrini et al. (2007). Therefore, in the different coffee producing areas in Peru there might be germplasm, whose genetic diversity is not yet evaluated, that could be used in genetic improvement programs.
The AMOVA (Table 7), that assigns the most variability to the inter-accessions source, allowed the structuring of the coffee germplasm into different accessions. Furthermore, the high FST value reinforced the hypothesis of low gene flux between accessions. Besides, the grouping of accessions generated from the UPGMA dendrogram and principal coordinates analysis was congruent with the Bayesian analysis. In this analysis, the graph generated by STRUCTURE with K = 2 has clearly revealed the expected genomic differences between C. canephora and C. arabica, as revealed also by UPGMA dendrogram and principal coordinates analyses. When the Bayesian analysis was performed with K = 7, it allowed to clearly group the Arabica accessions into six clusters that correspond to those observed in the UPGMA dendrogram (Figures 2 and 5). This clustering of C. arabica accessions might be due to the relationships existing among the accessions within each cluster, because they probably came from the same cultivar or cultivars that are genetically close. Nevertheless, some accessions like PchA (h = 0.0967) have a genetic diversity higher than the remaining accessions. This relatively high diversity would be explained by the different origins of genotypes, which could have arrived to the farm at different times, but that were grouped together because of their morphological similarity.
Each color represents a subpopulation, while the Roman numerals from I to VII identify the groups of accessions generated with the UPGMA dendrogram.
Nevertheless, the high differentiation and low gene flow between the accessions of the present study contrast with Palomino et al. (2014), who, using only one molecular marker system, proposed that in Villa Rica there is a mix of cultivars. The analysis based on the combination of SRAP and microsatellite molecular marker systems seemed able to differentiate the different cultivars, therefore might shed some light on the relationships existing among coffee cultivars grown in Peru.
CONCLUSIONS
The high genetic differentiation between the coffee accessions in the collection, the genetic structuring within the Coffea arabica accessions and the low percentage of cross pollination would indicate that the cultivars, from which the accessions were originated, have been preserved over time. In addition, the genetic diversity that is maintained in the different coffee producing areas in Peru would be the associated with the bicentennial presence of coffee in Peru along with the reported and unreported introduction of improved cultivars from different origins.