The hyperdominant tropical tree Eschweilera coriacea (Lecythidaceae) shows higher genetic heterogeneity than sympatric Eschweilera species in French Guiana

Background and aims – The evolutionary history of Amazonia’s hyperabundant tropical tree species, also known as “hyperdominant” species, remains poorly investigated. We assessed whether the hyperdominant Eschweilera coriacea (DC.) S.A.Mori (Lecythidaceae) major limitation to their correct botanical identification is the scarcity of diagnostic characters on sterile herbarium vouchers, because many closely related tree species are difficult to distinguish based on vegetative charac ters alone, trees are tall and bear little or no flowers or fruits for most of the year (Mori & Prance 1990; Mori et al. 2001; Goodwin et al. 2015). In some clades, tree species are weak-ly differentiated due to relatively recent speciation events or occasional hybridization (Gonzalez et al. 2009; Pennington & Lavin 2016; Caron et al. 2019). It is thus reasonable to as-sume that botanical identification errors may occur in hyper -dominants; for instance, a rare taxon may be lumped with the local dominant taxon (Hardy et al. 2017). A related possibil-ity is that some hyperdominants may include cryptic species that are morphologically indistinguishable (based on a limited set of characters), but that represent distinct evolutionary lineages (Cavers et al. 2013; Turchetto-Zolet et al. 2013; Torroba-Balmori et al. 2017). Molecular markers may contribute to species delimitation in species complexes where identification based on morphology is challenging (Duminil & Di Michele 2009). Here we examine if the hyperdominant and morphologically variable Amazonian tree species Eschweilera coriacea (DC.) S.A.Mori represents a single genetically cohesive taxon and whether it presents high genetic variation, an indica-Conclusions – We found no conclusive evidence for cryptic species within E. coriacea in French Guiana. SSRs detected fewer gene pools than expected based on morphology in the Parvifolia clade but discriminated evolutionary relationships better than available plastid markers. A positive trend between demographic abundance of species and allelic richness illustrates that hyperdominants may have a high evolutionary potential. This hypothesis can be tested using more powerful genomic data in combination with tree phenotypic trait variation and characterization of niche breadth, to enhance our understanding of the causes of hyperdominance in Amazonian trees.


INTRODUCTION
Neotropical rainforests are the world's most diverse terrestrial ecosystems, harbouring 90 000-110 000 species of seed plants, which represents ca. 37% of all seed plants worldwide (Gentry 1982;Barthlott et al. 2007;Antonelli & Sanmartín 2011;Eiserhardt et al. 2017). The relative abundances of plant species and their geographic distribution ranges vary strongly in these forests (Peters et al. 1989;Pitman et al. 2001;Macía & Svenning 2005;ter Steege et al. 2013). In Western Amazonia, small sets of common and abundant species, known as oligarchies, are an ubiquitous feature of tree communities at local, landscape and regional scales (Pitman et al. 2001Arellano et al. 2014Arellano et al. , 2016. Across lowland Amazonia, a similar pattern is observed, with only 227 species with estimated population sizes of > 3.7 × 10 8 trees (ter Steege et al. 2013). These hyperdominant species represent only 1.4% of the estimated total of 16 000 Amazonian tree species, but as much as 50% of the estimated total number of tree stems (ter Steege et al. 2013). Among the 25 families richest in tree species in Amazonia, Arecaceae, Myristicaceae and Lecythidaceae have the highest proportion of hyperdominant species (ter Steege et al. 2013). However, even for the most abundant of these species, it remains unknown whether their taxonomic definition based on morphological characters includes a single, or several evolutionary lineages. We address this question in Eschweilera coriacea (DC.) S.A. Mori (Lecythidaceae), the only tree species that qualified as hyperdominant in all six Amazonian regions -Guiana Shield, northwest, southwest, south, east, and central Amazonia (ter Steege et al. 2013).
The causes of hyperdominance in Amazonia are an active field of research. Considerable overlap has been observed in the species composition of regional Western Amazonian oligarchies and Amazonian hyperdominants, suggesting that the basin-wide pattern arises in part from the combined smallerscale processes (Pitman et al. 2001ter Steege et al. 2013). At the regional scale, oligarchic species have been found to display a wider environmental tolerance, on average, than non-oligarchic species in the same communities (Arellano et al. 2014), which may suggest a high adaptive potential. At larger geographic scales, the strength of the oligarchic pattern was found to decrease, due to the pure effect of area and due to reduced landscape connectivity (Arellano et al. 2016). These results suggest geographic and physiological limits to dominance patterns and are congruent with ter Steege and colleagues' observation that most hyperdominant species are habitat specialists and are only dominant in certain forest types and in certain regions of the basin (ter Steege et al. 2013). Hyperdominants include many species useful to humans, thus humans may have contributed to shaping hyperdominance patterns (Levis et al. 2017;McMichael et al. 2017). Notwithstanding the reasons for the wide-range dominance patterns in Amazonia, their main implication is that a relatively small suite of tree species accounts for a large proportion of Amazonian ecosystem services, such as water, carbon and nutrient cycling, which should have the potential to greatly simplify research and modelling efforts in forest ecology and biogeochemistry (ter Steege et al. 2013).
The inference of the hyperdominant species status is predicated on a correct botanical identification of tree species in inventory plots. However, correct identification is not trivial because many hyperdominants belong to taxonomically difficult, species-rich genera such as Eschweilera (Lecythidaceae), Protium (Burseraceae), or Licania (Chrysobalanaceae) (Funk et al. 2007) in which several species can co-occur sympatrically (Gonzalez et al. 2009; Baraloto et al. 2012). One major limitation to their correct botanical identification is the scarcity of diagnostic characters on sterile herbarium vouchers, because many closely related tree species are difficult to distinguish based on vegetative characters alone, trees are tall and bear little or no flowers or fruits for most of the year (Mori & Prance 1990;Mori et al. 2001;Goodwin et al. 2015). In some clades, tree species are weakly differentiated due to relatively recent speciation events or occasional hybridization (Gonzalez et al. 2009;Pennington & Lavin 2016;Caron et al. 2019). It is thus reasonable to assume that botanical identification errors may occur in hyperdominants; for instance, a rare taxon may be lumped with the local dominant taxon (Hardy et al. 2017). A related possibility is that some hyperdominants may include cryptic species that are morphologically indistinguishable (based on a limited set of characters), but that represent distinct evolutionary lineages (Cavers et al. 2013;Turchetto-Zolet et al. 2013;Torroba-Balmori et al. 2017). Molecular markers may contribute to species delimitation in species complexes where identification based on morphology is challenging (Duminil & Di Michele 2009).
Here we examine if the hyperdominant and morphologically variable Amazonian tree species Eschweilera coriacea (DC.) S.A.Mori represents a single genetically cohesive taxon and whether it presents high genetic variation, an indica-tor of large population size and adaptive potential (Hoffmann et al. 2017). We test these hypotheses in the Guiana shield, more specifically, in French Guiana. Eschweilera coriacea is a common canopy tree species, with a maximum height of up to 37m (Lopes 2007). According to Mori et al. (2017), it belongs to the Parvifolia clade, the most diverse clade in the family Lecythidaceae, which encompasses 63 of the 215 Neotropical species in the family. This clade is nested within the Neotropical Bertholletia clade, and represents ca. half of its 125 species (Huang et al. 2015). Species in the previously defined genus Eschweilera (Mori & Prance 1990) fall into three unrelated clades (Integrifolia, Tetrapetala and Parvifolia clades) and evolutionary relationships within clades remain poorly resolved, especially in the Parvifolia clade, either based on morphology or on genetic characters (Huang et al. 2015). Numerous species of the Parvifolia clade commonly occur in sympatry, thus forming complexes of sympatric species: for example, 11 and 15 species in forests in French Guiana (La Fumée Mountain) and Central Amazonia (BDFFP 100 ha plot near Manaus), respectively (Mori 1987;Mori & Lepsch-Cunha 1995;Mori et al. 2001;Huang et al. 2015). These sympatric species share plastid DNA haplotypes (Gonzalez et al. 2009;Caron et al. 2019), suggesting that plastid DNA sequences cannot discriminate species, which can either be due to recent common ancestry and incomplete lineage sorting, or to inter-specific hybridization.
We used microsatellites to delimit gene pools and obtain genetic diversity estimates in individuals morphologically determined as E. coriacea from several sites across French Guiana and in sympatric Eschweilera individuals determined as belonging to closely related species of the Parvifolia clade. We addressed the following specific questions: (1) Does the delimitation of gene pools in the Parvifolia clade coincide with the species determination based on morphology in French Guiana? Which species are best supported by genetic data?
(2) Is the hyperdominant E. coriacea a single cohesive species or a complex of cryptic species? Does it harbour indications of stronger genetic structure, indicating cryptic lineages, and/or higher diversity, a proxy for adaptability, in comparison with other species, and how is the variation distributed geographically?

Study species and sample collection
The Lecythidaceae family in the New World, known as the Lecythidoideae subfamily, comprises ten genera and 215 described species with a centre of geographic distribution in Amazonia (Huang et al. 2015;Mori et al. 2017). Neotropical Lecythidaceae are sub-canopy to canopy-emergent trees with fibrous bark, and distinctive showy and morphologically diverse flowers with either actinomorphic or zygomorphic androecia (Prance & Mori 1979;Mori & Prance 1990). The Parvifolia clade, to which E. coriacea belongs, is characterized by a closed androecium and a double coiled androecial hood. Another synapomorphy of this clade is the presence of a lateral aril on the seed (Huang et al. 2015). The flowers are visited and presumably pollinated by bees, as it is the case for most Lecythidaceae species (Mori & Prance 1990). Lecythidoideae produce woody fruits; their seeds are gravity dispersed and are found in large numbers directly under the parent trees. Additionally, rodents and primates consume the seeds and might play a role in seed dispersal (Mori & Prance 1990). In Paracou, a lowland forest in French Guiana, sympatric Eschweilera species have different preferences for soil water availability, e.g. E. coriacea prefers wetter habitats than E. sagotiana Miers (Allié et al. 2015), although their ecological tolerance is broad and niches are largely overlapping (S. Schmitt, Univ. Bordeaux, INRAE, France, pers. com.). Flowering occurs synchronously in October-November and leaf trait variation is also largely overlapping among species (S. Schmitt and M. Heuertz, pers. obs.).
We sampled sympatric Eschweilera species at nine locations mostly within and sometimes close to forest inventory plots of the GUYAFOR and GUYADIV networks (http://atdnmorphospecies.myspecies.info/node/781) in French Guiana, in North-Eastern South America (table 1, supplementary file 1). Leaf material of 152 individual trees was collected, representing 11 species. Botanical determinations were reached through a continual effort over years in repeated inventories of marked trees, using the vegetative and reproductive characters described in Flora Neotropica (Mori & Prance 1990) and "The Lecythidaceae Pages", a website based on Flora Neotropica (Mori & Prance 1990) Molino and D. Sabatier, pers. obs.). As reference specimen for this putative taxon, we used voucher Sabatier & Molino 4945: this specimen has flowers, and duplicates are deposited in CAY, K, NY and P. Special emphasis in the sampling was on E. coriacea, represented by 56 individuals (table 1, supplementary file 1). The plant material was dried in paper bags with silica gel immediately after collection in the field. A subset of the individuals collected were vouchered and deposited at the Herbier IRD de Guyane (CAY) or in the GUYADIV working collection at IRD-Cayenne (supplementary file 1).

Microsatellite isolation and screening
For microsatellite (simple sequence repeat, SSR) isolation, four trees identified as E. coriacea, E. parviflora, E. simiorum (Benoist) Eyma or E. wachenheimii were sampled in French Guiana. Total DNA was extracted from dried leaf or cambium materials following a CTAB method (Doyle & Doyle 1987). SSR-enriched libraries were constructed following the protocol of Techen et al. (2010). Briefly, DNA was digested with AluI and HaeIII restriction enzymes, and ligated with SSRLIB3 adapter. Three libraries for each species were built by hybridization to biotinylated oligo repeats groups,  (Martin 2011) and reads were trimmed using Sickle (Joshi & Fass 2011) based on a sliding window approach and a minimum Phred score of 20; reads shorter than 80 bases after trimming were removed. The quality of the remaining reads was checked using FastQC version 0.10.0 (https://www.bioinformatics.babraham.ac.uk/projects/ fastqc/). The resulting reads were assembled using the default options of CAP3 (Huang & Madan 1999), and microsatellite loci were identified using the QDD pipeline version 3.0 (Meglécz et al. 2010). Primer pairs were designed using Primer3 version 0.4.0 (Rozen & Skaletsky 2000).
In total, 34000 reads were assembled into 7282 sequences which contained SSRs. Twenty primer pairs were tested on 39 Lecythis and Eschweilera species. Of these, three amplified reliably and were scorable in Eschweilera species of the Parvifolia clade: eschw11740, eschw5831 and eschw64683 (supplementary file 2). We also tested nine primer pairs developed for Eschweilera ovata (Cambess.) Mart. ex Miers, a species endemic of the Brazilian Atlantic forest (Santos et al. 2019): a single locus, EO25, amplified reliably and was scorable in Eschweilera species from the Parvifolia clade (supplementary file 2). Overall, four loci were used for genotyping: the EO25 locus and the three newly developed loci.

DNA extraction and microsatellite genotyping
Genomic DNA was extracted from all 152 samples following the CTAB protocol (Doyle & Doyle 1987). DNA concentrations were measured using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and samples were diluted to 10 ng/µL. The four SSR markers were PCR amplified in two mixes using 5'-labelled forward primers in combination with the Qiagen Multiplex PCR kit (Qiagen, Hilden, Germany). Reactions contained 1 µL DNA (10 ng/ µL), 5 µL of 2x Qiagen Multiplex PCR Master Mix, 3 µL ultrapure water and 1 µL of primer mix (10 µM of each forward and reverse primers). The amplification reaction was performed on a Veriti 96-Well Thermal Cycler (Applied Biosystems, Foster City, Canada) using the following protocol: initial denaturation step at 95°C for 15 min, followed by 30 cycles of 30 s denaturation at 94°C, 90 s annealing at 60°C, 60 s extension at 72°C, and a final extension step at 60°C, for 30 min. Amplified fragments were separated on an automated capillary sequencer (ABI 3700, Applied Biosystems, Foster City, CA, USA). Fragment sizes were determined using ABI GeneMapper v4.1 (Applied Biosystems) in comparison with the GeneScan™ 500 LIZ™ dye size standard (Applied Biosystems), and binned into alleles manually using the frequency distribution of observed fragment sizes. Up to four alleles per genotype were found for three of the loci, we thus suspected our taxa to be tetraploid or paleopolyploids that are diploidized to some extent (Parisod et al. 2010), as had previously been suggested for the related Brazil nut, Bertholletia excelsa (Buckley et al. 1988).

Genome size and ploidy
The chromosome base number for Lecythidaceae is x = 17 (Mori et al. 2007 and references therein) and several species of the Neotropical Bertholletia clade belonging to the nonmonophyletic genera Eschweilera and Lecythis are diploid with x = 17 (Kowal et al 1977;Kowal 1989  Eschweilera species, we collected leaf or flower bud tissue from nine individuals in the Paracou inventory site representing six Eschweilera species (E. coriacea, E. grandiflora, E. pedicellata, E. sagotiana, E. squamata and E. wachenheimii) as well as one related outgroup species, Lecythis persistens Sagot, and conserved the tissues in RNAlater (Qiagen) until flow cytometry analysis at the University of Coimbra. Nuclear suspensions were obtained following Galbraith et al. (1983) by chopping RNA-later conserved tissue of the studied species and fresh leaf tissue of Pisum sativum 'Ctirad' (internal reference standard, 2C = 9.09 pg; Doležel et al. 1998) in 1 ml of WPB buffer (Loureiro et al. 2007). The nuclear suspension was then filtered using a 50 µm nylon mesh and 50 µg / ml of propidium iodide (PI, Fluka, Buchs, Switzerland) and 50 µg/ml of RNAse (Fluka, Buchs, Switzerland) were added. Samples were analysed in a Partec CyFlow Space flow cytometer (Partec GmbH., Görlitz, Germany; 532 nm green solid-state laser, operating at 30 mW) and results were acquired using Partec FloMax software version 2.4d (Partec GmbH, Münster, Germany).

Genetic diversity
Flow cytometry did not detect any ploidy differences between samples (see Results) and data was congruent with the literature (see Discussion), we thus assumed that allele number variation for all species and loci was due to paleopolyploidy, i.e., to a single ancient genome duplication event common to all the species and loci studied (see Discussion).
Since duplicated copies of the analysed SSR loci could not be separated in our data, we used an autotetraploid model for downstream data analysis (Hardy 2016) to account for this probable ancient genome duplication. We used SPAGEDI version 1.5a (Hardy & Vekemans 2002) to estimate multilocus genetic diversity parameters at the level of a) species, b) gene pools (for gene pool delimitation, see below) and c) sampling sites of E. coriacea. The genotypes with a single allele or with four alleles were coded as unambiguous tetraploid homozygotes or heterozygotes, respectively, whereas genotypes with two or three alleles were coded as incomplete genotypes to account for allele copy number ambiguity. We calculated the effective number of alleles Nae, the expected heterozygosity H E , the observed heterozygosity H O , the inbreeding coefficient F IS , and we estimated the allelic richness for a standardized sample size of eight gene copies, A R(k=8) . Standard errors (SE) for H E and H O were calculated based on the standard deviation of estimates from data subsets representing the four possible combinations of three SSRs. Significance levels for F IS to deviate from zero, i.e., deviation from Hardy-Weinberg genotypic proportions, were assessed by 10 000 permutations of gene copies among individuals. We wished to gain insight into the relationship between the abundance of a species and its genetic diversity, to address the hypothesis that hyperdominant species have increased genetic diversity, a proxy for increased adaptive potential (Hoffmann et al. 2017). We attributed a relative rank of demographic abundance to each investigated species based on raw occurrence data (supplementary file 3) from the GUYAFOR and GUYADIV networks of forest inventory plots, representing a total of 316 plots of 0.12 to 0.16 ha and ca. 143 000 stems inventoried, and we performed lin-ear regression analysis and a Spearman rank correlation test between allelic richness and ranked abundance in R version 3.5.0 (R Development Core Team 2008).

Gene pool delimitation and genetic structure
Gene pool delimitation in the SSR data was conducted using the Bayesian clustering algorithm implemented in STRUC-TURE 2.3.4 (Pritchard et al. 2000), using a tetraploid genotypes model that is robust to allele copy number ambiguity (Falush et al. 2007). The data matrix was converted from the GeneMapper output to a STRUCTURE input file that accounts for allele copy number ambiguity using the R package polysat version 1.7-2 (Clark & Jasieniuk 2011) in R version 3.5.0 (R Development Core Team 2008). In STRUCTURE, ambiguity of allele copy number was accounted for by using RECESSIVEALLELES = 1 and setting the recessive allele code to MISSING, "-9", as described in the software documentation. To infer individual assignment proportions q in K gene pools, or genetic clusters, we used an admixture model with correlated allele frequencies between clusters, running ten repetitions for each K, with K = 1 to K = 18, using a burn-in length of 100 000 and a run length of 200 000 MCMC steps. The results were summarized using STRUC-TURE Harvester web software version 0.6.94 (Earl & von-Holdt 2012) and the Clumpak server (Kopelman et al. 2015). The number of clusters K that best describes the data structure was determined based on the posterior log likelihood of runs plotted against K and using the delta K approach (Evanno et al. 2005).
These analyses were carried out a) on the full data set of 152 Eschweilera samples, b) on a reduced data set of 136 samples including only the six species represented by at least eight individuals: E. coriacea, E. decolorans, E. micrantha, E. pedicellata, E. sagotiana and E. sp. 3 and c) on 56 individuals determined as E. coriacea. We assessed the congruence of genetic and morphological species delimitation by comparing the assignment of individuals to gene pools at a threshold of STRUCTURE ancestry proportion q > 0.875 with their botanical determination. The q > 0.875 threshold was chosen because this assignment category is expected to contain genetically pure individuals and second-or later-generation backcrosses (Guichoux et al. 2013), and should thus a priori allow us to identify individuals confidently assigned to their respective gene pools, or candidate genetic species.
Additionally, we conducted a principal component analysis (PCA) using the function dudi.pca() implemented in the adegenet package (Jombart 2008) in R version 3.5.0 (R Development Core Team 2008). For this, the genotype matrix was converted to a genind object with presence and absence data of alleles using polysat version 1.7-2 (Clark & Jasieniuk 2011).

Genetic differentiation
Population genetic differentiation was assessed as overall and as pairwise F ST (Weir & Cockerham 1984) between species and between gene pools (individuals assigned with q > 0.875) in SPAGEDI. Significance was assessed by 10 000 permutations of individuals among species or gene pools. To correct for multiple testing, a false discovery rate approach Missing genotypes, unsuccessful amplification; incomplete genotypes, genotypes with two or three alleles recorded; Nae, effective number of alleles; A R(k=8) , allelic richness for a sample size of 8 allele copies; H E , expected heterozygosity (standard error); H O , observed heterozygosity; F IS , inbreeding coefficient (significance levels: n.s., not significant; *, P < 0.05; **, P < 0.01, ***, P < 0.001); n.a., not available.
was applied using the R package qvalue version 2.8.0 (Storey 2003) in R version 3.5.0 (R Development Core Team 2008).
We also assessed isolation by distance in E. coriacea at the level of sampling locations by regressing pairwise F ST / (1-F ST ) values on the logarithm of pairwise spatial distance (Rousset 1997) and at the individual level by regressing pairwise kinship coefficients (Loiselle et al. 1995) between individuals on the logarithm of pairwise spatial distances (Vekemans & Hardy 2004), and using permutation tests in SPAGEDI. For the analysis at the individual level, we added random within-location variation to individual coordinates because exact individual coordinates were unknown.

Genome size and ploidy
Eight of the nine samples, all representing leaf tissue, were analysed successfully with flow cytometry and yielded holoploid genome size estimates comprised between 1.72 and 2.14 pg for all species (table 2); no differences in ploidy were detected among the samples. Based on the literature (Kowal et al. 1977;Kowal 1989;de Barros et al. 2019) we assumed holoploid genomes to represent diploids, which led to the estimation of haploid genome sizes of 1C = 842 to 1047 Mbp for the analysed Eschweilera species (table 2).

Genetic diversity and differentiation of Eschweilera species
All 152 samples belonging to 11 Eschweilera species were successfully genotyped at a minimum of three of the four SSR markers. SSR profiles commonly displayed up to four alleles per genotype, suggesting that the investigated Eschweilera species represent diploidized paleopolyploids which retain duplicated copies at some loci. Specifically, genotypes with three or four alleles were found in all species, except in the two species with the lowest sample sizes (E. collina and E. chartaceifolia, n = 2 each). Since alleles from duplicated loci could not be told apart, we analysed the data using a tetraploid framework. A total of 56 alleles were detected across the four loci, with 7 to 21 alleles per locus (supplementary file 2). Expected heterozygosity and allelic richness at the species level were highest in E. coriacea (H E = 0.751, A R(k=8) = 4.1) and lowest in E. sp. 3 (H E = 0.524, A R(k=8) = 2.65) and E. sagotiana (H E = 0.516, A R(k=8) = 2.68), considering only species with at least eight individuals assessed (table 3). The inbreeding coefficients were positive and deviated significantly from zero in most species (table 3).
A positive trend was identified between demographic abundance of species and allelic richness ( fig. 1), with the fitted regression equation A R(k=8) = 2.842 + 0.085 x (abundance rank), R 2 = 0.138, however the relationship was not significant (Spearman signed rank correlation rho = 0.452; p-value = 0.268). Eschweilera sagotiana was the most abundant species in French Guiana, however, its allelic richness was lower than expected based on the equation.
Genetic differentiation between species represented by at least eight individuals was significant, with global F ST = 0.193 (P < 0.001). Pairwise F ST was significant for all pairs of species, except for E. decolorans and E. pedicellata, and was highest between E. sagotiana and E. sp. 3, F ST = 0.464, P < 0.001 (table 4).

Inference of gene pools, their composition and genetic diversity
The Bayesian genetic clustering analysis in STRUCTURE revealed a hierarchical structure for the complete data set.

Table 4 -Pairwise genetic differentiation (F ST ) between Eschweilera species.
Only species with at least eight individuals were considered. Numbers in brackets indicate the number of samples. Significance was assessed using 10 000 permutations. Significance levels based on corrected P-values: n.s., not significant; **P < 0.01, ***P < 0.001.    fig. 3). Eschweilera decolorans and E. micrantha also had 50% or more of their individuals assigned to a single gene pool, Cl1 and Cl5, respectively, other individuals being admixed or assigned to other gene pools (table 5, fig. 3). For E. coriacea the levels of genetic diversity and structure were particularly high: it had individuals assigned to all five gene pools, and a high proportion (52%) of its individuals were admixed between two or more gene pools (table 5, fig. 3    The STRUCTURE analysis on the six species with at least eight individuals confirmed the hierarchical structure in the data (supplementary file 4), with a clustering result for K = 5 that was very similar to the STRUCTURE result on the complete data set ( fig. 3, supplementary file 4).

E . a p i c u l a t a E . c h a r t a c e i f o l i a E . c o r i a c e a E . c o l l i n a E . d e c o l o r a n s E . m i c r a n t h a E . p a r v i f l o r a E . p e d i c e l l a t a E . s a g o t i a n
The first three axes of the PCA explained jointly 35.09% of the variance in the data. Eschweilera sagotiana formed a relatively cohesive cluster at negative values of PCA1 which only overlapped little with other species (fig. 4). Eschweilera coriacea was widely scattered in the PCA space ( fig.  4). STRUCTURE clusters overlapped with each other but showed a more segregated distribution in the PCA space than morphologically determined Eschweilera species (fig. 4).

Genetic structure and diversity in Eschweilera coriacea
The STRUCTURE analysis of n = 56 individuals determined as E. coriacea revealed an optimum at K = 2 (supplementary file 4.3) but differentiation between the two clusters was low and not significant (F ST = 0.010 n.s.).
No isolation by distance was detected in E. coriacea, neither at the sampling site nor at the individual level (supplementary file 5). Genetic diversity estimates in E. coriacea could only be estimated for three sites (Paracou, Acarouany and Bafog) which were represented by at least 10 samples. Heterozygosity and allelic richness were high and very similar in the three sites, 0.710 ≤ H E ≤ 0.775; 3.67 ≤ A R(k=8) ≤ 3.98 (table 8).

Can SSR markers resolve evolutionary relationships between Eschweilera species from the Parvifolia clade?
Our SSR data on eleven species of Eschweilera belonging to the Parvifolia clade displayed a signature of paleopolyploidy at both the species and the gene pool levels. Other species of the Bertholletia clade are diploid (e.g., Bertholletia excelsa, Eschweilera pittieri, Eschweilera neei, Lecythis minor, Lecythis tuyrana; Kowal et al. 1977;Kowal 1989 and references therein) and our haploid genome size estimates were of the same order of magnitude, ca. 1 Gbp, as those of Bertholletia excelsa. We therefore suggest a single, common paleopolyploid origin for the locus duplications observed in our data set: these loci appear to have remained in duplicated state whereas other parts of the genome appear to have diploidized (Parisod et al. 2010) probably resulting in a diploid karyotype in most species of the Bertholletia clade. The suggested paleopolyploid origin could coincide with an ancient genome duplication event reported at the base of the Ericales order to which the Lecythidaceae belong (Shi et al. 2010).
The power of our genetic data to discriminate species was relatively poor, considering that the most robust clustering solution revealed only five gene pools in our data, as  Table 6 -Genetic diversity estimates for STRUCTURE clusters Cl1-Cl5 in the Eschweilera data set. Individuals were assigned to clusters based on an assignment threshold q > 0.875; n, number of individuals assigned. Nae, effective number of alleles; A R(k=8) , allelic richness for a sample size of 8 allele copies; H E , expected heterozygosity (SE, standard error); H O , observed heterozygosity; F IS , inbreeding coefficient. Significance levels: n.s., not significant; *, P < 0.05; **, P < 0.01; ***, P < 0.001.

Cl2
Cl3 Cl4 Cl5 Cl1 (18)   Only sampling sites with at least 10 samples were included in this analysis. Missing genotypes, unsuccessful amplification; incomplete genotypes, genotypes with two or three alleles recorded; Nae, effective number of alleles; A R(k=8) , allelic richness for a sample size of 8 allele copies; H E , expected heterozygosity (SE, standard error); H O , observed heterozygosity; F IS , inbreeding coefficient (significance levels: ***, P < 0.001).   opposed to eleven species determined based on morphology. Marker number and information content of each marker strongly affect the clustering solution (Rosenberg et al. 2005). Despite the coarse resolution, we are confident that the five clusters recovered meaningful evolutionary relationships for the multilocus marker set in our study. Our genetic taxon delimitation represents an improvement over the use of plastid DNA sequences: a reexamination of the data of Gonzalez et al. (2009) andCaron et al. (2019) showed extensive plastid haplotype sharing among species and did not allow to resolve evolutionary relationships in Eschweilera in the Parvifolia clade. Groups of individuals assigned to genetic clusters were genetically more distinct from each other (F ST = 0.390) than groups of individuals assigned to species based on morphology (F ST = 0.193). This result should be interpreted with caution, considering the discrimination power of the data in combination with the markers and methods employed. The clustering solution of the STRUCTURE software can be biased by the sampling scheme, notably by unbalanced sampling among populations (Kalinowski 2011), and by the stochastic lineage sorting specific to each marker (Orozco-terWengel et al. 2011). We believe that the first possible bias was limited, given that STRUCTURE results were consistent when using a subset of the data with even representation among species. The paucity of loci did not allow us to address the second possible bias. Ideally, genetic species delimitation should rely on a set of complementary approaches, and conclusions should only be based on a conservative interpretation of congruent results among methods (Carstens et al. 2013). According to Carstens et al. (2013), "in most contexts it is better to fail to delimit species than it is to falsely delimit entities that do not represent actual evolutionary lineages". In our case, STRUCTURE, PCA and differentiationbased methods yielded congruent results for the delimitation of genetic clusters with our marker set.
Our data delimited gene pools that largely coincided with groups of samples morphologically determined as E. sp. 3 (Cl3) or E. sagotiana (Cl4) and, to a lesser extent, determined as E. decolorans (Cl1) or E. micrantha (Cl5). The clusters identified with greatest confidence corresponded to the species pair that displayed the highest level of differentiation, E. sp. 3 and E. sagotiana, suggesting that these two species may represent the most divergent taxa in this sympatric species complex. In the fifth cluster, Cl2, 50% of individuals corresponded to E. coriacea, and there was a markedly lower agreement in the cluster definition vs. its species composition (see also below). The genetic heterogeneity of E. coriacea may largely account for such mismatch, although hybridization between closely related taxa in the Parvifolia clade (Caron et al. 2019) may also contribute to hindering taxon delimitation with a limited number of markers.

Reasons for poor congruence between morphological and genetic species delimitation
Morphological and genetic species delimitations represent different abstractions to deal with the complex reality that biodiversity represents. Since both abstractions rely on different species concepts (de Queiroz 2007), they can be congruent, but are not necessarily expected to be. In our study, where marker resolution was low, only two out of eleven species showed a good congruence between genetic and morphological species delimitation. Hybridization and introgression, as suggested in the Parvifolia clade (Gonzalez et al. 2009;Huang et al. 2015;Caron et al. 2019) inevitably leads to low genetic differentiation between species, which causes challenges for genetic delimitation. In rainforest tree species complexes that contain lower numbers of species than considered in our study, and where more powerful markers were used, genetic species delimitation has proven successful, e.g., in Carapa (Meliaceae), Erythrophleum (Fabaceae) and Milicia (Moraceae) (Duminil et al. 2006(Duminil et al. , 2010Daïnou et al. 2016). Another reason for poor congruence between morphological and genetic delimitation could be mistaken species identification based on morphology in our data, e.g., three individuals morphologically identified as Eschweilera sagotiana were assigned genetically to the cluster mainly identified as E. decolorans (Cl1 in red, in fig. 3).

Choice of molecular approaches for taxon delimitation
We chose to use SSR data and methods based on allele frequency differences for gene pool delimitation in the Parvifolia clade. Given the expected large population sizes in E. coriacea (ter Steege et al. 2006(ter Steege et al. , 2013 and the lack of phylogenetic signal using plastid DNA markers in the Parvifolia clade (Gonzalez et al. 2009;Caron et al. 2019), we assumed that evolutionary relationships may be shallow in the clade, which is why we opted for population genetic, rather than phylogenetic approaches for species delimitation (Medrano et al. 2015;Luo et al. 2018). Indeed, coalescent theory shows that the expected time to the most recent common ancestor (TMRCA) for any two homologous sequences is equal to the effective population size, Ne, in numbers of generations, Ne being the size of a (diploid) population evolving according to a Wright-Fisher model with random mating and discrete generations (Nordborg 2001). In the absence of inter-specific gene flow, the performance of phylogenetic methods for species delimitation depends on the ratio of population size to divergence time (Luo et al. 2018): phylogenetic methods tend to succeed if species divergence time is (substantially) older than the mean TMRCA of gene copies within species (Maddison 1997;Rosenberg & Nordborg 2002). For hyperabundant and widespread tree species that maintain huge population sizes over large areas due to efficient seed and pollen dispersal, this condition is unlikely to be fulfilled. Even if Ne is often much smaller than the census population size N, for example because of variation in reproductive success (Hartl 2000), the TMRCA of gene copies in hyperabundant species is likely to be many million years in the past, and should thus regularly fall within the ancestral species, before the speciation event(s) of interest. The large effective population sizes of common rainforest trees are thus the main reason why phylogenetic trees are often poorly resolved (Pennington & Lavin 2016). An analogous situation is observed in the conifer genus Pinus, in which evolutionary relationships were long debated (Willyard et al. 2009), and where it took a set of 21 low-copy nuclear genes with 665 SNPs to obtain a phylogeny with concordant placement of > 75% of the species in the subgenus Pinaster, the Mediterranean pines (Grivet et al. 2013). Conversely, SSR markers and population genomic approaches led to successful genetic species delimitation in tropical tree species complexes (Duminil et al. 2012;Daïnou et al. 2016). A prospect for a better phylogenetic resolution and a correct inference of evolutionary relationships is nevertheless offered by the use of multi-locus approaches in a multi-species coalescent framework (Knowles & Carstens 2007;Degnan & Rosenberg 2009;Mirarab et al. 2014).

Genetic constitution and hyperdominance
Our results based on four SSRs suggested that Eschweilera coriacea was genetically more diverse and more heterogeneous than related Eschweilera species, i.e., E. sagotiana, E. sp. 3, E. decolorans and E. pedicellata occurring sympatrically with E. coriacea in French Guiana. Although E. coriacea individuals were assigned to several genetic clusters when other species were included in the analysis, significant evidence of several genetic clusters was not found when only the morphologically determined E. coriacea individuals were analysed. Thus, given the limited power of our SSR markers, our data did not contain robust evidence for E. coriacea to be a complex of cryptic species in French Guiana. However, absence of evidence is not evidence of absence! Given the wide distribution of the species, with presence in all six Amazonian regions, and the weak but nevertheless significant genetic structure in the species (F ST = 0.059 in French Guiana), spatial and temporal population genetic processes are expected to occur, which make it indeed likely that E. coriacea may contain several biological species across Amazonia.
We observed a weak linear trend between allelic richness and ranked abundance of Eschweilera species. This relationship is not a robust biological result as it would most likely vary by excluding or adding taxa, sampling sites, loci. This relationship simply serves to illustrate our expectation that the level of genetic diversity of a population, the effectiveness of selection and the strength of genetic drift all depend on the effective population size Ne (Charlesworth 2009;Hoban et al. 2014;Hoffmann et al. 2017). The high diversity and heterogeneity of E. coriacea thus suggest that it harbours a larger Ne and a higher adaptive potential than other sympa-trically occurring species, as expected for a hyperdominant tree species with a census population size as large as 5 × 10 9 individuals across Amazonia (ter Steege et al. 2013). On the other hand, a high census population size is not necessarily a surrogate for high Ne. A notable outlier of our identified trend is E. sagotiana in French Guiana, in which diversity was more reduced, despite it being the most common Eschweilera species in our inventories, and despite its large distribution across the Guianas and the Brazilian states of Amapá and Pará (The Lecythidaceae Pages, http://sweetgum.nybg. org/science/projects/lp/). The two most common Eschweilera species in French Guiana appear thus to have contrasting evolutionary histories. This observation also illustrates that it is difficult to derive any causal relationship when observing a biological pattern, such as that of hyperdominance. For instance, Arellano et al. (2014) observed a wider environmental tolerance in oligarchic than non-oligarchic species, which the authors interpreted as niche breadth causing dominance. But the opposite could also be true: dominant species are more widespread and thus they appear in more habitats, which results in greater observed realized niches, whereas rare species are observed less frequently, thus their niche breath may be poorly estimated or even biased.
A substantially larger set of genetic markers and a larger and more balanced sampling design should help to shed additional light on the genetic constitution of E. coriacea, the characterization of hybridization and introgression in the Parvifolia clade, and the evolutionary history of abundant vs. rare Eschweilera species, to understand the genetic underpinnings of hyperdominance in Amazonian tree species.

Conclusions
Our data revealed high genetic diversity and heterogeneity, indicative of high adaptive potential, in the hyperdominant Eschweilera coriacea in comparison with other Eschweilera species of the Parvifolia clade with which it occurs sympatrically in French Guiana. However, we found no conclusive evidence for cryptic species within E. coriacea. Our data set had relatively poor power to delimit species in Eschweilera individuals from the Parvifolia clade, although delimitation power was improved in comparison with available plastid DNA markers. Promising avenues for future research on species delimitation and adaptive evolution in species complexes such as Eschweilera, Parvifolia clade, will be the combined use of morphological trait data, data on ecological niche characterization and genomic resequencing data using high throughput approaches.

SUPPLEMENTARY FILES
Five supplementary files are associated to this paper: