Nested singletons in molecular trees: Utility of adding morphological and geographical data from digitized herbarium specimens to test taxon concepts at species level in the case of Casearia (Salicaceae)

Abstract Using the genus Casearia, we assessed the status of nested singletons: individual specimens that correspond to accepted species but appear nested within clades of closely related species in molecular trees. Normally, such cases would be left undecided, although timely taxonomic decisions are required. We argue that morphological, chorological, and ecological data can help to illuminate patterns of speciation. Their use can provide a first step in testing taxon concepts at species level. We focused on five cases of nested singletons in trees of the genus Casearia. We employed PCA and cluster analysis to assess phenotypic differentiation. Using geocoordinates, we calculated niche space differentiation based on 19 bioclim variables by means of PCA and niche equivalency and similarity tests, and generated dot maps. We found that the singletons were morphologically distinctive in two of the five cases (Casearia selloana and C. manausensis), relatively distinctive in two other cases (C. zizyphoides and C. mariquitensis), and partially overlapping in the last case (C. grandiflora). For two cases (C. mariquitensis and C. selloana), ecological niche space was broadly overlapping, in two cases it was found broadly nested (C. grandiflora and C. zizyphoides), and in one case narrowly nested (C. manausensis), but in no case was niche differentiation observed. Niche overlap, similarity, and equivalency showed corresponding patterns. Given these data, one would interpret C. selloana and C. manausensis as presumably well-distinguished taxa, their narrow distribution ranges suggesting recently emerging lineages. The other three cases are not clear-cut. Morphological data would suggest that C. grandiflora in particular is conspecific with C. arborea, but differences in the distribution are intriguing. Our approach would reject the notion of potential synonymy based on nested phylogenetic placement for at least two of the five cases. The other cases also show no complete lack of differentiation that would support synonymy.


| INTRODUCTION
Delimiting species is a challenging task, with respect to disparate species concepts that range from morpho-species to different approaches reflecting complex speciation mechanisms in plants, but also in practical terms, regarding sufficient sampling of characters and individuals to make reliable assessments (Comes, 2004; Naciri & Linder, 2015). Historically, alpha-taxonomy in plant species has been done based on the phenotype, using putatively diagnostic morphological characters shared among individuals presumed to belong to a thus-defined species, denoting a morpho-species concept (Stuessy, 2009). This approach has been implemented by the taxonomic community since the rise of formal species descriptions named with binomials in the mid-18th century (Linnaeus, 1753). Following the advent of phylogenetic systematics (Hennig, 1950, 1966), which provided a method to infer ancestor-descendant relationships and thus to reconstruct the history of species diversifications (Hillis, 1987), DNA sequence information is now routinely used in studies on evolution, systematics, and biogeography, yielding large numbers of molecular trees. The taxonomy of plants, as well as that of other organisms, is now in a transitionary phase from alpha taxonomy that recognized species based on the comparison of morphological characters, to the application of evolutionary methods in order to first infer distinct biological entities, which subsequently can be formally classified and named. However, such an evolutionary approach has only been thoroughly applied to a limited number of taxa. Current classification systems at the genus and even more so at the species level exhibit a mixture of taxa still defined on morpho-species concepts, whereas others have been evaluated employing evolutionary methods. In this ongoing process of taxonomic knowledge generation, species limits and the corresponding taxon concepts at species level are tested and eventually adjusted.
DNA sequencing has in many cases challenged traditional taxon concepts at species level, unveiling that molecular phylogenies do not agree with the morphology-based classification, as evident from terminal clades in which samples identified with one species name are joined by samples identified with other currently accepted names. The use of DNA can unravel instances of cryptic speciation, in which entities share similar phenotypes while the respective individuals are found to be phylogenetically distinct (Simpson, 1951; Fišer et al., 2018), but can also indicate that taxa currently accepted as different by morphology-based treatments may in fact not represent different species. One of the major challenges is to utilize information from phylogenetic trees to revise taxonomic treatments and thus overcome the phylogeny-to-classification gap (Mayo et al., 2008; Hinchliff et al., 2015). Users of biodiversity information need to rely on biologically meaningful species classification in a most timely manner (Vogel Ely et al., 2017; Supple & Shapiro, 2018; Stanton et al., 2019) and cannot wait until all species limits have eventually been clarified on the basis of densely sampled phylogenomic data sets. At the same time, a wealth of specimen information is currently becoming available in digital form from herbaria worldwide (Thiers et al., 2016; Le Bras et al., 2017; De Smedt et al., 2018; Klazenga, 2018; Seregin, 2018), so it becomes feasible to explore morphological and geographical evidence for putative taxa using many specimens.
The state of knowledge for most genera is that molecular phylogenetic trees represent a great proportion of currently accepted species. Serving the goal of delivering first overall hypotheses of species relationships (Mansion et al., 2012), they usually represent species by single or few individuals and find different levels of resolution and node support in different parts of the tree. A frequent case in such molecular phylogenetic trees is the nested placement of singletons (i.e., individual samples representing a currently accepted species) within clades composed of individuals of another species.
The existence of paraphyletic species as a result of peri- or parapatric speciation also involving incomplete lineage sorting is now widely accepted (Crisp & Chandler, 1996; Hörandl & Stuessy, 2010; Carnicero et al., 2019; Kato et al., 2019). Nested singletons could therefore represent biologically distinct entities, deserving recognition at species level, or just represent distinct haplo- or genotypes, thus exhibiting infraspecific genetic diversity of the so far better represented species in the molecular phylogeny. The occurrence of singletons usually corresponds to the rarity of the corresponding taxon and the difficulty of accessing suitable material. Often they belong to recently diverged shallow clades that show a lack of phylogenetic differentiation relative to the apparent phenotypic or ecological differentiation (Lexer & Widmer, 2008; Ravinet et al., 2017).
Species may vary considerably in their infraspecific phylogenetic structure when multiple individuals from different populations throughout their geographic range are included (Borsch et al., 2018).
Despite their frequent occurrence (see for example trees published by Bengtson et al., 2021; Frost et al., 2021; Lu-Irving et al., 2021; Majure et al., 2021; García-Moro et al., 2022), cases of nested singletons are almost never discussed. In addition to evidence from molecular trees, the morphological, ecological, and/or chorological differentiation is relevant for gaining further insight into whether putative taxa represented by single sequences are biologically distinct entities. Such an integrative taxonomy approach is being increasingly used (Dayrat, 2005; Pante et al., 2015). A fundamental principle of integrative taxonomy is to generate specimen-based character data (Kilian et al., 2015) that can allow precise testing of the placement of individuals (i.e. individual samples) in an evolutionary context when subjected to different inference methods. By using the geographical occurrence points of the respective specimens, ecological parameters can also be assessed and integrated in models of distribution and species delimitation.
Here, we address a different situation that occurs during taxonomic work that needs to deliver the best possible judgment of species limits during treatments of genera or even families at global or regional levels, in the course of which a comprehensive molecular analysis is not realistic for time, capacity, material availability and resource reasons. Our use of an integrative taxonomic approach is, therefore, based on the assumption that morphological, geographical, and ecological data still show a pattern related to the evolutionary history of the putative species under study (Thompson et al., 2005). The advantage of these data is that they can be obtained for a large and representative number of specimens, now facilitated by herbarium digitization. Those individuals represented by sequences in the molecular tree and investigated at the same time for the nonmolecular data constitute a link between the available phylogenetic hypotheses and entities discovered by analyzing the nonmolecular data (PCA or clustering algorithms, spatial and ecological models). We analyze currently accepted taxa revealed as nested singletons in a recent molecular phylogenetic analysis of the genus Casearia Jacq. (Mestier et al., 2022) within the presumably widespread and common species C. arborea (Rich.) Urb., C. mollis Kunth, and C. sylvestris Sw.
Casearia is a pantropical genus that comprises around 220 species of shrubs or trees, half of which are found in the Neotropics (Sleumer, 1980). It is the largest genus within a broadly defined Salicaceae, including the tribe Samydeae, which is sometimes classified at the rank of family (Alford, 2005). Casearia has alternate, serrate leaves that present pellucid dots and/or lines and flowers in axillary and usually fasciculate inflorescences. The flowers are apetalous with five sepals and they present staminodes, alternating with the stamens or sometimes inserted outside of the row of stamens (Warburg, 1895; Sleumer, 1980). Most species are widely distributed and found across various habitats in the Neotropics, including Amazonian rainforests, Brazilian cerrados (Sleumer, 1980; Gutiérrez, 2000; Marquete & Mansano, 2012), dry forest (DRYFLOR et al., 2016), or savannas (Devecchi et al., 2020), whereas others are considered range restricted (Breteler, 2008) or endemic (Marquete & Mansano, 2010; Applequist & Gates, 2020). About 30 species occur in the Caribbean (Sleumer, 1980; Correll & Correll, 1982; Howard, 1989; Liogier, 1994; Gutiérrez, 2000), which have evolved as a result of multiple migrations of ancestors to the islands since the late Miocene (Mestier et al., 2022). Sleumer (1980) provided the so far most complete revision of the genus in the Neotropics, but some species remain unclear.
The specific objectives of this study are to evaluate the degree of phenotypic differentiation, differentiated distribution, and ecological niche differentiation, for five currently accepted species-level taxa of Casearia appearing as part of terminal clades composed of individuals of C. arborea (Rich.) Urb., C. mollis Kunth, and C. sylvestris Sw. in comparison to their widespread relatives.
We included all available herbarium specimens that could be reliably assigned to the respective, currently accepted taxa. Based on PCA and clustering analysis for the morphological data and on distribution and niche space analyses, our goal is to explore how far such nonmolecular evidence can help to delimit species and thus can be used to support the circumscription of taxon concepts at species level. Moreover, our aim is to discuss our findings considering the current implementation of integrative approaches in flowering plant taxonomy.

| Taxon sampling and phylogenetic reconstruction
Our phylogenetic reference tree was based on the combined rps4-trnLF, trnK-matK, petD, and rpl16 and the nuclear data set of Mestier et al. (2022). For the present investigation, we added 11 newly generated sequences of available relevant samples (voucher information in Appendix S1). Laboratory protocols were followed as in Mestier et al. (2022). We evaluated the potential of adding further sequences downloaded from NCBI but finally decided against it, because vouchers were either not available online to allow for checking the identification or the respective specimens were not sequenced for the majority of the genomic regions used here for tree inference.
The alignment by Mestier et al. (2022) was used to incorporate the further sequences (Appendix S2 for alignments) implementing a motif-alignment approach (see Löhne & Borsch, 2005) in PhyDE. Short regions of uncertain homology (hotspots) were excluded from the analyses, and gaps were coded using the simple indel coding method (Simmons & Ochoterena, 2000) as implemented in SeqState version 1.4.1 (Appendix S3 for matrices used in tree inference).
We used MrBayes v.3.2.7.a (Ronquist et al., 2011) for Bayesian inference (BI). The optimal nucleotide substitution models were chosen using jModelTest v.2.1.7 (Darriba et al., 2012) under the Akaike information criterion (AIC). The best-fit model for each partition can be found in Table 1. For the indels, the F81 model was used, as suggested by Ronquist et al. (2011). Four runs were performed with four chains and 40 million generations. Convergence of the runs was verified using the average standard deviation of split frequencies and the post burn-in effective sample size (ESS).
As a burn-in, the first 10% of the trees were discarded, and the remaining trees were used to construct a 50% majority-rule consensus tree. Maximum likelihood (ML) was implemented in RAxML v. 8.2.12. Rapid bootstrap support (BS) was estimated based on the majority-rule consensus tree from 1000 pseudo-replicates with 200 searches. The models general time-reversible (GTR) + Γ and binary (BIN) + Γ were used for the nucleotide and indel partitions, respectively. All these analyses were run through the CIPRES portal (Miller et al., 2011). The ML phylogram was illustrated in FigTree v1.4.4 (Rambaut, 2010). We performed parsimony analysis (P) in PAUP* v.4.0b10 (Swofford, 2008) using the commands obtained from the parsimony ratchet (Nixon, 1999) as implemented in PRAP (Müller, 2004). PRAP generated files including all characters with equal weight, and the gaps were treated as missing characters. Ratchet settings included 200 iterations, upweighting 25% of the positions randomly (weight = 2), and 100 additional cycles.
Jackknife support (JK) was obtained through a single heuristic search in PAUP within each of 10,000 JK pseudo-replicates, tree bisection-reconnection branch swapping, and 36.79% of characters being deleted in each replicate. All trees were processed using TreeGraph 2 (Stöver & Müller, 2010), and node support values of all inference methods were depicted on the Bayesian majority rule topology.
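The deletion proportion of 36.79% quoted above appears to correspond to e^-1, a value commonly used for jackknife resampling; this reading is our assumption and is not stated in the text. A minimal sketch of the arithmetic:

```python
import math

# Assumption: the 36.79% of characters deleted per jackknife replicate
# corresponds to a deletion probability of e^-1.
deletion_fraction = math.exp(-1)

# Expressed as a percentage rounded to two decimals, as quoted in the text.
deletion_percent = round(deletion_fraction * 100, 2)
print(deletion_percent)
```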

| Target taxa
The following cases of nested singletons were selected for study.
C. grandiflora Cambess and C. manausensis Sleumer nested within C. arborea (Rich.) Urb.; C. selloana Eichler and C. zizyphoides Kunth nested within C. sylvestris Sw.; and C. mariquitensis Kunth being part of the C. mollis Kunth clade. Our sampling of these deviant taxa has been limited due to the availability of material, and thus they are so-called "singletons." Mestier et al. (2022) also retrieved C. spinescens nested within C. aculeata; however, given the incongruence between plastid and nuclear trees, where C. spinescens is retrieved as sister to C. aculeata, we chose not to analyze it further here.

| Locality data
Using a set of specimens corresponding to the above taxa following the taxon concept at species level sensu Sleumer (1980), we com- but also morphological data available through GBIF (Robertson et al., 2014). For GBIF data, we filtered for specimen-based occurrences only. We only considered specimen records identified by specialists for Casearia and allies, or those with digital voucher images for which we could verify the identification. We manually verified that coordinates matched the corresponding localities. Missing coordinates were added when locality data were precise enough to allow for reliable georeferencing. For Colombian samples, we used centroid coordinates of either municipalities, veredas, natural parks, or reserves, following the administrative divisions of Colombia (DANE, 2017). For the remaining samples, we used Google Earth (GoogleInc., 2020). We then deleted duplicate specimens, filtering the data by coordinates and localities using R v4.0.3 (RCoreTeam, 2013).

TABLE 1 Summary of character statistics, evolutionary models, and tree statistics for each dataset under maximum parsimony, maximum likelihood, and Bayesian inference

| Morphological analyses
[...] (Table 2). For all specimens, we examined the length and the width of the leaf, as well as the length of the petiole. Further characters, indicated as being diagnostic in taxonomic treatments (Sleumer, 1980; Olson et al., 1999; Nepomuceno & Alves, 2020), were specifically studied for each pair of nested vs. corresponding paraphyletic taxon. Quantitative measurements were performed using the digital image analysis software ImageJ 1.53a (Schneider et al., 2012). We computed descriptive statistics for all quantitative variables (mean, standard deviation). For categorical variables, we used the "fastdummies" package (Kaplan, 2020) in R (RCoreTeam, 2013), which transforms the variables into binary variables, recoding states as presence/absence variables. We employed principal component analysis (PCA) and cluster analyses using the Ward.D2 method with the NbClust package (Charrad et al., 2014) to analyze the character matrices for nested versus corresponding paraphyletic taxon pairs in multivariate fashion. All information regarding the specimens and the respective measurements can be found in Table S1.

TABLE 2 Note: Flowering characters are presented for general information but are not used in the analyses. Discolorous: Superior side of the limb darker than the inferior side (presence/absence); leaves pilosity: Presence (or absence) and type of pubescence on the limb; tip of the leaves: Tip shape; style: Entire or parted.
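The recoding of categorical characters into presence/absence columns followed by PCA can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the R packages used in the study. The measurements, the character states, and the function names `one_hot` and `pca` are hypothetical, and the Ward clustering step is not shown.

```python
import numpy as np

def one_hot(labels):
    """Recode one categorical character into presence/absence columns."""
    states = sorted(set(labels))
    table = [[1.0 if lab == s else 0.0 for s in states] for lab in labels]
    return np.array(table), states

def pca(matrix, n_components=2):
    """PCA via SVD on standardized columns.

    Returns specimen scores and the proportion of variance per component.
    """
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    X = X / np.where(sd > 0, sd, 1.0)  # leave constant columns unscaled
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Hypothetical specimens: leaf length, leaf width, petiole length (cm).
quantitative = np.array([
    [8.1, 3.2, 0.6], [7.9, 3.0, 0.5], [8.5, 3.3, 0.7],
    [12.4, 5.1, 1.1], [11.8, 4.9, 1.0], [12.0, 5.3, 1.2],
])
# Hypothetical states of one categorical character (leaf pilosity).
pilosity = ["glabrous", "glabrous", "glabrous",
            "tomentose", "tomentose", "tomentose"]

dummies, states = one_hot(pilosity)
character_matrix = np.hstack([quantitative, dummies])
scores, explained = pca(character_matrix)
```

Plotting the first two columns of `scores`, colored by taxon, gives the kind of ordination that is compared between nested and including taxa.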

| Environmental niche space analysis
To test divergence in environmental niche space between nested vs. including taxon, we obtained 19 climatic layers from WorldClim at 1 km² resolution (http://www.worldclim.org/bioclim). A shape layer was generated by cropping the grid data to the area of the Neotropics using R v.4.0.3 (RCoreTeam, 2013). In order to reduce complexity and avoid overparametrization, we carried out a collinearity test, using the Pearson correlation coefficient from the "removeCollinearity" function of the "VirtualSpecies" package (Leroy et al., 2016), with a cutoff value that we set at 0.75. We selected one variable from each group of correlated environmental variables, usually the variable representing the annual trend (mean). This reduced the data set to nine climatic layers (Table S2).
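The idea behind the collinearity filter can be illustrated with a simplified stand-in. The sketch below is not the grouping procedure of the R function cited above: it is a greedy filter that keeps a variable only if its absolute Pearson correlation with every variable already kept does not exceed the cutoff, so preferred variables (for example annual means) should be listed first. The variable names and the simulated values are hypothetical.

```python
import numpy as np

def filter_collinear(data, names, cutoff=0.75):
    """Greedy collinearity filter on the columns of `data`.

    A variable is kept only if |Pearson r| with every previously kept
    variable is <= cutoff. Order of `names` decides which one survives.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    kept = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= cutoff for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Simulated values at 200 hypothetical occurrence points.
rng = np.random.default_rng(0)
annual_mean_temp = rng.normal(size=200)
# Strongly correlated with annual mean temperature by construction.
max_temp_warmest = annual_mean_temp + 0.1 * rng.normal(size=200)
# Generated independently of the temperature variables.
annual_precip = rng.normal(size=200)

layers = np.column_stack([annual_mean_temp, max_temp_warmest, annual_precip])
selected = filter_collinear(layers, ["bio1", "bio5", "bio12"], cutoff=0.75)
```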
We retrieved data for a total of 931 occurrences (information regarding the specimens used can be found in Table S3). From these, 219 belonged to C. arborea, 168 to C. grandiflora, 12 to C. manausensis, 105 to C. mariquitensis, 33 to C. mollis, 39 to C. selloana, 324 to C. sylvestris, and 33 to C. zizyphoides.
Based on the georeferenced locality data for specimens representing each taxon, we performed PCAs to visualize potential differences in the ecology between pairs of taxa. To assess niche equivalency and similarity, we used the "Ecospat" package (Di Cola et al., 2017). First, we computed Schoener's D statistic to quantify niche overlap between pairs of species, ranging between 0 for no overlap in environmental space and 1 for identical environmental space. Given that in the case of allopatric species, geographical differences might lead to differences in the environmental conditions available, we conducted a niche similarity test, which used the model of one species to predict the occurrence of the second species (Warren et al., 2010). Information regarding the specimens used for the analyses can be found in Table S3.
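Schoener's D can be written as D = 1 - 0.5 * sum(|p1_i - p2_i|), where p1_i and p2_i are the occupancy densities of the two taxa in cell i of a gridded environmental space, each normalized to sum to 1. The sketch below computes it from plain two-dimensional histograms of hypothetical PCA scores; it is a simplification that does not reproduce any density smoothing applied by the package used in the study, and the function names are illustrative only.

```python
import numpy as np

def grid_density(scores, bins, value_range):
    """Count occurrences per cell of a gridded two-axis environmental space."""
    hist, _, _ = np.histogram2d(scores[:, 0], scores[:, 1],
                                bins=bins, range=value_range)
    return hist

def schoeners_d(density_a, density_b):
    """Schoener's D: 0 means no overlap, 1 means identical occupancy."""
    p_a = np.asarray(density_a, dtype=float).ravel()
    p_b = np.asarray(density_b, dtype=float).ravel()
    p_a = p_a / p_a.sum()
    p_b = p_b / p_b.sum()
    return 1.0 - 0.5 * np.abs(p_a - p_b).sum()

# Hypothetical PCA scores (axis 1, axis 2) of two taxa.
taxon_a = np.array([[0.1, 0.1], [0.2, 0.2], [0.8, 0.8]])
taxon_b = np.array([[0.9, 0.9], [0.7, 0.6]])

# Both taxa must be gridded on the same environmental background.
grid_range = [[0.0, 1.0], [0.0, 1.0]]
d_value = schoeners_d(grid_density(taxon_a, 2, grid_range),
                      grid_density(taxon_b, 2, grid_range))
```

Using a shared grid range for both taxa matters: if each taxon were binned over its own extent, the cells would not be comparable and D would be meaningless.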

| Distribution maps
We generated distribution maps with the geographic information software QGIS 3.10 (QGIS association, 2020), using the locality data of specimens with verified identification and locality data from local floras to cover the entire range of a species distribution, even when no specimens were available with reliable coordinates (Table S3). These were drawn by nested vs. including taxon pairs in order to observe potential geographic differentiation.

| Molecular data sets
The

| Phylogenetic relationships of Casearia and positions of nested singletons
The plastid topology is shown in Figure 1 and provides a well-resolved phylogenetic framework for the monophyletic genus

| Morphological analyses
The PCA of C. grandiflora versus C. arborea showed a morphological overlap, but a strong tendency of differentiation along the two perpendicular axes (Figure 3a), whereas cluster analysis revealed four distinct groups that did not coincide with the two species (Figure 4a). In this case, distribution of individuals between the main clusters was rather homogeneous. The morphological overlap between C. manausensis versus C. arborea was less pronounced than in the previous case (Figure 3b) and cluster analysis

| Environmental niche space analysis
The results for C. grandiflora and C. arborea showed considerable ecological overlap (Figure 5a). In the case of C. manausensis versus C. arborea, the PCA showed a pattern with individuals from C. manausensis being nested within C. arborea, i.e. pointing to a much narrower ecological niche of C. manausensis (Figure 5b). For C. mariquitensis and C. mollis, we also retrieved a high ecological overlap in the PCA (Figure 5c). Casearia selloana and C. sylvestris also presented no discernible ecological differentiation (Figure 5d), and the same pattern was found for C. sylvestris and C. zizyphoides (Figure 5e).
Niche similarity tests were significant for all but one of the species pairs: C. arborea and C. grandiflora showed neither niche similarity nor equivalence. For the other four pairs, the niche similarity was always higher than expected by chance (

| Distribution
Casearia arborea and C. grandiflora are both widely distributed.

| An integrative approach for species delimitation in the case of Casearia
Proper species delimitation is crucial not only for accurate biodiversity assessments and biodiversity monitoring but also for downstream studies, such as ecology and conservation (Agapow et al., 2004; Rojas-Soto et al., 2010; Ruiz-Sanchez & Londoño, 2017; Sheridan & Stuart, 2018). Mostly with respect to insufficiently resolved molecular trees or sampling gaps in molecular data sets, Edwards and Knowles (2014) and Mayo (2022) argue that an integrative taxonomy approach including additional kinds of data can help toward further assessing species limits. In our case of neotropical Casearia species, the molecular trees are inconclusive in depicting deviant species currently accepted based on morphology (e.g. Sleumer, 1980) as part of terminal clades of other species. While these species remain phylogenetically unresolved, we can reliably assume close relationships with the including species as annotated on the trees (Figures 1 and 2). Our resulting species-level taxon pairs therefore provide a valid setup for the comparative analysis of morphological and ecological data, as well as for the comparison of their respective ranges, to test for differentiations not evident in the limited molecular data available. The nested position of the five study cases (C. manausensis and C. grandiflora within C. arborea; C. mariquitensis within C. mollis; C. selloana and C. zizyphoides within C. sylvestris), leaving the residual species paraphyletic, could be interpreted as a lack of resolution by the molecular markers applied so far. Resolving such situations with additional molecular data would be desirable, but for practical reasons is challenging due to the difficulties in targeted sampling. Therefore, in a taxonomic treatment based on the currently available data, a decision has to be made in each case on whether to recognize one or more taxa. Evidence from morphology, ecology, and geography could therefore be of fundamental importance to retain putatively distinct biological entities that warrant continued recognition as distinct taxa.

FIGURE 2 Bayesian 50% majority-rule consensus tree of Casearia based on the nuclear marker ITS. Values above the node indicate posterior probability (PP, bold) and bootstrap support (BS, italics), and jackknife (JK) support is indicated below the node. In square brackets are the values with conflicting topologies between Bayesian analysis and parsimony. Tip labels indicate the DNA number followed by the name of the species and the code of the country where the individual was collected.
In the case of C. mariquitensis versus C. mollis, there is some evidence for morphological differentiation between the two taxa, but there is a lack of niche differentiation and a broadly overlapping distribution, suggesting that C. mollis could be maintained as a species different from C. mariquitensis based on morphological features only. Individuals identified as C. grandiflora or C. arborea showed strong morphological differentiation, while exhibiting limited niche equivalency and niche similarity. In addition, although both appear widely distributed across the Neotropics, the first is more abundantly found in the Amazon and adjacent dry forests and the second more frequently in the Andes, the Atlantic forest, and Central America and the Caribbean. These patterns clearly support the continued acceptance of two separate taxa. Casearia manausensis was also strongly differentiated morphologically from C. arborea and was narrowly nested within the ecological niche space of the Therefore, in all five tested cases, we argue to maintain the hitherto used classification rather than sinking the respective accepted names into synonymy, while highlighting the need for additional investigation (Scherz et al., 2017;Guenser et al., 2022).

| Kinds of data used and their potential for taxonomic decision making
By integrating different data, a structured taxonomic decision-making process can be supported. This requires evaluating the relative contributions of these different kinds of data. In this investigation, the molecular data mostly came from the recently presented phylogeny of Casearia (Mestier et al., 2022), with sequences here added for further individuals of C. grandiflora, C. mollis, and C. sylvestris. Despite this, the now sequenced individuals still do not represent populations from throughout the ranges of the respective species, nor do they fully cover the morphological variation encountered in the available herbarium specimens, which were obtained in decades of collecting in many countries. Therefore, the full assessment of molecular variation in a putative species throughout its assumed range was not possible due to a lack of adequate material in our Casearia exemplars. In light of possible infraspecific variation, more material would in fact be required. However, considering the large neotropical ranges of most of the respective taxa, this would not have been feasible within a time frame that allows delivery of the best possible treatment for syntheses like the World Flora Online (Borsch et al., 2020).
Our morphological and distributional data came in part from herbarium specimens serving as vouchers for our molecular analysis C. selloana (Sleumer, 1980). In this case, the morphological characters used in this investigation already showed a clear morphological distinction of C. selloana from C. sylvestris, so including this character would not have changed our conclusion.
We hypothesized that for a nested singleton to represent a separate species, it should present some phenotypic differentiation toward the taxon it is nested within. Bromham et al. (2002) pointed out that rates of phenotypic differentiation can be higher than substitution rates of the studied genomic markers. Phenotypic differentiation could therefore be an indicator of reproductive isolation and parapatric speciation. In addition, or alternatively, the ecological niche of the taxon in question should reveal differentiation, and finally a differential distribution range, in line with allo-, peri-, or parapatric

| Placing our results in the context of previous taxonomic treatments
Casearia grandiflora, although retrieved as a nested singleton within C. arborea, presented some degree of phenotypical differentiation, whereas ecological and chorological data remained inconclusive.

TABLE 3 Results of the ecological niche analysis
We therefore conclude that C. grandiflora should be maintained as a separate species based on at least one line of evidence (Table 4).
According to Sleumer (1980), the two species are hard to distinguish when sterile, but some flower characters such as the presence of a peduncle in C. arborea versus sessile flowers in C. grandiflora allow a clear differentiation of fertile material, thus supporting the distinction of the two species.
Casearia manausensis shows strong phenotypical differentiation toward C. arborea, but is narrowly nested within the environmental niche space of the latter. Compared to the wide distribution of C. arborea throughout the Neotropics, from Central America to Southern Brazil and into the Caribbean, C. manausensis is restricted to a small area within the Amazon. This points to a speciation process within an area subset, where a widely distributed species gave rise to a species with a much smaller range, by ecological differentiation (Rundle & Nosil, 2005;Foote, 2018) and perhaps parapatric speciation.
A sequenced singleton of Casearia mariquitensis from Guyana appeared among the samples of C. mollis. While there is some morphological differentiation, the ecological analysis revealed considerable overlap. Olson et al. (1999) stated that C. mariquitensis and C. mollis, along with three other species, C. decandra Jacq., C. arguta H.B.K. and C. pitumba, formed a poorly understood complex.
Casearia mariquitensis and C. mollis were both described in the same work by Kunth (Humboldt et al., 1815), the first based on a type specimen from Colombia, Tolima, and the second with a type from Venezuela (Aragua). Kunth distinguished C. mariquitensis as having leaves with an acute base, denticulate margins and being glabrous, whereas C. mollis was said to have leaves with a rounded base, dentate margins, and being tomentose beneath. Our morphological analysis largely supported this distinction, although the exact point of delimitation between the two taxa remains unclear.
For Casearia selloana, the morphological analysis showed a strong differentiation toward C. sylvestris in the PCA, supporting their current treatment as different species. The environmental analysis showed little ecological differentiation, although the niches of the two species were not fully equivalent. Sleumer (1980) suggested that C. selloana might be a variant of C. sylvestris in very dry habitats, a notion that remains questionable given that C. selloana is limited to northeastern Brazil, not exactly a dry ecosystem. Notably, C. sylvestris, found throughout the New World tropics, encompasses a subspecies, C. sylvestris subsp. myricoides (Griseb.) J.E. Gut., endemic to serpentine areas in Cuba (Gutiérrez, 2000), which is also morphologically distinct by having smaller leaves. This particular case of adaptation to soil type (Borhidi, 1991; Reeves et al., 1999) was not investigated for the five cases analyzed here but should be considered for future assessments. Casearia sylvestris shows considerable phylogenetic structure already based on a few loci, which suggests that it could represent a species complex, and so the nested position of C. selloana and especially C. zizyphoides may eventually be resolved as reciprocally monophyletic.

Handling singletons in the context of an integrative taxonomy approach
Assessing taxon validity by analyzing only one line of evidence is not enough and can result in an over- or underestimation of species numbers (Carstens et al., 2013). Therefore, studying multiple lines of evidence is crucial, as it allows the various mechanisms involved in speciation to be taken into account (Schlick-Steiner et al., 2010). Whereas morphological evidence has very frequently been matched with phylogenetic or phylogeographic trees and networks to illuminate species limits and support species classification (Huang et al., 2016; Šmíd et al., 2017; Perkins, 2019; Yang et al., 2021; Andriamihaja et al., 2022), the inclusion of ecological data to achieve this goal is more recent (Boucher et al., 2016; Duan et al., 2019). As a nested singleton is by definition a single specimen, hence without any statistical power, we use the analysis of a broad sample of nonsequenced specimens as a proxy to assess the potential status of the taxon represented by the nested singleton. In doing so, we provide a quantitative framework using three lines of evidence (morphology, ecology, and distribution) to interpret the status of a taxon beyond the limited and inconclusive molecular information. Given that nested singletons are a frequent problem in published molecular phylogenies, and given that their status is usually not assessed, thus leaving taxonomies unresolved, our approach appears to be a useful strategy to compensate for the lack of more abundant molecular data for the taxa in question.
In this investigation, we explore the use of morphological, ecological, and distributional data (Table 4) for delimiting species when taxon sampling in the available molecular trees is limited and molecular phylogenetic analyses alone remain inconclusive for supporting taxonomic treatments at species level. Specifically, we addressed singletons found in our phylogenetic analysis of Casearia as an exemplar.
Our approach shows that quantitative evaluation of nonsequenced specimens that were identified based on morphological characters and using existing prephylogenetic treatments can be successful in evaluating the status of so-called nested singletons found in phylogenetic analyses. Such singletons are frequent in published molecular phylogenetic trees based on multiple sequence alignments of few to multiple loci (Bengtson et al., 2021; Lu-Irving et al., 2021; García-Moro et al., 2022) as well as in phylogenomic analyses using RAD (Böhnert et al., 2022) or Hyb-Seq data (Jones et al., 2019; Xu & Chen, 2021). Under normal circumstances, one would target several specimens of a species complex to address species delimitation, ideally combining molecular, morphological, ecological, and distributional data in an integrative taxonomy approach. However, singletons are usually the result of nontargeted sampling, i.e., such taxa are included as singletons in a phylogenetic analysis either because the overarching question is at a different taxonomic level (e.g., genus delimitation or genus placement) or because they represent opportunistic sampling within a larger clade. Still, the respective phylogenetic trees provide useful information for species delimitation and challenge currently used taxon concepts at species level. In such cases, our strategy offers an effective approach: an initial hypothesis of potential synonymy due to nested phylogenetic placement is subsequently tested using quantitative morphology, ecology, and distribution of numerous nonsequenced samples taxonomically identified as a given species. The results make taxon hypotheses explicit, also with respect to data deficiencies, and inform targeted sampling in future studies.
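The ecological part of such a test can be illustrated with a minimal sketch. It assumes that niche overlap is summarized by Schoener's D computed along a single environmental axis (for example, the scores on the first PCA axis of the bioclim variables); the choice of this index, the binning scheme, the function names, and the example values are illustrative assumptions and are not taken from the analyses reported here.

```python
from collections import Counter


def occupancy(values, lo, hi, n_bins):
    """Proportion of records falling into each of n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [counts[i] / total for i in range(n_bins)]


def schoener_d(values_a, values_b, n_bins=10):
    """Schoener's D between two taxa along one environmental axis.

    D = 1 - 0.5 * sum(|p_a - p_b|), where p_a and p_b are the binned
    occupancy proportions. D ranges from 0 (no overlap) to 1 (identical).
    """
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    if hi == lo:
        # All records share one value, so the binned niches are identical.
        return 1.0
    p_a = occupancy(values_a, lo, hi, n_bins)
    p_b = occupancy(values_b, lo, hi, n_bins)
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_a, p_b))


# Hypothetical axis scores: a widespread taxon and a taxon nested within it.
widespread = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
nested = [8, 9, 10, 11, 12]
print(round(schoener_d(widespread, nested), 3))
```

A value close to 1 would indicate broadly overlapping niche space, whereas a narrowly nested taxon yields an intermediate value because it occupies only part of the range of the widespread taxon. Equivalency and similarity tests would additionally require randomization of the occurrence records, which this sketch does not cover.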
Our approach will be relevant for assessing the status of taxa where further sampling is logistically challenging but taxonomic decisions are needed in a timely manner, such as for completing checklists and flora treatments or for evaluating conservation status.
Integrative taxonomy has sometimes been considered as a "solution to the plurality of existing species concepts" (Dayrat, 2005; Schlick-Steiner et al., 2010). Considering that there are different (biological) species concepts that connect to particular speciation mechanisms in flowering plants, we argue that in many cases of hypothesized species, the challenge is to obtain sufficient evidence (both molecular and nonmolecular) to determine which species concept applies. Morphological, ecological, and geographical data can provide evidence in favor of speciation hypotheses such as allopatric, parapatric, or peripatric speciation, which by themselves have a spatial dimension. Moreover, they allow the wealth of existing specimens in herbaria to be included. We further observe that phylogenomic analyses increase the resolution within shallow clades encompassing one to several putative species (Prata et al., 2018; Lin et al., 2021; Smith et al., 2022). However, the delimitation of species, and the subsequent circumscription and naming of taxa, against the background of the molecular topology is usually done by matching morphological character states to parts of the topology, underscoring the relevance of an integrated taxonomic approach. Phylogenomic analyses with population-level sampling that represents the genetic diversity within putative species, and thus informs model-based approaches to recognizing discontinuities resulting from speciation, are still rare due to their complexity and the high effort that they require.

ACKNOWLEDGMENTS
We are thankful to Mac H. Alford (University of Southern Mississippi, Hattiesburg, U.S.A.) for providing additional samples for this study.
We are also thankful for the support of the laboratory team at BGBM, especially Kim Govers and Julia Dietrich. We thank the German

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.