Phylogeny of Saxifraga section Saxifraga subsection Arachnoideae (Saxifragaceae) and the origin of low elevation shade‐dwelling species

Abstract Saxifraga section Saxifraga subsection Arachnoideae is a lineage of 12 species distributed mainly in the European Alps. It is unusual in terms of ecological diversification by containing both high elevation species from exposed alpine habitats and low elevation species from shady habitats such as overhanging rocks and cave entrances. Our aims are to explore which of these habitat types is ancestral, and to identify the possible drivers of this remarkable ecological diversification. Using a Hybseq DNA‐sequencing approach and a complete species sample we reconstructed and dated the phylogeny of subsection Arachnoideae. Using Landolt indicator values, this phylogenetic tree was used for the reconstruction of the evolution of temperature, light and soil pH requirements in this lineage. Diversification of subsection Arachnoideae started in the late Pliocene and continued through the Pleistocene. Both diversification among and within clades was largely allopatric, and species from shady habitats with low light requirements are distributed in well‐known refugia. We hypothesize that low light requirements evolved when species persisting in cold‐stage refugia were forced into marginal habitats by more competitive warm‐stage vegetation. While we do not claim that such competition resulted in speciation, it very likely resulted in adaptive evolution.

, the climatic oscillations of the Quaternary have been shown to have resulted in evolution, i.e., genetic differentiation and speciation, often by changes in geographical distribution resulting in range fragmentation and divergence in geographical isolation, but also by hybrid speciation upon secondary contact (Kadereit & Abbott, 2022).
In Europe, the distribution of intraspecific genetic variation, and also of regional endemics, in combination with geological and palaeoenvironmental evidence, have been used to identify major refugial areas for, e.g., alpine species in and around the Alps (Schönswetter et al., 2005;Tribsch & Schönswetter, 2003), and for the Mediterranean area (Hewitt, 2011;Médail & Diadema, 2009;Nieto Feliner, 2011). Environmental conditions in areas which served as glacial refugia were not the same as in those areas in which species went extinct in glacial periods of the Quaternary (Bennett, 1997;Davis & Shaw, 2001;Hewitt, 1996Hewitt, , 2000Hewitt, , 2004. Thus, refugia clearly were not simply sanctuaries where species were preserved from extinction (Nieto Feliner, 2011), but adaptation to different conditions in these glacial refugia may have resulted in genetic divergence and eventually speciation (Davis & Shaw, 2001;De Lafontaine et al., 2018;Hewitt, 1996Hewitt, , 2000Stewart et al., 2010;Stewart & Stringer, 2012). Moreover, abiotic and biotic conditions in areas serving as refugia clearly were subject to changes through Quaternary times. Considering alpine species, the glacial refugial areas in and around the European Alps identified by Schönswetter et al. (2005) were cold-stage refugia (Birks & Willis, 2008). As shown by fossil evidence, many alpine species were much more widespread outside these refugia in glacial times (Birks & Willis, 2008;Tzedakis et al., 2013). In Quaternary interglacials and the Holocene, both changing climatic conditions and increasing competition resulted in the extant ranges of most alpine species, which can be considered interglacial (Bennett & Provan, 2008) or warm-stage refugia (Bhagwat & Willis, 2008;Birks & Willis, 2008). However, populations of alpine species also persisted in mostly small areas within the former cold-stage refugial area in habitats unsuitable for more competitive components of warm-stage vegetation (Birks & Willis, 2008;Gentili et al., 2015;Pigott & Walters, 1954). Such persisting populations or species experienced dramatic changes in environmental conditions, from cold-stage to warm stage-conditions, through time, and these changes have driven evolutionary divergence in some cases (e.g., Scheepens et al., 2013).  Tkach et al. (2015), and currently comprises 12 species (Ebersbach et al., 2017;Gerschwitz-Eidt & Kadereit, 2020;Tkach et al., 2019).  (Figure 1). The subsection is well known ecologically (Kaplan, 1995;Landolt et al., 2010;Webb & Gornall, 1989) and is most remarkable in terms of ecological diversification. While, e.g., the calcifuge S. muscoides grows largely above the tree-line at elevations of up to 4200 m, the calcicole S. berica is limited to a small area in the Colli Berici near Vicenza (northern Italy) outside the Alps where it grows in shady hollows under overhanging rocks at elevations lower than 450 m. Other species of the subsection mostly growing in very shady and humid conditions under overhanging rocks, in recesses and hollows or at the entrance of caves, mostly at lower than alpine and frequently at collin or montane elevations, are S. arachnoidea and S. paradoxa, which have often been interpreted as Tertiary relics in the past (Gams, 1933;Meusel, 1943;Pitschmann & Reisigl, 1959;von Hayek, 1908).
Published phylogenetic analyses of subsection Arachnoideae either did not include all taxa and did not succeed in fully resolving phylogenetic relationships (Ebersbach et al., 2017;Tkach et al., 2015Tkach et al., , 2019, or, when sampling all species (Gerschwitz-Eidt & Kadereit, 2020), did not succeed in resolving phylogenetic relationships and identified supported conflict between nuclear and plastid phylogenetic trees. Against this background, we here aim at reconstructing and dating the phylogeny of subsection Arachnoideae with a Hybseq approach using a bait set of 329 protein-coding nuclear loci designed for phylogenetic reconstructions in Saxifragales Stubbs et al., 2018). This phylogenetic tree will be used to explore the ancestral ecology of the subsection. It seems possible that either species from shady conditions and mostly low elevations (S. arachnoidea, S. berica, S. paradoxa) or species from bright habitats at high elevations represent the ancestral habitat. We will also ex-  (Table 1). DNA was extracted from dried specimens using a Macherey-Nagel NucleoSpin Plant II kit (Machery-Nagel GmbH & Co. KG, Düren, Germany) and purified by ethanol precipitation following Sambrook and Russell (2001). F I G U R E 1 Distribution of the species of Saxifraga subsection Arachnoideae based on Kaplan (1995). The brown asterisk indicates a disjunct population of S. prenja in the Apennines as found by Gerschwitz-Eidt and Kadereit (2020). For clarity, distribution ranges are shown in two maps (a, b

| Sequence assembly and alignment
We merged forward and reverse reads for each sample and removed PCR duplicates using ParDRe v2.2.5 (González-Domínguez & Schmidt, 2016). Adapter sequences as well as read segments of poor quality were removed with Trimmomatic v0.36 (Bolger et al., 2014) using a sliding window of 4 bp with a sliding window minimum phred quality score of 20. DNA sequence assembly was performed using the BWA version of HybPiper v1.3.1  with default settings. Exons were filtered for paralogs (Method 1 in Appendix A) and aligned across all samples locus-wise using MAFFT v7.305 . Seven species of Saxifraga subsection Saxifraga aligned poorly in most loci with high numbers of base mismatches and indels. We removed these samples from the data set and repeated the alignment of the exons with the 58 remaining samples. Subsequently, the non-exonic sequences were added to the exonic alignments using the '-addlong' option in MAFFT. The resulting alignments frequently contained indel-rich stretches of poorly aligned non-exonic sequences which were several thousand base pairs long. We used BuddySuite v1.3.0 (Bond et al., 2017) to reduce the number of these regions by trimming alignment positions that consisted of more than 50% gaps. Any alignments that were missing more than 20% of the samples or the S. hirsuta sample were also excluded from further analysis. We removed residual paralogs with TreeShrink v1.3.3 ; Method 2 in Appendix A).
The resulting data set consisted of 405 loci in 58 samples.

| Phylogenetic analysis
We calculated bootstrapped maximum likelihood (ML) gene trees for all 405 loci of the full data set in RAxML v8.2.12  and used ASTRAL v5.7.3 (Zhang et al., 2018) to infer two bootstrapped summary coalescent (SC) species trees from the 405 ML gene trees, using a taxon map for the second run (Rabiee et al., 2019).
Taking into account the results of the first ASTRAL run, the S. muscoides samples were coded into two groups (S. muscoides 1 and 2) for the taxon map of the second ASTRAL run.

SRR7901234
Note: SRA accession numbers in italics were newly generated in this study.

TA B L E 1 (Continued)
Additionally, we inferred phylogenetic networks to include both incomplete lineage sorting (ILS) and reticulation in the modeling process under the multispecies network coalescent (Wen et al., 2016).
We calculated SC species networks from the 405 gene trees under the maximum pseudo-likelihood (MPL) method "InferNetwork_MPL" (Yu & Nakhleh, 2015) in PhyloNet v3.8.0 (Than et al., 2008;Wen et al., 2018), using a taxon map and a gene tree bootstrap threshold of 70. We computed 11 MPL networks with a number of hybrid nodes (K) ranging from zero to ten. Each network was inferred with 10 independent runs to optimize the pseudo-likelihood. For each K the best network weighted by pseudo-likelihood was identified.
We inferred the major trees, i.e., the species trees underlying the networks, from the best networks for K = 1-10 to explore the commonalities of the network topologies. For this purpose, all minor edges were removed from the best networks using the "majorTree" and removed any alignments that were missing more than 20% of the samples or the outgroup sample, resulting in a total of 388 loci. We here used S. pedemontana as outgroup because this species, which is part of section Saxifraga subsection Saxifraga, is more closely related to subsection Arachnoideae than S. hirsuta which was used as outgroup in those analyses which sampled across

| Divergence times estimation
Divergence times for Saxifraga subsection Arachnoideae were modeled on a multi-labeled phylogenetic tree (Blanco-Pastor et al., 2012;Huber et al., 2006;Pirie et al., 2009). The multi-labeled species tree used was based on the ASTRAL species tree as calculated from the full data set. Based on the network reconstructions, a second parent lineage of the hybrid species S. facchinii was added to the ASTRAL species tree and the two lineages of S. facchinii were labeled 'S. facchinii 1' and 'S. facchinii 2'. Of the full data set of 58 samples, we selected 10 loci each for which the ML gene trees could be unambiguously assigned to either of the two topologies contained in the multi-labeled tree (Method 4 in Appendix A). The 20 associated DNA alignments were visually inspected in MEGA X v10.0.5 (Kumar et al., 2018) and manually re-aligned to ensure overall good alignment quality. In the ten alignments used to reconstruct the lineage 'S. facchinii 1', sequences of the two samples of S. facchinii were labeled with the suffix '1'. Correspondingly, in the ten alignments used to reconstruct the lineage 'S. facchinii 2', S. facchinii sequences were labeled with the suffix '2'. Time-calibrated phylogenetic trees were calculated with STARBEAST2 v0.15.5 (Ogilvie et al., 2017) as implemented in BEAST v2.6.3 (Bouckaert et al., 2019), using an uncorrelated lognormal relaxed molecular clock (Drummond et al., 2006) with a birth-death speciation process (Gernhard, 2008;Stadler, 2009)  run them for a total of 450 million generations. Every 5000th generation was sampled. We used Tracer v.1.7.1 (Rambaut et al., 2018) to assess convergence and an effective sample size of at least 200 to identify an appropriate burn-in for each run. The four runs were combined using LogCombiner v.2.6.3 (Drummond & Rambaut, 2007). We used TreeAnnotator v.2.6.3 (Drummond & Rambaut, 2007) to calculate a maximum clade credibility tree from the species tree distribution. Node heights of the maximum clade credibility tree were set to the medians of the respective node height distributions. The chronogram was plotted using the R package phyloch (Heibl, 2013) in R v.4.1.2 (R Core Team, 2021).

| Ecological trait reconstruction
We reconstructed the evolution of three ecological traits of Saxifraga subsection Arachnoideae using the ecological indicator values (EIV) of Landolt et al. (2010). According to Silvertown et al. (2006), EIVs can be considered as numerical representations of ecological niche traits and therefore can be used to reconstruct ecological niche evolution in a phylogenetic framework (Prinzing et al., 2001;Silvertown et al., 2006). Because EIVs represent quasi-cardinal numbers (Ellenberg, 1992), they can be directly translated into continuous Since Landolt EIVs were not available for S. prenja, we collected and comparatively reviewed available information on the ecology of S. prenja from various sources (Hörandl, 1993;Kaplan, 1995;Webb & Gornall, 1989). We concluded that the ecological traits of S. prenja are comparable to those of S. hohenwartii and used the EIVs of the latter for the former.
All taxa for which no Landolt EIVs exist were pruned from the time-calibrated multi-labeled species in the R package ape v5.0 (Paradis & Schliep, 2019). The hybrid species S. facchinii was removed because its inclusion violates the model's essential assumption of a bifurcating tree. As two independent measures of the strength of phylogenetic signal, we calculated Blomberg's K (Blomberg et al., 2003) and Pagel's λ (Pagel, 1999) using the R package phytools v.0.7-90 (Revell, 2012). We used the R package geiger v2.0.7 (Pennell et al., 2014) to fit four models of ecological trait evolution to all three EIVs, i.e. (1)  We also used the Maximum Parsimony approach for the reconstruction of continuous characters implemented in Mesquite v3.70 (Maddison & Maddison, 2021). Landolt EIVs were reconstructed as continuous characters with the squared change assumption. For the analysis we used the same phylogenetic tree as above.

| DNA assembly and alignment statistics
In the initial assembly, reads mapped to 326 of the original 329 reference loci. After three iterations of assembly, visual paralog assessment and reference file editing, reads were mapped successfully to 567 loci in the final assembly, of which 463 were retained for further analysis. After filtering for a minimum sample coverage of 80% and for loci that included a sequence of the outgroup sample S. hirsuta,

| Phylogenetic relationships and hybridization
For the full data set with 58 samples, the two ASTRAL species trees

| Divergence times
For the first topology in the multi-labeled phylogenetic species tree, with the parent lineage of S. fachinii in clade 4, we selected 10 alignments with a length of 40,970 bp. For the second topology, with the parent lineage of S. fachinii in clade 3, we selected 10 alignments with a length of 33,130 bp. The maximum clade credibility tree was constructed from a total of 133,204 dated species trees (Figure 4). The stem age of subsection Arachnoideae was dated to 5.12 (95% confidence interval 3.28-9.32) million years ago (myr). Their crown age was estimated at 3.54 (2.21-6.43) myr.

| Ancestral ecology
Based on Blomberg's K and Pagel's λ, a strong phylogenetic signal was detected in the temperature indicator (K = 0.947, p = .004; λ = 1.009, p = 8.01 × 10 −5 ). Consistent with this finding, a simple Brownian motion model was fitted as the best model (wi = 0.642; Table 2). The stem group node of subsection Arachnoideae was reconstructed as 1.96, corresponding to subalpine elevations, and its descendants shifted into lower (clade 1) or higher elevations (

F I G U R E 3 Summary coalescent species networks of Saxifraga subsection
Arachnoideae modeled with (a) one and (b) two reticulation events. Branch lengths are non-informative. Reticulation edges are indicated by blue lines for the major parent and green lines for the minor parent of a reticulation event. Numbers in blue or green below reticulation edges are inheritance probabilities. Numbers in black above branches are branch support values of regular edges. Numbers below a branch in magenta indicate a hybrid branch and are hybrid edge support values. Numbers in brown circles next to a network node are support values for that specific combination of two parental reticulation edges and a progeny hybrid edge. The stem branch of subsection Arachnoideae is marked with an arrow. mostly grow at elevations lower than 500 m but occasionally can be found at higher elevations (Kaplan, 1995;Webb & Gornall, 1989).
Considering the extant distribution of the three species ( Figure 1) and their relationships to each other (Figure 2) Central Alps' refugium for calcifuge taxa (Schönswetter et al., 2005;Tribsch & Schönswetter, 2003), and that of S. berica in the 'Monte Baldo, Monti Lessini, and Prealpi Bellunesi' refugium for calcicole taxa (Schönswetter et al., 2005;Tribsch & Schönswetter, 2003). Of  S. prenja in the Apennines. They all occur at subalpine to alpine elevations and grow on limestone and dolomitic scree. As shown in detail by Hörandl (1993), the niche of S. hohenwartii is much narrower than that particularly of S. sedoides. Saxifraga hohenwartii as sister to S. sedoides + S. prenja is likely to have split allopatrically from the latter two species. The species has been interpreted as a relict endemic by Hörandl (1993), and its current range lies in the 'southeasternmost calcareous Alps' refugium for calcicole taxa (Tribsch & Schönswetter, 2003). Considering the extant ranges of S. sedoides and S. prenja, their last common ancestor most likely attained a wide range including areas outside the Alps, and the two species most likely originated by vicariance. Equally, S. sedoides, by occurring in the southern and northeastern Alps, a disjunction long known from other taxa (Merxmüller, 1952), and also in the Apennines, must have had a wider range in the past.
The same applies to S. prenja which also occurs in the Apennines. Clade 2 (except for S. prenja) essentially is distributed allopatrically with clade 1 in the south and the remainder of the subsection in the north (Figure 1). The three species of clade 3, S. muscoides, S. presolanensis and S. tenella, are also distributed allopatrically.
In summary, the distribution of the four clades recognized and the distribution of species within these four clades is largely paraand allopatric.

| Saxifraga facchinii, a hybrid species
As evident from our analyses, S. facchinii originated through hybridization between the last common ancestor of S. presolanensis and S. muscoides on the one hand and the last common ancestor of S. aphylla and S. arachnoidea on the other hand (Figure 2). Although a hybrid origin of S. facchinii had never been suspected, its relationship to S. muscoides is reflected in its treatment as a variety of that species by Engler (1872), and a close relationship between S. facchinii and S. aphylla had been found by Vargas (2000). Gerschwitz-Eidt and Kadereit (2020) found the species to be closest relative of S. muscoides/S. presolanensis in their ITS phylogenetic tree, and to be closest relative (unsupported) of one sample of S. aphylla in their plastid phylogenetic tree. Saxifraga facchinii grows on calcareous rocks and scree at alpine elevations between 2250 and 3000 m and in this respect is ecologically comparable to S. aphylla and S. muscoides of its two parental lineages. Accordingly, this hybridization event did not result in major changes in ecological preferences of the resulting hybrid species.
The distribution of S. facchinii on the one hand and its two parental lineages on the other hand is allopatric, as very commonly observed in both homoploid and allopolyploid hybrid species (Kadereit, 2015). Following Kadereit (2015),

| The ancestral ecology of subsection Arachnoideae
Subsection Arachnoideae is part of section Saxifraga (Ebersbach et al., 2017;Tkach et al., 2015) together with subsection Saxifraga, subsection Androsaceae and S. irrigua. Subsection Saxifraga is found mainly in the Iberian peninsula, subsection Androsaceae mainly in the Alps, and S. irrigua is from Crimea, where the species grows in rocky woods and on shady cliffs, always on limestone, between 500 and 1250 m elevation (Webb & Gornall, 1989). Relationships among these three wellsupported clades and S. irrigua were not resolved in the phylogenetic tree of Tkach et al. (2015), although Tkach et al. (2019) obtained support for a sister relationship between S. irrigua and subsection Androsaceae.
As reconstruction of the ancestral ecology of subsection Arachnoideae requires an outgroup, our reconstruction used part of our species tree based on Hybseq sequences (Figure 2) which altogether included nine species of subsection Saxifraga and resulted in a non-monophyletic subsection Saxifraga. In our reconstruction, the stem group node of subsection Arachnoideae was reconstructed as subalpine, neutral to slightly alkaline soils and (just below) bright ( Figure 5). Accordingly, the calcifuge ecology of S. paradoxa (and S. muscoides) and the reduced light requirement of S. berica, S. paradoxa and S. arachnoidea evolved twice independently each within the subsection. With respect to elevational distribution, our results imply that high elevation (subalpine) rather than low elevation distribution is ancestral in the subsection.

| Origin of divergent ecologies through adaptive evolution in refugia -a hypothesis
The definition of a refugium as an area where a particular species survived for an entire glacial-interglacial cycle (Hewitt, 2004) appears to suggest that such areas were sanctuaries where species were preserved from extinction (Nieto Feliner, 2011). However, it is widely appreciated that evolutionary change resulting in speciation took place in the Quaternary through geographical isolation in refugia and through hybrid speciation upon secondary contact (Kadereit & Abbott, 2022). Evolutionary change most likely also took place in response to exposure to novel selection pressures in biotic and abiotic conditions, different from the conditions in areas where the species lived and went extinct in glacial periods of the Quaternary (Davis & Shaw, 2001;De Lafontaine et al., 2018;Nieto Feliner, 2011;Stewart et al., 2010;Stewart & Stringer, 2012). Abiotic and biotic conditions in refugia changed through Quaternary times. Considering alpine species, the glacial refugia in the area of the European Alps identified by Schönswetter et al. (2005) were cold-stage refugia (Birks & Willis, 2008). In interglacials or the Holocene, both changing climatic conditions and competition in the former cold-stage refugia in most cases will have resulted in the re-migration of alpine species into high elevation habitats, i.e., into their warm-stage refugia (Birks & Willis, 2008).
Considering the extant distribution of S. berica, S. paradoxa and S. arachnoidea in well-known refugial areas, and assuming that these refugia, identified for the Last Glacial Maximum (Schönswetter et al., 2005;Tribsch & Schönswetter, 2003), can be taken as proxies for the location of cold-stage refugia in earlier parts of the Quaternary, we here hypothesize that the extant ecology of these species originated through adaptive evolution in refugial areas where they persisted irrespective of climatic change. Saxifraga paradoxa is one of two calcifuge species of the subsection by growing on gneiss or mica-schists. We consider it highly likely that the shift from calcicole to calcifuge followed the climatically enforced migration of a calcicole ancestor into a cold-stage refugium for calcifuge plants. Shifts from calcicole to calcifuge and vice versa linked to and likely resulting from Quaternary migration and dispersal into edaphically different areas had before been postulated for Adenostyles Cass. (Dillenberger & Kadereit, 2013) and Jovibarba (DC.) Opiz and Sempervivum L. (Klein & Kadereit, 2015).
The low light requirements of S. berica, S. paradoxa and S. arachnoidea are likely to have evolved in response to changing environmental conditions in their respective distribution ranges. These three species all have very small distribution ranges which lie within well known cold-stage refugia (Schönswetter et al., 2005). Accordingly, Arachnoideae, were assessed as stress-tolerators (vs. competitors; Grime, 1974) by Landolt et al. (2010). Interestingly, the habitats of S. berica, S. paradoxa and S. arachnoidea somewhat resemble some of those habitats occupied by extant alpine species when growing at low elevations. These, in the British Isles, include screes, northfacing slopes, steep cliffs and cool ravines (Birks & Willis, 2008;Pigott & Walters, 1954), and gorges in the Mediterranean area (Gentili et al., 2015). were colonized only as the result of increasing competition or had been part of the niche of these species before. In the first case, novel selection pressures resulting in evolutionary divergence thus not only arose from abiotic environmental conditions but probably mainly from interspecific competition, the 'new neighbors' of Hewitt (1996Hewitt ( , 2000Hewitt ( , 2004. When considered together with their closest relatives, S. berica (closest relatives: S. petraea, S. paradoxa), S. paradoxa (S. petraea) and S. arachnoidea (S. aphylla), persisting in cold-stage refugial areas upon climatic warming, can be considered stable rear edge populations or species (Hampe & Petit, 2005). For such populations, Ackerly (2003) proposed, in his 'trailing edge hypothesis of adaptive evolution', that the most important condition for adaptive evolution to occur at the rear edge is the exclusion of competitors. In case of S. berica, S. paradoxa and S. arachnoidea this appears to have been realized by the evasion of competitors through adaptation to shady habitats unsuitable for them.
If indeed adaptation to novel selection pressures in S. berica, S. paradoxa and S. arachnoidea took place in warm stages of the late Pliocene and Pleistocene, as also postulated for adaptive differentiation within Taxus baccata L. (Mayol et al., 2015), geological time available for evolutionary change was even shorter than in cold stages because warm stages were substantially shorter than cold stages in the Pleistocene (Birks, 2019). Although it has been argued that periods of isolation required for speciation were never long enough in the Quaternary (Willis & Niklas, 2004), speciation in Quaternary glacials has been implied by Kadereit et al. (2004

CO N FLI C T O F I NTE R E S T
The authors declare we have no conflict of interest. Previous sequence references for these loci were deleted so that, F I G U R E A 1 Cladograms of the major trees of 11 maximum pseudo-likelihood phylogenetic networks calculated in PhyloNet. The stem branch of subsection Arachnoideae is marked with an arrow.

DATA AVA I L A B I L I T Y S TAT E M E N T
in a second DNA assembly, sequences from these loci could be assembled exclusively against the updated reference sequences. Using this updated reference sequence file, DNA assembly with HybPiper and post-processing steps was repeated as described above. Paralog assessment was repeated as described above and additional reference sequences from putative orthologs were added to an updated reference file. Finally, after a third DNA assembly, all loci still containing paralogs in samples of Saxifraga subsect. Arachnoideae were discarded. For the remaining loci, all further steps of HybPiper were performed to also obtain the non-exonic sequence data.

Method 2: non-exon paralog filtering
We used TreeShrink v1.3.3  to identify individual samples or clades that contributed disproportionately to the length of a phylogenetic tree by containing, e.g., few paralogs in an otherwise mostly orthologuous dataset, and to automatically remove the corresponding DNA sequences from the alignments. First, we calculated ML gene trees with RAxML v8.2.12  using an unpartitioned GTR + Γ nucleotide substitution model with S. hirsuta as outgroup. Second, all gene trees and alignments were analyzed together in a single TreeShrink run to remove outlier sequences. Alignments were finally re-examined with BuddySuite v1.3.0 to trim all alignment positions that consisted of more than 50% gaps, and to remove all alignments missing more than 20% of the samples or the S. hirsuta sample.

Method 4: locus selection for divergence times estimation
We used the full data set of 58 samples to identify several loci for which gene trees could be unambiguously assigned to one of the two topologies contained in the multi-labelled tree as inferred from the 42 samples network reconstructions. We aimed to select a total of 20 gene trees, with ten trees for each of the two topologies. To achieve this, we first calculated Robinson-Foulds (RF) distances of the gene trees to the two species tree topologies in PhyloNetworks v0.11.0 for each of the 405 gene trees of the 58 samples data set. For the calculation of RF distances to the first topology, we used the original 58-samples species tree calculated with ASTRAL ( Figure A1). For the calculation of RF distances to the second topology, the ASTRAL species tree was permuted by moving the S. facchinii clade from clade 4 to clade 3. We identified all gene trees that had different RF distances to the two species tree topologies. These gene trees were visually inspected to determine whether their topologies could each be unambiguously assigned to one of the two topologies contained in the multiple-labelled phylogenetic species tree. Nodes with bootstrap support less than 70% were interpreted as polytomies in this process. Using these criteria, 10 loci with a taxon sample as complete as possible were selected for the divergence times estimation for each of the two species tree topologies.