Comprehensive phylogeny of Myrmecocystus honey ants highlights cryptic diversity and infers evolution during aridification of the American Southwest

The New World ant genus Myrmecocystus Wesmael, 1838 (Formicidae: Formicinae: Lasiini) is endemic to arid and semi-arid habitats of the western United States and Mexico. Several intriguing life history traits have been described for the genus, the best-known of which are replete workers, that store liquified food in their largely expanded crops and are colloquially referred to as “honeypots”. Despite their interesting biology and ecological importance for arid ecosystems, the evolutionary history of Myrmecocystus ants is largely unknown and the current taxonomy presents an unsatisfactory systematic framework. We use ultraconserved elements to infer the evolutionary history of Myrmecocystus ants and provide a comprehensive, dated phylogenetic framework that clarifies the molecular systematics within the genus with high statistical support, reveals cryptic diversity, and reconstructs ancestral foraging activity. Using maximum likelihood, Bayesian and species tree approaches on a data set of 134 ingroup specimens (including samples from natural history collections and type material), we recover largely identical topologies that leave the position of only few clades uncertain and cover the intraand interspecific variation of 28 of the 29 described and six undescribed species. In addition to traditional support values, such as bootstrap and posterior probability, we quantify genealogical concordance to estimate the effects of conflicting evolutionary histories on phylogenetic inference. Our analyses reveal that the current taxonomic classification of the genus is inconsistent with the molecular phylogenetic inference, and we identify cryptic diversity in seven species. Divergence dating suggests that the split between Myrmecocystus and its sister taxon Lasius occurred in the early Miocene. Crown group Myrmecocystus started diversifying about 14.08 Ma ago when the gradual aridification of the southwestern United States and northern Mexico led to formation of the American deserts and to adaptive radiations of many desert taxa.


Introduction
The New World ant genus Myrmecocystus Wesmael, 1838 (Formicidae: Formicinae: Lasiini) is endemic to arid and semi-arid habitats of the western United States and Mexico and represents the most prominent formicine ant genus in all four American deserts (the Great Basin, Mojave, Sonoran and Chihuahuan deserts) (Snelling, 1982(Snelling, , 1976. Myrmecocystus ants are well-known for their "replete" workers, which are colloquially referred to as "honeypots" and store sugar, lipid and protein solutions in their expanded crops for regurgitation in times of low resource availability (Conway, 1990(Conway, , 1977. Repletism also occurs in at least five other ant genera (Conway, 1991;Froggatt, 1896;Roth, 1908;Schmid-Hempel and Schmid-Hempel, 1984;Schultheiss et al., 2010) and is frequently associated with arid environments, where water and food are scarce. In Myrmecocystus, repletes have been reported in 20 of the 29 described species and are believed to be an autapomorphy of the genus, which likely promoted its diversification in arid habitats (Snelling, 1982(Snelling, , 1976. A few Myrmecocystus species have been the focus of detailed study because of other intriguing life history traits. For example, unrelated queens can form foundress associations to initiate colonies, a phenomenon called pleometrosis and known in at least four species (Bartz and Hölldobler, 1982;Eriksson et al., 2019;Leonard, 1911;Wheeler, 1917). These associations occasionally persist even when colonies are mature (facultative primary polygyny; Eriksson et al., 2019;Hölldobler et al., 2011), unlike in most other pleometrotic species where queen culling follows colony foundation (Hölldobler and Wilson, 1977). Moreover, colonies of M. mendax and M. mimicus engage in intraspecific "ritualized tournaments" to defend territorial borders without physical fighting, which frequently results in brood and replete raids on the inferior colony (Eriksson et al., 2019;Hölldobler, 1976;Lumsden and Hölldobler, 1983). This facultative dulotic behavior also occurs interspecifically and without preceding tournaments (Hölldobler et al., 2011;Kronauer et al., 2003). Both primary polygyny and intraspecific brood stealing are thought to be rare in ants (Hölldobler and Wilson, 1990;Keller, 1993). To what extent these behaviors are present in other Myrmecocystus species remains to be discovered.
Despite their interesting biology and prominence in North American arid ecosystems, the natural and evolutionary histories of Myrmecocystus ants remain largely unknown. The genus originated approximately 18.5 Ma ago and is placed within or as sister taxon to the genus Lasius (Blaimer et al., 2015). The taxonomy within the genus Myrmecocystus has been subject to much debate and repeated revisions (e.g., Cole, 1936;Creighton, 1950Creighton, , 1956Emery, 1893;Forel, 1901;Gregg, 1963;McCook, 1882;Smith, 1951;Wheeler, 1908Wheeler, , 1913. The most recent systematic revision and taxonomic key were presented in a monograph by Snelling (1982Snelling ( , 1976, subdividing Myrmecocystus based on morphology into three subgenera (Myrmecocystus s. str., Endiodioctes, Eremnocystus), eight species groups and 29 species. Although Snelling's classification remains the most comprehensive and detailed to date, it still presents an unsatisfactory systematic framework. Workers of several species differ only marginally in diagnostic traits, and sexuals and minor workers may not be identifiable at all (Snelling, 1982(Snelling, , 1976. Intraspecific variation in morphology and behavior indicates that cryptic diversity likely occurs in several species (e.g., Eriksson, 2018;Eriksson et al., 2019;Hölldobler et al., 2011;Snelling, 1976). In M. mendax, for instance, inter-population differences in hair length, colony founding behavior, and social structure have been observed. In addition, a significant portion of the present diversity of the genus may be undescribed and thus not accounted for in the current taxonomy (Johnson and Ward, 2002;R. Johnson, pers. obs.). These issues have led to misidentifications, impeding the study of the ecology and evolution of many species in the genus. Thus, a thorough taxonomic revision based on a comprehensive phylogenetic framework is required to resolve species boundaries and reveal cryptic diversity.
Two previous studies aimed at resolving the molecular systematics of the genus Myrmecocystus (Kronauer et al., 2004;O'Meara, 2008). However, they left large parts of its phylogeny unresolved because they covereded only 15 (Kronauer et al., 2004) and 21 (O'Meara, 2008) of the 29 described species and the low number of markers used (three mitochondrial and nine mitochondrial/nuclear markers, respectively) provides only limited confidence. In addition, by including a single specimen for most species, intraspecific variability was unaccounted for and cryptic diversity remained undetected.
In recent years, ultraconserved elements (UCEs) have emerged as powerful molecular markers to study evolutionary relationships due to rapid advances in target enrichment and next-generation sequencing techniques (reviewed in Zhang et al., 2019). UCEs are genomic loci shared among distantly related taxa that are composed of a highly conserved core and two more variable flanking regions (Bejerano et al., 2004;Faircloth et al., 2012). Bait sets to capture and enrich these loci have been published for many taxa to date (e.g., Alfaro et al., 2018;Faircloth, 2017;Faircloth et al., 2012;Quattrini et al., 2018), leading to an ever-increasing number of studies employing UCEs as phylogenomic markers. The rising popularity of UCEs stems from several advantages they confer over traditional multi-locus approaches. First, thousands of loci can be sequenced in relatively short time and at little cost (Blaimer et al., 2015), even when DNA quality is low, which is often the case for museum specimens (Blaimer et al., 2016). The amount of sequence data generated in this way yields a higher resolution than traditional approaches (Blaimer et al., 2015;Gilbert et al., 2015), enables accurate divergence dating, and allows statistical testing of the processes that can lead to incorrect phylogenetic inference such as introgression or incomplete lineage sorting (e.g., Blair and Ané, 2019). Second, UCEs are phylogenetically informative across timescales and taxonomic ranks, ranging from ancient to populationlevel divergences, due to their variability in sequence conservation. For instance, they have successfully been applied to place the Formicidae as sister taxon to Apoidea  and to infer subfamily-, genus-, species-and population-level relationships within ants (Borowiec, 2019a;Branstetter et al., 2016Branstetter and Longino, 2019;Ješovnik et al., 2017;Pierce et al., 2017;Prebus, 2017;Ward and Branstetter, 2017). Finally, the increasing use of UCEs as phylogenomic markers results in a growing pool of published sequences, that can easily be combined with newly generated UCE, exome and transcriptome data to answer additional research questions (e.g., Bossert et al., 2019;Kieran et al., 2019). It also leads to a constant improvement of analysis workflows and software tools that will make phylogenetic inference from UCEs more robust (e.g., Allio et al., 2019;Andermann et al., 2018;Minh et al., 2020a;Tagliacollo and Lanfear, 2018).
The present study provides a comprehensive UCE-based phylogenetic framework for the genus Myrmecocystus for future behavioral, ecological, and evolutionary studies as well as for a thorough specieslevel taxonomic revision that is congruent with evolutionary history. Based on a data set of 2,324 UCE loci, we present a well-resolved phylogeny with 134 ingroup specimens (including samples from natural history collections, and type material for nine species) that covers six undescribed and 28 of the 29 described Myrmecocystus species while also capturing intraspecific variation. We find evidence for cryptic diversity in multiple taxa. Moreover, we estimate divergence times using four outgroup fossil calibrations, reconstruct the evolution of foraging activity, and present a scenario for the diversification of the genus.

Taxon sampling
Our sample set comprised 231 Myrmecocystus specimens, that were collected in the western United States and Mexico or borrowed from natural history collections of universities and museums. Samples were either stored in ethanol or point-mounted and were up to 58 years old. All specimens were reidentified using keys in Snelling (1976Snelling ( , 1982 and by comparison to type specimens. Initial species identifications are given in Table S1 and differed from ours for several specimens. UCE sequences were successfully generated for 211 Myrmecocystus specimens, covering 28 of the 29 described species (with type material for nine species) plus six undescribed species (including three of the four new species reported in Johnson and Ward, 2002; see Table S1 for further information). Fourteen species were added as outgroups. UCE sequence data were generated for six outgroup species. Sequences of the remaining eight species were extracted from published genome assemblies (Bonasio et al., 2010;Konorov et al., 2017) or obtained from Borowiec et al. (unpublished),  and Messer et al. (unpublished). Table S1 lists collection and voucher information for all specimens used in this study.

DNA extraction, library preparation and target enrichment
DNA was extracted non-destructively from specimens using the T. van Elst et al. DNeasy Blood & Tissue Kit or the QIAamp DNA Micro Kit (QIAGEN). Specimens were ventrally punctured three times and incubated overnight according to the respective kit's instructions. DNA concentrations were quantified using a Qubit® 2.0 Fluorometer (High Sensitivity Kit; Thermo Fisher Scientific, Inc.). Subsequently, 8.9-100 ng of sample DNA were sheared on a Q800R3 Sonicator (Qsonica, LLC) to an average size of 600 bp.
The sheared DNA was input to a modified genomic DNA library preparation protocol (KAPA Hyper Prep Kit; Kapa Biosystems, Inc.) as described in Faircloth et al. (2014). It includes "with-bead" cleanup steps (Fisher et al., 2011) using a generic SPRI substitute (Rohland and Reich, 2012) called speedbeads. Prior to adapter ligation, clean-up was performed with speedbeads providing a clean-up ratio of 2.0X SPRI and 1.1X SPRI for fragments shorter than 75 bp and 200 bp, respectively. For ligation, custom TruSeq-style, dual-indexing adapters (iTru: i5 and i7)  were used. One-half of the resulting library volume (15 μl) was then PCR-amplified with the following reaction mix: 25 μl KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Inc.), 5 μl nuclease-free ddH 2 0 and 2.5 μl of each i5 and i7 primer. The PCR was run with the following thermal protocol: 98 • C for 45 s, 13 cycles of [98 • C for 15 s, 60 • C for 30 s, 72 • C for 60 s], and 72 • C for 5 min. Following PCR, another clean-up was performed with speedbeads at 1.2X SPRI. Final DNA concentrations were estimated using a Qubit® 2.0 Fluorometer (Broad Range Kit). For samples with a concentration below 10 ng/nl, the PCR was repeated with 14 cycles. From 8 to 10 libraries were subsequently pooled together at equimolar ratios and reduced in vacuo to produce 3.4 μl of pooled library with a desired DNA concentration of 147 ng/μl (=500 ng). Final DNA concentrations of pools ranged from 28.7 to 185 ng/μl. Pools were enriched using 9,446 probes (myBaits®; Arbor Biosciences) targeting 2,524 UCE loci  and following the hybrid selection protocol by Blumenstiel et al. (2010). The exception to this protocol was using a 0.1X dilution of the original myBaits® concentration and adding 0.7 μl of 500 μM custom blocking oligos (designed against the custom sequence tags). After 24 h of enrichment incubation at 65 • C, all pools were bound to streptavidin beads (Dynabeads™ MyOne™ Streptavidin C1; Thermo Fisher Scientific, Inc.) and purified according to Blumenstiel et al. (2010). For PCR recovery, 15 μl of the streptavidin bead-bound enriched libraries were combined with a reaction mix of 25 μl KAPA HiFi HotStart ReadyMix, 2.5 μl of each Illumina TruSeq primer (forward and reverse) and 5 μl nuclease-free ddH 2 0 (see Faircloth et al., 2014 for details on with-bead approach to PCR recovery). The PCR was run with the following thermal protocol: 98 • C for 45 s, 20 cycles of [98 • C for 15 s, 60 • C for 30 s, 72 • C for 60 s], and 72 • C for 5 min. The resulting reactions were purified with speedbeads at 1.0X SPRI clean-up.
DNA concentrations of enriched libraries were estimated on a Qubit® 2.0 Fluorometer (Broad Range Kit). To measure DNA concentrations with higher confidence, qPCR was performed on a 7500 Real Time PCR System (Applied Biosystems™, Thermo Fisher Scientific, Inc) using the KAPA Library Quantification Kit (Kapa Biosystems, Inc.) with the KAPA SYBR® FAST qPCR Master Mix and universal Illumina primers. For each pool, dilutions of 1:1,000,000 and 1:2,000,000 were analyzed (three technical replicates each). DNA concentrations were then estimated assuming that the average library fragment length was 600 bp. Based on these concentrations, libraries were pooled again at equimolar concentrations to a final volume of 200 μl. This resulted in two final pools that included libraries for 100 and 121 samples, respectively. The pooled libraries were then sequenced using two full lanes of a 125-cycle paired-end Illumina HiSeq 2500 run at the University of Utah High Throughput Genomics Core Facility at Salt Lake City.

Data processing and alignment
Raw sequence data were demultiplexed and converted to FASTQ format by the University of Utah High Throughput Genomics Core Facility. All raw reads were cleaned using default settings in Illumiprocessor (Faircloth, 2013), a wrapper around Trimmomatic (Bolger et al., 2014). Default settings in metaSPAdes v3.13.1 (Nurk et al., 2013) were used for de novo raw read assembly. The resulting contigs were processed using several scripts included in the Phyluce software package v1.6.7 (Faircloth, 2016). First, probe sequences were matched against contigs to identify enriched UCE loci and filter those for paralogs with the script phyluce_assembly_match_contigs_to_probes and default settings. Identified loci in each taxon were stored in a relational database. Two different taxon sets were created (see Table S1). Phylogenetic analyses were performed on 134 Myrmecocystus specimens and seven outgroups (Nylanderia terricola, Paratrechina longicornis and five Lasius species). For divergence time estimation, a reduced taxon set was created to alleviate computational burden and increase confidence in the input tree topology. It retained only one representative per Myrmecocystus taxon and excluded low-quality specimens and taxa with incongruent placement between phylogenetic inference on concatenated UCE loci and species tree analyses (except for the clade containing M. pyramicus and M. christineae, see Section 2.6). This resulted in 29 Myrmecocystus specimens. Twelve formicine and one ectatommine species were added for calibration. The script phyluce_assem-bly_get_match_counts was then run to search the database and output a list with UCE loci enriched for each taxon in these sets (option -incomplete-matrix). This list was used as input for phyluce_assem-bly_get_fastas_from_match_counts to extract sequences for each locustaxon combination and store these in a monolithic FASTA file. Finally, phyluce_assembly_explode_get_fastas_file was used to explode the monolithic file per locus. In addition to raw read assemblies, UCE loci extracted with the scripts phyluce_probe_run_multiple_lastzs_sqlite and phyluce_probe_slice_sequence_from_genomes (extracted flanking region set to 400) from the published genome assemblies of Lasius niger and Camponotus floridanus were input into this pipeline.
Contig and UCE locus count and lengths were calculated with the Phyluce script phyluce_assembly_get_fasta_lengths. The effect of specimen age and preservation method (ethanol vs. point-mounted) on UCE capture success was estimated using the F-test of linear regression analysis in R v3.5.1 (R Core Team, 2018).
UCE loci with a taxon completeness of at least 70% for phylogenetic analyses (>98 of 141 taxa) and 90% for divergence dating (>37 of 42 taxa) were aligned with default settings in MAFFT v7.429 (Katoh and Standley, 2013) and trimmed using the -gappyout method of trimAl v1.4.rev15 (Capella-Gutiérrez et al., 2009). After concatenating trimmed single-locus alignments with AMAS (Borowiec, 2016), spruceup (Borowiec, 2019b) was used to detect outlier sequences and substitute these with missing data indicators (window_size = 20, overlap = 15, criterion = lognorm, cutoffs = 0.98 and 0.99 for phylogenetic analyses and divergence dating, respectively). Phylogenetic inference on the 141taxon alignment (hereafter referred to as the main alignment) produced unrealistically long branches for several taxa (see Fig. 1). Therefore, two alternative alignments were created for maximum likelihood phylogenetic inference on concatenated UCE loci to estimate the effects of long branch attraction on the phylogenetic position of these taxa and on the overall tree topology: alignment produced as above but long-branch taxa were removed from the alignment prior to running spruceup (alternative 1), and alignment produced as above but long-branch taxa received manual cutoff values in spruceup for more conservative sequence trimming (alternative 2; Table S2). Subsequent screening of all concatenated alignments in AliView (Larsson, 2014) did not reveal any obviously misaligned regions. Finally, alignment statistics were calculated with AMAS.

Phylogenetic analyses: Concatenated UCE loci
The main alignment was partitioned by locus and the resulting scheme was optimized with the rclusterf algorithm (Lanfear, 2015; Nodal support represents percent ultrafast bootstrap (1000 replicates), posterior probability from the Bayesian inference by ExaBayes that resulted in the same topology, gene concordance factor, and site concordance factor (in this order). Asterisks indicate type specimens. Lanfear et al., 2014) in PartitionFinder v2.1.1 (Lanfear et al., 2016) while simultaneously selecting for appropriate models of evolution (criterion: AICc). This partitioning scheme was also applied to the two alternative alignments. Tagliacollo and Lanfear (2018) argued that partitioning by locus was not sufficient due to substantial rate variation within UCE loci. They proposed splitting each locus into triplets (a core and two flanking regions) and using these as input partitions for Parti-tionFinder rather than single loci. This was not done for our data set because the approach can overestimate branch lengths during phylogenetic inference (M. Borowiec, pers. obs.) and computation time of PartitionFinder v2.1.1 with triplets as input partitions was not feasible.
Maximum likelihood (ML) analyses were performed on the three partitioned alignments using IQ-TREE v1.6.11 (Nguyen et al., 2015) with ultrafast bootstrapping (1000 replicates). Number of unsuccessful iterations needed to stop the search of the parameter space was set to 200 while all other settings were kept at default. Four independent runs were conducted for each alignment.
Bayesian tree inference was performed on the main alignment with ExaBayes v1.5 (Aberer et al., 2014) on the CIPRES science gateway v3.3 (Miller et al., 2011) using the same partitioning scheme as for the ML analysis. The analysis was carried out with two runs and four coupled chains for 1 million generations with a burn-in of 25%. Consensus trees were created with the consense algorithm of ExaBayes. Convergence of chains and an adequate effective sampling size (ESS > 200) were evaluated with Tracer v1.7.1 . All trees were rooted with Nylanderia terricola and Paratrechina longicornis.

Phylogenetic analyses: gene-tree species-tree discordance
Phylogenetic inference based on a concatenated alignment assumes that all sites share the same evolutionary history. This assumption is likely violated due to processes such as incomplete lineage sorting or introgression (Maddison, 1997). Therefore, ASTRAL v5.6.3 (Zhang et al., 2018) and SVDquartets, as implemented in PAUP* v4.0a (Chifman and Kubatko, 2014;Swofford, 2002), were run with default settings to estimate effects of gene-tree conflict on species-tree inference under the multispecies coalescent. The statistical binning pipeline by Mirarab et al. (2014) was used to bin UCE loci into supergenes prior to running ASTRAL. Gene trees necessary as input were then inferred by ML with IQ-TREE. Model search was performed with ModelFinder (Kalyaanamoorthy et al., 2017) and restricted to GTR, GTR + Γ and GTR + Γ + I (criterion: AICc). Branches with bootstrap support values below 20% were contracted in resulting gene trees using Newick utilities (Junier and Zdobnov, 2010). ASTRAL was also run in this way on an unbinned data set because Adams and Castoe (2019) showed that binning can lead to profound model violations during coalescent-based species tree inference. All trees were rooted with Nylanderia terricola and Paratrechina longicornis.
To further estimate the effects of conflicting evolutionary histories on phylogenetic inference, concordance factors were calculated for the ML tree topology using the algorithm of Minh et al. (2020a), as implemented in IQ-TREE v2.0.5 (Minh et al., 2020b). Concordance factors provide information on the variation present in the data. Consequently, they are valuable supplements to phylogenetic support values such as bootstrap and Bayesian probabilities, which are measures of sampling variance, which is expected to be low for large phylogenomic data sets (Kumar et al., 2012;Minh et al., 2020a).

Divergence time estimations
Divergence times were estimated with MCMCTree of PAML v4.8 (Yang, 2007), using two alternative input topologies and the 42-taxon alignment (Table S6). The input topologies differed in the position of the clade containing M. pyramicus and M. christineae, which was incongruent between phylogenies inferred from concatenated UCE loci (IQ-TREE and ExaBayes) and species tree analyses (ASTRAL and SVDquartets) (see Sections 3.2 and 3.3). Given that there is no fossil record for Myrmecocystus, minimum age constraints were placed on four nodes based on Priabonian Baltic amber fossils of the outgroups Formica, Lasius and Nylanderia (Perkovsky, 2016) and the New Jersey Cretaceous amber fossil †Kyromyrma neffi (Grimaldi and Agosti, 2000). Maximum age constraints for three nodes and the minimum root age were taken from the 95% highest posterior densities of the time-calibrated formicine phylogeny (UCE-100 best) of Blaimer et al. (2015). The maximum age of the split between Lasius and Myrmecocystus was not constrained in this way because it would have resulted in an extremely narrow age range of 33.9-36.2 Ma. Divergence dating was also performed with an alternative calibration scheme that placed minimum and maximum age constraints on the stem of the Prenolepis genus group (represented by the clade containing Nylanderia terricola and Paratrechina longicornis in our tree) in addition to the other calibrations. This was because Boudinot et al. (unpublished) indicated that the Baltic amber species †L. schiefferdeckeri and †Prenolepis henscheii might be placed as sister taxa to the Lasius and the Prenolepis genus groups rather than within these, respectively. Age constraints of both calibration schemes are given in Table S3. All analyses were run on unpartitioned concatenated alignments using the independent-rates clock model and the HKY85 + Γ model of sequence evolution. For each analysis, two independent chains were run for 150 million generations at a sampling frequency of 5,000 and with a burn-in of 10%. Convergence of chains and an adequate ESS (>200) were evaluated with Tracer v1.7.1. Prior and posterior age distributions were plotted and compared using MCMCtreeR (Puttick, 2019) to ensure that signal is coming from the actual sequence data and not from prior belief, as suggested by Brown and Smith (2018).

Ancestral state reconstruction
Detailed and reliable natural history information across Myrmecocystus species is not available. Moreover, taxonomic problems and presence of potential cryptic and undescribed species likely led to previous misidentifications that make natural history information even less reliable. Accordingly, reconstructing ancestral states of habitat distribution range or other characters remains difficult for this genus. One of the better documented traits across Myrmecocystus species is foraging activity (i.e., diurnal, nocturnal, matinal/crepuscular). This trait likely played an important role in the diversification of Myrmecocystus ants and is crucial for understanding the evolution of adaptations to the desert heat, such as pilosity, pigmentation, and walking speed. Therefore, evolution of foraging activity was reconstructed using Bayesian stochastic character mapping in SIMMAP (Bollback, 2006), as implemented in the phytools R package v0.6-99 (Revell, 2012). As input, the two topologically different divergence dated trees calibrated without age constraints on the stem of the Prenolepis genus group were used. Analyses were conducted with 500 simulations and three different reconstruction models: equal rates (ER), symmetrical rates (SYM), and all rates different (ARD). Best model fit (criterion: AICc) was estimated using the fitDiscrete function of the R package geiger v2.0.6.2 (Harmon et al., 2008). Foraging activity was coded into the three categories diurnal, nocturnal and matinal/crepuscular based on literature records. Prior probabilities are given in Table S4.

UCE sequencing statistics
From the 236 samples for which DNA was extracted, 17 were excluded from UCE sequencing because of insufficient pre-or postlibrary preparation DNA concentrations. Sequencing failed for three of the remaining 219 samples, including the only M. ewarti specimen. This resulted in 211 Myrmecocystus and five outgroup specimens for which UCE sequences were successfully generated. UCE capture success (measured as total UCE base pairs recovered per sample) significantly decreases with specimen age (F 1210 = 302.28, p < 0.001), but is not affected by preservation method (F 1210 = 0.32, p = 0.572) or the interaction of age × preservation method (F 1210 = 3.45, p = 0.065) ( Supplementary Fig. S1). Library preparation and UCE capture statistics are given in Table S5.
Two taxon sets were created for phylogenetic analyses and divergence dating (Table S1). For the 141 taxa selected for phylogenetic analyses, we recovered an average of 2,288 loci with a mean length of 793 bp. For the 42 specimens selected for divergence dating, we recovered an average of 2,378 loci with a mean length of 1,070 bp (Table 1).

Phylogenetic analyses: Concatenated UCE loci
The final concatenated matrices after locus filtering and alignment cleaning included 2,324 UCE loci with a total length of 1,585,695 bp and 11.6-14.2% phylogenetically informative sites. Matrix statistics and the proportion of missing data per specimen are given in Tables S6 and S7, respectively. Alignments were partitioned into 646 subsets ranging in length from 185 to 13,445 bp (mean length = 2,455 bp) using the rclusterf algorithm of PartitionFinder. All four independent ML analyses on the main alignment converge on the same tree topology with strong support for most nodes (Fig. 1). Ultrafast bootstrap values are below 95% for only five nodes that are divergences at the species level or deeper. The position of M. nequazcatl as a sister taxon to a clade composed of M. kennedyi and M. romainei receives a particularly low support of 44%. Bayesian analyses result in two conflicting topologies that differ in the position of the M. nequazcatl clade but are otherwise congruent with each other and the ML topology ( Fig. 1 and S2). Posterior probabilities are 1 for all but two nodes, which represent withinspecies divergences. Both MCMC chains reached stationary after approximately 50,000 generations. The average standard deviation of split frequencies (ASDSF) is zero and ESS values are >200 for all parameters. In both ML and Bayesian trees, several specimens exhibit considerably longer branch lengths than their sister taxa ( Fig. 1 and S2). This is mostly the case for old, point-mounted material, which has generally lower sequence quality (Table S5). Therefore, ML analyses were also performed on two alternative alignments: removing lowquality specimens from the alignment (alternative 1) does not result in topological differences (Fig. S3), and trimming sequences of low-quality specimens more conservatively (alternative 2) results in a different placement of the taxa M. lugubris, M. nequazcatl and the paratype Myr-mecocystus_tenuinodis_RRS_no#_USA_CA (Fig. S4).
In all analyses, the genus Myrmecocystus is divided into three major clades that reflect the subgeneric classification by Snelling (1982Snelling ( , 1976, but with two exceptions. Species in the nominate subgenus, i.e., Myrmecocystus s. str., are the sister group to all other Myrmecocystus species. However, in contrast to Snelling (1982Snelling ( , 1976, Myrmecocystus testaceus is inferred outside of this subgenus as the sister taxon to Endiodioctes and Eremnocystus. In addition to M. testaceus, M. yuma is not placed within its subgenus because it falls into the Endiodioctes rather than into the Eremnocystus clade. The placement of these two species renders the three subgenera described by Snelling (1982Snelling ( , 1976) non-monophyletic. The five Endiodioctes species groups and the testaceus group are not monophyletic either. Moreover, our phylogeny disagrees with both previous phylogenetic hypotheses for the genus Myrmecocystus (Kronauer et al., 2004;O'Meara, 2008)  Our phylogenetic analyses combined with morphological examination identify multiple distinct clades for the taxa M. mendax, M. mexicanus and M. placodops (e.g., M. sp. cf. mendax-01, M. sp. cf. mendax-02, etc. in Fig. 1). Specimens of M. tenuinodis also cluster separately but with very low support because of the uncertainty in the placement of the paratype (Fig. 1, S3, S4, Fig. 1), and a number of specimens in these clades also exhibit intermediate morphological traits. These findings strongly suggest the presence of cryptic diversity in the abovementioned species. At this point it is not possible to determine which clades represent the true species because neither respective type specimens nor the type localities were included in our analyses (except for M. tenuinodis).
The six undescribed Myrmecocystus species (see Table S1) are placed congruently and with high statistical support in all analyses. Myrmecocystus sp. cf. kennedyi and M. sp. cf. navajo fall into the M. kennedyi and M. testaceus clades, respectively, but differ from these in morphology. The remaining four species are genetically distinct from all described species.

Phylogenetic analyses: Gene tree species tree discordance
ASTRAL species tree topologies for the binned (Fig. S5) and unbinned (Fig. S6) UCE data sets are congruent except for the positions of M. koso and M. tenuinodis (paratype). Both topologies differ from the one recovered by SVDquartets (Fig. S7) Table S7). This suggests that the uncertainty in the position of these clades results from low sequence quality rather than from evolutionary processes such as incomplete lineage sorting.
Gene and site concordance factors (gCF) range from 0.05% to Table 1 Ultraconserved element (UCE) sequencing and capture statistics (mean, minimum, maximum, and standard deviation) averaged for specimens in the two taxon sets that were used for concatenated phylogenetic inference and divergence time estimation. For detailed statistics of all specimens, see Table S5. 94.44% (mean = 18.12%) and 31.25-98.01% (mean = 60.84%), respectively (Fig. 1). Particularly low genealogical concordance (gCF < 5%) is found for 30 nodes, most of which represent within-species divergences where genealogical concordance is expected to be low due to ongoing gene flow. Low mean estimates of concordance factors can also be explained by the young age of the genus Myrmecocystus (~14.08 my; see Section 3.4) which results in overall short branches. This makes it difficult to resolve taxon relationships especially for single locus trees. Detailed results of the concordance factor analysis are given in Table S8 and can be assigned to nodes via Fig. S8.

Divergence dating
The concatenated matrix for divergence dating included 2,238 UCE loci with a total length of 2,063,183 bp and 18.8% phylogenetically informative sites after locus filtering and alignment cleaning. Matrix statistics and the proportion of missing data per specimen are given in Tables S6 and S7, respectively. MCMCTree runs reached adequate convergence with ESS values above 200 for all parameters. Comparison of prior and posterior age distributions indicate that divergence time estimates are not biased by calibration priors (Figs. S9-S12). Independent runs with the same input topology and calibration scheme converge on similar divergence times with maximum age differences of 0.35 Ma. Alternative input topologies and calibration schemes also produce similar divergence times, that differ by maximally 0.68 and 1.42 Ma, respectively (Figs. S13-S16). For these reasons, we focus on results obtained from running MCMCTree on the topology inferred by ML from concatenated UCE loci with the calibration scheme that did not include age constraints on the stem of the Prenolepis genus group (Fig. 2 and Table 2). Accordingly, the split between Myrmecocystus and Lasius occurred in the early Miocene, about 18.84 (95% HPD: 13.5-26.8) Ma ago, which is significantly younger than the minimum age constraint we placed on stem-group Lasius based on Priabonian Baltic amber fossils (33.9 Ma; Table S3). A similar divergence time for this split (18.55 Ma ago) and incongruence with its minimum age constraint was found by Blaimer et al. (2015). Crown group Myrmecocystus started diversifying approximately 14.08 (95% HPD: 10.2-19.9) Ma ago, diverging into Myrmecocystus s. str. (excluding M. testaceus) and a clade consisting of the remaining described species. Around 11.11 (95% HPD: 8-15.8) Ma ago, the split between the subgenera Endiodioctes and Eremnocystus occurred. Endiodioctes experienced a major radiation starting 8.1 (95% HPD: 5.9-11.6) Ma ago which is responsible for present day diversity of the subgenus.

Ancestral state reconstruction
Posterior probabilities for ancestral states of foraging activity are largely congruent between input topologies but differ between the three reconstruction models. The equal rates model was best fitted for both topologies but differences in AICc are small. Results of the ancestral state reconstruction are given in Fig. 2 and S17. All models suggest with high support that the ancestral foraging activity for the subgenera Endiodioctes and Eremnocystus was diurnal and matinal/crepuscular, respectively. Myrmecocystus s. str. and the genus as a whole most likely originated from a nocturnal ancestor, but matinal/crepuscular activity receives some support as well under the ER and SYM model. We note that reconstruction of ancestral foraging activity might have been affected by incomplete taxon sampling because we excluded at least seven Myrmecocystus taxa for divergence dating and consequently also for this analysis due to low sequence quality and/or incongruent Fig. 2. Divergence time estimates and ancestral states of foraging activity for the topology inferred via maximum likelihood analysis on the main concatenated and partitioned UCE data matrix (Fig. 1). Divergence time estimation was performed by MCMCTree on an unpartitioned alignment under the independent-rates clock model and the HKY85 + Γ model of sequence evolution. The calibration scheme used is shown in Table S3 (without constraints on stem Prenolepis genus group). Ancestral state reconstruction was performed by SIMMAP under the equal rates model. Prior probabilities of foraging activity were assigned as shown in Table S4. Node numbers refer to Table 2. Node bars represent 95% highest posterior density ranges of divergence times. Pie charts on nodes represent posterior probabilities of foraging activity. Timings of the middle Miocene climatic optimum and Neogene uplift are according to Zachos et al. (2001) and Wilson and Pitts (2010), respectively. Outgroups other than Lasius have been removed from the tree. placement between phylogenetic inference on concatenated UCE loci and species tree analyses.

Phylogeny and systematics
Our UCE based analyses provide a well-resolved phylogeny for the genus Myrmecocystus including all described species except M. ewarti, six undescribed species, and intraspecific variation for most of these species. Our phylogeny largely clarifies the molecular systematics within Myrmecocystus with high statistical support and is the first comprehensive molecular evaluation of Snelling's classification (1982Snelling's classification ( , 1976. We used old point-mounted specimens (up to 58 years old) from natural history collections including type material for nine species to complement the taxon sampling in our study. In this way, we were able to include many Myrmecocystus species in the phylogeny that are scarcely sampled and for which recent samples were not available. UCE capture success was negatively correlated with specimen age, and, consequently, old samples had lower numbers of recovered UCE base pairs and a higher percentage of missing sequence data in alignments (Fig. S1, Tables S5 and S7). Most likely as a result of this, many old specimens exhibited artificially long branches in phylogenies inferred from the main concatenated alignment. While this did not seem to compromise overall tree topology, it likely confounded the phylogenetic positions of the taxa M. lugubris, M. nequazcatl and the paratype Myr-mecocystus_tenuinodis_RRS_no#_USA_CA (Section 3.2). Most of the remaining topological uncertainties, that were revealed by species tree approaches and the concordance factor analysis, were also associated with old material (an exception is the clade comprised of M. christineae and M. pyramicus). However, this does not mean that topological ambiguities were present for all old samples. In fact, many of these specimens could be placed with high statistical support and congruently across phylogenies. In total, our results corroborate the potential of UCE data harvested from historic material to resolve clades with few available specimens (Blaimer et al., 2016) while also illustrating limitations of this approach. The higher phylogenetic uncertainties in our study compared to Blaimer et al. (2016), who used Xylocopa bees to illustrate the potential of museum material for UCE-based phylogenomics, most likely stem from the much smaller body size of Myrmecocystus ants. For a number of specimens, it was not possible to obtain sufficient input DNA for robust phylogenetic inferences. Therefore, phylogenetic hypotheses inferred from historic material alone should be interpreted carefully and thoroughly tested (e.g., by species tree approaches and concordance factor analyses), especially for small insect species.
We discovered higher species diversity in the genus Myrmecocystus than previously thought by including multiple specimens for most described species. Evidence for cryptic diversity was found in at least seven species, namely M. mendax, M. mexicanus, M. placodops and the taxon pairs M. mimicus / M. flaviceps and M. kennedyi / M. romainei, because multiple well-supported clades were recovered for these species (Fig. 1). It had already been recognized by Snelling (1976) that variation in morphology (e.g., pilosity) indicative of cryptic diversity exists in all of these species except M. kennedyi. Our findings show that part of this variation is indeed interspecific because several clades cluster more closely with a different taxon than with their 'conspecifics'. For example, M. sp. cf. placodops-03 groups with the M. koso paratype rather than with the other M. sp. cf. placodops clades (Fig. 1). While these clades can certainly be considered distinct species, it remains unclear whether sister clades (e.g., M. sp. cf. mendax-03 and M. sp. cf. mendax-04) also present separate species. Discovering populations where representatives of both clades occur in sympatry and testing for reduced gene flow and genetic differentiation will be necessary to clarify the taxonomic status of these clades. Evidence for such differentiation has been presented for M. mendax by Eriksson (2018). Alternatively, the observed genetic and morphological differentiation might represent geographic clines, as sister clades in our phylogeny were not sampled in sympatry. This hypothesis may be supported by the clines in pilosity, punctuation and/or worker size, that were observed by Snelling (1976) in M. mendax, M. mimicus / M. flaviceps and M. romainei. In any case, it will prove promising to investigate whether morphological traits and behavioral traits previously described to be polymorphic (colony founding behavior, social structure) (Eriksson, 2018;Eriksson et al., 2019;Hölldobler et al., 2011;Snelling, 1976) differ significantly between the sister clades. For example, colony founding behavior and social structure should be compared in regions where clades have overlapping distributions (if present) because social structure variation has been hypothesized to promote sympatric speciation in ants (e.g., Rabeling et al., 2014;Shoemaker and Ross, 1996;Ward, 1989). Snelling's morphology-based revision (1976Snelling's morphology-based revision ( , 1982 predicted the molecular systematics of the genus Myrmecocystus with high accuracy. The three subgenera Myrmecocystus s. str., Endiodioctes and Eremnocystus are well-supported clades that are only rendered non-monophyletic due to the placement of two species. In addition, except for seven taxa that likely represent complexes rather than single species, the species recognized by Snelling for which we included multiple specimens were recovered as monophyletic. Nevertheless, the inconsistencies between Snelling's classification and the molecular phylogenetic hypothesis highlight that a modern taxonomic revision is required that redefines subgenera, species groups and species boundaries while accounting for the undescribed species and uncovered cryptic diversity. Special attention should be directed towards resolving the relationships within the  best, a comprehensive taxonomic study will also identify new diagnostic traits that allow for identifying species without relying on genetic data despite the low interspecific phenotypic variation in the genus. The necessity of such diagnostics becomes especially apparent considering the discrepancy between our species identifications and those of collectors (see Table S1). Facing these taxonomic issues will require more sampling and the integration of distributional, morphological, and genetic data because species boundaries are difficult to define using only morphological characters, and type specimens of several species might be too old to yield sufficient molecular data. Besides facilitating the identification of reliable and consistent diagnostic traits as well as species limits in cryptic taxa, this will also help to determine the phylogenetic positions of clades that we could not place with high statistical support.

Evolutionary history
In this study, we inferred divergence times for Myrmecocystus via outgroup calibration, while accounting for uncertainties in tree topology and the fossil record. Our analyses suggest that the genus diverged from its sister genus Lasius approximately 19 Ma ago, being consistent with previous estimates of Blaimer et al. (2015), who used UCEs to infer a comprehensive and dated phylogeny for the ant subfamily Formicinae. This divergence time estimate indicates that the Priabonian Baltic amber fossils described in Perkovsky (2016) belong to the stem group of the clade containing Lasius and Myrmecocystus rather than to the genus Lasius (even when considering the 95% HPD of 13.5-26.8 Ma ago). Even though Blaimer et al. (2015) used different taxon sampling and inferred divergence times with BEAST  instead of MCMCTree, their estimates are not independent from ours because some of our age constraint were based on their 95% HPD ranges. Independent evidence and morphological re-examination of the fossils will therefore be necessary to clarify their taxonomic position.
While the study of Blaimer et al. (2015) resulted in a confident age estimate for the genus Myrmecocystus, it was inconclusive about the taxonomic position of its most recent common ancestor because two conflicting topologies for the relationship between Lasius and Myrmecocystus were inferred. One topology suggested that the two genera share a common ancestor and are each other's sister group. The alternative topology indicated that Lasius might be paraphyletic with Myrmecocystus originating from within Lasius, because L. californicus clustered more closely with M. flaviceps than with L. niger. We cannot determine with certainty which evolutionary scenario took place because of limited Lasius sampling, but our findings strongly support the sister taxon relationship between Myrmecocystus and Lasius. All of our phylogenetic analyses recovered a well-supported monophyletic clade comprised of all included Lasius samples, suggesting that L. arizonicus, a close relative to L. californicus within the subgenus Acanthomyops (Manendo, 2008), is more closely related to L. niger than to Myrmecocystus (Fig. 1). Our data support the hypothesis that Lasius and Myrmecocystus split from a common ancestor with a Nearctic (or Holarctic) distribution. This split probably happened in the western United States or Mexico considering that Myrmecocystus is endemic to these regions. It remains difficult to reconstruct the ancestral range of the genus because our knowledge regarding the geographic distribution of most species is incomplete.
According to our findings, diversification of Myrmecocystus ants started during the Middle Miocene about 15 Ma ago and peaked between 10 and 5 Ma ago, eventually leading to the present day diversity found across the four American deserts, i.e. the Great Basin, Mojave, Sonoran, and Chihuahuan deserts. Most likely, these deserts formed as a consequence of widespread uplift of the American Cordillera (including the Rocky Mountains, the Sierra Nevada and the Sierra Madre Occidental as well as the Oriental ranges) that produced a rain shadow over the inland southwestern United States and northern Mexico leading to a significant decrease in precipitation (Ruddiman and Kutzbach, 1989). The exact timing of the formation of the deserts remains controversial and more biological and geological evidence is needed for confident estimates, but the majority of studies agrees that this uplift started approximately 15 to 10 Ma ago (Hay et al., 2002;Wilson and Pitts, 2010). The resulting aridification fragmented existing habitats and opened up new niches. Moreover, the end of the middle Miocene climatic optimum about 15 Ma ago led to a gradual decrease in global temperature (Knorr et al., 2011;Zachos et al., 2001), potentially allowing the colonization of more arid habitats. Under these conditions, the radiations of many successful plant and animal taxa of the American deserts took place (e.g., Leopold et al., 1992;Moore and Jansen, 2006;Van Devender, 1995). Our divergence time estimates suggest that the evolutionary diversification of Myrmecocystus ants can also be attributed to the abovementioned extensive climatic and biotic changes. Plausibly, the key adaptation repletism, that occurred early in the evolution of the genus, promoted the colonization of many newly emerging resource poor habitats by enabling the storage of water, sugar and proteins. The subsequent differentiation of foraging activity in the three subgenera further increased the number of available niches by minimizing foraging competition in shared habitats. The ancestral state estimation suggests that the common ancestor of Myrmecocystus was most likely nocturnal, followed by the evolution of matinal/crepuscular foraging about 13-14 Ma ago and two independent origins of diurnal foraging from a matinal/ crepuscular ancestor (in the stem of Endiodioctes about 8-11 Ma ago and in M. colei). The evolution of matinal/crepuscular and subsequently diurnal foraging was likely enabled by the gradual decrease in temperature starting about 15 Ma ago as higher temperatures during the middle Miocene climatic optimum would have made foraging even during dusk or dawn difficult. Further evidence will be necessary to confirm this sequential hypothesis for the evolution of foraging activity in Myrmecocystus because our analyses also provide some support for matinal/ crepuscular foraging as the ancestral state and potentially suffered from incomplete taxon sampling.
Funding This research was supported by Arizona State University and awards from the U.S. National Science Foundation (DEB-1654829 and CAREER DEB-1943626) to Christian Rabeling. In addition, this research was funded by the German Research Foundation (DFG: https://www.dfg. de/) as part of the SFB TRR 212 (NC3), project number 396780988.
Funding sources were not involved in study design, collection, analysis and interpretation of data, manuscript writing, or the decision to submit the paper for publication.
Research Data To increase scientific transparency and reproducibility of this study, we documented experimental design and analyses using the phylotocol template of DeBiasse and Ryan (2019) (Supplementary material). Trimmed reads generated for this study are available at the NCBI Sequence Read Archive (BioProject ID PRJNA657464). Individual Bio-Sample accessions are given in Supplementary Table S1. Sequence files, alignments, configuration files, analyses output files and a notebook file (.ipynb) with bash scripts are available at Zenodo (https://doi.org/ 10.5281/zenodo.4061988).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.