The phylogeny and global biogeography of Primulaceae based on high-throughput DNA sequence data

angiosperm family Primulaceae is morphologically diverse and


Introduction
The angiosperm family Primulaceae comprises >2600 species, distributed nearly worldwide and representing diverse ecologies (Stevens, 2001 onward). In the APG IV (2016) system followed here, Primulaceae is usually regarded as having four subfamilies, i.e., Primuloideae, Myrsinoideae, Theophrastoideae, and Maesoideae (Notes S1). Morphological diversity within the family ranges from tropical trees (e.g., Ardisia copelandii Mez) to lianas, woody shrublets, alpine cushion plants, and herbs ( Fig. 1). Species of several genera, including Androsace L., Cyclamen L., and Primula L., are cultivated as ornamentals. Others, including Ardisia Sw., Embelia Burm.f., Lysimachia L., and Myrsine L., have a history of use in traditional medicine (Quattrocchi, 2012). Aegiceras Gaertn. is widely distributed along coastal Indo-Malaya, Australia, and islands of the Pacific and is one of only 31 extant angiosperm genera with species that inhabit mangrove ecosystems (Duke, 2017). Some species of Ardisia and Lysimachia are considered troublesome invasives outside their native ranges (Muñoz and Ackerman, 2011). Species of the genera Hottonia L. and Samolus L. have evolved to live in wet and even aquatic habitats, and some species of Samolus are also salt-tolerant (Ståhl, 2004). This extensive diversity makes Primulaceae an exceptional system for studying evolutionary patterns in morphology, life history, and biogeography. Knowledge on the evolutionary relationships among the clades of Primulaceae and their position within angiosperms has changed dramatically over the past three decades with the application of molecular phylogenetic methods (Notes S1). Despite many advances, notable gaps in understanding the diversification of subclades within the family remain. Recent studies have suggested that Ardisia, a pantropical genus and the largest in the family with ca. 700 species, is not monophyletic (Julius et al., 2021;Rose et al., 2018;Yang and Hu, 2022). These studies have focused primarily on Paleotropical species and have used data from only a few genes, mostly those from nuclear ribosomal and chloroplast regions. Thus, the relationships among Neotropical Ardisia and closely related genera and the biogeography of these clades remain largely unexplored. Morphologically based generic concepts have led some authors to suggest that several taxa in Myrsinoideae, including Hymenandra A.DC. ex Spach Ricketson, 2000, 1999), have disjunct amphi-Pacific tropical distributions, but whether these represent recent trans-oceanic dispersal events, relicts of ancient boreotropical forests, or artifacts of taxonomic error have not been rigorously assessed. Lack of phylogenetic resolution among members of Myrsinoideae, including Ardisiandra Hook.f., Coris L., and Stimpsonia C. Wright ex A.Gray has limited the ability to identify transitions in morphology (Wanntorp et al., 2012), such as herbaceous versus woody habit and capsular versus drupaceous fruits. Whether Oncostemum A. Juss. is rendered paraphyletic by the inclusion of Badula Juss., both of which are endemic to islands of the Indian Ocean, is uncertain because of low phylogenetic support, thus obscuring the evolution of floral morphology in the clade (Bone et al., 2012;Strijk et al., 2014). Poor understanding of relationships among other morphologically similar taxa with overlapping ranges such as Discocalyx Mez and Tapeinosperma Hook.f. has also limited our understanding of evolution in those groups, including transitions among mating systems, the characters from which have historically been used to differentiate genera. There is thus a particular need for a phylogenetic investigation that includes broad sampling across the entirety of subfamily Myrsinoideae to advance understanding of the global biogeographic and morphological history of Primulaceae.
In this study, we produced the first broadly inclusive phylogenetic tree of Primulaceae using high-throughput DNA sequence data, generated with the Angiosperms353 universal probes (Baker et al., 2022Johnson et al., 2019), to investigate the biogeographic history of the clade and its phylogenetic position in the broader Ericales. Beyond presenting an updated phylogenetic hypothesis for the position of Primulaceae within Ericales and the relationships among nearly all genera of Primulaceae, we sought to answer the following questions: 1) What is the biogeographic history of Ardisia and other morphologically similar woody genera of Myrsinoideae?, 2) Which lineages gave rise to various island endemic taxa, including Badula and Oncostemum (Indian Ocean Islands); Heberdenia Banks ex A.DC. and Pleiomeris A.DC. (Canary Islands); and Discocalyx, Elingamita G.T.S.Baylis, Loheria Merr., and Tapeinosperma (Indo-Malaya, Australasia, and Pacific Islands)?, and 3) Where have transitions between an herbaceous versus woody habit occurred within the phylogeny of Primulaceae? We also discuss the implications of our results for the systematics of Primulaceae, particularly highlighting the need for a major generic revision of subfamily Myrsinoideae.

Sequencing, assembly, and sequence curation
DNA was extracted from tissue with various modified CTAB protocols as in Baker et al. (2022). Illumina sequencing libraries enriched for target gene regions were generated with the Angiosperms353 kit (Johnson et al., 2019). Enriched libraries were sequenced with various Illumina machines to generate either 150-base pair (bp) or 250-bp paired-end reads. Some samples were sequenced in multiple sequencing runs and all reads for these samples were concatenated into a single pair of read files per sample prior to analysis.
We employed various approaches to remove exon sequences that were short, paralogs, or suspected to have assembly errors. We removed all sequences shorter than 25% of the average length of sequences for that gene, those flagged as potential paralogs by HybPiper, and those from poor quality samples (Methods S1). We aligned sequences with the E-INS-i algorithm in MAFFT v7.310 (Katoh and Standley, 2013) and inferred gene trees with IQ-TREE v1.6.12 (Nguyen et al., 2015). We trimmed long internal and terminal branches from gene trees (Yang and Smith, 2014) and realigned the corresponding sequences (Methods S1; Tables S4-S5). This resulted in curated exon data for 353 genes for species from across Ericales, which we herein refer to as the Ericales1exon dataset. A supermatrix was constructed for this dataset with the pxcat command in phyx v1.2 (Brown et al., 2017).

Ericales-wide phylogenetic analyses
The Ericales1-exon supermatrix was used for maximum likelihood (ML) phylogenetic analyses as implemented in IQ-TREE. Independent tree searches and ultrafast bootstrapping (Hoang et al., 2018) with 1,000 replicates were conducted with Edge-linked-equal, Edge-linked-proportional, and Edge-unlinked partition models specified with the "-q," "-spp," and "-sp" flags, respectively (Chernomor et al., 2016). A separate model partition for each of the 353 genes was applied in each of these analyses. We considered 95% as the threshold for strong ultrafast bootstrap support (Minh et al., 2013). Gene trees for each exon alignment were estimated separately with IQ-TREE and used to generate a species tree with ASTRAL v5.6.2 (Zhang et al., 2018). We considered 0.95 and 0.70 as the thresholds for strong and moderate local posterior probability (LPP), respectively, for nodes in species trees estimated with ASTRAL (Sayyari and Mirarab, 2016).

Primulaceae-specific dataset construction and phylogenetic analyses
To further investigate phylogenetic relationships within Primulaceae and test the effect of including data from introns, we generated two datasets using the 150 Primulaceae samples that met the threshold for gene recovery (Methods S1), with two samples of Ebenaceae and two of Sapotaceae as outgroup taxa. We retrieved intron data for all samples using the intronerate.py script included with HybPiper and used a modified script from Larson et al. (2021) to create a set of intron sequences with corresponding exon sequence in the Ericales1-exon set. We then created a subset of both the exon and intron sequences that included only the Primulaceae and outgroup samples with the pxrms command in phyx. Intron and exon sequences for each gene were aligned separately with the E-INS-i algorithm in MAFFT. Then, columns with > 50% missing data were removed from all intron alignments with the pxclsq command in phyx because introns had a high proportion of missing data, expected because of the nature of target capture sequencing. The resulting aligned exon data, which we refer to as the Prim2-exon dataset, were an exact subset of the Ericales1-exon dataset but with sequences realigned with MAFFT. The Prim3-intron dataset included the same aligned data from exons, as well as data from corresponding introns.
Gene trees for the Prim2-exon set were generated with IQ-TREE and the GTR + Γ model. Gene trees for the Prim3-intron set were generated by concatenating the intron and exon sequences for each gene and estimating a tree in IQ-TREE with separate GTR + Γ model partitions for the intron and exon data (i.e., two partitions per gene). A species tree was estimated for both sets of gene trees with ASTRAL. A supermatrix for each dataset was constructed by concatenating alignments for each gene. For each supermatrix, three tree searches and ultrafast bootstrapping were conducted with IQ-TREE, each with a separate model partition for each of the 353 exon and intron sequences (where applicable) as above.

Selection of preferred phylogenetic model
In total, we generated four phylogenetic trees for each of our three datasets, one with ASTRAL, and three with ML and different models implemented in IQ-TREE. We chose the Edge-linked-proportional model as our preferred model for ML based on the Bayesian Information Criterion and Robinson-Foulds distances (Robinson and Foulds, 1981) and herein limit the discussion of supermatrix analyses to results that employed this model (see Methods S2; Tables S6-S8, referring to these phylogenetic trees as the ML tree for each dataset (i.e., the Ericales1exon-ML, Prim2-exon-ML, and Prim3-intron-ML trees). For visualization, a phylogenetic tree of Ericales with one sample per family was produced with the Ericales1-exon-ML tree and the pxrmt command in phyx.

Time-calibrated phylogenetic trees
We generated ultrametric trees with branch lengths corresponding to time using the penalized likelihood method as implemented in the program treePL (Sanderson, 2002;Smith and O'Meara, 2012). We used this program because it is computationally efficient for large datasets such as ours. We time-calibrated three trees to account for topological uncertainty in our downstream analyses: the Ericales1-exon-ML, the Ericales1-exon-ASTRAL, and the Prim2-exon-ASTRAL trees. For ASTRAL trees, prior to dating, we re-estimated branch lengths (in units of substitutions per base pair) with the corresponding supermatrix for that dataset and IQ-TREE with the same settings as for ML trees. We consider the ML tree from the Ericales1-exon dataset (i.e., the Ericales1exon-ML tree) our focal tree because its branch lengths were coestimated with the tree topology. Prior to running treePL, we trimmed trees to include only samples from Primulaceae and the four selected outgroup taxa and trimmed duplicate samples representing the same species. Although the fossil record for Primulaceae is sparse, we used three fossil calibrations within Primulaceae (Boucher et al., 2016;Friis et al., 2010Friis et al., , 2021 as well as one secondary calibration for the stem age of the family (Magallón et al., 2015) to estimate divergence times for the three trees. We estimated uncertainty for the dated Ericales1-exon-ML tree by estimating divergence times on two sets of 100 bootstrapped trees (Table 1; Methods S3). All treePL analyses were conducted with a smoothing parameter of 100 and optimization of other parameters with the prime option.

Biogeographic dataset construction and analysis
We conducted biogeographic analyses using the DEC* model (Massana et al., 2015;Ree and Smith, 2008) as implemented in BioGeoBEARS (Matzke, 2013) and based on the time-calibrated phylogenetic trees described above. We used the biogeographic realms and boundaries of Olson et al. (2001), i.e., 1) Nearctic, 2) Palearctic, 3) Neotropics, 4) Afrotropics, 5) Indo-Malaya, 6) Oceania, and 7) Australasia. We primarily based the scoring on the native range of each species as described in the Plants of the World Online database (POWO, 2022), but also considered additional information such as regional online Floras and the Table 1 The fossils and secondary calibration used to calibrate divergence times in phylogenetic trees. Global Biodiversity Information Facility (https://www.gbif.org), especially for species that occur near transition zones between realms. We generally excluded a species from a realm when only a small part of its range crossed a biogeographic boundary (Table S1). This situation occurred near the Nearctic/Neotropics boundary in North America and the Palearctic/Indo-Malaya boundary in east Asia. As an example, a species with a Caribbean distribution that also extended into northern Florida, USA, was considered only Neotropical rather than Nearctic as well.

Ancestral state reconstruction of growth habit
We conducted stochastic character mapping of growth habit with the make.simmap function in phytools v1.2-0 (Revell, 2012) and custom R scripts. We scored each species as either woody or herbaceous, with species exhibiting slight woodiness (e.g., herbaceous perennials with a woody base) scored as herbaceous. For each dated tree, we conducted stochastic character mapping with three models in a Bayesian framework, each with 100 replicates and the prior for the ancestral state of Sapotaceae + Ebenaceae + Primulaceae fixed to woody. The first was an equal rates (ER) model with equal transition probabilities between woody and herbaceous states, with the Q="mcmc" option such that the transition matrix was sampled from its posterior distribution with Markov-chain Monte Carlo. The second was an all rates different (ARD) model with the same settings, except that the probabilities for the two possible transitions were allowed to differ. The third was a "no reversal" (NR) model where the ancestral state of Sapotaceae + Ebenaceae + Primulaceae was fixed to woody and herbaceous to woody transitions were not possible. The NR model therefore requires that all ancestors of the woody members of Myrsinoideae are inferred to be woody and was included in order to illustrate the number of transitions to herbaceousness implied by this specific and hypothetical evolutionary history. To compare models, we generated maximum likelihood estimates for each model and compared Akaike information criterion (AIC) and corrected Akaike information criterion (AICc) scores.

Gene recovery
We recovered 105,115 assembled exon sequences using HybPiper, for an average of 298 sequences for each of the 353 target genes of the Angiosperms353 kit (including short sequences and those flagged as paralogs). After removing paralogs, short sequences, and low-quality samples, we retained 95,646 exon sequences and after phylogenybased trimming, we retained 93,326 (264 per gene on average; Table S4-S5). Full occupancy for exons would be 324 samples each with 353 sequences or 114,372 total sequences; therefore, our Ericales1-exon dataset had an average gene occupancy of 81.6%. For the 150 Primulaceae and four outgroup taxa in the Prim3-intron dataset, we recovered 17,577 intron sequences (an average of 50 sequences per gene) of the 54,362 possible (32.3% occupancy). After trimming, the average length of exon and intron alignments were 1,144 bp and 627 bp, respectively (Table S4).

Relationships among the major clades of Ericales
When not otherwise noted, phylogenetic relationships discussed refer to those recovered in the Ericales1-exon-ML tree (i.e., our focal tree). Primulaceae was sister to Ebenaceae in both the Ericales1-exon-ML (100% ultrafast bootstrap support) and Ericales1-exon-ASTRAL (0.90 LPP) trees. Not including the balsaminoid clade (Marcgraviaceae, Tetrameristaceae, and Balsaminaceae), which we used to root our Ericales-wide phylogenetic trees, Sapotaceae, Polemoniaceae + Fouquieriaceae + Primulaceae + Ebenaceae, and Lecythidaceae were successively sister to the clade formed by the other families of Ericales (Fig. 2). Relationships among all families of Ericales were supported by 100% ultrafast bootstrap support (Fig. 2), except for a sister relationship between Primulaceae + Ebenaceae and Polemoniaceae + Fouquieriaceae, which received 84% ultrafast bootstrap support in our focal tree and was not recovered in the Ericales1-exon-ASTRAL tree ( Figure S3).

Fig. 2.
Phylogenetic relationships among families of Ericales recovered in a maximum likelihood analysis with the Ericales1-exon dataset. All relationships had 100% ultrafast bootstrap support except the sister relationship between Primulaceae + Ebenaceae and Polemoniaceae + Fouquieriaceae, which received 84% support. The tree was rooted with Marcgraviaceae + Tetrameristaceae, which with Balsaminaceae (not sampled) forms a clade sister to the rest of Ericales. By rooting the tree this way, the length of the branch separating this clade from the rest of the tree is not informative. Other branch lengths are in units of estimated substitutions per site. Within Primulaceae, Theophrastoideae were sister to Myrsinoideae + Primuloideae, and Maesoideae were sister to the rest of the family (Fig. 3).

Relationships within Myrsinoideae
The pantropical genus Ardisia was non-monophyletic in our results, with 19 other accepted genera interdigitated among its members (Figs. 3 and 4; Figures S1-S8). Because of this and other inferred relationships, it  is difficult to apply existing taxonomic names above the genus level. For example, the tribe Ardiseae as defined by Mez (1902), whose work still represents the most recent species-level systematic treatment of Myrsinoideae as a whole, includes Aegiceras, Ardisia, Conandrium Mez, Heberdenia, and Hymenandra. Mez's tribe Myrsineae contained most other woody Myrsinoideae, including Badula, Loheria, Myrsine, and Tapeinosperma. Both tribes are non-monophyletic in our results. Therefore, for clarity, we assign informal names to certain strongly supported clades of Myrsinoideae. We use the term "Ardisioids" to refer to the clade that includes Ardisia and Tapeinosperma. Within this clade, we refer to the "New World Ardisioids" as the clade that includes Ardisia tinifolia Sw. and Stylogyne A.DC. We refer to the clade that includes Discocalyx and Badula as the "Old World Ardisioids" (Fig. 3). The Old World Ardisioids does not include Elingamita and Tapeinosperma because in our results they appear to be more closely related to the New World Ardisioids. We use "Myrsinoids" to refer to the clade that includes Myrsine, Cybianthus Mart., and Embelia. This use of "Myrsinoids" is not equivalent to subfamily Myrsinoideae but is instead more restrictive, nor is it synonymous with tribe Myrsineae of Mez (Fig. 3). For the more inclusive clade that includes Aegiceras, Monoporus A.DC., and all Ardisia, we use the term "Woody Myrsinoideae" (Fig. 3).
The phylogenetic tree of Ardisioids displayed a clear geographic signal, and the sampled members of Ardisia were clearly differentiated as members of either the New World Ardisioids or Old World Ardisioids (Figs. 3 and 4). In all ML and ASTRAL trees, Elingamita and Tapeinosperma were sister to one another and formed a clade with the New World Ardisioids. All sampled members of Ardisia subgenus Tinopsis (i. e., Ardisia sumatrana Miq., A. purpurea Reinw. ex Blume, and A. celebica Scheff.), widespread across southeast Asia, formed a clade that was sister to the New Guinea endemic Conandrium in all analyses. All samples of Loheria and Discocalyx formed a clade that was sister to the other Old World Ardisioids in five of the six trees, but not in the Ericales1-exon-ASTRAL tree where Loheria was instead sister to all other Ardisioids. Aegiceras and the Madagascar endemic Monoporus formed a clade that was sister to all other members of the Woody Myrsinoideae in five of six trees. The exception was in the Prim2-exon-ASTRAL tree, where Monoporus was instead sister to the New World Ardisioids + Elingamita + Tapeinosperma. However, the branch corresponding to that sister relationship was near-zero in length and received only 0.42 LPP. Geissanthus Hook.f., Parathesis Hook.f., Stylogyne, and Wallenia were placed within the New World Ardisioids. Parathesis was not monophyletic, as one sample (Parathesis sp.) was nested within a small clade of Ardisia. We sampled one species of Hymenandra, the Neotropical Hymenandra pittieri (Mez) Pipoly & Ricketson, which was also nested within the New World Ardisioids. Amblyanthopsis, Antistrophe A.DC., Badula, Conandrium, Discocalyx, Emblemantha B.C.Stone, Fittingia, Labisia Lindl., Loheria (except in the Ericales1-exon-ASTRAL tree), Oncostemum, Sadiria Mez, and Systellantha B.C.Stone were placed within the Old World Ardisioids. Our two samples of Antistrophe were not sister in any tree. Pleiomeris and Heberdenia were nested within Myrsine except in the Ericales1-exon-ASTRAL tree (where Pleiomeris + Heberdenia was sister to all sampled Myrsine) and in the Prim2-exon-ASTRAL tree (where Heberdenia was sister to Myrsine + Pleomeris). All ML trees recovered Pleiomeris and Heberdenia as nested within Myrsine with 100% ultrafast bootstrap support.
Lysimachia and Cyclamen were sister to each other in all six trees ( Figures S1-S8). Ardisiandra was sister to Lysimachia + Cyclamen in four trees (87% ultrafast bootstrap support in our focal tree), but not in the Prim2-exon-ML or Prim3-intron-ASTRAL trees, in both of which Ardisiandra, Coris, and Stimpsonia were successively sister to all other Myrsinoideae. Stimpsonia was sister to all other Myrsinoideae in all trees, except in the Prim2-exon-ASTRAL tree, where it was sister to Primuloideae, but with only 0.67 LPP.
Within Primuloideae, Dionysia Fenzl and Kaufmannia Regel were nested within Primula in all six trees. The topology within the subfamily was identical among all trees, except for one node within the clade formed by Primula, Kaufmannia, and Dionysia (Fig. 3). The monotypic Pomatosace Maxim. was nested within Androsace in all trees, and Androsace + Pomatosace was sister to the rest of Primuloideae. Omphalogramma Franch. and Bryocarpum Hook.f. & Thomson were sister to each other, with Soldanella L. and Hottonia successively sister to those (Notes S3).

Diversification times and biogeographic analyses
Biogeographic analyses with each of the three dated trees were generally similar (Fig. 4, Figures S14-S15); therefore, we highlight results of the analysis with our focal tree. Primulaceae were inferred to have begun diversifying 77.1 (76.3-77.7) Ma (Fig. 4) and to have had an ancestral area comprising both the Palearctic and Neotropics; however, there were several areas with similar marginal probabilities (Notes S4; Table S9). The crown ages of Maesoideae, Theophrastoideae, Primuloideae, and Myrsinoideae were estimated to be 10.2 Ma, 65.6 Ma, 55.0 Ma, and 56.6 Ma, respectively, and the associated uncertainty intervals were generally narrow (Figures S9-S13; Notes S5). The ancestral area of Maesa Forssk. was inferred to be the Palearctic and Indo-Malaya, with subsequent expansion into the Afrotropics. The ancestral areas of Myrsinoideae + Primuloideae and of Myrsinoideae + Primuloideae + Theophrastoideae were inferred to be the Palearctic. Within Primuloideae, Primula and Hottonia were inferred to have dispersed to the Nearctic by 13.7 (12.3-15.2) Ma and 14.9 (12.9-16.6) Ma, respectively. The most recent common ancestor (MCRA) of Samolus and tribe Theophrasteae was inferred to have been Neotropical, as was that of Samolus, with the genus later diversifying and expanding to its current, near-cosmopolitan distribution.
The ancestral area of Myrsinoideae was inferred to be the Palearctic, with Lysimachia colonizing the Nearctic by 41.3 (39.8-42.4) Ma. The ancestral area of the Woody Myrsinoideae was inferred to be Indo-Malaya, whereas the MRCA of the Ardisioids and that of the Myrsinoids were both inferred to have occupied the Indo-Malayan and Neotropical realms, implying two separate dispersal events to the Neotropics after these two lineages diverged. The crown age of the Ardisioids was inferred to be 24.7 (23.2-26.1) Ma, with the New World Ardisioids and Old World Ardisioids dating to 20.0 (18.6-21.0) Ma and 23.5 (21.5-24.8) Ma, respectively (Fig. 4). The Old World Ardisioids were inferred to have begun diversifying in the Indo-Malayan realm beginning 23.5 (21.5 -24.8) Ma, with various lineages later dispersing to the Oceanian, Australasian, Afrotropic, and Palearctic realms. The ancestral area of the MRCA of Elingamita, Tapeinosperma, and the New World Ardisioids was inferred to be the Neotropical and Australasian realms (Fig. 3).

Ancestral state reconstruction of growth habit
Stochastic character mapping of ancestral growth habit suggested an herbaceous ancestor in the history of Myrsinoideae for all dated trees when either of the two models that allowed herbaceous to woody transitions (i.e., the ER and ARD models) were employed (Fig. 5,  Figures S16-S24). For each tree, the ER model received the best AIC and AICc scores (Table S10) and we therefore highlight stochastic mapping results generated with this model. We report the percentage of stochastic mapping replicates with each state as the posterior probability of each state for a given node.
In the analysis with the Ericales1-exon-ML tree and the ER model (Fig. 5, left), the posterior probability for a woody MRCA of Primulaceae was 81%, whereas the MRCA of Primuloideae + Myrsinoideae (100%) and that of Theophrastoideae (63%) was inferred to be herbaceous, as was the MRCA of Theophrastoideae + Primuloideae + Myrsinoideae (64%). Samolus was inferred as sister to Primuloideae + Myrsinoideae in the Ericales1-exon-ASTRAL and Prim2-exon-ASTRAL trees: stochastic character mapping with either of these trees and the ER model recovered the MRCA of Samolus + Primuloideae + Myrsinoideae as herbaceous with posterior probabilities > 90% (Figures S17-S18). Stochastic character mapping with the NR model recovered transitions from woodiness to herbaceousness at least four times when considering the Prim2-exon-ASTRAL tree and at least five times for both the Ericales1-exon-ML and Ericales1-exon-ASTRAL trees (Fig. 5, right; Figures S22-S24).

Phylogeny of Primulaceae
We performed the first phylogenetic analysis of Primulaceae based on high-throughput DNA sequence data and broad taxonomic sampling to address systematic and biogeographic questions across the family. We found a different topology for relationships among Primulaceae, Ebenaceae, Sapotaceae, Polemoniaceae + Fouquieriaceae, and Lecythidaceae as compared to those from several recent studies (Larson et al., 2020;Leebens-Mack et al., 2019;Stull et al., 2020;Zhang et al., 2020). Whereas our results offer a new hypothesis for the relationship of Primulaceae relative to other Ericales, it remains one hypothesis among many, reflecting the extensive phylogenomic conflict that has characterized various family-level divergences in the order (e.g., Larson et al., 2020).
Our results agree with those of previous studies regarding the relationships among the subfamilies of Primulaceae (e.g., Källersjö et al., 2000;Larson et al., 2020;Leebens-Mack et al., 2019;Rose et al., 2018), although support for a sister relationship between Samolus and tribe Theophrasteae was not definitive in our results. Samolus and species sampled from tribe Theophrasteae were sister in our focal tree (95% ultrafast bootstrap support), but two of our six trees (Ericales1-exon-ASTRAL and Prim2-exon-ASTRAL) placed Samolus sister to Primuloideae + Myrsinoideae rather than sister to tribe Theophrasteae. These results suggest that there is phylogenetic discordance among gene trees in our datasets, with a non-trivial number of genes supporting Samolus as Fig. 5. Stochastic character mapping of habit with the focal tree under two different models of trait evolution, either with equal non-zero transition probabilities under an "equal rates" model (left) or only allowing transitions from woody to herbaceous habit under a "no reversal" model (right). Pie charts at nodes indicate the proportion of replicates in which woody (brown) and herbaceous (green) states were mapped at that node (i.e., the posterior probability of each state). Colored stars indicate branches along which the ancestral state with the highest posterior probability changed. sister to Primuloideae + Myrsinoideae. These results could be due to biological causes such as high levels of incomplete lineage sorting or introgression, in addition to systematic error, including issues regarding accurately reconstructing nodes in individual gene trees corresponding to ancient divergences that occurred in rapid succession. The phylogenetic placement and taxonomic treatment of Samolus has historically been controversial. A morphological cladistic analysis by Anderberg and Ståhl (1995) placed Samolus as sister to most other Primulaceae sensu stricto (s.s.; now recognized as subfamily Primuloideae). Early molecular phylogenetic analyses suggested that Samolus did not form a clade with Primulaceae s.s. but was instead likely sister to tribe Theophrasteae . Caris and Smets (2004) conducted a detailed morphological and developmental analysis of Samolus and concluded that there are no "unambiguous morphological characters" that are synapomorphies for Samolus + tribe Theophrasteae. A possible synapomorphy is the presence of staminodes, which occur in both, but may develop through different ontogenetic pathways in the two groups (Caris and Smets, 2004). Furthermore, staminodes appear to be evolutionarily labile in Primulaceae because they are absent in some species of Samolus (Wanntorp et al., 2012) and are also present in Soldanella (Primuloideae) and sometimes in Maesa (Anderberg et al., 1998;Anderberg and Ståhl, 1995;Friis et al., 2021). Future studies should continue to investigate the strength of the evidence for a sister relationship between Theophrasteae and Samolus.
The phylogenetic placements of Stimpsonia, Ardisiandra, and Coris have historically been difficult to resolve based on morphology, although early molecular studies placed them within Myrsinoideae (formerly recognized as Myrsinaceae; Källersjö et al., 2000;Wanntorp et al., 2012). The relationship of Stimpsonia as sister to all other Myrsinoideae is strongly supported overall by our results, although in the Prim2-exon-ASTRAL tree, the genus was recovered as sister to Primuloideae with weak support. The placement of Ardisiandra was somewhat uncertain; it was weakly supported as sister to Cyclamen + Lysimachia in our focal tree (87% ultrafast bootstrap support), as well as in three of five other trees, but not in the Prim2-exon-ML or Prim3-intron-ASTRAL trees, where it was sister to all other Myrsinoideae except Coris and Stimpsonia.
Our results suggest that Pleiomeris and Heberdenia, two monotypic genera endemic to the Canary Islands, are nested within Myrsine. However, in the Ericales1-exon-ASTRAL tree Pleiomeris + Heberdenia was sister to Myrsine, and in the Prim2-exon-ASTRAL tree Heberdenia was sister to Myrsine + Pleiomeris. Yang and Hu (2022) found that Heberdenia was nested within Myrsine, whereas the placement of Pleiomeris was inconclusive based on nuclear internal transcribed spacer (ITS) sequences. Julius et al. (2021) found that Pleiomeris was nested within Myrsine using ITS as well, but did not sample Heberdenia. Appelhans et al. (2020) reported the relationship among these genera as a polytomy and found evidence of extensive introgression among some members of Myrsine. The differences among ML and ASTRAL trees in our study could be due to discordant histories among genes caused by incomplete lineage sorting and/or introgression. Future studies should investigate the monophyly of Myrsine by analyzing the influence of individual gene regions on likelihood calculations and testing whether patterns of phylogenetic discordance might be due to introgression.

Relationships within the Ardisioids and their systematic implications
A particular strength of our sampling was for Ardisia and its close relatives that together form the "Ardisioids." We sampled nearly all accepted genera of Myrsinoideae as well as the type species and several subgenera of Ardisia (POWO, 2022). Our results unequivocally demonstrate that Ardisia as currently circumscribed is not monophyletic, nor are the tribes Ardiseae and Myrsineae of Mez (1902). The nonmonophyly of Ardisia has been suggested from recent investigations based on ITS sequences and other gene regions (Julius et al., 2021;Yang and Hu, 2022), as well as an Ericales-wide biogeographic study (Rose et al., 2018). We found that 19 genera were interdigitated with Ardisia, which together comprise a strongly supported clade (Fig. 3). Several of the five genera of Myrsinoideae that we did not sample, including Amblyanthus, Ctenardisia, Mangenotiella, and Vegaea, are likely nested within the Ardisioids as well because they share several morphological similarities with this group (Schmid, 2011;Ståhl and Anderberg, 2004).
We found several strongly supported clades within the Ardisioids, some of which corresponded to previously described subgenera, a finding that parallels those of other recent studies. Julius et al. (2021) sampled 60 mostly Old World Ardisia taxa for their ITS phylogenetic analysis and found that, with some notable exceptions, many clades they recovered roughly correspond to the recognized subgenera of Ardisia. Yang and Hu (2022) conducted an even broader study of Old World Ardisia and their bacterial symbionts and found support for the monophyly of the Ardisia subgenera Crispardisia, Pimelandra, Stylardisia, and their newly circumscribed Bladhia s.s. and Odontophylla (resulting from the transfer of some members of subgenus Bladhia sensu lato to the new Odontophylla). Yang and Hu (2022) also found that the Old World members of Hymenandra, Badula, Oncostemum, and Sadiria were nested within Ardisia and referred to that clade as the "Ardisia generic complex." Our diverse sampling of Ardisioids shows that this complex contains more currently recognized genera than has ever been shown before, because it includes all Neotropical members of the Ardisioids as well.
In all ML and ASTRAL trees, Elingamita and Tapeinosperma were sister to one another and formed a clade with the New World Ardisioids. This conflicts with the results of Yan et al. (2019), who sequenced whole chloroplast genomes from 13 Myrsinoideae and found two Tapeinosperma species (neither of which were sampled in our study) to be more closely related to three Old World Ardisia species than to Elingamita and two species of Parathesis (the only New World Ardisioids they sampled). This suggests either that the chloroplast genomes of these species have a different phylogenetic history than most regions of their nuclear genomes, or that Tapeinosperma is not monophyletic, with some species being more closely related to Elingamita and the New World Ardisioids and others to the Old World Ardisioids.
Based on our results and those from other studies, an extensive taxonomic revision of generic limits within the Ardisioids appears warranted. To circumscribe monophyletic genera, Ardisia could be combined with most other genera within Myrsinoideae to form a single megadiverse, globally distributed genus; this genus would consequently contain well over one third of all Primulaceae species. However, as we note above, there are many well-supported clades within the Ardisioids, many of which correspond to subgenera of Ardisia, display clear morphological and anatomical synapomorphies, and are restricted to particular geographical regions (Yang and Hu, 2022). Alternatively, the Ardisioids could be divided into several genera, an option that is perhaps more consistent with recent trends in taxonomic work on the group (e.g., Drinkell and Utteridge, 2015;Dubéarnès et al., 2015;Julius et al., 2021;Julius and Utteridge, 2012;Yang and Hu, 2022). In either case, such a revision would be an immense task, because Ardisia alone currently has over 700 accepted species (POWO, 2022) and additional phylogenetic studies with extensive sampling are required before any revision can be confidently undertaken. The Neotropical Jamaican endemic A. tinifolia is the type species of the genus; thus, dividing the Ardisioids into multiple genera would almost certainly result in a newly circumscribed Ardisia being restricted to the Neotropics. Many of the anatomical and structural characters historically used to describe genera within the Ardisioids appear to be homoplasious in the light of molecular phylogenetic studies (Yang and Hu, 2022). Thus, detailed morphological analysis will likely be necessary in tandem with broadly inclusive molecular studies of the group to identify which characters are synapomorphies for distinct clades and which represent evolutionary convergence. Combining phylogenomic datasets with existing data from ITS and chloroplast genes will likely be important for resolving the relationships among the nearly 1,000 recognized species of Ardisioids and testing the monophyly of additional genera (Notes S6).

Diversification times and biogeographical insights
Our estimates for divergence times generally agree with those reported in other recent studies (reviewed by Stevens, 2001 onward) and had narrow uncertainty intervals, probably resulting from the large size of the Ericales1-exon supermatrix (Notes S5). We found that Primulaceae began diversifying ca. 77 Ma, during the Campanian (Late Cretaceous) and most likely occupied both the Palearctic and Neotropics (Notes S4). The fossil genus Miranthus E.M.Friis, P.R.Crane & K.R.Pedersen is known from the Palearctic (Friis et al., 2021), consistent with our inferred ancestral area of Primulaceae. Rose et al. (2018) conducted a biogeographic analysis that spanned Ericales and represents the most recent biogeographic hypothesis of Primulaceae based on phylogenetically broad sampling. They inferred the ancestral area of Primuloideae + Theophrastoideae + Myrsinoideae to be the Palearctic and Neotropics as well, whereas in our results the most likely area for that node was Palearctic only. Our results agree with those of Rose et al. (2018) that the MCRA of crown Lysimachia and other early ancestors of Myrsinoideae occupied the Palearctic, as did early Primuloideae before the latter began colonizing the Nearctic ca. 16 Ma. Boucher et al. (2016) investigated the diversification of three clades of alpine Primuloideae; we inferred older crown ages than Boucher et al. (2016) for Primula (ca. 46 vs 22 Ma), Androsace + Pomatosace (ca. 43 vs 26 Ma), and the clade that includes Primula and Androsace (ca. 55 vs 33 Ma), though for the latter two, our point estimate did fall within their estimated uncertainty interval (dates approximated from Figure 2 of that study). Differences in our estimates may be due to taxon sampling, differences in approach to time-calibration (penalized likelihood versus a Bayesian approach with a lognormal relaxed molecular clock), and sequence type (353 nuclear genes versus ITS and chloroplast genes). Our results agree with previous suggestions that Samolus originated in the Neotropics, as did Theophrastoideae (Rose et al., 2018;Stevens, 2001 onward). Compared to Rose et al. (2018), we inferred a relatively young crown age (ca. 9 -11 Ma) for Maesa (Table S11), but this is likely an artifact of including only three samples from members of this genus in our study, resulting in the absence of species that would otherwise capture the oldest divergences in the clade.

Biogeographic history of the Ardisioids
The inferred ancestral area of the Old World Ardisioids was Indo-Malaya, from which various lineages later dispersed to Australasia, Oceania, and the Afrotropics. Rose et al. (2018) were not able to clearly resolve the biogeographic history of the Woody Myrsinoideae but suggested that some Neotropical Ardisia might form a clade, within which some Old World Ardisia (including A. glauca Mez and A. speciosa Blume) as well as Elingamita, Parathesis, Stylogyne, and some Hymenandra might be nested. We found that no Old World Ardisia were nested within the New World Ardisioids, which could be due to better phylogenetic resolution in our study, conflicting histories among nuclear and organellar genomes, or the fact that we did not sample those particular species. Our results agree with Rose et al. (2018) that the lineage subtending Badula and Oncostemum likely dispersed from Indo-Malaya to the Afrotropics ca. 10 Ma, although we inferred this date to be slightly older at ca. 12 Ma.
Based on the DEC* model, the ancestral area of the MRCA of the Ardisioids and Myrsinoids was Indo-Malaya. Our results suggest that after divergence from the Myrsinoids ca. 29 Ma, the Ardisioids dispersed to the Neotropics, such that the ancestor of all extant Ardisioids occupied both the Indo-Malayan and Neotropical realms ca. 25 Ma. The Indo-Malayan population then began diversifying into the Old World Ardisioids, whereas the Neotropical population dispersed to Australasia, such that the MRCA of the New World Ardisioids and Elingamita + Tapeinosperma occurred in both the Neotropics and Australasia ca. 23 Ma. The Neotropical lineage then diversified into the New World Ardisioids, and the Australasian population diversified into Tapeinosperma, the center of diversity of which is now in New Caledonia and nearby Pacific Islands, and Elingamita, which is monotypic and endemic to New Zealand (Notes S6). We suspect that this model-based inference (i.e., that an Ardisioid lineage dispersed from Indo-Malaya to the Neotropics and then back to Australasia) is an artifact of not having sampled any extant Indo-Malayan Tapeinosperma, of which there are at least two species in Borneo (POWO, 2022). Not sampling additional Australasian Tapeinosperma could also have influenced the inferred ancestral range of the Ardisioids because we sampled only one of about 79 accepted species of that genus (POWO, 2022). Additionally, our biogeographic analysis did not include distance-based model components, which might better account for the fact that Australasia is directly adjacent to the Indo-Malayan realm whereas the Neotropics were at the time, and still are, several thousand miles away. A more parsimonious biogeographic hypothesis is that after the divergence of the lineage that became the Old World Ardisioids, an ancestor of the New World Ardisioids and Elingamita + Tapeinosperma occupied Indo-Malaya, from which members of this clade dispersed to Australasia and/or islands in the Pacific and from there dispersed to the Neotropics. Alternatively, dispersal may have occurred directly from Indo-Malaya trans-oceanically to the Neotropics, and separately to Australasia, after divergence from the Old World Ardisioids. Better sampling within Tapeinosperma and the Old World Ardisioids is needed to investigate which, if any, scenario is better supported by phylogenetic evidence and the distributions of extant species. Pipoly and Ricketson (1999)  Although we did not sample any Old World members of Hymenandra, when viewed in the context of other recent studies, our results also suggest that the genus is not monophyletic. We found that the Central American species H. pittieri was nested within the New World Ardisioids. Julius et al. (2021) found that Bornean species H. rosea B.C.Stone and H. beamanii B.C.Stone were nested within a large clade consisting of Ardisia subgenera Crispardisia, Pyrgus, Tinus, and Bladhia as well as the genus Sadiria; we found strong support that Sadiria and a member of Ardisia subgenus Crispardisia (i.e., Ardisia crenata) are members of the Old World Ardisioids and not closely related to H. pittieri. Yang and Hu (2022) also found an Old World species of Hymenandra to be nested within Old World clades of Ardisia and noted that some morphological characters used to distinguish the two genera, including the degree to which the anther filaments are fused to one another and the corolla, have evolved multiple times in Myrsinoideae. The degree to which stamens appear as if they are fused in various members of the Ardisioids may also be influenced by whether the floral material is observed dried, preserved in alcohol, or fresh (T.M.A. Utteridge, personal observation). Julius et al. (2021) found that the Old World Hymenandra may themselves not be monophyletic. In light of this evidence, our results suggest that Neotropical and Paleotropical Hymenandra are not closely related and that their "disjunct distribution" is an artifact of the non-monophyly of the genus. D.A. Larson et al.

The evolution of habit in Primulaceae
Early circumscriptions of family limits within the primuloid clade used habit to separate the woody Myrsinaceae from herbaceous Primulaceae s.s., but this trait now appears to be homoplasious ( Fig. 5; Anderberg and Ståhl, 1995;Källersjö et al., 2000). The MRCA of all Primulaceae was probably woody because the family is usually inferred to be sister to Ebenaceae (trees) or else placed along the backbone of Ericales, among which most families are woody (e.g., Rose et al., 2018;Larson et al., 2020). In addition, Maesa is strongly supported as sister to the rest of Primulaceae and are woody shrubs, trees, scrambling climbers, or lianas Sumanon et al., 2021;Utteridge, 2012). In our ancestral state reconstructions, we fixed the prior for the habit of the MRCA of Primulaceae + Ebenaceae + Sapotaceae as woody, rather than employing an uninformative or empirically estimated prior. This likely influenced the inferred state of the MRCA of Primulaceae, which we found to be woody in all analyses. Future ancestral state analyses of habit that include additional taxa from across Ericales may provide better evidence and a more complete characterization of the remaining uncertainty regarding the habit of the MRCA of all extant Primulaceae.
Stochastic character mapping showed that the number of parameters included in the model affected the inferred ancestral state of some nodes in some cases, as did differences among phylogenetic trees (Figures S16-S24). However, our findings were consistent in the sense that, whenever the model allowed for herbaceous to woody transitions (i.e., analyses with the either the ER or ARD model), the results suggested that an herbaceous ancestor gave rise to the Woody Myrsinoideae clade. Lens et al. (2005) conducted a morphological analysis of wood anatomy across Primulaceae and found no anatomical evidence in species of the Woody Myrsinoideae to suggest an herbaceous ancestor. Lens et al. (2005) did note, however, that the wood of Theophrastoideae (recognized as Theophrastaceae in that study) is characterized by very short vessel elements and fibers similar to those found in Coris and Lysimachia (both of which they thought evolved from herbaceous ancestors), as well as Aegiceras (mangroves).
We conducted stochastic character mapping with a "no reversal" model to demonstrate the hypothetical evolutionary history implied if it is assumed that the Woody Myrsinoideae have never had an herbaceous ancestor since the divergence of Primulaceae from other Ericales. For each of the three dated trees used in stochastic mapping, the no reversal scenario implies that there have been at least three transitions to an herbaceous habit within Myrsinoideae (separately along the branches leading to Stimpsonia, Coris, and Ardisiandra + Cyclamen + Lysimachia) and at least four or five transitions to herbaceousness across Primulaceae as a whole. In the Prim2-exon-ML and Prim3-intron-ASTRAL trees, which were not formally analyzed with ancestral state reconstructions, Ardisiandra was inferred to be sister to the Woody Myrsinoideae + Cyclamen + Lysimachia, a topology that implies four transitions in Myrsinoideae in the no reversal scenario: one in Stimpsonia, a second in Ardisiandra, a third in Coris, and a fourth in Cyclamen + Lysimachia. Habit is also tightly associated with fruit type across Myrsinoideae; although our results strongly suggest that they do not form a clade, all herbaceous Myrsinoideae share capsular fruits, whereas woody members of the subfamily instead have drupaceous ones. Whether the association of these traits might be due to life-history tradeoffs influenced by lifespan or other traits associated with habit may warrant future research.
Because there appear to have been a relatively few transitions between woody and herbaceous habits in Primulaceae (e.g., a minimum of only three transitions are needed to explain habit evolution for species in our focal tree; Fig. 5), it may be difficult to accurately estimate transition rates for this clade. Increasing taxon sampling to more accurately represent the proportion of woody versus herbaceous species and capture additional transitions, such as the secondary gain of woodiness in Hawaiian Lysimachia (Hao et al., 2004), could help in parameterizing models in future studies. A more nuanced categorization beyond the binary treatment of habit used here might also provide novel insights into the evolution of these traits. Future anatomical and/or gene expression studies may shed light on the mechanisms responsible for habit transitions in Myrsinoideae: such studies may also offer additional insights into the possibility that a common ancestor of some or all subfamilies of Primulaceae was herbaceous.

Conclusions
Our phylogenomic results were largely consistent with previous hypotheses of the evolutionary relationships among subfamilies and genera of Primulaceae but provide a considerable improvement in resolution and support. This refined phylogenetic hypothesis of Primulaceae highlights numerous taxonomic issues in the group and suggests several directions for future research into the evolution of morphology and mating systems in the clade. The methods used here, including the Angiosperms353 probes and tree-based alignment trimming, appear to be useful for resolving all but perhaps the most recent divergences in Primulaceae. Future studies with similar methods and even more extensive sampling would likely shed light on the monophyly of genera such as Labisia, Tapeinosperma, and Discocalyx, as well as clarify additional relationships within genera that have been shown to be nonmonophyletic, including Ardisia, Primula, and Androsace. Such studies could also further clarify the biogeography of specific clades by providing a more complete characterization of their extant diversity.
The outlook for Primulaceae systematics is promising and rife with opportunities for advances across disciplines. Given recent methodological advances and reductions in sequencing costs, we consider the technology to build a complete tree of known Primulaceae species already within reach. Among the remaining major challenges for such an effort include the difficulty in sampling the appropriate material from herbaria located across the globe. Many herbaria have few digitized records, which severely limits the ability of investigators to plan sampling and verify records remotely. Accurate identification of specimens requires detailed taxonomic expertise and presents a further obstacle to complete species sampling. Broad collaboration among research groups and collections-based institutions, as well as the efficiencies realized by equitable data sharing and overlapping target genes, will all benefit future efforts towards a more comprehensive understanding of the Primulaceae phylogeny.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The data, novel scripts and output files that support the findings of this study are openly available from Dryad and Zenodo at https://doi. org/10.5061/dryad.1zcrjdfvb. Raw sequence reads are available from NCBI: BioProject PRJNA839108, BioProject PRJNA847568, and umbrella project PRJEB35285.