Comprehensive phylogeny of Pieridae butterflies reveals strong correlation between diversification and temperature

Summary Temperature is thought to be a key factor influencing global species richness patterns. We investigate the link between temperature and diversification in the butterfly family Pieridae by combining next generation DNA sequences and published molecular data with fine-grained distribution data. We sampled nearly 600 pierid butterfly species to infer the most comprehensive molecular phylogeny of the family and curated a distribution dataset of more than 800,000 occurrences. We found strong evidence that species in environments with more stable daily temperatures or cooler maximum temperatures in the warm seasons have higher speciation rates. Furthermore, speciation and extinction rates decreased in tandem with global temperatures through geological time, resulting in a constant net diversification.

Divergence time estimates varied depending on the calibration scheme, with calibrations taken from Kawahara et al. 53 resulting in younger dates (Table S1).The crown age of Pieridae was between 69.6 and 64.5 Ma with a stem age between 83.5 and 76.1 Ma.Subfamily crown ages were estimated to be 35.0-32.7 Ma for Dismorphiinae; 49.8-44.2Ma for Coliadinae; and 49.9-48.9Ma for Pierinae (Table S1).The five Afrotropical Pseudopontiinae species are on a long branch with extant taxa diversifying only 5.1-3.8Ma (Table S1).

Diversification patterns of Pieridae
Pierid lineages have steadily increased since the family's origin around 70 Ma, according to the deterministic lineage through time plot (dLTT) (Figure S2).In contrast, fitted pulled speciation rates (PSR) gradually decreased from the stem lineage until 40 Ma, when rates started to increase, followed by two peaks between 20 Ma and the present (Figure S3).The BAMM (Bayesian Analysis of Macroevolutionary Mixtures) estimation of net diversification rates identified five clades with increased diversification rates: Colias; a subclade of Catasticta (+Archonias); Delias; and clades within Pieris and Mylothris (Figures 1 and S4).These patterns were consistent with our RevBayes analysis (Figure S5).The distinct shift configuration of BAMM suggests a high (>0.75)marginal probability of shifts for Colias, Pareronia, and a subclade of Catasticta (+Archonias) (Figure S4).

Pieridae diversity and temperature through time
An ancestral state reconstruction of annual mean temperature indicates that the ancestor of Pieridae was likely found in warm climates.Switches to colder, temperate climates occurred multiple times, such as in the genera Colias and Pieris (Figure S6), and the reconstruction of minimum temperature in the coldest month shows pronounced habitat shifts to colder temperatures in these genera (Figure S7).There was a clear expansion toward habitats with greater annual temperature ranges (seasonality) in Aporia, Colias, and Pieris (Figure S8).Reconstructions of mean diurnal range (Figure S9) and maximum temperature in the warmest month (Figure S10) did not show pronounced variation across the tree.
The best-fit model explaining Pieridae diversification dynamics was a positive exponential relationship for both speciation and extinction rates with (paleo)temperature (BEnv.VarDEnv.Var_EXPO) (Table S2).Global temperature generally decreased during the past 40 My during which most of the Pieridae crown group evolved.Thus, the exponential relationships of speciation and extinction rates with paleoclimate mean these rates also decreased (slowed) with time toward the present.The best model fit for the largest Pieridae subfamily, Pierinae, describes a pattern in which speciation rate was constant but extinction rate decreased with paleotemperature (BCSTDEnv.Var_EXPO), meaning that net diversification remained positive in the group.

Present day temperature and Pieridae diversity
Our phylogenetic generalized least squares (PGLS) analyses found significant relationships between current temperature and speciation/ extinction for two variables: (1) minimum temperature of the coldest month (positive for speciation, negative for extinction) and (2) annual temperature range (negative for speciation, positive for extinction).We also found a significant negative relationship between speciation and mean diurnal range (Figure S11).The annual mean temperature and maximum temperature of the warmest month was not significantly related to speciation or extinction.
When QuaSSE (Quantitative State Speciation and Extinction) was used to fit different likelihood functions of speciation on the tree for different temperature-related variables, the only significant (DAIC >2) variables were (1) mean diurnal range and (2) maximum temperature in the warmest month (Table S3).We found a negative sigmoidal relationship with drift between speciation and these two variables (https:// doi.org/10.5061/dryad.c59zw3rbh).This means that speciation rates are higher in habitats with low diurnal temperature ranges and with mild summer temperatures, but there is a steep decline in speciation rates when the diurnal range exceeds 11 C and, separately, when summer temperatures are higher than 31 C. PGLS and QuaSSE both indicated a negative relationship between speciation and mean diurnal temperature range (Table S3).The PGLS analysis of maximum temperature of the warmest month was not significant, but QuaSSE suggests a negative correlation as the best fit for this variable, indicating that extremely high temperatures negatively affect speciation rates.The direction of correlations between speciation and minimum temperature of the coldest month and temperature annual range differed between the two methods.However, PGLS p values were significant while the QuaSSE DAIC was less than 2.These results suggest that milder temperatures in the coldest month and less variable temperature throughout the year are correlated with higher speciation.
The discrepancies between PGLS and QuaSSE are likely an algorithmic artifact.The PGLS used tip rate estimations extracted from BAMM, while QuaSSE evaluated correlations using its own lambda (speciation) and mu (extinction) estimates.Additionally, QuaSSE fits the data to five different models while PGLS is limited to a linear regression, which may not be the best model for the data.

DISCUSSION
Temperature stability and warmer climates are associated with higher diversification over time Paleotemperatures partially explain current pierid diversification patterns.Throughout their 70 My evolutionary history, the diversification rate of Pieridae is positively associated with warmer temperatures.The MRCA of the family likely inhabited warm climates, and speciation and extinction rates remained positive but have slowed along with global cooling that began in the late Eocene.Thus, higher diversification rates are associated with warmer temperatures through time.Interestingly, our investigations of diversification dynamics in the most species-rich subfamily, Pierinae, are different from those of the family as a whole.Net diversification has increased since the Cenozoic as a result of decreasing extinction rates combined with constant speciation.Future research could further clarify these dynamics by examining smaller clades within Pieridae but with complete or nearly complete species-level sampling.

Temperate climates are associated with higher speciation across space
From their likely tropical origins, there have been numerous shifts from tropical to temperate areas with greater temperature variability throughout the year, and several of these shifts are associated with increased diversification, notably in the genera Colias and Pieris.Delias, the most species-rich genus of butterflies, is often associated with equatorial habitats, but many species are found in environments with cooler temperatures (Figure S6) including tropical and subtropical mountains.

High variability in daily temperatures is associated with decreased diversification
We employed several analyses to identify the specific climatic factors associated with diversification rate shifts.Different analyses gave conflicting results about which aspects of temperature play an outsized role in determining extant diversity.PGLS analyses concluded that (1) diversification increases in cold areas (increased speciation and decreased extinction associated with lowest temperature in the coldest month); (2) diversification declines in areas with large temperature fluctuations throughout the year (lower speciation and increased extinction associated with annual temperature range); and (3) speciation declines in areas with large temperature fluctuation throughout the 24 h day (mean diurnal range).Annual mean temperature and hottest temperature of the warmest month were not significant in these analyses.On the other hand, the QuaSSE analysis identified this latter variable as significant, along with mean diurnal range.The QuaSSE analysis indicated that speciation rates are higher in habitats with low daily temperature variation (<11 C) and with cool summer temperatures that rarely exceed 31 C. We therefore tentatively conclude that large daily fluctuations in temperature-a feature of temperate and montane habitats-are associated with increased speciation, a conclusion on which both analyses agree.
Seasonality influences species' distributions 54 and is thought to be correlated with butterfly species richness. 55In general, biodiversity is greater in stable, tropical environments, and these areas are often characterized by high temperatures that are generally more constant throughout the year. 56In warm seasons, sunlight and ambient temperature are the main factors influencing butterfly activity, and if conditions are not ideal, butterflies will shift the timing and length of their activity. 57The diversification of white and yellow butterflies is higher in these environments with stable circadian temperatures.The climatic stability hypothesis 13 predicts that the tropics are especially diverse in part because the climate does not fluctuate much throughout the day or year.We found no support for this hypothesis in Pieridae.However, one corollary of this idea was supported: large daily temperature fluctuations (lack of stability) are associated with decreased diversification (specifically, decreased speciation).
Although warmer environments are often linked to life history traits that accelerate speciation, [10][11][12] we found that speciation rates are elevated in temperate environments.Butterflies have an optimal internal temperature range for survival but can tolerate greater environmental ranges because they have some ability to control their thermal physiology. 31,42,58Nonetheless, anything below or above the tolerable range can be lethal. 31The optimal temperature range usually trends toward warmer temperatures, and most species are active during warm seasons.In some cases, the warmest month is the only time of the year when butterflies can reproduce.This is especially true for Arctic and alpine species.However, extremely warm temperatures can reduce activity in some species. 57Recognizing these trends is crucial in light of the current, unprecedentedly rapid increase in global temperatures during the Anthropocene, particularly our evidence that speciation rates decrease with increasing temperature.
Franze ´n et al. 57 found that the lifespan of the Palearctic papilionid Parnassius apollo is negatively affected by increased median temperatures and that the lycaenid Phengaris arion lives longest at an optimal temperature of ca.27 C, suggesting stabilizing selection on temperature.][61] Butterflies use different strategies and mechanisms to maintain body temperature, which can disrupt their regular behavior. 42They may, for example, find a shaded area to hide, 31 which then prevents them from feeding and finding mates.
There is a uniform distribution of taxa in warm regions with mild winters across the Pieridae phylogeny (Figures S6 and S7), which is unsurprising considering Pieridae are generally most diverse in warm climates (as in other butterfly families 62 ).However, there are multiple independent origins of cold-adapted species (e.g., Baltia spp.and Reliquia santamarta) including clades that are sister to taxa inhabiting warm or mild climates (such as the clade that includes Tatochila, Hyposchila, Theochila, Pierphulia, Phulia, and Infraphulia).Some of these shifts are associated with adaptations to habitats at higher elevations, as shown by the convergence of morphology among groups that inhabit similarly cold, high-elevation habitats around the world. 39,43Our results confirm the recent finding 48 that the monotypic genus Reliquia, endemic to a single mountain range in tropical Colombia is nested in the mostly Holarctic genus Pontia (including Baltia).Pontia inhabits colder environs at similarly high elevations in the northern hemisphere.This observation partially uncovers the intriguing interplay of Pieridae biogeography and adaptation to high elevations around the world.Furthermore, our BAMM analysis highlighted heightened speciation rates in some temperate pierid clades.

Phylogenetic relationships of Pieridae
Our study provides a robust phylogenetic framework for pierid butterflies and a foundation for testing many macroevolutionary questions.We show, with strong support (UFBoot and SH-aLRT = 100), that Dismorphiinae are sister to a clade containing Coliadinae, Pseudopontiinae, and Pierinae, in agreement with Wahlberg et al. 44 and Kawahara et al. 25 All tribal relationships of Pierinae were strongly supported and differ from previous studies. 25,44,50All of the Pieridae sequence data used by Kawahara et al. 25 are incorporated into our dataset, but the inferred trees differ in tribal topology (Figures 2 and S1).This is likely a result of our considerably increased taxon sampling.
We recovered the subfamily Pseudopontiinae on an extremely long branch (as shown previously in the study of Mitter et al. 63 ) of approximately 50 My in length (Table S1), which likely contributes to the difficulty of confidently inferring the phylogenetic placement of this subfamily.Topological uncertainty hinders inference of ancestral states of the group as well.The morphologically unique Afrotropical subfamily Pseudopontiinae was thought to only include a single species until 2011, 43,63 when molecular data demonstrated the existence of four additional cryptic species.Even though we incorporated additional molecular data for several species in the group, support for its position in relation to the other subfamilies is still low (as in Espeland et al. 49 and Chazot et al. 50).We also find that several genera, including Euchloe and Ganyra, are not monophyletic (Figure 1), and will require further taxonomic investigations.
The Pieridae crown age of around 70 Ma overlaps the K-Pg (Cretaceous-Paleogene) boundary, a major extinction event.The lineage giving rise to the family pre-dated this event, and pierids started diversifying during the Paleogene (Figure S2).

Limitations of the study
Divergence time estimation for butterflies has been historically difficult due to the notable underrepresentation of butterflies in the fossil record.We included an analysis with the most rigorously justified relevant butterfly fossils from recent works as well as a set of analyses using secondary calibrations but consider that the use of secondary calibrations themselves is a limitation on our downstream analyses.Future discovery of butterfly fossils will ideally be needed to help solidify our understanding of butterfly divergence times.Further, some of the analyses we conducted do not allow the fraction of missing species to be specified (the sampling fraction), and thus our 50% sampling of Pieridae could have biased some results.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: partitions into 69 partitions but did not further partition these by codon positions to avoid extremely high numbers of a priori partitions for such a large taxon set.Separately, we ran ModelFinder to identify the best models of nucleotide evolution for our partitions using all available models.With these 69 partitions and models, we then performed 100 independent IQ-TREE tree searches with 1000 Ultrafast bootstraps (UF-Boot) and 1000 Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) replicates to test for branch support.We implemented the '-bnni' command when running these analyses to alleviate concerns about model violation inherent in the Ultrafast bootstrap method. 89All phylogenetic analyses were run on the University of Florida HiPerGator2 Cluster (https://www.rc.ufl.edu/about/hipergator/).

Divergence time estimation
We used two different secondary calibration schemes obtained from the trees of Espeland et al. 49 and Kawahara et al. 53 to estimate pierid divergence times (Table S4).Ages, based on 95% confidence intervals in these studies, were treated as hard minimum/maximum ages and we employed the penalized likelihood tree dating with TreePL. 69TreePL requires an input tree, for which we used the most likely tree (highest likelihood of 100 runs) inferred with IQ-TREE.TreePL outputs divergence times as single dates per node.In order to provide a range of dates for each divergence event, we performed TreePL independently 100 times and summarized results, following the approach of St Laurent et al. 90 This method uses several custom Python scripts (available at https://github.com/sunray1/treepl)and creates 100 bootstrap replicates in RAxML 70 v.8.2.12 using the 425-locus data matrix, partitioned according to the same 69 partitions inferred with ModelFinder.These 100 bootstrap alignments were then used to infer branch lengths on the constrained input phylogeny.We generated 100 bootstrap trees, utilizing optimization parameters that were determined with the 'prime' command and random subsample and replicate cross-validation with the 'randomcv' command for identifying the best smoothing parameter.This optimization and smoothing step was performed three times on each tree to identify the best combination of parameters and assess consistency in the smoothing parameter.All analyses used the 'thorough' command in TreePL.These analyses resulted in 100 dated trees varying only in branch lengths, which were summarized as a maximum clade credibility (MCC) tree in TreeAnnotator v.1.10.4 in the BEAST package. 71The MCC tree was used in all downstream analyses that required a dated phylogeny.
In addition to making use of secondary calibrations in TreePL, we conducted an analysis using fossil calibrations as the minimum ages.To calibrate the root, a median age of Angiosperms (139.4Ma) from Magallo ´n et al. 91 was used done by Espeland et al. 49 ).In total, eight fossil calibrations were applied to the following places in the tree (fossil names and their age noted in parentheses): 1) the Parnassiinae stem age (Thaites ruminiana, 23 Ma); 2) the Hesperiidae stem (Protocoeliades kristenseni, 54 Ma); 3) an internal Hesperiidae crown age (Pamphilites abdita, 23 Ma); 4) Nymphidiini (Riodinidae) stem (Theope sp., 15 Ma); 5) the crown node of Satyrinae + Heliconiinae (Nymphalidae) (Vanessa amerindica, 33.7 Ma); 6) Aporia (Pieridae) crown age (Aporia cf.crataegi, 2.6 Ma); 7) Belenois (Pieridae) crown age (Belenois crawshayi, 0.02 Ma); 8) split between Coliadinae and (Pseudopontiinae + Pierinae) (Vanessa pluto, 16 Ma).Each of these fossils were selected because they can be confidently identified as belonging to a particular lineage, making them well-suited to calibrate dating analyses. 92TreePL was run on this dataset as noted above, with the exception that the root was a hard maximum age and all fossils were hard minimum ages.For our discussion, we focus on estimated dates obtained using calibrations from Kawahara et al., 53 because this is the most comprehensive, fossil-calibrated dated phylogeny of Lepidoptera to date.

Distribution and bioclimatic data
We used the R library occCite v.0.4.9 72 to assemble distribution data.It provides a platform to compile georeferenced data from global databases while also generating a list of primary data sources for each record from institutional collections.We extracted occurrence data using the command 'occQuery' and followed the taxonomy of Pieridae according to Lamas. 34This command checks species names against the Global Names Index (gni.globalnames.org),then uses rgbif 73 to send a query to GBIF, 93 requesting all records with geographic coordinates.We removed records that did not have decimal places to increase accuracy and removed records with identical coordinates to the first four decimal places because they were too precise to provide meaningful additional information.Original occurrence data sources with digital object identifiers (DOIs) were gathered using 'occCitation'.We also extracted additional occurrence data from the literature and directly from GBIF for 37 species that were not obtained using the method described above (see full reference list on Dryad at https://doi.org/10.5061/dryad.c59zw3rbh).
We mapped all records by species in an atlas using the package ggplot2 v.3.3.5, 74 with a custom R script (https://github.com/hannahlowens/PieridaeDiversity).The atlas was inspected for occurrence points that appeared as outliers or errors based on published distribution maps and opinions of Pieridae experts (authors and collaborators -see Acknowledgments) and we subsequently removed these putatively erroneous records.We also combined the records of the cryptic species Leptidea sinapis, L. juvenica and L. reali as the ''sinapis complex'' due to difficulty in delimiting distribution in these species. 94The final dataset had over 800,000 records for 541 pierid species, representing 91% of species in the phylogeny (https://doi.org/10.5061/dryad.c59zw3rbh).We wanted to have present day representation of distribution for a more informative dataset on abiotic requirements, therefore we did not remove records that could be associated with established introductions and migration.
We extracted bioclimatic data for the geographic locations of these records from the WorldClim dataset (2.5 arc-minute resolution 95 ) using the package raster v.3.4-10. 75Since climatic stability can refer to different timescales including paleoclimates, 96 we use the "temperature stability" to refer to annual variability (seasonality) and daily variability, depending on the context.Considering our hypotheses regarding the roles of extreme conditions and seasonality, we focused on the following variables: annual mean temperature (average of BIO1), mean diurnal temperature range (average of BIO2, which is the mean of (daily max temp -daily min temp) within a month), maximum

Figure 1 .
Figure 1.Evolutionary relationships and diversification patterns of Pieridae Time-calibrated tree of 593 species.Branches with significant net diversification rate shifts as estimated by BAMM are indicated with red and blue circles.Annual mean temperature (BIO1, WorldClim) for each species is indicated with colored bars in the outer ring.Butterfly images are not to scale.Bottom left inset shows a cladogram of the subfamily and tribal relationships in this figure.

Figure 2 .
Figure 2. Summary of the inferred relationships among higher-level Pieridae taxa in recent molecular phylogenetic studies.

TABLE
d RESOURCE AVAILABILITY B Lead contact B Materials availability B Data and code availability d METHOD DETAILS B Molecular data B Distribution and bioclimatic data B Diversification and correlation analyses