Sex determination, longevity, and the birth and death of reptilian species

Abstract Vertebrate sex‐determining mechanisms (SDMs) are triggered by the genotype (GSD), by temperature (TSD), or occasionally, by both. The causes and consequences of SDM diversity remain enigmatic. Theory predicts SDM effects on species diversification, and life‐span effects on SDM evolutionary turnover. Yet, evidence is conflicting in clades with labile SDMs, such as reptiles. Here, we investigate whether SDM is associated with diversification in turtles and lizards, and whether alterative factors, such as lifespan's effect on transition rates, could explain the relative prevalence of SDMs in turtles and lizards (including and excluding snakes). We assembled a comprehensive dataset of SDM states for squamates and turtles and leveraged large phylogenies for these two groups. We found no evidence that SDMs affect turtle, squamate, or lizard diversification. However, SDM transition rates differ between groups. In lizards TSD‐to‐GSD surpass GSD‐to‐TSD transitions, explaining the predominance of GSD lizards in nature. SDM transitions are fewer in turtles and the rates are similar to each other (TSD‐to‐GSD equals GSD‐to‐TSD), which, coupled with TSD ancestry, could explain TSD's predominance in turtles. These contrasting patterns can be explained by differences in life history. Namely, our data support the notion that in general, shorter lizard lifespan renders TSD detrimental favoring GSD evolution in squamates, whereas turtle longevity permits TSD retention. Thus, based on the macro‐evolutionary evidence we uncovered, we hypothesize that turtles and lizards followed different evolutionary trajectories with respect to SDM, likely mediated by differences in lifespan. Combined, our findings revealed a complex evolutionary interplay between SDMs and life histories that warrants further research that should make use of expanded datasets on unexamined taxa to enable more conclusive analyses.


1.
: Dataset used in this study [taxonomy follows (Uetz & Hosek, 2015)] Table S1b: Taxonomic coverage of turtle and squamate families used in this study 2. Figure S1 3. Results using alternative SDM assignment for species with mixed or equivocal SDM as listed in Table S1: • Table S2: MacroCAIC results using alternative SDM assignment • Table S3: BAMM estimation for the number of rate shifts in diversification.

The number of rates shifts with the highest probability in each group is marked in bold.
• Table S4: STRAPP results • Table S5: Summary of transition rate parameters estimates using the MK2 model with both Maximum Likelihood and Bayesian (MCMC) methodologies for the turtles, lizards, and squamate data sets using the alternative SDM assignment.
• Table S6: Log likelihood differences (∆LL) obtained between the single (BM1) and two rate (BM2) Brownian motion models of evolution, and between the single (OU1) and two (OU2) optimums, as estimated for lifepan in turtles, lizards and squamates. σ 2 GSD and σ 2 TSD, OptimumGSD, and OptimumTSD: estimated parameters for GSD and TSD lineages using the alternative SDM assignment. 3 3. Results using alternative SDM assignment for species with mixed or equivocal SDM as listed in Table S1   Significance of the MCMC analyses is estimated by calculating the proportion of MCMC steps (i.e., the posterior probability, PP) in which qTG was higher than qGT. PP value above 0.975 or below 0.025 indicates a significant difference between the two rates. Table S5: Log likelihood differences (∆LL) obtained between the single (BM1) and two rate (BM2) Brownian motion models of evolution, and between the single (OU1) and two (OU2) optimums, as estimated for lifepan in turtles, lizards and squamates. σ 2 GSD and σ 2 TSD, OptimumGSD, and OptimumTSD: estimated parameters for GSD and TSD lineages using the alternative SDM assignment.

SIMULATION ANALYSES TO TEST FOR BiSSE SENSITIVITY
Estimation of diversification rates could be influenced by additional factors other than the trait of interest (here SDM), that can alter the tree shape in a way that elevates the false-positive rate (FitzJohn, 2012, Rabosky & Goldberg, 2015. We thus compared the log-likelihood difference (∆LL) for the competing models as inferred using our empirical data against those obtained using data generated by simulating characters that do not influence diversification. Specifically, we used a parametric bootstrapping approach to obtain the null distribution of the competing BiSSE models (i.e., equal versus unequal speciation models). We simulated 100 random distributions of neutral characters (assuming no effect on diversification) on the same empiricallyderived phylogenies of turtles and lizards. To obtain the simulated parameter values, we first estimated the two transition rates (GSD to TSD and TSD to GSD) according to a BiSSE model with equal extinction and speciation (Mq) and the root state set to TSD (which was inferred as the root state, see Results). We then simulated a binary trait along the tree [using sim.character function within the package diversitree (FitzJohn, 2012)] with the transition rates estimated using the MK2 model (with the root state set to TSD). We then applied BiSSE to compare between a model of unequal speciation (denoted Msq for unequal speciation and transition rates) with a nested model that assumes equal speciation rates but unequal transition rates (denoted Mq for unequal transition rates). In both models, extinction is modeled as equal because extinction rates are difficult to estimate and are particularly sensitive to sampling biases (see Results). This procedure resulted in an expected distribution of ∆LL under the null model. Finally, the empirically-derived ∆LL between models Msq and Mq in the real data was compared to the corresponding simulated distributions to obtain a p value according to the proportion of simulated ∆LL values in the simulated data that were equal or greater than the observed value. We applied BiSSE twice, first with the complete trees, to make use of as much phylogenetic data as possible, and second, after the trees were pruned to include only taxa with known SDM, so that the results could be compared to the results of the rest of the analyses.

RESULTS
We first used the BiSSE approach (Maddison et al., 2007) to test whether SDM is associated with altered diversification rates in turtles and lizards. The Ms (unequal speciation) was identified as the best-fitted model for turtles (∆AIC = 0), suggesting that speciation rates were higher for TSD than for GSD lineages, whereas extinction rates estimates were near zero and indistinguishable between SDMs (Table S7). We note that extinction rates are difficult to estimate and are particularly sensitive to sampling biases [ (Rabosky, 2010); but see (Beaulieu & O'Meara, 2015)].
In lizards, Msq (unequal speciation and transition) was the best model-fitted, suggesting that speciation rates were higher for GSD than for TSD lineages but that transition rate from TSD to GSD was significantly higher than the transition rate from GSD to TSD.
In squamates, Mseq (unequal speciation, extinction, and transition) was the best model-fitted, suggesting that speciation and extinction rates were higher for GSD than for TSD lineages but that transition rate from TSD to GSD was significantly higher than the transition rate from GSD to TSD. Most importantly however, results from our parametric bootstrapping procedure using neutral binary traits, showed that in all groups, at least 65% of the simulations resulted in ∆LL values that are equal or greater than the observed value. Thus, the observed differences in rates inferred using BiSSE are not significantly different than what can be expected by chance. When pruned trees were used, some of the estimated rates and chosen models were different. However, similar to the results obtained with the full trees, the parametric bootstrapping showed that in all groups, 42-58% of the simulations resulted in ∆LL values that are equal or greater than the observed value, suggesting, again, that the observed differences in rates are not significantly different than what can be expected by chance. S7: Summary of parameters estimates using the best-fitted BiSSE model for the turtles, lizards, and squamates data sets using the full phylogenies. The Ms model (∆AIC=0) was chosen in turtles, whereas Msq was chosen in lizards. λ= Speciation rate; μ = extinction rate; qG→T = transition rate from GSD to TSD; qT→G = transition rate from TSD to GSD. ∆AIC is the difference in AIC of each model relative to the best supported model. Parameter estimates are given based on the best supported model for each dataset.

Group
Parameter estimates ∆AIC

DIVERSIFICATION ANALYSES USING MARKOV CHAIN MONTE CARLO (MCMC) SAMPLING
A Markov chain Monte Carlo (MCMC) sampling approach described in (FitzJohn et al., 2009) was used to estimate the posterior probability distributions for each of the six parameters. Posterior distributions were estimated using an exponential prior distribution (with mean set to twice the maximal ML rate estimate under a trait-independent model; λG = λT, μG = μT, qG→T = qG→T) placed on the six parameters. MCMC chains were started at the estimated parameters through ML, and were run for 10,000 steps; the first 10% of the steps were discarded as burn-in.
To test whether estimated extinction and speciation rates differ between TSD and GSD lineages, we calculated the percentage of BiSSE MCMC steps in which the GSD rate was higher than that of the TSD state (i.e., the posterior probability, PP, of GSD lineages having a higher rate than TSD lineages). For example, to test whether extinction rates differ, we calculated the percentage of post burn-in steps in which , > T G μ μ and interpreted 975 0 ≥ . ) >μ PP(μ T G as significant support for the conclusion that GSD lineages go extinct at a higher rate than TSD ones, with the converse, , supporting higher TSD extinction. To ensure that the MCMC search sample throughout the parameter space, we ran two chains starting from the top two MLE points. In the squamates/lizard datasets these two chains failed to converge, getting stuck in separate hills of the likelihood surface, leading to opposite interpretation of the data ( Figure S1). The chain that resulted in higher TSD speciation ( Figure S1a), which is compatible with the ML analysis presented in the main text, sampled the parameter space at substantially higher likelihood surface compared to the chain that resulted in higher GSD speciation (Figure S1b) (average difference between the two chain ca. 80 loglikelihood values). Figure S2. BiSSE MCMC results of using the two best MLE points as the starting points for the MCMC sampler using the lizard dataset. Posterior probability density distributions from MCMC analyses are shown for speciation rates for GSD (pink) and TSD lineages (blue). The MCMC starting point values of each parameter are marked by vertical lines. The red lines show the prior distribution (set according to a trait-independent model). (a) MCMC chain was initiated from the best-fitted ML point, leading to estimation of higher speciation rate in TSD lineages. (b) MCMC chain was initiated from the second-best-fitted ML point, leading to estimation of higher speciation rate in GSD lineages.

SIMULATIONS TO TEST EFFECT OF MISSING DATA ON TRANSITION RATE ESTIMATES IN BiSSE
We also tested the effect of missing data on the estimation of the transition rates in BiSSE, given that information for a substantial portion of extant lizards is lacking. For this, we simulated random trees with 1,000 tips with equal speciation rates (lambda = 0.1), no extinction, and varying transition rates (q01 = 0.1, q10 = 0.1, 0.05, 0.025) and carried out 100 simulations for each parameter combination. In each simulation, the data were analyzed by BiSSE with 100, 25, or 5% of the state data ( Figure S3). The estimated transition rates are shown in Figure S3. The results illustrate that missing data leads to increased variance in the estimated transition rates, although the average estimation is rather accurate.