Timing and ecological priority shaped the diversification of sedges in the Himalayas

Background Diversification patterns in the Himalayas have been important to our understanding of global biodiversity. Despite recent broad-scale studies, the most diverse angiosperm genus of the temperate zone—Carex L. (Cyperaceae), with ca. 2100 species worldwide—has not yet been studied in the Himalayas, which contains 189 Carex species. Here the timing and phylogenetic pattern of lineage and ecological diversification were inferred in this ecologically significant genus. We particularly investigated whether priority, adaptation to ecological conditions, or both explain the highly successful radiation of the Kobresia clade (ca. 60 species, of which around 40 are present in the Himalayas) of Himalayan Carex. Methods Phylogenetic relationships were inferred using maximum likelihood analysis of two nuclear ribosomal DNA (nrDNA) regions (ITS and ETS) and one plastid gene (matK); the resulting tree was time-calibrated using penalized likelihood and a fossil calibration at the root of the tree. Biogeographical reconstruction for estimation of historical events and ancestral ranges was performed using the dispersal-extinction-cladogenesis (DEC) model, and reciprocal effects between biogeography and diversification were inferred using the geographic state speciation and extinction (GeoSSE) model. Climatic envelopes for all species for which mapped specimen data available were estimated using climatic data from WORLDCLIM, and climatic niche evolution was inferred using a combination of Ornstein-Uhlenbeck models of shifting adaptive optima and maximum likelihood inference of ancestral character states under a Brownian motion model. Results The Himalayan Carex flora represents three of the five major Carex clades, each represented by multiple origins within the Himalayas. The oldest Carex radiation in the region, dating to ca. 20 Ma, near the time of Himalayan orogeny, gave rise to the now abundant Kobresia clade via long-distance dispersal from the Nearctic. The Himalayan Carex flora comprises a heterogeneous sample of diversifications drawn from throughout the cosmopolitan, but mostly temperate, Carex radiation. Most radiations are relatively recent, but the widespread and diverse Himalayan Kobresia radiation arose at the early Miocene. The timing and predominance of Kobresia in high-elevation Himalayan meadows suggests that Kobresia may have excluded other Carex lineages: the success of Kobresia in the Himalayas, in other words, appears to be a consequence largely of priority, competitive exclusion and historical contingency.


Introduction
The habitat and topographic diversity of mountains make them important centers of 249 (total 966 species in which 51 endemic to Himalayas, 838 non-Himalayas while 77 species present 250 in both regions) sense biogeographic coding (using supplementary Table S2 for geographic states 251 coding to narrow and broad sense coding and resulting dated phylogeny). Diversification rate was 252 estimated in three states: A representing species endemic to the non-Himalayas, B representing 253 species endemic to the Himalayas, and AB representing species present in both regions. In this 254 model, we estimate speciation rates of species in region A (sA) and B (sB), as well as sAB, 255 speciation rates for taxa that give rise to two daughter species, one in each region. Likewise, 256 geographical range expansion from A or B to AB were estimated with rates of dispersal dA and 257 dB respectively and range contraction with rates of extinction xA and xB. 258 Full and constrained alternative GeoSSE models were tested: the full model, in which rates 259 of speciation, extinction and dispersal differed among two regions; a constrained model in which 260 speciation in taxa distributed across two regions was set to zero (sAB ~ 0); and a model in which 261 rates of speciation and extinction were constrained to be the same between regions (sA ~ sB; xA 262~ xB). Models were compared using the Akaike Information Criterion. The best-fit model was also 263 fitted using Bayesian Markov chain Monte Carlo (MCMC) for 100,000 generations and an 264 exponential prior to estimate parameter distributions. In the BSM analysis, in total 1000 stochastic mapping replicates with 50000 maximum 277 trees per branch were conducted on ML tree. The alternative biogeographic model implemented 278 in BioGeoBEARS was not considered, as the inclusion of the jump parameter is not directly 279 testable relative to non-jump models (Ree & Sanmartin, 2018) and introduces complexities that 280 are not necessary to explain our biogeographic scenarios. 281 282 Climatic niche modelling 283 To characterize climatic envelopes for the species for which specimen data were available, we 284 extracted 19 bioclimatic variables from WORLDCLIM data using the R package raster 285 (WorldClim v1.4; Hijmans et al., 2005). The average value for species occurrence data and 19 286 bioclimatic variables from a total of 965,556 data for 850 unique species were estimated. The 287 obtained data for 838 species were further proceeded after removing outliers from WORLDCLIM 288 data. For each bioclimatic variable, ancestral character states were estimated assuming a Brownian 289 motion trait evolution process using the 'fastAnc' function in phytools (Revell, 2012). Then non-290 metric multidimensional scaling (NMDS) ordination was employed using Euclidean distances 291 normalized to unit variance on bioclimatic data for the tips as well as the internal nodes of the tree. 292 The stress from K=1 to K=10 was calculated and plotted stress against dimension to estimate how 293 much additional information was extracted from the bioclimatic data with each additional NMDS 294 axis.

295
Data were plotted with internal nodes of interest, representing ancestors of Himalayan  338 Himalayan taxa (59 species) in core Carex occurred as small clades or isolated (single) species 339 dispersed throughout the clade. Our topology may be poorly supported in some clades, but the 340 groups and branching we retrieved was fully compatible with the Global Carex Group (2016) tree, 341 where the clades retrieved were more strongly supported.

353
Monophyly of the Himalayan taxa within each major clade was strongly rejected at the 354 0.01 level based on the Shimodaira-Hasegawa test, except for the Kobresia species in the core 355 Unispicate clade, where monophyly cannot be rejected (Table 3). We focused an additional test on 356 the monophyly of Kobresia, in which Kobresia was constrained to be monophyletic and only the 357 Unispicate taxa included. In this test, monophyly of Kobresia was not significantly rejected (ΔlnL 358 = -1.32). Thus, while Kobresia was found to be polyphyletic in previous work (Global Carex 359 Group, 2016) and in the current study, we considered this result poorly supported by the data, 360 pending additional study. 361 362 Ancestral range reconstruction 363 Initial biogeographic reconstruction under the DEC model was conducted on the "narrow sense" 364 (Supplementary file S9) biogeographic coding, which failed to recover any clade endemic to the 365 Himalayas, which is at odds with our observation that numerous species are predominantly 366 Himalayan in origin. We thus present here analysis of data coded in the "broad sense" (see 367 methods; Supplementary file S10), using the stochastic mapping (BSM) method on the DEC model 368 as implemented in BioGeoBEARS. We generated 1000 stochastic maps in every 50000 trees per 369 branch in BSM analysis. In each stochastic map, total event counts of cladogenetic events were 370 higher which were based on vicariance (5.6% events) and sympatric (94.4% events) processes than 371 that of anagenetic which were based on only dispersal events (Supplementary Table S7). However, 372 an interesting finding in BSM results was that none of the events showed "range switching 373 dispersal", while all the observed dispersal was of "range expansion".

374
Under the broad sense biogeographic coding through BSM, 128 Himalayan taxa were 375 distributed among three major clades: 1) Vignea (16%), 2) core Unispicate (29%) and 3) core 376 Carex (54.6%) (Figure 1)  Manuscript to be reviewed 402 Carex dispersals were inferred from the Himalayas into these four regions and particularly 403 Australasia. However, our sampling was somehow biased towards the Nearctic region, this was a 404 direct consequence of using the Global Carex Group (2016) dataset, which is the largest and most 405 complete Carex dataset to date (>50% of the total diversity).

406
407 Diversification dynamics of Himalayan species 408 Among the three GeoSSE models evaluated with AIC values, the best-fit model (Table 4) (Figure 1) is remarkable and at odds with our expectations at the outset of the study. It is 544 also somewhat remarkable that dispersal rate out of the Himalayas (0.28 events per million years) 545 is twice as high as diversification rate within the Himalayas (0.16 events per million years), and it 546 is perhaps telling that narrow sense biogeographic coding failed to retrieve any clades endemic to 547 the Himalayas.     Manuscript to be reviewed  Manuscript to be reviewed

Estimates of diversification in Himalayan vs non-Himalayan lineages using Geographic
State Speciation and Extinction (GeoSSE) models.
In these models, the Himalayas construed broadly or narrowly (see methods) is denoted as area B; areas outside the Himalayas are denoted as area A. Abbreviations: lnL = loglikelihood; AIC = Akaike Information Criterion; sA, sB = speciation rate in area A, B; sAB = speciation rate in taxa whose distribution includes both areas A and B; dA, dB = dispersal from area A to B or B to A respectively; xA, xB = extinction rate in area A, B.
1 Table 4 Estimates of diversification in Himalayan vs non-Himalayan lineages using Geographic State Speciation and Extinction 2 (GeoSSE) models. In these models, the Himalayas construed broadly or narrowly (see methods) is denoted as area B; areas outside the 3 Himalayas are denoted as area A. Abbreviations: lnL = log-likelihood; AIC = Akaike Information Criterion; sA, sB = speciation rate in 4 area A, B; sAB = speciation rate in taxa whose distribution includes both areas A and B; dA, dB = dispersal from area A to B or B to A 5 respectively; xA, xB = extinction rate in area A, B.