New insights into the formation of biodiversity hotspots of the Kenyan flora

This study aimed to investigate the distribution patterns of plant diversity in Kenya, how climatic fluctuations and orogeny shaped them, and the formation of its β‐diversity.


| INTRODUC TI ON
Kenya intersects three biodiversity hotspots, that is, Eastern Afromontane, Coastal Forests of Eastern Africa and Horn of Africa (Marchese, 2015). The complex topography is marked by largescale extensional features, such as the East African Rift, widespread volcanic activity, and anomalously subsided basins and uplifted domes, which profoundly influenced the distribution and amount of rainfall under control by the African-Indian monsoon and westerly moisture sources, resulting in a richness of species (Couvreur et al., 2021;Griffiths, 1993;Wichura et al., 2010). These geological processes also led to stronger temperature and altitude gradients from the coastal plains to the uplands of the East African Plateau, which created an extensive mountainous area with cooler temperatures (Linder, 2017). Accompanied by the paleoclimatic fluctuations, such as the Last Glacial Maximum, the extensive mountainous area finally created a patchwork of closely spaced humid and arid sedimentary environments that support diverse vegetation and became centres of biodiversity and endemism (Dagallier et al., 2020;de Menocal, 1995;Linder, 2001;Maslin et al., 2014;Pierre et al., 2006).
It is well known that orogeny has promoted topographic heterogeneity and created new habitats for species to evolve and diversify (Antonelli et al., 2018). Many studies have argued that mountains are both cradles and museums and are sources of new evolutionary lineages that later colonized surrounding lowlands or nearby alpine mountains (Antonelli et al., 2009;Ding et al., 2020;Hutter et al., 2017). Irrespective of any focus on biological processes, the mountain-geobiodiversity hypothesis was proposed to explain the high levels of biodiversity in mountain regions (Muellner-Riehl et al., 2019). The mountain-geobiodiversity hypothesis is based on three conditions: (a) the presence of lowland, montane and alpine zones (areas with a high diversity of niches); (b) climatic fluctuations for a "species pump" effect (drivers of species divergence); and (c) high-relief terrain with full elevational zonation (forming candidate refugia regions and easily migrated regions; Muellner-Riehl et al., 2019). Biotic and abiotic factors may act in parallel to biodiversity patterns, even though we consider them separately according to different hypotheses.
It is challenging to account for both ecological and evolutionary processes when depicting biodiversity patterns (Graham & Fine, 2008;Saladin et al., 2019). In the face of this challenge, a widely recognized approach is to incorporate phylogenies into biodiversity analyses (Helmus et al., 2007;Miles & Dunham, 1993;Pagel, 1999;Swenson, 2013). β-diversity interacts with α-diversity gradients, and both result in community assembly through local and regional filters. Thus, β-diversity may capture the dynamic nature of biodiversity patterns better than simply measuring alpha diversity (Soininen et al., 2018). Turnover and nestedness are two components of β-diversity. The former refers to β-diversity attributable to species replacement, whereas the latter indicates species loss or gain and relates to richness differences between the communities (Baselga, 2010). Phylogenetic β-diversity is usually applied to measure the dissimilarity of phylogenetic composition between different assemblages (Anderson et al., 2011;Baselga, 2012). It considers phylogenetic relatedness to explain biota evolutionary history and biodiversity patterns across environmental gradients on regional or global scales (Dobrovolski et al., 2012). Understanding the processes that organize biodiversity in time and space helps design robust protected area networks (Socolar et al., 2016).
To date, there is no specific research on the delineation of the Kenyan flora. According to White (1983), most areas of Kenya between the highlands and the coastal belt are located in the Somalia-Masai region; the remaining parts belong to the Afromontane archipelago and the Lake Victoria regions. Under the bioregionalization framework of the African continent, Linder et al. (2012) suggested that East Africa (from Kivu of the Democratic Republic of Congo, Uganda, Kenya, to northcentral Tanzania) is a large Regional Mosaic of intermingling vegetation types with distinct floras. This region falls into the transition zone between the dry north and south and the humid west (Fayolle et al., 2019;Linder et al., 2012). Other classifications roughly delineated Kenya into the East African montane region and the Coastal Region (Droissart et al., 2018;Fayolle et al., 2014). The African montane region was formed by a continental magmatic arc of hundreds of volcanic structures with a north-south orientation, from the southern margin of the Turkana basin to Kajiado (Baker et al., 1971;Feibel, 2011;Global Volcanism Program, 2013;Hetzel & Strecker, 1994;McCall & Bristow, 1965;Williams, 1970;Williams et al., 1984). It is crucial to measure its spatial phylogeny and dissect its flora. Evolutionary biodiversity studies of the Trans-Mexican Volcanic Belt suggested that climatic fluctuations, together with recent volcanism, have driven the diversification and local persistence of biodiversity in the Mexican highlands (Mastretta-Yanes et al., 2015;Rodríguez et al., 2018;Ruiz-Sanchez et al., 2013;Sosa et al., 2018). We expect that there may be areas with strong turnover along the precipitation gradient extending from the volcanic to the coastal area, which is consistent with the delimitation of dry and wet habitats.
The volcanic area is mainly distributed in western to central Kenya Volcanic Region. The high turnover component in climatically stable regions may have preserved old lineages and the prevalence of endemic species within narrow ranges.

K E Y W O R D S
biodiversity hotspots, latitudinal gradient, nestedness, phylogenetic structure, phytogeographical regions, turnover, β-diversity and coincides with the Eastern Afromontane biodiversity hotspot, while the coastal area in eastern Kenya overlaps with the Coastal Forests of Eastern Africa hotspot. As the environments are not functionally equivalent to determining community composition (Gomez et al., 2019), we hypothesized that high phylogenetic turnover was found in volcanic and coastal areas, whereas low phylogenetic turnover prevailed in dry areas.
Here, we investigated the patterns of phylogenetic turnover and identified the phytogeographical regions of Kenya based on the spatial phylogenetic analysis of angiosperms. The aims of our study were as follows: (a) to delineate the phytogeographical regions of Kenya; (b) to test whether high phylogenetic turnover component occurs in the volcanic area; and (c) to investigate the underlying mechanisms that shaped the β-diversity patterns in the biodiversity hotspots of East Africa.

| Distribution records
The comprehensive angiosperm checklist of Kenya was compiled based on the data from the Flora of Tropical East Africa (FTEA;Zhou et al., 2017). We gathered detailed distribution information of each species from FTEA, local checklists and other published sources (Agnew & Agnew, 1994;Fischer et al., 2011;FTEA editors, 1952FTEA editors, -2012Girma et al., 2015;Luke, 2005) and then transformed them into county-level distributions following the contemporary administrative divisions of Kenya. In addition, we collected the species occurrence data of angiosperms from RAINBIO (Dauby et al., 2016) and the Global Biodiversity Information Facility (GBIF, https://doi. org/10.15468/ dl.njawff). We used the R package "speciesgeocodeR" (Töpel et al., 2017) to filter the species occurrence in the datasets, checking for obvious errors such as empty coordinates, terrestrial species reported in the sea and points located outside the country boundary of Kenya. Information on species growth forms was assembled from RAINBIO and FTEA. Taxa with unclear growth form status were defined according to expert experience. The list of species growth forms is shown in Appendix S1. All the species names were standardized following The Plant List version 1.1 (available at http://www.thepl antli st.org) by the R package "Taxonstand" (Cayuela et al., 2012). After excluding alien and non-angiosperm species and filtering out species not in the checklist of Kenya, there were 36,567 coordinate occurrence records including longitude and latitude from RAINBIO and GBIF, and 15,997 county-level records from FTEA and other consulted sources. All the distribution records included 5822 species belong to 191 families and 1465 genera. There were 1843 species uniquely from FTEA and other sources and only 29 species uniquely from GBIF and RAINBIO (Table S4.2). Since some small geographical gaps in specimen collection still exist in the northeast parts of Kenya, we reduced the sampling bias caused by unequal sampling areas by up-scaling the distribution records, testing both 50-km × 50-km and 100-km × 100-km grid cells. We assigned the county-level distribution data to the overlapping grid cells (Details for other conversion method were provided in Appendix S4, Figure S4.2 and Table S4.1. Species list completeness assessment was provided in Appendix S9, Table S9.5 and Figure S9.13), and occurrence data were projected into the grid cells directly. Grid cells were excluded if the land area occupied <50% of the entire cell (i.e. <1250 km 2 ).
Because the distribution data consisted of presence-only data, we transferred the species into presence-absence data to extend their applicability (Appendix S4). Following these steps, we generated a species-level presence-absence matrix for the remaining 239 grid cells.

| Phylogenetic analyses
We sampled all available taxa for four plastid genes (atpB, matK, ndhF and rbcL) to reconstruct the phylogenetic tree of angiosperms.
We downloaded all available sequences of 2583 species for the target DNA regions from GenBank (last download on January 11, 2019). If two or more sequences were available for the same locus of a species, we kept the longest sequence. Moreover, we generated 1910 new sequences representing 815 species for which the sequence was unavailable or of low quality in the public database.
At least one copy of the specimen was deposited in the Herbarium of the Institute of Botany, Chinese Academy of Sciences (Herbarium Code: PE) for all the samples collected in the field. The methods for DNA extraction and amplification followed Chen's protocol (Chen, Yang, et al., 2016). The alignment was conducted with MAFFT (Katoh et al., 2019), and the relatively fast-evolving markers matK and ndhF were further refined with GENEIOUS R9 v.9.0.5 (Biomatters Ltd., Auckland, New Zealand; Kearse et al., 2012).
The matrix obtained by the above steps was used to reconstruct a preliminary phylogenetic tree using RaxML 8.2.12 (Stamatakis, 2014) on the CIPRES Science Gateway Web server (Miller et al., 2010)  Podocarpus milanjianus). We checked our preliminary phylogeny by comparing the single gene trees and combining them with those of previous studies to identify species with probable inappropriate systematic positions. For species with unexpected positions, we determined whether any of the sequences were chimeric or incorrectly identified using the BLAST tool in GenBank (Altschul et al., 1990).
We removed low-quality sequences, duplicates and ambiguous taxonomical samples, but kept the species at provisional status.
The final alignment included 3363 species, representing four gene regions and a total length of 8201 base pairs (bp). This combined matrix was divided into four partitions, and run on RaxML 8.2.12 (Stamatakis, 2014) with independent parameters on each model. The analysis comprised 1000 bootstrap replicates performed using the GTR + GAMMA model of nucleotide substitution.

| Time calibration
We used a penalized likelihood method as implemented in treePL (Smith & O'Meara, 2012) to conduct the time calibration of our phylogeny. Node calibrations were estimated using 72 fossil calibration points scattered among the angiosperms (Lu et al., 2018). The hard maximum and minimum age for angiosperms was set at 140 and 125 Myr, respectively (Lu et al., 2018). At the beginning, we set the smooth parameter free as 1000, and the "prime" option was performed to determine the optimal run parameter for the "thorough" option of the cross-validation analysis (opt = 1, optad = 1 and optcvad = 1). Then, we obtained the best optimization parameters for smooth (smooth = 0.1). Finally, the analysis was carried out again with the optimal smooth, and a time tree including 3363 angiosperm species native to Kenya was reconstructed (Appendix S10).

| Species tree reconstruction
With the above time tree as the backbone, a species-level tree including 5912 angiosperm species from Kenya was generated by inserting non-sampled congeneric species using the R package "V.PhyloMaker" (Jin & Qian, 2019). Our species-level tree covered almost 98.2% of the native and currently accepted species of Kenya (Appendix S5, Figure S5.3). This full tree was used for the phylogenetic diversity and phylogenetic β-diversity analyses.
Polytomies are particularly acute when missing tips are added to phylogenies based on taxonomic information by V.PhyloMaker. To evaluate the potential influence of polytomies within these analyses on the calculation of phylogenetic diversity and β-diversity, we combined taxonomic information with a time-calibrated backbone phylogeny to compatibly place unsampled lineages to a super-tree using a birth-death-incomplete sampling estimator ("TACT," Taxonomic Addition for Complete Trees, a kind of polytomy resolver; Chang et al., 2020). The results of phylogenetic diversity and β-diversity based on the TACT tree were then compared with the results calculated from the V.PhyloMaker tree.

| Metrics of taxonomic and phylogenetic βdiversity
Following Baselga (2012), we calculated three measures of taxonomic β-diversity, the Sørensen dissimilarity index (β sor ), the Simpson dissimilarity index (β sim ) and the nestedness index (β sne ) for each pair of grid cells. The total β-diversity was represented by β sor , where β sim was the species turnover part of β-diversity and β sne was the nestedness component of β-diversity. β sor and its two components are defined as follows (Baselga, 2012): where "a" is the number of shared species between two grid cells, and "b" and "c" are the number of species unique to each grid cell. β sim ranges from 0 (if species composition is identical between grid cells) to 1 (if there are no shared taxa). When applied to phylogenetic β-diversity, shared and unique species are replaced with shared and unique branch lengths, respectively (Qian et al., 2020). Phylogenetic turnover (pβ sim ), which is adapted from β sim , can be used to quantify the phylogenetic distance among grid cells (i.e. pairwise compositional turnover) (Hardy et al., 2012). pβ sim ranges from 0 (if species compositions are identical between grid cells and hence share the same branch lengths) to 1 (if grids' species compositions are distinct that share no similar branch lengths).
As pβ sim is strongly correlated with β sim , we constructed null models to calculate the random expectations of pβ sim based on a pairwise approach. A null distribution was generated for each grid cell by randomizing the tips of the phylogeny 999 times; this randomized the phylogenetic turnover between grid cells while preserving the underlying taxonomic turnover differences between grid cells. Null distributions were used to calculate standardized effect sizes of pβ sim (SES.pβ sim ), where the expectations of the null distribution were subtracted from the observed pβ sim and divided by the standard deviation of the null distribution. A positive SES indicated higher phylogenetic turnover than expected based on a random sampling of the regional flora; a negative SES indicated the opposite.
The analysis was performed in R 3.6.1 (R Core Team, 2019) with the R package "betapart" (Baselga & Orme, 2012). Under the Sørensen dissimilarity, we chose β sim and pβ sim as the indices to delineate biogeographical regions because they are weakly affected by differences in species richness between localities (Fabien & Anthi, 2014;Kreft & Jetz, 2010;). To evaluate the sensitivity of our results, we also compared the β sim and pβ sim values with the turnover component obtained from the Jaccard's dissimilarity index (β jtu and pβ jtu ; Baselga, 2012).

| Phylogenetic β-diversity analyses
We applied two approaches to investigate patterns of β-diversity across Kenya: a pairwise approach and a multiple-site approach. To explicitly consider the scale dependence of the results (Soininen et al., 2007), we used the closest three, four, eight and twelve grid cells adjacent to the focal grid ( Figure S5.4). For the pairwise approach (based on phylo.beta.pair function), we obtained the dissimilarity values between the focal grid and all adjacent grid cells and assigned the average value for the focal grid cell. In the multiple-site approach (based on phylo.beta.multi function), a single value was obtained from all compared samples (all grid cells in the window), and we assigned the single value to the focal grid cell. Since each site was a repeated sample in the pairwise approach, it was not independent; also, the pairwise approach ignores information on species shared among more than two sites. The multiple-site approach avoids the repetition of each site and prevents information loss when species appear in more than two sites (Diserud & Ødegaard, 2007). Besides, when the sites are viewed as a random set of observations from a region, evaluating overall similarity from the average of the pairwise similarities could be misleading (Diserud & Ødegaard, 2007). We herein compared two methods.
Based on the two datasets, we produced maps showing the relative importance of the turnover-resultant pβ sor component according to the formula: Values less than one indicated that pβ sor was mainly determined by nestedness-resultant dissimilarity (pβ sne ). In contrast, values greater than one indicated that pβ sor was the result of a greater influence of the dissimilarity component due to species replacement between grids (pβ sim ). The pβ ratio for woody and herbaceous genera was calculated separately to compare their spatial distribution patterns across Kenya. The map of pβ ratio based on the multiple-site approach on eight grid cells was then intersected with the map of nature reserves in Kenya (UNEP-WCMC and IUCN, 2022).

| Cluster analysis
Pairwise similarity among multiple sites is an appropriate approach for displaying species turnover along an environmental gradient or geodistance (Penone et al., 2016). Following Ye et al. (2019Ye et al. ( , 2020, the grid cells were clustered by applying eight hierarchical methods to β sim and pβ sim pairwise dissimilarity matrices and applied the cophenetic correlation coefficient and Gower's distance to access the ideal cluster method. All the analyses were implemented by the R package "stats" (R Core Team, 2019). An elbow criterion was employed to determine the optimum number of clusters, which is based on the two-level hybrid clustering approach (Zhao et al., 2011). This function was implemented by the "GMD" package (Zhao & Sandelin, 2014) in R. The analysis results showed that the unweighted pair-group method with arithmetic means (UPGMA) was the reasonable method (Table S4.3).

| Ordination analysis
Non-metric multidimensional scaling (NMDS) was performed to illustrate the spatial turnover and relationships between the clusters in an alternative way. The method is robust at simplifying the variables of a multidimensional space into a low-dimensional space (Minchin, 1987). Grid cells in the NMDS ordination plots were coloured according to their UPGMA cluster membership to visualize the relative distance between geographical regions. The degree of NMDS ordination was assessed by the stress value, which ranged from 0 to 1. Smaller values indicate that the NMDS results are more consolidated. The analysis was carried out using the "vegan" package (Dixon, 2003).

| Spatial phylogenetic analysis
2.6.1 | Phylogenetic diversity Faith's (1992) phylogenetic diversity (PD) is the sum of all phylogenetic branch lengths that connect species in a community.
Considering that Faith's PD is affected by species richness, we calculated the standardized effect size of PD (SES.PD) based on the taxa. labels model (Lu et al., 2018). Both PD and SES.PD were calculated by species tree by the R package "picante" (Kembel et al., 2010).

| Phylogenetic structure
The net relatedness index (NRI) and the nearest taxon index (NTI) were calculated to investigate the phylogenetic structure (clustering or overdispersion) of angiosperms across Kenya (Webb et al., 2002). and SES.MNTD were calculated based on the full species tree by the R package "picante" (Kembel et al., 2010), then transformed to NRI and NTI by multiplying by −1.

| Surrogate variable for volcanic influence
We investigated all occurrences, distribution and volume information of volcanos in Kenya (Guth, 2013). To investigate whether the pβ ratio pattern is related to the distance between volcanoes and grid cells, we first computed the Euclidean distances between the focal grid cell and all volcanoes in the grid cells group (here, we used one focal and eight neighbouring grid cells group as research unit); second, we computed the distances between the focal grid cell and the nearest volcano (volcano not shown in the nine grid cells group); third, for the grid cell groups with no volcano, we computed the distances between the focal grid cell and the nearest volcano directly; finally, the reciprocal sum of the Euclidean distances for a focal grid cell was used as the independent variable representing volcanic influence on pβ ratio . The relationship between the reciprocal distance and pβ ratio was modelled using a generalized linear model. We assumed that the closer the grid is to the volcano, the greater the impact of the volcano on pβ ratio . After calculating the inverse distance, we gave more weight to volcanoes closer to the focal grid. When the grid groups contain many volcanos, this processing method can effectively prevent a large distance sum causing a violation of our hypothesis.

| Phytogeographical regionalization
The UPGMA clustering of grid cell assemblages based on both pβ sim and β sim dissimilarity matrices yielded five major phytogeographical In particular, grid cells belonging to the West Highland Region overlapped with those belonging to the Volcanic Region.

| Patterns of phylogenetic β -diversity
The Jaccard and Sørensen indices produced similar results. They all showed significant correlations (p < .001), with correlation coefficients all above .91 (Table S4.4). We present the results for pairwise and multiple-site approaches with the Sørensen index, The pβ ratio was found to be significantly associated with the distance between a grid and a volcano, irrespective of growth forms ( Figure 3). The effect of the presence of a volcano on phylogenetic β-diversity decreased with the increase in the distance between grid F I G U R E 2 Geographic patterns of the reciprocal of the distance between grid and volcano. The bold line on the map represents the boundaries of floristic regions as shown in Figure 1. The circles under each letter correspond to the general location of the representative biomes in each region (a, the West Highland Region mostly with tropical rainforest dominated by Kakamega Forest and Nandi Forest; b, Volcanic Region with the famous Eastern Afromontane area, represented by Mount Kenya, Mount Aberdares and Mau Forest; c, Turkana Region, with the dry sandy and rocky river course region with Acacia-Commiphora bushland and Acacia-Grewia-Balanites bushland; d, Pan Coastal Region with lowland evergreen rain forest and dry forest; e, Mandera Region, which is more similar to the Horn of Africa.). The images provide an overview of the representative species and growth forms that make up each biome. Species morphology adaptation to different habitats, from western moist forests and alpine region to the eastern lowland woodland and coastal forests, can be observed. (a1-a3), Habenaria malacophylla, Brachycorythis buchananii and Leea guineensis; (b1-b6), Lobelia gregoriana, Dendrosenecio battiscombei, Erica arborea, Protea caffra subsp. kilimandscharica, Euphorbia candelabrum and Calodendrum capense; (c1-c3), Adenium obesum, Balanites glabra and Commiphora schimperi; (d1-d3), Aerangis kirkii, Pseudobersama mossambicensis and Isolona cauliflora; (e1), Albuca abyssinica. and volcano. The observed pβ ratio showed a general monotonic decline with increasing latitude in the north of the equator on pairwise and multiple-site approaches and growth forms (woody and herbaceous; Figure S7.10). The phylogenetic turnover peaked at the equator and at latitude 4°S, which corresponds to the volcanic and coastal area. The lowest phylogenetic turnover was represented by the Turkana Basin ( Figure S7.10a-c). In Kenya, the main volcano distribution area (the Western part of Kenya between 2°S and 2°N) has a distinctly different topography than the eastern part ( Figure S7.10a-c). These results from the V.PhyloMaker tree were consistent with those based on the TACT tree ( Figure S7.11), suggesting that polytomies did not significantly bias our findings. The proportion of protected areas in the volcanic region was higher than in other regions ( Figure 5). These nature reserves are fragmented and the larger contiguous areas with nature reserves are distributed in coastal areas (Taita Hills).

| Spatial phylogenetic pattern
We obtained a series of spatial phylogenetic patterns, including species richness (SR), phylogenetic diversity (PD), standard effective size of phylogenetic diversity (SES.PD), NRI and NTI (Figure 4). The patterns of SR and PD were similar, presenting higher values in the volcanic and coastal areas (Figure 4a (Appendix S8, Figure S8.12a-d). Based on the TACT tree, a more significant phylogenetic overdispersion of NTI was detected in the volcanic area ( Figure S8.12e).

| DISCUSS ION
Our study is the first quantitative synthesis of β-diversity components in a volcano-dominated landscape. It allows starting to understand the causes underlying turnover and nestedness and the formation history of the Kenyan flora.

| Formation of Kenyan flora
In this study, both phylogenetic dissimilarity and taxonomic dissimilarity approaches yielded five floristic regions in Kenya.
The shape and boundary of each region are broadly similar. The circumscription of the Mandera Region and Turkana Region varies between the two approaches, which might be caused by how evolutionary history is accounted for in the pβ sim method. The pβ sim method considers the phylogenetic distance of lineages from distinct clades segregated along grid cells in the Mandera and Turkana regions, which reflects the long compositional history F I G U R E 4 Geographic patterns of species richness, phylogenetic diversity (PD) and standardized effect size of phylogenetic diversity (SES.PD) and phylogenetic structure (NRI and NTI) estimated under the V.PhyloMaker tree. The maps were generated using ArcGIS 10.6 in Universal Transverse Mercator projection.
within each grid cell. However, it is hard for the β sim method to detect the compositional history. This situation exacerbates when old lineage extinction and new radiation lead to consistent nestedness patterns along the homogeneity habitat. As both areas were covered by a homogeneous habitat and presented high nestedness, lineage exchange was more important than species exchange (Peixoto et al., 2017). Referring to pβ sim , the Mandera Region may be more similar to the Horn of Africa, representing a unique composition of floristic history (Wang et al., 2019). However, results from β sim suggested that the Mandera Region broadly included the eastern part of Kenya, which was defined as the Somalian Region before (Linder et al., 2012). The delineation of the Mandera Region based on pβ sim might be more reasonable, considering the sampling coverage Highland Region varies between the results of pβ sim and β sim . We followed the results of pβ sim , as in the Mandera Region. Therefore, because of the variable boundary of the small regions shown here, it is somewhat difficult to explain which is more reasonable and needs to be considered in parallel with the nature of pβ sim and β sim . On the contrary, some species near the northern and western boundaries of Kenya may be distributed in only a few localities within Kenya (restricted to the Mandera and West Highland Regions) but have a much wider range extending beyond these regions. This may also lead to bias in the cluster analysis results. The boundary of the Volcanic and Pan Coastal regions was broadly similar to the high turnover area of Kenya. The phytogeographical regionalization results implied potential distinct evolutionary histories of these two regions, even though they are both high turnover areas.
The nestedness and turnover components of phylogenetic βdiversity are higher than the random expectation in climatically and geologically unstable regions, implying that the flora of Kenya has been impacted by climate change and orogeny ( Figure S7.9a,b).
Within a region like Turkana under the influence of past climatic change, phylogenetic diversity would be lower than the randomized phylogenetic diversity expectation from whole phylogenetic tree when the species are clustering in specific clades (Figure 4c). This means that the extant species from specific clades that adapted to a similar environment have suffered from strong environmental filtering and show strong phylogenetical signals. Diverse habitats are currently distributed in the Volcanic Region, in which the Miocene paleobotanical records also document a great variety of environments and habitats, ranging from evergreen to deciduous forests and from wet to dry woodlands (Bonnefille, 2010). Changeable habitats may have already existed in Kenya for a long time under several climate and inter-climate optimal phases (Westerhold et al., 2020).
The interaction between climate and orogeny driving the formation of unique montane forest types, savanna or alpine vegetation could serve as a source of new species to neighbouring areas. The high turnover components observed in climatically and geologically stable regions such as the coastal area could be caused by the preservation of ancient lineages.

| Macroevolutionary mechanisms underlying phylogenetic nestedness
The β-diversity pattern in low latitude areas has been reported to be dominated by turnover (Dobrovolski et al., 2012;Soininen et al., 2018). However, we found several areas that were dominated by nestedness at low latitudes, which may have been shaped by different evolutionary histories compared with turnover-dominated areas. Under the floristic evolutionary history, long-term climatic fluctuations that potentially affect biodiversity need to be considered (Muellner-Riehl et al., 2019). Environmental filtering plays a strong role in determining species' composition assemblage at a regional/local scale (Swenson et al., 2012); the assembly only maintains the species that can tolerate the limits of the environment (Cardillo, 2011). The phylogenetic niche conservatism hypothesis indicates that closely related species tend to adapt to similar environmental conditions; thus, an assemblage may become phylogenetically clustered F I G U R E 5 Distribution of nature reserves in the geographic pattern map of pβ ratio (the relative importance of the turnoverresultant pβ sor component). (Cardillo, 2011;Wiens & Donoghue, 2004). Typically, the amount and seasonal distribution of rainfall in East Africa is mainly controlled by three factors, that is, the Intertropical Convergence Zone (ITCZ), the interaction of moist airflows from the Atlantic Ocean and the Indian Ocean at the continental scale, and the regional scale orogeny (Linder, 2017). These factors could be the best predictors of the distribution of modern East African patchy vegetation. Plio-Pleistocene aridification events in East Africa deeply reshaped the vegetation from closed canopy to open savanna (Bobe, 2006;Cane & Molnar, 2001;de Menocal, 1995). Gradually, the progression towards a harsh environment took place at the eastern equator and the northern parts of Kenya. Besides, global scale aridification events and topographic change at the regional scale induced by rifting played a dominant force in the late Neogene climatic formation in East Africa rather than a background role (Pierre et al., 2006). to the studies that focused on the extremely low temperature of the Last Glacial Maximum phase, which shaped nestedness patterns (Dobrovolski et al., 2012;Qian et al., 2020), our results also support that high extinction risks in a harsh environment might have been critical to shaping the nestedness dominant phylogenetic β-diversity pattern of Kenya.

| Macroevolutionary mechanisms underlying phylogenetic turnover
The highest turnover was found in volcanic and coastal areas ( Figure S7.10), which represent the Eastern Afromontane hotspot and Coastal Forests of Eastern Africa hotspot, respectively. The turnover shows a non-monotonic relationship between high and low latitudes, a pattern found in North America as well (Xing & He, 2019). In the vicinity of the Rift, escarpments, mountains and volcanoes present a succession of different vegetation belts, that is, savanna and woodland at low elevation, mountain forest at mid-elevation, and afro-alpine heathland at high elevations (above 3000 m), following increasing rainfall and decreasing temperature altitudinal gradients (Bonnefille, 2010). Thus, a higher phylogenetic turnover was well predicted by the greater ecological opportunity in the heterogeneous landscape of the Volcanic Region. Because of the highly heterogeneous topography, the volcanic area is known as a "sky island" and is considered a natural laboratory for testing the progress of species diversification and colonization (Cox et al., 2014).
Significant phylogenetic Neo-and paleo-endemism was detected in the volcanic area (Dagallier et al., 2020). The NRI in the volcanic area showed phylogenetic clustering at a deep level, meanwhile, the NTI showed phylogenetic overdispersion at a shallow level (Figures 4 and   S8.12). The higher proportion of clustered assemblages for NRI compared with NTI parallels the high species richness and high PD in the volcanic area. This pattern suggests that if there were a niche filled by ancient species, the interspecific competition would increase at a local scale as the niche becomes saturated. The higher proportion of overdispersion assemblages for NTI compared with NRI indicates the possibility that species are rapidly radiating in a suitable habitat of the Volcanic Region. This pattern suggests that both opportunities and challenges coexist in the volcanic regions. In terms of the opportunities, geological events contributed greatly to the ecological opportunities for speciation and colonization; in terms of the challenges, frequent geological events were a disadvantage to old lineages accumulation. The accumulation of young lineages caused by in situ differentiation and colonization promoted the floristic diversification in the Southeast Asia and North America volcanic areas (Merckx et al., 2015;Rodríguez et al., 2018;Sosa et al., 2018), and the African volcanic areas also have the similar routes (Blackburn & Measey, 2009;Chen, Wang, & Renner, 2016;Gizaw et al., 2016;Kandziora et al., 2016).
A larger proportion of overdispersion for NRI was detected in the coastal area compared with NTI than in the volcanic area. The coastal area environment is stable under the influence of aridification, including part of the East Arc Mountain (Lovett et al., 2005), where the flora has strong affinities to that of the western Guineo-Congolian Region (Lovett, 1993). This is a relic of tropical forests continuously distributed since the early Tertiary (Couvreur et al., 2008). Ancient lineages such as Platypterocarpus, Neohemsleya and Brochoneura can be observed in this area. The coastal area was once a refuge for a high percentage of tropical elements that were widely distributed in Tropical East Africa. Thanks to environmental stability, the ancient lineages persisted and generated habitat specialists in coastal areas.
These relic elements, in turn, mainly contributed to the high turnover component pattern in this area.

| Biological conservation within hotspots
The distribution of protected areas in Kenya roughly coincides with hotspots of phylogenetic turnover (pβ ratio > 1) that are evenly distributed along coastal-volcanic regions. Protected area networks like those in Kenya could capture species turnover along spatial or environmental gradients and the inestimable potential risk of species loss (Tuomisto et al., 2003). Small protected areas are scattered between larger ones and serve as corridors (especially in the Eastern Afromontane Region) and could facilitate species range-shifts in response to climate change to decrease the likelihood of local and regional extinction (Fagan, 2002;Pearson, 2006). Given that conservation is underfunded, protected area selection should be optimized (Venter et al., 2014). The creation of few large and several small reserves could reduce conflicts between agricultural development and biodiversity conservation.

| Limitations
According to the species-area relationship (SAR) model constructed by Qian et al. (2021), we found that the average completeness of species in the grid cell for the data from GBIF and RAINBIO is only 11% in Kenya. The current GBIF and RAINBIO data along are not sufficient in addressing biogeographic and macroecological questions for the flora of Africa (Qian et al., 2021).
While, after combining with data from FTEA and other consulted sources, the average completeness of species in the grid cell reached 70.3% in Kenya (Appendix S9, Table S9.5). Data from FTEA and other consulted sources not only improve the completeness of Kenya angiosperms list (Tables S4.1 and S4.2), but also are more evenly distributed and can reduce the geographic bias of GBIF and RAINBIO data ( Figure S9.13b). In addition, species list completeness estimation by SAR model may be higher than the actual richness in eastern and northern Kenya or lower than actual species richness in volcanic area of Kenya. Therefore, the grid cells distributed in the volcanic area of Kenya with species completeness between 51% and 100% should be the focal area of plant investigation in the future ( Figure S9.13c). The species richness of these grid cells may be much higher than estimated value. Based on this fact, our current analysis may underestimate the level of species/phylogenetic diversity in this area. However, a more complete set of data will supply to identify local diversity characteristics underneath regional patterns. With the further improvement of distribution data, the understanding of the formation mechanism of plant diversity pattern in Kenya will become more specific and refined.
Assigning each county-level distribution record to 50km × 50-km grid cells can lead to some false occurrences. The greater the difference between county and grid area, the more likely there will be such erroneous distribution data and bias may be introduced in the biogeographic analysis. Comparing the results from 50-km × 50-km grid cells (small scale) to those from a 100-km × 100-km grid cell (large scale), we found that PD and pβ ratio were not sensitive to the grid scale and the patterns were broadly similar (Figures 3 and 4b; Appendix S9, Figure S9.14a,b), while the boundaries of the phytogeographical regions varied (Figures 1b and S9.14c). The West highland and Volcanic regions produced by small-scale grid cells matched the vegetation map ( Figure S3), while the larger-scale grid cells could not distinguish the boundaries between rainforest and montane forest. This may be due to the small county size and the highly heterogeneous habitats. While, the Turkana and Mandera regions, both of which have homogenous habitats (mainly the Acacia-Commiphora bushland or the semi-desert scrub), were less sensitive to the way we processed distributed data. For the Pan Coastal Region, neither the small nor the larger scale could identify reasonable boundaries that could separate the inland bushland and thicket from the coastal vegetation. This is because there are fewer, larger and highly heterogeneous counties distributed here. In the future, more coordinated data could help delineate the boundaries more accurately. Despite limitations in the way we approached the distribution records, we still believe that the patterns on 50km × 50-km grid cells reasonably explained the biogeographic formation of the flora.

| CON CLUS IONS
Our results showed that climatic fluctuations and orogeny influenced the phylogenetic β-diversity pattern of the Kenyan flora. The turnover and nestedness components of β-diversity display distinct geographic patterns, reflecting contrasting legacies of past climate change and orogeny. We suggest that the complex topography of the Volcanic Region generated high climatic complexity with environmental gradients along elevation and latitude, promoting a high turnover component that may explain why the Volcanic Region is the most diverse floristic region of Kenya. Areas with long-term climate stability are prone to keeping ancient lineages, which are the main elements that contribute to the turnover dominant region. This study contributes to elucidating the evolutionary history and formation mechanism of the Kenyan flora and improves the understanding of the key factors responsible for the biodiversity patterns of the hotspots of East Africa.

ACK N OWLED G EM ENTS
We would like to thank Yang-Jun Lai, Qian Zhang, Yu-Chang Yang and Yi-Shuo Liang for providing helpful suggestions. We thank Hai-Hua Hu, Yun Liu and Yan-Ting Niu for assistance with data analysis.
We thank two anonymous reviewers and the editor for constructive feedback on earlier versions of this manuscript. This study was  (2021077) and Vietnam National Foundation for Science and Technology Development (NAFOSTED, grant number: 106.03-2019.12).

CO N FLTI C T O F I NTE R E S T
The authors declare that they have no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets that support the findings of this study are available on Dryad, and DNA sequences were deposited in the GenBank with accession numbers offered in Dryad at https://doi.org/10.5061/ dryad.18931 zd0p.