A modern pollen data set for the forest–meadow–steppe ecotone from the Tibetan Plateau and its potential use in past vegetation reconstruction

The relationship between modern pollen and vegetation provides the basis for the interpretation of stratigraphic pollen assemblages and the quantitative reconstruction of past vegetation. We selected 168 topsoil samples from four different vegetation types on the south‐eastern Tibetan Plateau to explore the relationships between pollen assemblages, vegetation and climate. The results reveal that pollen assemblages discriminate the vegetation types well; the subalpine coniferous and evergreen broad‐leaved forest is characterized by a high proportion of arboreal taxa (e.g. Pinus, Picea, Betula); the alpine shrub and meadow and alpine steppe vegetation types are dominated by Cyperaceae, followed by Artemisia and Brassicaceae; and the alpine steppe‐shrub is characterized by a high percentage of Artemisia, with Cyperaceae, Asteraceae and Brassicaceae as common taxa. Redundancy analysis shows that mean temperature of the coldest month (Mtco) is the main climatic factor that influences pollen distribution. Pollen diversity indices (including richness and evenness) gradually decrease from SE to NW. The random forest classification has good performance in distinguishing vegetation types. Our study supplies a comparatively detailed description of the relationship between the pollen assemblage and vegetation in the forest–meadow–steppe ecotone on the south‐eastern Tibetan Plateau. In addition, the random forest model has potential application for reconstructing the past vegetation type of the fossil pollen spectra on the south‐eastern Tibetan Plateau.

Analysis of fossil pollen preserved in sediments is one of the main approaches to reconstructing past vegetation and provides an important basis for inferring palaeoclimate (Birks 2019). Understanding the modern pollen distribution and its driving factors is essential for reconstructing past vegetation quantitatively from fossil pollen assemblages; expanding the modern pollen data set and ensuring its quality will therefore improve the precision of quantitative reconstructions of past vegetation and climate (Cao et al. 2018, 2021). A preliminary modern pollen data set covering the whole of China was established at the end of the last century (Sun et al. 1999). The data set was extended over the following years and has been employed in past vegetation and climate reconstructions (Cao et al. 2014; Zheng et al. 2014; Herzschuh et al. 2019). At a regional spatial scale, modern pollen data sets and their relationship with vegetation and climate have been compiled for north-western China (Luo et al. 2009; Qin et al. 2015; Huang et al. 2018), north-eastern China (Li et al. 2012b, 2015; Han et al. 2020), north-central China (Xu et al. 2009, 2010) and the Tibetan Plateau (Yu et al. 2001; Shen et al. 2006; Herzschuh et al. 2010; Lu et al. 2011; Qin 2021). However, geographic gaps in sampling sites still exist, for instance in the forest-meadow-steppe ecotone on the south-eastern Tibetan Plateau, which restricts our understanding of how modern pollen represents vegetation and climate.
As a unique ecological unit, with an average elevation of more than 4000 m a.s.l., complex geomorphology and extreme climatic conditions, the ecosystem on the Tibetan Plateau is quite sensitive to climate change, and its future is uncertain under global climate change (Yao et al. 2017). Owing to its ecological importance, long-term change patterns and driving factors of vegetation on the Tibetan Plateau can provide critical insights for developing strategies to sustainably manage its ecosystem (Piao et al. 2015; Zhu et al. 2015; Ma et al. 2019). The Tibetan Plateau, as an independent alpine ecosystem, has abundant flora and a complex vegetation composition, and the vegetation distribution in this ecotone has often been studied owing to its high plant diversity and sensitivity to spatial-temporal dynamics (Shen et al. 2008a; Li et al. 2019a). An ecotone is a basic landscape unit, and is the transition area between adjacent ecological systems (Wilson & Agnew 1992; Laurance et al. 2001). It has characteristics of macrocosm, dynamism and transition, and is a region where ecological structure and function change rapidly on temporal and spatial scales (Gosz 1992). The south-eastern Tibetan Plateau has significant vegetation succession within a relatively limited spatial distance and long elevation gradients and includes forest-meadow and meadow-steppe ecotones. The forest-meadow ecotone is generally distributed along the 4400 m a.s.l. contour line, and close to the treeline ecotone in the south-eastern Tibetan Plateau (Wu 1989; Liang et al. 2010), while the meadow-steppe ecotone mainly aligns with the 400 mm isohyet (Tibetan Investigation Group 1988; Shen et al. 2008a) (Fig. S1). Past shifts in the ecotone's vegetation could thus reflect the fluctuation of past climate, making fossil pollen an effective proxy of ecotone shifts and their response to climate change.
Previous palynological studies on ecotones of the south-eastern Tibetan Plateau have mainly focused on the relationship between pollen distribution and elevation (Lu et al. 2004; Li et al. 2012a; Zhang et al. 2012), while the relationship between pollen distribution and climate variables in the ecotone, and the quantitative reconstruction of past vegetation and climate based on the modern pollen data set, have been less investigated. Hence, more high-quality pollen data with an even distribution across the environmental landscape are necessary to assess how representative modern pollen is of the vegetation and climate of the forest-meadow-steppe ecotone on the Tibetan Plateau and to improve the precision of past climate and vegetation reconstructions.
In this study, we collected 168 topsoil samples from four vegetation types across the forest-meadow-steppe ecotone on the south-eastern Tibetan Plateau for palynological analysis using ordination, machine learning and diversity estimates. The vegetation types are subalpine coniferous and evergreen broad-leaved forest; alpine shrub and meadow; alpine steppe; and alpine steppe-shrub. The major objectives of our research are: (i) to investigate the representation by modern pollen of the vegetation and climate in the forest-meadow-steppe ecotone; and (ii) to evaluate the suitability of this modern pollen data set for vegetation reconstruction.

Study area and sample collection
The study area is part of the south-eastern Tibetan Plateau, ranging from 86.69 to 100.4°E and from 28.21 to 35.77°N, with elevation ranging from 2100 to 5070 m a.s.l. (Fig. 1). This region is mainly affected by the southern branch of westerly circulation during the winter half-year, which results in cold, dry and windy conditions. The study region is controlled by the Indian Summer Monsoon originating from the Bay of Bengal and transported along the Eastern Himalaya and the Hengduan mountains in the summer half-year, when conditions are cool and moist, with precipitation concentrated in summer and autumn (Yao et al. 2013) (Fig. S2).
Vegetation follows the summer monsoon strength, which decreases from the SE to the NW (Lin & Wu 1981; Zhang 2007). Our samples were collected from four vegetation types principally found across four vegetation zones (B-E in Fig. 1). The subalpine coniferous and evergreen broad-leaved forest zone is in the SE and mainly comprises broad-leaved evergreen forest in the southerly part and montane coniferous forest in the northerly part. In the evergreen forest, the spatial distribution of the vegetation shows clear vertical zonation, with oak forest in the basal zone. Picea, Quercus, Pinus and Abies are found at elevations between 2500 and 4300 m a.s.l., grading into dwarf-shrub vegetation dominated by Rhododendron (Ericaceae) above the forest belt. The higher elevations have a sparse alpine vegetation on stony ground or are covered by perpetual snow. In the coniferous forest, the area above 4300 m a.s.l. is dominated by Picea on the shady slopes and Juniperus on the sunny slopes. In the valleys, arid shrub vegetation occurs including Sophora, Ceratostigma and Rhamnus at low elevations, with shrub-meadow or subalpine shrub composed of Potentilla, Caragana and Rosaceae above the forest belt.
The alpine shrub and meadow zone covers the central part of the study area, and the vegetation community is composed of Kobresia and Polygonaceae. The eastern part of this zone comprises Potentilla, Rhododendron and Salix at low elevations on shady slopes. Alpine meadow is particularly found in the central region and is dominated by Cyperaceae as well as Polygonaceae and Poaceae, for which the upper elevational limit ranges from 4700 to 5000 m a.s.l. At higher elevations, the sparse alpine vegetation includes Saussurea, Arenaria and Rhodiola.
The alpine steppe zone is widespread in the western part of the study area and dominates below 4900 m a.s.l., mainly consisting of Stipa, Festuca and Artemisia, with alpine meadow occurring between 4900 and 5300 m a.s.l. and mainly consisting of Kobresia and Festuca. The mountain tops are covered by sparse alpine vegetation.
The alpine steppe-shrub zone is distributed to the south of the alpine steppe zone. Shrubby steppe, including Sophora, Ceratostigma, Aristida, Pennisetum, Rosa, Berberis, Caragana, Stipa, Potentilla and Artemisia, is distributed at lower elevations. Mid-elevations are dominated by alpine meadow in the east, which consists of Kobresia, Juniperus, Rhododendron, Salix and Potentilla, and alpine steppe composed of Stipa and Artemisia in the west. The vegetation type of high elevations shifts to sparse alpine vegetation on stony ground (Zhang 1978; Tibetan Investigation Group 1988; Zhang et al. 2007).
A total of 168 topsoil samples were collected from the study area with an even distribution in 2018 and 2019 (Fig. 1). Each sample is a composite of five subsamples retrieved from either moss polsters plus ~1 cm of underlying topsoil or 1 cm of pure topsoil. The longitude, latitude and elevation for each sample were measured by GPS. In addition, we conducted a 15 × 15 m vegetation survey for each sample site, and recorded the species, quantities and fraction of trees, shrubs and herbs (Table S1). The pollen data map onto four vegetation types (Shen et al. 2006; Lu et al. 2011), with names following Lu et al. (2011). Some samples are classified based on their local vegetation types rather than the regional vegetation, and are marked on Fig. 1 with thick circles. There are 41 samples collected from subalpine coniferous and evergreen broad-leaved forest, 71 from alpine shrub and meadow, 25 from alpine steppe-shrub and 31 from alpine steppe. To mitigate the influence of individual trees on the pollen assemblages, our forest samples were selected from an open area (e.g. glades) at a certain distance from forest patches.

Pollen analysis
To extract the pollen from the topsoil samples, we used the hydrofluoric acid treatment (Faegri & Iversen 1989). Sample weights ranged from 5 to 20 g depending on sample type (moss, 5 g; soil, 20 g). Lycopodium spores (27 560 grains per tablet) were added to each sample to calculate the pollen concentration. The samples were processed with 10% HCl, 10% NaOH, 36% HF and acetolysis (a 9:1 mixture of acetic anhydride and sulfuric acid), sieved through a 7 μm nylon mesh to remove impurities, and then had glycerin added to facilitate slide-making after centrifugation. Pollen grains were identified with a Leica optical microscope and more than 500 terrestrial pollen grains were counted for each sample. Pollen identification refers to the atlas of pollen and spores for common plants from the eastern Tibetan Plateau (Cao et al. 2020) and Chinese pollen books (Wang et al. 1995; Tang et al. 2016). Pollen percentages were calculated based on the sum of terrestrial pollen grains. Pollen diagrams were generated using Tilia software (Grimm 2011) with pollen zones matching the aforementioned vegetation types.
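The percentage and concentration calculations described above can be sketched in Python. This is an illustration only, not the authors' workflow: the taxon counts, the function names and the assumption of one Lycopodium tablet per sample are hypothetical, and only the figure of 27 560 grains per tablet comes from the text.

```python
LYCOPODIUM_PER_TABLET = 27560  # grains per tablet, as stated in the text


def pollen_percentages(counts):
    """Percentage of each taxon relative to the terrestrial pollen sum."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("terrestrial pollen sum is zero")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def pollen_concentration(pollen_counted, lycopodium_counted,
                         sample_weight_g, tablets=1):
    """Pollen grains per gram, estimated from the exotic-marker ratio.

    tablets=1 is an assumption of this sketch; the text does not say
    how many tablets were added per sample.
    """
    if lycopodium_counted <= 0 or sample_weight_g <= 0:
        raise ValueError("marker count and sample weight must be positive")
    marker_added = tablets * LYCOPODIUM_PER_TABLET
    return pollen_counted / lycopodium_counted * marker_added / sample_weight_g


# Hypothetical counts for a single 20 g soil sample
counts = {"Cyperaceae": 300, "Artemisia": 150, "Pinus": 50}
percentages = pollen_percentages(counts)
concentration = pollen_concentration(
    pollen_counted=sum(counts.values()),
    lycopodium_counted=100,
    sample_weight_g=20.0,
)
```

Because percentages are closed to 100%, a taxon's percentage can rise simply because another taxon declines; the concentration gives an independent check on such changes.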

Modern climate data
We retrieved the climate data for the sampling sites from the China Meteorological Forcing Dataset (CMFD; gridded near-surface meteorological data set; 1 km spatial resolution) for 1979-2018 (He et al. 2020). The geographic distance from each sampling site to each pixel in the CMFD was calculated from longitude and latitude using the rdist.earth function in the fields package version 9.6.1 (Nychka et al. 2020) for R (version 3.6.0; R Core Team 2019). The climatic data of the pixel nearest to each sample were assigned to that sample. We extracted four climatic variables for each sampling site: mean annual precipitation (Pann), mean annual temperature (Tann), and mean temperature of the coldest month (Mtco) and warmest month (Mtwa).
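The nearest-pixel assignment can be sketched as follows. The original analysis used rdist.earth in R; this Python version substitutes a haversine great-circle distance. The pixel values, the site coordinates and the mean Earth radius of 6371 km are illustrative assumptions, not data from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_pixel(site, pixels):
    """Return the grid pixel closest to the sampling site."""
    return min(
        pixels,
        key=lambda p: great_circle_km(site["lon"], site["lat"],
                                      p["lon"], p["lat"]),
    )


# Hypothetical grid pixels carrying climate values
pixels = [
    {"lon": 91.0, "lat": 30.0, "Pann": 450.0, "Mtco": -8.0, "Mtwa": 10.0},
    {"lon": 95.0, "lat": 29.5, "Pann": 700.0, "Mtco": -2.0, "Mtwa": 14.0},
]
site = {"lon": 94.6, "lat": 29.7}  # hypothetical sampling site
climate = nearest_pixel(site, pixels)
```

A brute-force search over all pixels is adequate for 168 sites; a spatial index would only matter for much larger site lists.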

Numerical analyses
To reveal the relationship between modern pollen assemblages and climate, ordination analyses were applied to the pollen data, which had been square-root transformed to stabilize variances and optimize the signal-to-noise ratio (Prentice 1980). A detrended correspondence analysis (DCA; Hill & Gauch 1980) was run first. The length of the first DCA axis was 2.01 standard deviation units, suggesting a linear relationship between the pollen data and the climate variables (linear <3; unimodal >4), and therefore principal component analysis (PCA) and redundancy analysis (RDA) are the most suitable methods for these data. To test for collinearity between the climatic variables, we assessed their variance inflation factors (VIF). High collinearity between explanatory variables exists when VIF >20, so we excluded highly collinear variables until all VIF values were <20.
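The VIF screening can be illustrated with a short Python sketch. It uses the identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix. The climate values are hypothetical, and the rule of dropping the variable with the largest VIF is an assumption of this sketch; the authors chose which variable to remove on ecological grounds.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(variables):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(variables)
    inverse = invert(correlation_matrix([variables[n] for n in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_collinear(variables, threshold=20.0):
    """Drop the highest-VIF variable until all VIFs fall below threshold."""
    kept = dict(variables)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return kept


# Hypothetical climate values for six sites
climate = {
    "Pann": [380.0, 450.0, 520.0, 610.0, 700.0, 560.0],
    "Tann": [-2.5, 0.2, 0.4, 3.1, 5.9, 0.8],
    "Mtco": [-14.0, -11.0, -9.0, -6.0, -3.0, -8.0],
    "Mtwa": [8.0, 12.0, 9.0, 13.0, 14.0, 10.0],
}
vif_scores = vif(climate)
```

A VIF of 1 means a variable is uncorrelated with the others; the value grows without bound as the variable approaches an exact linear combination of them.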
Palynological data are useful for investigating past floristic richness and diversity (Jackson & Blois 2015), and palynologists have developed various methods to do so, such as Simpson's index, Shannon's information index (entropy), Williams' (1964) α-index and Hill numbers (Birks et al. 2016). Hill numbers were proposed as diversity measures by MacArthur (1965) and Hill (1973) and have been applied in palynology (Birks & Line 1992). We calculated richness (N0) in R with the iNEXT package version 2.0.12, which includes rarefaction (Hsieh et al. 2016). Pollen evenness measures how evenly the pollen taxa are distributed within an assemblage. In our study it is calculated as (N2-1)/(N1-1), where N2 is the number of very abundant taxa and N1 is the number of abundant taxa in the sample, a modification by Alatalo (1981), which provides information about the different pollen taxa (Felde et al. 2015).
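The diversity measures above can be sketched in Python using the standard Hill number definitions: N0 is the number of taxa, N1 is the exponential of Shannon entropy and N2 is the inverse Simpson index. The counts are hypothetical, and unlike iNEXT this sketch does not rarefy the samples to a common count.

```python
import math


def hill_numbers(counts):
    """Return (N0, N1, N2) for a list of pollen counts per taxon."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample contains no pollen")
    props = [c / total for c in present]
    n0 = len(props)                                          # richness
    n1 = math.exp(-sum(p * math.log(p) for p in props))      # exp(Shannon)
    n2 = 1.0 / sum(p * p for p in props)                     # inverse Simpson
    return n0, n1, n2


def alatalo_evenness(counts):
    """Evenness (N2 - 1) / (N1 - 1), the modified ratio of Alatalo (1981)."""
    _, n1, n2 = hill_numbers(counts)
    if n1 == 1.0:
        raise ValueError("evenness is undefined for a single taxon")
    return (n2 - 1.0) / (n1 - 1.0)


# Hypothetical samples: one perfectly even, one dominated by a single taxon
even_sample = [10, 10, 10, 10]
dominated_sample = [97, 1, 1, 1]

even_value = alatalo_evenness(even_sample)
dominated_value = alatalo_evenness(dominated_sample)
```

Both samples have the same richness (N0 = 4), but the evenness of the dominated sample is far lower, which is the pattern the text attributes to assemblages with very high Cyperaceae or Artemisia percentages.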
Random forest is an ensemble learning method based on decision tree classification algorithms, in which the predicted class is determined by voting on the output of each classification tree (Breiman 2001). This model has been used to evaluate the accuracy of the relationship between pollen assemblages and vegetation type (Sobol & Finkelstein 2018; Sobol et al. 2019; Qin 2021). Here, we evaluate which pollen taxa should be included by selecting random samples several times and assessing the output results, and then optimize the model by deleting the least important variables. We found 16 pollen taxa which have an impact on the vegetation classification. The model was run in the randomForest package (version 4.7-1; Liaw 2020) in R on square-root transformed pollen data. Subalpine coniferous and evergreen broad-leaved forest and alpine shrub and meadow are called forest and meadow in the random forest algorithm, and the alpine steppe-shrub and alpine steppe are combined into steppe.
The elevation of samples from alpine steppe-shrub ranges from 3550 to 4839 m a.s.l., and the pollen spectra are dominated by Artemisia (3.8-82.5; 29.5%), Cyperaceae (1.5-89.3; 29.2%), Asteraceae (0-52; 8.2%) and Brassicaceae (0-31.2; 5.6%). The percentages of Artemisia and Asteraceae are the highest of the four groups, while the mean percentage of Cyperaceae is lower than for alpine shrub and meadow and alpine steppe assemblages. The average pollen concentration is 10.8 × 10³ grains g⁻¹.
Ordination analyses. - The first two axes of the PCA capture 48.1% of the total variance (Fig. 3). PCA axis 1 separates the cold-tolerant species such as Cyperaceae and Thalictrum (left side of plot) from the warmth-loving species such as Pinus, Artemisia and Brassicaceae (right side of plot). PCA axis 1 probably reflects temperature variation with the positive direction representing warm conditions. Arboreal pollen is situated at the positive end of PCA axis 2, and drought-tolerant species such as …
We use the linear model of RDA to better understand the relationship between pollen assemblages and the environment of the south-eastern Tibetan Plateau, with Pann, Tann, Mtco and Mtwa as the principal environmental variables because the PCA results show that temperature and precipitation are the main factors. The results of a multiple linear regression model indicate that the VIFs of Tann, Mtco and Mtwa are higher than 20 (Table 1), suggesting strong collinearity between them (Fig. S3). We deleted Tann first, as it is an annual average that ignores seasonal variation; all remaining VIF values then fell below 5, indicating low collinearity between Pann, Mtco and Mtwa. The first two axes explain 15% (axis 1, 11.7%; axis 2, 3.3%) of the total variance in the pollen data. Mtco captures the most variance in the data set, indicating that it has the largest influence on the composition and distribution of the pollen assemblages (Table 1). The RDA separates the pollen taxa into three groups (Fig. 4). Pollen taxa indicating a warm and humid climate, e.g. Pinus, Picea, Abies, Quercus (E) and Polygonaceae, have a higher abundance in the subalpine coniferous and evergreen broad-leaved forest and are found in the lower right of the biplot.
Artemisia, Asteraceae and Amaranthaceae, which reflect warm and dry climate conditions, lie in the upper right of the biplot and are important components of the alpine steppe-shrub. Pollen taxa including Cyperaceae and Brassicaceae, which are associated with low July temperatures, are located on the left of the biplot, accounting for a very high proportion of the alpine shrub and meadow and alpine steppe.

Random forest
On the whole, the random forest model classifies the vegetation types reasonably well based on the 168 modern pollen assemblages from the south-eastern Tibetan Plateau, although 35 samples are misclassified (Table 2). Alpine meadow has the highest accuracy (82%) with 58 accurately classified samples, while eight samples are misidentified as steppe and five samples are misidentified as forest. Alpine steppe has moderate accuracy (79%) with 44 accurately classified samples and 12 samples misclassified as meadow. The forest vegetation type has the lowest prediction accuracy (75%) with 31 samples predicted accurately, whereas 10 samples are misidentified as meadow (five samples) or steppe (five samples) (Fig. S4).
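The reported accuracies can be checked against the counts with a short Python sketch of the confusion matrix. The counts are those given in the text, with the 12 misclassified steppe samples assigned to meadow as stated in the Discussion; the zero for steppe samples predicted as forest is inferred from the group total of 56 and is an assumption of this sketch.

```python
# Rows are the observed vegetation type, columns the predicted type.
confusion = {
    "forest": {"forest": 31, "meadow": 5, "steppe": 5},
    "meadow": {"forest": 5, "meadow": 58, "steppe": 8},
    # steppe -> forest = 0 is inferred from 56 - 44 - 12
    "steppe": {"forest": 0, "meadow": 12, "steppe": 44},
}


def per_class_accuracy(matrix):
    """Fraction of each observed class that was predicted correctly."""
    return {cls: row[cls] / sum(row.values()) for cls, row in matrix.items()}


def overall_accuracy(matrix):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(row[cls] for cls, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


accuracy = per_class_accuracy(confusion)
overall = overall_accuracy(confusion)
misclassified = sum(
    n for cls, row in confusion.items()
    for predicted, n in row.items() if predicted != cls
)
```

The forest accuracy computed this way is 31/41, about 75.6%, which the text reports as 75%.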

Discussion
Characteristics of the south-eastern Tibetan Plateau pollen data set
Spatial distribution of arboreal pollen taxa. - The results show that the pollen assemblages map reasonably well onto the vegetation types. The most obvious feature is that the arboreal pollen percentage of the subalpine coniferous and evergreen broad-leaved forest is much higher than for the other vegetation types with values sometimes exceeding 50% (Fig. 2). This suggests that the arboreal pollen is derived from the regional forest as trees were not present in the local vegetation when the sampling plots (15 × 15 m) were surveyed. The south-eastern Tibetan Plateau has been described as a forest concentration area (Lu et al. 2004), with Pinus, Picea, Abies and Betula having good propagation capabilities. Modern pollen research suggests that their pollen is easily transported to regions beyond their release sites (Lu et al. 2004; Xu et al. 2007; Zhang et al. 2012).
Previous surface pollen studies also conclude that pollen assemblages from forest regions are characterized by a high percentage of arboreal pollen even though no trees exist in the local vegetation communities. Pollen assemblages from samples selected from areas adjacent to the …
The dominant pollen taxa of alpine shrub and meadow, alpine steppe-shrub and alpine steppe are often the same, consisting of Cyperaceae, Artemisia, Brassicaceae, Asteraceae, Amaranthaceae and Poaceae. Moreover, arboreal pollen including Pinus, Quercus (E), Betula and Carpinus typically appears in all samples, but the proportion of arboreal pollen in alpine steppe-shrub and alpine steppe samples is higher than in alpine shrub and meadow. Arboreal pollen occurring in non-forest surface samples has been confirmed by many previous studies and is mainly transported over long distances by wind, although its abundance is notably lower than in samples from forest areas (Lu et al. 2010; Li et al. 2019a). Our pollen assemblages are similarly influenced by regional and exogenous arboreal pollen, which has been transported over a long distance from the forest communities, and the low pollen concentration in the alpine steppe and alpine steppe-shrub samples has amplified the importance of arboreal pollen in the pollen assemblages (Fig. 2).
Relationship between pollen assemblages and climate. - In our study, the PCA shows that temperature has a higher influence on pollen distribution than precipitation (Fig. 3), and RDA also suggests that Mtco is the most important climatic determinant of pollen distribution on the south-eastern Tibetan Plateau (Fig. 4, Table 1), which is in contrast to previous studies. Herzschuh et al. (2010) used 112 lake surface samples in the central and eastern Tibetan Plateau to evaluate the relationship between pollen, vegetation and climate using redundancy analysis, and their results show that the correlation between Pann and pollen is higher than for Tann and TJuly. Cao et al. (2021) analysed the pollen assemblage characteristics of lake sediment-surface samples from the alpine meadow on the eastern and central Tibetan Plateau, and concluded that Pann is the most important climatic factor for pollen distribution on the eastern Tibetan Plateau. The discrepancy between our results and those of previous studies is mainly due to the spatial difference in sample distribution. The samples of Herzschuh et al. (2010) and Cao et al. (2021) are mostly randomly placed along a longitudinal gradient, while the sample distribution in our study specifically covers the subalpine coniferous and evergreen broad-leaved forest and alpine shrub and meadow where precipitation is mainly higher than 400 mm, and temperature is the dominant factor determining the vegetation variance (Zheng 1995; Zhao et al. 2011; Liang et al. 2020). Furthermore, the RDA results are affected by strong noise in the pollen data arising from the large climatic gradients, which accounts for the low proportion of variance (15%) explained by the first two axes.
Spatial distribution of pollen diversity on the south-eastern Tibetan Plateau. - The pollen diversity measures of the four vegetation types (Fig. 5) show some differences. Subalpine coniferous and evergreen broad-leaved forest has the highest richness and evenness values as the samples are selected from the forest-shrub-meadow ecotone where the proportions of the dominant pollen taxa vary little. Furthermore, varying topography and better climate conditions are major contributors to the rich plant diversity (Zhang 2007). The alpine steppe-shrub has similarly high richness but slightly lower evenness, which could be related to a high percentage of Artemisia in the pollen assemblages. Alpine shrub and meadow has a lower richness than alpine steppe, which is consistent with previous studies, such as Li (2018), who proposes that the pollen diversity of alpine meadow is lower than that of alpine steppe because of a high proportion of Cyperaceae, as is the case in our study. The evenness values of alpine steppe and alpine shrub and meadow are similar, probably because some alpine steppe sites in our data set were located at the ecotone between alpine shrub and meadow and alpine steppe, which has a greater affinity with alpine meadow.

Pollen-vegetation classification model
Comparison of different vegetation reconstruction methods. - There are several approaches that can be taken to reconstruct vegetation based on modern pollen assemblages, including discriminant analysis (DA), biomization and random forests. Discriminant analysis can be used to identify vegetation types based on their various characteristics, such as the dominant pollen types of each vegetation type. It is widely used to classify modern pollen samples into a priori defined groups (Ma et al. 2008; Shen et al. 2008a, b; Reese & Liu 2010; Guo et al. 2020). However, DA has limitations that reduce its accuracy: it is not suitable for non-Gaussian distributed samples and there is a danger of overfitting (Roessner et al. 2011). Biomization calculates a biome affinity score for the pollen assemblages and has been widely applied to reconstruct past vegetation variation at a broad scale (Yu et al. 2000; Ni et al. 2014; Sun et al. 2020; Qin et al. 2022). Biomization is an ideal method to investigate the relationship between climate and vegetation at the continental scale, but suffers at the regional …
… (Fig. S4). Ten samples from the forest group were erroneously identified as steppe or meadow, which may be because they were collected from the forest-shrub-meadow ecotone in forest patches. Five samples from the meadow group were misidentified as forest, most likely because the sample sites were adjacent to the forest. The meadow and steppe are easily confused with one another in the pollen-vegetation classification, with eight samples of the meadow group being misclassified as steppe and 12 samples of the steppe group misclassified as meadow. This is probably because the samples come from the ecotone between alpine shrub and meadow, alpine steppe-shrub and alpine steppe, and the dominant pollen taxa of these vegetation zones are not distinctive.
Alpine shrub and meadow in our study is characterized by a high proportion of Cyperaceae, which is consistent with former studies (Herzschuh et al. 2010; Lu et al. 2011; Shen et al. 2021). The alpine steppe has a high proportion of Cyperaceae, which is unlike typical alpine steppe, and the samples are clustered in the eastern part of the south-eastern Tibetan Plateau. Herzschuh (2007) and Qin (2021) suggest that higher amounts of Cyperaceae pollen are expected in this eastern area because of its higher precipitation. The main pollen taxa of alpine steppe-shrub in our data set are Artemisia, Cyperaceae and Asteraceae, but arboreal and shrub taxa only appeared sporadically in individual samples, in contrast to assemblages from typical alpine shrublands. Herzschuh et al. (2010) find that subalpine shrub samples are characterized by a steady percentage of shrubland taxa including Salix, Spiraea and Berberis, while Shen et al. (2021) find that the shrubland vegetation zone is characterized by high percentages of shrub pollen such as Ericaceae, Salix and Cupressaceae (probably Juniperus). In summary, the random forest approach indicates that it is possible to identify the vegetation type from topsoil pollen assemblages and thus has the potential to reconstruct the palaeovegetation on the Tibetan Plateau.

Conclusions
Our study explores the relationship between pollen assemblages, vegetation and climate based on 168 topsoil samples from the south-eastern Tibetan Plateau. The pollen assemblages can distinguish the forest from the non-forest vegetation types via an abundance of arboreal taxa. The subalpine coniferous and evergreen broad-leaved forest has a high proportion of arboreal taxa including Pinus, Abies, Picea and Quercus (E). The common pollen taxa in the alpine shrub and meadow, alpine steppe-shrub and alpine steppe show little difference and are dominated by Cyperaceae, Artemisia, Asteraceae and Brassicaceae. Mtco is the most important climate variable driving the distribution of pollen assemblages, and pollen diversity decreases along a geographical gradient from SE to NW. The pollen-vegetation model based on the topsoil samples indicates that the random forest algorithm can distinguish the vegetation types well.
Table S1. Locations of the 168 topsoil sampling sites and vegetation survey information from the south-eastern Tibetan Plateau. Long., longitude; Lat., latitude.