Diatoms define a novel freshwater biogeography of the Antarctic

Terrestrial biota in the Antarctic are more globally distinct and more highly structured biogeographically than previously believed, but information on biogeographic patterns and endemism in freshwater communities is largely lacking. We studied biogeographic patterns of Antarctic freshwater diatoms based on the analysis of species occurrences in a dataset of 439 lakes spread across the Antarctic realm. Highly distinct diatom floras, both in terms of composition and richness, characterize Continental Antarctica, Maritime Antarctica and the sub-Antarctic islands, with marked biogeographic provincialism in each region. An estimated 44% of all species are endemic to the Antarctic, and most of them are confined to a single biogeographic region. The level of endemism increases significantly with increasing latitude and geographic isolation. Our results have implications for conservation planning, and suggest that successful dispersal of freshwater diatoms to and within the Antarctic is limited, fostering the evolution of highly endemic diatom floras.


Introduction
Biogeographic patterns form the basis for spatial conservation planning (Whittaker et al. 2005). For terrestrial plants and animals (Kreft and Jetz 2010, Jenkins et al. 2013), and increasingly also for marine macro-organisms (Stuart-Smith et al. 2013), conservation strategies are founded on a good understanding of the underlying processes generating biogeographic patterns. However, the relative importance of the processes influencing the composition and diversity of communities (i.e. dispersal, speciation, selection and drift; Vellend 2010) remains poorly understood for most freshwater organisms (Abell et al. 2008, Collen et al. 2014, Kotov and Taylor 2019). Notable exceptions include freshwater fish (Dawson et al. 2017, Leroy et al. 2019) and amphibians (Jenkins et al. 2013), for which the legacy of continental drift and the historical connectivity of watersheds have been shown to determine biogeographic patterns (Collen et al. 2014). For most other groups of aquatic organisms, however, the prevailing view was that they mostly have cosmopolitan distributions resulting from their generally large population sizes and high dispersal capacities (Fontaneto 2019). Consequently, the local occurrence of most aquatic taxa is thought to be largely determined by environmental selection (Soininen and Teittinen 2019), an idea reinforced by the application of broad morphological species concepts (Vyverman et al. 2010). Recent taxonomic studies based on multiple lines of evidence (including genetic data), however, have revealed widespread (semi-)cryptic diversity, restricted geographical distributions and regional endemism in many groups (April et al. 2011, Kotov and Taylor 2019).
These recent insights also suggest that biogeographic patterns may vary among taxonomic groups in relation to, for example, dispersal mode (active or passive) and capacity, as well as body size, which were shown to be important drivers of metacommunity structure in pond and lake biota at the regional scale (De Bie et al. 2012, Benito et al. 2018). However, low sampling density and the generally limited geographic scope of most studies hamper our understanding of broader-scale (continental to global) biogeographic patterns in most freshwater organisms (Verleyen et al. 2009, Collen et al. 2014), and our ability to delineate biogeographic realms (Udvardy 1975, Kreft and Jetz 2010, Holt et al. 2013). The Antarctic Realm (hereafter referred to as 'the Antarctic') is one of the eight floristic biogeographic realms (Udvardy 1975). Although various classification schemes have been proposed for its biogeographic subdivision, it is generally considered to comprise Continental Antarctica, the Maritime Antarctic and the sub-Antarctic and southern cold-temperate islands (Chown and Convey 2007). For Continental and Maritime Antarctica, a subdivision into 16 ecoregions or Antarctic Conservation Biogeographic Regions (ACBRs), based on a meta-analysis of studies on terrestrial macro-organisms (e.g. nematodes, tardigrades, mites and mosses), was proposed to feed into conservation planning and management (Terauds et al. 2012, Terauds and Lee 2016). Proper conservation planning and management of the Antarctic is urgently needed to protect its often endemic biota (Convey et al. 2014, 2020), as most of these taxa have narrow distributional ranges due to their long-term survival in isolated glacial refugia during ice ages (Fraser et al. 2014, Biersma et al. 2018).
In the sub-Antarctic, isolation has long been thought to underpin patterns in terrestrial biodiversity, leading to three distinct provinces (Smith 1984), namely the South Atlantic, South Indian and South Pacific Provinces. Recent molecular studies of sub-Antarctic terrestrial plants and invertebrates have partly challenged this view and revealed that many organisms underwent occasional historical long-distance dispersal events, but that ongoing gene flow across the region is rare (McGaughran et al. 2019; see Moon et al. 2017 for a review).
For lakes, compilations of distributional data in the Antarctic to date only exist for crustaceans (Dartnall 2017).
Based on this dataset, it was concluded that the majority of species probably have an Antarctic-wide distribution, and that there is limited evidence for biogeographic structuring (Díaz et al. 2019). By contrast, several studies documenting new species of, for example, chlorophytes (De Wever et al. 2009), bdelloid rotifers (Iakovenko et al. 2015) and cyclopoid copepods (Karanovic et al. 2014) suggest that endemism and narrower ranges may be more widespread among freshwater organisms of the Antarctic than hitherto assumed. However, few molecular studies of Antarctic freshwater biota exist to date, hindering the identification of region-specific (or more narrowly distributed) lineages in groups where morphological conservatism is widespread.
In this study, we analyse freshwater diatom diversity and biogeography in the Antarctic based on a taxonomically intercalibrated, highly resolved dataset comprising species composition data from 439 lakes (between 45° and 77°S) in Continental Antarctica, Maritime Antarctica and the sub-Antarctic islands (Supporting information). We used diatoms because of the exceptionally detailed morphology of their frustules, which generally allows identification at the morphospecies level. We tested the null hypothesis that the diversity and geographic distributions of diatoms are predominantly structured by local environmental factors, following the assumption that dispersal is unlimited in microbial eukaryotes (Finlay 2002, Cermeño and Falkowski 2009), and hence that endemism should be absent or rare. Rejection of this null hypothesis would imply that, due to the geographic isolation of the Antarctic, its diatom richness is lower than that of the Arctic. In addition, if Antarctic diatoms are dispersal limited, biogeographic provincialism should exist between isolated ice-free regions, and regions at higher latitudes with harsher environmental conditions would not merely be a subset of the communities at lower latitudes. In that case, species turnover should dominate over nestedness in explaining beta-diversity patterns in the Antarctic (Baselga 2010). Finally, if Antarctic diatoms are dispersal limited, the level of endemism should increase, while species richness should decrease, with increasing geographic isolation, according to the theory of island biogeography (MacArthur and Wilson 1967).

Dataset development
We generated an Antarctic-wide dataset containing diatom species occurrence data and limnological variables for 439 freshwater lakes. The dataset consists of a compilation of all freshwater diatom data from the Antarctic and sub-Antarctic islands published after 1993, as well as new data from understudied regions (Fig. 1, Supporting information). All diatom identifications were verified by one taxonomic expert based on re-evaluation of the original slides or material, which ensured full taxonomic consistency between the samples. The identifications are based on the most recent morphology-based taxonomic insights, documented in > 85 papers and books since 2000 (Supporting information).

Figure 1. Map of the Antarctic showing the studied lake districts (lowercase), and the three main biogeographic regions, namely Maritime Antarctica, Continental Antarctica and sub-Antarctica. The ACBRs are colored following Terauds et al. (2012) and Terauds and Lee (2016) (bottom legend). The red contour line delineates Maritime Antarctica and the green contour line encompasses the three oceanic sub-Antarctic provinces (green capital letters). The pie charts summarize the proportion of species restricted to the three biogeographic regions and the Antarctic (i.e. occurring in at least two of the three regions), the percentage of taxa in need of revision or taxonomic investigation, and the proportion of species also occurring outside the Antarctic. The inset (bottom left) shows species accumulation curves revealing the relatively complete sampling effort in each of the three biogeographic regions. Standardized regional diatom richness (iterated 100 times) differs between the three biogeographic regions, with the sub-Antarctic islands (green) being the most diverse (mean = 232 species, min = 216, max = 250, total = 270), followed by Maritime Antarctica (red, 120 species), and Continental Antarctica (blue, mean = 58 species, min = 55, max = 59, total = 59).
For each species, its geographic distribution was determined and assigned to one of the following types: non-endemic species also occurring outside the Antarctic, taxa for which insufficient data are available due to taxonomic uncertainties or limited literature data, and narrow and broad endemics, which are confined to one of the main biogeographic regions in the Antarctic, or found in at least two out of the three regions in the Antarctic, respectively.
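The four-way classification described above can be expressed as a small decision rule. The sketch below is illustrative only: the function name, the boolean flags and the order in which the criteria are checked (insufficient data first, then occurrence outside the Antarctic) are assumptions and not the authors' procedure, which relied on an expert literature survey.

```python
# The three main Antarctic biogeographic regions recognized in the study.
REGIONS = {"Continental", "Maritime", "sub-Antarctic"}


def distribution_type(antarctic_regions, known_outside_antarctic, insufficient_data):
    """Assign a species to one of four distribution types (illustrative sketch).

    antarctic_regions: set of main Antarctic regions where the species was recorded.
    known_outside_antarctic: True if the species is reported outside the Antarctic.
    insufficient_data: True if taxonomic uncertainty or sparse literature prevents
        a decision (hypothetical flag; the order of the checks is an assumption).
    """
    unknown = set(antarctic_regions) - REGIONS
    if unknown:
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    if insufficient_data:
        return "insufficient data"
    if known_outside_antarctic:
        return "non-endemic"
    if len(antarctic_regions) == 1:
        return "narrow endemic"   # confined to one main region
    if len(antarctic_regions) >= 2:
        return "broad endemic"    # found in at least two of the three regions
    raise ValueError("species must be recorded in at least one region")


print(distribution_type({"Maritime", "sub-Antarctic"}, False, False))  # broad endemic
```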
The entire dataset covers the three sub-Antarctic biogeographic provinces (Van der Putten et al. 2010), and all ACBRs (with the exception of North Victoria Land and Prince Charles Mountains) where freshwater lakes occur (Terauds et al. 2012, Terauds and Lee 2016) (Fig. 1; Supporting information). Samples for diatom analysis were taken from the upper 5-10 mm of lake surface sediments. In deeper lakes (maximum depth exceeding 2 m), surface sediments were sampled from sediment cores extracted from the deepest part using a gravity corer. In shallow lakes and ponds, samples were taken using a spatula. Due to the low sedimentation rates in these polar environments (Verleyen et al. 2011), the samples represent an integrated record of diatom growth, typically spanning several years in lakes at these latitudes. Diatom samples were prepared following the methods described in Verleyen et al. (2003) and Van de Vijver et al. (2012). For light microscopy, a subsample of cleaned material was dried onto a glass cover slip and mounted in Naphrax®. Diatom species were identified using a microscope equipped with differential interference contrast (Nomarski) at a magnification of 1000× using oil immersion, and based on an inventory of 300-600 specimens (depending on the diversity) in each sample. Presence-absence data were used because they better reflect the biogeography of taxa compared with relative abundance data (Lomolino et al. 2016). For the identification of difficult taxa (i.e. species belonging to the genera Humidophila, Luticola, Mayamaea or Nitzschia), scanning electron microscopy was additionally used. Subsamples were directly air-dried onto specimen stubs, sputter-coated and examined with a ZEISS Ultra SEM at 3 kV (Natural History Museum, London, UK), a JEOL JSM-840 operated at 15 kV (Ghent University, Belgium) or a JEOL JSM-7100 operated at 2 kV (Meise Botanic Garden, Belgium).
For all lakes, pH and specific conductance were measured with calibrated field meters during sampling. Specific conductance was used to select only freshwater lakes (specific conductance < 1.5 mS cm⁻¹). For most Maritime and Continental Antarctic lakes, as well as those from sub-Antarctic Marion Island, we additionally measured the concentrations of the major ions (Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻) and nutrients (NO₃⁻-N, NH₄⁺-N, PO₄³⁻-P, and/or total phosphorus and total nitrogen). These environmental data were determined as described in the original publications (Supporting information) and, for Marion Island, in Verleyen et al. (2012). The lakes for which these additional environmental data are available (n = 213) were subjected to variation partitioning analysis (below).

Local and regional species richness
Local richness equaled the number of species in each lake. The relationships between local richness and latitude, local environmental conditions (pH and specific conductance), surface air temperature, the time since deglaciation of the region, and a variable approximating geographic isolation were assessed using the glm() function in R ver. 3.5.1 (<www.rproject.org>) with a Poisson error distribution and cubic polynomials. Temperature data were extracted from publicly available databases (CRUTEM4; Jones et al. 2012, Osborn and Jones 2014) and included mean annual surface air temperature and the difference in mean temperature between summer (December-February) and winter (June-August) months. Mean annual surface air temperature was used as an approximation for mean energy availability in each region. The time since deglaciation is based on ¹⁴C, optically stimulated luminescence and/or U/Th dates of basal lacustrine sediments and algal limestones (Hendy 2000, Bentley et al. 2014, Hodgson et al. 2014, Mackintosh et al. 2014). When age constraints from lacustrine sediments were not available, we assumed that lakes originated after the region became ice-free or isolated from the sea, as inferred from cosmogenic isotope dating of landforms and/or ¹⁴C dating of 1) marine fossils in raised beaches on land (Bentley et al. 2014, Mackintosh et al. 2014, Verleyen et al. 2017), or 2) organic material in the basal layers of peat cores on the sub-Antarctic islands (Hodgson et al. 2014). The degree of geographic isolation was calculated as the distance of the lake to the nearest continent (excluding Antarctica).
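The isolation variable can be sketched as the minimum great-circle distance between a lake and points sampled along the coastlines of the other continents. The text does not specify the distance metric or the coastline data, so the haversine formula, the spherical Earth radius of 6371 km and the coastline point list below are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation; assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def isolation_km(lake_lat, lake_lon, coast_points):
    """Distance from a lake to the nearest sampled point on a non-Antarctic continent.

    coast_points: iterable of (lat, lon) tuples along continental coastlines
    (Antarctica excluded); denser sampling gives a closer approximation.
    """
    return min(haversine_km(lake_lat, lake_lon, lat, lon) for lat, lon in coast_points)


# Hypothetical lake and coastline points, for illustration only.
print(round(isolation_km(-62.2, -58.9, [(-54.9, -67.6), (-34.4, 18.5)])))
```

Because only the nearest coastline point matters, the result depends on how densely the coastlines are sampled. That sampling density is a design choice the sketch leaves to the user.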
To account for non-linear relationships between richness and latitude and the significant variables retained in the multiple regression analysis (below), generalized additive models (GAMs) were also developed using the R package 'mgcv' (Wood 2011), selecting the 'REML' method and the quasipoisson error distribution. Model performance was checked using the 'gam.check' function. For the multivariate GAM, only variables correlated with local richness with an absolute value of r > 0.5 were selected. Of these, variables with inter-variable correlations of r > 0.75 were removed, retaining mean annual temperature and geographic isolation. ANOVA and subsequent pairwise Tukey t-tests were used to assess whether local richness differs significantly between the three biogeographic regions.
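The two-step predictor screening described above (|r| > 0.5 with richness, then removal of predictors with inter-correlations |r| > 0.75) can be sketched as follows. The text does not say which member of a collinear pair was dropped. This sketch assumes that the predictor more weakly correlated with the response is dropped, and the variable names in the test data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(predictors, response, min_r=0.5, max_inter_r=0.75):
    """Keep predictors with |r| > min_r to the response, then drop collinear ones.

    predictors: dict mapping name -> list of values; response: list of values.
    Within each pair with |r| > max_inter_r, the predictor more weakly correlated
    with the response is dropped (tie-breaking rule is an assumption).
    """
    strength = {name: abs(pearson_r(vals, response)) for name, vals in predictors.items()}
    kept = [name for name in predictors if strength[name] > min_r]
    # Examine the strongest predictors first so they are preferentially retained.
    kept.sort(key=lambda name: strength[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson_r(predictors[name], predictors[other])) <= max_inter_r
               for other in selected):
            selected.append(name)
    return selected
```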
Regional species richness was standardized to the lowest number of lakes present in any of the three biogeographic regions (n = 105 in Maritime Antarctica) using species accumulation curves (1000 permutations) in the R package vegan ver. 2.5-2 (Oksanen et al. 2018).
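The idea behind this standardization is to compare regions at the same number of lakes. The sketch below is a pure-Python analogue: it repeatedly draws a fixed number of lakes at random without replacement and counts the species in their union. It is not the vegan implementation, and the function name, seed and toy data are assumptions.

```python
import random
from statistics import mean


def standardized_richness(lakes, n_lakes, permutations=1000, seed=1):
    """Richness expected in n_lakes randomly drawn lakes of one region.

    lakes: list of sets of species names (one set per lake).
    Returns (mean, min, max) richness over the random draws.
    """
    if n_lakes > len(lakes):
        raise ValueError("cannot draw more lakes than the region contains")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    counts = []
    for _ in range(permutations):
        sample = rng.sample(lakes, n_lakes)          # draw without replacement
        counts.append(len(set().union(*sample)))     # species in the pooled sample
    return mean(counts), min(counts), max(counts)


# Toy region with three lakes, standardized to two lakes.
print(standardized_richness([{"a", "b"}, {"b", "c"}, {"c"}], 2, permutations=100))
```

With n_lakes set to 105, the lowest number of lakes of any region in this study, the same procedure gives each region's richness at a common sampling effort.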

Delineating biogeographic regions and provinces
Diatom biogeographic regions and provinces were delineated using the similarity profile routine (SIMPROF), which tests for the presence of significant sample clusters without assuming a priori defined groups (Clarke et al. 2008). SIMPROF was run with 1000 iterations (num.expected), alpha set to 0.000001 and average-linkage clustering of Bray-Curtis dissimilarities of presence-absence data, using the R package 'clustsig' ver. 1.1 (Whitaker et al. 2014). Biogeographic patterns were subsequently tested using canonical analysis of principal coordinates (CAP; Anderson and Willis 2003). For CAP, a principal coordinate analysis was performed on a Bray-Curtis dissimilarity matrix, followed by a canonical discriminant analysis of predefined groups with the 'CAPdiscrim' function in the R package BiodiversityR (Kindt and Coe 2005). CAP enables assessment of the extent to which samples are correctly classified into a priori defined groups using the correct classification rate (CCR) statistic, which we applied at the following three hierarchical levels. First, we evaluated whether the three main biogeographic regions traditionally recognized in the Antarctic (Maritime Antarctica, Continental Antarctica and the sub-Antarctic islands; Chown and Convey 2007) were also present in the diatom dataset. Taxonomic differentiation between the three main biogeographic regions was also assessed at a higher taxonomic level by calculating the number of genera in each of the major diatom clades following the division of Kociolek (2018). Second, for the sub-Antarctic islands, we evaluated whether diatom occurrence data from this region could be further partitioned into the three main oceanic provinces also seen in plants and other groups (Fig. 1; Skottsberg 1960, Van der Putten et al. 2010). Third, we assessed whether the diatom communities in Maritime Antarctica and Continental Antarctica grouped according to the previously defined ACBRs (Terauds et al. 2012, Terauds and Lee 2016).
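For presence-absence data, the Bray-Curtis dissimilarity used in both SIMPROF and CAP reduces to the Sørensen dissimilarity (b + c)/(2a + b + c). In this formula, a is the number of shared species and b and c are the numbers of species unique to each lake. The CCR is the percentage of lakes whose predicted group matches their a priori group. The sketch below covers only these two quantities. It does not reproduce the clustering, ordination or discriminant steps, and the predicted group labels in the example are hypothetical rather than CAPdiscrim output.

```python
def bray_curtis_pa(lake_a, lake_b):
    """Bray-Curtis dissimilarity between two lakes given as sets of species.

    With 0/1 data this equals the Sorensen dissimilarity (b + c) / (2a + b + c).
    """
    shared = len(lake_a & lake_b)
    total = len(lake_a) + len(lake_b)  # equals 2a + b + c
    if total == 0:
        return 0.0  # two empty lakes treated as identical (a convention)
    return 1 - 2 * shared / total


def correct_classification_rate(a_priori, predicted):
    """Percentage of lakes assigned to their a priori group (CCR)."""
    if len(a_priori) != len(predicted) or not a_priori:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(1 for g, p in zip(a_priori, predicted) if g == p)
    return 100.0 * hits / len(a_priori)


print(bray_curtis_pa({"x", "y"}, {"y", "z"}))
print(correct_classification_rate(["M", "C", "S", "S"], ["M", "C", "S", "M"]))
```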

Partitioning and calculating beta-diversity
We partitioned the pairwise beta diversity between lakes following Baselga (2010) into 1) species turnover (i.e. replacement of species) as calculated by the Simpson's pairwise dissimilarity index, and 2) nestedness (i.e. species loss) as calculated by the Sørensen's pairwise dissimilarity index minus the Simpson's pairwise dissimilarity index. In order to assess the similarity between and within the predefined regions at the three hierarchical levels (above), we also calculated the mean pairwise Jaccard index, and the number of shared species between the three biogeographic regions, the ACBRs, the oceanic provinces, and between the lakes from each lake district.
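Written out for one pair of lakes, the Baselga (2010) partitioning needs only three counts: a, the number of shared species, and b and c, the numbers of species unique to each lake. Simpson dissimilarity is min(b, c)/(a + min(b, c)). Sørensen dissimilarity is (b + c)/(2a + b + c). Nestedness is their difference. The sketch below uses species sets as input, and its function names are illustrative. It also returns the Jaccard similarity a/(a + b + c).

```python
def baselga_partition(site1, site2):
    """Pairwise beta-diversity partitioning after Baselga (2010).

    Returns (turnover, nestedness, total): Simpson dissimilarity, Sorensen minus
    Simpson, and Sorensen dissimilarity. Both sites must contain species.
    """
    if not site1 or not site2:
        raise ValueError("both sites must contain at least one species")
    a = len(site1 & site2)   # shared species
    b = len(site1 - site2)   # species only in site1
    c = len(site2 - site1)   # species only in site2
    beta_sor = (b + c) / (2 * a + b + c)
    beta_sim = min(b, c) / (a + min(b, c))
    return beta_sim, beta_sor - beta_sim, beta_sor


def jaccard_similarity(site1, site2):
    """Shared species divided by the total number of species in both sites."""
    return len(site1 & site2) / len(site1 | site2)


# A species-poor lake that is a subset of a richer one: pure nestedness.
print(baselga_partition({"a", "b", "c", "d"}, {"a", "b"}))
# Two lakes with no species in common: pure turnover.
print(baselga_partition({"a", "b"}, {"c", "d"}))
```

The first example shows the pattern expected if high-latitude floras were merely subsets of sub-Antarctic ones. The second shows the replacement pattern that dominates in the data.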

Redundancy and variation partitioning analysis
Redundancy analysis (RDA) with forward selection using Monte Carlo permutation tests (999 permutations) and variation partitioning analysis (Peres-Neto et al. 2006) were used to assess which geographical and environmental variables significantly explained variation in diatom community composition between the lakes. These analyses were applied to a subset of lakes (n = 213) for which measurements of specific conductance, pH, Na⁺, K⁺, Mg²⁺, Ca²⁺, Cl⁻, NO₃⁻-N, NH₄⁺-N and PO₄³⁻-P were available, in addition to temperature, spatial and historical variables. The three matrices used in the variation partitioning analysis were 1) the biotic matrix containing the Hellinger-transformed species presence-absence data, 2) the matrix containing the limnological and temperature variables and 3) a matrix with spatial variables and a historical variable. All environmental variables were log-transformed, except pH, which was square-root transformed. Temperature data were those used in the multivariate GAM for richness and endemism as described above (i.e. mean annual surface air temperature and the difference in mean temperature between summer and winter). Spatial variables included the distance of the lake to the nearest continent (excluding Antarctica), and the eigenvectors corresponding to the positive eigenvalues resulting from a principal coordinate analysis of a matrix of geographic distances between the sampling sites (the so-called principal coordinates of neighbour matrices or PCNMs; Borcard and Legendre 2002). Using Moran's I (Dray et al. 2006) calculated in the R package PCNM ver. 2.1 (Legendre et al. 2013), only the significantly spatially autocorrelated PCNM vectors were selected. For these selected PCNM vectors, latitude and longitude were modelled by distance-based Moran's Eigenvector Maps (dbMEM; Dray et al. 2006, Legendre et al. 2013), which are created by orthogonal projections of the variation within a matrix of Euclidean distances between the sampling sites (Legendre and Legendre 2012). We retained only those dbMEM vectors that maximized Moran's index of autocorrelation (Dray et al. 2006). As historical variable, the minimum age of the lakes present in the ice-free regions was included (above). Variance inflation factor (VIF) analysis was used to reduce multicollinearity between variables: variables with a VIF > 10 were removed stepwise, starting with the variable with the highest p-value. We subsequently assessed the unique contribution of local environmental and climatic versus spatial and historical variables in structuring the diatom communities using variation partitioning (Peres-Neto et al. 2006). This analysis yielded four independent fractions: the unexplained variation in diatom occurrence patterns, the unique effect of local environmental and climate variables, the unique effect of spatial and historical variables, and the overlap between both groups of significant predictors. The total amount of variation explained equals the sum of the last three fractions. Monte Carlo permutation tests (999 permutations) were used to assess the significance of the ordination axes. The multiple coefficients of determination were adjusted (R²adj) to correct for differences in the number of samples and the number of independent variables in both groups of predictors (Peres-Neto et al. 2006).
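The fraction arithmetic behind variation partitioning can be written out explicitly. Given the adjusted R² of RDA models with environmental predictors only, spatial/historical predictors only, and both groups, the unique, shared and unexplained fractions follow by subtraction. The sketch uses the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - m - 1), for n samples and m predictors. It does not fit the RDA models or run the permutation tests. The input values are not reported in the text. They were back-calculated from the reported fractions (11% spatial/historical, 9% environmental, 24% shared), so the example only illustrates the arithmetic.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R2 for a model with n samples and m explanatory variables."""
    if n - m - 1 <= 0:
        raise ValueError("need more samples than explanatory variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


def variation_partitioning(r2_env, r2_space, r2_both):
    """Split explained variation between two predictor groups.

    Inputs are adjusted R2 values of models using environmental/climate
    predictors only, spatial/historical predictors only, and both together.
    """
    return {
        "environment only": r2_both - r2_space,
        "space/history only": r2_both - r2_env,
        "shared": r2_env + r2_space - r2_both,
        "unexplained": 1 - r2_both,
    }


# Illustrative inputs implied by the reported fractions (not study output).
fractions = variation_partitioning(r2_env=0.33, r2_space=0.35, r2_both=0.44)
for name, value in fractions.items():
    print(f"{name}: {100 * value:.0f}%")
```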

Patterns in endemism
The percentage of regional endemics in each lake was calculated by dividing the number of species restricted to the Antarctic (including both broad and narrow endemics) by the total number of species (i.e. local richness). ANOVA and subsequent pairwise Tukey t-tests were used to assess whether the proportion of endemic species differs significantly between the three biogeographic regions. As for local richness, regression analyses were used to assess the effect of latitude, local environmental conditions (pH and specific conductance) and the historical and spatial factors on the proportion of endemic taxa. A multiple GAM was developed as described above for species richness, selecting the 'REML' method and assuming a Gaussian error distribution. Variables were checked a priori for multicollinearity and correlation with the proportion of endemics as described above, retaining only mean annual temperature and isolation as predictors.
The distribution of the non-endemics, and broad and narrow endemics in each of the three regions was assessed by calculating their proportion of occurrence in the lakes in each region. The proportion of endemics and non-endemics shared between the ACBRs and sub-Antarctic provinces was used to assess their prevalence in the Antarctic.
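Both per-lake quantities described in the two paragraphs above are simple ratios. The first is the percentage of a lake's species that are Antarctic endemics. The second is the proportion of a region's lakes in which a given species occurs. A minimal sketch follows, with lakes represented as sets of species names. The function names and toy data are illustrative assumptions.

```python
def endemic_share(lake_species, endemics):
    """Percentage of a lake's species that are Antarctic endemics (broad or narrow)."""
    if not lake_species:
        raise ValueError("lake has no recorded species")
    return 100.0 * len(lake_species & endemics) / len(lake_species)


def occurrence_proportion(species, lakes):
    """Proportion of lakes (each a set of species) in which a species occurs."""
    if not lakes:
        raise ValueError("no lakes given")
    return sum(1 for lake in lakes if species in lake) / len(lakes)


endemics = {"a", "b", "x"}
print(endemic_share({"a", "b", "c", "d"}, endemics))
print(occurrence_proportion("a", [{"a"}, {"b"}, {"a", "b"}, {"c"}]))
```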

Results and discussion
With a total of 370 species recorded from 439 lakes, the diatom flora of the Antarctic is impoverished compared with that of the Arctic. The Arctic Circumpolar Diatom Database, comprising 572 lakes across a narrower latitudinal belt (50-73°N), contains a total of 4014 diatom taxa (Pienitz and Cournoyer 2017). In addition, in a single region in High-Arctic Svalbard, a total of 310 diatom species were identified from 53 freshwater ponds and lakes (Pinseel et al. 2017). This remarkable difference between the two polar regions is unlikely to be an effect of undersampling in the Antarctic, because species accumulation curves reached a plateau, showing that our inventories of lacustrine diatom floras are nearly complete in each of the three major biogeographic regions (Fig. 1). Standardized regional diatom richness differs between these regions, with the sub-Antarctic region being the most diverse (mean = 232 species), followed by Maritime Antarctica (mean = 120 species) and Continental Antarctica (mean = 58 species). Similarly, local species richness differs significantly between the three regions (Supporting information) and decreases significantly (R² = 0.22, p < 0.001) with increasing latitude (Fig. 2; Supporting information). This decreasing trend in local richness is particularly pronounced in Continental Antarctica and is significantly explained by mean temperature and isolation, as revealed by non-linear multiple regression (R²adj = 0.51, p(temperature) < 0.001, p(isolation) < 0.005; Supporting information), with the effect of isolation being in agreement with the theory of island biogeography (MacArthur and Wilson 1967). While the depauperate nature of plant and terrestrial animal biota in the Antarctic is well known (Pugh and Convey 2008, Pointing et al. 2015) and generally attributed to the combined effects of the geological and climatological history of the region, the low diatom species diversity of Antarctic lakes refutes the widespread idea of easy and global dispersal of protists (Finlay 2002). Rather, our data strongly suggest that successful dispersal events to the Antarctic are rare, considering that the range in abiotic conditions (i.e. low input of solar radiation, the length of the growing season and the presence of snow and ice, as well as physical and chemical properties) found in Antarctic lakes is comparable to the physical habitat diversity in the (High-)Arctic (Vincent and Laybourn-Parry 2008). While Arctic and Antarctic lakes differ in their food web structure (Vincent and Laybourn-Parry 2008), and Continental Antarctic lakes lack catchment vegetation, we propose that the isolated nature of islands and ice-free regions and the low number of lakes in the Antarctic limit the dispersal of diatoms to the region, and that this very likely has a more profound influence on its diversity than abiotic factors. This is in line with earlier evidence based on genus-level diversity gradients in Southern Hemisphere lake diatoms (Vyverman et al. 2007) and with findings in terrestrial plants (Pointing et al. 2015).
We next assessed whether, and to what degree, diatom floras in the Antarctic are differentiated between regions. Similarity profile clustering (SIMPROF) of species occurrence data (Supporting information) revealed a clear separation between the canonical biogeographic regions, with the sub-Antarctic region being more distinct from Maritime and Continental Antarctica. The robustness of this large-scale biogeographic structure is confirmed by a 100% correct classification rate of lakes in the three a priori defined regions in a canonical analysis of principal coordinates (Fig. 3A). Diatom floras from each of these three regions are thus highly distinct, and only 4% of the 370 species occur in all three biogeographic regions (Fig. 4). Similarly, a mere 7% of the species are shared between Maritime and Continental Antarctica, while 5% and 13% are shared between the sub-Antarctic islands and Continental and Maritime Antarctica, respectively. This strong differentiation between the three main regions is also observed in terrestrial taxa such as mites, springtails and nematodes (Chown and Convey 2007), and is conserved at higher taxonomic levels representing the major diatom clades (Supporting information). Epiphytic rhopalodioid and surirelloid diatoms are only present in the sub-Antarctic, and acidophilous eunotioids are represented by a single genus in the sub-Antarctic and Maritime Antarctica, but are absent from Continental Antarctica. Cymbelloids are underrepresented in the latter region (1 genus) compared with Maritime Antarctica and particularly the sub-Antarctic. Achnanthoids are relatively overrepresented in Continental Antarctica, where they mainly consist of epipelic growth forms. Centric diatoms (Centricales), which are generally planktonic, are virtually absent in Continental Antarctica, with the exception of an undescribed species occurring in only seven lakes in the Prydz Bay region. The rare occurrence of centric diatoms in Continental Antarctica is likely related to the lack of turbulent mixing of lake water columns as a result of prolonged lake ice cover and the lack of open water conditions.

Figure 3. Biplot of canonical analysis of principal coordinates (CAP) of (A) the 439 lakes in Maritime Antarctica (in red), Continental Antarctica (in blue) and sub-Antarctica (in green). (B-D) are separate CAP biplots for Maritime Antarctica, Continental Antarctica and the sub-Antarctic islands, respectively. The results are congruent with the SIMPROF clustering analysis (Supporting information). CCR is the correct classification rate, and denotes the percentage of lakes that are grouped in their respective a priori defined biogeographic entities. The symbols in A are as in Fig. 2. The colors in (B) and (C) follow those of the ACBRs and in (D) those of the sub-Antarctic provinces as in Fig. 1.
Within each of the three regions, additional geographic structuring of diatom floras emerges (Fig. 3B-D, Supporting information). In Continental and Maritime Antarctica the diatom communities are clearly differentiated between the different terrestrial ACBRs (Terauds et al. 2012), with the exception of Dronning Maud Land lakes, which have a diatom flora similar to that of Enderby Land (Fig. 3B-C). In the sub-Antarctic, diatom floras cluster according to the three oceanic provinces recognized for plants and insects (Fig. 3D, Supporting information; Chown and Convey 2007, Van der Putten et al. 2010). This pronounced biogeographic structuring therefore closely resembles that of terrestrial free-living invertebrates (Pugh and Convey 2008, Chown and Convey 2016), but contrasts with the Antarctic-wide distribution reported for most freshwater crustaceans (Díaz et al. 2019). Partitioning of beta diversity (Baselga 2010) confirms the strong regional-scale structuring of diatom floras (Supporting information). Species turnover between individual regions is on average 11 times higher than nestedness, showing that individual ACBRs and the three oceanic provinces have largely different diatom floras, in accordance with, for example, springtails in the Antarctic (Baird et al. 2019). This is, however, counter to expectations under the hypothesis that selection acting on a ubiquitous diatom flora increases with increasing latitude and thus harsher environmental conditions. In that scenario, the less diverse Maritime and Continental Antarctic diatom communities would merely be subsets of hardy species derived from the richer floras occurring in the sub-Antarctic region. The high turnover observed between the three main biogeographic regions (Supporting information) therefore confirms that diatoms are dispersal limited in the Antarctic.
To further explore possible contributions of environmental filtering and historical and biological processes to regional-scale turnover in diatom composition, we applied redundancy and variation partitioning analysis (Supporting information). This revealed that nearly half of the variation in diatom species composition can be explained by spatial/historical and abiotic factors. Although spatial/historical factors and environment each uniquely explain significant portions of the diatom data (11% and 9%, respectively), the significant overlap between both sets of predictors in explaining diatom turnover (24%) reflects the strong covariation between climatic and spatial variables (Supporting information). This suggests that the assembly of lacustrine diatom floras within the Antarctic is strongly influenced by both environmental filtering and spatial/historical factors (including geographical isolation).
Although it is difficult, if not impossible, to prove or disprove endemicity among micro-organisms (Vyverman et al. 2010), an extensive literature survey (Supporting information) allowed us to classify species according to their known geographic ranges into non-endemic species also occurring outside the Antarctic, narrow and broad endemics, and a group of taxa for which insufficient data are available due to taxonomic uncertainties or limited literature data (Fig. 1). Nearly half of all diatom species in our dataset (163/370) are only known to occur in the Antarctic and are thus considered endemic (Fig. 4; Supporting information). This high incidence of endemism converges with evidence for marked endemism in Antarctic macroscopic biota (Pugh and Convey 2008), including (amongst other taxa) springtails (90% endemic to the Antarctic; Baird et al. 2019), lichens (33-50% endemic to Continental Antarctica; Peat et al. 2007), tardigrades (73% endemic to Continental Antarctica; Guidetti et al. 2019) and bdelloid rotifers (95% endemic to Antarctica; Iakovenko et al. 2015).
Of the 163 endemic diatom species, nine are found throughout the Antarctic and 26 are shared between two regions (i.e. broad endemics), whereas a substantial number of species have narrower ranges and appear to be confined to Continental Antarctica (17 species), Maritime Antarctica (36 species) or the sub-Antarctic islands (75 species; Fig. 4). The share of endemic species in regional diatom richness is highest in Continental Antarctica (64%), followed by Maritime Antarctica (58%) and the sub-Antarctic (37%), and differs significantly between the three regions when calculated for each lake separately (Supporting information). The proportion of endemic species increases with increasing latitude (R²adj = 0.43, p < 0.001; Fig. 5), is negatively correlated with mean annual temperature (R²adj = 0.457, p < 0.001) and is positively correlated with geographic isolation (R²adj = 0.29, p < 0.001; Supporting information). Multiple non-linear regression using a generalized additive model (GAM) confirmed this outcome: differences in the proportion of endemics among lakes were significantly explained by temperature and geographic isolation (R²adj = 0.54; p < 0.001 for both temperature and isolation; Supporting information). This suggests that evolution in isolation has played a critical role in shaping present-day diatom community structure in the Antarctic, resulting in its high level of endemism.
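As a consistency check, the counts in this paragraph can be tallied. The short Python sketch below uses only the numbers reported in the text (the dictionary keys are illustrative labels). It recovers the 163 endemics and shows that roughly four in five of them are narrow endemics, in line with most endemics being confined to a single region. The regional endemism shares (64%, 58%, 37%) cannot be recomputed this way because regional species richness is not given here.

```python
# Endemic species counts exactly as reported in the text.
BROAD = {"throughout the Antarctic": 9, "shared between two regions": 26}
NARROW = {"Continental Antarctica": 17, "Maritime Antarctica": 36, "sub-Antarctic islands": 75}
TOTAL_SPECIES = 370  # all diatom species in the dataset


def summarize(broad: dict[str, int], narrow: dict[str, int], total: int) -> dict[str, float]:
    """Tally broad and narrow endemics and express them as shares."""
    n_broad = sum(broad.values())
    n_narrow = sum(narrow.values())
    n_endemic = n_broad + n_narrow
    return {
        "broad": n_broad,
        "narrow": n_narrow,
        "endemic": n_endemic,
        "endemic_share": n_endemic / total,  # share of all species in the dataset
        "narrow_share_of_endemics": n_narrow / n_endemic,
    }


if __name__ == "__main__":
    s = summarize(BROAD, NARROW, TOTAL_SPECIES)
    print(f"{s['endemic']} endemics ({s['endemic_share']:.0%} of all species); "
          f"{s['narrow_share_of_endemics']:.0%} of endemics are narrow")
```

Running it prints `163 endemics (44% of all species); 79% of endemics are narrow`, matching the 44% endemism figure.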
Whilst endemism is present in the majority of the genera, the fraction of endemic species in Maritime and Continental Antarctica is particularly high in predominantly terrestrial genera such as Luticola, Muelleria and Humidophila (Supporting information). We hypothesize that this is related to the survival of these lineages in terrestrial glacial refugia and their subsequent (re-)colonization of lakes, because terrestrial diatoms are generally more tolerant of environmental stresses such as freezing (Souffreau et al. 2010, Hejduková et al. 2019). During ice ages, most currently existing lakes were either permanently covered by lake ice (Hodgson et al. 2005) or overridden by the Antarctic Ice Sheets (Bentley et al. 2014). Terrestrial diatom lineages probably survived glacial periods in cryoconite holes (pockets of liquid water formed by windblown dust absorbing solar radiation; Stanish et al. 2013) or, alternatively, in ice-free terrestrial refugia located in regions that are still ice-free today or in areas that were exposed during periods of lower relative sea level (Verleyen et al. 2020). The post-glacial (re-)colonization of lakes by these taxa and their subsequent radiation may have been facilitated by the availability of empty niches following the regional extinction (or severe population decline following widespread habitat loss) of typically lacustrine species during Pleistocene ice ages. However, these colonization events must have occurred over relatively short distances, given the low number of endemic species shared between the three main biogeographic regions, the restriction of the vast majority of endemics to only a few ACBRs or sub-Antarctic provinces (Supporting information), and their occurrence in only a small proportion of the lakes (Supporting information). This suggests that ice-free regions in the Antarctic are sufficiently geographically isolated to prevent regular successful colonization by diatoms, as has also been suggested for terrestrial invertebrates (Convey et al. 2014). Interestingly, the non-endemic taxa also have restricted ranges and are confined to only one or a few regions (Supporting information), suggesting that cosmopolitan taxa that successfully colonized isolated lake districts became established locally, but that their subsequent dispersal to other lake districts was, and still is, limited. These restricted distribution patterns resulted in a similar biogeographic structuring into ACBRs and sub-Antarctic provinces for both endemic and non-endemic diatoms (Supporting information). In addition, broad endemics have a higher prevalence than narrow endemics: within a given region, they generally occur in a higher proportion of lakes than species restricted to only one of the three regions (Supporting information), which suggests that broad and narrow endemic species may differ in their dispersal ability. Broad endemics might also have had a longer association with the Antarctic, and hence more time to colonize additional regions, but this should be confirmed by molecular clock analyses. Moreover, population genetic studies of these broad endemics might reveal the existence of local genetic clusters, as observed in, for example, springtails (McGaughran et al. 2019).

Figure 5. Latitudinal gradient in the level of endemism (R²adj = 0.43; p < 0.001), calculated as the total number of narrow and broad endemics divided by the total number of species in each lake. Colors denote the three biogeographic regions: sub-Antarctica (green), Maritime Antarctica (red) and Continental Antarctica (blue). Symbols denote the different ACBRs or sub-Antarctic provinces.
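The per-lake level of endemism defined in the Figure 5 caption, and an adjusted R² for its relation to latitude, can be sketched as follows. This is a hypothetical illustration on toy data: it assumes a simple ordinary least-squares fit with a single predictor, which the text does not specify, and it does not reproduce the GAM or isolation analyses. Adjusted R² is computed as 1 - (1 - R²)(n - 1)/(n - k - 1) with k = 1 predictor; the function names are illustrative.

```python
from collections.abc import Iterable, Sequence


def level_of_endemism(lake_species: Iterable[str], endemics: set[str]) -> float:
    """(Narrow + broad endemics) divided by total species richness of one lake."""
    species = set(lake_species)
    if not species:
        raise ValueError("lake has no recorded species")
    return len(species & endemics) / len(species)


def ols_adjusted_r2(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Fit y = a + b*x by ordinary least squares; return (a, b, adjusted R²)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    k = 1  # one predictor (latitude)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return a, b, r2_adj


if __name__ == "__main__":
    # Toy data for illustration only; not values from the study.
    endemics = {"sp_A", "sp_B", "sp_C"}
    lakes = [  # (latitude in degrees south, species recorded in the lake)
        (49.0, ["sp_A", "sp_X", "sp_Y", "sp_Z"]),
        (62.0, ["sp_A", "sp_B", "sp_X"]),
        (68.0, ["sp_B", "sp_C", "sp_X"]),
        (77.0, ["sp_A", "sp_C"]),
    ]
    lat = [la for la, _ in lakes]
    level = [level_of_endemism(spp, endemics) for _, spp in lakes]
    a, b, r2_adj = ols_adjusted_r2(lat, level)
    print(f"slope = {b:.4f} per degree, adjusted R² = {r2_adj:.2f}")
```

The adjustment penalizes the raw R² for the number of predictors relative to the number of lakes, which is why adjusted values are reported when comparing models with different predictor sets.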
Combined, our results suggest that diatom floras of the Antarctic are dispersal-limited and largely shaped by in situ speciation and extinction, in addition to local environmental factors, resulting in distinct biogeographic regions that are also recognised in many terrestrial invertebrates (see Convey et al. 2014 and Chown and Convey 2016 for reviews). Although future surveys of less explored regions within the Antarctic may reveal broader distributions for some narrow endemic diatom species, we expect that taxonomic revision of understudied groups may further increase the level of endemism, given the likelihood of substantial (semi-)cryptic diversity (Mann 1999). Early evidence for additional endemism among presumably cosmopolitan diatoms was recently found in the terrestrial diatom Pinnularia borealis, for which the existence of regionally restricted cryptic species in the Antarctic and elsewhere was demonstrated (Pinseel et al. 2020). It will be revealing to see to what extent other groups of microorganisms conform to the biogeographic scheme observed in diatoms, considering the divergent and deep phylogenetic origins of microbial groups and their diverse life histories (Burki et al. 2020).
Our results make a compelling case for the importance of historical factors and dispersal limitation in shaping the evolution and biogeography of freshwater diatoms in the Antarctic, and may serve as a novel paradigm for Antarctic freshwater biogeography. Our findings also have two important implications for conservation planning. First, measures should be taken to prevent the introduction of non-native microbes into the Antarctic, as exotic taxa may affect local communities and potentially cause the extinction of endemic species. Second, because different ACBRs and oceanic provinces are characterized by highly dissimilar diatom communities with few species shared between regions, the unintentional transport of microorganisms within the Antarctic should also be avoided to protect regions against increased homogenization of their diatom floras. This has already been suggested for other taxa (Hughes et al. 2019) and might require more stringent measures than those currently taken by some national scientific program managers and tourist operators (Hughes et al. 2015). Given the steady increase in tourism and scientific activities in the Antarctic (Coetzee et al. 2017), as well as forecast climate and environmental changes favoring the establishment of exotic taxa (Duffy et al. 2017), preventing the introduction of non-native species and the homogenization of diatom floras between regions should be a high priority on the international conservation agenda.