Autecology of terrestrial diatoms under anthropic disturbance and across climate zones

Like aquatic diatoms


Introduction
Diatoms are a group of microscopic, single-celled algae living in almost all moist and aquatic environments with sufficient light (Dixit et al., 1992). Most species show very specific preferences for a broad range of environmental factors such as pH, nutrients and salinity, making them one of the most commonly used bio-indicators for water quality assessment and paleolimnological analysis (Smol et al., 2001; Smol and Stoermer, 2010). In order to infer degradation levels of water bodies or reconstruct past environmental conditions, autecological indicator values are regularly established (Carayon et al., 2019; Van de Vijver et al., 2002). These values are generally derived from weighted averaging, a simple, reliable and extensively used technique for estimating taxon indicator values that assumes that taxon abundance follows a unimodal relationship with a given environmental variable. Indicator values are the basis of some widely used diatom indices such as the IPS (Specific Pollution Sensitivity Index; Cemagref, 1982) and BDI (Biological Diatom Index; Lenoir and Coste, 1996). They have also been gathered in synthetic trait matrices to be used for other types of ecological diagnosis (Carayon et al., 2020; Taylor et al., 2007; Dam et al., 1994). Overall, the calculation of autecological values for diatoms has proven to be a very useful tool in many aspects of water quality research (Poikane et al., 2020).
While aquatic diatoms are commonly studied and ecologically well characterised, mainly due to their general use in water quality monitoring programmes, studies on limno-terrestrial diatom communities (i.e. used here for diatom assemblages found on soils) are rather scarce. However, ecological studies have shown that diatom communities on soils are also quite responsive to several environmental variables such as soil moisture and pH (Antonelli et al., 2017; Lund, 1945; Van de Vijver et al., 2002; Van de Vijver and Beyens, 1998; Van Kerckvoorde et al., 2000). For instance, Van de Vijver et al. (2002), who investigated 106 soil diatom samples on Île de la Possession (Crozet, sub-Antarctica), could establish optimum values of soil moisture for the most common taxa in their dataset. Likewise, Lund (1945), who sampled 66 different soils in the UK, was able to determine pH tolerance ranges for 24 taxa. In addition to soil moisture and pH, disturbances caused by farming practices also play a key role in structuring terrestrial diatom assemblages (Antonelli et al., 2017; Foets et al., 2020a; Heger et al., 2012; Stanek-Tarkowska and Noga, 2012; Vacht et al., 2014). Foets et al. (2020a) found that disturbed areas were less diverse and that land uses with different disturbance levels could be differentiated solely based on the community composition. They also noticed that the species composition remains stable throughout the year, meaning that variation in soil moisture availability, irradiance and temperature does not play a significant role. Besides their direct influence on diatoms, these disturbances also indirectly affect other variables such as organic matter, nitrogen and carbon content in the soil, which in turn impact diatoms (Stanek-Tarkowska et al., 2018a). Both nitrogen and carbon are often found to be significant explanatory variables for the diatom species composition (Antonelli et al., 2017; Vacht et al., 2014), while organic matter increases the moisture holding capacity of the soil and acts as a buffer against dryness (Stanek-Tarkowska et al., 2018a; Stanek-Tarkowska and Noga, 2012). Thus, terrestrial diatoms are sensitive to several environmental variables, but so far most studies have focused on community structure rather than on species-level preferences.
Pending more ecological knowledge on terrestrial diatoms, several studies explored their potential as environmental markers. Many of them tried to explain the occurrence of diatoms on soils based on the ordinal classifications created by Dam et al. (1994) (see for instance Antonelli et al., 2017; Stanek-Tarkowska et al., 2013; Stanek-Tarkowska and Noga, 2012). Although these classifications are based on data acquired from aquatic samples in the Netherlands (and on ecological values reported in the published literature), they are frequently used in many European countries as a reference for autecological studies. However, for numerous terrestrial taxa such values are lacking, as these taxa only occur sporadically in aquatic environments, and the assigned values may not really reflect their behaviour in true terrestrial conditions (Barragán et al., 2018). There is a pressing need to assign autecological values to terrestrial diatoms that reflect their preferences in terrestrial environments and that would enable us to unlock their potential as environmental markers.
Despite the worldwide application of some diatom-based indices (e.g. IPS), there is nowadays a tendency to develop indices and indicator values specific to certain regions, since these values would better reflect environmental conditions in that area (Carayon et al., 2019, 2020; Lavoie et al., 2009). Unfortunately, data on terrestrial diatom ecology remain scarce, and developing an index or defining indicator values at the local scale is at present still not possible. Therefore, we aim to provide robust autecological values for common diatom taxa living on soils, based on data from several ecological studies carried out across a range of climatic and geographical conditions. We analyse several environmental variables that were previously shown to influence diatom communities living on soils. This should enable us not only to use these taxa later as environmental indicators, but also to expand the existing toolbox of environmental markers applied in, for example, hydrology and soil science.
The studies of Foets et al. (2020a) and Foets et al. (2020b) were conducted in the Attert River basin in Luxembourg. Terrestrial diatom samples were taken at the soil surface according to the method described by Coles et al. (2016) at 16 sites around the catchment every 3-5 weeks for a period of 14 months (October 2017 to November 2018), totalling 206 samples that were included in the present study. Subsequently, the same soil samples were pooled and used for pH and nutrient analysis. Of the several environmental variables included in that study, only pH measured in 1:5 H2O, volumetric soil moisture content (VWC, expressed in percent) measured in situ (30 times around the sampling area), and the bioavailable fraction of total nitrogen (TN), phosphorus (P) and dissolved organic carbon (DOC) (both in mg L⁻¹ soil extraction) were used. The latter two were analysed with ICP-OES after a 0.01 M CaCl2 extraction following Houba et al. (2000). Furthermore, these studies incorporated five different types of land use (forest, undisturbed grassland, grassland disturbed by cattle grazing, grassland disturbed by agriculture and agricultural fields).
Similar to Foets et al. (2020a) and Foets et al. (2020b), the study of Antonelli et al. (2017) was also carried out in the Attert River basin, from August 2014 to March 2015. Diatoms were sampled as in Foets et al. (2020a) and Foets et al. (2020b) at 34 locations during three sampling campaigns, totalling 92 samples. Topsoil samples for physico-chemical analysis were collected simultaneously with a shovel. Contrary to Foets et al. (2020a), only three land use types were distinguished. However, based on photographs and additional information on the sampling areas, we were able to reassign the land uses according to the classification of Foets et al. (2020a). Furthermore, soil moisture was measured gravimetrically (%) rather than volumetrically, pH was measured in H2O, KCl and CaCl2, and C, H and N (all expressed in percent) were determined using a CHN analyser. Of those variables, pH in H2O, N and C were included. While pH is merged with the results of Foets et al. (2020a) and Foets et al. (2020b), N and C are regarded as different variables.
Barragán et al. (2018) sampled four locations in the Attert basin comprising four different anthropogenic pressures. At each location, they collected 10 soil samples for diatom analysis following the method described in Coles et al. (2016) and determined soil moisture volumetrically. Since soil moisture measurements were done as in Foets et al. (2020b), the soil moisture data of both studies were aggregated. For this study too, photographs and information on the locations were provided, which enabled us to classify the different land uses.
In Stanek-Tarkowska and Noga (2012), two agricultural fields under different tillage systems, located in the Subcarpathian region (SE Poland), were sampled once. The samples were collected at a depth of 0-3 cm and put into petri dishes. In that study, some samples were also cultured, but these were excluded here. Besides, they also collected soil samples from the surface layer (0-10 cm) of which they measured pH in 1:2.5 KCl. These values were converted to pH 1:5 H2O following Eq. (1), provided in Kabała et al. (2016), before aggregating the data.

$$\mathrm{pH}_{\mathrm{H_2O},\,1:5} = -1.95 + 11.58 \, \log_{10}\left(\mathrm{pH}_{\mathrm{KCl},\,1:2.5}\right) \qquad (1)$$

The study of Stanek-Tarkowska et al. (2015) was also conducted in Poland. There, one site located in Pogórska Wola was sampled for nine consecutive months in 2011. Soil samples for diatom analyses were collected from a 0-3 cm deep layer and placed in petri dishes (three replicates), whereas samples for physico-chemical analysis were taken from the topsoil layer (0-5 cm) at the same time. pH was measured in a 1:2.5 KCl solution and was converted following Eq. (1) before data merging.
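As an illustration only, the conversion in Eq. (1) can be written as a small function. The sketch below is in Python rather than the R environment used for the analyses, and the function name is a hypothetical choice, not part of any cited package.

```python
import math


def ph_kcl_to_ph_h2o(ph_kcl: float) -> float:
    """Convert pH measured in 1:2.5 KCl to pH in 1:5 H2O following Eq. (1)."""
    if ph_kcl <= 0:
        # log10 is undefined for non-positive values
        raise ValueError("pH must be positive")
    return -1.95 + 11.58 * math.log10(ph_kcl)


if __name__ == "__main__":
    # Illustrative input values, not data from the merged studies
    for ph_kcl in (4.5, 5.5, 6.5):
        print(f"pH(KCl) {ph_kcl:.1f} -> pH(H2O) {ph_kcl_to_ph_h2o(ph_kcl):.2f}")
```

Because the relationship is logarithmic, the offset between the two pH scales is not constant across the measured range.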
Stanek-Tarkowska et al. (2018a) collected monthly samples from April to November for four consecutive years (2013-2016) at two agricultural fields near Rzeszów (Poland) (n = 60). There, two replicates for diatom analyses were taken from the 0-5 cm soil layer and placed in petri dishes, while undisturbed soil samples were taken with 100 cm³ cylinders for physico-chemical analysis. pH was determined as in Foets et al. (2020a) and Foets et al. (2020b), while gravimetric soil moisture content was converted to volumetric moisture content using soil bulk density. Both variables were used in our study without further adjustments.
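The gravimetric-to-volumetric conversion mentioned above follows the standard relation θv = θg × ρb / ρw, where ρb is the dry bulk density of the soil and ρw the density of water. A minimal Python sketch is given below; the function name and the example values are hypothetical, and the exact procedure of the original study is not reproduced here.

```python
def gravimetric_to_volumetric(gwc_percent: float,
                              bulk_density_g_cm3: float,
                              water_density_g_cm3: float = 1.0) -> float:
    """Convert gravimetric water content (%) to volumetric water content (%).

    Uses theta_v = theta_g * rho_b / rho_w, with rho_b the dry bulk density.
    """
    if bulk_density_g_cm3 <= 0 or water_density_g_cm3 <= 0:
        raise ValueError("densities must be positive")
    return gwc_percent * bulk_density_g_cm3 / water_density_g_cm3


if __name__ == "__main__":
    # Hypothetical soil: 20% gravimetric moisture, bulk density 1.3 g cm-3
    print(f"{gravimetric_to_volumetric(20.0, 1.3):.1f}% VWC")
```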
The dataset of Van Kerckvoorde et al. (2000) included 30 diatom samples originating from 30 different sites in the Zackenberg area (NE Greenland), covering an area of approximately 4 km². For those samples, the upper 3 cm of the soil was collected. In addition to diatom community data, soil moisture and pH were measured as in Foets et al. (2020b), while SOM was derived from LOI (Loss-On-Ignition). All three variables were included without modification in the final dataset.
Finally, Van de Vijver et al. (2002) analysed 106 samples collected on the sub-Antarctic Île de la Possession (Îles Crozet). Of those, samples originating from permanently wet areas or moistened rocks (i.e. moisture content > 75%) were excluded from the final dataset, as they did not reflect the true terrestrial soil conditions found in the other studies. In the sub-Antarctic study, several environmental variables were assessed, but only pH 1:5 H2O and VWC were incorporated, because the other variables were measured in a different way and could therefore not be converted.
After merging the community datasets, the entire dataset was made taxonomically consistent by updating species names to the most recent diatom taxonomy, putting synonyms together and assigning four-letter codes to each taxon according to the latest Omnidia version (March 2019; Lecointe et al., 1993). Since information for identifying terrestrial diatoms is sparse in the literature, several publications were used, including Ettl and Gärtner (1995), Krammer (2000), Lange-Bertalot (2001), Lange-Bertalot (2011), Lange-Bertalot et al. (2017), Levkov et al. (2013) and Lund (1946), along with some essential studies on terrestrial algae (Brendemuhl, 1949; Hustedt, 1942; Petersen, 1915, 1928, 1935) and some recent studies on terrestrial diatoms (Reichardt, 2008, 2012; Wetzel et al., 2015). Available diatom pictures from the different studies were used to make the datasets as taxonomically consistent as possible. Taxon names were kept in the broad sense (i.e. sensu lato), as our aim was to give general autecological values. This resulted in a final dataset including 516 soil diatom samples across 166 different sites, covering 710 taxa (including varieties, subspecies and forms), and the following variables: pH, VWC, TN, DOC, N, C and SOM. Table 1 provides an overview of all data.

Statistics
The Shannon-Wiener diversity and species richness were calculated for each sample. However, the communities given in Stanek-Tarkowska et al. (2018a) were excluded from further analysis, as the data only contained the 14 most abundant taxa and not the entire communities. Furthermore, the significance of soil moisture, pH and type of land use in explaining the variation in those two variables was investigated. For this, a generalised mixed model with site as random variable, to account for repeated measurements, was set up. Differences in the Shannon-Wiener index among four regions (i.e. Luxembourg, Poland, sub-Antarctica and Greenland) and five land use types (i.e. forest, undisturbed grassland, grazed grassland, agricultural grassland and agricultural field) were analysed with analysis of variance (ANOVA) and further assessed with parametric post hoc tests. Species rarefaction curves for each region and land use type were computed using the function specaccum incorporated in the vegan R-package.
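For readers unfamiliar with the two per-sample metrics, the following Python sketch shows how species richness and the Shannon-Wiener index H' = −Σ p_i ln p_i can be computed from a vector of valve counts. It is an illustration with hypothetical function names and counts; the analyses themselves were run in R.

```python
import math


def species_richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)) over taxa present."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no counted valves")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical valve counts for one soil sample
    sample = [120, 45, 30, 5, 0]
    print(species_richness(sample), round(shannon_wiener(sample), 3))
```

The natural logarithm is assumed here; other bases only rescale the index.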
Next, the community dataset was reduced by removing rare taxa. A taxon was considered 'rare' when it did not occur with a minimum relative abundance of 2.5% in at least five samples (i.e. about 1% of the samples). The identification of most of those taxa was also uncertain (i.e. sp., cf. or aff.). Then, species autecological values were computed using weighted averaging regression (ter Braak and Looman, 1986). After the calculation of the ecological indicator values, the inverse algorithm (i.e. WA calibration) was used to build a model predicting the value of each environmental variable from floristic data. Observed and predicted values were then compared and assessed after bootstrap cross-validations (n = 1000) using root mean square error (RMSE; Wallach and Goffinet, 1989) and correlation coefficients as model validation metrics. The best results (i.e. lowest RMSE and highest r²) were obtained with data that were tolerance down-weighted and transformed according to inverse deshrinking models.

Fig. 2. Shannon-Wiener diversity and species richness of the samples and comparison between land uses and regions. A, frequency bar plot of the taxon richness per sample. B, sample-based species rarefaction curves per land use type. C, Shannon diversity per region and land use type. F, forest; UG, undisturbed grassland; GG, grazed grassland; AG, agricultural grassland; AF, agricultural field; Pr, pristine (i.e. Antarctica and Greenland). Lux, Luxembourg; Pol, Poland; Ant, sub-Antarctica; Green, Greenland. These analyses include 456 samples and rare species.
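The core of weighted averaging can be sketched in a few lines. The Python example below is a simplified illustration with hypothetical function names and data: it applies the rarity filter, computes abundance-weighted optima and tolerances, and infers an environmental value for a sample from the taxon optima. It deliberately omits the tolerance down-weighting, inverse deshrinking and bootstrap cross-validation used in the study, which were performed with R packages.

```python
import math


def common_taxa(rel_abund, min_abund=2.5, min_samples=5):
    """Indices of taxa reaching min_abund (%) in at least min_samples samples."""
    n_taxa = len(rel_abund[0])
    return [k for k in range(n_taxa)
            if sum(1 for s in rel_abund if s[k] >= min_abund) >= min_samples]


def wa_optima_tolerances(rel_abund, env):
    """WA regression: abundance-weighted mean (optimum) and SD (tolerance)."""
    optima, tolerances = [], []
    for k in range(len(rel_abund[0])):
        weights = [s[k] for s in rel_abund]
        total = sum(weights)
        if total == 0:
            # Taxon absent from all samples: no estimate possible
            optima.append(None)
            tolerances.append(None)
            continue
        u = sum(w * x for w, x in zip(weights, env)) / total
        t = math.sqrt(sum(w * (x - u) ** 2 for w, x in zip(weights, env)) / total)
        optima.append(u)
        tolerances.append(t)
    return optima, tolerances


def wa_infer(sample, optima):
    """WA calibration without deshrinking: abundance-weighted mean of optima."""
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    total = sum(y for y, _ in pairs)
    if total == 0:
        raise ValueError("sample shares no taxa with the calibration set")
    return sum(y * u for y, u in pairs) / total


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


if __name__ == "__main__":
    # Hypothetical data: three samples (rows), two taxa (columns), pH per sample
    abund = [[10, 0], [10, 10], [0, 10]]
    ph = [5.0, 6.0, 7.0]
    optima, tolerances = wa_optima_tolerances(abund, ph)
    predicted = [wa_infer(s, optima) for s in abund]
    print(optima, tolerances, round(rmse(ph, predicted), 3))
```

In this toy example the predictions are pulled towards the mean of the gradient, which is the shrinkage that deshrinking regressions are meant to correct.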
Additionally, a training set containing half of the samples (i.e. all odd-numbered samples) for pH and soil moisture was created. Then the goodness-of-fit was assessed by passively fitting the passive samples (i.e. all even-numbered samples) into a constrained ordination of the training set with pH or soil moisture as the sole constraint. The passive samples were positioned as supplementary samples within the ordination space by means of the transition equations given in ter Braak and Šmilauer (2002). This subsequently determines a score for each passive sample by taking the weighted average (canonical correspondence analysis, CCA) of the species scores extracted from the ordination of the training-set samples. Hence, the even-numbered samples are positioned within the ordination without influencing the underlying ordination, which is solely based on the training set (Birks et al., 2012). CCA was chosen as the ordination method, since DCA on the reduced species data revealed a unimodal distribution for the first four axes (S.D. > 4). Next, the distribution of the squared residual distances between each sampling point and its fitted position on the first constrained axis was calculated for the training set. Any passive sample with a squared residual distance greater than the 90th percentile of the training-set distances is considered poorly fitted within the calibration function model (Birks et al., 1990). Afterwards, the two sample sets were switched and the analysis was repeated to verify the previous outcome. All these calculations were done with the residLen function from the analogue R-package (Simpson and Oksanen, 2020).
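The final screening step, flagging passive samples whose squared residual distance exceeds the 90th percentile of the training-set distances, can be illustrated as follows. This Python sketch assumes that the squared residual distances have already been obtained from the ordination; the function names, the linear-interpolation percentile and the example numbers are assumptions, not a reimplementation of residLen.

```python
def percentile(values, q):
    """Percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def flag_poorly_fitted(training_sq_res, passive_sq_res, q=90.0):
    """Return the training-set cutoff and a poorly-fitted flag per passive sample."""
    cutoff = percentile(training_sq_res, q)
    return cutoff, [r > cutoff for r in passive_sq_res]


if __name__ == "__main__":
    # Hypothetical squared residual distances
    training = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    passive = [5, 10, 12]
    cutoff, flags = flag_poorly_fitted(training, passive)
    share = 100 * sum(flags) / len(flags)
    print(cutoff, flags, f"{share:.1f}% poorly fitted")
```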
After estimating autecological values (optimum and tolerance) for each taxon, the optimum values of pH and soil moisture were compared with the updated indicator values of the same variables assigned by Dam et al. (1994). Differences between the two sets of optimum values were tested using ANOVA. Where significant differences were found, the Tukey HSD test was used to reveal which categories deviated from the others. The normality and homoscedasticity of the model residuals were checked prior to statistical analysis. All aforementioned statistical analyses were performed using the R statistical program (R v. 3.6.3; http://www.r-project.org/) and additional functions from the R-packages vegan (version 2.5-6; Oksanen et al., 2019), analogue (version 0.17-4; Simpson and Oksanen, 2020) and rioja (version 0.9-21; Juggins, 2017).
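To make the ANOVA step concrete, the sketch below computes the one-way F statistic, i.e. the ratio of the between-group to the within-group mean square, for optimum values grouped by indicator category. It is a hedged Python illustration with a hypothetical function name and made-up values; it returns neither a p-value nor the Tukey HSD comparisons, which were obtained in R.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variance")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


if __name__ == "__main__":
    # Hypothetical optima grouped by three indicator categories
    categories = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova_f(categories))
```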

Results
An overall mean species richness per sample of 21.7 ± 11.1, with a maximum of 81 and a minimum of 2 (Fig. 2), was observed in the entire dataset. From the species rarefaction curves, we observe that communities sampled in undisturbed areas generally have a higher species richness per sample than those from disturbed sites. A closer look reveals that after approximately 80 samples, we should find between 300 and 350 different species in forested sites, which is twice the number found for agricultural fields. Also, the rarefaction curves for the samples collected in Greenland and sub-Antarctica reach a clear asymptote at approximately 250 taxa.
The Shannon diversity values are significantly lower in areas that are highly disturbed by agriculture (AF) (F = 13.74, df = 443, R² = 0.028, P < 0.01), while pastures, undisturbed grassland and forest did not present a significantly different diversity (P > 0.05). Furthermore, we notice that the communities collected in Greenland and Poland have a lower diversity than the ones taken in Luxembourg and sub-Antarctica (F = 11.22, df = 452, R² = 0.063, P < 0.05). It must be noted, however, that all samples from the Polish sampling sites used here come from anthropically disturbed areas and are therefore comparable to samples from disturbed fields (AF, AG) elsewhere. Nevertheless, these sites were not assigned to a land use type, since verification material (i.e. pictures and additional notes on the sampling areas) was not available. In addition, we ran a generalised mixed model with 'site' as random variable to check whether differences in species richness and diversity could be explained by land use, soil moisture and/or pH. The first model revealed that the type of land use (P < 0.01) and soil pH (P < 0.05), which was positively correlated with richness, explained 84.5% of the variation in species richness, whereas the second one showed that they both explained 62.7% of the variation in species diversity (P landuse < 0.05, P pH < 0.01). Soil moisture was not significant in either model (P > 0.05). Although the samples come from contrasting environments, the results indicate that species richness and diversity are similar between the samples and are likely driven by the same environmental factors.

Table 2
Validation of the weighted averaging for different environmental variables. Results of the validation (r² and RMSE) are given for the data that were tolerance down-weighted and subjected to inverse deshrinking. The reference between brackets gives the study where the variable was analysed. The number of samples included in the analysis and their range are also provided. SOM, Soil Organic Matter; VWC, Volumetric Moisture Content; RMSE, Root Mean Square Error.

(Table 2), while pH and soil moisture reach an overall r² of respectively 0.57 and 0.62 (Fig. 3). Furthermore, we divided the samples based on the type of land use (anthropically disturbed vs. undisturbed) and ran the same analyses again. For both pH and soil moisture content, we now observed increased correlation coefficients in undisturbed habitats. There, pH reached an r² of 0.68 with an RMSE of 0.47, compared to 0.56 and 0.44 when only disturbed habitats were considered, whereas for soil moisture a substantial difference in r² was seen between disturbed (0.34) and undisturbed areas (0.69). The RMSE also increased by 0.5 compared to the overall value.
In addition to evaluating the regression between the observed and estimated environmental variables, we also calculated 'goodness-of-fit' statistics from a CCA ordination for pH and soil moisture. To this end, we subdivided the samples to create two equally large datasets: a training set and a passive sample set. For pH, both sets contained 194 samples, whereas for soil moisture the sets contained respectively 206 and 205 samples. The distribution of the squared residuals for all four sets of samples is shown in Fig. 4. In the training sets, we observe that for both pH and VWC 10% of the samples were poorly fitted (i.e. outside the 90th percentile), whereas those values increased to 16.5% (pH) and 14.6% (VWC) for the passive samples. Similar percentages were found when we switched the sample sets. So, the analysis of the squared residual distances for the passive samples to pH and soil moisture indicates that 82-85% of the samples are well fitted within the unimodal response-model framework and that we can therefore be confident of the estimated autecological values.

Fig. 3. Relationships between observed and estimated values using weighted averaging. A separate analysis has been done for the disturbed (AG, GG, AF) and undisturbed (UG, F, Pr) habitats. Here, only taxa with a minimum abundance of 2.5% in at least five samples were included. Regarding pH, 202 and 173 data points were included in the validation of the undisturbed and disturbed habitats respectively, while there were 209 and 142 for soil moisture.
The obtained autecological values were then compared with the previous studies of Van de Vijver et al. (2002) and Lund (1945) for volumetric soil moisture content and pH, respectively (Table 3). Prior to comparing the values, we checked whether the diatom identifications made by Lund (1946) matched ours. Unfortunately, Lund (1945) determined pH using a colorimeter, whereas in our study an electrometric method was used. Although the outcomes of the two methods should not deviate by much more than 0.2 units from each other according to Haines et al. (1983), comparisons should be interpreted with care. Nevertheless, it seems that all ranges for pH overlap well with the values of Lund (1945). However, Tryblionella debilis Arnott has a very small tolerance range (7-7.2) specified by Lund (1945), whereas we calculated a rather broad pH tolerance ranging from 6.0 to 7.8. Also, Humidophila contenta s.l. (Grunow) R.L.Lowe et al. (including Humidophila biceps (Grunow) P.Furey, K.Manoylov & R.L.Lowe) occurs on very acidic soil (pH = 3.9) according to Lund (1945), while this study indicates that both taxa are normally present on soils with a pH between 5.7 and 7 and between 5.18 and 6.28, respectively. Furthermore, we see some larger differences between the ranges for soil moisture content. The ranges assigned by Van de Vijver et al. (2002) tend towards higher moisture contents and are generally broader than ours. This is not only the case for common, widespread taxa, but also for taxa such as Humidophila crozetikerguelensis (Le Cohu & Van de Vijver) R.L.Lowe et al., H. comperei and Planothidium aueri (Krasske) Lange-Bertalot, which were only found in the samples of Van de Vijver et al. (2002). However, this did not result in non-overlapping ranges, except for the very common Hantzschia abundans Lange-Bertalot. For that species, we found a range between 7.5 and 31%, while the tolerance calculated by Van de Vijver et al. (2002) ranged between 31 and 71%. Bar this exception, the tolerance ranges for both soil moisture and pH overlap well with previous research.
Optima and tolerance values for common diatoms occurring on soils
generally not present on soils anymore when the moisture content reaches 35 to 40%. Overall, these autecological values will improve our knowledge of the ecology of terrestrial diatoms and enable us to better use these organisms as environmental markers.
Finally, we analysed whether our inferred optimum values correspond to the categories for pH and moisture assigned by Dam et al. (1994) (Fig. 7). The latter indicates how likely it is that a species will occur in a terrestrial environment and at which prevailing moisture conditions. Concerning pH, we see that our optimum values correspond well with the different Dam et al. (1994) categories (F = 9.87, df = 115, R² = 0.23, P < 0.001). However, our autecological values indicate that it would be better to take a pH of 6.5 instead of 7 to differentiate the categories 2, 3 and 4 from each other when evaluating soils. Contrary to pH, there was no significant difference in our optimum values for VWC between the different categories (F = 0.35, df = 81, R² = -0.03, P = 0.84). This is interesting, since we expected that taxa categorised as aquatic would require a higher threshold for soil moisture than terrestrial taxa to be present in terrestrial habitats. It now seems that, independent of their assigned category, diatoms generally need a VWC of around 20% to grow and reproduce on soils.

Discussion
For this research, we combined eight different datasets on terrestrial diatoms coming from four distinct regions, encompassing 516 soil samples from 166 different sites and covering 710 taxa. A first analysis of these data revealed significant relationships of species diversity and richness with pH and anthropic disturbance, while soil moisture was not important. These observations are in line with previous research. Hoffmann (1989) and Lund (1945) noted that diatoms generally prefer neutral to alkaline soils. However, both pointed out that pH is often correlated with other variables (e.g. CaCO3) and that it changes the nutrient availability in soils, which could also affect diatoms. Regarding different land uses, Barragán et al. (2018) and Foets et al. (2020a) observed that less disturbed areas were more species rich and diversified. The difference between the two extremes (agricultural field and forest) observed here was 150 taxa, doubling the number of species occurring on agricultural fields. Even though soil moisture is crucial for diatoms to grow and reproduce, recent studies indicated that soil moisture does not influence the community composition as such (Foets et al., 2020a; Zhang et al., 2020). However, it does play a key role in the absolute diatom abundances (i.e. primary production) (Foets et al., 2020b). Overall, these analyses confirm the main patterns in the variability of diatom communities and reveal a large difference in species richness between undisturbed and disturbed areas.
In a next step of our analysis, we checked the sensitivity of diatom taxa to pH, VWC, TN, P, DOC and SOM. Although all these variables have been documented to affect the community composition (Antonelli et al., 2017; Lund, 1945; Vacht et al., 2014; Van de Vijver et al., 2002; Van Kerckvoorde et al., 2000), we only found that the optimum values assigned for pH and soil moisture gave decent results, with r²-values of respectively 0.57 and 0.62. A side note here is that considerably less data were included in the calculation for TN, P, DOC, SOM and C/N. Thus, adding more samples to the dataset may improve the model metrics for those variables. However, research by Gremmen et al. (2007) and Van de Vijver et al. (2002) on terrestrial diatoms revealed high r²-values of 0.85 and 0.68 for respectively altitude and soil moisture, while including fewer data points than we did. These studies, however, only used data coming from anthropogenically undisturbed environments situated on sub-Antarctic islands. This latter factor is also very important, as our validation metrics improved considerably after removing the samples collected in anthropogenically disturbed habitats, reaching values similar to those of Van de Vijver et al. (2002). This observation indicates that anthropogenic disturbance is perhaps the principal factor defining taxon occurrences. In order to improve the validation metrics for nitrogen, carbon and organic matter, it will be important that, apart from adding data, a significant part of the data comes from anthropogenically undisturbed sites. Only then will we know whether we can obtain acceptable autecological values for those variables and eventually apply them in future research and management.
After validating the calculated autecological values for pH and soil moisture, we compared them with the previous research of Lund (1945) and Van de Vijver et al. (2002). Although pH was measured differently, the ranges seem to overlap well and there is no real indication that the ones established by Lund (1945) are consistently more acidic or alkaline than ours, suggesting that the outcomes of the two methods do indeed not diverge much from each other. We noticed that the tolerance ranges for soil moisture calculated by Van de Vijver et al. (2002) tend towards higher values and are generally broader. This is probably due to the selection and removal of data, since the same observation is made for taxa exclusively occurring in sub-Antarctic samples. While Van de Vijver et al. (2002) included diatoms found on soils ranging between 0 and 100% saturation, we only selected sites with a maximum saturation of 75.5%. In addition, our samples generally had

Table 4
Optimum and tolerance values of pH and volumetric soil moisture content (VWC) for the most abundant soil diatoms. Only diatom taxa that occurred in at least 5 samples with a minimum abundance of 2.5% are included. The four-letter code is retrieved from OMNIDIA.

Apart from comparing the tolerance ranges of pH and soil moisture, we also checked whether the updated Dam et al. (1994) classification is in accordance with those optimum values. We found that diatom taxa occurring in both aquatic and terrestrial environments have similar preferences in both environments, meaning that the categories for pH are still useful in terrestrial settings. However, if used for soils, the threshold value between categories 2, 3 and 4 should be set around 6.5 instead of 7 to allow a better interpretation. Contrary to pH, the categories for soil moisture did not work for terrestrial environments, and the results indicate that taxa assigned as either rather aquatic or terrestrial by Dam et al. (1994) can (frequently) occur on soils if the moisture content is at least 20%. A similar observation was reported by Stokes (1940), who found that algae function and grow best when 40 to 60% of the moisture-holding capacity of the soil is reached. This outcome also affects the use of diatoms as hydrological tracers, since in those studies Dam et al. (1994) categories 4 and 5 were used to classify diatoms as being 'terrestrial' (Pfister et al., 2009, 2017). Knowing that diatoms from other categories also regularly occur on soils, we should revise the classification, which would eventually lead to the inclusion of more diatoms that can be used for tracing hydrological connectivity. Although the classification of Dam et al. (1994) has proven to be very useful, it does not always provide the correct ecological answers for diatoms present on soils.
As mentioned before, adding data, independent of the location, to the existing dataset might still improve the validation and optimum values for the investigated variables, provided that the community data are taxonomically consistent. This is, however, not easy, since taxonomy in diatom research changes constantly and verification is rather time-consuming and not always possible. In this study, we could not verify everything, and due to this inconsistency it is possible that (much) 'noise' ended up in the results. However, since terrestrial diatoms are far less studied than strictly aquatic diatoms, taxonomical changes occur at a slower pace. Very often, old diatom publications such as Ettl and Gärtner (1995), which is a compilation of terrestrial diatom studies in Europe, Lund (1945) and Lund (1946), among other references mentioned earlier, are still consulted for identification, making taxonomical harmonization between the different datasets easier. Noise will also result from differences between the soil moisture at the point of measurement and at the point where the diatom sample was taken, since soil moisture can be highly variable even over small distances (Teuling and Troch, 2005).
Also, we did not account for (pseudo-)cryptic species (i.e. genetically different, but morphologically (almost) indistinguishable species) such as P. borealis and H. amphioxys (Maltsev and Kulikovskiy, 2017; Pinseel et al., 2019, 2020; Souffreau et al., 2013). Both are known to have developed different adaptations and/or tolerances to certain climatic conditions (Souffreau, 2011; Souffreau et al., 2010). As they are widespread, common diatoms, we can imagine that many more cosmopolitan species such as H. contenta, H. abundans and P. obscura are also (pseudo-)cryptic. A way to solve this is to switch to molecular techniques (i.e. DNA metabarcoding and High-Throughput Sequencing) for community analysis. These techniques are seen as a fast, efficient and low-cost alternative to the rather time-consuming microscopic diatom identifications (Rivera et al., 2020; Vasselon et al., 2017). They will also enable us to increase sampling frequency and to combine and link different soil organisms with each other, paving the way for monitoring programmes and other related applications (Fløjgaard et al., 2019; Orgiazzi et al., 2015). Moreover, they would also resolve the cryptic diversity, so that datasets from different climatic regions could still be combined.
Another point to be aware of when collecting data for intercomparison is the need to align and standardise soil sampling and the analysis of environmental variables. Because these were not standardised across the datasets, we were not able to fully utilise our combined dataset. Of the different variables included here, we believe more focus should be on soil organic matter (SOM), since an encouraging r² of 0.29 was obtained based on only 30 samples. Besides, it is strongly related to anthropic disturbances and the moisture-holding capacity of the soil (Hudson, 1994; Stanek-Tarkowska et al., 2018a), to both of which diatoms are responsive and probably sensitive as well. Furthermore, with regard to the potential use of diatoms as a measure of soil quality, SOM is often seen as an important indicator (Bünemann et al., 2018), and it will therefore be interesting to explore this variable further in relation to diatoms.

Conclusions
In this study, we defined autecological values for pH and soil moisture content for the most common, widespread soil diatoms and compared them with the previous research of Lund (1945), Dam et al. (1994) and Van de Vijver et al. (2002). Besides showing similarities with those studies, our results also indicated a significant improvement on the existing indicator values. Moreover, we expanded the list of terrestrial diatoms that can be used as environmental markers in different research fields to 249 taxa. We believe that future studies should focus on molecular techniques, as these make it possible to speed up the identification process and to discriminate better between cryptic species. Additionally, soil sample collection and nutrient analysis should be standardised to enable a better pooling of data on diatom ecology in the future.

Fig. 4. Density plots of the squared residual fit to pH and soil moisture. The results are given for the training data set (odd samples; upper panels) and the passive samples (even samples; lower panels) derived from passively overlaying the odd samples onto a canonical correspondence analysis (CCA) ordination of the training-set samples. The pH training set includes 194 samples, while the set for soil moisture comprises 206 samples. The labelled dashed lines are for the 90th, 95th, and 99th percentiles of the distributions of the two sets of squared residual lengths. Samples lying beyond the 99th percentile are extremely poorly fitted to pH and soil moisture, respectively, those between the 95th and 99th percentiles are very poorly fitted, and those between the 90th and 95th percentiles are poorly fitted.
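The percentile classes used in Fig. 4 can be sketched as follows. The sketch assumes that the squared residual lengths have already been extracted from the CCA and that each set of samples is compared with the percentiles of its own distribution, as the caption suggests; the function name, the class labels and the optional `reference` argument are illustrative assumptions.

```python
import numpy as np


def classify_fit(sq_residuals, reference=None):
    """Label samples by the percentile class of their squared residual length.

    sq_residuals: squared residual lengths of the samples to classify
    reference:    distribution used for the percentiles (defaults to the
                  samples themselves)
    """
    r = np.asarray(sq_residuals, dtype=float)
    ref = r if reference is None else np.asarray(reference, dtype=float)
    p90, p95, p99 = np.percentile(ref, [90, 95, 99])

    labels = np.full(r.shape, "acceptable", dtype=object)
    # Later assignments overwrite earlier ones, so each sample ends up in
    # the most severe class whose threshold it exceeds.
    labels[r > p90] = "poor"
    labels[r > p95] = "very poor"
    labels[r > p99] = "extremely poor"
    return labels, (p90, p95, p99)


# Hypothetical squared residual lengths for 100 samples.
labels, cutoffs = classify_fit(np.arange(1, 101))
```

Passing the training-set residuals as `reference` would instead judge the passive samples against the training-set percentiles, which is a stricter test of whether a new sample is well fitted.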

Fig. 5. Optimum and tolerance values of pH for the dominant soil diatom taxa (10% in 10 samples). Diatom taxa are abbreviated with a four-letter code following Omnidia. The diatom taxa with their respective Omnidia codes are given in Table 4.

Fig. 6. Optimum and tolerance values of volumetric soil moisture content for the dominant soil diatom taxa (10% in 10 samples). Diatom taxa are abbreviated with a four-letter code following Omnidia. The diatom taxa with their respective Omnidia codes are given in Table 4.

Table 1
Overview of the different datasets and variables included in this study. The number of cells per sample is the minimum number of diatom cells that was targeted in the corresponding study. However, sometimes this number could not be reached due to a low presence of diatoms in the samples. Converted or modified factors are indicated by '*'. ¹ Study only included the most abundant taxa; therefore it was only used for the calculation of the autecological values.

Table 3
Previous and calculated tolerance ranges. Previous ranges are based on the studies of Van de Vijver et al. (2002) for soil moisture and Lund (1945) for pH. Lund determined pH colorimetrically, while we determined it electrometrically in a 1:5 water solution. Therefore, values may deviate slightly. *, Species only present in the samples of Van de Vijver et al. (2002).

taxa with respect to pH and soil moisture are shown in Figs. 5 and 6. There, we observe that some taxa such as Eunotia exigua (Brébisson ex Kützing) Rabenhorst, S. nana and Meridion circulare (Greville) C.A. Agardh are very sensitive to pH. For example, E. exigua only occurs in very acidic environments, while S. nana is rather restricted to soils with a neutral pH. On the contrary, several species including H. ingeae, Luticola robusta Van de Vijver, Ledeganck & Beyens, Frustulia vulgaris (Thwaites) De Toni, H. crozetikerguelensis and Stauroneis pseudoagrestis Lange-Bertalot & Werum are rather tolerant to pH and could be present on a wide range of soils. We observe similar patterns for VWC. Certain species such as Adlafia minuscula (Grunow) Lange-Bertalot var. minuscula, Adlafia minuscula var. muralis (Grunow) Lange-Bertalot, E. exigua and Chamaepinnularia obsoleta (Hustedt) C.E. Wetzel & Ector only require a low moisture content to grow and reproduce (i.e. low optimum

content averaging 28%, whereas Van de Vijver et al. (2002) had an average content of around 45%. Considering those differences, the ranges of moisture (and pH) overlap very well and show that data selection plays an important role in calculating indicator values.