Machine learning reveals that sodium concentration and temperature influence alkenone occurrence in Swiss and worldwide freshwater lakes

Lacustrine

Unlike Group 3 Isochrysidales, lacustrine Isochrysidales are not present in all lakes (e.g., Brassell et al., 2022). So far, Group 1 Isochrysidales have not been successfully isolated for laboratory culture. Therefore, we can only rely on environmental studies to better understand their ecological preferences. Several studies have tried to identify the parameters that influence alkenone occurrence by comparing lakes with and without alkenones in Europe (Cranwell, 1985; Zink et al., 2001; Pearson et al., 2008), Asia (Chu et al., 2005; Liu et al., 2011; Zhao et al., 2014; McColl, 2016; Song et al., 2016; Yao et al., 2019; He et al., 2020; Yao et al., 2021; Yao et al., 2022; Bulkhin et al., 2023), North America (Toney et al., 2010; Toney et al., 2011; Longo et al., 2016; Plancq et al., 2018a), Greenland (D'Andrea and Huang, 2005; D'Andrea et al., 2011) and globally distributed lakes (Longo et al., 2018). Simple comparisons, principal component analysis (PCA) or logistic regressions were used to determine which environmental factors might influence alkenone occurrence and abundance in the various datasets. Plancq et al. (2018a) was the first study to use a model (a binomial regression model, a member of the family of generalized linear models) that allowed the authors to test and compare the importance of several variables for alkenone occurrence and abundance. They found salinity, water temperature, lake depth, stratification and pH to be the main controls of alkenone occurrence in 106 Canadian prairie lakes, most of which are saline. However, very few of these studies focused on freshwater lakes. As freshwater and saline lakes do not host the same Isochrysidales groups, the parameters influencing the occurrence of Group 1 in freshwater lakes and Group 2 in saline lakes could be different. Moreover, all studies so far have focused on a specific region with limited ranges for some environmental variables.
Here we investigate alkenone occurrence and producer diversity in 56 Swiss freshwater lakes for which numerous environmental data were collected. To assess which environmental variables control the occurrence of alkenones in Swiss lakes, we use a new approach based on a type of machine learning, random forest (RF; Breiman, 2001). With this non-linear model, we seek to identify the best predictors of alkenone occurrence in Swiss lakes. We combine our data with all previously available data on the presence or absence of alkenones in global freshwater lakes (396 lakes in total) and compare the results of the model trained exclusively on Swiss lakes with those of the model trained on both Swiss and global lakes. The RF model assesses the importance of each environmental variable for the prediction of alkenone occurrence. Investigating these statistical relationships can help reveal biological mechanisms and thus improve our understanding of the ecological preferences of Isochrysidales.
2 Material and methods

Sites and sampling
For this study, 56 freshwater lakes were studied: 55 Swiss lakes and one in France, close to the border with Switzerland (Figure 1A; Table 1). Surface sediments were collected between 2011 and 2020 with a gravity corer from the deepest point in the lake, whenever possible. The cores were stored at 4°C until sampling.
The studied lakes span wide physical gradients, including elevation from 193 to 2,447 m, maximum depth from 3 to 372 m and mean annual air temperature (MAAT) from −0.1°C to 14.3°C (Table 1; Supplementary Table S1). Lakes were sampled in the two main geological ensembles of Switzerland: the Swiss Plateau, Jura mountains and the external zone of the Alps, covered with sedimentary rocks, and the internal zone of the Alps, characterized by crystalline bedrock (Figure 1A; Supplementary Table S1).

Environmental parameters
The physico-chemical parameters of the surface waters (0-15 m) of the Swiss lakes (Table 1; Supplementary Table S1) are from long-term monitoring projects conducted by the environmental agencies of Swiss cantons and by Eawag (www.datalakes-eawag.ch). Data were obtained from the Naïades database (naiades.eaufrance.fr) for the French lake, Lac des Rousses. Data for Lake Constance were provided by the Bodensee-Wasserinformationssystem (BOWIS) database, which is managed by the Internationale Gewässerschutzkommission für den Bodensee (IGKB). The data for Lake Lugano were provided by the International Commission for the Protection of Italian-Swiss Waters (CIPAIS, www.cipais.org). When data were not available, we measured pH, conductivity and oxygen concentration with a WTW multi-parameter sonde at 0.5 m and sampled 1 L of water at 0.5 m below the surface to measure major ions and trace elements. Water temperatures were not available for all lakes, so we used mean annual air temperatures from the closest MeteoSwiss meteorological stations and corrected for any altitudinal difference by applying a lapse rate of 0.6°C/100 m (Gandouin et al., 2016). When possible, a mean of the 10 years preceding the coring was calculated for all physico-chemical parameters and MAAT, assuming that 1 cm of sediment integrates on average 10 years of sedimentation (Supplementary Table S2).
Frontiers in Earth Science, frontiersin.org
Salinity was equated to the total dissolved solids (TDS), which was calculated from conductivity using the equation from Pawlowicz and Feistel (2012), where TDS is expressed in mg/kg, which corresponds to mg/L in freshwater lakes, and κ25 is the conductivity at 25°C in µS/cm. According to Pawlowicz and Feistel (2012), this method results in an error within about ±20%. The calculated salinities for Swiss lakes are very low (0.01-0.73 g/L, Table 1), and so are the absolute errors. Therefore, this approximation is considered adequate for comparison with other lakes.
The mixing regime of lakes was deduced from long-term water temperature monitoring, modeling (simstrat.eawag.ch) and the literature. When the stratification status was unknown, we deduced it as follows: for each lake, we calculated the thermocline depth from lake area according to Hanna (1990) (Eq. 2) and compared it with the actual depth of the lake. If the lake was at least 2 m deeper than the calculated thermocline depth, we classified it as stratified.
$$\log(\mathrm{THER}) = 0.185\,\log(A) + 0.842 \qquad (2)$$
where THER is the thermocline depth in m and A is the lake area in km² (n = 167, r = 0.91 and residual mean square RMS = 0.009).
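As a minimal sketch of the classification rule above, the following Python snippet evaluates Eq. 2 and applies the 2 m margin. The function names are hypothetical, "Log" is assumed here to be the base-10 logarithm, and the lake depth is assumed to be the maximum depth; this is an illustration, not the code used in the study.

```python
import math


def thermocline_depth_m(area_km2: float) -> float:
    """Thermocline depth (m) from lake area (km^2) following Eq. 2.

    Assumption: "Log" in Eq. 2 is the base-10 logarithm.
    """
    if area_km2 <= 0:
        raise ValueError("lake area must be positive")
    return 10 ** (0.185 * math.log10(area_km2) + 0.842)


def is_stratified(lake_depth_m: float, area_km2: float, margin_m: float = 2.0) -> bool:
    """Classify a lake as stratified if it is at least `margin_m` deeper
    than the thermocline depth estimated from its area."""
    return lake_depth_m >= thermocline_depth_m(area_km2) + margin_m


# Illustrative (made-up) lake: 1 km^2 in area and 20 m deep.
example_thermocline = thermocline_depth_m(1.0)
example_stratified = is_stratified(20.0, 1.0)
```

Under these assumptions a 1 km² lake has an estimated thermocline depth of 10^0.842 m, so it would be classified as stratified only if it is at least 2 m deeper than that.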
For each lake, the geological catchment was classified as sedimentary or crystalline using the Swiss geological map provided by SwissTopo and the Georesources Switzerland Group (map.geo.admin.ch). When lakes had both sedimentary and crystalline rocks in their catchment, they were attributed to the class of the dominant rock type.

Global database
In order to compare the results found in Swiss lakes with previous results from the literature, we collected all the data available on global freshwater lakes investigated for alkenone presence and constructed a global database (Figure 1B; Supplementary Table S3). We only considered surface or subsurface sediments. When salinity data were available, we used a limit of 3 g/L; Lake Little Manitou (Canada) and the Yarkov basin of Chany Lake (Russia) exceeded this limit (salinity of 3.62 and 7.1 g/L, respectively) but were included in the database as their alkenone distributions indicated the presence of Group 1 alkenones (Plancq et al., 2018a; Krivonogov et al., 2023). Otherwise, we selected lakes classified as fresh. All lakes containing only Group 2 alkenones were excluded. The database includes 340 globally distributed lakes, among which 103 contain alkenones: 67 host Group 1 alkenones, including 32 where Group 1 was genetically confirmed, and 10 host a mix of Group 1/2 alkenones, including 3 where the mixing was genetically confirmed (Figure 1B; Supplementary Table S3). For the remaining 26 lakes, it was not possible to determine which alkenone group was present. However, as the probability of Group 1 presence is very high for lakes with salinities lower than 6 g/L (Yao et al., 2020), it is likely that they host Group 1 alkenones or a mix of Group 1/2. The WorldClim database (Fick and Hijmans, 2017) was used to obtain MAAT when it was not reported. We used the method described above to deduce the stratification status when it was not mentioned.

Sample preparation
For each lake, the top 0-1 or 1-2 cm of sediment was sampled and freeze-dried. Between 1 and 5 g of sediment was ground and homogenized before extraction with an accelerated solvent extraction system ASE350 (Dionex) with dichloromethane:methanol (DCM:MeOH 9:1, v:v) at 120°C and 1,200 psi. The total lipid extracts (TLEs) were split into two equal parts. One part was saponified by adding 1 mL of 1 M KOH in MeOH:H2O (95:5, v:v). The mixture was heated for 3 h at 65°C. After cooling to room temperature, NaCl in H2O (5%) was used to quench the solution, which was then acidified to pH 2 with concentrated HCl in H2O. The lipid fraction was extracted with hexane (100%) three times and cleaned through a silica gel column with DCM (100%).
The saponified and non-saponified parts of the TLE were separated by silica gel chromatography into alkane, ketone and polar fractions using hexane, DCM and MeOH, respectively. The ketone fractions of some samples were further purified to remove coeluting compounds interfering with alkenones using silver nitrate impregnated silica gel (D'Andrea et al., 2007) with DCM (100%) followed by ethyl acetate (100%). The alkenones eluted in the last fraction.

Alkenone analysis
The alkenone fractions were analyzed using an Agilent 7890B gas chromatography (GC) system equipped with a flame-ionization detector (FID) following methods described in Martin et al. (2023). 18-pentatriacontanone was added to the alkenone fractions and saponified alkenone fractions before injection as an internal standard for quantification. Samples were dissolved in hexane and introduced to the GC system using splitless injection (320°C). Hydrogen was used as the carrier gas. Samples were analyzed with three different methods using the Agilent VF-200ms column (60 m × 250 μm × 0.10 μm) or the Restek Rtx-200 column (105 m × 250 μm × 0.25 μm) with the parameters shown in Supplementary Table S4. Both columns were shown to provide very similar results (Martin et al., 2023).
Alkenone peaks were identified by comparing GC retention times with those of a culture of Group 2 Ruttnera lamellosa RCC3687 and published data. The repeatability of the measurements was assessed by measuring several samples several times, a few days apart. The mean of standard deviations was 0.019 (n = 31) for the calculated RIK37 (see Eq. 4), 0.032 (n = 16) for the RIK38E index (see Eq. 5) and 3.8% (n = 27) for the C37 alkenone quantification.
10.3389/feart.2024.1409389

Chromatogram selection and correction
The three analytical methods using different GC columns provide equivalent results (Martin et al., 2023). Therefore, for each lake, we chose the GC method that gave the strongest signal and the best separation of alkenone peaks (Supplementary Table S5).
Saponification was sufficient to remove most of the co-eluting compounds in the elution zone of the C37 alkenones and did not alter the original alkenone distribution (Martin et al., 2023). Therefore, saponified samples were preferentially selected, except in cases where the signal was too weak. Some samples went through an additional silver-nitrate purification after saponification. Silver-nitrate purification led to significant changes in the C37 alkenone distribution for most of these samples (Martin et al., 2023), which were thus excluded. Only two samples that underwent silver-nitrate purification were selected, as their C37 alkenone relative abundances remained unchanged after purification.
Saponification was shown to reduce C37 alkenone concentrations by almost half on average (Martin et al., 2023). Therefore, the C37 alkenone concentrations of the saponified samples had to be corrected. For each saponified sample, the concentration change due to saponification was calculated for each C37 alkenone as the ratio of the concentration after saponification to the concentration before saponification {Δ(C37:m)saponification = [(C37:m)after saponification/(C37:m)before saponification], where m indicates the degree of unsaturation ranging from 2 to 4}. An average ratio for all C37 alkenones (Δ(C37)saponification) was calculated for each saponified sample, excluding the alkenones for which saponification removed a co-eluting peak. The inverse of this ratio (1/Δ(C37)saponification) was then used as a correction factor by which the concentrations obtained for saponified samples were multiplied, in order to correct for the decrease in C37 alkenone concentrations caused by saponification. The same correction method was applied to the two samples that underwent the additional silver-nitrate purification.
Such corrections were difficult to implement for the C38 and C39 alkenones. First, the corresponding portions of the chromatograms were often disturbed by co-eluting compounds. Second, when it was possible to calculate concentration changes due to saponification, the ratios obtained for a given sample often differed widely among the C38 and C39 alkenones. Therefore, we only discuss the C37 alkenone concentrations in this manuscript. These corrections affect only the concentrations and do not affect the indices, which are based on ratios (see Section 2.7).
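The C37 correction described above can be sketched in a few lines of Python. The function name, the dictionary layout and the concentrations are hypothetical and chosen only for illustration; the snippet follows the text (mean after/before ratio over the retained C37 alkenones, then its inverse as a multiplier) and is not the authors' implementation.

```python
def saponification_correction_factor(before, after, exclude=()):
    """Return 1 / mean(after/before) over the C37 alkenones kept.

    before, after: dicts mapping alkenone name -> concentration.
    exclude: alkenones for which saponification removed a co-eluting
             peak, so that their ratio is not meaningful.
    """
    ratios = [
        after[name] / before[name]
        for name in before
        if name not in exclude and name in after and before[name] > 0
    ]
    if not ratios:
        raise ValueError("no alkenone left to compute the ratio")
    mean_ratio = sum(ratios) / len(ratios)
    return 1.0 / mean_ratio


# Made-up concentrations (e.g., in ug/g) for a single sample.
before = {"C37:4": 2.0, "C37:3a": 1.0, "C37:3b": 0.5, "C37:2": 0.4}
after = {"C37:4": 1.0, "C37:3a": 0.5, "C37:3b": 0.25, "C37:2": 0.1}

# C37:2 is excluded here, as if a co-eluting peak had been removed.
factor = saponification_correction_factor(before, after, exclude={"C37:2"})
corrected = {name: conc * factor for name, conc in after.items()}
```

In this made-up case every retained ratio is 0.5, so the correction factor is 2 and the saponified concentrations are doubled.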

Alkenone-based indices
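The relative abundance of each C37 alkenone to the total abundance of C37 alkenones was calculated as proposed by Rosell-Melé (1998):
$$\%\mathrm{C}_{37:m} = \frac{[\mathrm{C}_{37:m}]}{\sum [\mathrm{C}_{37}]} \times 100 \qquad (3)$$
where m refers to the number of double bonds, which ranges from 2 to 4, and the sum runs over all C37 alkenones.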

Modeling alkenone presence or absence
In order to investigate which environmental variables could influence the presence or absence of alkenones in Swiss lakes, we used our Swiss dataset to train a random forest (RF) model. The model uses the environmental variables to classify the lakes into two categories: presence or absence of alkenones. To do so, we used the R package randomForest (Liaw and Wiener, 2002) in R (v4.2.3; R Core Team, 2023). We also used our global dataset, which includes the Swiss lakes and all freshwater lakes previously investigated for alkenone presence in the literature, to train a global RF model. Comparing the results of both models allows us to assess whether the response of alkenone producers to environmental variables in freshwater lakes is similar at regional and global scales.

Data preparation
We removed two lakes from our Swiss dataset because too many environmental variables were missing (Lakes Cama and Ritom, Supplementary Table S1). For the global model, we combined the dataset from the 54 Swiss lakes with an additional 78 global lakes for which major ion concentrations were available, except for five Greenland lakes, for which salinity and sulfate (SO₄²⁻) concentrations were missing (a total of 132 lakes out of the 396 lakes of the combined global and Swiss datasets, Supplementary Table S6).
For both models, the two categories (presence or absence of alkenones) are only slightly unbalanced (61% and 44% of lakes with alkenones for the Swiss and global models, respectively, and thus 39% and 56% of lakes without alkenones). Concentrations below the detection limit (DL; 9 ion concentration values out of 972 and 1,716 total values for the Swiss and global models, respectively) were replaced with half the DL (DL/2), following the recommendation of Farnham et al. (2002). Substitution is a debated approach, but it is considered valid for a small proportion of non-detects and a low DL (Adjei and Stevens, 2022). The missing data (0.8% and 0.5% of the data for the Swiss and global models, respectively) were imputed using the impute() function of the R package randomForest (Liaw and Wiener, 2002).
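The DL/2 substitution can be illustrated with a short Python sketch. The function name and the way non-detects are encoded (a numeric value below the DL) are assumptions made for this example; missing values are represented as None and left untouched, since they are imputed in a separate step.

```python
def substitute_non_detects(values, detection_limit):
    """Replace every value below the detection limit (DL) with DL / 2.

    Assumption: non-detects are stored as numbers smaller than the DL.
    Missing values (None) are kept as they are for later imputation.
    """
    half_dl = detection_limit / 2
    return [
        half_dl if (v is not None and v < detection_limit) else v
        for v in values
    ]


# Made-up ion concentrations (mg/L) with a hypothetical DL of 0.2 mg/L.
raw = [0.5, 0.05, None, 3.0]
cleaned = substitute_non_detects(raw, detection_limit=0.2)
```

Only the second value is below the hypothetical DL, so it is the only one replaced.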

Variable selection
Random forest is resistant to multicollinearity (Breiman, 2001; Liaw and Wiener, 2002), but the variable importance analysis benefits from reducing the correlation between explanatory variables. The Pearson correlation matrix for the Swiss dataset shows that salinity is highly correlated with conductivity, elevation with MAAT, and the chloride (Cl⁻) concentration with the sodium (Na⁺) concentration (|r| > 0.9, Supplementary Table S7). Conductivity is also strongly correlated with the calcium (Ca²⁺) concentration (r = 0.88, Supplementary Table S7), which is the predominant ion for almost all studied lakes (Supplementary Table S1), and with the magnesium (Mg²⁺) concentration (r = 0.76, Supplementary Table S7). The highest correlation among the other variables is 0.76 (Supplementary Table S7). We trained a first model (RF1, Supplementary Table S8) with 15 variables, excluding one variable from each pair with an absolute Pearson correlation coefficient higher than 0.9 (salinity, elevation and Cl⁻). A second model (RF2, Supplementary Table S8) was trained after further excluding Ca²⁺ concentration. We chose to keep conductivity as it is considered an important variable for explaining alkenone occurrence in previous studies (D'Andrea and Huang, 2005; Longo et al., 2016). We also removed the least important parameters, either keeping (RF3) or excluding Ca²⁺ (RF4, Supplementary Table S8).
For the global dataset, we selected only the variables for which data were available for almost all selected global and Swiss lakes (Supplementary Table S6). We excluded conductivity since data were lacking for many of the lakes in the global database. Among the 13 selected variables (Supplementary Table S8), the Pearson correlation matrix revealed high correlations between salinity and Na⁺ and SO₄²⁻ concentrations (r = 0.88 and 0.86, respectively, Supplementary Table S9). SO₄²⁻ concentration is also correlated with Mg²⁺ and Na⁺ concentrations (r = 0.87 and 0.77, respectively, Supplementary Table S9). Mg²⁺ and potassium (K⁺) concentrations are also correlated (r = 0.75, Supplementary Table S9). The highest remaining correlation among the other variables is 0.63 (Supplementary Table S9). We trained a first model (RFG1, Supplementary Table S8) with all 13 variables and a second one (RFG2, Supplementary Table S8) that excluded salinity, SO₄²⁻ and Mg²⁺ concentrations.
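The correlation screening used for both datasets can be sketched as follows. This is a minimal pure-Python illustration with made-up values and hypothetical function names; it only flags the pairs whose absolute Pearson coefficient exceeds a threshold, and the choice of which variable of a pair to drop remains a judgment made by the analyst.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(table, threshold=0.9):
    """List (variable_a, variable_b, r) for every pair with |r| > threshold."""
    pairs = []
    for a, b in combinations(table, 2):
        r = pearson_r(table[a], table[b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs


# Made-up values for five lakes; salinity is proportional to conductivity.
table = {
    "conductivity": [100.0, 200.0, 300.0, 400.0, 500.0],
    "salinity": [0.07, 0.14, 0.21, 0.28, 0.35],
    "pH": [7.9, 7.2, 8.1, 7.4, 7.8],
}
flagged = highly_correlated_pairs(table, threshold=0.9)
```

With these illustrative numbers only the conductivity and salinity pair is flagged, mirroring the kind of redundancy that led to dropping salinity from the Swiss models.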

Hyperparameter optimization
We selected the best value of the hyperparameter mtry (see Supplementary Text S1) using 7-fold cross-validation (CV, see Section 2.8.5) with random splitting. The accuracy, i.e., the proportion of correctly classified samples among the total number of samples (see Supplementary Text S2), was used as the metric to evaluate the performance of the model. The best performance was obtained for an mtry value of 6 for the Swiss models RF1 and RF2, 3 for the Swiss models RF3 and RF4, and 4 for the global models (Supplementary Figures S1A,B; Supplementary Table S8). ntree was chosen so that the model could reach stability (ntree = 2,000 for all the Swiss models and ntree = 3,000 for the global models, Supplementary Figures S1C,D; Supplementary Table S8).

Variable importance
The importance of each variable can be quantified by the mean decrease in accuracy (MDA) and the mean decrease in Gini (Gini). MDA represents the decrease in accuracy associated with the removal of a given explanatory variable; the higher the decrease in accuracy, the higher the importance of the variable. Gini measures the loss of purity of the nodes (see Supplementary Text S1) caused by the exclusion of a given variable. The node purity is linked with the importance of the variable in the model so that the higher the loss of node purity, the higher the importance of the variable.
We trained the random forest model with the entire Swiss dataset to obtain a better evaluation of the importance of the environmental variables. We relied on the CV process (see Section 2.8.5) to assess the performance of the model and the robustness of the variable importance analysis. The same method was used with the global dataset in order to allow comparisons between both models. The importance results of the models, which are based on statistical relationships, indicate potential biological mechanisms.
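To illustrate the idea behind MDA, the sketch below estimates the importance of a variable by randomly permuting its values and measuring the resulting drop in accuracy, which is a common way of "removing" the information carried by a variable. The toy classifier, the variable names and the data are hypothetical, and a real random forest would evaluate this on out-of-bag samples; the snippet is an assumption-laden illustration rather than the procedure of the study.

```python
import random


def accuracy(model, rows, labels):
    """Proportion of rows whose predicted label matches the observed one."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical classifier: alkenones predicted when Na+ >= 1 mg/L."""
    return row["na_mg_l"] >= 1.0


# Made-up lakes; labels are the presence (True) or absence (False) of alkenones.
rows = [
    {"na_mg_l": 0.3, "ph": 7.1},
    {"na_mg_l": 0.6, "ph": 8.2},
    {"na_mg_l": 2.5, "ph": 7.6},
    {"na_mg_l": 4.0, "ph": 8.0},
    {"na_mg_l": 0.8, "ph": 7.9},
    {"na_mg_l": 6.1, "ph": 7.4},
]
labels = [False, False, True, True, False, True]

importance_na = permutation_importance(toy_model, rows, labels, "na_mg_l")
importance_ph = permutation_importance(toy_model, rows, labels, "ph")
```

Because the toy classifier ignores pH, permuting pH cannot change its accuracy, whereas permuting the sodium values can only leave the accuracy unchanged or lower it.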

Cross-validation of the model
Seven-fold CV was performed to evaluate the model performance and the variable importance analysis. This method randomly splits the dataset into seven subsets; six are used for training the model while the remaining one is used for validation.
Training and validation were repeated seven times so that each subset was used once for validation. The accuracy and importance were reported for each fold.
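The splitting scheme can be sketched in plain Python as follows. The function name and the seed are hypothetical, and the snippet only produces the index sets; fitting and scoring a model on each fold is left out.

```python
import random


def k_fold_indices(n_samples, k=7, seed=0):
    """Yield (train_indices, validation_indices) for k random folds.

    Every sample appears in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 54 samples, matching the number of Swiss lakes kept for modeling.
splits = list(k_fold_indices(54, k=7))
fold_sizes = [len(validation) for _, validation in splits]
```

With 54 samples and seven folds, each validation fold holds 7 or 8 lakes, which helps explain why the accuracy can vary noticeably from one fold to the next.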

Accumulated local effects (ALE) plots
ALE plots allow us to isolate the relationship of a given explanatory variable with the predicted outcome of the model (Molnar, 2020). They show the evolution of the prediction of the model across the range of values of each variable. ALE can be used when the variables are correlated (Molnar, 2020). They can reveal complex relationships, for example, curves with an optimum. ALE were obtained using the FeatureEffects() function from the R package iml (Molnar et al., 2018) and plotted with ggplot2 (Wickham, 2016).
For each variable included in the models, the ALE plots show the evolution of the probability of alkenone occurrence across the range of values taken by that variable. Ranges for which the ALE value is positive (i.e., the predicted probability is higher than average) are favorable for alkenone occurrence, while negative values reflect unfavorable conditions. Since we could include only a limited proportion of the lakes from the global database (132 lakes out of 396) because of numerous missing data, we compared the ALE plot results with the distribution of the entire dataset. For each variable considered, we compared the relative frequency distributions of lakes with alkenones, lakes without alkenones and all the lakes (frequency distribution divided by the total number of samples for each category, Supplementary Figure S2). For a given range, if the relative frequency of lakes with alkenones (f(Alkenones)) is higher than that of lakes without alkenones (f(No alkenones)), the range is considered favorable for alkenone occurrence. Therefore, the difference between f(Alkenones) and f(No alkenones) highlights the favorable (f(Alkenones) − f(No alkenones) > 0) and unfavorable (f(Alkenones) − f(No alkenones) < 0) ranges for alkenone occurrence. For a given variable, if the distribution of lakes with alkenones is very close to that of lakes without alkenones, the variable has little impact on alkenone occurrence.
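The comparison of relative frequency distributions can be sketched as below. The bin edges, the values and the function name are made up for illustration, and all values are assumed to fall within the bin edges; the snippet simply computes f(Alkenones) − f(No alkenones) per bin.

```python
def relative_frequencies(values, bin_edges):
    """Fraction of `values` falling in each bin [edge_i, edge_i+1).

    The last bin also includes its upper edge. Counts are divided by
    the total number of values in the category.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            low, high = bin_edges[i], bin_edges[i + 1]
            is_last = i == n_bins - 1
            if low <= v < high or (is_last and v == high):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


# Made-up values of one variable for lakes with and without alkenones.
with_alkenones = [1, 1, 3, 11]
without_alkenones = [5, 6, 7, 11]
edges = [0, 4, 8, 12]

f_with = relative_frequencies(with_alkenones, edges)
f_without = relative_frequencies(without_alkenones, edges)
difference = [a - b for a, b in zip(f_with, f_without)]
```

A positive difference marks a bin as favorable, a negative one as unfavorable, and a value near zero indicates that the variable does not discriminate between the two groups in that range.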
Unfortunately, we could not train a random forest to investigate which variables influence alkenone concentrations; the dataset (n = 52) was too small. However, for each variable, we plotted the C37 alkenone concentrations of all the lakes containing alkenones in the global dataset for which the C37 alkenone concentration was available, to detect the most favorable conditions for high alkenone concentrations. Alkenone concentrations are available for only a part of the global lakes and, among them, C37 alkenone concentrations were not available for a few lakes: the German lakes from Zink et al. (2001) and the Greenland lakes from D'Andrea and Huang (2005).
All the lakes containing alkenones displayed the tri-unsaturated C37 alkenone isomer (C37:3b) and, when alkenones were in sufficient abundance, the alkenone distribution of the lakes featured the complete suite of alkenones including the C38Me, C39Et alkenones and the other tri-unsaturated alkenone isomers (C38:3bEt, C38:3bMe and C39:3b, Figure 2; Table 2). For most lakes, the C37:4 alkenone was the most abundant, with C37:4 relative abundances ranging from 36.3% to 58.9% of the total C37 alkenones and an average value of 45.6% (Figure 2A; Table 2; Eq. 3). However, twelve lakes had a C37:3a dominant profile with C37:3a relative abundances ranging from 37.6% to 47.4% of the total C37 alkenones (mean of 42.8%, Figure 2B; Table 2).

Model performance and variable importance
The first random forest model for Swiss lakes (RF1) resulted in an accuracy of 78% (mean accuracy of 76% across the CV folds with a standard error of 12%, Supplementary Table S8); this corresponds to the proportion of test samples correctly classified by the model (Supplementary Text S2). The model was slightly better at correctly classifying lakes with alkenones (sensitivity = 78%, see Supplementary Text S2) than lakes without alkenones (specificity = 77%, Supplementary Table S8). Reducing the correlations among the variables (RF2) led to very similar results (accuracy of 78%, mean accuracy of 72% ± 3% across the CV folds, Supplementary Table S8). Removing the parameters with negative MDA values slightly improved the performance of both models (Supplementary Table S8).
The model for the global database and Swiss lakes including all variables (RFG1) resulted in an accuracy of 81% (mean accuracy of 80% ± 2% across the CV folds), a sensitivity of 78% and a specificity of 84%. The performance of the model remained very similar when the correlations among the variables were reduced (RFG2, accuracy of 83%, mean accuracy of 80% ± 3% across the CV folds, Supplementary Table S8).
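For readers less familiar with these metrics, the following Python sketch shows how accuracy, sensitivity and specificity relate to the counts of correctly and incorrectly classified lakes. The function name and the labels are hypothetical and the numbers are made up; they do not reproduce the values reported above.

```python
def classification_metrics(observed, predicted):
    """Accuracy, sensitivity and specificity for binary labels.

    1 means alkenones present, 0 means alkenones absent.
    """
    tp = sum(o == 1 and p == 1 for o, p in zip(observed, predicted))
    tn = sum(o == 0 and p == 0 for o, p in zip(observed, predicted))
    fp = sum(o == 0 and p == 1 for o, p in zip(observed, predicted))
    fn = sum(o == 1 and p == 0 for o, p in zip(observed, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the observations")
    return {
        "accuracy": (tp + tn) / len(observed),
        "sensitivity": tp / (tp + fn),  # lakes with alkenones correctly found
        "specificity": tn / (tn + fp),  # lakes without alkenones correctly found
    }


# Made-up example with five lakes.
observed = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
metrics = classification_metrics(observed, predicted)
```

Here two of the three lakes with alkenones and one of the two lakes without alkenones are correctly classified, so sensitivity and specificity differ even though a single accuracy value summarizes both.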
The indices for variable importance show that Na⁺ concentration and MAAT are the most important parameters for the Swiss dataset (Figure 5A; Supplementary Figure S3). […] S3), highlighting the robustness of the variable importance analysis.

Probability of alkenone occurrence in freshwater lakes
The ALE plots show the evolution of the probability of alkenone occurrence across the range of values of a given variable. They were compared with the relative frequency distributions of lakes with and without alkenones considering the entire dataset. For almost all variables, both the ALE plots obtained from the Swiss and global models and the distributions of lakes with and without alkenones included one or several optima.

Influence of physical parameters
There are two optimal MAAT ranges for alkenone occurrence: from −17°C to 2°C and between 10°C and 12°C (Figures 6A1-A3). High alkenone concentrations are found within similar temperature ranges (<−3°C, between 0°C and 5°C, and around 10°C, Figure 6A4), most of them at MAAT lower than 5°C. The range between 10°C and 12°C is the most favorable for alkenone occurrence, but for alkenone abundance the most favorable range is below 5°C. All the lakes hosting both Group 1 and Group 2 alkenones have MAAT higher than 0°C (except for North Killeak Lake, whose MAAT is −5°C) and most of them are concentrated between 8°C and 12°C (Supplementary Figure S4), whereas most of the lakes containing only Group 1 alkenones have MAAT lower than 6°C, with a peak between −10°C and −8°C. We note that in the highest part of the occurrence range of alkenones (12°C-14°C), there are only alkenones whose producers are undetermined (Supplementary Figure S4), which makes the upper MAAT limit of Group 1 alkenone occurrence uncertain.
The optimal range for alkenone occurrence is found in lakes with depths ranging from 8 to 200 m (Figures 6B1-B3). The best conditions correspond to lakes with depths ranging from 10 to 50 m, where most of the highest alkenone concentrations are also found (from 6 to 15 m and between 20 and 45 m, Figure 6B4). Mixing of Group 1 and Group 2 Isochrysidales is frequent in deep lakes (100-200 m, Supplementary Figure S4), while Group 1 alone is rarer in such lakes.
Stratified lakes are more favorable for alkenone occurrence than mixed lakes (Figures 7A,B). In the entire global dataset, 74% of the mixed lakes are devoid of alkenones, against 33% of the stratified lakes (Figure 7C). Stratified lakes also host the highest alkenone concentrations and have higher mean alkenone concentrations than mixed lakes (2.8 and 1.6 μg/g sed, respectively, Figure 7D). Small (<0.8 km²) and mid-sized lakes (from 8 to 25 km², Figures 6C1-C3) are favorable for alkenone occurrence. The highest alkenone concentrations are also found in these two ranges (<1 km² and between 6 and 15 km², Figure 6C4). Alkenones are more frequent in lakes at low to moderate elevations (Supplementary Figures S2, S5).
[Figure caption: Examples of partial GC-FID chromatograms associated with the RIK37 values for the two typical alkenone distributions found in the studied Swiss lakes: Lake Taney with a C37:4 dominant profile (A) and Lake Lucerne with a C37:3a dominant profile (B).]

Influence of major ions
Ca²⁺ concentrations lower than 50 mg/L are the most favorable for alkenone occurrence and abundance, even if high concentrations are also favorable, to a lesser extent (Figures 8F1-F4). However, the distribution of the lakes with alkenones as a function of Ca²⁺ concentration is very similar to that of the lakes without alkenones and to that of all studied lakes (Supplementary Figure S2). This suggests that Ca²⁺ concentration has little impact on alkenone occurrence, as also indicated by the global model (Figure 5B; Supplementary Figure S3).
For the remaining ions, the optimal range for alkenone occurrence is found at low concentrations: between 0.3 and 8 mg/L for K⁺ and lower than 25 mg/L for the other ions (Figures 8A1-E3). Most of the highest alkenone concentrations fall within these ranges and are generally divided into two peaks: one at very low ion concentrations (< ∼2 mg/L) and another in the high part of the range (between 2 and 4.5 mg/L for K⁺ and between ∼7 and 20 mg/L for the other ions, Figures 8A4-E4). There seems to be a threshold for alkenone occurrence corresponding to a Na⁺ concentration close to 1 mg/L. It is not a strict threshold though, as alkenones are present in Lake Taney (Switzerland), which has a Na⁺ concentration of 0.4 mg/L and hosts the second highest alkenone concentration of the database (Supplementary Tables S1, S3).
A small group of lakes with high ion concentrations contain high alkenone concentrations, including Lake Matarak, which hosts the highest alkenone concentration of the database (Figures 8A4-E4; Supplementary Table S3). Among these lakes, North Killeak Lake in Alaska contains Group 2 Isochrysidales in very small quantities together with Group 1 and has the […] S3; Supplementary Figure S4). Two lakes containing alkenones whose group is unknown also have the highest SO₄²⁻ concentrations of the global database (Figure 8D4; Supplementary Table S3; Supplementary Figure S4), but lakes containing Group 1 alkenones have similar SO₄²⁻ concentrations.
Apart from North Killeak Lake, the lakes hosting both Group 1 and Group 2 Isochrysidales have a distribution similar to that of the lakes containing only Group 1, even if their occurrence range is often narrower, which is likely due to their smaller number. In particular, the mixing of both groups is not found at Cl⁻ concentrations lower than 1 mg/L, while almost one-third of the lakes hosting Group 1 alone are found below this value (Supplementary Figure S4). The situation is similar for SO₄²⁻ concentrations, where lakes with mixed Group 1/2 are mainly concentrated in the range 10-50 mg/L, while lakes with Group 1 are more widely distributed (Supplementary Figure S4).

Influence of salinity, conductivity, alkalinity and pH
In freshwater lakes, low salinities appear as the most favorable for alkenone occurrence (<0.1 and between 0.2 and 0.6 g/L, Figures 9A1-A3). The highest concentrations are also found at low salinities (<0.6 g/L) but there is another peak of high alkenone concentrations between 1 and 1.5 g/L (Figure 9A4).
As found for conductivity, low to moderate alkalinity values (from 1 to 100 mg/L) are the most favorable for alkenone occurrence and abundance (Figure 10). The distribution of lakes with mixed Group 1/2 alkenones is different from that of lakes with Group 1 alone, but this is mainly due to the reduced number of data for lakes with mixed Group 1/2 (Supplementary Figure S4).
The most favorable conditions for alkenone occurrence are found for pH ranging from 7.0 to 8.5, especially from 7.5 to 8.5 (Figures 9C1-C3). Most of the highest alkenone concentrations are also found in this range (Figure 9C4).

Influence of nutrients and trace elements
Low concentrations of TN, TP (<1.5 and <0.1 mg/L, respectively, Figures 11A-D) and trace elements (Supplementary Figure S6) are the most favorable for alkenone occurrence. The highest alkenone concentrations are also found at low TN, TP (<0.1 mg/L for TP and <1 mg/L for TN, Figures 11E,F) and trace element concentrations (Supplementary Figure S7).
Increasing probabilities of alkenone presence were associated with cold or mild temperatures, small to mid-sized stratified freshwater lakes with depths ranging from 10 to 50 m, low ion concentrations, low salinities, low to moderate conductivity and alkalinity values, moderately alkaline pH (7.0-8.5) and low nutrient and trace element content. These favorable conditions for alkenone presence generally coincide with the ranges where alkenones are present in high concentrations in our global dataset of freshwater lakes.
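The favorable ranges reported in this section come from comparing, bin by bin, the relative frequency of lakes with alkenones against that of lakes without them, that is, the f(Alkenones) − f(No alkenones) difference plotted in the figures. The following Python sketch illustrates that comparison under the assumption that each relative frequency is the share of lakes of a given class falling in a bin. The function names, bin edges and pH values are hypothetical and are not taken from the dataset or from Section 2.8.6.

```python
from bisect import bisect_right


def relative_frequencies(values, edges):
    """Share of values in each bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if i == len(counts) and v == edges[-1]:
            i -= 1  # put a value equal to the last edge into the last bin
        if 0 <= i < len(counts):
            counts[i] += 1  # values outside the edges are ignored
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def favorable_bins(with_alkenones, without_alkenones, edges):
    """Bins where f(Alkenones) - f(No alkenones) is positive, as (low, high, difference)."""
    f_yes = relative_frequencies(with_alkenones, edges)
    f_no = relative_frequencies(without_alkenones, edges)
    return [
        (edges[i], edges[i + 1], f_yes[i] - f_no[i])
        for i in range(len(edges) - 1)
        if f_yes[i] - f_no[i] > 0
    ]


# Invented pH values for illustration only (not data from this study).
ph_with = [7.6, 7.7, 8.1, 8.2]
ph_without = [7.1, 7.2, 7.6, 8.1]
ph_edges = [7.0, 7.5, 8.0, 8.5]
favorable = favorable_bins(ph_with, ph_without, ph_edges)
```

Because each class is normalized by its own number of lakes, the difference stays comparable even when lakes with and without alkenones are unequally represented.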

Alkenone distributions and diversity in Swiss lakes
Alkenones were detected in 59% of the studied lakes (Figure 1A; Table 1). The concentrations of C 37 alkenones in Swiss lakes (from 0.1 to 20.0 μg/g, mean 1.9 μg/g, Table 2) are similar to those of the global database (from 0.01 to 27.0 μg/g, mean of 2.5 μg/g, Supplementary Table S3). The highest alkenone concentrations were reported in Lake Braya Sø in Greenland (82.7 mg/g TOC, D'Andrea and Huang, 2005).
The tri-unsaturated C 37 alkenone isomer (C 37:3b ), which is specific to the Group 1 Isochrysidales (Longo et al., 2016), is present in all the Swiss lakes containing alkenones, as is the complete suite of alkenones including the C 38 Me, C 39 Et alkenones and the other tri-unsaturated alkenone isomers (C 38:3b Et, C 38:3b Me and C 39:3b ), when alkenones were in sufficient abundance (Figure 2; Table 2). Most lakes had distributions dominated by the C 37:4 alkenone (36.3%-58.9% of the total C 37 alkenones, mean of 45.6%, Figure 2A; Table 2), which are characteristic of Group 1-type alkenones. Longo et al. (2016) defined the isomeric ratio of ketones RIK 37 (Eq. 4) based on the specificity of the C 37:3b isomer to the Group 1 Isochrysidales to differentiate the Group 1 alkenone distributions from Group 2 and Group 3 distributions. The majority of our lakes (20 lakes) had RIK 37 values ranging from 0.55 to 0.64 (mean of 0.61, Figure 3; Table 2). This falls within the RIK 37 range (0.48-0.64) defined by Longo et al. (2018) for freshwater lakes in the Northern Hemisphere containing Group 1-type alkenones (Supplementary Table S10). RIK 37 values of 1, in contrast, indicate that the alkenones are only produced by Group 2 or Group 3 Isochrysidales.
Twelve lakes had a C 37:3a dominant profile (37.6%-47.4% of the total C 37 alkenones, mean of 42.8%, Figure 2B; Table 2). These 12 lakes had RIK 37 values higher than 0.64 (0.64-0.76 with a mean value of 0.68) except for Lake Lungern, which had a RIK 37 value of 0.63 (Figure 3; Table 2). This suggests that these 12 lakes likely contain both Group 1 and Group 2 Isochrysidales. Lake Joux had a RIK 37 value higher than 0.64 (0.70), even though it had a distribution characteristic of Group 1 Isochrysidales with a dominant C 37:4 peak (Figure 3; Table 2). However, another compound co-eluted with the C 37:3b alkenone, which persisted even after saponification and silver-nitrate purification, and likely biased the RIK 37 value.
The C 38:3b Et isomer can also be used to separate alkenone distributions by phylotype through the isomeric ratio of ketones RIK 38E defined by Longo et al. (2016) (Eq. 5). Unfortunately, we were not able to calculate the RIK 38E values for all the samples due to low abundances or the presence of co-eluting compounds.
However, in Swiss lakes where we were able to calculate the RIK 38E index, the values were lower than 0.57 (0.17-0.57, mean value of 0.39) except in Lakes Taillères, Rot, Lucern and Mauen (RIK 38E values of 0.59, 0.69, 0.73 and 0.83, respectively, Figure 4; Table 2). The C 38:3b Et isomer is produced by some Group 2 Isochrysidales in trace amounts; therefore, lakes with RIK 38E values ranging from 0.75 to 1 are inferred to contain Group 2 Isochrysidales, while values between 0 and 0.57 likely reflect Group 1-type alkenones in Northern Hemisphere lakes (Supplementary Table S10; Longo et al., 2016; Longo et al., 2018). Based on the RIK 38E index, the majority of our lakes likely contain Group 1 alkenones.
Combining RIK 37 and RIK 38E values for Group 1 and Group 2 Isochrysidales from the literature with our data allows us to confidently infer, in agreement with our previous conclusions, that the majority of Swiss lakes likely contain only Group 1 Isochrysidales (Figure 4). Lakes Rot, Lucern and Mauen are outside of the Group 1 range (Figure 4), as well as the eight other lakes with RIK 37 values higher than 0.64 (Figure 3; Table 2). The higher RIK 37 values are consistent with lakes that host a mix of Group 1 and Group 2 Isochrysidales (Figure 4; Supplementary Table S10). The RIK 37 and RIK 38E values of these 11 lakes remain closer to the Group 1 haptophyte upper limits than the Group 2 Isochrysidales lower limits, suggesting that Group 1 Isochrysidales may be more abundant in these lakes than Group 2 Isochrysidales.
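The RIK-based reasoning used here can be illustrated with a short Python sketch. It assumes that Eq. 4 and Eq. 5 have the form a / (a + b), where a and b are the abundances of the "a" and "b" tri-unsaturated isomers (C 37:3a and C 37:3b for RIK 37, C 38:3a Et and C 38:3b Et for RIK 38E). This form follows Longo et al. (2016) as we understand it and is consistent with a value of 1 meaning that the Group 1-specific "b" isomer is absent. The function names and category labels are hypothetical; only the 0.64 threshold comes from the text.

```python
GROUP1_RIK37_MAX = 0.64  # upper Group 1 limit quoted in the text (Longo et al., 2018)


def rik(a_isomer: float, b_isomer: float) -> float:
    """Isomeric ratio of ketones, assumed to be a / (a + b) for isomer abundances a and b."""
    if a_isomer < 0 or b_isomer < 0:
        raise ValueError("abundances must be non-negative")
    total = a_isomer + b_isomer
    if total == 0:
        raise ValueError("both isomers are absent; the ratio is undefined")
    return a_isomer / total


def interpret_rik37(value: float) -> str:
    """Map a RIK37 value to the interpretation used in the text (labels are illustrative)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("RIK37 must lie between 0 and 1")
    if value == 1.0:
        return "Group 2 or Group 3 only"
    if value > GROUP1_RIK37_MAX:
        return "possible Group 1/Group 2 mix"
    return "at or below the Group 1 upper limit"


# Invented peak areas for illustration only (arbitrary units).
example_rik37 = rik(61.0, 39.0)
example_label = interpret_rik37(example_rik37)
```

A single threshold is only a first screen: as the case of Lake Joux shows, co-eluting compounds can bias the ratio, so the chromatogram should be inspected alongside the index.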
Lake Taillères stands at the limits of the Group 1 range (Figure 4), as well as Lake Burgäschi (Figure 3). The RIK 37 values of these two lakes (0.63 and 0.64, respectively, Table 2) are less than or equal to the upper limit of the RIK 37 values of lakes hosting genetically confirmed Group 1 Isochrysidales (0.64, Supplementary Table S10), which was recorded in Lake Schmaler Luzin in Germany (Longo et al., 2018). Thus, these two lakes are included in the Group 1 range. However, the RIK 38E value of Lake Taillères slightly exceeds the known range of RIK 38E values for Group 1 Isochrysidales (0.59 versus 0.57, Table 2; Supplementary Table S10), suggesting that the range of RIK 38E values for Group 1 Isochrysidales should be extended. Yao et al. (2019) pointed out that the primers used in many of the marker gene analyses would not pick up Group 1 Isochrysidales belonging to the Group 1b (formerly EV clade). Thus, the true range of lakes harboring Group 1 Isochrysidales is not fully considered. The alternative is that Lake Taillères hosts a small proportion of Group 2 Isochrysidales.

FIGURE 11 Impact of nutrient concentrations on alkenone occurrence and abundance. ALE plots for the Swiss RF model (A,B). The dashed blue line represents ALE = 0, indicating that predictions are not significantly affected. The density of feature distribution is shown on the x-axis, with each tick corresponding to one lake. Regions with low density should be interpreted with caution. Difference between the relative frequencies of Swiss and global lakes with and without alkenones depending on each tested variable (C,D). Red (black) hatching indicates favorable (unfavorable) ranges of values for alkenone occurrence (see Section 2.8.6). Distribution of alkenone concentrations depending on each tested variable (E,F). Group 1 alkenones are noted with blue symbols and mixed Group 1/2 with orange ones. Swiss lakes have round symbols while lakes from the global database are noted with diamonds. Red shaded areas highlight the ranges where f(Alkenones) − f(No alkenones) is positive. The total number of lakes where alkenone concentration was measured is indicated. Note that we zoomed in on the concentrations below 1.5 μg/g.
10.3389/feart.2024.1409389

The twelve lakes likely containing a mix of Group 1 and 2 Isochrysidales have a higher proportion of C 37:3a than C 37:4 (mean of 42.8% vs. 28.8%, Figure 2B; Table 2). Previous studies identified three subclades within Group 2 that correspond to different ecological niches within saline lakes: Groups 2i and 2w1, which mainly occur at low and intermediate salinities, and Group 2w2, which prefers hypersaline lakes (Wang et al., 2021; Yao et al., 2022). The mixed alkenone profiles found in the Swiss lakes likely correspond to Group 1 and 2w1 Isochrysidales. Typical chromatograms of dominant Group 2w1 contain a higher proportion of C 37:3a compared to the C 37:4 alkenone, unlike Group 2i, which is characterized by a high C 37:4 proportion (Yao et al., 2022). Moreover, the characteristic alkenone of Group 2i, the C 39:4 Me alkenone, is absent from our chromatograms (Yao et al., 2022; Figure 2B). One likely scenario is that, in these lakes, the ice-associated Isochrysidales are represented by Group 1, which is often detected during ice-off. The presence of Group 2w2 seems unlikely as these alkenone producers prefer hypersaline lakes (Yao et al., 2022). Moreover, Swiss lakes correspond to the known ecological preferences of Group 2w1 Isochrysidales: they have low salinities and low abundances of Na + and Cl − (Supplementary Table S1) (Yao et al., 2022).
Group 2 Isochrysidales are mainly found in oligohaline to hyperhaline lakes (e.g., Longo et al., 2016; Yao et al., 2020; 2022). The transition from Group 1 to Group 2 Isochrysidales has been found to occur across a salinity range of ∼1-10 g/L (Yao et al., 2020). However, here we report Group 2 alkenones in 12 lakes with salinities lower than 0.45 g/L (Table 1). Yao et al. (2019) and Wang et al. (2019) also detected Group 2 Isochrysidales, in small numbers, in freshwater lakes from China and Alaska based on genomic analyses, while Yao et al. (2021) found Group 2 Isochrysidales together with Group 1 in 5 Chinese lakes with salinity ranging from 0.7 to 2.07 g/L. Therefore, Group 2 Isochrysidales seem to be more common than initially thought in lakes with low salinities.
In conclusion, all the studied lakes in Switzerland containing alkenones have a characteristic Group 1 signature. The alkenone distributions of 12 lakes indicate that they likely contain both Group 1 and Group 2, more specifically Group 2w1 Isochrysidales, with Group 1 being present in higher abundance. Marker gene analyses will be conducted in the future to further explore the composition of the Isochrysidales communities in Swiss lakes. Alkenones were also found in freshwater lakes in the United Kingdom, Germany, and France (Cranwell, 1985; Zink et al., 2001; Simon et al., 2013; 2015; Figure 1B), suggesting that alkenones are common in mid-latitude European freshwater lakes.

Variable importance
Both Swiss and global models found Na + concentration and MAAT among the most important variables for alkenone occurrence (Figure 5; Supplementary Figure S3). Depth appears less important in the Swiss model compared to the global model, where it is the most important variable (Figure 5; Supplementary Figure S3). These results are consistent with those of the model of Plancq et al. (2018a), where water temperature and depth were among the most important parameters influencing alkenone occurrence, while stratification and pH appeared less important (Figure 5B; Supplementary Figure S3). However, Plancq et al. (2018a) found salinity to be the main variable determining alkenone occurrence, whereas it is one of the least important parameters in our global RF model (Figure 5B; Supplementary Figure S3), although salinity is highly correlated with Na + and SO 4 2− concentrations (r = 0.88 and 0.86, respectively, Supplementary Table S9), which rank among the most important parameters. Na + is a dominant cation in 45% of the lakes for which major ion compositions are available in the entire global dataset (n = 168, Supplementary Tables S1, S3), but Ca 2+ is dominant in 52% of them, Mg 2+ in 14% and K + in 1%. The proportions are similar in the lakes used for the global model (44% for Na + , 58% for Ca 2+ , 11% for Mg 2+ and 1% for K + , Supplementary Table S6). In fact, salinity is more correlated with the sum of the cations than with Na + alone (R 2 = 0.89 and 0.77, respectively, Supplementary Figure S8). Therefore, in freshwater lakes, salinity is also influenced by other ions, such as Ca 2+ and Cl − , which are less important for alkenone occurrence than Na + or SO 4 2− (Figure 5; Supplementary Figure S3). This could explain the low importance of salinity in our model. On the other hand, Na + is the main ion responsible for salinity in saline lakes. In the study of Plancq et al. (2018a), which includes mainly saline lakes, with salinity ranging from 0.1 to 102 g/L, Na + is by far the most correlated ion with salinity (R 2 = 0.90 against 0.55 for HCO 3 − , the second highest correlated ion). Therefore, it seems likely that the occurrence of Group 1 (dominant in freshwater lakes) and Group 2 alkenones (dominant in saline lakes) is mainly controlled by the same parameters: Na + concentration, depth and temperature.
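To make the notion of variable importance concrete, the sketch below illustrates the idea behind a "mean decrease in accuracy" score: one variable is shuffled across lakes and the resulting loss of prediction accuracy is averaged over several shuffles. It is a simplified illustration in plain Python, not the procedure of Section 2.8.4 (random forest implementations typically evaluate this on out-of-bag samples, tree by tree). The classifier `toy_model`, its thresholds, the variable names and the lake values are all invented for the example.

```python
import random


def accuracy(predict, rows, labels):
    """Share of lakes whose predicted class matches the observed one."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, repeats=50, seed=0):
    """Mean drop in accuracy after shuffling one feature across lakes."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


def toy_model(lake):
    """Hypothetical stand-in for a fitted classifier (NOT the RF model of this study)."""
    return lake["na_mg_l"] < 5.0 and lake["maat_c"] < 10.0


# Invented lakes for illustration only.
lakes = [
    {"na_mg_l": 1.0, "maat_c": 4.0, "area_km2": 0.5},
    {"na_mg_l": 2.5, "maat_c": 8.0, "area_km2": 12.0},
    {"na_mg_l": 3.0, "maat_c": 12.0, "area_km2": 3.0},
    {"na_mg_l": 8.0, "maat_c": 6.0, "area_km2": 0.2},
    {"na_mg_l": 20.0, "maat_c": 11.0, "area_km2": 40.0},
    {"na_mg_l": 0.8, "maat_c": 2.0, "area_km2": 1.5},
]
labels = [toy_model(lake) for lake in lakes]  # labels follow the rule, so baseline accuracy is 1.0

importances = {
    name: permutation_importance(toy_model, lakes, labels, name)
    for name in ("na_mg_l", "maat_c", "area_km2")
}
```

A variable the classifier never uses (here `area_km2`) scores exactly zero, whereas shuffling a variable the classifier relies on can only lower the accuracy of an otherwise perfect fit. Correlated predictors, such as Na + and salinity, can share or mask each other's importance under this kind of measure, which is worth keeping in mind when reading Figure 5.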
In the Swiss dataset, Na + and Cl − concentrations are highly correlated (r = 0.93, Supplementary Table S7). This likely reflects a common source for both ions in Swiss lakes, probably halite. Both ions are often increased by anthropogenic sources (e.g., Müller and Gächter, 2012). However, they are less correlated in the global model (r = 0.53, Supplementary Table S9) whose results suggest that Cl − concentration has a limited importance for alkenone occurrence (Figure 5B; Supplementary Figure S3). More generally, ions are intercorrelated in both datasets: K + and Na + are linked within the Swiss dataset as well as SO 4 2− and Ca 2+ (r = 0.76 in both cases, Supplementary Table S7), while in the global dataset, SO 4 2− is strongly correlated with Mg 2+ and Na + (r = 0.87 and 0.77, respectively), and Mg 2+ with K + (r = 0.75, Supplementary Table S9).
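The correlation coefficients quoted in this section can be reproduced from paired measurements with a few lines of Python. The sketch below assumes that r denotes Pearson's correlation coefficient, which the text does not state explicitly; the helper names are hypothetical and the concentrations are invented. For a simple linear regression with one predictor, R 2 equals r squared, so r = 0.88 between salinity and Na + corresponds to R 2 ≈ 0.77, in line with the value reported for Na + alone.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Pairwise r for a dict mapping a variable name to its list of values."""
    names = list(columns)
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented ion concentrations (mg/L) for illustration only.
ions = {
    "Na": [1.0, 2.0, 4.0, 8.0, 3.0],
    "Cl": [1.4, 3.1, 5.8, 12.5, 4.0],
    "Ca": [40.0, 22.0, 35.0, 30.0, 51.0],
}
table = correlation_table(ions)
```

Strongly intercorrelated ions carry overlapping information, which is one reason why the importance attributed to a single ion in the models should be read together with the correlation tables.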
[...] missing in previous studies and more data would be necessary to test if dimictic lakes are more favorable than other types of lakes. Lake area was never considered as a parameter that could influence alkenone occurrence. The distribution of the lakes with alkenones depending on lake area is very similar to that of the lakes without alkenones as well as that of all studied lakes (Supplementary Figure S2). This suggests that area does not have a strong influence on alkenone occurrence, which is also indicated by the global model (Figure 5B; Supplementary Figure S3).
Elevation appears as an important parameter in the global model (Figure 5B; Supplementary Figure S3). However, elevation is not expected to directly impact alkenone producers. Elevation does not show any strong correlation in the global model (Supplementary Table S9) but it is likely correlated with stratification. The pattern of alkenone occurrence more likely reflects the distribution of the studied lakes rather than a biological influence of elevation on alkenone producers.

Impact of major ions
Only a few studies reported major ion concentrations in connection with alkenones, thus the impact of major ion concentrations on the occurrence of alkenones has rarely been assessed.
On the one hand, Yao et al. (2019) suggested that high major ion concentrations, especially Na + , K + and Mg 2+ , were unfavorable for Group 1 alkenones, and Toney et al. (2011) likewise suggested that high Mg 2+ concentrations could be unfavorable for alkenone producers. On the other hand, Toney et al. (2010) and Toney et al. (2011) found that alkenones were present in high abundances in lakes with high Na + and K + concentrations and suggested that elevated Na + concentration may be critical for alkenone occurrence. Our results, showing two optimal ranges for alkenone occurrence and abundance, one at low ion concentrations and a minor one at high concentrations (Figure 8), reconcile previous studies that only detected one of these optima due to their reduced range of study. Elevated SO 4 2− concentrations were suggested to favor alkenone presence and abundance in freshwater and saline lakes (Pearson et al., 2008; Toney et al., 2010; 2011; Zhao et al., 2014; Longo et al., 2016). However, considering only freshwater lakes, low SO 4 2− concentrations appear to be the most favorable conditions (Figures 8D1-D4). SO 4 2− , K + , Ca 2+ and Mg 2+ are essential for green plants, where they play a role in various critical functions such as activation of enzymatic reactions, maintenance of membrane potential and osmotic homeostasis, as well as negative and positive charge equilibrium, and redox buffer (Maathuis, 2009). Unicellular phototrophs require similar mineral macronutrients to complete their life cycle, despite being evolutionarily distantly related (Bhattacharya and Medlin, 1998). However, when present in high quantities, some ions can have negative effects; high SO 4 2− concentrations can be toxic (Maathuis, 2009) and elevated Na + concentrations alter the osmotic regulation, protein synthesis and photosynthesis, in particular through over-competition with other cations (EL-Sheekh, 2004; Singh et al., 2018). Several experiments observed a decrease of algal growth with increasing input of NaCl (Gorain et al., 2013; Battah et al., 2014; Sikorski, 2021). Na + is often abundant in the environment; thus, organisms have to maintain a low level of Na + in their cells (Li et al., 2023). Isochrysidales seem to be well adapted to do so given that there have been multiple marine-freshwater transitions in the evolution of haptophytes (Simon et al., 2013). K + can help algae deal with salt and alkali stress (Li et al., 2023). In fact, organisms maintain a high level of K + in their cells, while this ion is usually present in low concentrations in the environment, and some K + transport systems were found to help algae maintain the high K + /Na + ratio, making them tolerant to high Na + and low K + conditions (Li et al., 2023). However, in freshwater lakes, Isochrysidales seem to prefer lakes with low Na + concentrations, even if they can live in lakes with higher concentrations (Figure 8A3; Supplementary Figure S2). The lipid content of algae was found to increase with NaCl input (Rao Ranga et al., 2007; Gorain et al., 2013; Singh et al., 2018). A similar mechanism could explain the higher alkenone concentrations found in saline lakes as a response to saline stress (see Section 4.2.2.3). A lack of Ca 2+ also resulted in a rise in lipid content, while an increase of Mg 2+ led to the same result and was accompanied by an increase in biomass (Gorain et al., 2013). Accordingly, all these ions appear as relatively important for alkenone occurrence in the models, except Cl − and Ca 2+ , the latter perhaps being more important for plants than for algae (Figure 5; Supplementary Figure S3).

Impact of salinity, conductivity, alkalinity and pH
Freshwater lakes with low salinity, low to moderate conductivity and alkalinity values and moderately alkaline pH (7.0-8.5) are the most favorable for alkenone occurrence (Figures 9, 10). The ranges for high alkenone concentrations are similar for conductivity and alkalinity but different for salinity (<0.6 g/L and between 1 and 1.5 g/L) and pH (Figures 9, 10).
Considering only freshwater lakes (maximal salinity of 3 g/L, with two exceptions at 3.6 and 7.1 g/L, see Section 2.3), low salinities appear as the most favorable for alkenone occurrence (<0.1 and between 0.2 and 0.6 g/L, Figures 9A1-A3) and abundance (<0.6 g/L, Figure 9A4). Mixed Group 1 and 2 alkenones are slightly more frequent in lakes with higher salinities (>0.7 g/L, Supplementary Figure S4) compared to Group 1 alone. Wang et al. (2019) suggested that the presence of Group 2 Isochrysidales in North Killeak Lake could be linked to the relatively high salinity of the lake (1.1 g/L, Supplementary Table S3). However, the mixing occurs at salinities as low as 0.04 g/L (Supplementary Table S3) and most lakes with mixed Group 1/2 are found between 0.1 and 0.5 g/L (Supplementary Figure S4). Salinity plays a role in shaping microbial communities, but it is mainly linked with NaCl, whose effects were discussed above.
The influence of conductivity on alkenone presence was already reported; D'Andrea and Huang (2005) and Longo et al. (2016) noted that lakes with elevated conductivity are favorable for alkenone occurrence and abundance. Elevated alkalinity values were also reported to be favorable for alkenone occurrence and abundance in previous studies (Longo et al., 2016; Wang et al., 2019). However, Zink et al. (2001) noted that high alkalinity was not mandatory for alkenone occurrence. Extending the number of lakes and the range of conductivity and alkalinity values demonstrates that lakes with low to moderate conductivity and alkalinity values are the most favorable for alkenones (Figures 9, 10).
The optimal range for alkenone occurrence and abundance for salinity, conductivity and alkalinity is found at low and moderate values. As these broad chemical parameters depend on the ion content, this likely reflects the fact that the optimal range for alkenones is found at low concentrations for all ions (Figure 8) rather than a direct effect on algae. This could explain the presence of conductivity among the most important parameters in the Swiss model, as conductivity is significantly correlated with almost all ions (r = 0.88 for Ca 2+ , 0.76 for Mg 2+ , 0.59 for SO 4 2− , 0.52 for Na + and 0.50 for Cl − , Supplementary Table S7).
pH was proposed as an important parameter controlling alkenone occurrence in previous studies (e.g., Toney et al., 2010; Longo et al., 2016; Plancq et al., 2018a; Yao et al., 2019). The most favorable conditions for alkenone occurrence and abundance are found for pH ranging from 7.0 to 8.5, especially from 7.5 to 8.5 (Figures 9C1-C4). This is in agreement with the optimal range of pH found by Yao et al. (2019) for Group 1 alkenone occurrence: ∼7.3-8.8. However, our global database extends the optimal range for alkenone concentrations proposed by Yao et al. (2019) from ∼7.3-8.8 to 7.7-9.4, with the highest alkenone concentration found at a pH of 9.0 (Figure 9C4). This is in agreement with previous studies which found that alkenone concentrations were higher in alkaline lakes (Toney et al., 2010; Longo et al., 2016). However, pH does not appear among the most important parameters controlling alkenone occurrence (Figure 5; Supplementary Figure S3), as previously found by Plancq et al. (2018a).

Impact of nutrients and trace elements
The best conditions for alkenone occurrence and abundance are found in lakes with reduced nutrient and trace element content (Figure 11; Supplementary Figures S6, S7). Longo et al. (2016) and Yao et al. (2019) had also proposed that lakes with low nutrient content were more favorable for alkenone occurrence and abundance. However, very low nutrient content is not favorable for alkenone occurrence (<0.1 mg/L for TN and <0.005 mg/L for TP, Figures 11C,D) and higher nutrient contents can also be, to a lesser extent, favorable for alkenone occurrence (TN > 2 mg/L and TP > 2.5 mg/L, Figures 11C,D). In these higher ranges, mixed Group 1/2 Isochrysidales are more frequent compared with Group 1 alone; notably, the lakes with the highest TP and TN concentrations contain both alkenone groups (Supplementary Figure S4; Supplementary Table S1). Yao et al. (2019) suggested that high nutrient content could be responsible for the occurrence of Group 2 Isochrysidales in freshwater lakes. However, most lakes hosting both alkenone groups have low nutrient contents (Supplementary Figure S4).
However, the distribution of the lakes with alkenones depending on TP and TN is very similar to that of the lakes without alkenones as well as that of all studied lakes (Supplementary Figure S2). This suggests that nutrient content has little impact on alkenone occurrence, as also indicated by the Swiss model (Figure 5A; Supplementary Figure S3). However, the nutrient concentrations are available for only a small part of the lakes, thus more data would be necessary to confirm these results. Yet, the changes of nutrient concentration through seasons are thought to influence the timing of the bloom and so, the life cycle of both Group 1 and Group 2 Isochrysidales (D'Andrea and Huang, 2005; Toney et al., 2010; D'Andrea et al., 2011; Theroux et al., 2020). Experiments showed that decreasing nitrate inputs were associated with a decrease of growth rate and increasing lipid content in algae (Battah et al., 2014), so changes in nutrient concentrations could affect the alkenone production through the bloom period.
Yao et al. (2019) suggested that elevated concentrations of several trace elements could preclude Group 1 alkenone occurrence in freshwater lakes. Adding the data from Swiss lakes to their results extended the range of trace element concentrations for which alkenones are present for Fe, Mn and Cu (Supplementary Table S11). For most of the considered trace elements, alkenones are present at least in some of the lakes with the highest trace element concentrations. However, alkenones are absent from the lakes with the highest concentrations of Fe, Zn, Mo, Co and Al (Supplementary Figure S6; Supplementary Table S11). Alkenones are also less frequent at elevated concentrations of Mn, Cu, Pb, As and Cd (Supplementary Figure S6). These conditions seem to be less favorable for or even preclude alkenone occurrence. Conversely, elevated concentrations of Li, Cr, Ba and Br are slightly favorable for alkenone occurrence, while elevated concentrations of U do not have any impact (Supplementary Figure S6). For all the trace elements considered, the highest alkenone concentrations are found for low trace element concentrations (Supplementary Figure S7). This suggests that high concentrations of the considered trace elements could be less favorable for alkenone production as previously proposed by Yao et al. (2019). For the majority of the considered trace elements, mixed Group 1/2 Isochrysidales are more frequent than Group 1 alone at the highest concentrations (Supplementary Figure S10). Yao et al. (2019) proposed that, in certain specific environmental conditions, Group 2 Isochrysidales have higher requirements for some trace elements relative to Group 1. However, for most trace elements, mixed Group 1/2 alkenones are more frequent at low concentrations like Group 1 alone (Supplementary Figure S10). Trace elements can be essential for algal metabolism but when present in too high concentrations, they can disrupt critical biological functions and become harmful (Yao et al., 2019 and references therein). However, trace element concentrations are reported only for a few lakes (Supplementary Figure S6; Supplementary Table S11), thus more data would be necessary to confirm these results.
For most of the tested variables, the best conditions to find high alkenone concentrations in our global dataset of freshwater lakes are similar to those found for alkenone occurrence. However, for salinity, MAAT and pH, there are some differences. High alkenone concentrations are found in freshwater lakes with higher salinities (around 1 g/L), higher pH (∼7.5-9.4) and in colder environments (MAAT <5°C) compared to the freshwater lakes which are the most prone to host alkenone producers. This suggests that the occurrence and abundance of alkenone producers could be influenced by different variables and/or have different optimal ranges. These optimal ranges are not affected by the small portion of lakes containing alkenones where the alkenone producer is undetermined (n = 26) as their frequency distribution matches that of lakes containing Group 1 alkenones (Supplementary Figures S4, S10). The only exception is for MAAT, where alkenones with an undetermined group are the only ones present in the highest range (12°C-14°C).
Comparing the distribution of the lakes hosting Group 1 (n = 88) and mixed Group 1/2 alkenones (n = 22) in the entire global dataset, it appears that the favorable ranges for the occurrence of Group 2 together with Group 1 alkenones in freshwater lakes are very similar to and/or included inside those for Group 1 alone for almost all variables. We noted some differences in the pattern of frequency distribution: Group 2 Isochrysidales preferred warmer environments, were present more frequently in deep and large lakes, with higher salinities and Cl − concentrations, and were more tolerant to high concentrations of nutrients and some trace elements compared to Group 1 Isochrysidales alone. The distribution of alkenone concentrations of mixed Group 1/2 alkenones has a very similar pattern to that of Group 1 alkenones for almost all variables. This aligns with our previous conclusion that Group 1 and Group 2 alkenone occurrence and abundance in freshwater lakes could be controlled by the same parameters. However, these results were obtained on a limited number of lakes. Moreover, we did not consider in our study the freshwater lakes containing exclusively Group 2 alkenones. Therefore, more studies are necessary to better define the optimal ranges of Group 2 Isochrysidales occurrence in freshwater lakes. Yet it appears that Group 2 Isochrysidales can occur in lakes with low salinities (from 0.04 g/L). Their occurrence can be linked with anthropogenic activities in modern (Yao et al., 2019) and past environments (Richter et al., 2021a), but their presence is not necessarily linked with specific conditions (e.g., high trace element concentrations, high nutrient content). Therefore, freshwater lakes are not immune to phylotype mixing and alkenone producers should always be carefully assessed prior to any paleotemperature reconstruction.
For most variables, the RF models revealed that the probability of alkenone occurrence and the distribution of alkenone concentrations included one or several optima. Such complex relationships would have been impossible to capture with a PCA, a logistic regression or another linear model. We still need to better understand which biological mechanisms produce these optima.
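A toy example may help show why a response with an interior optimum escapes a linear model. In the Python sketch below, presence is confined to intermediate values of a variable, and an ordinary least-squares line fitted to the same data has a slope of essentially zero, so it reports no effect at all. A logistic regression with a single linear term is likewise monotonic and cannot place a peak inside the range. The straight-line fit is used here only as the simplest stand-in for such models; the function name and the pH and presence values are invented.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Invented data for illustration only: presence (1) restricted to intermediate pH.
ph = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
presence = [0, 0, 1, 1, 1, 0, 0]

slope, intercept = least_squares_line(ph, presence)

# The pH values at which presence is observed reveal the optimum that the line misses.
ph_with_presence = [x for x, y in zip(ph, presence) if y == 1]
```

Tree-based models split the range of a variable into intervals, which is why they can represent one or several such optima without the analyst specifying their position in advance.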

Conclusion
We found alkenones in 33 out of the 56 investigated freshwater lakes, which suggests that lacustrine alkenones are common in Switzerland and more generally in mid-latitude European freshwater lakes.Detected alkenones likely belong to the Group 1 Isochrysidales and in 12 lakes, we found a mixed Group 1/Group 2 signature.Genomic analyses will bring further insights in the diversity of Isochrysidales communities in Swiss lakes.
We used, for the first time, random forest to explore the environmental variables influencing alkenone occurrence.For Swiss lakes, Na + concentration and MAAT were the most important variables to explain alkenone occurrence.For the global model, including Swiss lakes and all freshwater lakes previously investigated for alkenone presence, depth was the most important parameter, followed by MAAT and Na + , SO4 2-and K + concentrations.These variables are thought to play an important role in the metabolism and life cycle of alkenone producers.Our results are very close to those found for freshwater and saline lakes by Plancq et al. (2018a), suggesting that Group 1 and Group 2 alkenone occurrence could be controlled by the same parameters.This is reinforced by the very similar distributions of lakes containing only Group 1 Isochrysidales and those containing mixed Group 1/2 Isochrysidales for almost all variables.
Considering the data from our global database, freshwater alkenone producers are more likely to occur in small and midsized stratified lakes with a moderate depth, a neutral to slightly basic pH (7.0-8.5),low to moderate conductivity and alkalinity, and low major ion concentrations, salinity, and nutrient content, in cold or mild climates.To find high alkenone concentrations, the characteristics are similar except that salinity and pH are higher (0-1.5 g/L and 7.7-9.4,respectively) and MAAT colder (<5°C).
RF is a powerful tool which is able to reveal complex nonlinear relationships between variables, especially relationships with optimum(s).Such relationships cannot be detected with PCA or logistic regression, which were commonly used in previous studies to investigate the influence of environmental parameters on alkenone occurrence and abundance.
The similarity between the results of the Swiss and global models suggests that the environmental variables controlling the occurrence of alkenone producers in freshwater lakes are homogeneous worldwide. More data are needed to further explore freshwater Isochrysidales ecology, in particular in under-sampled regions such as the tropics and the Southern Hemisphere. Therefore, we recommend that future studies measure and report as many environmental variables as possible, in particular major ion concentrations, to support further machine learning analyses.
Group 2 Isochrysidales are increasingly reported in freshwater lakes, showing that these lakes are not immune to phylotype mixing. Therefore, alkenone producers should always be assessed before reconstructing paleotemperatures.

Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was made possible by Eawag discretionary funding. Open access funding was provided by the Swiss Federal Institute of Aquatic Science and Technology (Eawag).

FIGURE 3 RIK37 values of the studied Swiss lakes. The dashed line represents the upper limit of the RIK37 values for Group 1 Isochrysidales as found by Longo et al. (2018).

FIGURE 5 Variable importance measured by mean decrease in accuracy and mean decrease in Gini for the Swiss (A) and global (B) RF models (see Section 2.8.4).

FIGURE 6 Impact of physical parameters on alkenone occurrence and abundance. Accumulated local effects (ALE) plots for the Swiss (A1-C1) and global (A2-C2) RF models. The dashed blue line represents ALE = 0, indicating that predictions are not significantly affected. The density of feature distribution is shown on the x-axis, with each tick corresponding to one lake. Regions with low density should be interpreted with caution. (A3-C3) Difference between the relative frequencies of Swiss and global lakes with and without alkenones depending on each tested variable. Red (black) hatching indicates favorable (unfavorable) ranges of values for alkenone occurrence (see Section 2.8.6). (A4-C4) Distribution of alkenone concentrations depending on each tested variable. Group 1 alkenones are shown with blue symbols, mixed Group 1/2 with orange symbols, and alkenones whose group is undetermined with purple symbols. Swiss lakes are shown as circles and lakes from the global database as diamonds. Red shaded areas highlight the ranges where f(Alkenones) − f(No alkenones) is positive. The total number of lakes where alkenone concentration was measured is indicated. Note that we zoomed in on concentrations below 1.5 μg/g.

FIGURE 7 Impact of stratification on alkenone occurrence and abundance. ALE plots for the Swiss (A) and global (B) RF models. The dashed blue line represents ALE = 0, indicating that predictions are not significantly affected. The density of feature distribution is shown on the x-axis, with each tick corresponding to one lake. (C) Histogram showing the relative frequency of stratified and mixed lakes with (red) and without (black) alkenones, considering the Swiss and global datasets. (D) Box plot showing the distribution of C37 alkenone concentrations in stratified and mixed lakes from the Swiss and global datasets. The number of lakes where alkenone concentration was measured is indicated for each category. The mean C37 alkenone concentration for each category is represented by a black cross. Note that we zoomed in on concentrations below 1.5 μg/g.

FIGURE 8 (Continued) Impact of major ion concentrations on alkenone occurrence and abundance. ALE plots for the Swiss (A1-F1) and global (A2-F2) RF models. The dashed blue line represents ALE = 0, indicating that predictions are not significantly affected. The density of feature distribution is shown on the x-axis, with each tick corresponding to one lake. Regions with low density should be interpreted with caution. Cl⁻ concentration was excluded from the Swiss model (see Section 2.8.2), so there is no ALE plot for this ion for the Swiss model. (A3-F3) Difference between the relative frequencies of Swiss and global lakes with and without alkenones depending on each tested variable. Red (black) hatching indicates favorable (unfavorable) ranges of values for alkenone occurrence (see Section 2.8.6). (A4-F4) Distribution of alkenone concentrations depending on each tested variable. Group 1 alkenones are shown with blue symbols, mixed Group 1/2 with orange symbols, and alkenones whose group is undetermined with purple symbols. Swiss lakes are shown as circles and lakes from the global database as diamonds. Red shaded areas highlight the ranges where f(Alkenones) − f(No alkenones) is positive. The total number of lakes where alkenone concentration was measured is indicated. Note that we zoomed in on concentrations below 1.5 μg/g.

FIGURE 9 Impact of salinity, conductivity and pH on alkenone occurrence and abundance. ALE plots for the Swiss (B1-C1) and global (A2, C2) RF models. The dashed blue line represents ALE = 0, indicating that predictions are not significantly affected. The density of feature distribution is shown on the x-axis, with each tick corresponding to one lake. Regions with low density should be interpreted with caution. Salinity was excluded from the Swiss model and conductivity from the global model (see Section 2.8.2), so there are no ALE plots for these variables for the Swiss model and the global model, respectively. (A3-C3) Difference between the relative frequencies of Swiss and global lakes with and without alkenones depending on each tested variable. Red (black) hatching indicates favorable (unfavorable) ranges of values for alkenone occurrence (see Section 2.8.6). (A4-C4) Distribution of alkenone concentrations depending on each tested variable. Group 1 alkenones are shown with blue symbols, mixed Group 1/2 with orange symbols, and alkenones whose group is undetermined with purple symbols. Swiss lakes are shown as circles and lakes from the global database as diamonds. Red shaded areas highlight the ranges where f(Alkenones) − f(No alkenones) is positive. The total number of lakes where alkenone concentration was measured is indicated. Note that we zoomed in on concentrations below 1.5 μg/g.

FIGURE 10 Impact of alkalinity on alkenone occurrence and abundance. (A) Difference between the relative frequencies of Swiss and global lakes with and without alkenones depending on alkalinity. Red (black) hatching indicates favorable (unfavorable) ranges of values for alkenone occurrence (see Section 2.8.6). (B) Distribution of alkenone concentrations depending on alkalinity. Group 1 alkenones are shown with blue symbols, mixed Group 1/2 with orange symbols, and alkenones whose group is undetermined with purple symbols. Swiss lakes are shown as circles and lakes from the global database as diamonds. Red shaded areas highlight the ranges where f(Alkenones) − f(No alkenones) is positive. Note that we zoomed in on concentrations below 1.5 μg/g.

TABLE 1 Location and physico-chemical parameters of the surface waters (0-15 m) of the studied Swiss lakes. When possible, the average of the 10 years preceding the coring was calculated.

They are followed by a second group of parameters including conductivity, area, depth and K⁺ concentration, with significantly lower MDA and Gini values. SO₄²⁻, O₂, total phosphorus and nitrogen (TP and TN) concentrations, stratification, pH and geological catchment have very low values. Depending on the importance index considered (MDA or Gini), Mg²⁺ and Ca²⁺ concentrations are either in the last or the second group. Lake depth is the most important parameter for the global dataset (Figure 5B; Supplementary Figure S3). MAAT, Na⁺ and SO₄²⁻ concentrations come next, followed by K⁺ concentration and elevation. Another group, whose importance index values are lower than 50%, includes Cl⁻ concentration, pH and lake area. Ca²⁺ concentration and salinity constitute the last group, with importance index values lower than 25%. Depending on the importance index considered, Mg²⁺ concentration is included in either the second or the second-to-last group, while stratification is part of either the second or the last group. For both Swiss and global models, the different versions of the models led to very similar importance results (Supplementary Figure

TABLE 2 C37 alkenone fractional abundances and concentrations together with RIK indices for Swiss lakes containing alkenones. Lakes were separated into two groups, Group 1 and mixed Group 1/2 (see Section 4.1).

* Likely biased by a peak co-eluting with the C37:3b alkenone.